Co-op Highlights 9

This past week, I was able to finish up the project that I referenced in my Co-op Highlights 6 post.  This project involved generating an EER Model on the MySQL Workbench and creating a visual diagram showing the relationships of different tables within the database.  I used Schema Spy to identify the initial grouping sets and then used the EER Diagram function in MySQL to create the diagrams.  Using MySQL, you are connected to the database.  When you choose EER Diagram, you can create multiple diagrams within one file or create an individual file for each diagram.  The list of tables in the database is visible in the left column and you simply drag over the tables onto the design surface (based on the relationships identified in SchemaSpy).  Additionally, you can move the table blocks around to maximize the space on the design area.  Relationships between tables are shown by lines that adjust dynamically when you move the tables to different locations.

SchemaSpy is an online tool (Java-based) that analyzes the metadata of a schema in a database.  It then generates a visual representation of the database in your browser.  You are able to see the hierarchy of tables in the database which shows the child and parent relationships both in a list format and diagram format.  Our IT department has used SchemaSpy to help identify the relationships between tables.  This EER mapping project is an extension of that.

The last piece of this project is to document my process for creating the diagrams, so that it can be recreated easily in the future, if needed.  I will be writing this and sharing it with the other folks in our IT department tomorrow.

What has been interesting is to see the complexity of the database.  I work with this database on a routine basis from an end-user perspective – adding names of new guests, registering guests for classes or events, and updating information such as addresses or phone numbers.  The web application that the end user sees and works with is just the tip of the iceberg.  Seeing the tables and their relationships in an EER Model diagram is like seeing how big the iceberg really is.

EER stands for Enhanced Entity-Relationship model or Extended Entity-Relationship model and was developed to visually display the properties and constraints of complex databases.  I’m sure that there are a lot of details in the EER Model section of MySQL that I have yet not explored.

Co-op Highlights 8

I’ve recently helped with a project to clean up some odd characters that have been showing up when looking for firstname and lastname in our database.  At the last Database Architect meeting, it was discovered that a person had ‘***Jane’ as a first name, which led us to query different characters (!@#$%^&*()_+=:”;’?/<>,.) to see if any records contained those characters in the firstname or lastname fields.  And, yes, there were a surprising number of oddities!

This led us to starting searching for strings of letters, where someone might have used ‘xxxSmith’ or ‘zzzJane’ in the last name or first name fields.  And we found a number of those records also.  We also did a query for the string ‘Don’t’ and ‘test’ and found a number of records with those words in them.  A person’s first name or last name is MOST LIKELY NOT going to be ‘test’ or ‘don’t,’ so those were corrected.

I found this website, http://www.w3resource.com/mysql/string-functions/mysql-substring-function.php, to be very helpful in explaining substring functions in MySQL.  I’ve also included the query syntax that we used for this project.  All in all, I believe we corrected about 300 name records in the database, which is probably a small (even trivial) number compared to a database that has millions of records, but for our environment, it was a small win.

Query for repeating characters in name fields:

SELECT id, firstname, middlename, lastname, nickname FROM people
WHERE firstname REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’
or lastname REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’
or middlename REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’
or nickname REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’;

Query for numbers in name fields:

SELECT id, firstname, middlename, lastname, nickname FROM people
WHERE firstname REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’
or lastname REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’
or middlename REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’
or nickname REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’;

SELECT id, firstname, lastname, nickname FROM people
WHERE firstname LIKE ‘%.%’
or lastname LIKE ‘%.%’
or nickname LIKE ‘%.%’;

Query for odd characters in name fields:

SELECT id, firstname, lastname FROM people
WHERE firstname LIKE “%!%” or firstname LIKE “%@%”
or firstname LIKE “%#%” or firstname LIKE “%$%”
or firstname LIKE “%\%%” or firstname LIKE “%^%”
or firstname LIKE “%&%” or firstname LIKE “%*%”
or firstname LIKE “%(%” or firstname LIKE “%)%”
or firstname LIKE “%?%” or firstname LIKE “%/%”
or firstname LIKE “%~%” or firstname LIKE “%+%”
or firstname LIKE “%=%” or firstname LIKE “%,%”
or firstname LIKE “%:%” or firstname LIKE “%;%”
or lastname LIKE “%!%” or lastname LIKE “%@%”
or lastname LIKE “%#%” or lastname LIKE “%$%”
or lastname LIKE “%\%%” or lastname LIKE “%^%”
or lastname LIKE “%&%” or lastname LIKE “%*%”
or lastname LIKE “%(%” or lastname LIKE “%)%”
or lastname LIKE “%?%” or lastname LIKE “%/%”
or lastname LIKE “%~%” or lastname LIKE “%+%”
or lastname LIKE “%=%” or lastname LIKE “%,%”
or lastname LIKE “%:%” or lastname LIKE “%;%”

Query for odd words (Don’t or test) in name fields:

SELECT id, firstname, lastname FROM people
WHERE firstname LIKE “%Don’t%” or firstname LIKE “%test%”
or lastname LIKE “%Don’t%” or lastname LIKE “%test%”;

Overall, it was a great experience to use MySQL to produce the queries and to research a little more about sub-strings.  I think there is a better way to query for repeating strings of letters (like 3 consecutive letters at a time in a string), but I wasn’t able to find that in my research.