Home » Uncategorized » Co-op Highlights 8

Co-op Highlights 8

I’ve recently helped with a project to clean up some odd characters that have been showing up when looking for firstname and lastname in our database.  At the last Database Architect meeting, it was discovered that a person had ‘***Jane’ as a first name, which led us to query different characters (!@#$%^&*()_+=:”;’?/<>,.) to see if any records contained those characters in the firstname or lastname fields.  And, yes, there were a surprising number of oddities!

This led us to starting searching for strings of letters, where someone might have used ‘xxxSmith’ or ‘zzzJane’ in the last name or first name fields.  And we found a number of those records also.  We also did a query for the string ‘Don’t’ and ‘test’ and found a number of records with those words in them.  A person’s first name or last name is MOST LIKELY NOT going to be ‘test’ or ‘don’t,’ so those were corrected.

I found this website, http://www.w3resource.com/mysql/string-functions/mysql-substring-function.php, to be very helpful in explaining substring functions in MySQL.  I’ve also included the query syntax that we used for this project.  All in all, I believe we corrected about 300 name records in the database, which is probably a small (even trivial) number compared to a database that has millions of records, but for our environment, it was a small win.

Query for repeating characters in name fields:

SELECT id, firstname, middlename, lastname, nickname FROM people
WHERE firstname REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’
or lastname REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’
or middlename REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’
or nickname REGEXP ‘(aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|jjj|kkk|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz)’;

Query for numbers in name fields:

SELECT id, firstname, middlename, lastname, nickname FROM people
WHERE firstname REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’
or lastname REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’
or middlename REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’
or nickname REGEXP ‘(1|2|3|4|5|6|7|8|9|0)’;

SELECT id, firstname, lastname, nickname FROM people
WHERE firstname LIKE ‘%.%’
or lastname LIKE ‘%.%’
or nickname LIKE ‘%.%’;

Query for odd characters in name fields:

SELECT id, firstname, lastname FROM people
WHERE firstname LIKE “%!%” or firstname LIKE “%@%”
or firstname LIKE “%#%” or firstname LIKE “%$%”
or firstname LIKE “%\%%” or firstname LIKE “%^%”
or firstname LIKE “%&%” or firstname LIKE “%*%”
or firstname LIKE “%(%” or firstname LIKE “%)%”
or firstname LIKE “%?%” or firstname LIKE “%/%”
or firstname LIKE “%~%” or firstname LIKE “%+%”
or firstname LIKE “%=%” or firstname LIKE “%,%”
or firstname LIKE “%:%” or firstname LIKE “%;%”
or lastname LIKE “%!%” or lastname LIKE “%@%”
or lastname LIKE “%#%” or lastname LIKE “%$%”
or lastname LIKE “%\%%” or lastname LIKE “%^%”
or lastname LIKE “%&%” or lastname LIKE “%*%”
or lastname LIKE “%(%” or lastname LIKE “%)%”
or lastname LIKE “%?%” or lastname LIKE “%/%”
or lastname LIKE “%~%” or lastname LIKE “%+%”
or lastname LIKE “%=%” or lastname LIKE “%,%”
or lastname LIKE “%:%” or lastname LIKE “%;%”

Query for odd words (Don’t or test) in name fields:

SELECT id, firstname, lastname FROM people
WHERE firstname LIKE “%Don’t%” or firstname LIKE “%test%”
or lastname LIKE “%Don’t%” or lastname LIKE “%test%”;

Overall, it was a great experience to use MySQL to produce the queries and to research a little more about sub-strings.  I think there is a better way to query for repeating strings of letters (like 3 consecutive letters at a time in a string), but I wasn’t able to find that in my research.

2 thoughts on “Co-op Highlights 8

  1. Great clean-up work Kim! These are the things that bog down our database and make it hard for the users to access good info. Keep up the good work!
    Connie

  2. Kim,

    Thought this might help. I found a regexp for repeating characters. I believe this will work to find 3 or more characters in a row. ‘(.)\1{2,}’

    http://stackoverflow.com/questions/1660694/regular-expression-to-match-any-character-being-repeated-more-than-10-times

    I also found a great website to test regular expressions. Syntax might be slightly different for MySQL so there is a chance it won’t work for you.

    http://www.regexr.com/

    Cheers,
    David Specht

Thoughts?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s