Co-op Highlights 2

There are a couple of projects that I have worked on this week for our IT department.

The first project is simple, but meets a real need – create a diagram to document the relationship between the MySQL database, the Vine web application, the personalized website component and list the IP addresses, server names, and type/function for each one.  I have been at my current job for 8 years and this was a really interesting project for me.  I knew a lot of the information in my head, but to see it diagrammed out in a couple of Venn diagrams to show the relationships really helped to understand exactly what we are working with.  A second aspect of this project was to diagram the current development process, including developing code, testing (QA), repository and version handling, and production.  I included the different server names, the common names for those servers, and their functions.  This was also very interesting as it really helped me to see both the server name and the common name together.  It also was helpful to see the process.  I mistakenly had thought that code sitting in the QA server went directly into Production.  But that is not the case.  The code is first checked out from the Repository and put into QA for testing.  Once it passes QA testing, then the original code from the Repository is put into Production.  I always thought that the code moved from QA testing into Production.  

The second project I have begun work on is a mapping or diagramming project also.  This is creating EER models in MySQL Workbench for the tables in our database (the Vine) to show the relationships between the tables.  One of the folks in IT has used Schema Spy to do this initially and it is great, but there are a couple of tables that pretty much everything else relates to, so it is hard to visually see the relationships to the other tables in Schema Spy.  The EER model will allow me to print large diagrams (34 x 44 inches) and enable me to spread out the different tables, providing a cleaner view of the relationships.  There are several components to this project and I was able to complete one diagram this week.  It still needs a little tweaking as far as sizing and printing, which I will finish on Monday.

Good data, bad data

At my current employer, I have the opportunity to interact with the database that contains all of the records – names, emails, phone numbers, financial gifts, volunteering, groups, events, family history, and attendance – for all of the people that come our way.

This past week, I was registering a person for a class that is coming up in September.  As I started searching for her record, I realized there actually 3 records for the same person.  All 3 had the same email address, but with varying pieces of other information (one had a mailing address, two had an age, etc).  One record had a child associated with it (Jane Doe).  Once I combined all 3 records, merging them into one, I wanted to add Jane Doe back in to the household.

I then discovered that there were 2 records for Jane Doe, age 10, plus another record for Jane Smith, age 10 and same address, plus a third record for Jane Jones, different age, same address.  In addition to that there were:  Julie Doe, Julie Smith, and 2 records for Julie Jones, all with same addresses, emails, ages, or combinations that linked back to the original parent.  Plus, I found John Doe and 3 records for John Smith – not originally connected to the parent, but same address as some of the Jane records and Julie records.

As I see it, there were 12 different records for 3 kids.  This is certainly a case of data being entered without verification – caused by either the parent not letting the organization know that their personal information had changed or data entry being done without checking for already existing same or similar records.

In this case, it is only 12 records, but I can imagine that over time, this situation can cause some database bloat – records that are added in for one reason or another, but not cross-checked to see if any information already exists.  These 12 records for 3 kids can make it hard when you want to register them for an event – is it this one?  Or this one?

I think my point in mentioning this here is that the front side of a database, the place where data entry happens, is important, maybe almost as important as the back side of the database, the place where rows and columns and data types all come together.

I may write/type more on this, but wanted to get the point out there before the week was over.