During their time in the game, almost all developers are going to have to manipulate a database. This can be a good thing, or an absolute disaster. Unfortunately it seems to be the latter in most cases. This is not due to a developer not caring, but sometimes some developers are not formally trained in how to design a database. For example, just look at most programming books, when they get to the data access section they barely cover the database design. I've read many programming books on VB, Classic ASP, PHP, Perl, .NET, etc, and I've seen very few examples where the author actually makes an attempt to advise the developer to ensure that the database is properly normalized during their design of the database. I understand that not all developers are not going to work with databases directly. Some companies have dedicated DBA's that handle all communication with the database and they don't let the developers get involved with the database. Even though this may be the case, it's still good to know the basics so you can be aware of what's going on in the background. That's why we're all developers right? We all wanted to know how it worked, so we figured it out, right? Well, maybe not, but you get the point...
So, why is lacking Database design knowledge a bad thing? Its simple ... if you're using a database as your datastore, the database is the foundation of your application. Regardless of how cool your application is, regardless of how many design patterns you use, regardless if you are the worlds best coder in the last 10 years, it comes down to this...
A great application can not cover up a poor data foundation.
Think of it like this: (I'm no construction contractor so don't flame me here) When you build a house, you need a solid foundation. You need solid ground, a good cement base. You need the plumbing and electricity to be routed in from the street. Once this good foundation is built, a good contractor can build a very sturdy house on top of the foundation, hell, they could build a mansion if they wanted to. All of this takes planning and foresight. Where does the plumbing and electricty come from? Is the ground solid enough? Or are we on a swamp? etc, etc.
Now what would happen if we did not know what we were doing, if we did not have the foresight into the plan? We might end up building a foundation on top of a swamp, therefore the house sinks (the application breaks down and gets abandoned by the users). We might forget to pull the plumbing up before we pour cement (A new feature has to be installed all the way through the system, a major cost for the client, and a huge pain in the butt). We might forget about the electricity until we need it (Same thing as the plumbing problem!). More problems = more money that your client has to spend. This client might be your boss, or if you consult it could be the guy signing your checks. Either way, your performance is less than stellar in this situation (if you built the foundation of the system)and you're not looking like the golden child they once thought you were.
Other issues... Most of all developers are under a time crunch, and in time crunches we take shortcuts. We don't do things correctly like we normally would. If these projects involve changing a database and the developer is not exactly sure how to do correctly they will do it the easiest way they can, which, in my experience is the wrong way.
This stems from these lines we've all heard: "Well, we needed it yesterday and its the highest of priority, we need it ASAP!" Or..."When you're done with that project, we have another than needs to be done ASAP. Please make it snappy. Oh yeah, we need those TPS reports in an hour."
That's normally the case in a lot of development projects. The project target date is well beyond what the raealistic development time actually is. When this happens developer tend to take shortcuts. But when shortcuts need to be taken, it should not be done at the database level (it should be avoided at all costs, but we know thats not possible).
Examples of problematic changes
Example 1 - Client: I need you add another contact record to the "ACME Corporation". You then look at the database, and notice that the Organization table schema looks like this:
Organization--------------------------------Organization_Id intOrganization_Name varchar(100) Org_Contact_Name nvarvhar(100)Org_Contact_Phone nvarchar(50)Org_Contact_Email nvarchar(100) ... (other columns ommitted for example use)
What this tells us is that each Organizaiton only has ONE and only ONE Contact record that can be associated with it. You can solve this a couple different ways. I'll show you a wrong way and then a correct way.
The Wrong Way (You'd be surprised how many times I've seen this. Which is the reason for this post.)
Organization--------------------------------Organization_Id intOrganization_Name varchar(100) Org_Contact_Name nvarvhar(100)Org_Contact_Phone nvarchar(50)Org_Contact_Email nvarchar(100) Org_Contact_Name2 nvarvhar(100)Org_Contact_Phone2 nvarchar(50)Org_Contact_Email2 nvarchar(100) ... (other columns ommitted for example use)
Well, we did what we wanted right? We added another contact to the organization table. Enabling "ACME Corporation" to have another contact. Cool. I'm done. Unfortunately this also lends itself to a lot of null fields floating around in your database (when a company doesnt have two contacts, the *2 fields will be null). Plus, next time the same developer needs to add another contact they're probably just going to add three more columns to hold the data, aka: Contact_Name3, etc. This is NOT what you want to do, because this design would be breaking a lot of basic fundamental database design guidelines (more on this below).
The Right Way
What the developer wants to do is to separate the Contacts into a different table and reference the Organization in the contact table. This says "This contact is part of xxxx organization". This is done through foreign keys. Again, this is a very basic concept, but a lot of developers do not know this (which is unfortunate).
Organization--------------------------------Organization_Id intOrganization_Name varchar(100) ... (other columns ommitted for example use)
Contact---------------------------------Org_Contact_Id intOrg_Contact_Name nvarvhar(100)Org_Contact_Phone nvarchar(50)Org_Contact_Email nvarchar(100) Organization_Id int (a foriegn key, that points to the organization database).
In this very basic example we have removed a few columns from the Organization table and put them into the Contact table. Then in the contact table we've referenced the Organization that the contact is part of. This allows the organization to have ZERO to n amount of contacts a zero to many relationship. Now, we have no null data lying around in the organization. Plus, if we want to add 10 contacts to one organization and 2 to another, we can do so, VERY easily.
The Overall Solution
Basic Database Design - Normalization
What is normalization? Its a design technique that helps eliminate inconsistencies and anomalies inside of databases. When designing/changing a database the developer needs to try to keep things in a proper normal form. I strive for a minimum of Third Normal Form or further (4th or 5th).
There are a ton of sites that cover database design so I'm not going to attempt cover it in depth, just Google the topic. I just wanted to say how important it is for a developer to know the basics. Normalization IS something each developer needs to know in order to develop systems that are scalable and extensible. Shorcuts as noted in the example above are done on a daily basis by developers around the world, and why? Most likely because its the quick and easy way to get the project done. Even though refactoring the data into a new table might take longer (initially) the return on that investment is tenfold, at minimum. Hopefully more people start to take pride in the code that they write and databases they design/alter.
Think of it like this, would you want to be the guy who has to come in and fix an application because its slow and hard to extend, due to the database not even being in first normal form, and the database has 1 million+ rows? I've been in that situation before and I can tell you, its a TON of work to get things fixed. It could have been avoided very easily with a small amount of planning and foresight, and it could have saved my client A TON of money. So, before you alter that table, think... "is this really the 'right way' to do this?" You may be saving hundreds of thousands of dollars, with that one decision.
:)
Remember Me
original dasBlog theme by Mads Kristensen altered by Donn Felker |
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.