How to Create a Relational Data Model
Tasked with creating an enterprise data model? Need to build a relational
database that can house terabytes worth of data?
1Don't worry about tables just yet. It's obvious that you're building a
database, and databases (relational ones anyway) are primarily made up of
tables, which are made up of rows and columns (tuples and attributes if
you're really into it).<
2Worry about entity relationships. Your first goal is to map out the
relationships that different business objects have. This is the "logical
modeling" portion. The "physical model" is the actual implementation.
Confuse / combine the two at your peril.
3The requirements are hard to get and painful. A talented business
analyst at this time would be Heaven sent.
4Be prepared to wage a solo war, with only you dedicated to quality
normalization. Most databases are pieces of garbage because the people
who design them are lazy and "just want to get something out there. We
can always fix it later." Yeah, right.
5Once it's time to write tables, concentrate on lookups and type tables
(zip codes, statuses, product categories, etc). You'll need them for
foreign key relationships on your "real" tables. Plus, it gives you a
little warm up before you get into the core transactional tables.
6As a rule of thumb: don't store data that can be inferred from other
fields. If you know the date of birth and some start date, then you also
know the age at the start date, so don't include this age in the table.
7No nulls. A null value represents an undefined attribute of an entity.
If entities can have or not have a particular attribute, then it needs to
be handled via an intersect table.
8Contradiction NULL values are in themselves useful to identify
attributes that have not yet been populated by users. This is especially
useful when a user needs to select a default value in order to determine
proper business rules to apply. Case 2 how would you design an address
table where Address1 was populated and Address2 was not required, but if
Address2 was populated it must conform to the business rules of the
field. Sure you could default an empty space, is this better than knowing
the user did not edit the field? Try 3rd normal form on an international
address... Can it be done probably but look at the complexity of
restructuring the data in a meaningful way.
9NULL/NOT NULL Check any database forums and this is a hot topic
advocates on both sides advantages / disadvantages for each.However all
agree that you should never allow nulls in key variables. These are
fields that are being used to identify a record uniquely, e.g. a customer
The null school say that you should use the nulls freely in all other
fields. For instance customers are not obliged to have a cell phone, nor
to tell you their number. Using a null and nothing but a null is the most
efficient to record that cell phone is not available.
If it is really important to know why it not there it is better to
introduce a new variable that states the reason, as opposed to
introducing fancy codes to be stored in the placeholder for cell phone
numbers. Be reluctant to add fields like these, because a) the customer
is also not obliged to tell the reason why he is not giving his cell
phone number, nor does this question make a nice conversation, nor is he
likely to tell his reasons spontaneously, and b) nobody will ever look at
them because of a). Why missing variables generally just waste time.
Be aware that yes/no variables (booleans) often can't hold a null.
Therefore they often contain useless information, such as "either he was
republican, or he refused to answer".
10Get comfortable with intersect (many to many) tables. You'll use them
everywhere if you built things right. One example would be a high school
database where one table is a list of teachers and another for students.
Students have more than one teacher, and teachers have more than one
student, so the intersect table, separate from 'teacher' and 'student',
would have two columns?: foreign keys pointing at both these two. The
primary key would then be the combination of the two.
11Use a good naming convention. For the invoices, put them in a table
called "invoice". Products go in "product". The intersect would be
"invoiceProduct", or "productInvoice", depending upon which table is
really the center of the relationship.
12If you're going to have replication or log shipping, try to have that
set up as you develop so you can see how it works.
13Inner joins are great, but there's probably a lot of LEFT OUTER JOIN
statements that you'll be doing as well. Get used to the different join
statements (except UNION).
14If you have to deal with a legacy application, build your schema
independent of its (don't even look at it). Focus on the business rules
and relationships that it is trying to enforce, but you can get
distracted if you look at the way that someone set it up. Refer to step
15Migrating from your legacy systems into a tighter model with proper
normalization is difficult, but can be made a little more manageable by
using temporary tables for your imports. Also, keep tabs on the legacy
IDs for people to search by.<
Backups are important in development too. Make sure they are at least
nightly. Verify them every week to make sure you don't lose months worth
of work (and possibly your job) if there's a massive hardware casualty.
In regards to entity relationships, as an example, a client can have many
phone numbers. A client can also have many contacts, and each of those
contacts can have many phone numbers. An invoice can only be associated
with one client, however. There is one account rep that can be assigned
to a client, except in certain cases where there are two, etc. These are
the types of things you need to map out well before you write a single
line of sql.
Since you're defining absolute relationships, a good way to get questions
from reluctant participants is to ask things like "So is it absolutely
true that there can only be one client on an invoice?" Questions like
that tend to invoke a response in people.
Leave the "s" off your table names ("invoice" table); it's understood
that since it's a database there's a good chance that there will be more
than one invoice in there.
Don't expect replication, log shipping, or mirroring to work when you go
into production. Develop and test with it from the start. Make it a part
of your application.
The flexibility and power from a relational model is mind boggling
compared to a flatter structure.
There are some extremely specific cases where you need to denormalize
tables for performance reasons. But that's easy to do; focus on the hard
part which is proper normalization.
Data modeling is a critical skill, and very few people building
relational databases are any good at it.
Don't cheap out and do bad work just because it's easier. If they wanted
garbage, they would have hired someone else.
If you do cheap out (nulls, denormalized tables for bad reasons), it will
have direct and real consequences (orphans, crummy data integrity, joins
wont work, etc).
If you do not normalize correctly then the reports made from the database
data will sometimes be plain wrong, and your boss will be very upset.
How to Create a Customer Database
How to Absolutely Secure Your Data
How to Access Data of a Password Protected User in Windows XP in Case the
PC Fails to Boot
How to Access Secure Sites