So far we have focused on modeling relationships. In addition there are frequently additional pieces of data that we want to store.
For example, we might want to record a person’s date of birth, or the weight of a book. These are not about the relationship between objects in the database, they are characteristics of the things in the database. We call these “attributes” when modeling, and they end up as additional columns in our tables.
For example, we might add a
date_of_birth column to the
people table. Or a
weight column to the
In drawing our ER diagrams we represent these attributes in this way, adding to our Entity boxes:
Attributes don’t affect other aspects of the ER diagram. Here I show our
Person habtm Hobby example, but adding a
date_of_birth attribute to the Person entity.
We do not, though, need to have all our columns from our table as attributes in our ER diagram. The rule is that we only include attributes to communicate how we decided to model something. For example we have always had a
name column, but we haven’t included that as an attribute, because it is clear that things have names.
We also never include attributes for columns like
id or various
foreign_id columns. Those are already represented in the numbers and lines between entities, and they are just specific conventions about representing these relationships in tables. They don’t belong in the logical data modeling world of the ER diagram.
Note also that while our columns in our table sketch move horizontally (they are next to each other) the attribute list is vertical, one attribute per line.
Try to minimize which attributes are included in your ER diagrams. Use them communicatively to call out your modeling decisions, not to exhaustively represent all columns/characteristics. Remember that database designs communicate (and since they are usually drawn on a whiteboard you really want to save space).
Knowing when something needs to be an entity and an attribute is part of your data modeling expertise. The answer is not always obvious and your sense will develop over time.
For example, it is entirely possible to model a date of birth as an entity:
We would then say
DateOfBirth has_many Person Person belongs_to DateOfBirth
but then we would have introduced a whole new table
dates_of_birth with it’s own
id column, and a
name column which would include the actual date
That’s a very long winded way of specifying a piece of information!
So there are times when it just makes more sense to model something as an attribute than as its own entity. As a rule of thumb:
If a table ends up with only an id and name column, consider using the name as an attribute
And take it from me: it is very rare that you want to model Year as its own entity. It just doesn’t accomplish much to create a
years table with its own made up
id which then map to the name of the year which has a value
2002 … it is better to just use columns with
2002 directly as the data value.
Conversely, if you find that there are lots of columns that seem to “go together” they may be better expressed as an entity. One common time we see this is when columns have similar names, especially prefixes or suffixes. For example, think of the different ways we refer to states, sometimes using “Texas” sometimes “TX” or even sometimes “State of Texas”. These are different kinds of names, so we might start off adding columns
name_formal. In that case we are starting to encode pieces of data in the column names, and that would be better modeled as
State has_many StateNames where we can have the kind of name as values in a
A similar situation arises when we want to record things that help us trace how data came to be in the database. For example, if we are modeling Photos and Tags, even if we only allow a single tag, we might want to record who added it and when. So we might start with just the tag
id, name, image_path, tag ----------------------------- 32, photo1, photo1.jpg, happy
but then want to know when the tag was added, or maybe a free text reason that the tag was added.
id, name, image_path, tag, tag_added_date, tag_add_reason ----------------------------- 32, photo1, photo1.jpg, happy, 2011-02-21, Told to by Diane
In that case we would be better modeling the
tag_ columns as their own entity,
Tag with an entity that models the event in which that Tag was added to the photo and a reason. Something like:
A common question is whether, given the
has_and_belongs_to_many relationships, there is a
We’ve seen the patterns
* ... 1 (or 1 ... *)
which maps to
* ... *
which maps to
Clearly, another logical possibility is
1 ... 1
These are not impossible or neccessarily wrong (indeed I show one in the ER just above when we added the
Tagging entity even though each
Photo could only have a single
Tag). However, when you see them you should ask yourself:
Might this 1 …. 1 relationship be better modeled as an attribute?
We might see these in situations like a
User having a
User Account. Each user has one and only one user account, and each user account is for just one user … so we could take the columns from one and add them to the other (for example adding
user_account_name as an attribute of
In the real world, though, we don’t always have full control like that, so if we are using an ER diagram to document data that others control, we very well may have to leave those tables as separate.
Or, just as with
Tagging above, we may decide that we have other reasons to keep the
1 ... 1 in place.
You will develop a clearer sense of this over time, especially as we learn to write queries.
A good place for further worked examples is the book Learning MySQL which is available in full online through the UT Austin library. It uses diagrams that differ a bit from those we’ve used, though.
Another great source is Churcher, C. (2012). Beginning Database Design: From Novice to Professional (2nd ed. edition). Apress. This is available through UT Library but not online; I think it is available through the Internet Archive library. This book uses diagrams very similar to those we’ve used.
If you are curious for further reading on data modeling for relational database you can start with this Wikipedia page on “Normalization” which shows a worked example.
Another important source for data modeling, especially for those thinking Data Engineering in industry, is Kimball, R. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition (3rd Edition). Wiley. I believe this is available online though the UT library. Note that this implements a different set of conventions and concentrates on things like “fact tables” and “star schemas”.