Often when we are dealing with many to many relationships we want to go further than simple associations. For example if we were modeling people and their hobbies, we might want to know when the person started having their hobby. Or if we were modeling the colors that objects were painted, we might want to know when to know when the colors were changed. If we were modeling sporting events we might want to store results (who won, came second, or their race times) as well as who competed.
To continue our example of Class and Student, we might want to store an additional attribute of the connection between a single student and a single class, such as semester in which the student enrolled in that class. In mysql tables we could do this:
This takes our Simple Association table
classes_students and adds a column to it
semester. Now we can represent that Sarah took Poisons twice, once in Fall 2008 and once in Spring 2008. So far so good. However our primary key for this table is the combination of
student_id. Can you see that we have repeated that combination? The first row and the second row have the same primary key (
class_id: 1 and
student_id: 10). We can’t do that (the database server will reject the second row).
To make our simple association table work with extra attributes we would have to add the extra attribute to our composite primary key as well, making the combination of all three columns the unique thing about the row.
There are two problems with this. First, the key becomes unwieldy as it gets longer. Second, you may have an extra attribute that is not unique. For example, imagine a class code like “Independent Study” that could be taken twice in the same semester by the same student. Now even the three columns (
semester) in combination aren’t sufficient to make the row unique. In this example we could try to find something further that made the classes unique (perhaps they have a different instructor?). However then we have to manage keys of length 4. Moreover it may be that we simply don’t have data to distinguish between the two relationships, no matter how many attributes we try to add.
So when we have additional attributes about a relationship, rather than extending our composite primary key, we handle this by turning the relationship into its own Entity, with its own entity name and the extra data as an attribute of that entity. As we turn our modeling into mysql tables, this means that we have a new
id column. To continue the Student-Class example, we would add the
id column to our growing middle table and make it, rather than the combination of
semester the primary key.
As far as mysql is concerned that is fine and you will definitely see databases that are set up like this (and you very well might see composite keys of length 3 or even 4). The real world of databases can be a messy place.
For our database design conventions, though, that is a problem because we still have our relationship table named
classes_students which indicates a simple association table. More importantly, although more abstractly, by adding an
id column we have promoted the table to an Entity. Yet if we left our modeling as
Student habtm Class we wouldn’t know to expect the
id column on that table (or the additional attribute of
Thus, to keep everything consistent, we reflect the changes in our tables in our modeling by turning the relationship into an Entity:
Now we can say:
“A single student makes between 0 and many enrollments”
“A single enrollment is about a single student”
“A single enrollment adds to a single class”
“A single class fills by between 1 and many enrollments”
To link all this together in our relational vocab rather than saying:
Student has_and_belongs_to_many Class Class has_and_belongs_to_many Student
we say (and the relational vocab. for has_many through is four lines):
Student has_many Class through Enrollment Class has_many Student through Enrollment Enrollment belongs_to Student Enrollment belongs_to Class
Enrollment is to the left of
belongs_to? Just as with
has_many this indicates that the foreign keys are in the
enrollments table, which is nice and consistent with
Finally, modeling a many to many relationship as a separate Entity allows us to more easily use the relationship entity
id as a foreign key for additional modeling. That doesn’t come up often but when it does it is crucial, because you really don’t want to use a composite primary key as a foreign key (you have to have all three of the columns in the other table.)
Now we turn to another example, relevant to modeling our Book Store.
In the figure below we model a very common situation of the relationship between People and Address. People often have more than a single address: for example they have a “shipping” address (where to send the package) and a “billing” address (where your credit card bill is sent). Sometimes, of course, those are actually the same place. And of course more than one person may use a single address as their address (if they live together, for example). So the relationship between Person and Address is many-to-many, but if we use
has_and_belongs_to_many then we just end up with a
people_addresses table with only two columns (as shown in the example above for Class and Student.)
We are going to model the new entity that joins People and their Addresses as a seperate entity, just as we did with
Enrollment above. We’ll call this new entity
AddressRecord (using the CamelCase rules to join the two words.) Sometimes there is a natural name for the relationship, such as a
Enrollment for students and classes or
Result for the relationship between a
Competitor and an
Event, and I recommend using that when it’s available. But sometimes we have to make one up. Adding
Record to one of the entities is one useful way, thus,
In the diagram above we show the relationship between
Address going through
AddressRecord. We also include the attributes of the relationship (here they are
valid_from, a date showing when an address is valid, and
type, for “mailing”, “billing” or other address types).
As we read the diagram, we can mention all three entities in one sentence. We can read this diagram as saying (left to right):
A single person provides between 0 and many AddressRecords, each of which is about a single Address.
And back right to left:
A single Address could be used in between 0 and many AddressRecords, each of which is about a single Person.
Note the way I say “each of which” in the middle as I move toward the outside box. An alternative way of reading this would be to start in the middle and read to each side.
A single AddressRecord brings together a single Person and a single Address.
When we translate that into our more formal Relational Vocabulary we have to say four lines:
Person has_many Address through AddressRecord Address has_many Person through AddressRecord AddressRecord belongs_to Person AddressRecord belongs_to Address
Just as with a straight
has_many the foreign keys go into the table for the entity to the left of
belongs_to, giving us:
Note that we get the foreign keys in the middle table, just as we would with using
has_and_belongs_to_many. However the AddressRecord now has its own primary key and it has columns which are about the relationship between the person and address referred to by those foreign keys. Note also that the table is called
address_records (the plural of AddressRecord) which follows from the entity being
AddressRecord rather than deriving from
Person as we considered earlier. Note that you can distinguish this from a simple association table between some
Address and some
Record table which would be called
addresses_records (pluralizing both).
We can recognize candidates for
has_many through by the pattern of maximum cardinalities (ignore minimums)
* just as we did for
has_many / belongs_to and
has_many / belongs_to * -------- 1 habtm * -------- * has_many through 1 ------- * * -------- 1
The pattern for
has_many through has a
* for maximum on both sides of the middle box and
1 (for maximum, minimum could be 0 or 1) on the outside.
Form groups of four and you will be assigned one of the relationships below. Each should be modeled with three entities (two boxes) with the center entity being the relationship between the outside ones. Hint: they are all
has_many through :)
has_many through and they will have three boxes and two lines.)
Citizenship (Naturalization and Birth)
e.g., James has two citizenships, one for the United Kingdom which he obtained by birth, and one for Australia where he was naturalized in 1997.
Recipes and Ingredients
e.g., My recipe for omelette includes 3 eggs and 15ml of milk. milk is also an ingredient in a hot chocolate, which uses 3 teaspoons of sugar and one cup of milk.
Degrees awarded by universities to students.
e.g., Shuyen graduated from Peking University with a BA in finance in 2000. She graduated with her MSIS from University of Texas at Austin in 2012.
e.g., Roundrock library subscribed to “The Economist” magazine from 1972 through 2014. Kyle library subscribed to “The Economist” in 1985 and still has a subscription; they also subscribe to “The Atlantic Monthly” which they began in 2003.
Sporting Event, Competitors, Results.
Use the results in the screenshot as an example. You can ignore country or include it as an attribute of the competitor. Hint: the Event is “800m Freestyle Women”. See more results here: Rio 2016 Swimming Results
Don’t forget to brainstorm some additional concrete instances, write them down, and include them as sample data in your table sketches!