This class shows how to use our database designs to “move” across our databases, following the lines in the design to join tables. You can think of the lines as paths that you can follow from one table, through others, gathering up the data that you need to provide the answer to your query. As a path, hopefully it is clear that you can’t skip a table, to get from one to the other you have to follow the lines.
When I think about queries I think about start and end points. The start point is the table that has the data given in the question. The end point has the data that the question is asking about. For example, using the Objects and Colors example:
And the question:
What color is the mug?
The start point is the data we are given: “mug”. That data is in the
objects.name field, and so our starting point is the
The end point is what we’re being asked about, in this case “the color”. That data is in the
colors.name column, so our end point is the
At this point I look at the ERD and “place myself” in the
SELECT * FROM objects
I can the apply the starting condition:
SELECT * FROM objects WHERE objects.name = "mug"
From here I need to “move” to the colors table, by joining it in. To do this I have to think about the join condition, which I can figure out from the relational vocab. For
has_many / belongs_to the foreign key is always in the table to the left of
belongs_to. Therefore the foreign key is in the
objects table. This is enough information to create the join condition, since the foreign key is always named after the table it links to and must be
color_id. Thus the join condition is:
objects.color_id = colors.id
I can now use this to add to my query:
SELECT * FROM objects WHERE objects.name = "mug" -- Joining in colors SELECT * FROM objects JOIN colors ON objects.color_id = colors.id WHERE objects.name = "mug"
After checking that the join worked sensibly (running the query and examining the rows), at this point I have reached my destination field:
colors.name and I can write the
-- Now get just the colors.name column SELECT colors.name FROM objects JOIN colors ON objects.color_id = colors.id WHERE objects.name = "mug"
The process works the same when moving across more than two tables. Here is our full Objects database:
and a query:
What color are the clicky objects?
The start point here is “clicky” which we would find in the
noises.name table. The end point (what we’re being asked to answer) is in
colors.name table. Thus we begin in the
noises table and will have to move across to the
colors table. We can’t get there directly, so we have to follow the lines.
SELECT * FROM noises SELECT * FROM noises WHERE noises.name = "clicky"
Now we have to join in the next table, which is
objects. Looking at the relational vocab I can see that the foreign key is in
objects which means it must be called
objects.noise_id. The order one writes a join condition in doesn’t matter, so I like to write mine “from” the table I’m “in” to the table I’m “moving to”. Given that we are “in”
noises this makes the join condition
noises.id = objects.noise_id.
SELECT * FROM noises WHERE noises.name = "clicky" -- Join in the objects table. SELECT * FROM noises JOIN objects ON noises.id = objects.noise_id WHERE noises.name = "clicky"
We run that query and check that everythign worked. We then have to continue to move, now moving from the
objects table to the
colors table. Again we examine the relational vocab. Object is to the right of
belongs_to and so the foreign key is again in the
objects table. Therefore it must be called
objects.color_id and our join condition must be
objects.color_id = colors.id.
SELECT * FROM noises JOIN objects ON noises.id = objects.noise_id WHERE noises.name = "clicky" -- moving over to colors SELECT * FROM noises JOIN objects ON noises.id = objects.noise_id JOIN colors ON objects.color_id = colors.id WHERE noises.name = "clicky"
Again, running that query to ensure we arrived in
colors ok and then we can change our
SELECT part to give us just what we were asked for (the color).
-- Now pulling out just the answer we were asked for. SELECT colors.name FROM noises JOIN objects ON noises.id = objects.noise_id JOIN colors ON objects.color_id = colors.id WHERE noises.name = "clicky"
In joins beyond two tables (we have three above) I find the question of what the 3rd table is being joined to a bit hard to think about. The answer is that it is being joined to the combination of the first two tables. Think of the combination of all the rows produced by joining the first two tables; that is called an “intermediate join table” (we won’t ever see it, but it logically exists) and is what we then join the 3rd table to. If you build up the query step by step, executing each step, then this is hopefully more clear; certainly it becomes intuitive over time.
Joining across a habtm relationship follows the same rules but one has to remember that the foreign keys are in the extra table that doesn’t show up in the Entity diagram. We call that table a “Simple Association Table” and it only has the two foreign keys.
Consider this habtm relationship:
To move from Book to Genre we have to go through the simple association table. Recall that the simple association table is named by the alphabetical combination of the Entities, so in this case it is called
To illustrate we’ll build up a query to answer the question:
What genres is the book "Wuthering Heights" in?
The start point is
books and the end point is
SELECT * FROM books SELECT * FROM books WHERE books.name = "Wuthering Heights"
Now we want to follow the line to
genres but to do so we have to go via the simple association table (
books_genres), joining it in on the way. We also know that the foreign keys are always in the simple association table. So our foreign key here is
books_genres.book_id which connects to
books.id that makes our join condition
books.id = books_genres.book_id.
SELECT * FROM books WHERE books.name = "Wuthering Heights" -- move into the simple assocation table SELECT * FROM books JOIN books_genres ON books.id = books_genres.book_id WHERE books.name = "Wuthering Heights"
Now we are “in”
books_genres and need to move onward to
genres. We swap to the other foreign key in
books_genres which is
books_genres.genre_id and link to the primary key for
genres making the join condition:
books_genres.genre_id = genres.id
SELECT * FROM books JOIN books_genres ON books.id = books_genres.book_id WHERE books.name = "Wuthering Heights" -- move out of books_genres to genres SELECT * FROM books JOIN books_genres ON books.id = books_genres.book_id JOIN genres ON books_genres.genre_id = genres.id WHERE books.name = "Wuthering Heights" -- Finally we can change our SELECT part. SELECT genres.name FROM books JOIN books_genres ON books.id = books_genres.book_id JOIN genres ON books_genres.genre_id = genres.id WHERE books.name = "Wuthering Heights"
Thus far we’ve talked about queries where we have a data condition at our starting point, such as finding the “red” objects. To answer those we apply a condition, such as
colors.name = "red" in our starting table and then move across the ER diagram, joining tables as needed.
However we can also apply data conditions at places “along the route,” as we add in tables using join conditions. Take, for example, this
has_many through relationship:
And this question:
Which people have subscriptions to "The Week" ending this year (2015)?
In this question we’re given two pieces of data: the name of a magazine (“The Week”) as well as (part of) a date, and we’re asked to supply people’s names. The magazine part is just like our queries above, we can apply it in the “starting” table, which would be
magazines. The end point is also similar to above, because we’re asked for all the people’s names, so we’ll be joining across to
people. However, the date part is a little different. The date fields are in the middle table (
subscriptions), so we will add that condition as we “pass through” that table.
One other thing: because the question is asking about subscriptions that end in 2015 we’ll use that data in the
end_year field, rather than the
-- starting table SELECT * FROM magazines -- add starting condition SELECT * FROM magazines WHERE magazines.name = "The Week" -- join over to subscriptions -- subscription to right of belongs_to, thus foreign key there. SELECT * FROM magazines JOIN subscriptions ON magazines.id = subscriptions.magazine_id WHERE magazines.name = "The Week" -- Now apply year condition, before moving to people. SELECT * FROM magazines JOIN subscriptions ON magazines.id = subscriptions.magazine_id WHERE magazines.name = "The Week" AND subscriptions.end_year = 2015 -- Now add people with the join condition. SELECT * FROM magazines JOIN subscriptions ON magazines.id = subscriptions.magazine_id JOIN people ON subscriptions.person_id = people.id WHERE magazines.name = "The Week" AND subscriptions.end_year = 2015 -- And finally, change the SELECT part to just what we were asked for. SELECT people.name SELECT * FROM magazines JOIN subscriptions ON magazines.id = subscriptions.magazine_id JOIN people ON subscriptions.person_id = people.id WHERE magazines.name = "The Week" AND subscriptions.end_year = 2015
Hopefully you can see that addressing these queries is a matter of:
2015 and "The Week"
subscriptions.end_year = 2015)
SELECT *to the fields identified in step 5.
I find the “moving across the ERD” way of thinking about building up queries to be useful and I recommend it to you. I find it useful for two reasons: it helps me break the problem down into steps, and building the query up step by step helps me locate errors.
However it is worth noting that the actual order in which conditions and joins are placed in the final query is irrelevant. So it is equally valid (logically identical, in fact) to think of joining all the tables that you need, then applying all the data conditions to that one large table. Going a step further back it is equally valid to think of creating one giant cross-product table (by putting the tables into the
FROM clause and getting all possible combinations of their rows) and then applying the data conditions, and applying the join conditions last.
Nonetheless, for that approach to be accurate you would still have to ensure that you have all the join conditions needed and that you hadn’t included any tables without relevant join conditions. And that requires you to inspect the database design to see how the tables are related to each other.
So, there’s nothing special about the order that the conditions and joins are applied, but thinking about “moving across the ERD” will help you put queries together correctly.