9  Moving across many tables with joins in SQL

This class shows how to use our database designs to “move” across our databases, following the lines in the design to join tables. You can think of the lines as paths that you can follow from one table, through others, gathering up the data that you need to provide the answer to your query. As a path, hopefully it is clear that you can’t skip a table, to get from one to the other you have to follow the lines.

When I think about queries I think about start and end points. The start point is the table that has the data given in the question. The end point has the data that the question is asking about. For example, using the Objects and Colors example

And the question:

What color is the mug?

The start point is the data we are given: “mug”. That data is in the objects.name field, and so our starting point is the objects table.

The end point is what we’re being asked about, in this case “the color”. That data is in the colors.name column, so our end point is the colors table.

At this point I look at the ERD and “place myself” in the objects table:

SELECT *
FROM objects

I can the apply the starting condition:

SELECT *
FROM objects
WHERE objects.name = 'mug'

From here I need to “move” to the colors table, by joining it in. To do this I have to think about the join condition, which I can figure out from the direction of the crowsfeet: the foreign key is in the objects table. This is enough information to create the join condition, since the foreign key is always named after the table it links to and must be color_id. Thus the join condition is:

objects.color_id = colors.id

I can now use this to add to my query:

SELECT *
FROM objects
WHERE objects.name = 'mug'

-- Joining in colors

SELECT *
FROM objects
  JOIN colors 
    ON objects.color_id = colors.id
WHERE objects.name = 'mug'

After checking that the join worked sensibly (running the query and examining the rows), at this point I have reached my destination field: colors.name and I can write the SELECT part:

-- Now get just the colors.name column
SELECT colors.name
FROM objects
  JOIN colors 
    ON objects.color_id = colors.id
WHERE objects.name = 'mug'

The process works the same when moving across more than two tables. Here is our full Objects database:

and a query:

What color are the clicky objects?

The start point here is “clicky” which we would find in the noises.name table. The end point (what we’re being asked to answer) is in colors.name table. Thus we begin in the noises table and will have to move across to the colors table. We can’t get there directly, so we have to follow the lines.

SELECT *
FROM noises

SELECT *
FROM noises
WHERE noises.name = 'clicky'

Now we have to join in the next table, which is objects. Looking at the relational vocab I can see that the foreign key is in objects which means it must be called objects.noise_id. The order one writes a join condition in doesn’t matter, so I like to write mine “from” the table I’m “in” to the table I’m “moving to”. Given that we are “in” noises this makes the join condition noises.id = objects.noise_id.

SELECT *
FROM noises
WHERE noises.name = 'clicky'

-- Join in the objects table.

SELECT *
FROM noises
  JOIN objects 
    ON noises.id = objects.noise_id
WHERE noises.name = 'clicky'

We run that query and check that everything worked. We then have to continue to move, now moving from the objects table to the colors table. Again we examine the ER diagram. The crowsfoot points to the objects table, so the foreign key is again in the objects table. Therefore it must be called objects.color_id and our join condition must be objects.color_id = colors.id.

SELECT *
FROM noises
  JOIN objects 
    ON noises.id = objects.noise_id
WHERE noises.name = 'clicky'

-- moving over to colors
SELECT *
FROM noises
  JOIN objects 
    ON noises.id = objects.noise_id
  JOIN colors 
    ON objects.color_id = colors.id
WHERE noises.name = 'clicky'

Again, running that query to ensure we arrived in colors ok and then we can change our SELECT part to give us just what we were asked for (the color).

-- Now pulling out just the answer we were asked for.
SELECT colors.name
FROM noises
  JOIN objects ON noises.id = objects.noise_id
  JOIN colors ON objects.color_id = colors.id
WHERE noises.name = 'clicky'

In joins beyond two tables (we have three above) I find the question of what the 3rd table is being joined to a bit hard to think about. The answer is that it is being joined to the combination of the first two tables. Think of the combination of all the rows produced by joining the first two tables; that is called an “intermediate join table” (we won’t ever see it, but it logically exists) and is what we then join the 3rd table to. If you build up the query step by step, executing each step, then this is hopefully more clear; certainly it becomes intuitive over time.

9.1 Joining with many to many.

Joining across a many to many relationship follows the same rules but one has to remember that the foreign keys are in the extra table that doesn’t show up in the Conceptual ER diagram. We call that table a “Simple Association Table” and it only has the two foreign keys.

Consider this many to many relationship:

To move from Book to Genre we have to go through the simple association table. Recall that the simple association table is named by the alphabetical combination of the Entities, so in this case it is called books_genres.

To illustrate we’ll build up a query to answer the question:

What genres is the book "Wuthering Heights" in?

The start point is books and the end point is genres.

SELECT *
FROM books

SELECT *
FROM books
WHERE books.name = 'Wuthering Heights'

Now we want to follow the line to genres but to do so we have to go via the simple association table (books_genres), joining it in on the way. We also know that the foreign keys are always in the simple association table. So our foreign key here is books_genres.book_id which connects to books.id that makes our join condition books.id = books_genres.book_id.

SELECT *
FROM books
WHERE books.name = 'Wuthering Heights'

-- move into the simple assocation table

SELECT *
FROM books
  JOIN books_genres
    ON books.id = books_genres.book_id
WHERE books.name = 'Wuthering Heights'

Now we are “in” books_genres and need to move onward to genres. We swap to the other foreign key in books_genres which is books_genres.genre_id and link to the primary key for genres making the join condition: books_genres.genre_id = genres.id

SELECT *
FROM books
  JOIN books_genres
    ON books.id = books_genres.book_id
WHERE books.name = 'Wuthering Heights'

-- move out of books_genres to genres
SELECT *
FROM books
  JOIN books_genres
    ON books.id = books_genres.book_id
  JOIN genres
    ON books_genres.genre_id = genres.id
WHERE books.name = 'Wuthering Heights'

-- Finally we can change our SELECT part.
SELECT genres.name
FROM books
  JOIN books_genres
    ON books.id = books_genres.book_id
  JOIN genres
    ON books_genres.genre_id = genres.id
WHERE books.name = 'Wuthering Heights'

9.2 Adding conditions en route

Thus far we’ve talked about queries where we have a data condition at our starting point, such as finding the “red” objects. To answer those we apply a condition, such as colors.name = 'red' in our starting table and then move across the ER diagram, joining tables as needed.

However we can also apply data conditions at places “along the route,” as we add in tables using join conditions. Take, for example, this many to many with attributes relationship:

And this question:

Which people have subscriptions to 'The Week' ending this year (2015)?

In this question we’re given two pieces of data: the name of a magazine (‘The Week’) as well as (part of) a date, and we’re asked to supply people’s names. The magazine part is just like our queries above, we can apply it in the ‘starting’ table, which would be magazines. The end point is also similar to above, because we’re asked for all the people’s names, so we’ll be joining across to people. However, the date part is a little different. The date fields are in the middle table (subscriptions), so we will add that condition as we “pass through” that table.

One other thing: because the question is asking about subscriptions that end in 2015 we’ll use that data in the end_year field, rather than the start_year field.

-- starting table
SELECT *
FROM magazines

-- add starting condition
SELECT *
FROM magazines
WHERE magazines.name = 'The Week'

-- join over to subscriptions
-- subscription to right of belongs_to, thus foreign key there.
SELECT *
FROM magazines
  JOIN subscriptions
    ON magazines.id = subscriptions.magazine_id
WHERE magazines.name = 'The Week'

-- Now apply year condition, before moving to people.
SELECT *
FROM magazines
  JOIN subscriptions
    ON magazines.id = subscriptions.magazine_id
WHERE magazines.name = 'The Week'
  AND subscriptions.end_year = 2015

-- Now add people with the join condition.
SELECT *
FROM magazines
  JOIN subscriptions
    ON magazines.id = subscriptions.magazine_id
  JOIN people 
    ON subscriptions.person_id = people.id
WHERE magazines.name = 'The Week'
  AND subscriptions.end_year = 2015

-- And finally, change the SELECT part to just what we were asked for.
SELECT people.name
FROM magazines  
  JOIN subscriptions
    ON magazines.id = subscriptions.magazine_id
  JOIN people 
    ON subscriptions.person_id = people.id
WHERE magazines.name = 'The Week'
  AND subscriptions.end_year = 2015

Hopefully you can see that addressing these queries is a matter of:

  1. Identifying data you are given. e.g., 2015 and 'The Week'
  2. Finding the relevant fields in the database for each. (e.g., “subscriptions ending in” –> subscriptions.end_year)
  3. Constructing data conditions for each (e.g., subscriptions.end_year = 2015)
  4. Identifying the data you are being asked for (e.g., “Which people …”)
  5. Finding relevant fields for the answer (“Which people …” –> people.name)
  6. Locating starting and ending tables and a path between them.
  7. Navigating across the ER diagram, applying conditions as you move through the relevant tables (always executing the query to check for errors)
  8. Converting the SELECT * to the fields identified in step 5.

I find the “moving across the ERD” way of thinking about building up queries to be useful and I recommend it to you. I find it useful for two reasons: it helps me break the problem down into steps, and building the query up step by step helps me locate errors.

However it is worth noting that the actual order in which conditions and joins are placed in the final query is irrelevant.

So it is equally valid (logically identical, in fact) to think of joining all the tables that you need, then applying all the data conditions to that one large table. Going a step further back it is equally valid to think of creating one giant cross-product table (by putting the tables into the FROM clause and getting all possible combinations of their rows) and then applying the data conditions, and applying the join conditions last.

Nonetheless, for that approach to be accurate you would still have to ensure that you have all the join conditions needed and that you hadn’t included any tables without relevant join conditions. And that requires you to inspect the database design to see how the tables are related to each other.

So, there’s nothing special about the order that the conditions and joins are applied, but thinking about “moving across the ERD” will help you put queries together correctly.

9.3 Exercises

  1. Review this conceptual ER

  1. Write down the join conditions to move across the database. Don’t forget to identify many to many relationships and include the simple association table.

  2. Write a query that Finds all advertisers in the sports industry that made bids on any post that shared a link from utexas.edu

  3. Write a query that Finds all advertisers that were interested in showing advertisements to user “strongbad” before 2020-09-01.