There are three queries beyond SELECT:
Each of them reuses syntax from SELECT.
INSERT requires us to provide a list of columns and a list of data to go into each column.
INSERT INTO table_name(column1, column2) VALUE (data1, data2)
We do not have to provide data for all columns and we don’t have to use the same order as the columns were created.
Imagine a database with
Book habtm Author so we have three tables (books, authors, and books_authors). A convenient syntax for quickly describing tables in text is `tablename(id_col,other_cols…):
books(id, title) books_authors(book_id, author_id) authors(id, name)
To insert the data that “Emily Bronte” wrote “Wuthering Heights” (assuming neither were in the database already) we’d do:
INSERT INTO books(id, title) VALUE (7, 'Wuthering Heights') INSERT INTO authors(id, name) VALUE (23, 'Emily Bronte') INSERT INTO books_authors(book_id, author_id) VALUE (7, 23)
In these examples I’m providing a manual
23 above). However the database can also provide an integer value for
id automatically. To do that we tell the database that the
id column should be
AUTO_INCREMENT. In phpmyadmin we do that by checking the “A_I” checkbox for the id column and the full word appears in the “extras” column in phpmyadmin’s Structure tab:
id is set to
AUTO_INCREMENT then we do not need to provide a value for the id in an insert statement (and we can leave id from the list of columns). So we could have just done:
INSERT INTO books(title) VALUE ('Wuthering Heights')
This can be very useful, but if we want to use the generated id to describe relationships we have to obtain the id that the server provides. We can get that by adding a short little query
SELECT LAST_INSERT_ID() separated by a semi-colon
INSERT INTO books(title) VALUE ('Wuthering Heights'); SELECT LAST_INSERT_ID()
We will do this later in the course using python, as we read our CSV files, constructing the queries needed to insert the data from the csvs into the database design for our project.
For example, imagine we have a csv file with these data:
author,title "Chimamanda Ngozi Adichie","We Should All Be Feminists" "Wu Cheng'en","Journey to the West" "Kim Man-jung","The Nine Cloud Dream"
For each row, we would have to do these queries (changing the text each time):
INSERT INTO books(title) VALUE ("Journey to the West"); SELECT LAST_INSERT_ID()
then remember the id assigned (let’s say we got back
INSERT INTO authors(name) VALUE ("Wu Cheng'en"); SELECT LAST_INSERT_ID()
then remember the id assigned (let’s say we got back
Finally we can then make the connection:
INSERT INTO books_author(book_id, author_id) VALUE (120, 1007);
So each row in that csv would require at least three queries. I say at least three queries because unless we know that this is just a single book per author, we would probably need to run a
SELECT first to check if the author was already in the database, to ensure that we reuse the id.
We will cover these sequences of SELECT and INSERT queries when we are doing them via Python later in the course.
Note that inserting lots of data (like millions of rows) using this approach can be slow. I examine quicker ways to insert data into MySQL in the “Advanced Tips” Module toward the end of the course. The relevant screencast is Loading Very Large CSV files.
UPDATE allows us to change data already in the database. These queries work by first identifying rows to be changed (which re-uses what we’ve learned about the WHERE clause already) and then getting new values for particular columns.
For example, we could change the spelling of Emily Bronte’s name to include a diaersis over the final e. We first find the record we want to update using
SELECT authors.id FROM authors WHERE name = "Emily Bronte"
The we use the
id returned (
23) to effect the change:
UPDATE authors SET name = "Emily Brontë" -- Change spelling to fancy style WHERE authors.id = 23
Here we’re being cautious to minimize and check our edits to the database by using the
SELECT first and using
id to identify the records to change.
While it is often sensible to use an id (or a group of ids) in the WHERE clause to be sure of the rows to be changed, any WHERE clause will work. For example, if we had a last name column (which we don’t) we could change all the Bronte to Brontê with one query (the database will tell you how many rows were changed).
UPDATE authors SET lastname = "Brontë" WHERE lastname = "Bronte" -- Change all the Brontes at once.
We can also change more than one column at a time, by providing a comma separated list of fields and value pairs in the SET clause. For example we could change both the venue and the band for a performance (keeping the start and end times).
UPDATE performances SET venue_id = 23, band_id = 7 WHERE id = 2
btw, I think the lack of consistency between INSERT (two comma separated lists of columns and values) and UPDATE SET (one comma separated list column = value) is confusing. I don’t know why it is that way.
The question often arises about whether we can do
UPDATEs based on joins (adding a
JOIN clause). In general yes, that is possible. However, it is confusing (am I updating the temporary join table or the underlying tables?) and can be hard to read and to check. I recommend finding the ids of the rows that you want to update (using a
SELECT), then updating those using that SELECT in a sub-query.
For example if we wanted to update the fee for all bands playing in the BMI venue, adding $100 to each band’s fee.
-- Find the list of band ids. SELECT bands.id FROM bands JOIN performances ON bands.id = performances.band_id JOIN venues ON performances.venue_id = venues.id WHERE venues.name = "BMI" -- UPDATE using a sub-query to provide the list of ids. UPDATE bands SET bands.fee = bands.fee + 100 WHERE bands.id IN ( SELECT bands.id FROM bands JOIN performances ON bands.id = performances.band_id JOIN venues ON performances.venue_id = venues.id WHERE venues.name = "BMI" )
Finally we can delete data in the database. The DELETE statement is a SELECT statement (to find the rows you want to delete) but rather than specifying the columns you want, change the SELECT clause to DELETE. Using SELECT first helps you ensure that you will delete the right rows! No Undo button (at least not in the way we have SQL set up, undo is possible using a feature called “transactions”).
SELECT * FROM books WHERE id = 23 DELETE FROM books WHERE id = 23
In MySQL setup as we have done in this class we will have to manually delete references to this rows from tables with foreign keys.
DELETE FROM books_authors WHERE book_id = 23
Note that one would need to check if the author had written any other books before deleting the author.
DELETEs can’t be undone, so only delete using
ids and do
It is possible to automatically delete linked rows using “cascading deletes” but usually there are other things that need to be checked and changed, so largely the logic for that has moved out of the database and is implemented through python, ruby, or php. It is worth, though, thinking briefly about those situations.
The instruction “Delete Emily Bronte” seems straightforward.
SELECT * FROM people WHERE people.name = "Emily Bronte" -- Check that it all looks good and ok to delete, no unexpected rows. DELETE FROM people WHERE people.name = "Emily Bronte"
However, deleting Emily from the people table does not remove her id from the database entirely. For example it would leave her id in the book_roles table associated with books she had written (and potentially in other tables) Even if we delete all of her book_roles what ought to happen to books that she co-authored with others, should they now also be removed? (Are we erasing her from history?) If we don’t then we have erroneous information about those books.
Let’s say that we want to remove any books (and book_roles) that were associated with Emily Bronte. In these situations it is best to work backwards from the further location (ie the Book), back towards the Person. (It’s easier to find the books that Emily Bronte was involved in before Emily Bronte’s record in the person table is deleted!)
-- Find all the books associated with Emily Bronte in any form. SELECT * FROM people WHERE people.name = "Emily Bronte" -- Navigate over to books, this gets any books associated with Emily. SELECT * FROM people JOIN book_roles ON book_roles.person_id = people.id JOIN books ON book_roles.book_id = books.id WHERE people.name = "Emily Bronte" -- Just get books.id -- This returns a list of book ids like (7, 23, 45) SELECT books.id FROM people JOIN book_roles ON book_roles.person_id = people.id JOIN books ON book_roles.book_id = books.id WHERE people.name = "Emily Bronte" -- Now use that list to do the delete from books. -- Could write the results of last query down -- (I just made these numbers up) and use like this: SELECT * FROM books WHERE books.id IN (7, 23, 45) -- But it's better to use a sub-query that generates the list. -- Here I've just substituted the query two up for the list of ids. SELECT * FROM books WHERE books.id IN ( SELECT books.id FROM people JOIN book_roles ON book_roles.person_id = people.id JOIN books ON book_roles.book_id = books.id WHERE people.name = "Emily Bronte" ) -- Now convert the SELECT * to DELETE. DELETE FROM books WHERE books.id IN ( SELECT books.id FROM people JOIN book_roles ON book_roles.person_id = people.id JOIN book ON book_roles.book_id = books.id WHERE people.name = "Emily Bronte" ) -- Now delete all book_roles associated with those books -- SELECT * FROM people JOIN book_roles ON book_roles.person_id = people.id WHERE people.name = "Emily Bronte" -- Get book_ids for those. SELECT book_roles.book_id FROM people JOIN book_roles ON book_roles.person_id = people.id WHERE people.name = "Emily Bronte" -- Now find all remaining records associated with those books. SELECT * FROM book_roles WHERE book_roles.book_id IN ( SELECT book_roles.book_id FROM people JOIN book_roles ON book_roles.person_id = people.id WHERE people.name = "Emily Bronte" ) -- Note that this deletes a supplier role as well. Possibility that -- we "orphan" a supplier person if they are no longer associated -- with the book. See below. -- And convert to a delete. DELETE FROM book_roles WHERE book_roles.book_id IN ( SELECT book_roles.book_id FROM people JOIN book_roles ON book_roles.person_id = people.id WHERE people.name = "Emily Bronte" ) -- Finally delete Emily. DELETE FROM people WHERE people.name = "Emily Bronte"
An advance would be to find all orphaned people (could be Emily’s translator or supplier if that was their last book). That type of query requires a form of JOIN that we haven’t discussed, called an OUTER JOIN. See https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/