22 Basic SQL queries
To access a database from python, we need to use a client library. For duckdb that library is called duckdb
.
Using the folded setup code here at the top of your notebook.
!pip install jupysql --quiet
!pip install duckdb-engine --quiet
import duckdb
class ClassDictCursor():
def __init__(self, duckdb_con):
self.con = duckdb_con
self.column_names = [desc[0] for desc in self.con.description]
def __iter__(self):
return self
def __next__(self):
if (next_result := self.con.fetchone()):
return dict(zip(self.column_names, next_result))
else:
raise StopIteration
%reload_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
%sql duckdb:///duckdb-file.db
Then download class_music_festival.zip and upload to your DataCamp notebook.
You can load these files using this DuckDB sql:
%%sqlDROP TABLE IF EXISTS venues;
CREATE TABLE venues AS FROM read_csv_auto('venues.csv');
With that setup code in place, we can connect to duckdb and execute a query using:
with duckdb.connect('duckdb-file.db') as con:
"SELECT * FROM venues") con.execute(
We want to access our results like a list of dicts. This is typical with other databases (e.g., postgres) but DuckDB doesn’t make that easy … so to do this we are using some custom code for the class, ClassDictCursor
. In the code below we call that just after executing the query.
Once we are in the for row in results:
section everything works just as it did when we were directing reading csv files. Therefore we can do anything we want with strings, numbers, and dates.
with duckdb.connect('duckdb-file.db') as con:
"SELECT * FROM venues")
con.execute(= ClassDictCursor(con)
results
for row in results:
print(row)
print("The venue {name} has the capacity {capacity}".format(**row))
While printing results out to the screen is interesting, for your project you will want to export csv files. We can do this by combining our SQL executing code with our csv writing code.
import csv
with duckdb.connect('duckdb-file.db') as con:
"SELECT * FROM venues")
con.execute(= ClassDictCursor(con)
results
with open('exported-from-sql-query.csv', 'w') as csvfile:
= ["id", "name", "capacity"]
column_names = csv.DictWriter(csvfile,
myCsvWriter =column_names)
fieldnames
myCsvWriter.writeheader()
for row in results:
myCsvWriter.writerow(row)
print("Done writing csv")
The CSV file will be written into the same folder as the current notebook. Change tabs to the file listing, and you can open the CSV file.
Getting the column names from a query is surprisingly difficult. You can declare the column names manually (as above). There is another method, which is short but uses some syntax that you don’t need to worry about at present.
# https://stackoverflow.com/questions/10252247/how-do-i-get-a-list-of-column-names-from-a-psycopg2-cursor
= [i[0] for i in con.description] column_names
Iterating over the cursor is good for queries that return lots of results, but if we’re just getting a single value such as a COUNT(*)
of rows, we can also use fetchone()
which returns just a single row, rather than a list of dictionaries. Without our ClassDictCursor, though, fetchone() returns only a list, not a dict, so we don’t get the column names, and we have to add a [0]
to get the value out of the first column.
with duckdb.connect('duckdb-file.db') as con:
"SELECT COUNT(*) FROM venues")
con.execute(
# fetchone returns a list not a dict.
= con.fetchone()[0]
venue_count
print(venue_count)
22.1 Exercises
- Output all records from the people table into a file called “people.csv”
- Output the first 10 tickets sold to the festival into a file called “earliest_tickets.csv”
- Use print to get a count of performances. Use
fetchone()