Data Wrangling Course

James Howison's Data Wrangling course from the Information School at the University of Texas at Austin.

We will also write CSV using the csv library. Relevant screencast is here: Writing CSV Files.

The full code used in the screencast is at the bottom of this page.

As with reading CSV files (and perhaps more so) writing them can be tricky (because of fields with commas, newlines, or quoting characters with in the fields). We definitely want to use a library.

As with reading we will work to write a list of dictionaries to the CSV, although it is also possible to write a list of lists.

import csv

# A list of dicts.
venues = [{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]

Since dictionaries don’t have order, but CSV files write each row in order, you have to declare a list of column names

column_names = ['id','name','capacity']
# If you don't care about column order
# can also use column_names = venues[0].keys()

As with reading csv files we use a with block and provide some basic information to create a csv.DictWriter as we created a csv.DictReader.

with open('exported.csv', 'w') as csvfile:
    myCsvWriter = csv.DictWriter(csvfile,
                                 delimiter=',',
                                 quotechar='"',
                                 fieldnames = column_names)

    myCsvWriter.writeheader() # uses contents of column_names

    # write the rows. We iterate through the list of dicts.
    for row in venues:
        # every row must have all keys from column_names
        myCsvWriter.writerow(row)

print("Done writing csv")

The print command ends the for loop and the with block (because it is not indented). Python knows that you are done with myCsvWriter and csvfile. It closes the file, making it available for others to read.

You can look at the file using the Jupyter browser.

Full csv_writer.py:

import csv

# A list of dicts.
venues = [{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]

column_names = ['id','name','capacity']
# column_names = venues[0].keys()

with open('exported.csv', 'w') as csvfile:

    myCsvWriter = csv.DictWriter(csvfile,
                                 delimiter=',',
                                 quotechar='"',
                                 fieldnames = column_names)

    myCsvWriter.writeheader() # uses contents of column_names

    for row in venues:
        # every row must have all keys from column_names
        myCsvWriter.writerow(row)

print("Done writing csv.")

Exercises:

  1. Change the order of columns, so that capacity is written first.
  2. Use an if statement to avoid writing rows with a capacity less than 2000.
  3. Add a column called “venue_type” and set the value in each row to “outdoor”.