James Howison's Data Wrangling course from the Information School at the University of Texas at Austin.
We store data in variables. Regular variables (like strings or numbers) just hold a single item. But often we want to hold more than one item together. That’s what lists and dicts (dictionaries) are used for.
Lists and dicts are groups of variables. Think of them like a pill box broken up into slots for different days. You can put things into each slot and each slot has a name.
For Lists, the slots are named by numbers (0, 1, 2).
For Dicts, the slots are named with strings (e.g., “Mon”, “Tues”, “Wed”).
courseList = ["Introductory Zoology", "Zoology Lab", "Penguin Studies"]
We can access the parts of the list using this syntax:
print("My second course is " + courseList[1]) # ==> My second course is Zoology Lab
Wait … what now? courseList[1] shows the 2nd item? Why is that not courseList[2]?
The answer is that we start counting from 0, so courseList[0] is the 1st item.
Why? It’s about offsets in memory, see https://en.wikipedia.org/wiki/Zero-based_numbering
But really it’s just one of those things. list[0] is the first item.
The number is called the “index”.
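To see the index of every slot at once, here is a small sketch that prints each index next to its item. It uses the built-in enumerate function, which these notes have not introduced, so treat it as a preview rather than something you need yet.

```python
courseList = ["Introductory Zoology", "Zoology Lab", "Penguin Studies"]

# enumerate hands back (index, item) pairs, starting from index 0
for index, course in enumerate(courseList):
    print(str(index) + " -> " + course)

first_course = courseList[0]                   # the 1st item lives at index 0
last_course = courseList[len(courseList) - 1]  # the last index is the length minus 1
```

Because counting starts at 0, a list with 3 items has indexes 0, 1 and 2, and asking for courseList[3] here would stop the program with an IndexError.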
We can add items to the end of the List with append. The computer assigns the next index for us (kinda like AUTOINCREMENT in MySQL).
courseList.append("GnuMath") #==> now courseList[3] holds the string "GnuMath"
Dicts are the same as Lists except that instead of numbers as indexes they have words. They use curly braces instead of square ones. The word indexes are called “keys”. Each key points to a “value”.
courseDict = { "101c": "Introductory Zoology",
               "210s": "Zoology Lab",
               "315p": "Penguin Studies" }
The keys have to be unique, but the values don’t have to be unique. Lists are always ordered. Dicts used to have no guaranteed order, but since Python 3.7 they keep their keys in the order in which they were added.
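A short sketch of what "unique keys, non-unique values" looks like in practice. The name exampleDict and the extra key "211s" are made up for illustration, and the ordering shown assumes Python 3.7 or later.

```python
exampleDict = {"101c": "Introductory Zoology",
               "210s": "Zoology Lab",
               "315p": "Penguin Studies"}

# Two different keys can point to the same value.
exampleDict["211s"] = "Zoology Lab"

# Assigning to an existing key replaces its value; it does not add a second "101c".
exampleDict["101c"] = "Intro to Zoology"

print(len(exampleDict))   # counts the keys
print(list(exampleDict))  # the keys, in the order they were first added
```

After these two assignments the Dict holds four keys, not five, because "101c" was already there.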
print("One of my courses is " + courseDict["210s"])
Adding things works in a very similar way, but you have to provide a key (Python can’t guess, as it can with adding items to a List).
courseDict["325m"] = "GnuMath"
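Asking for a key that is not in the Dict stops the program with a KeyError, so it can help to check first. Here is a minimal sketch using the in operator and the Dict's get method. Both are standard Python but are not covered in these notes, and the key "999x" is made up for illustration.

```python
courseDict = {"101c": "Introductory Zoology",
              "210s": "Zoology Lab",
              "315p": "Penguin Studies"}

# "in" checks whether a key exists without raising an error
has_lab = "210s" in courseDict
has_made_up = "999x" in courseDict  # "999x" is a made-up key

# get returns the value, or the fallback we supply when the key is missing
missing_course = courseDict.get("999x", "no such course")

print(has_lab, has_made_up, missing_course)
```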
There are good examples for Lists and Dicts at: http://learnpythonthehardway.org/book/ex39.html (don’t worry at all about the “Make your own Dictionary Module” part).
One thing that we do with List and Dicts is go through item by item and do things with them. That way we can write a short piece of code and have it work on each item. Eventually we’ll use this to process csv files and the results of SQL queries.
courseList = ["Introductory Zoology", "Zoology Lab", "Penguin Studies"]
for currItem in courseList:
    print("You are enrolled in a " + currItem + " course")
Note that the variable currItem is not a special word, it’s just a variable name, so you can use anything. Use something that makes sense to you. Whatever you use gets assigned to the next item in the list each time the block of code is run.
for myCourse in courseList:
    print("You are enrolled in " + myCourse)
Unsurprisingly we can also iterate over a dictionary.
show_info = {} # empty dictionary
show_info["band_name"] = "Beardyman"
show_info["venue"] = "BMI"
for key in show_info:
    print(key)
Produces:
band_name
venue
So when we iterate in this basic way over a dictionary we get each of the keys (in the order they were added, on Python 3.7 and later). We can actually use these keys to get the values (following the principle of replacement).
show_info = {} # empty dictionary
show_info["band_name"] = "Beardyman"
show_info["venue"] = "BMI"
for key in show_info:
    print("The " + key + " is " + show_info[key])
When key is set to "band_name" then show_info[key] is the exact same as writing show_info["band_name"] (another example of the principle of replacement).
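If you want the key and the value together on each pass, Python's Dicts also offer an items method. This is a sketch of that alternative, not something these notes rely on; the list called lines is only there to collect the results.

```python
show_info = {"band_name": "Beardyman", "venue": "BMI"}

lines = []
# items() gives us each key together with its value
for key, value in show_info.items():
    lines.append("The " + key + " is " + value)

for line in lines:
    print(line)
```

Looking the value up with show_info[key] inside the loop gives the same result; items just saves the lookup.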
Note that a List has its own order, but we can re-order it and then use the new order; that’s called “sorting”. The easiest thing to sort by is alphabetical order. Note that numbers can also be sorted this way (alphabetical plus numbers is called “lexical” order).
As Information School students we know that keeping the “original order” is sometimes very important!
mySortedList = sorted(courseList) # that creates a new list, leaving the original list unchanged
for currItem in mySortedList:
    print("You are enrolled in " + currItem)
But if you don’t care about the original order you can use courseList.sort(), which sorts the list in place:
# this sorts the list “in place”, changing the order
courseList.sort()
for myCourse in courseList:
    print("You are enrolled in " + myCourse)
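One catch with the lexical order mentioned above: numbers stored as strings are compared character by character, so "10" sorts before "9". A small sketch with made-up room numbers shows the difference, and also that sorted leaves the original list alone.

```python
room_strings = ["9", "10", "2"]  # made-up room numbers stored as text
room_numbers = [9, 10, 2]        # the same values stored as numbers

print(sorted(room_strings))  # lexical order: compared character by character
print(sorted(room_numbers))  # numeric order

original = ["Zoology Lab", "Penguin Studies"]
in_order = sorted(original)  # original keeps its order
```

If your numbers arrive as text (as they often do from csv files), convert them to numbers before sorting when you want numeric order.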
In the database part of the course we talked about four operations using the acronym CRUD (Create, Read, Update, and Delete). That’s a convenient way to remember the basic operations for lists and dicts.
"""Show CRUD operations for lists."""
# Create an empty list
my_list = []
# Create a manually declared list
my_list = ["First item", "Second item"]
# Add an item
my_list.append("New item")
# Read an item using an index. Remember they are 0 indexed so
# first item is actually 0.
print(my_list[1])
# You can also reference from the end of the list using negative numbers
print(my_list[-1])
# Change an item at a position using assignment to the index
my_list[1] = "Changed second item in the list"
# Remove an item from the list
del my_list[1]
removed_item = my_list.pop(1) # pop returns the item.
# Delete the whole list
del my_list # you rarely end up needing to do this.
Now for dicts. Dicts are created with curly braces {} but read with square ones [].
"""Show basic operations for dicts."""
# Create empty dict
my_dict = {}
# Create manually declared dict
my_dict = {"a key": "a value",
           "another key": "another value"}
# Add an item
my_dict["new key"] = "new value"
# Read an item. Use the key in square brackets. Not curly brackets.
# Yes, that's pretty inconsistent, isn't it.
print(my_dict["a key"])
# Change the value at a key using assignment to the key.
my_dict["another key"] = "A new value"
# Remove an item from the dict. Removes key and value.
del my_dict["new key"]
removed_value = my_dict.pop("a key") # pop removes the key and returns its value
# Delete a whole dictionary
del my_dict # one rarely does this.
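Removing a key that is not in the dict (for example, one you have already deleted) raises a KeyError. As a hedged sketch, pop accepts an optional second argument that is returned instead when the key is missing; the variable name second_try is made up for illustration.

```python
my_dict = {"a key": "a value", "another key": "another value"}

removed_value = my_dict.pop("a key")     # key exists: removes it and returns "a value"
second_try = my_dict.pop("a key", None)  # key already gone: returns the fallback, None

print(removed_value, second_try, my_dict)
```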
We briefly touched on using a for loop to iterate over items in a list. The material below provides additional insight into how those work, but it is optional; you can use for loops in this course without understanding this material.
A while loop is a more manual way to iterate, compared to a for loop.
The while loop is also explained in this previous semester Screencast on While Loops. The screencast uses the code below.
This code celebrates with “hip, hip, hip, hurray” but you can customize it for greater anticipation (e.g., “hip, hip, hip, hip, hurray”) by changing todo. The test in the while line (todo > done) is repeated after each pass through the loop body, that is, after each done = done + 1.
todo = 3
done = 0
while(todo > done):
    print("hip")
    done = done + 1
print("hurray")
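To experiment with different values of todo without editing the code each time, the same loop can be wrapped in a function. This is only a sketch: the function name cheer is made up, and it collects the words in a list instead of printing them directly.

```python
def cheer(todo):
    """Return the words of the cheer as a list, with todo hips."""
    words = []
    done = 0
    while todo > done:      # same test as above, checked before every pass
        words.append("hip")
        done = done + 1
    words.append("hurray")  # runs once, after the loop has finished
    return words

print(", ".join(cheer(3)))
print(", ".join(cheer(0)))
```

Note the second call: when todo is 0 the test fails the very first time, so the loop body never runs and only "hurray" comes out.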
A few questions to ask yourself:
The figures used in the screencast, showing the state of the variables, are below:
The for loop allows us to iterate over lists and dictionaries. It is a simpler form of the while loop.
my_list = ["zero", "one", "two", "three"]
for item in my_list:
    print(f"The next item is: {item}")
This could be rewritten manually using a while loop. We have to point to each item in the list one by one. We know how to point to a single item using my_list[0] and my_list[1], so we use a number as a counter in the while loop, comparing it to the length of the list, len(my_list).
my_list = ["zero", "one", "two", "three"]
counter = 0
while (counter < len(my_list)):
    print("The next item is: ")
    print(my_list[counter])
    counter = counter + 1
Notice we have to increment the counter variable at the end of each loop: counter = counter + 1.
We can make this even more similar to the for loop above by using a temporary variable item = my_list[counter] and f-strings.
my_list = ["zero", "one", "two", "three"]
counter = 0
while (counter < len(my_list)):
    item = my_list[counter]
    print(f"The next item is: {item}")
    counter = counter + 1
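One way to convince yourself that the two loops really do the same work is to have each of them collect the items it visits and then compare the results. A small sketch; the names seen_by_for and seen_by_while are made up.

```python
my_list = ["zero", "one", "two", "three"]

# Collect what the for loop visits.
seen_by_for = []
for item in my_list:
    seen_by_for.append(item)

# Collect what the while loop visits.
seen_by_while = []
counter = 0
while counter < len(my_list):
    seen_by_while.append(my_list[counter])
    counter = counter + 1

print(seen_by_for == seen_by_while)
```

When the while loop finishes, counter equals len(my_list), which is exactly why the test counter < len(my_list) finally fails.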
While the while loop helps us understand what the for loop is doing, we will use the for structure a lot as we move down through the lines of a csv file (and sometimes across the fields on each line as well).
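As a preview of that pattern, here is a hedged sketch using Python's standard csv module. The csv text is made up and held in a string (via io.StringIO) so the example runs without a file on disk; with a real file you would pass an opened file to csv.reader instead.

```python
import csv
import io

# Made-up csv text standing in for a real file on disk.
csv_text = "code,title\n101c,Introductory Zoology\n210s,Zoology Lab\n"

rows = []
for row in csv.reader(io.StringIO(csv_text)):  # outer loop: down the lines
    fields = []
    for field in row:                          # inner loop: across the fields
        fields.append(field.upper())
    rows.append(fields)

print(rows)
```

Each row that csv.reader hands back is itself a List of strings, which is why a second for loop can walk across its fields.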
More in this screencast: Iterating over Lists and Dicts.