Data Wrangling Course

James Howison's Data Wrangling course from the Information School at the University of Texas at Austin.

Command Line Introduction

Working with the computer means getting behind the interface and using the command line (Terminal.app on Mac, PuTTY or Chrome Secure Shell on Windows).

This screencast shows you how to get started, how to log in to holden, and how to start running commands: Introduction to the Command Line

In order to run our python scripts we will be using a command line interface. This interface enables us to issue “commands” to the computer and the computer runs some code and responds in some way, such as printing output to the screen, creating a file, downloading something and so on.

The terminal looks like this (a screenshot from Terminal.app on Mac OS on my laptop):

Blank Terminal

The window shows a first line with some information about the last time we used the command line interface; the second line is waiting for our input. Working backward from right to left, the grey square is the cursor, which shows us where we can type commands to the computer. Just before the cursor is a $, called “the prompt”, and before that is some information about the computer we are on: the name of my laptop (“DeptOfAir”), then the “path”, which shows us where we are in the file system (more on that below), and then the name of the user that the computer knows us as, here “howison”.

Working with the command line is like a conversation with a very literal, very quick genius. If you get the command right, everything works, but if you get it just a little wrong, nothing works at all. In particular the command line (just like SQL) is very finicky about spaces. This is because the command line uses spaces to separate the things that it is going to work with; it would treat a name made up of a first name and a last name as two totally separate, unconnected things, unlike a human, who might use contextual knowledge to figure out that the two names are connected.
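To see how spaces split a command into separate pieces, here is a small Python sketch. It uses Python's shlex module, which follows splitting rules similar to the shell's; it is an approximation for illustration, not exactly what bash does internally. The command and name are made up for the example.

```python
import shlex

# shlex.split breaks a line into words using shell-like rules.
# Without quotes, the space between the two names makes them separate words.
unquoted = shlex.split("ls James Howison")

# With quotes, the two names travel together as a single word.
quoted = shlex.split('ls "James Howison"')

print(unquoted)  # ['ls', 'James', 'Howison']
print(quoted)    # ['ls', 'James Howison']
```

This is why file and folder names without spaces are much easier to work with on the command line.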

We need to connect to the remote server (our old friend holden).

First, if we are off campus we have to connect to the UTexas VPN. Review the UTexas VPN instructions here; I recommend downloading the Cisco AnyConnect VPN client.

Then, we use the ssh command (it stands for “secure shell”) to talk to the holden server.

Logging In

Here I’ve typed ssh testuser@holden.ischool.utexas.edu but I haven’t yet pressed return. That tells ssh to try to connect to a server at holden.ischool.utexas.edu with the username “testuser”. Hopefully it’s clear that you’d use your own username, which is your iSchool username (not your EID). Once I press return I’ll see:

Logged In

First, the ssh program asked for my password. It doesn’t show the password as you type, but once you’ve typed it, hit return and, if you got the password right, it will connect to holden. You’ll then see a welcome message and some info about the system (don’t worry about understanding that; it’s not important).

Then you can see the new command prompt (testuser@holden:~$) and the cursor, waiting for our commands.

Windows users can use one of two approaches to getting a shell and ssh client. The first is a program called PuTTY; the second is a Chrome plugin called “Chrome Secure Shell”. These connect a little differently but once you are connected things will be similar. Hostname is holden.ischool.utexas.edu and port is 22 (remember that if you are off campus you have to connect to the VPN first).

Once we are connected one thing we can do is move around the file system, so that we “sit” in different folders and can run different files. When we start out we are in our “home” directory. The computer abbreviates our home directory to “~” (pronounced til-de).

We can always find out where we are with the command “pwd”, which stands for “print working directory”. Below I’ve typed “pwd” and hit return.

pwd

The computer has responded with “/home/testuser”. We read this as “slash home slash testuser” and the slashes show us a hierarchy of directories. Together that is called a “path” because it shows us how to navigate through directories. /home/testuser is the home directory for testuser (and ~ means the same thing).
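The hierarchy inside a path can also be explored from Python. The sketch below uses the standard pathlib module to take apart the /home/testuser path from the example; Path.cwd() is roughly Python's counterpart to pwd, and its output will depend on where you run the script.

```python
from pathlib import Path, PurePosixPath

# The path from the pwd example, split into its directories.
home = PurePosixPath("/home/testuser")
print(home.parts)   # ('/', 'home', 'testuser')
print(home.parent)  # /home
print(home.name)    # testuser

# Path.cwd() plays the role of pwd: it reports the directory
# this script is running in (different on every machine).
print(Path.cwd())
```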

We can see the files in the directory that we’re currently in with the ls command (I always think of this as “list”). Below I typed ls and hit return.

ls no response

Hmmm, the computer didn’t provide any response at all … that’s what I mean about a literal idiot … or genius. It didn’t provide any response because there are no files in the home directory of testuser. So, in a way, ls did list absolutely all of the contents … but sometimes it’s hard to tell if the command worked or not. Anyway …

Now I’ll create a folder (using Atom and remote-ftp, which I’ll show you in a minute). I’ve created a folder called “python” which is where we’ll keep and run our python scripts. Now when I type ls I’ll see that folder in the listing:

ls with response

Now when I type ls and hit return we see python on the next line. On the Mac, in Terminal.app, folders show up in blue (that’s convenient but doesn’t always happen; if you can’t read the folder names let me know).
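The same before-and-after behaviour can be sketched in Python. Here a temporary directory stands in for testuser's empty home directory (an assumption made so the example is safe to run anywhere), and os.listdir plays the role of ls.

```python
import os
import tempfile

# A temporary directory stands in for an empty home directory.
with tempfile.TemporaryDirectory() as home:
    before = os.listdir(home)               # like ls in an empty directory
    os.mkdir(os.path.join(home, "python"))  # create the "python" folder
    after = os.listdir(home)                # like ls once the folder exists

print(before)  # []
print(after)   # ['python']
```

An empty list is Python's way of saying what ls said with silence: the listing worked, and there was nothing to list.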

Now we can change directory, which we do with the cd command (it stands for “change directory”). I type cd then a space, then py. Now I can keep typing the whole word or I can hit the Tab key and the command line will try to complete what I’ve typed. That’s very convenient and helps cut down on frustrating typos! So it’s a good habit to get into.

ls with response

Once I hit the Tab key I see:

ls with response

Notice that the command line added a slash after python (indicating that it’s a directory). Now I can hit Return and we’ll be in the python directory. See how the path part of the prompt now says “~/python”?

ls with response
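For comparison, a Python script can change its own working directory with os.chdir, much like cd does for the shell. This is only a sketch: it again uses a temporary directory as a stand-in for the home directory, and it changes back at the end so nothing is left behind.

```python
import os
import tempfile

start = os.getcwd()  # remember where we began, like noting pwd

with tempfile.TemporaryDirectory() as home:
    os.mkdir(os.path.join(home, "python"))
    os.chdir(home)       # like: cd ~
    os.chdir("python")   # like: cd python
    inside = os.path.basename(os.getcwd())
    os.chdir(start)      # step back out before the temp folder is removed

print(inside)  # python
```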

Now that we’re in the python folder I can type ls again and see what’s in that folder:

ls with response

You can see that there is a file in there called “test.py”; we’ll create that in your folders today. Now we’re ready to run our first python code. We usually call little files like this “scripts” but there’s no difference between “code”, “script” or “program”, they are all instructions to the computer.

We run the program by typing python3 test.py. You can get there by typing py and hitting Tab, but you’ll see lots of options for commands that start with py, so you have to type pyth, hit Tab, then type “3”. Then type a space and te and hit Tab, which will complete to test.py. That command tells the python program (nothing to do with the python folder) to run the code in the test.py file.

Once you hit return the code will execute and give you the traditional welcome message from a computer language :)

ls with response
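The contents of test.py are not shown here, so the following is only a guess at what such a first script might look like, assuming the “traditional welcome message” is the usual Hello World greeting. The variable name message is made up for illustration.

```python
# test.py (hypothetical contents; the real file is created in class)

# Store the greeting in a variable, then print it to the screen.
message = "Hello, World!"
print(message)
```

Saved as test.py in the python folder, this would be run with python3 test.py.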

Why python3 and not just python?

The newer version of python is version 3. Python 3 is partially incompatible with python version 2.x, but it has lots of improvements that make things easier, so in this class we’re learning python3. Because computers might have older python scripts that rely on the python 2 stuff that doesn’t work in python 3, generally systems use python to refer to python 2 and python3 to refer to python 3. You can always find out the version that you are using by typing python --version or python3 --version.
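A script can also check which version of python is running it. The short sketch below uses the standard sys module; the wording of the messages is just for illustration.

```python
import sys

# sys.version is the full version string; sys.version_info.major is 2 or 3.
print(sys.version)
major = sys.version_info.major

if major < 3:
    print("This is Python 2; run the script with python3 instead.")
else:
    print("Running Python 3, as this class expects.")
```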

Handy hints for easier command line use

  1. Bash remembers the commands you have typed in the terminal! Simply use the up arrow on your keyboard to scroll back through the commands you’ve used. You can use the left and right arrow keys to move the cursor and edit the previous command as well.
  2. Want to read a text file in the terminal? cat shows the whole file. head shows the first lines (ten by default) and tail shows the last lines.
    cat my_file.txt
    head my_file.txt
    tail my_file.txt
    
  3. Want to get out of the current shell? Ctrl+D
  4. Want to terminate a script stuck in an infinite loop (trapped in the running program)? Ctrl+C. This sends an interrupt signal to your program; Python turns it into a KeyboardInterrupt exception, which stops the script unless the code catches it.
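To illustrate the last hint, here is a minimal sketch of a loop that Ctrl+C can stop. The function name and the limit parameter are made up for this example; the limit only exists so the demonstration ends on its own.

```python
import time

def count_until_interrupted(limit=None):
    """Count upward until Ctrl+C is pressed (or until limit, if given)."""
    count = 0
    try:
        while limit is None or count < limit:
            count += 1
            time.sleep(0.01)  # short pause so the loop does not hog the CPU
    except KeyboardInterrupt:
        # Ctrl+C lands here instead of ending the script with an error.
        print("Interrupted by Ctrl+C; stopping cleanly.")
    return count

# The limit keeps this demonstration from running forever.
print(count_until_interrupted(limit=5))  # 5
```

Calling count_until_interrupted() with no limit gives a loop that really does run until you press Ctrl+C.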