Data Wrangling software setup

The software for the class will run entirely on your laptop. First we have to install two pieces of software: Docker and git. The commands on this page can be easily copied: they have grey backgrounds and a little blue icon in the top left that copies them for you.

Download and install Docker

The Docker for Mac download is available here. Download it and copy it to your Applications folder (as normal for installing an application).

Windows has two paths, depending on whether you are running Windows 10 Pro/Enterprise or Windows 10 Home.

Docker for Windows 10 Pro is available here. Download it, double-click it, and follow the installation instructions.

Docker for Windows 10 Home is available here. Follow the instructions on that page. After you’ve clicked the “Docker Quickstart Terminal”, the system will install and configure some more things; you may need to click some Windows permission boxes, or hit return if things appear stuck. (For James it often seems to stick at “Waiting for an IP” and takes upwards of 60 seconds.) Follow the instructions until you get a successful docker run hello-world.

Check installation

Once the applications are installed, please check that each installed correctly by opening your Command Line Interface (on Mac that’s Terminal.app, found in /Applications/Utilities; on Windows that’s Docker Quickstart Terminal or Git Bash). These are sometimes called “shells”. Once you are looking at a command line interface that shows a $ prompt, type (or copy) each of these commands. The $ does not need to be copied; think of it as a placeholder marking your cursor’s position in the Command Line Interface.

Note that on Windows the usual key command for Paste might not work; if so, try Shift + Insert.

git version
docker version

Each of these should cause your Command Line Interface to display the “model” or version of the application, which also tells you that it installed successfully. If you get no message, or an error message, then the application was not installed correctly; please ask for help.
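
If you would rather check both tools in one go, the following is a minimal sketch, assuming a bash-like shell such as Terminal.app or Git Bash. The check_tool function is a hypothetical helper, not part of the class repository; it only tests whether a command can be found, not whether it works properly.

```shell
# Hypothetical helper: report whether a command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found"
    return 1
  fi
}

# Check the two tools the class needs; keep going even if one is missing.
check_tool git || true
check_tool docker || true
```

If either line says NOT found, the version commands above will fail too, and that is the point at which to ask for help.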

Clone the wrangling-docker repo

If you have previously done a git clone and already have a wrangling-docker directory but want to start over, first remove the old copy (note that rm -rf permanently deletes the directory and everything in it):

cd ~
rm -rf wrangling-docker

In your CLI (either Terminal.app or Docker Quickstart Terminal) use this command to clone the repository set up for the class.

git clone https://github.com/howisonlab/wrangling-docker.git

This command will make a copy of the wrangling-docker repository on your laptop. This repository contains all the files and applications that we will be using for the class. If you need more help or are confused, please feel free to contact the professor or TA.

Get Docker to run the software needed

Once wrangling-docker is cloned, you will need to move into the repository directory. Cloning the repository does not put your command line inside it; you have to change into it yourself. So, still in the Command Line, use the following command:

cd wrangling-docker

cd stands for ‘change directory’; you are simply moving from one folder to another. Then type:

docker-compose up

Docker will download quite a bit of data, then build some stuff. Let it do its thing; eventually it stops :) although it doesn’t show the $ prompt again. Once it stops outputting (~10 secs) there are servers running on your laptop and we can access them via your web browser. You do need to keep the terminal open to access the links below; if you close it, the servers close with it (but they can be restarted without losing anything).
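
To confirm from a second terminal window that the servers are answering, something like the following sketch may help. It assumes curl is installed and uses the 127.0.0.1 addresses listed below; on Windows 10 Home the 192.168.99.100 address would be used instead. check_url is a hypothetical helper name, not something provided by the class repository.

```shell
# Hypothetical helper: say whether anything answers at the given URL.
# curl exits with 0 when it receives any HTTP response, including a login page.
check_url() {
  if curl --silent --output /dev/null --max-time 5 "$1"; then
    echo "$1: reachable"
  else
    echo "$1: not reachable"
  fi
}

check_url http://127.0.0.1:8080   # phpmyadmin
check_url http://127.0.0.1:8888   # jupyter
```

A result of not reachable shortly after starting may only mean the servers are still coming up, so it is worth trying again after a few seconds.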

Applications

Windows

Windows 10 Home

Use Chrome or Firefox, not Edge.

phpmyadmin: http://192.168.99.100:8080

User: root, no password.

jupyter: http://192.168.99.100:8888

password is data.

Trouble? On Windows 10 Home we can’t use the 127.0.0.1 IP address. Usually 192.168.99.100 will work instead. Sometimes, though, the system creates a different address. You can find out whether that’s the case by opening the Docker Quickstart Terminal and running the command

docker-machine ip default

If that shows something other than 192.168.99.100 then write it down, as you will have to use it everywhere you see 192.168.99.100 or 127.0.0.1.
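
As a convenience, the two addresses can be printed with whatever IP docker-machine reports. This is a minimal sketch, assuming it is run inside the Docker Quickstart Terminal; print_urls is a hypothetical helper, and the fallback to 192.168.99.100 simply mirrors the usual address mentioned above.

```shell
# Hypothetical helper: print the class URLs for a given IP address.
print_urls() {
  echo "phpmyadmin: http://$1:8080"
  echo "jupyter: http://$1:8888"
}

# Ask docker-machine for the address; fall back to the usual one if it reports nothing.
ip="$(docker-machine ip default 2>/dev/null || true)"
print_urls "${ip:-192.168.99.100}"
```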

Note that I had trouble accessing the 192.168.99.100 addresses via the Microsoft Edge browser, but they worked through Firefox and Chrome.

Windows 10 Pro/Enterprise

phpmyadmin: http://127.0.0.1:8080

User: root, no password.

jupyter: http://127.0.0.1:8888

password is data.

Mac and Linux

phpmyadmin: http://127.0.0.1:8080

User: root, no password.

jupyter: http://127.0.0.1:8888

password is data.

Troubleshooting

Sometimes docker containers are already running and the docker-compose up command fails (because it’s trying to map to already used ports or access already used volumes). The error might look like this one:

ERROR: for wrangling_docker_mariadb_1  Cannot start service mariadb: b'driver failed programming external connectivity on endpoint wrangling_docker_mariadb_1 (0c49e26bf2a71913245f06fbcf9211da325b4a5c66e881577c78ddc0bd089fac): Bind for 0.0.0.0:3306 failed: port is already allocated'

ERROR: for mariadb  Cannot start service mariadb: b'driver failed programming external connectivity on endpoint wrangling_docker_mariadb_1 (0c49e26bf2a71913245f06fbcf9211da325b4a5c66e881577c78ddc0bd089fac): Bind for 0.0.0.0:3306 failed: port is already allocated'

or:

Error for already running docker Error starting Userland proxy


In that case you can stop all Docker containers on your machine by running:

docker stop $(docker ps -aq)

then

docker-compose up
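
Before stopping everything, it can be useful to see which running container is holding the port named in the error. The following is a minimal sketch, assuming the port is 3306 as in the first error message above; using_port is a hypothetical helper that filters the output of docker ps.

```shell
# Hypothetical helper: keep only the lines that publish the given host port.
# docker ps shows published ports in the form 0.0.0.0:3306->3306/tcp.
using_port() {
  grep -- ":$1->" || true
}

# List running containers with their port mappings and show who has 3306.
if command -v docker >/dev/null 2>&1; then
  docker ps --format '{{.Names}}  {{.Ports}}' | using_port 3306
fi
```

If nothing is printed, no running container publishes that port, and the port may instead be held by another program on your laptop.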