CORGIS: The Collection of Really Great, Interesting, Situated Datasets. A former student says, “They do a good job of explaining what’s in each dataset and what it means.”
Kaggle. But please trace datasets back to their original sources to confirm that they come from different organizations and that they haven’t been cleaned up.
HTML tables are relatively easy to convert to CSV using online tools. ConvertCSV has been useful for students (although be aware that you have to choose the right table on the page).
Wikipedia tables can provide useful data. There is a range of tools that can convert them. Students have used http://wikitable2csv.ggor.de/ and http://import.io. Another option is the Wikidata project, which provides CSV downloads of InfoBoxes (and perhaps other things).
PDF files can provide useful data, especially from tables, but they have to be converted. The conversion tool that I like the best is http://tabula.technology/
Data on websites in structured formats other than tables can be extracted through “scraping”, but that is outside the scope of the course. Students have had luck with http://import.io to set up scraping.
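For students who would rather convert an HTML table themselves than use an online tool, here is a minimal sketch using only the Python standard library. The class name TableExtractor, the function table_to_csv, and the sample page are all made up for illustration. The sketch assumes well-formed HTML with explicit closing tags and no tables nested inside other tables; real pages may need a more forgiving parser. The index argument reflects the point above about choosing the right table on the page.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> on a page as lists of cell text.

    Hypothetical helper: assumes explicit </td>, </th>, </tr> tags and
    no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(html, index=0):
    """Return table number `index` (0-based) from `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.tables[index])
    return out.getvalue()


# Made-up page with two tables: a navigation table and the data table we want.
PAGE = """
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Park</th><th>Visits</th></tr>
  <tr><td>Example Park A</td><td>1,200</td></tr>
  <tr><td>Example Park B</td><td>850</td></tr>
</table>
"""

print(table_to_csv(PAGE, index=1))
```

Note that the csv module quotes the value 1,200 because it contains a comma; the thousands separator would still have to be removed before treating that column as a number.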
Examples of data that are harder to deal with:
Images of plots (very, very hard; I’ve heard of people doing this, but I never have). e.g., Video Game Popularity plot, no underlying data download.
“Record format” (requires a sophisticated scraper). e.g., Dog Breed Personalities
Sort of a combo of record format and images. e.g., A menu archive
Proprietary formats (requires an importer). e.g., https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/6QWX7Q/X1EKIG&version=2.0
Nested headers (and merged cells) (can be dealt with, common in Census data). e.g., https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_13_tables-508.pdf (This is also a PDF, so try tabula.technology; see above.)
Non-rectangular tabular data: https://www.nps.gov/aboutus/visitation-numbers.htm (bottom table). We can deal with this using Python, although there are some challenges.
Sort of a mix of lots of types: https://www.daytranslations.com/blog/popular-video-games-continent/
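Two of the cases above, nested headers and non-rectangular tables, can be handled with a few lines of plain Python once the rows have been read in as lists. The sketch below is one possible approach, not the only one. The function names flatten_headers and pad_rows and the sample rows are invented for illustration. It assumes that an empty cell in an upper header row is the continuation of a merged cell to its left, and that a short row is only missing cells at its right-hand end; both assumptions should be checked against the actual table.

```python
def flatten_headers(header_rows, sep=" / "):
    """Merge stacked header rows into a single row of column names.

    Assumption: an empty cell in any row above the last header row
    continues a merged cell, so it inherits the nearest label to its left.
    """
    width = max(len(row) for row in header_rows)
    filled = []
    for row in header_rows[:-1]:
        row = list(row) + [""] * (width - len(row))
        last = ""
        out = []
        for cell in row:
            last = cell.strip() or last  # carry the merged label rightwards
            out.append(last)
        filled.append(out)
    bottom = list(header_rows[-1]) + [""] * (width - len(header_rows[-1]))
    filled.append([cell.strip() for cell in bottom])
    # Join the non-empty labels for each column from top to bottom.
    return [sep.join(p for p in parts if p) for parts in zip(*filled)]


def pad_rows(rows, width, fill=""):
    """Make every row exactly `width` cells long.

    Assumption: short rows are missing trailing cells only. Rows that are
    too long raise an error rather than being silently truncated.
    """
    padded = []
    for n, row in enumerate(rows, start=1):
        if len(row) > width:
            raise ValueError(f"row {n} has {len(row)} cells, expected at most {width}")
        padded.append(list(row) + [fill] * (width - len(row)))
    return padded


# Made-up rows imitating a table with a two-level header and a ragged last row.
raw = [
    ["", "Births", "", "Deaths", ""],
    ["Year", "Number", "Rate", "Number", "Rate"],
    ["2001", "10", "1.5", "4", "0.6"],
    ["2002", "12", "1.7"],  # short row: the last two cells are missing
]

columns = flatten_headers(raw[:2])
data = pad_rows(raw[2:], len(columns))

print(columns)
for row in data:
    print(row)
```

Padding on the right is only safe when the missing cells really are at the end of the row. If a table drops cells in the middle, each row has to be realigned by hand or with table-specific logic.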