Data sets and writing about data

A work-in-progress collection of data sets, some easy to access and some not so easy. Created for the Data Analytics for Economists course at the University of Wisconsin -- Madison, but all are welcome. Suggestions and corrections are always appreciated (ruhl2@wisc.edu).

Data repositories

Aggregate Economic Data

Data on Individuals

  • American Community Survey Social, economic, housing, and demographic data often used to produce county-level analysis. The public use microdata sample (PUMS) contains anonymous data on individuals.

  • Current Population Survey Household-level data on employment, income, and education.

  • NLSY79 These data follow a cohort of men and women who were 14-22 years old in 1979. They were re-surveyed each year through 1994 and every two years after that. Not the easiest data to access, but there is a lot to learn from it.

  • National Survey of Family Growth Interviews with females about pregnancy and associated topics. Includes demographic data.

  • FBI Crime Data Explorer What kinds of crimes are being committed? How are they changing over time?

Wisconsin Data

  • WI Dept. of Health Services Data on asthma, Zika, and lots in between. It takes some clicking around, but many of the datasets can be visualized as a map to get you thinking. Look for the download button in the top right corner.

  • Wisconsin Voting Data A lot of detail. There is an API, too.

  • City of Madison More data on the city, including lots of spatial data. The tax rolls are interesting: I can see my house in this dataset!

Micro Export Data

  • Brookings Export Monitor Exports by industry at the county, metro, and state levels (aggregates, too). This data tracks goods according to where they were produced.

  • USA Trade Sign up for a free account to use. Imports and exports by product and U.S. state. This data tracks goods according to origin of movement rather than production.

Business Data

  • Inside Airbnb Listing, review, and calendar data. Doesn't have data for a Wisconsin city, but Minneapolis and Chicago are in there.

  • Yahoo Finance Historical and current financial data. The API in `pandas_datareader` is broken, but you can still download files from the site.

  • FDIC Aggregate data on US banks, including balance sheet and income statement data. The data on bank failures might make for an interesting analysis.

  • Airline routes (T-100) Route-segment based data. Monthly observations on number of passengers, seats, and cargo transported on a given route segment for each airline.

  • Airline itineraries (DB1B) Quarterly sample of 10% of passenger itineraries from major airlines. Includes price data.

  • Zillow Housing and rental data by metro area.
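
For the Yahoo Finance entry above: once a historical-prices file has been downloaded by hand, it can be read without `pandas_datareader`. The sketch below uses only the Python standard library. The column names (`Date`, `Adj Close`) are an assumption about the layout of the downloaded file, and the sample rows are made up for illustration, so check both against your own file.

```python
import csv
import io

# Made-up sample rows standing in for a downloaded file. The column names
# are an assumption about the export layout; check them against your file.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,10.0,11.0,9.5,10.5,10.5,1000
2020-01-03,10.5,12.0,10.0,11.55,11.55,1500
2020-01-06,11.5,11.8,10.9,11.0,11.0,1200
"""


def load_closes(handle, price_column="Adj Close"):
    """Return (date_string, price) pairs, skipping rows with a missing price."""
    rows = []
    for record in csv.DictReader(handle):
        raw = record[price_column]
        if raw in ("", "null"):
            continue
        rows.append((record["Date"], float(raw)))
    return rows


def daily_returns(closes):
    """Simple returns between consecutive rows: p_t / p_{t-1} - 1."""
    return [
        (date, price / prev_price - 1.0)
        for (_, prev_price), (date, price) in zip(closes, closes[1:])
    ]


closes = load_closes(io.StringIO(SAMPLE_CSV))
returns = daily_returns(closes)
for date, r in returns:
    print(f"{date}: {r:.4f}")
```

To use a real file, replace the `io.StringIO(...)` handle with something like `open("prices.csv", newline="")`, where `prices.csv` is a hypothetical name for whatever you saved from the site.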

Medical Data

  • HRSA Data Grant, loan, and scholarship program data, as well as data about availability of healthcare. The data on health professional shortage areas looks interesting.

  • Dartmouth Atlas of Health Care Compiled from Medicare data, the database provides information about health care at detailed levels, right down to the hospital.

Sports

  • Baseball Database by Sean Lahman: batting and pitching statistics from 1871-2017 plus much more.

Arts and Culture

  • Cooper Hewitt Open access to data about the collection.

  • MovieLens Movie ratings and demographic data about the raters. Some very large datasets, but some small ones for getting your code up and running.

  • New York Philharmonic Data on more than 20,000 performances.

Education

  • College Scorecard University- and college-level data about each school and its student body.

Other Data Collections

  • NBER Datasets that go with NBER working papers. Some data is easy to access, some is not (and some is missing). The associated papers are full of good questions, too.

  • ICPSR A large collection of social science data. We have not used this data. Let us know if you do; we would like to hear about it.

  • Kaggle This site runs competitions and warehouses lots of data and code.

  • Chicago data portal. Lots of data about the city.


Data blogs and other resources

  • The FRED blog: Short posts on topical economic questions and observations.

  • Reddit's data is beautiful: As much chaff as wheat, but worth an occasional look.

  • FiveThirtyEight: Economics, sports, politics... a bit of everything.

  • COMTRADE visualization: Showcases visualizations that use COMTRADE data. Some good stuff, and some good examples of overwrought, hard-to-follow visualizations.

  • VizWiz by Andy Kriebel: Makeover Monday, Tableau Tip Tuesday, and Workout Wednesday