Data Biography

Your assignment is to write a “data biography” about a historical dataset that I have selected for you. First, read Heather Krause, “Data Biographies: Getting to Know Your Data,” Global Investigative Journalism Network (March 27, 2017), to get a better sense of what a “data biography” is and why it is important.

This is the dataset you will be examining: https://github.com/swat-ds/datasets/tree/main/1847census. I am not providing you with any additional information about the dataset beyond the above link. You will need to put on your detective hats and try to familiarize yourself with the data and its history. Make sure that you download the actual dataset and take a look at its contents in addition to tracking down its history.
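
If you are comfortable with a little Python, a first look at the file's contents could be sketched as below. This is only one possible starting point, not a required method. It assumes the file you downloaded is a CSV with a header row saved on your own computer; the function name `summarize_csv` and the filename in the usage note are placeholders for illustration, not names taken from the repository.

```python
import csv
from collections import Counter


def summarize_csv(path):
    """Return the row count, column names, and blank-cell counts per column.

    Assumes `path` points to a UTF-8 CSV file whose first line is a header.
    """
    blanks = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        for row in reader:
            rows += 1
            for col in columns:
                # Treat missing or whitespace-only cells as blank.
                if not (row.get(col) or "").strip():
                    blanks[col] += 1
    return {"rows": rows, "columns": columns, "blanks": dict(blanks)}
```

For example, `summarize_csv("my_downloaded_file.csv")` (a hypothetical filename) returns a dictionary you can print to see how many records there are, what the columns are called, and which columns have empty cells. A count like this is only a starting point: it tells you where the gaps are, not why they exist.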

Your data biography should tell a story about the dataset that addresses the following:

  • Introduce the dataset and its contents. What kind of information is in there? How much data is there?
  • Where did it come from?
  • Who collected, processed, and made it available?
  • How was it collected, processed, and made available?
  • Why was it collected, processed, and made available?
  • How is it stored today? How did you access it?
  • Potential problems with the data: are there any limitations, biases, gaps or missing data, or ethical issues to consider when using this data?

Note that, as with most historical datasets, answering some of these questions will require you to think about the multiple stages through which this information has passed to reach its current state as machine-readable data. So “Who collected it?” needs to include both the original historical actors who created the information and the subsequent people who ultimately made it available to you.

Due as a Word document uploaded to Canvas by Friday 10/8 at 11:59 PM.