Overview

In this assignment, you will write a “data biography” (I borrow this term from Heather Krause) in which you investigate both the origins and contents of a historical dataset that I have selected for you. In roughly 1,000 words, your data biography will closely analyze the dataset and how it was produced.

Goals

  • Understand the historical, social, and cultural contexts of data creation
  • Develop skills in exploring tabular datasets in Python
  • Practice writing and communicating effectively about data

Technical Requirements

  • Written in a Jupyter Notebook file
  • Uses Markdown cells for written analysis (approximately 1,000 words)
  • Uses Python code cells to open the dataset and illustrate some of its contents for the reader
  • Incorporates visual images (screenshot, photograph, etc.) into your written analysis
  • Final version rendered as a page on your GitHub Pages portfolio site

Instructions

The Dataset

This is the dataset you will be examining. Note: you can find a CSV version of the dataset available under the data-biography folder in our sp25-class-materials repository. I am not providing you with any additional information about the dataset beyond the above link. You will need to put on your detective hats and try to familiarize yourself with the dataset and its history.

Analysis and Writing

  • Take as many notes as you can to answer the following questions about the dataset:
    • Who collected, processed, and made this information available?
    • Where and when was this information collected, processed, and made available?
    • How was this information collected, processed, and made available?
    • Why was this information collected, processed, and made available?
    • How is this data stored today? How did you access it?
    • What kind of data is it? What information does it contain?
    • What are the limitations or weaknesses of the dataset? What is missing?
    • What social, cultural, or political contexts should a user keep in mind while using the dataset?
  • Look over your notes and come up with an outline for your data biography. It needs to include a mix of written analysis, Python code cells that explore the contents of the data, and visual images illustrating some of your points.
  • You will write your data biography in a Jupyter Notebook file. This will allow you to weave together written analysis in Markdown cells with Python code cells and images.
  • Once your assignment is finalized and ready for submission, you will create a second version of the file that renders your notebook as a page on your GitHub Pages portfolio site.
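
As a starting point for the code cells, here is a minimal sketch of how you might open a CSV file and check what it contains and what is missing. The sample data, the column names, and the helper functions `load_rows` and `count_missing` are all hypothetical stand-ins, not the actual dataset or a required approach. The sketch uses only Python's standard library; you may prefer a library such as pandas if that is what you are used to.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in data so the cell runs on its own.
# In your notebook, open the real CSV file from the class repository instead.
SAMPLE_CSV = """name,year,occupation
Ada,1850,weaver
Ben,1850,
Cy,1851,weaver
"""


def load_rows(source):
    """Read CSV text from a file-like object into a list of dictionaries."""
    return list(csv.DictReader(source))


def count_missing(rows):
    """Count empty cells in each column (columns with none are omitted)."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


rows = load_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())
missing = count_missing(rows)

# Tally the non-empty values in one (hypothetical) column.
occupations = Counter(row["occupation"] for row in rows if row["occupation"])

print("Number of rows:", len(rows))
print("Columns:", columns)
print("Empty cells per column:", missing)
print("Occupation counts:", dict(occupations))
```

To use this with a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your copy of the CSV. Counting empty cells is only one way of asking what is missing; it will not reveal people, places, or categories that were never recorded in the first place.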

⚠️ ⚠️ ⚠️ Follow these instructions when you are ready to start writing. They will teach you how to do things like insert images into your Jupyter Notebook file and render it as a page on GitHub Pages.

Tips

  • Don’t assume your reader has any existing knowledge about the dataset. You’ll need to offer an introduction and overview of the dataset before diving into more detail.
  • You will need to address the multiple stages through which this information has passed to get to its current state as machine-readable data. So, for example: in answering the question “Who collected, processed, and made it available?” you need to include both the original historical actors who collected the information and the subsequent people who turned that information into a computational dataset.
  • Review your readings from Data Feminism for inspiration, including concepts like:
    • Raw vs. cooked data
    • All knowledge is situated
    • What gets counted counts
    • Rethink binaries and hierarchies

Submission

Make two submissions on the Canvas assignment page:

  1. Website URL: a link to your data biography’s page on your GitHub Pages site
  2. File upload: your original Jupyter Notebook .ipynb file