Introduction

Much of digital history revolves around data - finding it, understanding it, and processing it. In this example we are going to use material from Cameron’s undergraduate class History of the Western U.S. Each of his students in that class brainstormed a list of words they associate with the western United States and wrote them down on a notecard.

notecard

This is a familiar kind of “source” for historians: hand-written words on a piece of physical paper. Our job is to turn these pieces of paper into machine-readable data. Once we have a dataset of all of the words that Cameron’s students wrote down, we could do some basic text analysis. What are the most common words that this group of undergraduates associate with the western United States? What kinds of words tend to appear together? Although it’s a very small dataset in terms of the number of students, these are interesting questions that touch on issues of historical memory and popular culture.

Transcription

Each person is going to “digitize” one notecard.

Group Discussion:

Read notecard and think about the different words and phrases. How should we go about digitizing this kind of source?

Some things to consider:

  • What is the purpose of this data?
  • What is it going to be used for? Is this meant to archive these sources in a way that preserves their original form as much as possible? Or do we want to capture the information that’s recorded on them?
  • Do you see any issues or have any questions about the mechanics of transcribing your notecard? Ex. What are you going to do with misspelled words?

Data Wrangling and Analysis

Now that everyone has transcribed a notecard, we’re shift over to Python to wrangle our freshly made data and see if we can do some basic analysis of it. To get started, download the Jupyter Notebook for Data Wrangling.