Launch Jupyter Lab and create a new directory called working-with-text within your class folder, data-a-user-manual. Navigate to that folder and create a new Jupyter Notebook, renaming it working-with-text.ipynb.
Today we’ll be working with a single text: a narrative dictated by Venture Smith, a successful businessman and formerly enslaved man in colonial-era New England. The narrative, A Narrative of the Life and Adventures of Venture, a Native of Africa: But Resident above Sixty Years in the United States of America, Related by Himself, is available in several formats online, but for today’s purposes I’ve put it into the primary kind of file you will want to use when working with textual data in Python: a text file (ending in .txt).
- Download Smith’s narrative by clicking on this link (if it opens in your browser without downloading, you might need to use File -> Save As or right-click the link and select “Save link as”). Move the .txt file into your working-with-text folder. Take a look at the text file in Jupyter Lab by double-clicking on the file in the left pane.
- In Jupyter Lab, add a code cell and create a new variable called smith, assigning it the contents of the venture-smith.txt file using the open() and read() functions (Walsh instructions).
- In a new code cell, add two lines of code. In the first, just run your new variable smith to display its contents. In the second, use print() to display its contents. What is the difference between the two?
- Use indexing (square brackets after the variable name) to show the first character (letter) of Smith’s narrative. It should show: 'A'.
- Use slicing (a start and end position inside square brackets) to show the first 100 characters (letters) of Smith’s narrative. It should show: 'A NARRATIVE OF THE LIFE AND ADVENTURES\n\nOF VENTURE, A NATIVE OF AFRICA,\n\nBut resident above sixty ye'
- Let’s just isolate the title of Smith’s narrative, which comprises the first 158 characters. Make a new variable called smith_title and assign it the first 158 characters of the text file (Python starts counting at 0 and a slice stops just before its end position, so this means we want to use smith[0:158]).
- Use string.title() to reformat the title of Smith’s narrative by capitalizing the first letter of each word. Hint: the string in this example is your new variable smith_title.
- Notice how your main variable smith contains newline characters (\n). This is a “hidden” character contained in text files that tells a text editor to show the following text as starting on a new line (like hitting Enter or Return in a Word document). Use string.split('delim') to “split” up Smith’s narrative into separate lines. What would you use in place of string and delim to do this? Assign this new collection of separate lines to a new variable called smith_lines.
- The len() function tells you how long something is. In this case, we’ve created a variable called smith_lines containing a list of all the lines from Smith’s narrative. Use len() and smith_lines to show the length of Smith’s narrative in terms of the number of lines.
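If it helps to see these techniques working before trying them on the full narrative, here is a minimal sketch that runs on a short stand-in string: the first 100 characters quoted above. The names sample_text, demo_path, and sample are placeholders invented for this illustration, and the temporary file exists only so that open() and read() have something to read. Your own notebook would use venture-smith.txt and the variable names from the steps.

```python
import os
import tempfile

# Stand-in text: the first 100 characters of the narrative, as quoted in the steps above.
sample_text = (
    "A NARRATIVE OF THE LIFE AND ADVENTURES\n\n"
    "OF VENTURE, A NATIVE OF AFRICA,\n\n"
    "But resident above sixty ye"
)

# Hypothetical demo file, written to a temporary folder so this cell runs anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(sample_text)

# open() gives access to the file; read() returns its contents as one string.
with open(demo_path, encoding="utf-8") as f:
    sample = f.read()

# A bare variable at the end of a notebook cell shows the repr() form,
# with quotation marks and \n written out; print() renders the line breaks.
print(repr(sample))
print(sample)

first_char = sample[0]           # indexing: position 0 is the first character
first_ten = sample[0:10]         # slicing: stops just before position 10, so 10 characters
heading = sample[0:38].title()   # title(): capitalize the first letter of each word
lines = sample.split("\n")       # split(): break the string apart at each newline

print(first_char)   # A
print(first_ten)    # A NARRATIV
print(heading)      # A Narrative Of The Life And Adventures
print(len(lines))   # 5 (blank lines between the text count as empty strings)
```

Note that the empty strings produced by blank lines are counted by len(), so a line count made this way includes blank lines.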
Bonus Practice:
- Use string.split(), indexing, and len() to:
  - Print the 200th word in Smith’s narrative.
  - Print the length of the 200th word in Smith’s narrative, measured by number of characters/letters.
  - Print the length of Smith’s narrative, measured by number of words.
- Use string.split() to break Smith’s narrative apart into separate chapters and then len() to calculate how long Chapter II is, based on the number of characters in that chapter.
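The same tools can be combined as in the rough sketch below, which uses an invented placeholder string, not Smith’s text. The "CHAPTER " delimiter is an assumption made for this illustration; check the text file to see how its chapter headings are actually written before choosing a delimiter.

```python
# Invented placeholder text with made-up chapter headings (not from the narrative).
sample = "Title page. CHAPTER I. one two three. CHAPTER II. four five. CHAPTER III. six."

# With no argument, split() breaks the string at any run of whitespace (spaces, newlines, tabs).
words = sample.split()
third_word = words[2]      # counting starts at 0, so position 2 holds the third word

print(third_word)          # CHAPTER
print(len(third_word))     # 7 characters
print(len(words))          # 14 words

# Splitting on a delimiter removes the delimiter itself. Item 0 is whatever came
# before the first delimiter, so the second chapter sits at position 2.
chapters = sample.split("CHAPTER ")
second_chapter = chapters[2]

print(second_chapter)      # II. four five.
print(len(second_chapter)) # 15 characters, including the trailing space
```

Because counting starts at 0, the nth word lives at position n - 1, while the nth chapter here lives at position n only because the text before the first heading takes up position 0.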