First, create a new directory called pandas-1 inside your class folder, data-a-user-manual. Launch JupyterLab and create a new Jupyter Notebook in that folder with the filename pandas-1.ipynb.
We’re going to be working with census data collected about Colorado counties from 1900 to 1950. Download this CSV file and make sure it is in your pandas-1 folder.
To follow good coding practice, you’re going to alternate Markdown and Code cells in your Jupyter Notebook. For each of the following steps, create a new Markdown cell, paste in the step, and describe in your own words what the next code cell will do. Then insert a new Code cell and write the Python code that completes the task for that step.
- Import the pandas library (`import pandas as pd`)
- Read in the contents of the CSV of Colorado census data and assign it to a variable `co_df` (i.e., Colorado dataframe)
- Show the first 10 rows of the `co_df` dataframe
- Show a random sample of 5 rows from the `co_df` dataframe
- Follow Walsh’s example to filter/select just the census data from 1940 and assign it to a new variable called `co_df_1940`.
- Apply the `max()` function to the `population` column of `co_df_1940` to find the largest number of people counted in a single Colorado county in the 1940 census.
- What county had the largest number of people in 1940? Hint: use filter/select together with your work from the previous step (`max()` and the `population` column).
- Use filter/select to create a new dataframe named `denver_df` that only has records from Denver.
- Use `denver_df.plot('year', 'population')` to create a line graph of Denver’s population growth between 1900 and 1950.
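If you get stuck, the following is a minimal sketch of how the filter/select steps might fit together. It is not the only way to do it. It reads a tiny inline CSV with made-up numbers instead of the real census file, so you can run it anywhere. The `year` and `population` column names come from the steps above; the `county` column name is an assumption, so check the actual column names in your file (for example with `co_df.columns`) before adapting it.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV. The numbers are made up and
# the "county" column name is an assumption; your file may differ.
csv_text = """county,year,population
Denver,1900,100
Denver,1940,300
Denver,1950,400
Boulder,1900,20
Boulder,1940,40
Boulder,1950,50
Weld,1940,60
"""

# With the real data, pass the filename of your downloaded CSV to
# pd.read_csv() instead of this in-memory text.
co_df = pd.read_csv(io.StringIO(csv_text))

# First 10 rows (fewer if the dataframe is shorter) and a random sample of 5.
first_rows = co_df.head(10)
random_rows = co_df.sample(5)

# Filter/select: keep only the rows where the year is 1940.
co_df_1940 = co_df[co_df["year"] == 1940]

# Largest population counted in a single county in 1940.
max_pop_1940 = co_df_1940["population"].max()

# Filter again to find which row(s) hold that maximum value.
largest_county_1940 = co_df_1940[co_df_1940["population"] == max_pop_1940]

# Keep only the Denver records.
denver_df = co_df[co_df["county"] == "Denver"]
```

In a notebook, each of these steps would go in its own Code cell, and ending a cell with a variable name such as `largest_county_1940` displays its contents.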
Bonus Practice:
- Save your line graph for Denver’s population growth as a .png file
- Export a new CSV file that just contains records for Denver (`denver_df`); see Walsh’s example.
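One possible approach to the bonus steps is sketched below; Walsh’s example may do it differently. It assumes matplotlib is installed (pandas uses it for plotting), rebuilds a small `denver_df` with made-up numbers so the block runs on its own, and writes to a temporary folder. The output filenames and the `county` column name are hypothetical; in your notebook you would reuse your real `denver_df` and save into your pandas-1 folder.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in Jupyter

import pandas as pd

# Made-up stand-in for the Denver records (column names are assumptions).
denver_df = pd.DataFrame(
    {
        "county": ["Denver", "Denver", "Denver"],
        "year": [1900, 1940, 1950],
        "population": [100, 300, 400],
    }
)

# Hypothetical output locations; a temporary folder keeps this sketch tidy.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "denver_population.png")
csv_path = os.path.join(out_dir, "denver.csv")

# plot() returns a matplotlib Axes; its parent figure can be saved as a .png.
ax = denver_df.plot("year", "population")
ax.get_figure().savefig(png_path)

# index=False leaves the dataframe's row index out of the exported file.
denver_df.to_csv(csv_path, index=False)
```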
