First create a new directory within your class folder data-a-user-manual
called pandas-1
. Launch Jupyter Labs and create a new Jupyter Notebook in that folder with the filename: pandas-1.ipynb
.
We’re going to be working with census data collected about Colorado counties from 1900-1950. Download this CSV file and make sure it is in your pandas-1
folder.
To replicate best coding practices, you’re going to use alternating Markdown and Code cells in your Jupyter Notebook. Copy and paste each of the following steps into a new Markdown cell that documents in your own words what you’re doing in the following code cell. Then insert a new code cell and write your Python code that completes the task for that step.
- Import the pandas library (
import pandas as pd
) - Read in the contents of the CSV of Colorado census data and assign it to a variable
co_df
(ie. Colorado dataframe) - Show the first 10 rows of the
co_df
dataframe - Show a random sample of 5 rows from the
co_df
dataframe - Follow Walsh’s example to just filter/select census data from 1940 and assign it to a new variable called
co_df_1940
. - Apply the
max()
function to thepopulation
column ofco_df_1940
to find the largest number people counted in a single Colorado county in the 1940 census. - What county had the largest number of people in 1940? Hint: use filter/select in conjunction with your steps from the previous step (
max()
and thepopulation
column). - Use filter/select to create a new dataframe named
denver_df
that only has records from Denver. - Use
denver_df.plot('year', 'population')
to create a line graph of Denver’s population growth between 1900-1950.
Bonus Practice:
- Save your line graph for Denver’s population growth as a .png file
- Export a new CSV file that just contains records for Denver (
denver_df
) - see Walsh’s example.