Pandas Practice I

Get Started

Open GitHub Desktop and select your course repository (lastname-sp25-data-materials)
Click Fetch origin to check for any changes
Go to Branch → Merge into current branch → select upstream/main and click Create a merge commit if there are updates
Click Pull origin if it’s available (if not, you’re up to date!)
Click Push origin to sync everything up

Launch JupyterLab and navigate to this week’s folder
Create a new Jupyter Notebook in this week’s folder with the filename: yourlastname-pandas-1.ipynb.
We will be working with census data for the state of Colorado from 1900 to 1950. This is contained in the file: co-census-skinny.csv
Complete the following steps:

Import the pandas library (import pandas as pd)
Read in the contents of the CSV of Colorado census data and assign it to a variable co_df (i.e., Colorado dataframe)
Show the first 10 rows of the co_df dataframe
Show a random sample of 5 rows from the co_df dataframe
Follow Walsh’s example to filter/select only the census data from 1940 and assign it to a new variable called co_df_1940.
Apply the max() function to the population column of co_df_1940 to find the largest number of people counted in a single Colorado county in the 1940 census.
What county had the largest number of people in 1940? Hint: use filter/select on co_df_1940 together with the result of the previous step (max() on the population column).
Use filter/select to create a new dataframe named denver_df that only has records from Denver.
Use denver_df.plot('year', 'population') to create a line graph of Denver’s population growth from 1900 to 1950.
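
If the filter/select pattern in the steps above is unfamiliar, here is a minimal sketch of it on a tiny made-up table rather than the real census file. The column name county is an assumption (check co_df.columns for the actual name), and the county names and numbers are placeholders invented for illustration, not census figures.

```python
import io

import pandas as pd

# Placeholder data standing in for co-census-skinny.csv.
# The 'county' column name and every value here are made up for illustration.
csv_text = """county,year,population
Alpha,1930,100
Alpha,1940,150
Beta,1930,300
Beta,1940,250
Gamma,1930,50
Gamma,1940,400
"""

# With the real file this would be: pd.read_csv("co-census-skinny.csv")
co_df = pd.read_csv(io.StringIO(csv_text))

first_rows = co_df.head(10)    # first 10 rows (fewer if the table is shorter)
random_rows = co_df.sample(5)  # 5 rows chosen at random

# Filter/select: keep only the rows where the condition is True
co_df_1940 = co_df[co_df["year"] == 1940]

# Largest value in the population column of the filtered dataframe
max_pop = co_df_1940["population"].max()

# Filter again to find which row(s) hold that maximum
largest = co_df_1940[co_df_1940["population"] == max_pop]

# The same pattern selects all records for a single county
alpha_df = co_df[co_df["county"] == "Alpha"]
```

The expression inside the square brackets produces a True/False value for every row, and the dataframe keeps only the True rows. Every filter step in this exercise uses that same pattern with a different column and value.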

Bonus Practice:

Save your line graph for Denver’s population growth as a .png file
Export a new CSV file that just contains records for Denver (denver_df) - see Walsh’s example.
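
One possible way to approach the two bonus steps is sketched below. It is not necessarily the approach in Walsh’s example. The output filenames are hypothetical, the rows in denver_df are made-up placeholders, and the county column name is an assumption.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook

import pandas as pd

# Placeholder rows standing in for denver_df; the values are made up.
denver_df = pd.DataFrame(
    {
        "county": ["Denver", "Denver", "Denver"],
        "year": [1900, 1910, 1920],
        "population": [10, 20, 30],
    }
)

# Hypothetical output locations; in the assignment you would likely save
# into this week's folder instead of a temporary directory.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "denver-population.png")
csv_path = os.path.join(out_dir, "denver-census.csv")

# DataFrame.plot returns a matplotlib Axes; its parent figure can be saved
ax = denver_df.plot("year", "population")
ax.get_figure().savefig(png_path)

# index=False leaves the row index out of the exported file
denver_df.to_csv(csv_path, index=False)
```

Passing index=False is a choice, not a requirement. Without it, pandas writes the row index as an extra first column in the CSV.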