Body Mass Index Daemen College Data Analysis Pro
Python Pandas Reference Resources
Consult these learning and reference resources as needed. Of course you may also search the wider web, but the pandas documentation is excellent and worth searching first.
Python Pandas Udemy Course
This is still the best all-around introduction to pandas attributes and methods, with great explanations and examples to illustrate. But now you can start using it topically by browsing the video titles to find what you need.
- Data Analysis with Pandas and Python
https://www.udemy.com/course/data-analysis-with-pandas/learn
Reference Resources from the Pandas Documentation
These reference resources are well organized, with convenient options for searching and scanning relevant attributes and methods. The examples are brief, but they provide a good reference and reminder as you become more confident.
- Pandas User Guide
https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html
- Tutorial: 10 Minutes to pandas
https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
- And as always, well-phrased web searches
3.1 Project: NBA Data Prep plus Calculated Fields
Overview
We have been tasked to prepare the NBA data set for analysis. Specifically, we have been asked to create two new columns as calculated fields: Height_Inches and BMI (Body Mass Index).
Required Learning Resources
Data Analysis with Pandas and Python – Udemy Course
https://www.udemy.com/course/data-analysis-with-pandas/learn
Required Videos:
- 60. Drop DataFrame Rows with Null Values
- 62. Convert DataFrame Column Types with the astype Method
- 97. More DataFrame String Methods — strip, lstrip, and rstrip
- 99. Split Strings by Characters with the str.split Method
- Other videos as needed
Other Sources to Consult as Needed
These reference resources are well organized, with convenient options for searching and scanning relevant attributes and methods. The examples are brief, but they provide a good reference and reminder as you become more confident.
- Pandas User Guide
https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html
- Tutorial: 10 Minutes to pandas
https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
- And as always, well-phrased web searches
Project Guide
Download the project guide:
3.1 NBA Data Prep plus Calculated Fields.docx
Video Tips
Below is a series of short videos that help with specific parts of this project. As always, feel free to do as much as you can on your own, and then consult my videos as needed to help you over a hump.
If you’re still stuck after watching my videos (that can happen!), please share your case in our project discussion board.
Creating a Table of Contents for your Jupyter Notebook
I’d like you to start using markdown cells to provide useful detail regarding your projects. Here’s a quick tip on setting up a table of contents at the top of a notebook — something I’d like you to do from now on.
https://www.loom.com/share/d1983a7e2b394b93ac41ba0a7bf5ae7b
Reviewing Data with .info() and .describe()
In this 7-minute video, I highlight a few things I look for — data types, null values, summary statistics — when loading and reviewing data. Toward the end of the video, I discuss the custom code snippet I use to make the statistical summary from .describe() easier and quicker to read.
https://www.loom.com/share/22c6ed0059634cd4b33f5b7019d00190
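Here is a minimal sketch of that kind of review. The sample rows and column values are made up for illustration (the real project loads the NBA data set), and the transpose-and-round step is just one possible way to make the summary easier to scan; it is not necessarily the snippet shown in the video.

```python
import pandas as pd

# Hypothetical sample rows standing in for the NBA data set.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", None],
    "Number": [0.0, 30.0, 12.0, None],
    "Height": ["6-2", "6-6", "7-0", None],
    "Weight": [180.0, 235.0, 260.0, None],
})

# .info() prints each column's data type and non-null count,
# which makes missing values easy to spot.
nba.info()

# .describe() summarizes the numeric columns. Transposing puts one
# column per row, and rounding trims long decimals.
summary = nba.describe().T.round(2)
print(summary)
```

Things to look for: numeric columns that loaded as text, non-null counts lower than the row count, and minimum or maximum values that look implausible.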
Dropping Useless Records and Reviewing the Results
In this very quick video, I briefly review the .dropna() parameters we needed for this step, and I review the updated data set.
https://www.loom.com/share/b693c85aebe646df9d4f93e9c989babe
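As a rough illustration, and assuming the "useless" records are rows where every value is missing, the `how` parameter of `.dropna()` controls how aggressive the drop is. The sample data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the last row is entirely empty,
# while Player B is only missing a college.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", np.nan],
    "College": ["Texas", np.nan, np.nan],
    "Weight": [180.0, 235.0, np.nan],
})

# how="all" drops a row only when every value in it is missing.
cleaned = nba.dropna(how="all")

# The default, how="any", drops a row if any single value is missing,
# which would also remove Player B.
strict = nba.dropna()

print(len(nba), len(cleaned), len(strict))
```

After dropping, it is worth running `.info()` again to confirm the row count changed the way you expected.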
Converting Jersey Numbers to Strings
There’s a little challenge in this process. I walk through an efficient way to tackle it.
https://www.loom.com/share/7b5fd95c6e7e4a00b00a98e565d261d8
Part 2 of the above: Replacing the Current Number Column with the New Data Type
https://www.loom.com/share/6e76fbaa02cd44318549d5413fd6942b
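One likely form of the challenge, sketched here under the assumption that the Number column loaded as floats: converting floats straight to strings keeps the trailing ".0". Converting to integers first avoids that. The sample values are hypothetical, and the approach in the videos may differ.

```python
import pandas as pd

# Hypothetical sample: jersey numbers stored as floats.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Number": [0.0, 30.0, 12.0],
})

# Converting directly keeps the decimal point in each string.
direct = nba["Number"].astype(str)
print(direct.tolist())

# Converting to int first, then to str, gives clean jersey numbers.
# Assigning back to the same column name replaces the original column.
nba["Number"] = nba["Number"].astype(int).astype(str)
print(nba["Number"].tolist())
```

Note that `.astype(int)` raises an error if the column still contains missing values, so this step belongs after the null rows have been dropped.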
Create the Height_Inches Column
This is one of the thornier problems. I’ll walk you through it. Be sure to consult and learn about the .str.split() method!
https://www.loom.com/share/33a5140a47ef448f8a8e29c7ce890b5c
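A minimal sketch of the idea, assuming the Height column stores text in a feet-inches format such as "6-2" (check your own data before relying on this). The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample: heights stored as "feet-inches" strings.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Height": ["6-2", "6-6", "7-0"],
})

# expand=True returns a DataFrame with one column per piece:
# column 0 holds feet, column 1 holds inches (both still strings).
parts = nba["Height"].str.split("-", expand=True).astype(int)

# Total inches = feet * 12 + remaining inches.
nba["Height_Inches"] = parts[0] * 12 + parts[1]
print(nba)
```

The `.astype(int)` call matters: without it the pieces are strings, and multiplying a string by 12 repeats the text instead of doing arithmetic.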
Create the BMI Column
After what we just did, this step is pretty easy! We’ll use the Weight and Height_Inches columns to create a new BMI column.
https://www.loom.com/share/5f4f6b09682f4437ac4355dd24ab894d
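For reference, the standard BMI formula for imperial units is 703 × weight in pounds ÷ (height in inches)². The sketch below assumes Weight is recorded in pounds; the sample values and the rounding to one decimal are illustrative choices.

```python
import pandas as pd

# Hypothetical sample: weight in pounds, height already in inches.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Weight": [180.0, 235.0],
    "Height_Inches": [74, 78],
})

# Column arithmetic is applied row by row, so no loop is needed.
nba["BMI"] = (703 * nba["Weight"] / nba["Height_Inches"] ** 2).round(1)
print(nba)
```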
Organize Columns
Here’s a handy way to reorganize the columns in your dataframe.
https://www.loom.com/share/fa0a266987f34e8287422bbe9c24b8fe
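One common way to do this, shown as a sketch with hypothetical columns, is to select the columns with a list in the order you want. The video may use a different technique.

```python
import pandas as pd

# Hypothetical sample: new columns usually land at the far right.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Height": ["6-2", "6-6"],
    "Weight": [180.0, 235.0],
    "Height_Inches": [74, 78],
    "BMI": [23.1, 27.2],
})

# List the columns in the order you want, then select with that list.
new_order = ["Name", "Height", "Height_Inches", "Weight", "BMI"]
nba = nba[new_order]
print(nba.columns.tolist())
```

Any column left out of the list is dropped from the result, so double-check the list against `nba.columns` before reassigning.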
You handle the rest!
I’m confident you can do it. Be sure to consult your learning and reference resources as needed. (And as always, well-written web searches are your friend.)
3.1 Discussion
Use this discussion board to share tips, recommendations, and discussion regarding the assigned project(s).
To foster dialogue, I’m requiring everyone to post something — whether a question, a reflection, or feedback to other users. Here are recommended ideas to prompt your contributions:
- Have you gotten stuck? You’re undoubtedly not alone. Please describe your problem and include screenshots if relevant.
- Can you provide help in response to another student’s question? If so, please do!
- Did you discover any helpful tips in this process? Please share them. Provide relevant screenshots, links, etc.
3.2 Notes: Descriptive Statistics Concepts
In our upcoming projects we’ll use the pandas .describe() method to see a statistical summary of the numerical columns in our data sets.
To prepare for that, I’d like you to load up on — or refresh your memory regarding — a few key statistical concepts. The insights we gain from these summary statistics can be important and helpful!
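To connect the concepts to the output, here is a small sketch using made-up weights. It shows that each row of `.describe()` matches a statistic you can also compute on its own.

```python
import pandas as pd

# Hypothetical sample of player weights in pounds.
weights = pd.Series([180, 200, 220, 240, 260])

stats = weights.describe()
print(stats)

# Each entry corresponds to a familiar statistic:
print(stats["mean"] == weights.mean())          # arithmetic average
print(stats["50%"] == weights.median())         # median (middle value)
print(stats["std"] == weights.std())            # sample standard deviation
print(stats["25%"] == weights.quantile(0.25))   # first quartile
```

When the mean and the median (the 50% row) are far apart, that is usually a sign of skew or outliers in the column.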
Project Guide
Download and follow the project guide: