CS628 – Data Science
Week 12 Assignment
Monroe College
Note: Read the resources posted in the Week 12 folder and Chapter 7 of the ebook in
the same folder (Python for Data Analysis).
For problems 1 to 3, work with the Nutrition_subset data set. The data set
contains the weight in grams, the amount of saturated fat, and the amount of
cholesterol for a set of 961 foods. Use Python.
1. The elements in the data set are food items of various sizes, ranging from a
teaspoon of cinnamon to an entire carrot cake.
a. Sort the data set by saturated fat (saturated_fat) and produce a listing of
the five food items highest in saturated fat.
b. Comment on the validity of comparing food items of different sizes.
2. Derive a new variable, saturated_fat_per_gram, by dividing the amount of
saturated fat by the weight in grams.
a. Sort the data set by saturated_fat_per_gram and produce a listing of the
five food items highest in saturated fat per gram.
b. Which food has the most saturated fat per gram?
3. Derive a new variable, cholesterol_per_gram, by dividing the amount of
cholesterol by the weight in grams.
a. Sort the data set by cholesterol_per_gram and produce a listing of the five
food items highest in cholesterol per gram.
b. Which food has the most cholesterol per gram?
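The sort-and-derive steps in problems 1 to 3 can be sketched with pandas as shown below. This is a minimal sketch, not a reference solution: the rows are made-up stand-ins for the real 961-food file, and the column names food_item, weight_in_grams, and cholesterol are assumptions (only saturated_fat, saturated_fat_per_gram, and cholesterol_per_gram are named in the problems). Check the actual column names after loading the file.

```python
import pandas as pd

# Hypothetical stand-in rows; the real Nutrition_subset file has 961 foods.
# food_item, weight_in_grams, and cholesterol are assumed column names.
foods = pd.DataFrame({
    "food_item": ["item_a", "item_b", "item_c", "item_d", "item_e", "item_f"],
    "weight_in_grams": [10.0, 200.0, 50.0, 100.0, 400.0, 25.0],
    "saturated_fat": [5.0, 20.0, 10.0, 1.0, 30.0, 0.0],
    "cholesterol": [2.0, 100.0, 0.0, 30.0, 20.0, 2.5],
})

# Problem 1a: five items highest in total saturated fat.
top_sat_fat = foods.sort_values("saturated_fat", ascending=False).head(5)

# Problem 2: derive saturated fat per gram, then list the five highest.
foods["saturated_fat_per_gram"] = foods["saturated_fat"] / foods["weight_in_grams"]
top_sat_fat_per_gram = foods.sort_values(
    "saturated_fat_per_gram", ascending=False
).head(5)

# Problem 3: same pattern for cholesterol.
foods["cholesterol_per_gram"] = foods["cholesterol"] / foods["weight_in_grams"]
top_chol_per_gram = foods.sort_values(
    "cholesterol_per_gram", ascending=False
).head(5)

print(top_sat_fat[["food_item", "saturated_fat"]])
print(top_sat_fat_per_gram[["food_item", "saturated_fat_per_gram"]])
print(top_chol_per_gram[["food_item", "cholesterol_per_gram"]])
```

In the stand-in rows, the heaviest item leads the raw saturated fat listing while the lightest item leads the per-gram listing, which is the kind of contrast problem 1b asks you to comment on.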
For problems 4 to 6, work with the adult_ch3_training data set. The
response variable is whether income exceeds $50,000. Use Python.
4. Add a record index field to the data set.
5. Determine whether any outliers exist for the education field.
6. Do the following for the age field.
a. Standardize the variable.
b. Determine how many outliers there are and identify the most extreme one.
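One possible approach to problems 4 to 6 is sketched below with pandas. Several things here are assumptions rather than part of the assignment: the rows are made-up stand-ins for the real file, education is treated as a numeric field, record_index and the helper z_scores are hypothetical names, and an outlier is taken to be a value whose z-score is above 3 or below -3 (a common rule of thumb; use whichever rule the course materials specify).

```python
import pandas as pd

# Hypothetical stand-in data; load adult_ch3_training in its place.
# Treating education as numeric is an assumption about the real file.
adult = pd.DataFrame({
    "age": [30] * 10 + [40] * 9 + [95],
    "education": [9] * 10 + [10] * 9 + [1],
})

# Problem 4: add a record index field (record_index is a hypothetical name).
adult["record_index"] = range(len(adult))


def z_scores(values: pd.Series) -> pd.Series:
    """Standardize a column: subtract its mean, divide by its sample std."""
    return (values - values.mean()) / values.std()


# Problems 5 and 6a: standardize both fields.
adult["education_z"] = z_scores(adult["education"])
adult["age_z"] = z_scores(adult["age"])

# Assumed outlier rule: |z| > 3.
education_outliers = adult[adult["education_z"].abs() > 3]
age_outliers = adult[adult["age_z"].abs() > 3]

# Problem 6b: count the age outliers and find the most extreme record.
most_extreme = adult.loc[adult["age_z"].abs().idxmax()]

print("education outliers:", len(education_outliers))
print("age outliers:", len(age_outliers))
print("most extreme age record:", int(most_extreme["record_index"]))
```

Keeping the record index as its own column means the most extreme record can still be traced back to its original row after the data set is sorted or filtered.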