## Learning Pandas with McDonald's - Part 1

### What is pandas?
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "<b>pan</b>el <b>da</b>ta", an econometrics term for multidimensional, structured data sets.

(Source: https://en.wikipedia.org/wiki/Pandas_(software))

### Today, we'll be exploring the McDonald's dataset!
![alt text](images/mcdonalds.jpg)

#### About the Dataset: 
This dataset provides a nutrition analysis of every menu item on the US McDonald's menu, including breakfast, beef burgers, chicken and fish sandwiches, fries, salads, soda, coffee and tea, milkshakes, and desserts.

The menu items and nutrition facts were scraped from the McDonald's website.

Source: https://www.kaggle.com/mcdonalds/nutrition-facts

#### Understanding Nutritional Facts (FAQs)
- [Understand Nutritional Facts using this guide.](https://www.fda.gov/Food/IngredientsPackagingLabeling/LabelingNutrition/ucm274593.htm)
    - Everyone should familiarize themselves with these facts, not just for this exercise, but for their overall health!


- <b>Percent (%) Daily Value (DV)</b> on the Nutrition Facts label is a guide to the nutrients in one serving of food. 
    - If the label lists <i>15 percent for calcium</i>, it means that one serving provides <i>15 percent of the calcium you need each day</i>.
    - DVs are based on a <b>2,000-calorie diet for healthy adults</b>.
    - DVs also tell you whether a food is high or low in a specific nutrient:
        - 5 percent or less of a nutrient is low.
        - 20 percent or more of a nutrient is high.
    - <b>Note</b>: The FDA hasn't set a DV for trans fat because experts recommend that Americans <b>avoid foods with trans fat</b>.
    - It is highly recommended that you get in the habit of checking DVs to choose foods 
        - <b>high in vitamins, minerals and fiber</b>
        - <b>low in saturated fat, added sugar and sodium</b>. ([Source](https://www.mayoclinic.org/healthy-lifestyle/nutrition-and-healthy-eating/expert-answers/food-and-nutrition/faq-20058436))
        
        
- <b>Calories vs Calories from Fat</b>
    - <b>Calories</b> is the total calories in one serving from all sources: fat, carbohydrates, and protein.
    - <b>Calories from fat</b> is just the calories that come from fat.
    - Fat intake is important to monitor because it affects your overall health and heart disease risk. Total fat should make up no more than 30% of your total daily calories, so if you consume 2,000 calories each day, no more than 600 of them should come from fat. That is where the "calories from fat" figure on food labels comes in handy. ([Source](https://www.healthcentral.com/article/whats-difference-between-calories-calories-fat))
    
    
- <b>Sugar</b> (unit: grams)
    - According to the American Heart Association (AHA), the maximum amount of added sugar you should eat in a day is:
        - Men: 150 calories per day (37.5 grams or 9 teaspoons)
        - Women: 100 calories per day (25 grams or 6 teaspoons). ([Source](https://www.healthline.com/nutrition/how-much-sugar-per-day))
        
        
- <b>Protein</b> (unit: grams)
    - The DRI (Dietary Reference Intake) is 0.8 grams of protein per kilogram of body weight, or 0.36 grams per pound.
    - This amounts to:
        - 56 grams per day for the average sedentary man.
        - 46 grams per day for the average sedentary woman.
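
The arithmetic behind these guidelines can be sketched in a few lines of Python. This is only an illustration of the figures quoted above, not a nutrition tool. The function names, the "moderate" label for values between 5% and 20%, the conversion of about 4 calories per gram of sugar, and the 70 kg body weight are assumptions made for this example.

```python
def classify_daily_value(percent_dv):
    """Label a %DV figure using the 5% / 20% rule of thumb quoted above."""
    if percent_dv <= 5:
        return "low"
    if percent_dv >= 20:
        return "high"
    return "moderate"  # assumed label for the in-between range


def max_fat_calories(daily_calories, fat_share=0.30):
    """Upper bound on calories from fat for a given daily calorie intake."""
    return daily_calories * fat_share


def sugar_calories_to_grams(sugar_calories, calories_per_gram=4):
    """Convert a sugar calorie budget to grams (assumes about 4 calories per gram)."""
    return sugar_calories / calories_per_gram


def protein_dri_grams(body_weight_kg, grams_per_kg=0.8):
    """Daily protein reference intake for a body weight given in kilograms."""
    return body_weight_kg * grams_per_kg


print(classify_daily_value(15))            # the 15% calcium example
print(max_fat_calories(2000))              # fat budget on a 2,000-calorie diet
print(sugar_calories_to_grams(150))        # AHA limit for men, in grams
print(round(protein_dri_grams(70), 1))     # assumed 70 kg body weight
```

The same helpers can later be applied to whole columns of the dataset, for example with `Series.apply`.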

### Getting Started
1. Download: [McDonald's dataset](https://github.com/wwcodemanila/WWCodeManila-ML.AI/blob/master/datasets/mcdonalds.csv)
2. Import the necessary libraries (`pandas`, `numpy`, `matplotlib.pyplot`)
3. Load the dataset 
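
A minimal sketch of steps 2 and 3 is shown here. It assumes the file was saved as `mcdonalds.csv` in the working directory (that path is an assumption). So that the snippet runs anywhere, it reads a tiny in-memory stand-in whose rows are invented for illustration.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the downloaded file so the sketch runs anywhere.
# The rows are invented for illustration, not real menu data.
sample_csv = io.StringIO(
    "Category,Item,Calories,Total Fat\n"
    "Breakfast,Sample Item A,300,12.0\n"
    "Beverages,Sample Item B,150,0.0\n"
)

# With the real file (assumed path): df = pd.read_csv("mcdonalds.csv")
df = pd.read_csv(sample_csv)
print(df.shape)  # (number of rows, number of columns)
```

`pd.read_csv` accepts either a file path or a file-like object, which is why the same call works for both the stand-in and the real file.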

### Warmup
- How many menu items are there in the dataset?
- What are the columns in the dataset?
- What are the datatypes of each column?
- What are the first 3 menu items in the dataset?
- What are the last 3 menu items in the dataset?
- Print a summary of the data (count, mean, min, max, etc.)
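
If you get stuck, the calls below are one way to answer each warmup question. The sketch uses a toy DataFrame with invented values in place of the real menu, so try the same calls on your own `df`.

```python
import pandas as pd

# Toy stand-in for the menu data; values are invented for illustration.
df = pd.DataFrame({
    "Category": ["Breakfast", "Breakfast", "Beverages", "Desserts"],
    "Item": ["Sample A", "Sample B", "Sample C", "Sample D"],
    "Calories": [300, 450, 150, 250],
})

print(len(df))         # number of rows, i.e. menu items
print(df.columns)      # column labels
print(df.dtypes)       # data type of each column
print(df.head(3))      # first 3 rows
print(df.tail(3))      # last 3 rows
print(df.describe())   # count, mean, std, min, quartiles, max of numeric columns
```

Note that `describe()` only summarizes numeric columns by default, so text columns such as 'Item' are left out.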

### Cleaning the Data
1. Right now, the dataset is indexed by integers starting at 0 (the default index). You can confirm this by printing `df.index`. Suppose we want to index the menu items by 'Item' instead of by number. Set the index to the 'Item' column. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html)
2. Check the dataset for any null values. Fill them in if any exist. [Hint](https://stackoverflow.com/questions/29530232/python-pandas-check-if-any-value-is-nan-in-dataframe)
3. Clean up the column names using one of these options:
   - Option 1: Remove all spaces in the column names (e.g. 'Total Fat (% Daily Value)' becomes 'TotalFat(%DailyValue)'). [Hint](https://stackoverflow.com/questions/30763351/removing-space-in-dataframe-python)
   - Option 2: Rename all column names with spaces to something shorter (e.g. rename 'Total Fat (% Daily Value)' to 'TotalFat%DV'). [Hint](https://stackoverflow.com/questions/11346283/renaming-columns-in-pandas)
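
The cleaning steps above can be sketched on a toy DataFrame as follows. The values are invented, and filling missing values with 0 is only a placeholder choice for this example; the right fill value depends on the column.

```python
import numpy as np
import pandas as pd

# Toy stand-in with one missing value; values are invented for illustration.
df = pd.DataFrame({
    "Item": ["Sample A", "Sample B", "Sample C"],
    "Calories": [300, np.nan, 150],
    "Total Fat (% Daily Value)": [20, 35, 0],
})

# 1. Use the 'Item' column as the index instead of the default 0, 1, 2, ...
df = df.set_index("Item")

# 2. Count missing values per column, then fill them.
#    Filling with 0 is a placeholder choice, not a recommendation.
print(df.isnull().sum())
df = df.fillna(0)

# 3, option 1: strip every space from the column names.
no_spaces = df.copy()
no_spaces.columns = no_spaces.columns.str.replace(" ", "")

# 3, option 2: rename selected columns to shorter names.
renamed = df.rename(columns={"Total Fat (% Daily Value)": "TotalFat%DV"})

print(no_spaces.columns.tolist())
print(renamed.columns.tolist())
```

`set_index`, `fillna`, and `rename` all return new DataFrames by default, which is why the results are assigned back to a variable.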

### Basic Selection
1. Retrieve the row data of the first menu item in the dataset by <b>integer-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)
2. Retrieve the row data of the first menu item in the dataset by <b>label-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) 
3. Retrieve the row data of the <b>20th</b> menu item in the dataset by <b>integer-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)
4. Retrieve the row data of <b>Sausage, Egg & Cheese McGriddles with Egg Whites</b> in the dataset by <b>label-location</b> based indexing. [Hint](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) 
5. Select only rows <b>21 - 23</b>, and columns <b>0 - 5</b>. 
6. Retrieve the row data of your favorite menu item at McDonald's.
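
As a rough guide to the difference between `iloc` and `loc`, here is a sketch on a toy DataFrame indexed by 'Item'. The item names and values are invented, and the row and column ranges are smaller than the ones in the exercise.

```python
import pandas as pd

# Toy stand-in indexed by item name; values are invented for illustration.
df = pd.DataFrame(
    {
        "Category": ["Breakfast", "Breakfast", "Beverages", "Desserts"],
        "Calories": [300, 450, 150, 250],
        "Total Fat": [12.0, 28.0, 0.0, 9.0],
    },
    index=pd.Index(["Sample A", "Sample B", "Sample C", "Sample D"], name="Item"),
)

first_by_position = df.iloc[0]        # first row, by integer position
first_by_label = df.loc["Sample A"]   # same row, by index label

# Positions are zero-based, so the Nth item sits at iloc[N - 1].
third_item = df.iloc[2]

# iloc slices exclude the end position: this takes rows 1-2 and columns 0-1.
block = df.iloc[1:3, 0:2]
print(block)
```

Keep in mind that `iloc` slices leave out the end position, while `loc` slices by label include both endpoints.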

### Selection and Counting
Note: For the following items, refer to [this tutorial](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb).

1. Select just the 'Calories' column of the first 5 items. [Hint](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb#2.2-Selecting-columns-and-rows)
2. Select just the 'Calories' and 'Total Fat' columns of the first 5 items. [Hint](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb#2.3-Selecting-multiple-columns)
3. What are the unique categories under column 'Category'? [Hint](https://chrisalbon.com/python/pandas_list_unique_values_in_column.html) 
4. How many items belong to each Category? [Hint](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb#2.4-What's-the-most-common-complaint-type?)
5. What's the most common Category?
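
One possible approach to these five questions, sketched on a toy DataFrame with invented values, is shown here. The variable names are chosen for this example only.

```python
import pandas as pd

# Toy stand-in for the menu data; values are invented for illustration.
df = pd.DataFrame({
    "Category": ["Breakfast", "Breakfast", "Beverages",
                 "Desserts", "Breakfast", "Beverages"],
    "Calories": [300, 450, 150, 250, 400, 0],
    "Total Fat": [12.0, 28.0, 0.0, 9.0, 20.0, 0.0],
})

calories_top5 = df["Calories"][:5]                      # one column, first 5 rows
two_columns_top5 = df[["Calories", "Total Fat"]][:5]    # two columns, first 5 rows
categories = df["Category"].unique()                    # distinct category names
counts = df["Category"].value_counts()                  # items per category
most_common = counts.idxmax()                           # category with the highest count

print(categories)
print(counts)
print(most_common)
```

Selecting several columns needs a list inside the brackets, hence the double square brackets in `df[["Calories", "Total Fat"]]`.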

<b>Challenge: </b> Plot a bar graph of the counts from item 4.
- Label the axes properly [Hint](https://stackoverflow.com/questions/42223587/plt-scatter-how-to-add-title-and-xlabel-and-ylabel)
- Make sure to call `plt.show()` if you're using Jupyter Notebook.
- Plus points if you can order the elements in the bar plot in descending order (i.e. highest first) [Hint1](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html) [Hint2](https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean)
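
A minimal sketch of the challenge on toy data follows. The values, the plot title, and the axis label text are assumptions for this example. The `matplotlib.use("Agg")` line is only there so the snippet also runs as a plain script without a display; leave it out in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for plain scripts; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the menu data; values are invented for illustration.
df = pd.DataFrame({
    "Category": ["Breakfast", "Breakfast", "Beverages",
                 "Desserts", "Breakfast", "Beverages"],
})

# Count items per category and order the bars from highest to lowest.
counts = df["Category"].value_counts().sort_values(ascending=False)

ax = counts.plot(kind="bar")
ax.set_title("Menu items per category")
ax.set_xlabel("Category")
ax.set_ylabel("Number of items")
plt.tight_layout()
plt.show()
```

`Series.plot` returns a matplotlib `Axes` object, so the labels can be set on `ax` directly instead of through `plt.xlabel` and `plt.ylabel`.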

### Congrats on making it this far! [Click here for Part 2.](https://github.com/wwcodemanila/WWCodeManila-ML.AI/blob/master/exercises/pokemon_mcdonalds_part2.ipynb)
- P.S. No more hints for Part 2, good luck! 