<h1> Statistics on YummyCorp's nutritional values </h1>

Hello guys, suppose I'm an R&D employee at the Yummy Corporation. Our company is currently developing a new kind of tasty, wholesome cereal called CrunchieMunchies.

I was assigned the task of demonstrating to all of you, our potential customers, how healthy our new cereal is in comparison to other brands.

To do that, we have collected nutritional data on several competitors' cereals.

**First, to accomplish this task, we need to import the NumPy library.**

In [1]:
import numpy as np

The **cereal.csv** file contains the calorie amounts of different cereal brands. To use it, load the file into a variable named **calorie_stats**.

Notice that the numbers are separated from each other by the character **,**. So we'll pass the argument **delimiter = ','**.

In [2]:
calorie_stats = np.genfromtxt('cereal.csv', delimiter = ',')
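The cereal.csv file itself is not reproduced here, so the following is only a minimal sketch of what the **delimiter** argument does. It uses a small in-memory stand-in for the file; the sample text and the names `sample_file` and `sample_stats` are made up for illustration.

```python
import io

import numpy as np

# Hypothetical stand-in for cereal.csv: one line of comma-separated numbers.
sample_file = io.StringIO("70,120,70,50,110")

# delimiter=',' tells NumPy to split the text at each comma.
sample_stats = np.genfromtxt(sample_file, delimiter=',')

print(sample_stats)        # a 1-D array of floats
print(sample_stats.shape)  # (5,)
```

Each number is read as a float, which is why the values in the notebook are printed as 70. and 120. rather than 70 and 120.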

Take a look at what the data looks like.

In [4]:
print(calorie_stats)

[ 70. 120.  70.  50. 110. 110. 110. 130.  90.  90. 120. 110. 120. 110.
 110. 110. 100. 110. 110. 110. 100. 110. 100. 100. 110. 110. 100. 120.
 120. 110. 100. 110. 100. 110. 120. 120. 110. 110. 110. 140. 110. 100.
 110. 100. 150. 150. 160. 100. 120. 140.  90. 130. 120. 100.  50.  50.
 100. 100. 120. 100.  90. 110. 110.  80.  90.  90. 110. 110.  90. 110.
 140. 100. 110. 110. 100. 100. 110.]


It seems that our data is in no particular order.

Use the **np.sort** command to put the data in ascending order, which makes it easier to read.
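As a small sketch of how **np.sort** behaves, the example below sorts a made-up array (the values and the names `sample`, `sample_sorted` and `sample_desc` are assumptions for illustration). It shows that np.sort returns a new array and leaves the original untouched, and that reversing the result gives descending order.

```python
import numpy as np

# Made-up values, not taken from cereal.csv.
sample = np.array([110., 50., 90., 70.])

# np.sort returns a sorted copy; the original array keeps its order.
sample_sorted = np.sort(sample)

# Reversing the sorted copy gives descending order.
sample_desc = sample_sorted[::-1]

print(sample)         # original order is preserved
print(sample_sorted)  # ascending
print(sample_desc)    # descending
```

Keeping the original array unchanged is handy here, because later steps can still use the data in the order it was loaded.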

In [5]:
calorie_stats_sorted = np.sort(calorie_stats)
print(calorie_stats_sorted)

[ 50.  50.  50.  70.  70.  80.  90.  90.  90.  90.  90.  90.  90. 100.
 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100.
 100. 100. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110.
 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110.
 110. 110. 110. 120. 120. 120. 120. 120. 120. 120. 120. 120. 120. 130.
 130. 140. 140. 140. 150. 150. 160.]


So now we have our first information about what we collected from other brands.

The *minimum calorie amount* is **50.0**, while *the highest amount* is **160.0**.

As for our _CrunchieMunchies_, it has __60 calories__ per serving.

Hmmm, do you wonder whether this cereal has more or fewer calories than the average cereal? The **np.mean** command will help us figure it out. To make the number shorter, use the **round** command to round it to 2 decimal places.

In [11]:
average_calories = np.mean(calorie_stats)
print(round(average_calories, 2))

106.88


In [12]:
less_calories = average_calories - 60.
print(round(less_calories,2))

46.88


Wow, such a significant number!

Our new cereal looks much healthier than the others: other brands have __106.88__ calories on average, so ours has **46.88** fewer calories per serving than the average competitor.
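The two numbers above can be checked by hand. The sketch below assumes the dataset can be rebuilt from the sorted output shown earlier (each calorie value repeated as many times as it appears there), since cereal.csv is not available here. The names `values`, `counts`, `rebuilt_stats` and `manual_mean` are illustrative.

```python
import numpy as np

# Calorie values and how often each appears in the sorted output above.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
rebuilt_stats = np.repeat(np.array(values, dtype=float), counts)

# The mean is simply the sum of all values divided by how many there are.
manual_mean = rebuilt_stats.sum() / rebuilt_stats.size

print(rebuilt_stats.size)          # 77
print(round(manual_mean, 2))       # 106.88
print(round(manual_mean - 60, 2))  # 46.88
```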

But hold on, the __median__ can be a better representative of the dataset, because it is less affected by extreme values than the mean. The _median_ splits our data into 2 equal halves: 50% of the data is at or above the _median_ and 50% is at or below it.
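To make the idea of "splitting the data in half" concrete, here is a sketch of locating a median by hand. It assumes the dataset is rebuilt from the sorted output shown earlier, and the names `rebuilt_sorted`, `middle_index` and `manual_median` are illustrative. With an odd number of values, the median is just the middle element of the sorted array.

```python
import numpy as np

# Rebuild the data from the value counts visible in the sorted output.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
rebuilt_sorted = np.sort(np.repeat(np.array(values, dtype=float), counts))

# 77 values is an odd count, so there is exactly one middle element.
middle_index = rebuilt_sorted.size // 2
manual_median = rebuilt_sorted[middle_index]

print(middle_index)   # 38
print(manual_median)  # 110.0
```

With an even number of values there is no single middle element, and np.median averages the two middle values instead.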

In [14]:
median_calories = np.median(calorie_stats)
print(median_calories)

110.0


Do you see what I want to show you?

At least half of the cereals in the dataset have far more calories than our __CrunchieMunchies__. It seems to be the best choice of food for anyone on a diet.

But it will be more impressive if I show you that a significant portion of the competition has more calories than CrunchieMunchies.

So we need to find the lowest _nth percentile_ of the dataset whose value is still greater than 60 calories. That tells us roughly what percentage of competitors have fewer calories than CrunchieMunchies, and therefore what percentage have more.

In [15]:
print(np.percentile(calorie_stats, 20)) #the point at which 20% of data is lower

100.0


That is much higher than the 60 calories of CrunchieMunchies. Let's cut the percentile in half.

In [16]:
print(np.percentile(calorie_stats, 10)) #the point at which 10% of data is lower

90.0


Still high, cut more.

In [17]:
print(np.percentile(calorie_stats, 5)) #the point at which 5% of data is lower

70.0


Yay, close to 60. But we need to keep cutting to find the lowest percentile whose value is still greater than our cereal's 60 calories.

In [18]:
print(np.percentile(calorie_stats, 3)) #the point at which 3% of data is lower

55.599999999999994


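Thank God, we found a point that is below our calorie amount: the 3rd percentile is about 55.6 calories, which is less than 60. So the percentile we want must be just above 3.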
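The value 55.6 does not appear anywhere in the data, so it is worth sketching where it comes from. By default, np.percentile interpolates linearly between the two nearest sorted values. The sketch below assumes the dataset is rebuilt from the sorted output shown earlier; the names `position`, `lower`, `fraction` and `interpolated` are illustrative.

```python
import numpy as np

# Rebuild the data from the value counts visible in the sorted output.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
rebuilt_sorted = np.sort(np.repeat(np.array(values, dtype=float), counts))

# The 3rd percentile sits at a fractional index between two sorted values.
position = 0.03 * (rebuilt_sorted.size - 1)
lower = int(np.floor(position))
fraction = position - lower

# Move that fraction of the way from the lower neighbour to the upper one.
low_value = rebuilt_sorted[lower]
high_value = rebuilt_sorted[lower + 1]
interpolated = low_value + fraction * (high_value - low_value)

print(low_value, high_value)  # 50.0 70.0
print(np.isclose(interpolated, np.percentile(rebuilt_sorted, 3)))  # True
```

The long tail of digits in 55.599999999999994 is ordinary floating-point rounding, not a property of the data.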

In [19]:
print(np.percentile(calorie_stats, 4)) #the point at which 4% of data is lower

70.0


In [20]:
nth_percentile = 4

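It means that only about 4% of the other cereals have fewer calories than ours.

The __more_calories__ variable below holds the fraction of cereals that have more than 60 calories per serving. The __np.mean()__ function helps us find it: the comparison **calorie_stats > 60** produces an array of True/False values, and the mean of that array is the share of True values.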

In [22]:
more_calories = np.mean(calorie_stats > 60)
print(more_calories)

0.961038961038961


Such a high percentage! About 96% of the other cereals have more calories than ours.
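Taking the mean of a comparison works because True counts as 1 and False counts as 0. The sketch below spells that out step by step, assuming the dataset is rebuilt from the sorted output shown earlier. The names `is_more` and `fraction_more` are illustrative.

```python
import numpy as np

# Rebuild the data from the value counts visible in the sorted output.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
rebuilt_stats = np.repeat(np.array(values, dtype=float), counts)

# One True/False flag per cereal: does it have more than 60 calories?
is_more = rebuilt_stats > 60

# Summing the flags counts the True values.
print(is_more.sum())  # 74
print(is_more.size)   # 77

# Count divided by total is the same number np.mean returns.
fraction_more = is_more.sum() / is_more.size
print(np.isclose(fraction_more, np.mean(is_more)))  # True
```

So the result means 74 of the 77 cereals have more than 60 calories, and only the three 50-calorie cereals do not.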

Could we jump to the conclusion that our newly developed product is much healthier than the others?

One final question I have is about the variation in this dataset. We can measure it with the __standard deviation__, which is computed by the _np.std_ function.

In [24]:
calorie_std = np.std(calorie_stats)
print(round(calorie_std, 2))

19.36


The data typically deviates from _the average_ of **106.88** by about __19.36__ calories. This is a typical distance from the average, not a maximum: some cereals are much further away.
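As a sketch of what the standard deviation measures, the example below computes it by hand and counts how many cereals fall within one standard deviation of the average. It assumes the dataset is rebuilt from the sorted output shown earlier; the names `deviations`, `manual_std` and `within_one_std` are illustrative.

```python
import numpy as np

# Rebuild the data from the value counts visible in the sorted output.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
rebuilt_stats = np.repeat(np.array(values, dtype=float), counts)

# Distance of every cereal from the average.
deviations = rebuilt_stats - rebuilt_stats.mean()

# Standard deviation: square root of the mean squared distance.
# This matches np.std's default (population) formula.
manual_std = np.sqrt(np.mean(deviations ** 2))
print(round(manual_std, 2))  # 19.36

# How many cereals lie within one standard deviation of the average?
within_one_std = int(np.sum(np.abs(deviations) <= manual_std))
print(within_one_std)  # 63
```

That is 63 of the 77 cereals, so most of the data sits close to the average while a few cereals lie well outside that range.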

## Finally, we are honored to introduce our brand new cereal named __CrunchieMunchies__.