# NY Times Food Best Recipes

This notebook describes the steps I took to visualize the data on the New York Times cooking website. My intention was to analyze which chefs are the most prolific and which recipes are the most popular on the website.

In [27]:
import pandas as pd
import re
import plotly.express as px

# Read the CSV file created by NYT Recipe Importer.py and drop its first column (the saved index)
recipe_information = pd.read_csv('recipe_information.csv').iloc[:, 1:]
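
The `.iloc` slice above drops the first column of the file. As a minimal sketch, assuming that column is the index written out when the CSV was saved, passing `index_col=0` to `pd.read_csv` gives the same columns. The in-memory CSV text below is a made-up stand-in for `recipe_information.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for recipe_information.csv (assumed layout: unnamed index column first)
csv_text = (
    ",Recipe Name,Recipe Author,Recipe Rating,Recipe Review Count\n"
    "0,Easiest Lentil Soup,Melissa Clark,4,888\n"
    "1,Parmesan Broth,Julia Sherman,4,63\n"
)

# Option 1: read every column, then slice off the first one
sliced = pd.read_csv(io.StringIO(csv_text)).iloc[:, 1:]

# Option 2: tell pandas that the first column is the index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Both approaches leave the same four data columns
print(list(sliced.columns) == list(indexed.columns))
```

The second form states the intent directly and does not break if a column is later added at the front of the file.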

The following is a snapshot of the DataFrame:

In [44]:
display(recipe_information.head())
recipe_information.info()

Unnamed: 0,Recipe Name,Recipe Author,Recipe Rating,Recipe Review Count
0,Mushroom-Farro Soup With Parmesan Broth,Julia Sherman,4,101
1,Beans and Garlic Toast in Broth,Tejal Rao,4,482
2,Easiest Lentil Soup,Melissa Clark,4,888
3,Parmesan Broth,Julia Sherman,4,63
4,Potato Gratin With Swiss Chard and Sumac Onions,Yotam Ottolenghi,4,31


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 432 entries, 0 to 431
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Recipe Name          432 non-null    object
 1   Recipe Author        432 non-null    object
 2   Recipe Rating        432 non-null    int64 
 3   Recipe Review Count  432 non-null    int64 
dtypes: int64(2), object(2)
memory usage: 13.6+ KB


None

The rating and review count columns should hold integers. If they were read in as objects, pd.to_numeric converts them; if they are already numeric, it leaves them unchanged. In addition, we want to strip the trailing site text (such as "- New York Times") from the recipe titles to clean up the display.

I then check that the changes were applied successfully.

In [45]:
# Clean up recipe name: keep the text before " Recipe".
# str.extract returns NaN where the pattern does not match, so fall back to the original name.
cleaned_names = recipe_information['Recipe Name'].str.extract(r'(.*)\sRecipe', expand=False)
recipe_information['Recipe Name'] = cleaned_names.fillna(recipe_information['Recipe Name'])

# Convert rating and review count columns to integers
recipe_information['Recipe Rating'] = pd.to_numeric(recipe_information['Recipe Rating'])
recipe_information['Recipe Review Count'] = pd.to_numeric(recipe_information['Recipe Review Count'])

display(recipe_information.head())
recipe_information.info()

Unnamed: 0,Recipe Name,Recipe Author,Recipe Rating,Recipe Review Count
0,Mushroom-Farro Soup With Parmesan Broth,Julia Sherman,4,101
1,Beans and Garlic Toast in Broth,Tejal Rao,4,482
2,Easiest Lentil Soup,Melissa Clark,4,888
3,Parmesan Broth,Julia Sherman,4,63
4,Potato Gratin With Swiss Chard and Sumac Onions,Yotam Ottolenghi,4,31


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 432 entries, 0 to 431
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Recipe Name          432 non-null    object
 1   Recipe Author        432 non-null    object
 2   Recipe Rating        432 non-null    int64 
 3   Recipe Review Count  432 non-null    int64 
dtypes: int64(2), object(2)
memory usage: 13.6+ KB
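
The check described above can also be done programmatically instead of by eye. The following is a minimal sketch on a made-up two-row table (the names `toy` and `extracted` are illustrative, not from the notebook). It relies on the fact that `str.extract` returns NaN for rows where the pattern does not match, so a title without the " Recipe" suffix would be lost unless the original value is used as a fallback.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical two-row table standing in for recipe_information
toy = pd.DataFrame({
    'Recipe Name': ['Easiest Lentil Soup Recipe', 'Parmesan Broth'],
    'Recipe Rating': ['4', '4'],            # stored as strings on purpose
    'Recipe Review Count': ['888', '63'],
})

# Keep the text before " Recipe"; rows without the suffix come back as NaN,
# so fall back to the original name for those rows
extracted = toy['Recipe Name'].str.extract(r'(.*)\sRecipe', expand=False)
toy['Recipe Name'] = extracted.fillna(toy['Recipe Name'])

# Convert the numeric columns
for column in ['Recipe Rating', 'Recipe Review Count']:
    toy[column] = pd.to_numeric(toy[column])

# Checks: no name was lost and both numeric columns are integer-typed
assert toy['Recipe Name'].notna().all()
assert is_integer_dtype(toy['Recipe Rating'])
assert is_integer_dtype(toy['Recipe Review Count'])
print(toy['Recipe Name'].tolist())
```

Assertions like these fail loudly if the cell is re-run on data that has already been cleaned, which is easy to miss when only scanning the `info()` output.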


To visualize the data, we first create a DataFrame that holds the number of recipes by each author.

In [37]:
recipe_information_count = recipe_information.groupby('Recipe Author').size().reset_index(name='Recipes')
print(recipe_information_count.head())

     Recipe Author  Recipes
0     Alexa Weibel       32
1       Ali Slagle       38
2     Alison Roman       26
3  Angela Dimayuga       10
4     Becky Hughes        1


We are going to use Plotly to create a pie chart showing the proportion of recipes written by each author. Authors who have published fewer than 8 recipes are grouped together as "Other Authors".

In [38]:
recipe_information_count.loc[recipe_information_count['Recipes'] < 8, 'Recipe Author'] = 'Other Authors' # Represent only large authors
fig_pie = px.pie(recipe_information_count, values='Recipes', names='Recipe Author', title='Percentage of NYT Recipes Written by Each Author')
fig_pie.show()
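
The relabeling above leaves several rows that all carry the name "Other Authors"; Plotly's pie chart groups rows with the same name into one sector, so the chart still shows a single slice. If the table itself should contain one combined row (for example, to print the percentages), a sketch along these lines may help. The author names, counts, and the `collapsed` variable are made up for illustration.

```python
import pandas as pd

# Hypothetical per-author counts; the real table comes from the groupby step
counts = pd.DataFrame({
    'Recipe Author': ['Author A', 'Author B', 'Author C', 'Author D'],
    'Recipes': [30, 12, 5, 2],
})
threshold = 8

# Keep the author name at or above the threshold, otherwise use a shared label
label = counts['Recipe Author'].where(counts['Recipes'] >= threshold, 'Other Authors')

# Sum the counts per label so that 'Other Authors' becomes a single row
collapsed = counts.groupby(label, sort=False)['Recipes'].sum().reset_index()
collapsed['Share (%)'] = (100 * collapsed['Recipes'] / collapsed['Recipes'].sum()).round(1)
print(collapsed)
```

Working on a copy like this also keeps the original per-author table untouched for later steps.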

Based on the above graph, we can see that Melissa Clark, Ali Slagle, Alexa Weibel, and Alison Roman are the most prolific authors on the website. There also seems to be a large proportion of authors who contributed fewer than 8 recipes. Let's see what the breakdown of recipe creation is.

In [39]:
fig_histogram = px.histogram(recipe_information_count, x='Recipes', title='Histogram of Number of Recipes on Website')
fig_histogram.show()

Based on the above histogram, we can see that, outside of a few power contributors, the majority of contributors have only a few recipes on the website.
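
The impression from the histogram can be backed by a couple of summary numbers. This is a small sketch using made-up counts (the series `recipes_per_author` is hypothetical); on the real data, the same two lines would run on the `Recipes` column of the per-author table.

```python
import pandas as pd

# Hypothetical per-author recipe counts, skewed like the histogram suggests
recipes_per_author = pd.Series([38, 32, 26, 10, 3, 2, 1, 1, 1, 1], name='Recipes')

# Median is robust to the few very prolific authors
median_recipes = recipes_per_author.median()

# Mean of a boolean mask is the fraction of authors below the cutoff
share_small = (recipes_per_author < 8).mean()

print(f"Median recipes per author: {median_recipes}")
print(f"Authors with fewer than 8 recipes: {share_small:.0%}")
```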

Now I want to see what the most popular recipes on the site are. I'm going to sort first by rating (i.e., 5-star recipes first) and then by the number of reviews.

In [40]:
sorted_recipe_information = recipe_information.sort_values(['Recipe Rating', 'Recipe Review Count'], ascending=False)

display(sorted_recipe_information.head(50))

Unnamed: 0,Recipe Name,Recipe Author,Recipe Rating,Recipe Review Count
102,Caramelized Shallot Pasta,Alison Roman,5,3766
162,Spicy White Bean Stew With Broccoli Rabe,Alison Roman,5,3168
284,Thai-Inspired Chicken Meatball Soup,Ali Slagle,5,2598
151,Red Curry Lentils With Sweet Potatoes and Spinach,Lidey Heuck,5,2499
210,Via Carota’s Insalata Verde,Samin Nosrat,5,2375
414,Coconut Curry Chickpeas With Pumpkin and Lime,Melissa Clark,5,1483
112,Lemony Shrimp and Bean Stew,Sue Li,5,1311
128,Cheesy Baked Pasta With Sausage and Ricotta,Melissa Clark,5,1290
375,Coconut Milk Chicken Adobo,Angela Dimayuga,5,1175
159,Indian Butter Chickpeas,Melissa Clark,5,952


Great job, Alison! I can confirm that your Pork Noodle Soup is quite delicious!
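
To make the ranking rule concrete, here is a minimal sketch of the two-key sort on a made-up four-row table (the recipe names and numbers are placeholders). The rating is compared first, and the review count only breaks ties between recipes with the same rating.

```python
import pandas as pd

# Hypothetical recipes with placeholder names
toy = pd.DataFrame({
    'Recipe Name': ['A', 'B', 'C', 'D'],
    'Recipe Rating': [4, 5, 5, 4],
    'Recipe Review Count': [900, 100, 300, 50],
})

# Sort by rating first, then by review count, both descending
ranked = toy.sort_values(
    ['Recipe Rating', 'Recipe Review Count'],
    ascending=[False, False],
)
print(ranked['Recipe Name'].tolist())
```

One consequence of this design is that a 4-star recipe with 900 reviews (A) ranks below a 5-star recipe with 100 reviews (B), so the top of the list reflects rating before sheer popularity.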

Now let's take a look at which authors account for the largest share of 5-star recipes.

In [42]:
# Keep only the 5-star recipes (all columns), then count them per author
recipe_information_five_star = recipe_information.loc[recipe_information['Recipe Rating'] == 5]
recipe_information_five_star_count = recipe_information_five_star.groupby('Recipe Author').size().reset_index(name='Recipes')
recipe_information_five_star_count.loc[recipe_information_five_star_count['Recipes'] < 3, 'Recipe Author'] = 'Other Authors' # Group authors with fewer than 3 five-star recipes

fig_pie_five_star = px.pie(recipe_information_five_star_count, values='Recipes', names='Recipe Author', title='Percentage of NYT 5-Star Recipes Written by Each Author')
fig_pie_five_star.show()
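
The pie chart above shows each author's share of all 5-star recipes, which favors authors who publish a lot. A complementary view is the fraction of each author's own recipes that earned 5 stars. The sketch below computes both on made-up data; the author names and the `summary` table are hypothetical.

```python
import pandas as pd

# Hypothetical ratings for three placeholder authors
toy = pd.DataFrame({
    'Recipe Author': ['Author A', 'Author A', 'Author A', 'Author B', 'Author B', 'Author C'],
    'Recipe Rating': [5, 5, 4, 5, 4, 4],
})

# Flag 5-star recipes, then group the flag by author
grouped = toy.assign(five_star=toy['Recipe Rating'] == 5).groupby('Recipe Author')['five_star']

summary = pd.DataFrame({
    'Five-Star Recipes': grouped.sum(),   # number of 5-star recipes per author
    'Recipes': grouped.count(),           # total recipes per author
})

# Share of all 5-star recipes (what the pie chart shows)
summary['Share of Five-Star (%)'] = 100 * summary['Five-Star Recipes'] / summary['Five-Star Recipes'].sum()

# Fraction of the author's own recipes that are 5-star
summary['Five-Star Rate (%)'] = 100 * summary['Five-Star Recipes'] / summary['Recipes']
print(summary.round(1))
```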