# NY Times Food Best Recipes

This notebook describes the steps I took to visualize the data on the New York Times cooking website. My intention was to analyze which chefs are the most prolific and which recipes are the most popular on the website.

Note that the CSV file is excluded from the repository because it contains recipes that are protected behind a paywall. If you want to check these recipes out, please subscribe to the NY Times and support the wonderful chefs.

In [45]:
import pandas as pd
import re
import plotly.express as px


# Read the CSV file created by NYT Recipe Importer.py; the first column is the
# saved row index ("Unnamed: 0"), so drop it
recipe_information = pd.read_csv('recipe_information.csv').iloc[:, 1:]
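The `.iloc[:, 1:]` slice is needed because the CSV's first column is the row index that `DataFrame.to_csv` writes by default, which `read_csv` labels `"Unnamed: 0"`. The sketch below uses two made-up rows in an in-memory CSV (not the real file) to show that passing `index_col=0` gives the same columns while keeping the saved index. If you control the importer script, writing the file with `to_csv(..., index=False)` would avoid the extra column altogether.

```python
import io

import pandas as pd

# Toy stand-in for recipe_information.csv (hypothetical rows). The leading empty
# header field is the unnamed index column that to_csv writes by default.
csv_text = """,Recipe Name,Recipe Author,Recipe Rating,Recipe Review Count
0,Easiest Lentil Soup Recipe - NYT Cooking,Melissa Clark,4,895
1,Parmesan Broth Recipe - NYT Cooking,Julia Sherman,4,71
"""

# Option 1: read everything, then drop the first column by position.
by_position = pd.read_csv(io.StringIO(csv_text)).iloc[:, 1:]

# Option 2: tell read_csv that the first column is the index.
by_index_col = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(by_position.columns))
print(list(by_index_col.columns))
```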

The following shows the first five rows of the DataFrame, excluding the tags, ingredients, and instructions columns.

In [46]:
display(recipe_information.iloc[:, 0:4].head())
recipe_information.info()

|   | Recipe Name | Recipe Author | Recipe Rating | Recipe Review Count |
|---|---|---|---|---|
| 0 | Mushroom-Farro Soup With Parmesan Broth Recipe... | Julia Sherman | 4 | 108 |
| 1 | Beans and Garlic Toast in Broth Recipe - NYT C... | Tejal Rao | 4 | 488 |
| 2 | Easiest Lentil Soup Recipe - NYT Cooking | Melissa Clark | 4 | 895 |
| 3 | Parmesan Broth Recipe - NYT Cooking | Julia Sherman | 4 | 71 |
| 4 | Potato Gratin With Swiss Chard and Sumac Onion... | Yotam Ottolenghi | 4 | 34 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20194 entries, 0 to 20193
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Recipe Name          20194 non-null  object
 1   Recipe Author        20194 non-null  object
 2   Recipe Rating        20194 non-null  int64 
 3   Recipe Review Count  20194 non-null  int64 
 4   Recipe Tags          20194 non-null  object
 5   Recipe Ingredients   20194 non-null  object
 6   Recipe instructions  20194 non-null  object
dtypes: int64(2), object(5)
memory usage: 1.1+ MB


The `info()` output shows that the rating and review count columns were already parsed as `int64`; converting them with `pd.to_numeric` is a safeguard in case they are ever read in as objects (strings) rather than integers. In addition, I remove the " Recipe - NYT Cooking" suffix from the recipe titles to clean up the display.

I then check that the changes were applied.
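Here the columns already arrive as integers, but if a future scrape returned them as text, `pd.to_numeric` would do the conversion. The sketch below uses hypothetical string values, including a thousands separator and an empty cell that are assumptions, not values seen in this dataset, to show what to watch for: commas must be stripped first, and `errors="coerce"` turns anything still unparseable into `NaN` instead of raising.

```python
import pandas as pd

# Hypothetical example: ratings and review counts scraped as text (dtype object).
raw = pd.DataFrame({
    "Recipe Rating": ["4", "5", "4"],
    "Recipe Review Count": ["108", "1,024", ""],
})

clean = raw.copy()
# Clean integer strings convert directly to int64.
clean["Recipe Rating"] = pd.to_numeric(clean["Recipe Rating"])
# Strip thousands separators, then coerce unparseable values (the empty string) to NaN.
# Because of the NaN, the resulting column is float64 rather than int64.
clean["Recipe Review Count"] = pd.to_numeric(
    clean["Recipe Review Count"].str.replace(",", "", regex=False), errors="coerce"
)
print(clean.dtypes)
```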

In [47]:
# Clean up recipe names: keep the text before the last " Recipe" (titles without " Recipe" become NaN)
recipe_information['Recipe Name'] = recipe_information['Recipe Name'].str.extract(r'(.*)\sRecipe', expand=False)

# Convert rating and review count columns to integers
recipe_information['Recipe Rating'] = pd.to_numeric(recipe_information['Recipe Rating'])
recipe_information['Recipe Review Count'] = pd.to_numeric(recipe_information['Recipe Review Count'])

display(recipe_information.iloc[:, 0:4].head())
recipe_information.info()

|   | Recipe Name | Recipe Author | Recipe Rating | Recipe Review Count |
|---|---|---|---|---|
| 0 | Mushroom-Farro Soup With Parmesan Broth | Julia Sherman | 4 | 108 |
| 1 | Beans and Garlic Toast in Broth | Tejal Rao | 4 | 488 |
| 2 | Easiest Lentil Soup | Melissa Clark | 4 | 895 |
| 3 | Parmesan Broth | Julia Sherman | 4 | 71 |
| 4 | Potato Gratin With Swiss Chard and Sumac Onions | Yotam Ottolenghi | 4 | 34 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20194 entries, 0 to 20193
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Recipe Name          20185 non-null  object
 1   Recipe Author        20194 non-null  object
 2   Recipe Rating        20194 non-null  int64 
 3   Recipe Review Count  20194 non-null  int64 
 4   Recipe Tags          20194 non-null  object
 5   Recipe Ingredients   20194 non-null  object
 6   Recipe instructions  20194 non-null  object
dtypes: int64(2), object(5)
memory usage: 1.1+ MB


To create the visualizations, I build two summaries grouped by author: the number of recipes per author, and the number of recipes together with the average rating per author.

In [64]:
# Recipes per author; size() counts every row, including rows whose recipe name is NaN
recipe_information_count = recipe_information.groupby('Recipe Author').size().reset_index(name='Recipes')
# Recipe count (non-null names only) and average rating per author
recipe_by_author = recipe_information.groupby('Recipe Author').agg({'Recipe Name': 'count', 'Recipe Rating': 'mean'}).reset_index().rename(columns={'Recipe Name': 'Count', 'Recipe Rating': 'Avg. Rating'})
print(recipe_information_count.head())
print(recipe_by_author)

          Recipe Author  Recipes
0      Aaron Hutcherson        3
1          Abdul Tabini        1
2         Abigail Gullo        1
3  Abigail Gullo, Sobou        1
4        Adam Nagourney        4
            Recipe Author  Count  Avg. Rating
0        Aaron Hutcherson      3     4.000000
1            Abdul Tabini      1     5.000000
2           Abigail Gullo      1     5.000000
3    Abigail Gullo, Sobou      1     5.000000
4          Adam Nagourney      4     4.500000
..                    ...    ...          ...
529      Yanick Rice Lamb      4     2.000000
530      Yewande Komolafe     25     4.040000
531           Yossy Arefi     35     4.457143
532      Yotam Ottolenghi     54     4.018519
533       Zarela Martinez      2     0.000000

[534 rows x 3 columns]
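The dict-then-rename pattern can also be written with pandas named aggregation, which names the output columns directly. The toy rows below are hypothetical; they also show a subtle point for this dataset: `'count'` skips `NaN` recipe names while `'size'` counts every row, so the two per-author tables can differ slightly for authors whose titles failed the cleanup regex.

```python
import pandas as pd

# Toy data (hypothetical), shaped like recipe_information.
df = pd.DataFrame({
    "Recipe Name": ["A", "B", "C", None],
    "Recipe Author": ["Alice", "Alice", "Bob", "Bob"],
    "Recipe Rating": [5, 4, 3, 5],
})

# Named aggregation: output column name -> (input column, aggregation).
# Dict unpacking allows output names with spaces or dots.
by_author = (
    df.groupby("Recipe Author")
      .agg(**{
          "Count": ("Recipe Name", "count"),   # non-null names only
          "Recipes": ("Recipe Name", "size"),  # every row, NaN included
          "Avg. Rating": ("Recipe Rating", "mean"),
      })
      .reset_index()
)
print(by_author)
```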


We are going to use Plotly to create a pie chart showing the share of recipes written by each author. Authors who have published fewer than 200 recipes are grouped together as "Other Authors".
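`px.pie` merges rows that share the same `names` value into a single slice, so relabeling small authors is enough for the chart. Doing the merge explicitly, as in this sketch with hypothetical author counts, makes the numbers behind each slice visible; working on a copy leaves the per-author table unchanged for later cells.

```python
import pandas as pd

# Hypothetical per-author counts, shaped like recipe_by_author.
counts = pd.DataFrame({
    "Recipe Author": ["Author A", "Author B", "Author C", "Author D"],
    "Count": [450, 220, 35, 3],
})

THRESHOLD = 200  # authors below this many recipes are merged into one slice

pie_data = counts.copy()  # modify a copy, not the original table
pie_data.loc[pie_data["Count"] < THRESHOLD, "Recipe Author"] = "Other Authors"
# Sum the relabeled rows so each slice appears exactly once.
pie_data = pie_data.groupby("Recipe Author", as_index=False)["Count"].sum()
pie_data["Share (%)"] = 100 * pie_data["Count"] / pie_data["Count"].sum()
print(pie_data)
```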

In [66]:
recipe_by_author_pie = recipe_by_author.copy()  # copy so that recipe_by_author itself is not modified
recipe_by_author_pie.loc[recipe_by_author_pie['Count'] < 200, 'Recipe Author'] = 'Other Authors'  # show only the largest authors individually
fig_pie = px.pie(recipe_by_author_pie, values='Count', names='Recipe Author', title='Percentage of NYT Recipes Written by Each Author')
fig_pie.show()

The pie chart shows the most prolific authors on the website: the top 8 authors account for more than 50% of NYT's entire recipe database. Outside of these power users, let's see how recipe creation breaks down among the remaining authors, whose recipes make up the 29.9% "Other Authors" slice, by looking at a histogram of their recipe counts.

In [36]:
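# Authors with fewer than 200 recipes, i.e. the ones grouped into "Other Authors" above
fig_histogram = px.histogram(recipe_information_count.loc[recipe_information_count['Recipes'] < 200, :], x='Recipes', title='Histogram of Recipes per Author (Authors With Fewer Than 200 Recipes)')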
fig_histogram.show()

The histogram shows that, outside of a few power contributors, most contributors have only a few recipes on the website.
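A couple of summary numbers can back up what the histogram shows. This sketch uses hypothetical recipes-per-author values shaped like `recipe_information_count['Recipes']`; the cutoff of 5 recipes is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical recipes-per-author values, shaped like recipe_information_count["Recipes"].
recipes_per_author = pd.Series([1, 1, 1, 2, 3, 4, 12, 35, 54, 310])

median_recipes = recipes_per_author.median()
# Mean of a boolean Series = fraction of authors with at most 5 recipes.
share_small = (recipes_per_author <= 5).mean()
print(median_recipes, share_small)
```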

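Next, let's create a scatterplot of the number of recipes vs. the average rating for each author. I exclude recipes with zero reviews: an unreviewed recipe appears to be stored with a rating of 0 (Zarela Martinez's average rating above is 0.0), which would drag the averages down.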
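The effect of that filter can be seen on a single toy author. This is a sketch assuming, as the 0.0 average above suggests, that recipes with no reviews carry a rating of 0; the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical recipes for one author; the unreviewed recipe is stored with rating 0.
recipes = pd.DataFrame({
    "Recipe Author": ["Author A"] * 4,
    "Recipe Rating": [5, 4, 5, 0],
    "Recipe Review Count": [120, 45, 8, 0],
})

# Average over all recipes, including the placeholder 0 rating.
avg_all = recipes.groupby("Recipe Author")["Recipe Rating"].mean()

# Average over reviewed recipes only, as in the scatterplot cell.
reviewed = recipes.loc[recipes["Recipe Review Count"] > 0]
avg_reviewed = reviewed.groupby("Recipe Author")["Recipe Rating"].mean()

print(avg_all["Author A"], avg_reviewed["Author A"])
```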

In [91]:
recipe_information_scatter = recipe_information.loc[recipe_information['Recipe Review Count'] > 0, :]
recipe_by_author_scatter = recipe_information_scatter.groupby('Recipe Author').agg({'Recipe Name':'count', 'Recipe Rating': 'mean'}).reset_index().rename(columns={'Recipe Name':'Count', 'Recipe Rating':'Avg. Rating'})
fig_scatter = px.scatter(recipe_by_author_scatter, x='Count', y='Avg. Rating', hover_name = 'Recipe Author')
fig_scatter.show()

Now I want to see what the most popular recipes on the site are. I'm going to sort first by rating (i.e., 5-star recipes first) and then by the number of reviews.
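Sorting by two columns is lexicographic: the review count only breaks ties within the same rating, so a 5-star recipe with a handful of reviews ranks above a 4-star recipe with thousands. A small check with hypothetical rows:

```python
import pandas as pd

# Hypothetical recipes.
recipes = pd.DataFrame({
    "Recipe Name": ["Stew", "Soup", "Bread", "Salad"],
    "Recipe Rating": [4, 5, 5, 4],
    "Recipe Review Count": [9000, 12, 800, 50],
})

# Same call pattern as the notebook: rating first, then review count, both descending.
ranked = recipes.sort_values(["Recipe Rating", "Recipe Review Count"], ascending=False)
print(ranked["Recipe Name"].tolist())
```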

In [29]:
sorted_recipe_information = recipe_information.sort_values(['Recipe Rating', 'Recipe Review Count'], ascending=False)

display(sorted_recipe_information.iloc[:, 0:4].head(50))

|   | Recipe Name | Recipe Author | Recipe Rating | Recipe Review Count |
|---|---|---|---|---|
| 1056 | Spiced Chickpea Stew With Coconut and Turmeric | Alison Roman | 5 | 10503 |
| 7662 | Red Lentil Soup With Lemon | Melissa Clark | 5 | 10454 |
| 8037 | Creamy Macaroni and Cheese | Julia Moskin | 5 | 9152 |
| 3753 | No-Knead Bread | Mark Bittman | 5 | 8162 |
| 2590 | Oven-Roasted Chicken Shawarma | Sam Sifton | 5 | 7368 |
| 2055 | Roasted Chicken Provençal | Sam Sifton | 5 | 7364 |
| 4239 | Marcella Hazan’s Bolognese Sauce | The New York Times | 5 | 6636 |
| 8548 | Old-Fashioned Beef Stew | Molly O'Neill | 5 | 6368 |
| 2591 | Shakshuka With Feta | Melissa Clark | 5 | 6055 |
| 3085 | Roasted Salmon Glazed With Brown Sugar and Mus... | Sam Sifton | 5 | 5590 |


Great job, Alison! I can confirm that your Pork Noodle Soup is quite delicious!

Now let's look at which authors account for the largest share of 5-star recipes.
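"Percentage of 5-star recipes" can mean two different things: each author's share of all 5-star recipes (what the pie chart shows) or the fraction of an author's own recipes that are rated 5 stars. This sketch computes both on hypothetical rows; the second measure is an alternative reading, not something the notebook plots.

```python
import pandas as pd

# Hypothetical recipes.
recipes = pd.DataFrame({
    "Recipe Author": ["A", "A", "A", "B", "B", "C"],
    "Recipe Rating": [5, 5, 4, 5, 3, 5],
})

five_star_authors = recipes.loc[recipes["Recipe Rating"] == 5, "Recipe Author"]

# Share of all 5-star recipes written by each author.
share_of_five_star = five_star_authors.value_counts(normalize=True)

# Fraction of each author's own recipes that are 5-star.
five_star_rate = recipes["Recipe Rating"].eq(5).groupby(recipes["Recipe Author"]).mean()

print(share_of_five_star)
print(five_star_rate)
```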

In [33]:
# Keep only 5-star recipes, then count them per author
recipe_information_five_star = recipe_information.loc[recipe_information['Recipe Rating'] == 5]
recipe_information_five_star_count = recipe_information_five_star.groupby('Recipe Author').size().reset_index(name='Recipes')
recipe_information_five_star_count.loc[recipe_information_five_star_count['Recipes'] < 80, 'Recipe Author'] = 'Other Authors' # Represent only large authors

fig_pie_five_star = px.pie(recipe_information_five_star_count, values='Recipes', names='Recipe Author', title='Percentage of NYT 5-Star Recipes Written by Each Author')
fig_pie_five_star.show()