The Nobel Prize has been among the most prestigious international awards since 1901. Each year, prizes are awarded in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, each recipient also receives a gold medal bearing an image of Alfred Nobel (1833-1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore and answer several questions about this prizewinning data. We then encourage you to explore any further questions that interest you!

In [60]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Start coding here!

In [61]:
nobel = pd.read_csv('data/nobel.csv')
nobel.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


In [62]:
# Most commonly awarded gender and country (country of the laureate's affiliated organization)
print(nobel["sex"].value_counts())
print(nobel["organization_country"].value_counts())
top_gender = "Male"
top_country = "United States of America"

Male      905
Female     65
Name: sex, dtype: int64
United States of America               385
United Kingdom                          93
Germany                                 49
France                                  38
Switzerland                             24
Federal Republic of Germany             23
Sweden                                  18
Japan                                   18
Netherlands                             11
Union of Soviet Socialist Republics      9
Canada                                   9
Denmark                                  9
Italy                                    7
Austria                                  7
Belgium                                  5
Israel                                   5
Norway                                   5
Australia                                5
Russia                                   3
Hungary                                  2
Argentina                                2
Czechoslovakia                           1
Al
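
The two answers in the cell above are typed in by hand after reading the printed counts. A minimal sketch of reading them off programmatically, assuming a small made-up frame (`toy`, hypothetical) in place of `nobel`. Because `value_counts()` sorts by frequency in descending order, the first index label is the most common value:

```python
import pandas as pd

# Hypothetical stand-in for `nobel`; the values are illustrative only
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", None],
    "organization_country": ["United States of America", "Germany",
                             "United States of America", None, None],
})

# value_counts() drops missing values and sorts by count (descending),
# so the first index label is the most frequent value
top_gender = toy["sex"].value_counts().index[0]
top_country = toy["organization_country"].value_counts().index[0]

print(top_gender)
print(top_country)
```

Swapping `organization_country` for `birth_country` would rank countries by where laureates were born rather than where their organization is based.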

In [63]:
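# Decade with the highest ratio of US-born Nobel Prize winners to total winners
# Flag US-born laureates (a missing birth_country counts as 0)
nobel["us_born"] = (nobel["birth_country"] == "United States of America").astype(int)
# Floor each year to the start of its decade, e.g. 1987 -> 1980
nobel["decade"] = (nobel["year"] // 10) * 10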

grouped = nobel.groupby("decade")

def us_born_winner_ratio(group):
    return pd.Series({
        'us_born_winners': group['us_born'].sum(),
        'total_winners': group['laureate_id'].count()
    })

aggregated_nobel = grouped.apply(us_born_winner_ratio)
aggregated_nobel["us_ratio"] = (aggregated_nobel["us_born_winners"] / aggregated_nobel["total_winners"])

print(aggregated_nobel)

# The 2000s have the highest us_ratio (0.422764) in the printed table
max_decade_usa = 2000

        us_born_winners  total_winners  us_ratio
decade                                          
1900                  1             57  0.017544
1910                  3             40  0.075000
1920                  4             54  0.074074
1930                 14             56  0.250000
1940                 13             43  0.302326
1950                 21             72  0.291667
1960                 21             79  0.265823
1970                 33            104  0.317308
1980                 31             97  0.319588
1990                 42            104  0.403846
2000                 52            123  0.422764
2010                 38            121  0.314050
2020                 18             50  0.360000
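
The ratio per decade can also be computed without a custom aggregation function. The sketch below is one possible shortcut, not the notebook's method: it relies on the mean of a boolean column being the share of `True` values, and uses `idxmax()` to pick the decade instead of hard-coding it. The frame `toy` is made up for illustration, and the result matches the sum/count approach only if no rows are missing from the denominator:

```python
import pandas as pd

# Hypothetical stand-in for `nobel`; the values are illustrative only
toy = pd.DataFrame({
    "year": [1995, 1998, 2001, 2003, 2004, 2011],
    "birth_country": ["France", "United States of America",
                      "United States of America", "United States of America",
                      "Japan", None],
})

# Boolean flag; a missing birth_country compares as False
toy["us_born"] = toy["birth_country"] == "United States of America"
toy["decade"] = (toy["year"] // 10) * 10

# The mean of a boolean column is the share of True values in each group
us_ratio = toy.groupby("decade")["us_born"].mean()

# idxmax() returns the index label (the decade) with the largest ratio
max_decade_usa = int(us_ratio.idxmax())

print(us_ratio)
print(max_decade_usa)
```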


In [64]:
# Category and decade with the highest proportion of female laureates
nobel["is_woman"] = (nobel["sex"] == "Female").astype(int)

grouped_2 = nobel.groupby(["decade","category"])

def woman_ratio(group):
    return pd.Series({
        'women' : group['is_woman'].sum(),
        'total_winners': group['laureate_id'].count()
    })

aggregated = grouped_2.apply(woman_ratio)
aggregated["proportion_woman"] = (aggregated["women"] / aggregated["total_winners"])
max_female_prop = aggregated[aggregated["proportion_woman"] == aggregated["proportion_woman"].max()]
print(max_female_prop)

# Decade -> category with the highest proportion of female laureates
max_female_dict = {2020: "Literature"}

                   women  total_winners  proportion_woman
decade category                                          
2020   Literature      2              4               0.5
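
The dictionary above is filled in by hand from the printed row. A minimal sketch of building it directly, assuming a made-up frame (`toy`, hypothetical) that already has `decade`, `category`, and `sex` columns. On a Series with a two-level index, `idxmax()` returns a `(decade, category)` tuple:

```python
import pandas as pd

# Hypothetical stand-in for `nobel`; the values are illustrative only
toy = pd.DataFrame({
    "decade":   [2010, 2010, 2010, 2010, 2010,
                 2020, 2020, 2020, 2020, 2020, 2020],
    "category": ["Physics", "Physics", "Peace", "Peace", "Peace",
                 "Literature", "Literature",
                 "Physics", "Physics", "Physics", "Physics"],
    "sex":      ["Male", "Male", "Female", "Male", "Male",
                 "Female", "Male",
                 "Male", "Female", "Male", "Male"],
})

toy["is_woman"] = toy["sex"] == "Female"

# Share of female laureates for every (decade, category) pair
prop = toy.groupby(["decade", "category"])["is_woman"].mean()

# idxmax() on a MultiIndex Series returns the label tuple of the maximum
decade, category = prop.idxmax()
max_female_dict = {int(decade): category}

print(max_female_dict)
```

Note that `idxmax()` returns only the first maximum, whereas the boolean filter used in the cell above keeps every tied row.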


In [65]:
# first woman to receive a Nobel Prize and the category
first_woman_winner = nobel.loc[(nobel["sex"] == "Female")]
first_woman_winner = first_woman_winner.loc[first_woman_winner["year"] == first_woman_winner["year"].min()]
print(first_woman_winner["full_name"])
print(first_woman_winner["category"])

first_woman_name = "Marie Curie, née Sklodowska"
first_woman_category = "Physics"

19    Marie Curie, née Sklodowska
Name: full_name, dtype: object
19    Physics
Name: category, dtype: object
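
The name and category can also be pulled out as plain strings rather than copied from the printed Series. A minimal sketch, assuming a made-up frame (`toy`, with placeholder laureate names) in place of `nobel`:

```python
import pandas as pd

# Hypothetical stand-in for `nobel`; names and values are placeholders
toy = pd.DataFrame({
    "year": [1901, 1903, 1903, 1905],
    "category": ["Physics", "Physics", "Physics", "Peace"],
    "sex": ["Male", "Male", "Female", "Female"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
})

women = toy[toy["sex"] == "Female"]

# idxmin() gives the row label of the earliest year; .loc selects that row
first = women.loc[women["year"].idxmin()]

first_woman_name = first["full_name"]
first_woman_category = first["category"]

print(first_woman_name, first_woman_category)
```

If several women shared the earliest year, `idxmin()` would return only the first of them, while the year filter in the cell above would return all of them.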


In [66]:
# Individuals or organizations that have won more than one Nobel Prize
nobel_names = nobel.groupby('full_name')['year'].count().reset_index(name='count')

# Keep only names that appear at least twice
repeat_winners = nobel_names[nobel_names["count"] >= 2]

repeat_list = repeat_winners["full_name"].to_list()

print(repeat_list)

['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Frederick Sanger', 'John Bardeen', 'Linus Carl Pauling', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
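
The same list can be produced more compactly with `value_counts()`. This is a sketch of an alternative, not the notebook's method, using a made-up frame (`toy`, with placeholder names) in place of `nobel`:

```python
import pandas as pd

# Hypothetical stand-in for `nobel`; names and values are placeholders
toy = pd.DataFrame({
    "full_name": ["Laureate A", "Laureate B", "Laureate A", "Organization C",
                  "Organization C", "Organization C", "Laureate D"],
    "year": [1903, 1905, 1911, 1917, 1944, 1963, 1950],
})

# Number of rows (prizes) per name
counts = toy["full_name"].value_counts()

# Names that appear at least twice, sorted alphabetically
repeat_list = sorted(counts[counts >= 2].index)

print(repeat_list)
```

Counting by `full_name` assumes each laureate's name is spelled identically on every row. Counting by `laureate_id` instead may be more robust if spellings vary, though that is an assumption about the data rather than something the output above shows.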
