# Pandas challenge

While looking into the datasets provided for the [FishAI: Sustainable Commercial Fishing](https://www.nora.ai/Competition/sustainable-fishing.html) challenge this morning, I stumbled upon an interesting pandas problem. I love solving small problems like this myself, so I thought I would share it with others. The dataset here is a sample of the one provided in the challenge. 

Then I remembered reading about [JupyterLite](https://github.com/jupyterlite/demo), and decided to try sharing a link to a live notebook, so people can solve the puzzle right in their own browser. No installation, no downloading of files, no other hassle: just dive straight into the code. 

So, let's see if it works!

![test](https://www.nora.ai/Competition/screenshot-2022-05-09-at-14.13.55.png)

In [10]:
# Imports
import pandas as pd
import numpy as np

In [11]:
# Read a sample of the dataset

In [12]:
df1 = pd.read_csv("../data/sample_df.csv")

In [13]:
df1.sample(5)

Unnamed: 0,Fartøy ID,Siste fangstdato,Lokasjon (kode),Lon (lokasjon),Lat (lokasjon),Art,Art - gruppe,Rundvekt,date
847,1997002000.0,02.05.2011,2.0,19.5,70.25,Brosme,Annen torskefisk,63.0,2011-02-05
414,2008044000.0,17.01.2019,25.0,26.79157,70.49274,"Kongekrabbe, han-","Kongekrabbe, han",45.0,2019-01-17
874,2014064000.0,14.01.2016,24.0,15.47374,69.26583,Torsk,Torsk,363.0,2016-01-14
690,1998006000.0,28.02.2009,46.0,14.56579,68.15924,Sei,Sei,25.7,2009-02-28
929,,31.10.2005,4.0,4.5,61.75,Breiflabb,"Annen flatfisk, bunnfisk og dypvannsfisk",2.4,2005-10-31


The DataFrame contains some fishing-related columns, but for the purpose of this puzzle, the ```date``` column is the only one needed. All dates fall between 2000-01-01 and 2021-12-31. 

The puzzle should be easy to grasp: we want to add a new feature column to this DataFrame, ```days_since_last_fullmoon```, which holds the number of days since the most recent full moon, as the moon phase is something that might affect the migration of fish. 

To add this feature, we have a second dataset which includes the dates of all full moons over a period that covers the fishing data (the sample below has rows from 1914 to 2049). 

In [14]:
df2 = pd.read_csv("../data/full_moon.csv")

In [15]:
df2.sample(5)

Unnamed: 0,Day,Date,Time
1851,Saturday,11 September 2049,06:04:24 pm
1371,Sunday,21 November 2010,06:27:24 pm
636,Tuesday,19 June 1951,01:35:54 pm
1273,Thursday,19 December 2002,08:10:12 pm
181,Friday,4 September 1914,03:00:48 pm
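
Note that the ```Date``` column in this second dataset is plain text such as "21 November 2010", so it has to be parsed before it can be compared with the ```date``` column of the first DataFrame. A minimal sketch of that step, using a few rows copied from the sample above; the column name ```full_moon_date``` is just an illustrative choice, not part of the original data:

```python
import pandas as pd

# Toy rows copied from the df2 sample shown above
moons = pd.DataFrame({
    "Day": ["Sunday", "Thursday", "Friday"],
    "Date": ["21 November 2010", "19 December 2002", "4 September 1914"],
    "Time": ["06:27:24 pm", "08:10:12 pm", "03:00:48 pm"],
})

# Explicit format: day of month, full English month name, four-digit year.
# "full_moon_date" is a hypothetical column name chosen for this example.
moons["full_moon_date"] = pd.to_datetime(moons["Date"].str.strip(), format="%d %B %Y")

print(moons[["Date", "full_moon_date"]])
```

Passing an explicit ```format``` avoids any day/month guessing, which matters for dates where both readings are valid.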


In [16]:
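def add_days_since_fullmoon(df1, df2):
    df = df1.copy()  # work on a copy so the input DataFrame is left untouched
    # Write your solution here: add a "days_since_last_fullmoon" column to df

    assert df["days_since_last_fullmoon"].sum() == 14041
    return df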

In [17]:
%timeit df3 = add_days_since_fullmoon(df1, df2)

131 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Can you solve it without any loops or row-wise operations? 👩‍💻🤓
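
Spoiler warning: skip this part if you want to solve the puzzle yourself first. One possible loop-free approach is an "as-of" join with ```pd.merge_asof```, which matches each row to the latest key that is not after it. The sketch below runs on made-up toy dates rather than the real files. The helper name ```days_since_last_full_moon``` is hypothetical, the time of day of the full moon is ignored, and I have not checked that this reproduces the sum asserted in the notebook.

```python
import pandas as pd

def days_since_last_full_moon(dates: pd.Series, full_moons: pd.Series) -> pd.Series:
    """Days between each date and the most recent full moon on or before it."""
    # merge_asof needs both sides sorted by their key, so remember the original order
    left = pd.DataFrame({"date": pd.to_datetime(dates).to_numpy()})
    left["pos"] = range(len(left))
    left = left.sort_values("date")

    right = pd.DataFrame({"full_moon": pd.to_datetime(full_moons).to_numpy()})
    right = right.sort_values("full_moon")

    # direction="backward" picks the last full moon that is <= the row's date
    merged = pd.merge_asof(
        left, right, left_on="date", right_on="full_moon", direction="backward"
    )
    days = (merged["date"] - merged["full_moon"]).dt.days

    # Restore the original row order and index
    out = pd.Series(days.to_numpy(), index=merged["pos"].to_numpy()).sort_index()
    out.index = dates.index
    return out

# Toy data (illustrative values, not taken from the challenge files)
full_moons = pd.Series(["2018-12-22", "2019-01-21", "2019-02-19"])
dates = pd.Series(["2019-02-20", "2019-01-17", "2019-01-21"])

result = days_since_last_full_moon(dates, full_moons)
print(result.tolist())
```

In the notebook, the returned Series could be assigned to ```df["days_since_last_fullmoon"]``` once both date columns have been parsed. Dates earlier than the first full moon in the lookup table would come back as missing values.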