# 10_core_solns
> Demo notebook for exploring nbdev

In this notebook, we explore the functionality of nbdev. Suppose that, in a sample project, we have a complex set of data that needs to be processed, or that everyone on the project needs to read and access from their own notebooks. Let's see how nbdev can help with this!

#default_exp data_load

In [None]:
#all_no_test

## First time setup
The first time you start using the repository, you'll need to run the following commands to get everything working. Since we have access to a terminal, we'll use it for all command-line work. We'll discuss what these commands do shortly.
```
nbdev_install_git_hooks
nbdev_clean_nbs
nbdev_build_lib
nbdev_build_docs
```

# Core functionality
The purpose of this demo notebook is to read files and process them. Suppose, for instance, that our project's data is a set of numbered text files stored in a GitHub repository, and we need to download each file and collect its text.

In [12]:
#export
import requests
import pandas as pd

In [9]:
files_list = [551293, 373587, 597061, 434648, 532970,
              520668, 209035, 830014, 671125, 893941,
              479957, 541893, 836261, 244666, 696866,
              332305, 930880, 297116, 542169, 272307]
files_base = 'https://raw.githubusercontent.com/vanderbilt-data-science/python-for-deep-learning-workshop/master/workshop-files/'
files_type = '.txt'

In [25]:
#get files
file_texts = [requests.get(files_base+str(file)+files_type).text for file in files_list]
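The list comprehension above sends one request per file and keeps whatever body comes back, so a mistyped file number would quietly put the text of an error page into `file_texts` instead of raising an error. Below is a slightly more defensive sketch. The helper names `make_url` and `fetch_texts`, the constant `FILES_BASE`, and the 10-second timeout are our own hypothetical choices for illustration, not part of the notebook or of nbdev.

```python
import requests

# Same base URL as `files_base` above, stored as a module-level constant.
FILES_BASE = 'https://raw.githubusercontent.com/vanderbilt-data-science/python-for-deep-learning-workshop/master/workshop-files/'

def make_url(file_id, base=FILES_BASE, ext='.txt'):
    """Build the download URL for one numbered file (hypothetical helper)."""
    return f'{base}{file_id}{ext}'

def fetch_texts(file_ids, base=FILES_BASE, ext='.txt', timeout=10):
    """Download each file and return its text, failing loudly on HTTP errors."""
    texts = []
    # Reuse one session (and its connection pool) for all the downloads.
    with requests.Session() as session:
        for file_id in file_ids:
            response = session.get(make_url(file_id, base, ext), timeout=timeout)
            # Raise requests.HTTPError on 4xx/5xx instead of keeping the error page text.
            response.raise_for_status()
            texts.append(response.text)
    return texts

# Usage (needs network access):
# file_texts = fetch_texts(files_list)
```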

In [26]:
#check files
file_texts[:3]

['The rain and wind abruptly stopped, but the sky still had the gray swirls of storms in the distance. Dave knew this feeling all too well. The calm before the storm. He only had a limited amount of time before all Hell broke loose, but he stopped to admire the calmness. Maybe it would be different this time, he thought, with the knowledge deep within that it wouldnt.',
 "She patiently waited for his number to be called. She had no desire to be there, but her mom had insisted that she go. She's resisted at first, but over time she realized it was simply easier to appease her and go. Mom tended to be that way. She would keep insisting until you wore down and did what she wanted. So, here she sat, patiently waiting for her number to be called.",
 'The chair sat in the corner where it had been for over 25 years. The only difference was there was someone actually sitting in it. How long had it been since someone had done that? Ten years or more he imagined. Yet there was no denying the pre

Next, we put the file numbers and their texts into a pandas DataFrame so they are easier to inspect and process.

In [16]:
#get it into a dataframe
file_df = pd.DataFrame({'file':files_list, 'text':file_texts})
file_df

|   | file | text |
|---|---|---|
| 0 | 551293 | The rain and wind abruptly stopped, but the sk... |
| 1 | 373587 | She patiently waited for his number to be call... |
| 2 | 597061 | The chair sat in the corner where it had been ... |
| 3 | 434648 | The computer wouldn't start. She banged on the... |
| 4 | 532970 | Do you really listen when you are talking with... |
| 5 | 520668 | Cake or pie? I can tell a lot about you by whi... |
| 6 | 209035 | It was a concerning development that he couldn... |
| 7 | 830014 | She was in a hurry. Not the standard hurry whe... |
| 8 | 671125 | All he could think about was how it would all ... |
| 9 | 893941 | The red glint of paint sparkled under the sun.... |
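`pd.DataFrame` accepts a dictionary whose keys become column names and whose values (lists of equal length) become the columns, so row *i* pairs `files_list[i]` with `file_texts[i]`. The toy sketch below uses two made-up placeholder texts instead of the downloaded ones, and shows one optional extra step, `set_index('file')`, which lets you look up a text directly by its file number.

```python
import pandas as pd

# Placeholder data standing in for files_list and file_texts (made up for illustration).
ids = [551293, 373587]
texts = ['first placeholder text', 'second placeholder text']

# Each dictionary key becomes a column; the lists must all have the same length.
toy_df = pd.DataFrame({'file': ids, 'text': texts})
print(toy_df.shape)  # (2, 2): two rows, two columns

# Optional: index by file number so a text can be looked up directly.
by_file = toy_df.set_index('file')
print(by_file.loc[373587, 'text'])  # second placeholder text
```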


# Collaboration
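What if we were collaborating with others and wanted to use this data extensively? What if this functionality were SUBSTANTIALLY more complex and took several functions and cells to implement? In that case, it makes sense to wrap the logic in a function and export it to the library, so that everyone can import it instead of copying cells between notebooks.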

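In [17]:
#export
def read_data(files_list=None):
    """Download the numbered workshop text files and return them as a DataFrame.

    `files_list` is a list of file numbers; if it is empty or omitted,
    the default set of 20 workshop files is used.
    """
    if not files_list:
        files_list = [551293, 373587, 597061, 434648, 532970,
                      520668, 209035, 830014, 671125, 893941,
                      479957, 541893, 836261, 244666, 696866,
                      332305, 930880, 297116, 542169, 272307]

    files_base = 'https://raw.githubusercontent.com/vanderbilt-data-science/python-for-deep-learning-workshop/master/workshop-files/'
    files_type = '.txt'

    # download the text of each file: base URL + file number + extension
    file_texts = [requests.get(files_base + str(file) + files_type).text for file in files_list]

    # pair each file number with its text in a dataframe
    file_df = pd.DataFrame({'file': files_list, 'text': file_texts})

    return file_df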
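Why prefer `files_list=None` over `files_list=[]` as a default? Python evaluates a default argument once, when the `def` statement runs, so a list literal used as a default is shared by every call that omits the argument. A function that only rebinds the name, as `read_data` does, never triggers the problem, but one that mutates the default does. The two toy functions below are hypothetical and not part of the library; they only illustrate the trap that the `None` idiom avoids.

```python
def add_item_shared(item, items=[]):
    # The same list object is reused on every call that omits `items`.
    items.append(item)
    return items

def add_item_fresh(item, items=None):
    # A new list is created on each call that omits `items`.
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item_shared(1))  # [1]
print(add_item_shared(2))  # [1, 2]  <- state leaked from the previous call
print(add_item_fresh(1))   # [1]
print(add_item_fresh(2))   # [2]
```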

### Test to make sure the function works:

In [18]:
read_data()

|   | file | text |
|---|---|---|
| 0 | 551293 | The rain and wind abruptly stopped, but the sk... |
| 1 | 373587 | She patiently waited for his number to be call... |
| 2 | 597061 | The chair sat in the corner where it had been ... |
| 3 | 434648 | The computer wouldn't start. She banged on the... |
| 4 | 532970 | Do you really listen when you are talking with... |
| 5 | 520668 | Cake or pie? I can tell a lot about you by whi... |
| 6 | 209035 | It was a concerning development that he couldn... |
| 7 | 830014 | She was in a hurry. Not the standard hurry whe... |
| 8 | 671125 | All he could think about was how it would all ... |
| 9 | 893941 | The red glint of paint sparkled under the sun.... |


**Questions to consider about the above code development**:  
We just copied the same variable names from the code above straight into the function. What would happen if we missed a variable that we defined outside the function but still used inside it? Can you think of any pitfalls of keeping the same variable names inside the function as in the outside code? What other approaches could you take?
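One pitfall behind these questions fits in a few lines. A notebook keeps every variable you have defined in its global namespace, so a function that accidentally relies on a notebook-level variable still works there, and only fails once it is exported to a module where that variable does not exist. The function names and the `example.com` URL below are hypothetical; passing values in as parameters, as `build_urls` does, is one alternative approach.

```python
def build_urls_leaky(numbers):
    # Pitfall: `base_url` is never defined in this function, so Python
    # looks it up in the module's global namespace when the function runs.
    return [base_url + str(n) + '.txt' for n in numbers]

def build_urls(numbers, base_url, ext='.txt'):
    # Safer: everything the function needs is passed in as an argument.
    return [base_url + str(n) + ext for n in numbers]

# Like a freshly imported module: no global `base_url` exists yet.
try:
    build_urls_leaky([551293])
except NameError as err:
    print('Fails:', err)  # Fails: name 'base_url' is not defined

# Like a notebook session: a leftover global hides the bug.
base_url = 'https://example.com/files/'
print(build_urls_leaky([551293]))      # ['https://example.com/files/551293.txt']
print(build_urls([551293], base_url))  # ['https://example.com/files/551293.txt']
```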

# Building modules and cleaning notebooks
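The following commands clean your notebooks and build your module library; as before, we'll run them at the command line. `nbdev_clean_nbs` strips superfluous notebook metadata (such as execution counts) so that notebooks produce cleaner diffs and fewer merge conflicts, and `nbdev_build_lib` writes every cell flagged with `#export` into the module named by `#default_exp` (here, `data_load`). The commands are listed below for convenience: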
```
nbdev_clean_nbs
nbdev_build_lib
```

# Try it yourself!
In the section below, create a function that takes the DataFrame we produced and returns a random sample of half of its rows. You can do this with the pandas `sample` method and its `frac` argument (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html).

Make sure that you correctly export this function and check to see that it is in the library you created.

In [21]:
#export
def random_subsample(df, ss_percent=0.5):
    """Return a random sample of `df`'s rows; `ss_percent` is a fraction between 0 and 1."""
    return df.sample(frac=ss_percent)

In [23]:
#get results
random_subsample(file_df, ss_percent=0.2)

|   | file | text |
|---|---|---|
| 17 | 297116 | What have you noticed today? I noticed that if... |
| 7 | 830014 | She was in a hurry. Not the standard hurry whe... |
| 5 | 520668 | Cake or pie? I can tell a lot about you by whi... |
| 3 | 434648 | The computer wouldn't start. She banged on the... |
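`DataFrame.sample(frac=...)` draws rows without replacement and returns `frac * len(df)` rows, rounded to a whole number, which is why `ss_percent=0.2` on the 20-row `file_df` gives four rows. The rows change on every call unless you pass `random_state`. The sketch below is our own variant of the solution, using a hypothetical name `random_subsample_seeded`, an added `seed` parameter, and a toy 20-row frame instead of the downloaded data, so that a subsample can be reproduced.

```python
import pandas as pd

def random_subsample_seeded(df, ss_percent=0.5, seed=None):
    """Return a random subset of `df`'s rows (hypothetical variant).

    `ss_percent` is a fraction between 0 and 1 (0.5 means half the rows);
    `seed` is forwarded to `random_state`, so the same seed returns the same rows.
    """
    return df.sample(frac=ss_percent, random_state=seed)

# Toy 20-row frame standing in for file_df.
toy = pd.DataFrame({'file': range(20), 'text': [f'text {i}' for i in range(20)]})

half = random_subsample_seeded(toy, seed=0)
print(len(half))  # 10
# Same seed, same rows.
print(half.index.equals(random_subsample_seeded(toy, seed=0).index))  # True
```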
