# API to get nutrition facts
**Idea**
<br>
* Predict food categories from images.
* Clean the categories, using the same kind of text cleaning as for NLP.
* Concatenate all cleaned categories into a single string.
* Use the API to detect food items in that string, then build a list of these items.
* For each item, call the API to output its nutrition facts.
* Create a dataframe of all ingredients.

**Important note**
<br>
We still need a way to get a reasonably accurate final nutrition estimate for the whole meal (all items combined).

## Import packages
Necessary packages to run this notebook

In [1]:
# Uncomment if installation is needed, then comment it out again afterwards.
#pip install nltk

In [2]:
import string
from nltk.corpus import stopwords 
from nltk import word_tokenize
from nltk.stem import WordNetLemmatizer as wnl
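
The imports above rely on NLTK data packages that are not installed with the library itself. A minimal sketch of fetching them once, assuming the usual resource names `punkt`, `punkt_tab`, `stopwords`, and `wordnet` (which tokenizer package is needed depends on the installed NLTK version). `ensure_nltk_data` is a hypothetical helper name, not part of NLTK:

```python
# Resource names are assumptions based on what word_tokenize, stopwords
# and WordNetLemmatizer usually require; adjust to your NLTK version.
NLTK_RESOURCES = ("punkt", "punkt_tab", "stopwords", "wordnet")

def ensure_nltk_data(resources=NLTK_RESOURCES):
    """Try to download each NLTK data package and return the names that failed."""
    import nltk  # imported here so the helper can be defined before nltk is installed
    failed = []
    for name in resources:
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `ensure_nltk_data()` once before the cleaning cell should be enough. A name that is unknown to an older NLTK version simply ends up in the returned list.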

In [3]:
import requests
import pandas as pd

## Clean categories
Take the category (or categories) predicted by our model and clean them.
<br>
The categories need to be reduced to clean keywords, because keyword queries work better with the API.
<br>
Use the same kind of cleaning as for NLP.
<br>
Extra steps are needed to keep only the keywords that are useful for the API.

In [4]:
def preprocessing(categories: list) -> str:
    '''
    Clean a list of predicted categories (one string per category) and
    return them joined into a single space-separated string of keywords.
    Steps: strip whitespace, lowercase, remove digits, remove punctuation,
    tokenize, remove stopwords, lemmatize (verbs, then nouns).
    '''
    stop_words = set(stopwords.words('english'))  # define stopwords once
    lemmatizer = wnl()  # reuse a single lemmatizer instance
    cat_cleaned = []

    for sentence in categories:
        sentence = sentence.strip()  # remove leading/trailing whitespace
        sentence = sentence.lower()  # lowercase characters
        sentence = ''.join(char for char in sentence if not char.isdigit())  # remove digits

        for punctuation in string.punctuation:
            sentence = sentence.replace(punctuation, '')  # remove punctuation

        tokenized_sentence = word_tokenize(sentence)  # tokenize sentence

        tokenized_sentence_cleaned = [w for w in tokenized_sentence if w not in stop_words]  # remove stopwords

        verb_lemmatized = [lemmatizer.lemmatize(word, pos="v") for word in tokenized_sentence_cleaned]  # lemmatize verbs
        lemmatized = [lemmatizer.lemmatize(word, pos="n") for word in verb_lemmatized]  # then lemmatize nouns

        cat_cleaned.append(' '.join(lemmatized))

    return ' '.join(cat_cleaned)

### Test cleaning
This part is not needed for our package; it only tests the cleaning step.

In [5]:
categories = ['bread with Butter'] # to be replaced by the categories predicted by the model

In [6]:
clean = preprocessing(categories)

In [7]:
clean

'bread butter'
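
When several predicted categories share words, the concatenated string repeats them (for example `'bread butter'` and `'bread'` give `'bread butter bread'`). One possible extra step, sketched here as an assumption rather than something this notebook does, is to drop repeated keywords while keeping their first-seen order. `dedupe_keywords` is a hypothetical helper:

```python
def dedupe_keywords(text: str) -> str:
    """Remove repeated words from a space-separated string, keeping order."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order
    return ' '.join(dict.fromkeys(text.split()))

print(dedupe_keywords('bread butter bread'))  # bread butter
```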

## API - Spoonacular
Use the Spoonacular API through RapidAPI, since its setup is easier.
<br>
There is another option, the Nutritionix API, but it seems a bit more complex to use. To be revisited later.

### Spoonacular API

In [8]:
API_KEY = 'xxxx'
BASE_URL = "https://api.spoonacular.com/"
params = {'apiKey': API_KEY}
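
To avoid committing a real key to the notebook, the key can be read from an environment variable. This is only a sketch: the variable name `SPOONACULAR_API_KEY` is an assumption, and the placeholder `'xxxx'` is kept as the fallback.

```python
import os

# SPOONACULAR_API_KEY is a hypothetical variable name; set it in the shell
# before starting the notebook. The placeholder is used when it is not set.
API_KEY = os.environ.get('SPOONACULAR_API_KEY', 'xxxx')
params = {'apiKey': API_KEY}
```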

#### Detect food in text
The text is the concatenation of all the categories predicted for one image.
<br>
This concatenated string is given to the API to extract food items.
<br>
These food items are then passed to another API endpoint to get nutrition facts.

In [9]:
text = preprocessing(categories)

In [10]:
endpoint = "food/detect"
url_query = {"text":text}

In [11]:
response_item = requests.post(BASE_URL+endpoint, data=url_query, params=params)

In [12]:
item_json = response_item.json()

In [13]:
item_json

{'annotations': [{'annotation': 'butter',
   'tag': 'ingredient',
   'image': 'https://spoonacular.com/cdn/ingredients_100x100/butter-sliced.jpg'},
  {'annotation': 'bread',
   'tag': 'ingredient',
   'image': 'https://spoonacular.com/cdn/ingredients_100x100/white-bread.jpg'}],
 'processedInMs': 10}

In [14]:
# collect the name of each detected food item
item_lst = [annotation['annotation'] for annotation in item_json['annotations']]

item_lst

['butter', 'bread']
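
Each annotation in the response carries a `tag`. In the response above both are `'ingredient'`, but assuming the endpoint can also return other tags, a small filter keeps only the items we want to send to the nutrition endpoint. `extract_items` is a hypothetical helper, and the sample payload is copied from the output above with the image URLs omitted:

```python
def extract_items(payload: dict, tag: str = 'ingredient') -> list:
    """Return annotation names whose tag matches, tolerating a missing key."""
    return [
        a['annotation']
        for a in payload.get('annotations', [])
        if a.get('tag') == tag
    ]

# sample copied from the food/detect output shown in this notebook
sample = {
    'annotations': [
        {'annotation': 'butter', 'tag': 'ingredient'},
        {'annotation': 'bread', 'tag': 'ingredient'},
    ],
    'processedInMs': 10,
}
print(extract_items(sample))  # ['butter', 'bread']
```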

#### Get food information from API call

In [15]:
lst_info = []
endpoint = "recipes/parseIngredients"

for item in item_lst:
    # one request per detected item, asking for nutrition data for a single serving
    url_query = {"ingredientList": item, "servings": 1, "includeNutrition": True}
    response_info = requests.post(BASE_URL+endpoint, data=url_query, params=params)
    lst_info.append(response_info.json()[0])  # the endpoint returns a list; keep its first entry
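
The loop above sends one request per item. The Spoonacular documentation describes `ingredientList` as a list with one ingredient per line, so the items could probably be sent in a single request instead, which would save API quota. A minimal sketch of building that request body follows. `build_parse_query` is a hypothetical helper, and whether the response list follows the input order is an assumption to verify.

```python
def build_parse_query(items: list, servings: int = 1) -> dict:
    """Build the form data for a single recipes/parseIngredients request."""
    return {
        'ingredientList': '\n'.join(items),  # one ingredient per line
        'servings': servings,
        'includeNutrition': True,
    }

url_query = build_parse_query(['butter', 'bread'])
print(repr(url_query['ingredientList']))  # 'butter\nbread'
```

The result can be passed as `data=url_query` to the same `requests.post` call; `response_info.json()` would then be used as the whole list rather than only its first element.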

#### Get nutrition dataset

In [16]:
list_nut_fact = ['sodium','Saturated Fat','Carbohydrates','Fiber','Calories','Cholesterol']

def get_nutrition(nut_key: list, nut_info: list) -> pd.DataFrame:
    """
    nut_key: list of strings naming the nutrition facts wanted.
    nut_info: list of dictionaries with food information from the API call.

    Returns a dataframe with one row per food item, containing its serving
    weight and the amount and unit of each requested nutrition fact.
    """
    wanted = [name.lower() for name in nut_key]  # match nutrient names case-insensitively
    lst_nut = []

    for info in nut_info:  # iterate over the nut_info argument
        nutrition = info['nutrition']
        row = {
            'name': info['name'],
            'amount': nutrition['weightPerServing']['amount'],
            'unit': nutrition['weightPerServing']['unit'],
        }

        for nutrient in nutrition['nutrients']:
            key = nutrient['name'].lower()
            if key in wanted:
                row[key+'_amount'] = nutrient['amount']
                row[key+'unit'] = nutrient['unit']

        lst_nut.append(row)

    return pd.DataFrame(lst_nut)

In [17]:
get_nutrition(list_nut_fact,lst_info)

| | name | amount | unit | sodium_amount | sodiumunit | saturated fat_amount | saturated fatunit | carbohydrates_amount | carbohydratesunit | fiber_amount | fiberunit | calories_amount | caloriesunit | cholesterol_amount | cholesterolunit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | butter | 5 | g | 32.15 | mg | 2.57 | g | 0.0 | g | 0.0 | g | 35.85 | kcal | 10.75 | mg |
| 1 | bread | 28 | g | 132.44 | mg | 0.2 | g | 13.3 | g | 1.12 | g | 76.72 | kcal | 0.0 | mg |
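
As a first approximation of the whole-meal figure mentioned in the important note, the per-item amounts can be summed. This sketch works on plain row dictionaries (what `DataFrame.to_dict('records')` would give) and assumes each nutrient uses the same unit in every row, which holds for the two rows above. It also takes the serving weights returned by the API (5 g of butter, 28 g of bread) at face value, so it does not yet reflect the real portion on the plate. `meal_totals` is a hypothetical helper, and only two of the nutrient columns are copied here to keep the example short:

```python
def meal_totals(rows: list) -> dict:
    """Sum every '<nutrient>_amount' field across all food items."""
    totals = {}
    for row in rows:
        for key, value in row.items():
            # 'amount' alone is the serving weight, so only '<name>_amount' keys count
            if key.endswith('_amount'):
                totals[key] = totals.get(key, 0.0) + value
    # round to two decimals to hide floating-point noise
    return {key: round(value, 2) for key, value in totals.items()}

# values copied from the table above
rows = [
    {'name': 'butter', 'amount': 5, 'calories_amount': 35.85, 'sodium_amount': 32.15},
    {'name': 'bread', 'amount': 28, 'calories_amount': 76.72, 'sodium_amount': 132.44},
]
totals = meal_totals(rows)
print(totals)
```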
