Switzerland is well known for its rich heritage: incredible landscapes, watches, cheese, chocolate and diverse influences from its five neighboring countries. This project investigates how this heritage is reflected in food habits. We picked 2 Swiss and 3 French cities with a high restaurant density to gain insights into dietetics. We mapped restaurant meals to recipes, and recipe ingredients to products, in order to analyze the corresponding nutrients.
Is there an area-based nutrition bias?
Our infrastructure and datasets also allow us to explore other topics, such as:
- food trends related to clichés (e.g. Rösti, Malakoff)
- food and nutrient variety per location (e.g. meals with more salt, lipids, etc.)
Our datasets contain:
- 11k restaurants (e.g. LaFourchette)
- 35k meals (extracted from the restaurants' menus)
- 170k recipes (from various websites, e.g. CuisineAZ)
- 1.3M ingredients (derived from the recipes)
- 5k products (e.g. FDDB, OpenFood)
- 40k nutrient values (extracted from the products)
We assumed that:
- the restaurants listed on LaFourchette were representative enough of local food habits;
- we could associate recipes with meals, and products with recipes, well enough to derive the nutritional facts of a meal without the variance of these approximations dominating the results (averaging over many meals reduces the noise, by the central limit theorem).
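
One way to read the second assumption (our interpretation, not a derivation taken from the project): let $x_i$ be the estimated value of a nutrient for meal $i$ in an area, with $\sigma^2$ the variance of these per-meal estimates (natural variation plus matching noise), and assume the estimates are independent. The area mean then becomes more precise as the number of meals $n$ grows:

```math
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{SE}(\hat{\mu}) = \frac{\sigma}{\sqrt{n}},
\qquad
\hat{\mu} \overset{\text{approx.}}{\sim} \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)
```

By the central limit theorem the last approximation holds for large $n$, so $\hat{\mu} \pm 1.96\,\sigma/\sqrt{n}$ is an approximate 95% interval. Independence is itself an assumption: systematic errors, such as a recipe source whose portions are consistently too large, do not average out.
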
Each folder in this repository corresponds to one step in the development of this project and contains a README describing what we did and why. We recommend reading them to get a better idea of the work we have done.
We implemented the following data pipeline:
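
To make the last stages of the pipeline concrete (ingredients → products → nutrients), here is a minimal sketch of how a matched recipe could be turned into per-portion nutrient values. The classes, field names and nutrient keys are hypothetical; quantities are assumed to be already converted to grams and product nutrients to be given per 100 g. The project's actual implementation may differ.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical nutrient keys; the project's actual schema may differ.
NUTRIENTS = ("energy_kcal", "protein_g", "carbohydrates_g", "salt_g")


@dataclass
class Product:
    name: str
    per_100g: dict[str, float]  # nutrient values per 100 g of product


@dataclass
class Ingredient:
    name: str
    grams: float
    product: Optional[Product] = None  # None when no product could be matched


@dataclass
class Recipe:
    name: str
    servings: int
    ingredients: list[Ingredient] = field(default_factory=list)


def portion_nutrients(recipe: Recipe) -> dict[str, float]:
    """Estimate the nutrients of one portion of the recipe matched to a meal."""
    totals = dict.fromkeys(NUTRIENTS, 0.0)
    for ingredient in recipe.ingredients:
        if ingredient.product is None:
            continue  # unmatched ingredients are skipped, biasing totals downwards
        for key in NUTRIENTS:
            per_100g = ingredient.product.per_100g.get(key, 0.0)
            totals[key] += per_100g * ingredient.grams / 100.0
    return {key: total / recipe.servings for key, total in totals.items()}


# Usage with made-up numbers (not taken from the project's datasets):
pasta = Product("pâtes", {"energy_kcal": 350.0, "protein_g": 12.0,
                          "carbohydrates_g": 70.0, "salt_g": 0.0})
recipe = Recipe("Pâtes au sel", servings=2,
                ingredients=[Ingredient("pâtes", 200.0, pasta),
                             Ingredient("sel", 5.0)])  # no product matched
print(portion_nutrients(recipe))
```

Skipping unmatched ingredients is a design choice that underestimates nutrients; flagging meals with too many unmatched ingredients would be an alternative.
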
We used this process to find matches:
Types of matching:

| Type | Restaurant meal | Matched recipe |
|---|---|---|
| Rare events, misspelled, grouped | Pavé de boeuf aux morilles | Pavé de boeuf aux morilles simplissimes |
| | Tiramisu caramel speculos beurre salé | Tiramisu au caramel au beurre salé et spéculoos |
| Wide, personal meaning | café gourmand à ma façon | |
| | Salade d'orange au miel et à la cannelle | Salade d'orange au miel et à la cannelle |
| | Rognons de lapins à la moutarde de Meaux | Fricassée de champignons à la moutarde de Meaux |
| | Terrine de foie gras et confiture de pruneaux | Terrine de foie gras aux pruneaux et raisins secs |
| | Tartare de boeuf minute, salade et potatoes | Twice baked potatoes au bacon |
| | Cassolette de Saint-Jacques et crevettes | Ravioles, noix de Saint-Jacques et crevettes en cassolettes raffinées |

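
The matching examples show two difficulties: names that differ only in spelling or word order, and names whose meaning is loose. Below is a minimal sketch of one possible name-matching heuristic, not the project's actual process. It uses accent folding, a small hand-picked French stopword list (hypothetical), and a Jaccard-like token overlap in which tokens with a `difflib.SequenceMatcher` ratio of at least 0.8 count as equal, so that "speculos" matches "spéculoos".

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny French stopword list (after accent folding).
STOPWORDS = {"a", "au", "aux", "de", "des", "du", "en", "et", "la", "le", "les", "l", "d"}


def tokens(text: str) -> set[str]:
    """Lowercase, strip accents and punctuation, drop stopwords."""
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    words = re.sub(r"[^a-z0-9]+", " ", folded).split()
    return {w for w in words if w not in STOPWORDS}


def similarity(meal: str, recipe: str, threshold: float = 0.8) -> float:
    """Jaccard-like overlap where near-identical tokens (typos) also count."""
    a, b = tokens(meal), tokens(recipe)
    if not a or not b:
        return 0.0
    hits = sum(
        1 for t in a
        if any(SequenceMatcher(None, t, u).ratio() >= threshold for u in b)
    )
    hits = min(hits, len(b))  # cap so the score stays within [0, 1]
    return hits / (len(a) + len(b) - hits)


def best_match(meal: str, recipes: list[str]) -> tuple[str, float]:
    """Return the highest-scoring recipe name and its score."""
    best = max(recipes, key=lambda r: similarity(meal, r))
    return best, similarity(meal, best)


recipes = [
    "Pavé de boeuf aux morilles simplissimes",
    "Tiramisu au caramel au beurre salé et spéculoos",
    "Fricassée de champignons à la moutarde de Meaux",
]
print(best_match("Tiramisu caramel speculos beurre salé", recipes))
```

Token overlap cannot resolve loose names: a meal sharing a single token such as "potatoes" with a recipe still gets a non-zero score, so in practice one would probably add a minimum-score threshold.
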
A few examples of food facts we can extract from the datasets with our infrastructure.

| Per country | Per city |
|---|---|
| Energy (kcal) per country | Energy (kcal) per city |
| Protein per country | Protein per city |
| Carbohydrates per country | Carbohydrates per city |
| Salt per country | Salt per city |

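
The per-country and per-city charts compare group averages. Because the estimates are noisy, a group mean is only informative together with its uncertainty. A minimal sketch, assuming each meal has been reduced to a row holding its country, city and estimated nutrient values (hypothetical schema and placeholder city names), that reports the mean, standard error and count per group:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize(rows, group_key, nutrient):
    """Return {group: (mean, standard error, n)} for one nutrient.

    `rows` is an iterable of dicts such as
    {"country": "CH", "city": "CH-A", "energy_kcal": 800.0} (hypothetical schema).
    Rows missing the nutrient are ignored.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row.get(nutrient)
        if value is not None:
            groups[row[group_key]].append(value)
    summary = {}
    for group, values in groups.items():
        n = len(values)
        se = stdev(values) / sqrt(n) if n > 1 else float("nan")  # undefined for n = 1
        summary[group] = (mean(values), se, n)
    return summary


# Made-up rows with placeholder city names, not project data.
rows = [
    {"country": "CH", "city": "CH-A", "energy_kcal": 800.0},
    {"country": "CH", "city": "CH-A", "energy_kcal": 600.0},
    {"country": "CH", "city": "CH-B", "energy_kcal": 700.0},
    {"country": "FR", "city": "FR-A", "energy_kcal": 500.0},
    {"country": "FR", "city": "FR-A", "energy_kcal": 700.0},
    {"country": "FR", "city": "FR-A", "energy_kcal": 600.0},
]
print(summarize(rows, "country", "energy_kcal"))
print(summarize(rows, "city", "energy_kcal"))
```

If two groups' intervals (mean ± about two standard errors) overlap widely, the difference between them is unlikely to be meaningful.
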
Here are a few visualization examples for cliché-meal searches.

| Choucroute (red), Malakoff (blue) | Fondue Savoyarde (red), Fondue au fromage (blue) |
|---|---|

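
A cliché search amounts to asking how often a dish appears in each area. A minimal sketch, assuming meals are available as (city, meal name) pairs (hypothetical format and placeholder city names). The project stores its data in ElasticSearch, but this plain-Python version deliberately uses no ElasticSearch API.

```python
import re
import unicodedata
from collections import Counter


def fold(text: str) -> str:
    """Lowercase and strip accents so 'Fondue Savoyarde' matches 'fondue savoyarde'."""
    text = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))


def keyword_share(meals, keyword):
    """Fraction of meals per city whose name contains `keyword` as whole words.

    `meals` is an iterable of (city, meal_name) pairs (hypothetical format).
    """
    pattern = re.compile(r"\b" + re.escape(fold(keyword)) + r"\b")
    totals, hits = Counter(), Counter()
    for city, name in meals:
        totals[city] += 1
        if pattern.search(fold(name)):
            hits[city] += 1
    return {city: hits[city] / totals[city] for city in totals}


meals = [
    ("CH-A", "Malakoff maison"),
    ("CH-A", "Fondue au fromage"),
    ("FR-A", "Choucroute garnie"),
    ("FR-A", "Tartare de boeuf"),
    ("FR-A", "Fondue savoyarde"),
]
print(keyword_share(meals, "fondue"))
print(keyword_share(meals, "Fondue Savoyarde"))
```

Matching whole words on accent-folded names avoids counting a keyword inside longer unrelated words, but it also misses plurals such as "Malakoffs".
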
As one would expect from well-known clichés, the expected food trends were present. Looking closer at the estimated nutritional facts, however, the high variance and noisiness of the datasets, coupled with the matching process, greatly increase the difficulty of the analysis. We found no relevant area-based nutrition bias in these insights. The matching process and the pipeline can nonetheless serve as tools for further in-depth investigation.
Expected and encountered challenges
Before starting the project, we expected the following points to be the most challenging:
- dataset collection: menu data can be difficult to gather
- sparsity and spatial homogeneity: depending on dataset quality, some regions might have to be ignored due to lack of data
- content languages: textual information (including menus) can use different names depending on the area, so standardization and translation might be needed
- data completeness: non-food data might need to be extracted from different sources to be meaningful
After finishing the project, the actual challenges turned out to be the following:
- data mining and normalization (high variance, different sources, captchas)
- data organization (complex queries, centralized storage with ElasticSearch)
- French NLP (unusual characters, hard modeling)
- matching (many candidates, heterogeneous units)
- computational cost (vectorization, visualization)
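
Heterogeneous units are one reason matching is hard: recipes mix grams, litres and spoons, while product nutrients are given per weight. A minimal sketch of a unit normalizer, assuming quantities arrive as strings such as "250 g" or "2 cuillère à soupe". The conversion table is hypothetical, and treating every volume as water (1 g/ml) is a strong simplification.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical conversion table to grams. Volumes assume the density of water
# (1 g/ml), which is wrong for oil, flour, etc.; spoon sizes follow the usual
# 15 ml / 5 ml culinary conventions.
TO_GRAMS = {
    "mg": 0.001,
    "g": 1.0,
    "kg": 1000.0,
    "ml": 1.0,
    "cl": 10.0,
    "dl": 100.0,
    "l": 1000.0,
    "cuillere a soupe": 15.0,
    "cuillere a cafe": 5.0,
}

# A leading number (with '.' or French ',' decimals), then the unit text.
QUANTITY = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*(.*?)\s*$")


def fold(text: str) -> str:
    """Lowercase and strip accents ('cuillère à soupe' -> 'cuillere a soupe')."""
    text = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))


def to_grams(quantity: str) -> Optional[float]:
    """Convert '250 g', '1,5 l' or '2 cuillère à soupe' to grams; None if unknown."""
    match = QUANTITY.match(fold(quantity))
    if match is None:
        return None  # no leading number, e.g. 'sel' or 'une pincée'
    amount = float(match.group(1).replace(",", "."))
    factor = TO_GRAMS.get(match.group(2))
    return None if factor is None else amount * factor


for q in ["250 g", "1,5 l", "2 cuillère à soupe", "1 pincée", "sel"]:
    print(q, "->", to_grams(q))
```

Unknown units return None rather than a guess, so the caller can decide whether to drop the ingredient or flag the meal.
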
Regarding content languages, LaFourchette had no data for the German- and Italian-speaking parts of Switzerland. We therefore focused our work on France and the French-speaking part of Switzerland.
Possible improvements:
- formal statistical evaluation: due to time constraints, the project contains relatively few insights; more modelling and statistical evaluation would definitely enhance it.
- deep recurrent neural networks for matching: one should evaluate how efficiently a neural network matches meals to recipes.
- computational efficiency: matching currently takes 20 seconds per restaurant (centralized server); this could be improved by batching, parallelization and a local server.
- expanded visualization: more interactive and more diverse kinds of visualization.
- more and better data for Switzerland: data precision is still an issue; it could be improved, for example, by using individual restaurants' websites.
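
Batching and parallelization can be combined: group restaurants into batches so that each request to the server covers many restaurants, and keep several batches in flight at once. A minimal sketch using only the standard library; `match_batch` is a hypothetical placeholder for the project's matching step, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def match_batch(restaurants: list[str]) -> list[tuple[str, int]]:
    """Hypothetical stand-in for matching one batch of restaurants.

    A real version would send one batched request to the server for the whole
    list instead of one round trip per restaurant; here it returns a dummy score.
    """
    return [(name, len(name)) for name in restaurants]


def match_all(restaurants: Iterable[str], batch_size: int = 50,
              workers: int = 4) -> dict[str, int]:
    """Match restaurants batch by batch, with several batches in flight at once."""
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_result in pool.map(match_batch, batched(restaurants, batch_size)):
            results.update(batch_result)
    return results


print(match_all(["Chez A", "Bistro B", "Cafe C"], batch_size=2, workers=2))
```

Threads fit I/O-bound work such as waiting on a remote server; for CPU-heavy steps such as vectorization, `ProcessPoolExecutor` would be the usual swap. On Python 3.12+, `itertools.batched` provides the batching helper (yielding tuples instead of lists).
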
The project is available under the Apache 2.0 license; the data belong to their respective owners under the appropriate licenses.