
Added data loader and model evaluator #24

Merged: 9 commits merged into master on Dec 10, 2020

Conversation

thefirebanks (Contributor)

Main changes:

  • Added methods in tasks/data_loader/src/ to load data from the dataset and model-output JSON files (a schematic sketch follows after this list).
  • Added a model evaluator class in tasks/evaluate_model/src to evaluate classification models and easily visualize the results. The documented notebook is in tasks/evaluate_model/notebooks.
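
For orientation, here is a minimal sketch of the loading idea, assuming a simple JSON schema; the helper name and the "sentences"/"text"/"label" field names are illustrative, not the actual interface in tasks/data_loader/src/:

```python
import json

def labeled_sentences_from_json(path):
    """Return (sentence, label) pairs from a dataset or model-output JSON file.

    Sketch only: the "sentences"/"text"/"label" field names are assumed.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [(entry["text"], entry["label"]) for entry in data["sentences"]]

# Usage (illustrative paths):
# gold = labeled_sentences_from_json("input/sample_dataset.json")
# predictions = labeled_sentences_from_json("input/sample_model_output.json")
```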

Bonus:

  • Added a script to the tasks/ folder that automatically creates the folder structure for a new task (including the input/output/src folders); more details are in the README.md of the tasks/ folder. A rough sketch of the idea follows.
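
A rough sketch of what the script does, assuming it takes the task name as a command-line argument (the argument handling and script name are assumptions; see the tasks/ README.md for the real usage):

```python
import sys
from pathlib import Path

def create_task(name):
    """Create tasks/<name>/ with the standard input/output/src subfolders."""
    root = Path("tasks") / name
    for sub in ("input", "output", "src"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    print(f"Created task skeleton at {root}")

if __name__ == "__main__":
    create_task(sys.argv[1])  # e.g. python create_task.py evaluate_model
```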

@jordiplanescutxi (Collaborator) commented Dec 7, 2020

You have done magnificent work!!
Some things I need to clarify:

  1. If we had the example file "input/sample_model_output.json", it would be easier to execute the code in the notebook.
  2. If I understand it right, you assume that for each document we will have two files with labelled sentences: sample_dataset and sample_model_output. They will have the same sentences in the same order but different labels. I'm afraid we may run into trouble if we do not check that the sentences are actually the same and at the same positions.
  3. If we go through sample_dataset.json as we do in the function "labeled_sentences_from_dataset", we are assuming that all sentences fall into one of the categories 0 to 5, while most of them will fall into a -1 category, which is "no_incentive". We will talk about it.

@thefirebanks (Contributor, Author) commented Dec 7, 2020

> You have done magnificent work!!
> Some things I need to clarify:
>
>   1. If we had the example file "input/sample_model_output.json", it would be easier to execute the code in the notebook.
>   2. If I understand it right, you assume that for each document we will have two files with labelled sentences: sample_dataset and sample_model_output. They will have the same sentences in the same order but different labels. I'm afraid we may run into trouble if we do not check that the sentences are actually the same and at the same positions.
>   3. If we go through sample_dataset.json as we do in the function "labeled_sentences_from_dataset", we are assuming that all sentences fall into one of the categories 0 to 5, while most of them will fall into a -1 category, which is "no_incentive". We will talk about it.

Hi Jordi, thank you for the feedback! Here are my responses:

  1. You are absolutely right; I had completely forgotten that the input folders don't get versioned, so I uploaded the input folder to our Google Drive (I left the link in Slack).
  2. Indeed! To solve this, maybe we can create a unique ID for each sentence so that it is easier to check for equality. This can be added in the script/process that creates the JSON files in the first place, and I can add a check in the data loader to confirm that the files are in the same order (rough sketch after this list). I will add this once we confirm the mechanism for identifying a unique sentence.
  3. Good point! I was actually thinking of making 0 the "no incentive" label and 1-6 the distinct types of incentives. I will correct that now!
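
To make point 2 concrete, this is the kind of ID-plus-order check I have in mind; the hashing scheme and function names here are placeholders until we agree on the mechanism:

```python
import hashlib

def sentence_id(doc_name, sentence):
    """Stable ID for a sentence: hash of the document name plus the text.

    Placeholder scheme; the final mechanism is still to be decided.
    """
    return hashlib.sha1(f"{doc_name}\x00{sentence}".encode("utf-8")).hexdigest()

def check_alignment(dataset_ids, model_output_ids):
    """Raise if the two files do not hold the same sentences in the same order."""
    if list(dataset_ids) != list(model_output_ids):
        raise ValueError("sample_dataset and sample_model_output are misaligned")
```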

@thefirebanks (Contributor, Author)

Updated the input files in the Google Drive folder; I will update the data loader tomorrow before midnight EST!

@thefirebanks (Contributor, Author)

Tried loading sentences from ElSalvador.json and they loaded successfully!

[Screenshot: notebook output showing sentences loaded from ElSalvador.json, Dec 10, 2020]
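
For reference, the call was roughly the following; the import path and the loader's return shape are approximated here, so treat this as a sketch rather than the exact code in the notebook:

```python
# Approximate reproduction; the real import path in tasks/data_loader/src
# and the return shape of the loader may differ slightly.
from data_loader import labeled_sentences_from_dataset

sentences = labeled_sentences_from_dataset("input/ElSalvador.json")
print(f"Loaded {len(sentences)} labelled sentences")
print(sentences[0])
```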

thefirebanks merged commit a58f1ad into master on Dec 10, 2020
thefirebanks deleted the issue_17_model_evaluator branch on Dec 14, 2020
jordiplanescutxi pushed a commit that referenced this pull request on Mar 23, 2021: Added data loader and model evaluator
Successfully merging this pull request may close these issues.

Create a general evaluator for the models