
Added data loader and model evaluator #24

Merged: 9 commits merged into master on Dec 10, 2020

Conversation

thefirebanks (Contributor)

Main changes:

  • Added methods in tasks/data_loader/src/ to load data from the dataset and model-output JSON files (a schematic sketch follows after this list).
  • Added a model evaluator class in tasks/evaluate_model/src to evaluate classification models and easily visualize the results. The documented notebook is in tasks/evaluate_model/notebooks.
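
For orientation, here is a minimal sketch of the loading idea, assuming a simple JSON schema; the helper name and the "sentences"/"text"/"label" field names are illustrative, not the actual interface in tasks/data_loader/src/:

```python
import json

def labeled_sentences_from_json(path):
    """Return (sentence, label) pairs from a dataset or model-output JSON file.

    Sketch only: the "sentences"/"text"/"label" field names are assumed.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [(entry["text"], entry["label"]) for entry in data["sentences"]]

# Usage (illustrative paths):
# gold = labeled_sentences_from_json("input/sample_dataset.json")
# predictions = labeled_sentences_from_json("input/sample_model_output.json")
```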

Bonus:

  • Added a script to the tasks/ folder that automatically creates the folder structure for a new task (including the input/output/src folders); more details are in the README.md of the tasks/ folder. A rough sketch of the idea follows.
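
A rough sketch of what the script does, assuming it takes the task name as a command-line argument (the argument handling and script name are assumptions; see the tasks/ README.md for the real usage):

```python
import sys
from pathlib import Path

def create_task(name):
    """Create tasks/<name>/ with the standard input/output/src subfolders."""
    root = Path("tasks") / name
    for sub in ("input", "output", "src"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    print(f"Created task skeleton at {root}")

if __name__ == "__main__":
    create_task(sys.argv[1])  # e.g. python create_task.py evaluate_model
```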

@jordiplanescutxi (Collaborator) commented Dec 7, 2020

You have done magnificent work!!
Some things I need to clarify:

  1. If we had the example file "input/sample_model_output.json", it would be easier to execute the code in the notebook.
  2. If I understand it right, you assume that for each document we will have two files with labelled sentences: sample_dataset and sample_model_output. They will have the same sentences in the same order but different labels. I'm afraid we may run into trouble if we do not check that the sentences are actually the same and at the same positions.
  3. If we go through sample_dataset.json as we do in the function "labeled_sentences_from_dataset", we are assuming that all sentences fall into one of the categories 0 to 5, while most of them will fall into a -1 category, which is "no_incentive". We will talk about it.

@thefirebanks (Contributor, Author) commented Dec 7, 2020

> You have done magnificent work!!
> Some things I need to clarify:
>
>   1. If we had the example file "input/sample_model_output.json", it would be easier to execute the code in the notebook.
>   2. If I understand it right, you assume that for each document we will have two files with labelled sentences: sample_dataset and sample_model_output. They will have the same sentences in the same order but different labels. I'm afraid we may run into trouble if we do not check that the sentences are actually the same and at the same positions.
>   3. If we go through sample_dataset.json as we do in the function "labeled_sentences_from_dataset", we are assuming that all sentences fall into one of the categories 0 to 5, while most of them will fall into a -1 category, which is "no_incentive". We will talk about it.

Hi Jordi, thank you for the feedback! Here are my responses:

  1. You are absolutely right; I had completely forgotten that the input folders don't get versioned, so I uploaded the input folder to our Google Drive (I left the link in Slack).
  2. Indeed! To solve this, maybe we can create a unique ID for each sentence so that it is easier to check for equality. This can be added in the script/process that creates the JSON files in the first place, and I can add a check in the data loader to confirm that the files are in the same order (rough sketch after this list). I will add this once we confirm the mechanism for identifying a unique sentence.
  3. Good point! I was actually thinking of making 0 the "no incentive" label and 1-6 the distinct types of incentives. I will correct that now!
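
To make point 2 concrete, this is the kind of ID-plus-order check I have in mind; the hashing scheme and function names here are placeholders until we agree on the mechanism:

```python
import hashlib

def sentence_id(doc_name, sentence):
    """Stable ID for a sentence: hash of the document name plus the text.

    Placeholder scheme; the final mechanism is still to be decided.
    """
    return hashlib.sha1(f"{doc_name}\x00{sentence}".encode("utf-8")).hexdigest()

def check_alignment(dataset_ids, model_output_ids):
    """Raise if the two files do not hold the same sentences in the same order."""
    if list(dataset_ids) != list(model_output_ids):
        raise ValueError("sample_dataset and sample_model_output are misaligned")
```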

@thefirebanks (Contributor, Author)

Updated the input files in the Google Drive folder; I will update the data loader tomorrow before midnight EST!

@thefirebanks (Contributor, Author)

Tried loading sentences from ElSalvador.json and they loaded successfully!

[Screenshot: notebook output showing sentences loaded from ElSalvador.json, Dec 10, 2020]
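
For reference, the call was roughly the following; the import path and the loader's return shape are approximated here, so treat this as a sketch rather than the exact code in the notebook:

```python
# Approximate reproduction; the real import path in tasks/data_loader/src
# and the return shape of the loader may differ slightly.
from data_loader import labeled_sentences_from_dataset

sentences = labeled_sentences_from_dataset("input/ElSalvador.json")
print(f"Loaded {len(sentences)} labelled sentences")
print(sentences[0])
```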

thefirebanks merged commit a58f1ad into master on Dec 10, 2020
thefirebanks deleted the issue_17_model_evaluator branch on Dec 14, 2020
jordiplanescutxi pushed a commit that referenced this pull request on Mar 23, 2021: Added data loader and model evaluator
Successfully merging this pull request may close these issues.

Create a general evaluator for the models