Commit

Merge branch '#49-set-up-a-binary-classifier' of https://github.com/w…
…ri-dssg/policy-data-analyzer into #49-set-up-a-binary-classifier
Jordi Planas committed Jan 21, 2021
2 parents fd144ab + 51394a8 commit 9aed2ac
Showing 19 changed files with 19,555 additions and 59 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -7,18 +7,18 @@ Current Roadmap

 ### Phase 1: Augmenting training data

-1. [ ] Fine-tune S-BERT on existing labeled data from 5 countries (`WRI_Policy_Tags.xlsx` file)
-2. [ ] Find methods to improve performance of S-BERT for data augmentation purposes
+1. [X] Fine-tune S-BERT on existing labeled data from 5 countries (`WRI_Policy_Tags.xlsx` file)
+2. [X] Find methods to improve performance of S-BERT for data augmentation purposes
 3. [ ] Build pipeline for further fine tuning as we get more data
-4. [ ] Classify the policy instrument of new sentences from El Salvador and Chile policy documents
-5. [ ] Manually review the model tags and tag more examples (2 reviewers)
-6. [ ] Build pipeline to create excel documents for manual reviewing/tagging
+4. [X] Classify the policy instrument of new sentences from El Salvador and Chile policy documents
+5. [X] Manually review the model tags and tag more examples (2 reviewers)
+6. [X] Build pipeline to create excel documents for manual reviewing/tagging
 7. [ ] Explore other models if needed

 ### Phase 2: Modeling

-1. [ ] Develop a model to first identify whether a sentence contains an incentive instrument, or is an icentive at all
-2. [ ] Develop a model that classifies incentive instruments (direct payment, tax deduction, etc.)
+1. [X] Develop a model to first identify whether a sentence contains an incentive instrument, or is an icentive at all
+2. [X] Develop a model that classifies incentive instruments (direct payment, tax deduction, etc.)

 -------------------------------------

309 changes: 258 additions & 51 deletions tasks/evaluate_model/notebooks/Plot_F1.ipynb

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions tasks/evaluate_model/notebooks/avg-f1-results-compilation.csv
@@ -0,0 +1,7 @@
,Experiment number,model,f1-score,epochs,test_perc
0,13,paraphrase-xlm-r-multilingual-v1,0.5,6,0.15
1,14,paraphrase-xlm-r-multilingual-v1,0.52,8,0.3
2,15,paraphrase-xlm-r-multilingual-v1,0.58,10,0.2
3,12,paraphrase-xlm-r-multilingual-v1,0.72,6,0.2
4,10,paraphrase-xlm-r-multilingual-v1,0.75,10,0.3
5,11,paraphrase-xlm-r-multilingual-v1,0.67,8,0.25
@@ -0,0 +1,7 @@
,Experiment number,model,f1-score,epochs,test_perc
0,13,paraphrase-xlm-r-multilingual-v1,0.4,6,0.15
1,14,paraphrase-xlm-r-multilingual-v1,0.48,8,0.3
2,15,paraphrase-xlm-r-multilingual-v1,0.65,10,0.2
3,12,paraphrase-xlm-r-multilingual-v1,0.71,8,0.2
4,10,paraphrase-xlm-r-multilingual-v1,0.82,10,0.3
5,11,paraphrase-xlm-r-multilingual-v1,0.66,8,0.25
7 changes: 7 additions & 0 deletions tasks/evaluate_model/output/avg-f1-results-compilation.csv
@@ -0,0 +1,7 @@
,Experiment number,model,f1-score,epochs,test_perc
0,0,paraphrase-xlm-r-multilingual-v1,0.59,8,0.15
1,1,paraphrase-xlm-r-multilingual-v1,0.57,4,0.15
2,3,paraphrase-xlm-r-multilingual-v1,0.55,6,0.3
3,4,stsb-xlm-r-multilingual,0.72,4,0.3
4,5,paraphrase-xlm-r-multilingual-v1,0.51,8,0.15
5,2,paraphrase-xlm-r-multilingual-v1,0.79,10,0.25
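
To read a results table like the one above, a minimal sketch along these lines may help. It parses the rows with Python's standard `csv` module and ranks the experiments by F1 score. The rows are copied verbatim from the hunk above; the helper names `load_results` and `rank_by_f1` are hypothetical and do not come from the repository, and the notebooks in this commit may process the file differently.

```python
import csv
import io

# Rows copied verbatim from the hunk above. In the repository they would be
# read from tasks/evaluate_model/output/avg-f1-results-compilation.csv instead.
CSV_TEXT = """\
,Experiment number,model,f1-score,epochs,test_perc
0,0,paraphrase-xlm-r-multilingual-v1,0.59,8,0.15
1,1,paraphrase-xlm-r-multilingual-v1,0.57,4,0.15
2,3,paraphrase-xlm-r-multilingual-v1,0.55,6,0.3
3,4,stsb-xlm-r-multilingual,0.72,4,0.3
4,5,paraphrase-xlm-r-multilingual-v1,0.51,8,0.15
5,2,paraphrase-xlm-r-multilingual-v1,0.79,10,0.25
"""


def load_results(text):
    """Parse the compilation CSV into dicts with numeric fields converted.

    Hypothetical helper, not part of the repository.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "experiment": int(raw["Experiment number"]),
            "model": raw["model"],
            "f1": float(raw["f1-score"]),
            "epochs": int(raw["epochs"]),
            "test_perc": float(raw["test_perc"]),
        })
    return rows


def rank_by_f1(rows):
    """Return the rows sorted from highest to lowest F1 score."""
    return sorted(rows, key=lambda r: r["f1"], reverse=True)


results = load_results(CSV_TEXT)
ranked = rank_by_f1(results)
best = ranked[0]
print(f"best: experiment {best['experiment']} ({best['model']}), F1={best['f1']}")
```

On these rows the top entry is experiment 2 (10 epochs, `test_perc` 0.25) with an F1 of 0.79. Each configuration appears only once in the table, so the ranking describes these runs rather than establishing that a particular setting is better in general.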
@@ -0,0 +1,7 @@
,Experiment number,model,f1-score,epochs,test_perc
0,13,paraphrase-xlm-r-multilingual-v1,0.4,6,0.15
1,14,paraphrase-xlm-r-multilingual-v1,0.48,8,0.3
2,15,paraphrase-xlm-r-multilingual-v1,0.65,10,0.2
3,12,paraphrase-xlm-r-multilingual-v1,0.71,8,0.2
4,10,paraphrase-xlm-r-multilingual-v1,0.82,10,0.3
5,11,paraphrase-xlm-r-multilingual-v1,0.66,8,0.25
19,263 changes: 19,262 additions & 1 deletion tasks/incentive_classifier/notebooks/BinaryClassifierGoogleColab.ipynb

Large diffs are not rendered by default.
