Some notes:
- polarity has not been assigned to suggestions
- each row in the `observations` table represents a single reading example, possibly aggregated across multiple evaluators
- e.g. if a reading example received 3 negative and 1 positive `EXPRESSION` remarks, it is assigned `-2` in the `EXPRESSION` column
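
The scoring rule in the last bullet can be sketched as follows. The long-format `remarks` frame and its column names (`example_id`, `category`, `polarity`) are assumptions made for illustration; the raw remark data is not shown in this notebook.

```python
import pandas as pd

# Hypothetical raw remarks: one row per evaluator remark (not the real data).
remarks = pd.DataFrame({
    "example_id": [0, 0, 0, 0, 0],
    "category": ["EXPRESSION"] * 4 + ["RATE"],
    "polarity": ["NEGATIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"],
})

# Map polarity to +1 / -1, then sum per reading example and category.
remarks["score"] = remarks["polarity"].map({"POSITIVE": 1, "NEGATIVE": -1})
observations_sketch = remarks.pivot_table(
    index="example_id",
    columns="category",
    values="score",
    aggfunc="sum",
    fill_value=0,
)

print(observations_sketch)
```

Three negative and one positive `EXPRESSION` remarks sum to `-2` here, matching the example in the note above.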

In [1]:
import pandas as pd

In [2]:
observations = pd.read_csv(
    '.obs.tsv.tmp',
    sep='\t'
)

In [3]:
print(observations.shape)
observations.head()

(70, 19)


Unnamed: 0,ACCURACY,EXPRESSION,FLUENCY,MONITORING_FOR_MEANING,MORPHOLOGY,MULTISYLLABIC_WORDS,OMISSION_INSERTION,PHONICS,PHRASING,PRONUNCIATION,PUNCTUATION,RATE,SELF_CORRECTION,SIGHT_WORDS,SUBSTITUTION_REVERSAL,VOCABULARY,WORD_ATTACK,WORD_BY_WORD,WORD_ENDINGS
0,1,0,0,-1,0,0,0,0,0,0,-1,1,1,0,0,1,0,0,0
1,1,-1,1,0,0,0,0,1,1,0,1,2,0,0,0,1,0,0,0
2,1,1,-1,0,0,-1,0,0,1,0,0,1,0,1,0,0,0,0,0
3,0,0,0,1,0,0,0,0,-1,0,0,0,2,0,-1,0,0,0,0
4,0,-1,1,0,0,1,0,1,-2,0,0,-1,1,0,-1,0,0,0,0


In [4]:
suggestions = pd.read_csv(
    '.sug.tsv.tmp',
    sep='\t'
)

In [5]:
print(suggestions.shape)
suggestions.head()

(70, 23)


Unnamed: 0,ARTICULATION,DIFFICULTY,EXPRESSION,FLUENCY,MEANING_COMPRENHENSION,MORPHOLOGY,MULTISYLLABIC_WORDS,OMISSIONS_INSERTIONS,PHONICS,PHRASING,...,SELF_CORRECTION,SELF_MONITOR,SELF_MONITORING,SIGHT_WORD,SIGHT_WORDS,SUBSTITUTIONS_REVERSALS,VOCABULARY,VOICE,WORD_ATTACK,WORD_ENDINGS
0,0,2,1,2,2,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,2,4,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,3,1,0,1,0,1,1,0,0,...,1,0,0,0,0,0,1,0,0,0
3,0,1,0,2,2,0,0,0,0,0,...,0,0,0,1,0,1,0,0,1,0
4,0,2,3,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [6]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

Splitting into training and testing. With `test_size=1/7`, the 70 examples split into 60 for training and 10 for testing. That is a very small test set, but all we need right now is a baseline.

A random forest should be a reliable model for this task, so for consistency it is the only model I use.

In [7]:
# combined_obs_sug = pd.concat([observations, suggestions], axis=1)
X_train, X_test, Y_train, Y_test = train_test_split(
    observations,
    suggestions,
    test_size=(1/7),
    random_state=0
)

In [8]:
# One independent forest per suggestion category (one per column of `suggestions`)
forests = [
    RandomForestClassifier(
        n_estimators=10,
        random_state=42
    )
    for _ in suggestions.columns
]

(default settings: `class_weight=None` and the default Gini split criterion)

In [9]:
for idx, category in enumerate(suggestions.columns):
    forests[idx].fit(X_train, Y_train[category])

In [10]:
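# `score` returns the mean accuracy on the held-out examples for one category
scrs = []
for idx, category in enumerate(suggestions.columns):
    scr = forests[idx].score(X_test, Y_test[category])
    scrs.append(scr)

# Unweighted mean of the per-category accuracies
print('Macro-averaged score: {:.1f}%'.format(100 * sum(scrs) / len(scrs)))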

Macro-averaged score: 68.3%
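
The macro-averaged figure is the unweighted mean of the per-category accuracies, since `score` on a scikit-learn classifier returns mean accuracy. A minimal sketch of that arithmetic with made-up labels (the two categories and all values below are illustrative, not taken from the data):

```python
# Hypothetical true and predicted suggestion tallies for two categories.
y_true = {"FLUENCY": [0, 1, 0, 2], "PHONICS": [0, 0, 0, 1]}
y_pred = {"FLUENCY": [0, 1, 1, 2], "PHONICS": [0, 0, 0, 1]}


def accuracy(true, pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


# Accuracy per category, then a plain (unweighted) mean across categories.
per_category = {c: accuracy(y_true[c], y_pred[c]) for c in y_true}
macro = sum(per_category.values()) / len(per_category)

print('Macro-averaged score: {:.1f}%'.format(100 * macro))
```

Every category counts equally in this average, no matter how rarely its suggestion appears in the data.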


Quick note: I took an egregious number of liberties while making this baseline model.

Again, here are some areas where I cut corners:

- I treated `POSITIVE` and `NEGATIVE` observations as `+1`/`-1` and simply summed them up over each reading example

- The only processing I did on the `$SUG`s was to tally, for each reading example, how many times a suggestion was given in each category. No polarity and no information beyond the category itself.
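
The suggestion tally described in the second bullet can be sketched like this. The `raw_suggestions` frame and its column names are hypothetical stand-ins; the untallied suggestion data is not shown in this notebook.

```python
import pandas as pd

# Hypothetical raw suggestions: one row per suggestion given (not the real data).
raw_suggestions = pd.DataFrame({
    "example_id": [0, 0, 0, 1, 1],
    "category": ["FLUENCY", "FLUENCY", "DIFFICULTY", "EXPRESSION", "FLUENCY"],
})

# Count how often each category was suggested for each reading example.
suggestions_sketch = pd.crosstab(
    raw_suggestions["example_id"],
    raw_suggestions["category"],
)

print(suggestions_sketch)
```

Because the targets are counts rather than flags, each forest predicts a tally (0, 1, 2, ...) for its category, so each per-category problem is multi-class rather than binary.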

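A baseline macro-averaged accuracy of about 68% seems reasonable, though with only 10 test examples, each one shifts a category's score by 10 percentage points.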
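
One way to put that number in context is to compare it against a predictor that always outputs the most frequent tally in each category. This comparison was not part of the run above. The sketch uses randomly generated stand-in arrays with the same shapes as `observations` and `suggestions`, so its printed score says nothing about the real data; `DummyClassifier` is scikit-learn's built-in trivial baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Stand-in data with the same shapes as above (70 x 19 and 70 x 23); values are random.
rng = np.random.default_rng(0)
X = rng.integers(-2, 3, size=(70, 19))
Y = rng.integers(0, 3, size=(70, 23))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=(1/7), random_state=0)

# One majority-class predictor per suggestion category.
dummy_scrs = []
for col in range(Y.shape[1]):
    dummy = DummyClassifier(strategy="most_frequent")
    dummy.fit(X_tr, Y_tr[:, col])
    dummy_scrs.append(dummy.score(X_te, Y_te[:, col]))

print('Macro-averaged dummy score: {:.1f}%'.format(100 * sum(dummy_scrs) / len(dummy_scrs)))
```

Running the same loop on the real `X_train`, `X_test`, `Y_train`, and `Y_test` would show how much of the 68% the forests earn beyond simply guessing the commonest tally.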

With new class weights (label `1` weighted 10 times as heavily as label `0`):

In [11]:
new_forests = [
    RandomForestClassifier(
        n_estimators=10,
        class_weight={0: 1, 1: 10},
        random_state=42
    )
    for _ in suggestions.columns
]

new_scrs = []
for idx, category in enumerate(suggestions.columns):
    new_forests[idx].fit(X_train, Y_train[category])
    scr = new_forests[idx].score(X_test, Y_test[category])
    new_scrs.append(scr)

print('Macro-averaged score: {:.1f}%'.format(100 * sum(new_scrs) / len(new_scrs)))

Macro-averaged score: 66.1%
