In [2]:
import sys; sys.path.append('../') #<- for running in current location

This just points to the top level of the library, called `transformer_tools`. To install it, go to that location and run `pip install -r requirements.txt` (see further directions in `README.md`). I would suggest using a virtual environment such as [miniconda](https://docs.conda.io/en/latest/miniconda.html).

In [3]:
import logging; logging.basicConfig(level=logging.ERROR) #<- preferred logging level here

In [4]:
from transformer_tools import get_config 
from transformer_tools.Tagger import TaggerModel 

Above are the imports needed for building a configuration and the main `TaggerModel` that drives the polarity tagger.

In [7]:
config = get_config('transformer_tools.Tagger') #<- initialize a generic configuration

In [8]:
config.wandb_model = "polarity_projection/polarity_model_runs/distilbert_combined_model:v0"
config.wandb_entity = "polarity_projection"

To run an existing model, as we do here, specify the location of the target model, `wandb_model`, and the `wandb_entity`. I'm using the platform [wandb](https://wandb.ai/) to host experiments, datasets and models. Installing the packages in `requirements.txt` also installs the `wandb` Python API; however, to get access to this project you also need to register at wandb and provide your `WANDB_API_KEY` (which can be set globally with `export WANDB_API_KEY=xxxxxxxxxxxxxx`). The Python API will do the rest for you: automatic download of datasets/models, caching, etc.
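If you would rather check for the key from inside the notebook than in the shell, a minimal sketch follows. It uses only the standard library; the helper name `has_wandb_key` is hypothetical and is not part of `transformer_tools` or `wandb`.

```python
import os

def has_wandb_key(environ=None):
    """Return True if a non-empty WANDB_API_KEY is present (hypothetical helper)."""
    # Default to the real process environment; a plain dict can be passed for testing.
    environ = os.environ if environ is None else environ
    return bool(environ.get("WANDB_API_KEY", "").strip())

if not has_wandb_key():
    print("WANDB_API_KEY is not set; run `export WANDB_API_KEY=...` or `wandb login` first.")
```

Running this before building the model makes a missing key visible early, rather than at download time.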

**AVAILABLE MODELS**

I don't have every combination of models and datasets available, but here are the ones I currently have (switch between them by changing `config.wandb_model` above).

**roberta_combined_hai_model**, address: [polarity_projection/polarity_model_runs/roberta_combined_hai_model:v0]. A `RoBERTa` model trained on all data concatenated together using Hai's tagger.

**roberta_combined_model**, address: [polarity_projection/polarity_model_runs/roberta_combined_model:v0]. Same as above, but uses Eric's tagger output.

**distilbert_combined_hai**, address: [polarity_projection/polarity_model_runs/distilbert_combined_hai_model:v0]. Same as above, but uses a `DistilBert` model.

**distilbert_med_hai**, address: [polarity_projection/polarity_model_runs/distilbert_med_hai_model:v0]. `DistilBert` on MED.

**distilbert_larry_synthetic**, address: [polarity_projection/polarity_model_runs/distilbert_larry_synthetic_model:v0]. `DistilBert` on Larry's synthetic data.
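To avoid retyping addresses when switching models, the list above can be kept in a small lookup table. This is only a sketch: `AVAILABLE_MODELS` and `model_address` are hypothetical names introduced here, and the addresses are copied from the list above.

```python
# Addresses copied from the list above, keyed by artifact name.
PROJECT = "polarity_projection/polarity_model_runs"
AVAILABLE_MODELS = {
    name: f"{PROJECT}/{name}:v0"
    for name in (
        "roberta_combined_hai_model",
        "roberta_combined_model",
        "distilbert_combined_hai_model",
        "distilbert_med_hai_model",
        "distilbert_larry_synthetic_model",
    )
}

def model_address(name):
    """Look up the wandb address for one of the listed models (hypothetical helper)."""
    try:
        return AVAILABLE_MODELS[name]
    except KeyError:
        known = ", ".join(sorted(AVAILABLE_MODELS))
        raise ValueError(f"unknown model {name!r}; choose from: {known}") from None

# Usage with the `config` object created earlier:
# config.wandb_model = model_address("distilbert_med_hai_model")
```

A misspelled name then fails immediately with the list of valid choices, instead of failing later during the download.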

In [9]:
model = TaggerModel(config)

[34m[1mwandb[0m: Currently logged in as: [33myakazimir[0m (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: wandb version 0.10.19 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


[34m[1mwandb[0m: Downloading large artifact distilbert_combined_model:v0, 253.42MB. 9 files... 


In [10]:
model.query("Every company failed to make a profit")

'Every↑ company= failed↑ to↑ make↓ a↓ profit↓'

As shown above, you can query the model with `model.query(target_string)`, which returns an arrow-tagged string (in this case, the model gets the analysis wrong). With the option `convert_to_string=False` (see below), it returns a list representation instead: one tuple per token, holding the token, its tag, and a score (presumably the model's confidence in that tag).

In [11]:
model.query("Every company failed to make a profit",convert_to_string=False)

[('Every', '↑', 0.99168396),
 ('company', '=', 0.92892534),
 ('failed', '↑', 0.9571423),
 ('to', '↑', 0.55873585),
 ('make', '↓', 0.6602871),
 ('a', '↓', 0.52001804),
 ('profit', '↓', 0.6960268)]
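The list form is convenient for spotting where the tagger is unsure. The sketch below assumes the third element of each tuple is a confidence score; the helper `low_confidence` and the 0.7 threshold are arbitrary choices for illustration, not part of `transformer_tools`.

```python
# Output of model.query(..., convert_to_string=False), copied from the cell above.
tagged = [
    ('Every', '↑', 0.99168396),
    ('company', '=', 0.92892534),
    ('failed', '↑', 0.9571423),
    ('to', '↑', 0.55873585),
    ('make', '↓', 0.6602871),
    ('a', '↓', 0.52001804),
    ('profit', '↓', 0.6960268),
]

def low_confidence(tagged_tokens, threshold=0.7):
    """Return the (token, tag, score) tuples whose score is below the threshold."""
    return [item for item in tagged_tokens if item[2] < threshold]

for token, tag, score in low_confidence(tagged):
    print(f"{token}{tag}  {score:.2f}")
```

On this sentence the threshold flags `to`, `make`, `a` and `profit`, which is the span after `failed` where the analysis goes wrong.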

In [12]:
model.query("Every dog ran in the yard")

'Every↑ dog↓ ran↑ in↑ the↑ yard='

In [13]:
model.query("Every doctor knew a nurse")

'Every↑ doctor= knew↑ a↑ nurse↑'

In [14]:
model.query("No alien died without reading news magazines")

'No↑ alien↓ died↓ without↓ reading↑ news↑ magazines↑'

In [15]:
model.query("It 's not a crime to steal from a thief")

"It↑ 's↓ not↑ a↓ crime↓ to↓ steal↓ from↓ a↓ thief↓"

In [16]:
model.query(
    "It 's not a crime to steal from a thief",
    convert_to_string=False
)

[('It', '↑', 0.8468745),
 ("'s", '↓', 0.9052322),
 ('not', '↑', 0.7273171),
 ('a', '↓', 0.9505269),
 ('crime', '↓', 0.9752001),
 ('to', '↓', 0.9190763),
 ('steal', '↓', 0.97653013),
 ('from', '↓', 0.9277706),
 ('a', '↓', 0.9128494),
 ('thief', '↓', 0.9567506)]
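The two output formats carry the same tags, so the arrow string can be rebuilt from the list form. This is a sketch of that relationship only; `to_arrow_string` is a hypothetical helper, not the library's own conversion routine.

```python
# Output of model.query(..., convert_to_string=False), copied from the cell above.
tagged = [
    ('It', '↑', 0.8468745),
    ("'s", '↓', 0.9052322),
    ('not', '↑', 0.7273171),
    ('a', '↓', 0.9505269),
    ('crime', '↓', 0.9752001),
    ('to', '↓', 0.9190763),
    ('steal', '↓', 0.97653013),
    ('from', '↓', 0.9277706),
    ('a', '↓', 0.9128494),
    ('thief', '↓', 0.9567506),
]

def to_arrow_string(tagged_tokens):
    """Join each token with its tag, separated by spaces, ignoring the scores."""
    return " ".join(f"{token}{tag}" for token, tag, _ in tagged_tokens)

print(to_arrow_string(tagged))
```

The result matches the string returned by the default `model.query` call on the same sentence.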

In [17]:
model.model.args.use_multiprocessing

True

In [23]:
model.query("At least 3 ")

'Every↑ X↑ company= failed↑ to↑ be↓ profitable↓'