# Tutorial: using PyABSA for aspect term extraction
This tutorial is written for PyABSA v2.0 and higher. v2.0 introduces many breaking changes, so if you rely on code, APIs, checkpoints, datasets, or anything else from v1.0, you do not need to upgrade. Let's begin.

In [1]:
!pip install "pyabsa>=2.0.0"
from pyabsa import AspectTermExtraction as ATEPC




PyABSA(2.0.3): PyABSA v2.x has been refactored; its APIs are now organized by NLP subtasks.
Due to many breaking changes, it is not compatible with 1.x versions.
If you need to use pretrained checkpoints, you can downgrade to 1.x versions.
Please feel free to provide your advice to improve PyABSA
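
Since v1.x and v2.x are not compatible, it can help to confirm which major version is installed before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper names `major_version` and `installed_pyabsa_major` are made up for this example and are not part of PyABSA.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.3'."""
    return int(version.split(".")[0])


def installed_pyabsa_major():
    """Return the installed PyABSA major version, or None if it is not installed."""
    try:
        return major_version(metadata.version("pyabsa"))
    except metadata.PackageNotFoundError:
        return None


major = installed_pyabsa_major()
if major is None:
    print("pyabsa is not installed")
elif major < 2:
    print("pyabsa 1.x detected: this tutorial targets v2.0 and higher")
else:
    print("pyabsa 2.x or later detected")
```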


# ATEPCModelList
There are three types of APC models for aspect term extraction, all based on the local context focus mechanism.
Note: when you select a model, manage its configuration carefully. For example, for GloVe-based models you need to set the hidden dimension and embed_dim manually.
We already provide some pre-defined configurations. Refer to the source code if you have any questions.
For example:

In [3]:
# config = ATEPC.ATEPCConfigManager.get_atepc_config_glove()  # get pre-defined configuration for GloVe model, the default embed_dim=300
config = (
    ATEPC.ATEPCConfigManager.get_atepc_config_english()
)  # this config contains 'pretrained_bert', it is based on pretrained models


# ATEPCDatasetList
These [datasets](https://github.com/yangheng95/ABSADatasets) come from publications or third-party contributions. They can be downloaded and processed automatically.
In PyABSA, you can pass a set of datasets to train a model.
For example, to use an integrated dataset:


In [4]:
from pyabsa import DatasetItem

dataset = ATEPC.ATEPCDatasetList.Restaurant16
# now the dataset is a DatasetItem object, which has a name and a list of subdatasets
# e.g., SemEval dataset contains Laptop14, Restaurant14, Restaurant16 datasets

You can use your own dataset provided that it is formatted according to [ABSADatasets](https://github.com/yangheng95/ABSADatasets#important-rename-your-dataset-filename-before-use-it-in-pyabsa)

In [None]:
# Put your dataset into the integrated_datasets folder. If this folder does not exist, you need to call:
from pyabsa import download_all_available_datasets

download_all_available_datasets()

To pass your own datasets to a PyABSA trainer, wrap them in a DatasetItem:

In [None]:
my_dataset = DatasetItem("my_dataset", ["my_dataset1", "my_dataset2"])
# my_dataset1 and my_dataset2 are the dataset folders. Each folder must contain a training set.
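
Because every dataset folder needs a training set, a quick check before training can save a failed run. The sketch below uses only the standard library and assumes that training files have `train` somewhere in their file name; check the ABSADatasets README linked above for the exact naming rules. The helper name `folders_missing_train_file` is hypothetical.

```python
from pathlib import Path


def folders_missing_train_file(root, folder_names):
    """Return the names of dataset folders that have no file with 'train' in its name."""
    missing = []
    for name in folder_names:
        folder = Path(root) / name
        # Assumption: a training file can be recognized by 'train' in its file name.
        has_train = folder.is_dir() and any(
            "train" in p.name.lower() for p in folder.rglob("*") if p.is_file()
        )
        if not has_train:
            missing.append(name)
    return missing


# Hypothetical usage with the folder names from the cell above.
print(folders_missing_train_file("integrated_datasets", ["my_dataset1", "my_dataset2"]))
```

An empty list means every folder has at least one candidate training file; folders that do not exist are reported as missing too.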

# Training
Let's prepare to train

In [7]:
from pyabsa import ModelSaveOption, DeviceTypeOption

config.batch_size = 8
trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",
    # to resume training from one of our pretrained checkpoints, pass the checkpoint name here
    auto_device=DeviceTypeOption.AUTO,
    path_to_save=None,  # path for saving checkpoints; if None, they are saved in the 'checkpoints' folder
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
    # augmented data is available for some integrated datasets; set load_aug=True to use it and improve performance
)



2022-11-17 15:11:34,562 INFO: PyABSA version: 2.0.3
2022-11-17 15:11:34,563 INFO: Transformers version: 4.21.1
2022-11-17 15:11:34,563 INFO: Torch version: 1.12.1+cuda11.6
2022-11-17 15:11:34,564 INFO: Device: NVIDIA GeForce RTX 3070
2022-11-17 15:11:34,649 INFO: Local dataset version: 2022.11.07
2022-11-17 15:11:34,652 INFO: Remote dataset version: 2022.10.25
2022-11-17 15:11:34,654 INFO: Searching dataset 116.Restaurant16 in local disk...
2022-11-17 15:11:34,694 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2022-11-17 15:11:34,694 INFO: Please use a new folder to perform new text augment if the former augment exited unexpectedly


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
100%|██████████| 1743/1743 [00:02<00:00, 697.25it/s, convert examples to features]

2022-11-17 15:11:41,910 INFO: Dataset Label Details: {'Positive': 1235, 'Neutral': 69, 'Negative': 437, 'Sum': 1741}



100%|██████████| 615/615 [00:00<00:00, 639.62it/s, convert examples to features]

2022-11-17 15:11:43,273 INFO: Dataset Label Details: {'Positive': 467, 'Neutral': 30, 'Negative': 117, 'Sum': 614}



Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['lm_predictions.lm_head.dense.bias', 'mask_predictions.LayerNorm.weight', 'mask_predictions.classifier.weight', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.bias', 'mask_predictions.dense.bias', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.dense.weight', 'mask_predictions.classifier.bias', 'lm_predictions.lm_head.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


2022-11-17 15:11:44,938 INFO: Save cache dataset to lcf_atepc.Restaurant16.dataset.0eb022898101db27269d2b97c8077ce23e288daf367f6040c0ea7d36469b35b5.cache
2022-11-17 15:11:45,281 INFO: cuda memory allocated:770206720
2022-11-17 15:11:45,281 INFO: ABSADatasetsVersion:2022.11.07	-->	Calling Count:0
2022-11-17 15:11:45,282 INFO: IOB_label_to_index:{'B-ASP': 1, 'I-ASP': 2, 'O': 3, '[CLS]': 4, '[SEP]': 5}	-->	Calling Count:3
2022-11-17 15:11:45,283 INFO: MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x000002072C883D00>	-->	Calling Count:0
2022-11-17 15:11:45,284 INFO: PyABSAVersion:2.0.3	-->	Calling Count:3
2022-11-17 15:11:45,284 INFO: SRD:3	-->	Calling Count:14130
2022-11-17 15:11:45,285 INFO: TorchVersion:1.12.1+cuda11.6	-->	Calling Count:3
2022-11-17 15:11:45,285 INFO: TransformersVersion:4.21.1	-->	Calling Count:3
2022-11-17 15:11:45,286 INFO: auto_device:True	-->	Calling Count:10
2022-11-17 15:11:45,286 INFO: batch_size:8	-->	Calling Count:13
2022-11-17 15:11:45,28

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_pos, log_pos * sign).astype(np.int)
 28%|██▊       | 62/218 [01:06<02:46,  1.07s/it, Epoch:0 | loss_apc:0.8569 | loss_ate:0.3008 | APC_ACC: 71.5(max:76.06) | APC_F1: 27.79(max:28.8) | ATE_F1: 57.18(max:57.18)]


KeyboardInterrupt: 
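
The two `Dataset Label Details` lines in the log above show that the sentiment labels are far from evenly distributed. As a rough aid for reading the APC_ACC and APC_F1 values in the progress bar, the sketch below turns those logged counts into class shares. The helper name `class_shares` is made up for this example, and the log does not say which split each line belongs to, so they are just called first and second here.

```python
def class_shares(label_counts):
    """Convert label counts into fractions, ignoring the 'Sum' entry."""
    counts = {k: v for k, v in label_counts.items() if k != "Sum"}
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Label details copied from the training log above.
first_split = {"Positive": 1235, "Neutral": 69, "Negative": 437, "Sum": 1741}
second_split = {"Positive": 467, "Neutral": 30, "Negative": 117, "Sum": 614}

for name, labels in [("first", first_split), ("second", second_split)]:
    shares = class_shares(labels)
    print(name, {label: f"{share:.1%}" for label, share in shares.items()})
```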

To load the trained model for inference:

In [None]:
aspect_extractor = trainer.load_trained_model()
assert isinstance(aspect_extractor, ATEPC.AspectExtractor)

# Inference

## Use our checkpoints to initialize an AspectExtractor

In [8]:
from pyabsa import available_checkpoints

ckpts = available_checkpoints()
# find a suitable checkpoint and use the name:
aspect_extractor = ATEPC.AspectExtractor(
    checkpoint="english"
)  # here I use the english checkpoint which is trained on all English datasets in PyABSA

Downloading checkpoint:english ...
Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets


579MB [00:17, 32.61MB/s, Downloading checkpoint...]                          

Find zipped checkpoint: ./checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43.zip, unzipping...





Done.
If the auto-downloading failed, please download it via browser: https://huggingface.co/spaces/yangheng/PyABSA/resolve/main/checkpoints/English/ATEPC/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43.zip
Load aspect extractor from ./checkpoints\ATEPC_ENGLISH_CHECKPOINT
config: ./checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43\fast_lcf_atepc.config
state_dict: ./checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43\fast_lcf_atepc.state_dict
model: None
tokenizer: ./checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43\fast_lcf_atepc.tokenizer
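
The checkpoint name in the output above appears to embed the scores the checkpoint reached (`apcacc`, `apcf1`, `atef1`). When comparing several checkpoints, it can be handy to pull these numbers out of the name. This is a small sketch based only on the naming seen in this output, not on a documented PyABSA format; `metrics_from_checkpoint_name` is a hypothetical helper.

```python
import re

# Pattern for the metric suffixes seen in the checkpoint name above (an assumption
# about the naming scheme, not a documented PyABSA format).
METRIC_PATTERN = re.compile(r"(apcacc|apcf1|atef1)_(\d+(?:\.\d+)?)")


def metrics_from_checkpoint_name(name):
    """Extract the metric values embedded in a checkpoint folder or zip name."""
    return {key: float(value) for key, value in METRIC_PATTERN.findall(name)}


name = "fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43"
print(metrics_from_checkpoint_name(name))
# {'apcacc': 82.36, 'apcf1': 81.89, 'atef1': 75.43}
```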



## Simple Prediction

In [10]:
atepc_examples = [
    "But the staff was so nice to us .",
    "But the staff was so horrible to us .",
    r"Not only was the food outstanding , but the little ` perks \' were great .",
    "It took half an hour to get our check , which was perfect since we could sit , have drinks and talk !",
    "It was pleasantly uncrowded , the service was delightful , the garden adorable , "
    "the food -LRB- from appetizers to entrees -RRB- was delectable .",
    "How pretentious and inappropriate for MJ Grill to claim that it provides power lunch and dinners !",
]

for ex in atepc_examples:
    aspect_extractor.predict(
        example=ex,
        print_result=True,
        ignore_error=True,  # ignore an invalid example, if it is False, invalid examples will raise Exceptions
        eval_batch_size=32,
    )



The results of aspect term extraction have been saved in D:\Works\PyABSA\examples-v2\aspect_term_extraction\atepc_inference.result.json
Example 0: But the <staff:Positive Confidence:0.999491810798645> was so nice to us .
The results of aspect term extraction have been saved in D:\Works\PyABSA\examples-v2\aspect_term_extraction\atepc_inference.result.json
Example 0: But the <staff:Negative Confidence:0.9985008239746094> was so horrible to us .
The results of aspect term extraction have been saved in D:\Works\PyABSA\examples-v2\aspect_term_extraction\atepc_inference.result.json
Example 0: Not only was the <food:Positive Confidence:0.9992227554321289> outstanding , but the little ` <perks:Positive Confidence:0.9973457455635071> \ ' were great .
The results of aspect term extraction have been saved in D:\Works\PyABSA\examples-v2\aspect_term_extraction\atepc_inference.result.json
Example 0: It took half an hour to get our <check:Neutral Confidence:0.

## Batch Inference

In [12]:
aspect_extractor.batch_predict(
    inference_source=ATEPC.ATEPCDatasetList.Restaurant16,
    print_result=True,
    save_result=False,
    ignore_error=True,
    eval_batch_size=32,
)

Try to load 116.Restaurant16 dataset from local disk
loading: integrated_datasets\apc_datasets\110.SemEval\116.restaurant16\restaurant_test.raw.inference


100%|██████████| 422/422 [00:00<00:00, 1056.77it/s, preparing apc inference dataloader...]
100%|██████████| 14/14 [00:08<00:00,  1.70it/s, extracting aspect terms...]
100%|██████████| 710/710 [00:01<00:00, 589.13it/s, preparing apc inference dataloader...]
100%|██████████| 23/23 [00:14<00:00,  1.63it/s, classifying aspect sentiments...]


Example 0: serves really good <sushi:Positive Confidence:0.9918943047523499> .
Example 1: not the biggest <portions:Negative Confidence:0.979500949382782> but adequate .
Example 2: green tea creme brulee is a must !
Example 3:   – i ca n ' t say enough about <this:Positive Confidence:0.9954694509506226> place .
Example 4: it has great <sushi:Positive Confidence:0.9995418787002563> and even better <service:Positive Confidence:0.9995410442352295> .
Example 5: the entire <staff:Positive Confidence:0.9995019435882568> was extremely accomodating and tended to my every need .
Example 6: i ' ve been to this <restaurant:Positive Confidence:0.9987316727638245> over a dozen times with no complaints to date .
Example 7: the <owner:Negative Confidence:0.9984574317932129> is belligerent to guests that have a complaint .
Example 8: good <food:Positive Confidence:0.9994831085205078> !
Example 9: this is a great place to 

[{'sentence': 'serves really good sushi .',
  'IOB': ['B-ASP', 'O', 'O', 'B-ASP', 'O'],
  'tokens': ['serves', 'really', 'good', 'sushi', '.'],
  'aspect': ['serves', 'sushi'],
  'position': [[0, 3], [0, 3]],
  'sentiment': ['Neutral', 'Positive'],
  'probs': [[0.0004788661899510771, 0.6750665307044983, 0.3244546353816986],
   [0.0002311569405719638, 0.007874456234276295, 0.9918943047523499]],
  'confidence': [0.6750665307044983, 0.9918943047523499]},
 {'sentence': 'not the biggest portions but adequate .',
  'IOB': ['O', 'O', 'O', 'B-ASP', 'O', 'O', 'O'],
  'tokens': ['not', 'the', 'biggest', 'portions', 'but', 'adequate', '.'],
  'aspect': ['portions'],
  'position': [[3]],
  'sentiment': ['Negative'],
  'probs': [[0.979500949382782, 0.00579943647608161, 0.014699668623507023]],
  'confidence': [0.979500949382782]},
 {'sentence': 'green tea creme brulee is a must !',
  'IOB': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'],
  'tokens': ['green', 'tea', 'creme', 'brulee', 'is', 'a', 'must', '
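
The returned list of dictionaries can be post-processed with plain Python. The sketch below works on two entries copied from the output above and shows two things: how the `IOB` tags line up with `tokens` to give the `aspect` list, and how to keep only the aspects whose `confidence` passes a threshold. Both helper functions and the 0.9 threshold are illustrative choices, not part of PyABSA.

```python
def aspects_from_iob(tokens, iob_tags):
    """Rebuild aspect terms from IOB tags: B-ASP starts a term, I-ASP continues it."""
    aspects, current = [], []
    for token, tag in zip(tokens, iob_tags):
        if tag == "B-ASP":
            if current:
                aspects.append(" ".join(current))
            current = [token]
        elif tag == "I-ASP" and current:
            current.append(token)
        else:
            if current:
                aspects.append(" ".join(current))
            current = []
    if current:
        aspects.append(" ".join(current))
    return aspects


def confident_aspects(results, min_confidence=0.9):
    """Flatten results into (aspect, sentiment, confidence) rows above a threshold."""
    rows = []
    for result in results:
        for aspect, sentiment, confidence in zip(
            result["aspect"], result["sentiment"], result["confidence"]
        ):
            if confidence >= min_confidence:
                rows.append((aspect, sentiment, confidence))
    return rows


# Two entries copied from the output above, trimmed to the fields used here.
results = [
    {
        "tokens": ["serves", "really", "good", "sushi", "."],
        "IOB": ["B-ASP", "O", "O", "B-ASP", "O"],
        "aspect": ["serves", "sushi"],
        "sentiment": ["Neutral", "Positive"],
        "confidence": [0.6750665307044983, 0.9918943047523499],
    },
    {
        "tokens": ["not", "the", "biggest", "portions", "but", "adequate", "."],
        "IOB": ["O", "O", "O", "B-ASP", "O", "O", "O"],
        "aspect": ["portions"],
        "sentiment": ["Negative"],
        "confidence": [0.979500949382782],
    },
]

for result in results:
    print(aspects_from_iob(result["tokens"], result["IOB"]))
print(confident_aspects(results))
```

With the 0.9 threshold, the low-confidence `serves` aspect from the first sentence is dropped while `sushi` and `portions` are kept.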

# Annotate your own datasets via PyABSA
[Auto-annotation](https://github.com/yangheng95/ABSADatasets#auto-annoate-your-datasets-via-pyabsa) (currently available for v1.0 only)

[Manual annotation](https://github.com/yangheng95/ABSADatasets/tree/v1.2/DPT)

# Deploy an ATEPC demo
TBC ...
