# Testing class

`Dataset` is a template class for performing Aspect-Based Sentiment Analysis (ABSA) on retrieved data. Its template method, `__init__()`, fixes the sequence of steps used to process the data and then perform ABSA. The abstract method, `parse()`, is the extension point: developers subclass the template class and override `parse()` to ingest any data type they see fit, provided that after parsing the class property `self.data` stores a pandas DataFrame with a column called "content".
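
The template-method arrangement described above can be sketched as follows. This is a minimal illustration and not the actual `absa_model` code: the class names `TemplateDataset` and `ListOfDictsDataset`, the `_validate()` helper, and the exact order of steps are assumptions made for the example. Only the contract (after `parse()`, `self.data` is a pandas DataFrame with a "content" column) comes from the description above.

```python
from abc import ABC, abstractmethod

import pandas as pd


class TemplateDataset(ABC):
    """Hypothetical stand-in for Dataset (not the real absa_model class)."""

    def __init__(self, raw):
        # Template method: the order of the steps is fixed here.
        self.data = None
        self.parse(raw)    # step 1: subclass-specific ingestion
        self._validate()   # step 2: enforce the output contract
        # The real class would go on to run ABSA on self.data at this point.

    @abstractmethod
    def parse(self, raw):
        """Populate self.data with a DataFrame that has a 'content' column."""

    def _validate(self):
        # Assumed helper: checks the contract stated in the text above.
        if not isinstance(self.data, pd.DataFrame):
            raise TypeError("parse() must store a pandas DataFrame in self.data")
        if "content" not in self.data.columns:
            raise ValueError("self.data needs a column called 'content'")


class ListOfDictsDataset(TemplateDataset):
    """Hypothetical subclass that ingests a list of dictionaries."""

    def parse(self, raw):
        self.data = pd.DataFrame(raw)


example = ListOfDictsDataset([{"id": "a1", "content": "seat was comfortable"}])
print(example.data.columns.tolist())
```

Because the fixed steps live in the base class, a new data source only needs a new `parse()`; a subclass that fails to produce the "content" column is rejected before any analysis would run.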

To use the `absa_model` Python module, you will need the following dependency files:
- ../data/topic_dict.pkl : dictionary containing topic and feature pair
- ../models/lda_model.pkl : LDA model for topic extraction
- ../models/vectorizer.pkl : Vectorizer to convert text to vector
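
Before importing the module, it can help to confirm that these files are actually present. The sketch below is one way to do that with the standard library; the helper name `find_missing` and the `base_dir` parameter are illustrative assumptions, not part of `absa_model`. The paths are copied from the list above. Note that the loading cell later in this notebook reads the topic dictionary from `../models/topic_dict.pkl`, so adjust the list to wherever the file lives in your checkout.

```python
from pathlib import Path

# Paths copied from the dependency list above, relative to the working directory.
REQUIRED_FILES = [
    "../data/topic_dict.pkl",
    "../models/lda_model.pkl",
    "../models/vectorizer.pkl",
]


def find_missing(paths, base_dir="."):
    """Return the entries of `paths` that are not existing files under base_dir."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


missing = find_missing(REQUIRED_FILES)
if missing:
    print("Missing dependency files:", missing)
```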

## Loading JSON object and JSON class

In [1]:
from absa_model import Dataset, JSON_Dataset
import json

def read_json(file_path):
    """Load a JSON file and return its contents, or None if loading fails."""
    try:
        with open(file_path, 'r') as json_file:
            data = json.load(json_file)  # Parse the JSON into Python objects (dict or list)
        return data
    except FileNotFoundError:
        print(f"Error: The file at {file_path} was not found.")
    except json.JSONDecodeError:
        print(f"Error: The file at {file_path} is not a valid JSON file.")
    except Exception as e:
        print(f"An error occurred: {e}")
    return None  # Reached only when one of the errors above was reported

# Example usage:
file_path = '../data/reddit_posts.json'  # Replace with your file path
json_data = read_json(file_path)

if json_data:
    print(json_data)




[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\kengb\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Parsing JSON data using Template Class

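Use this approach when the dependency models (the LDA model, the vectorizer, and the topic dictionary) have already been trained and saved, so they can be loaded from disk and passed to the class directly.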

In [2]:
import pickle
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

with open('../models/lda_model.pkl', 'rb') as f:
    lda_model = pickle.load(f)

with open('../models/vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

with open('../models/topic_dict.pkl', 'rb') as f:
    topic_dict = pickle.load(f)

vader = SentimentIntensityAnalyzer()
test_with_dependencies = JSON_Dataset(json_data, vectorizer=vectorizer, lda_model=lda_model, topic_dict=topic_dict, vader_model=vader)
print(test_with_dependencies.data)

Parsing json objects
Parsed json objects of size (440, 3)
Performing ABSA...
Extracting aspects...
Extracting aspects
Getting sentiment...
ABSA completed
          id                 date  \
0    1g7qgpv  2024-10-20 12:04:51   
1    1g7plv5  2024-10-20 11:12:20   
2    1g7o4ce  2024-10-20 09:44:42   
3    1g7nqmh  2024-10-20 09:23:06   
4    1g7no5o  2024-10-20 09:19:10   
..       ...                  ...   
435  1fwpz6b  2024-10-05 21:06:15   
436  1fwjhgn  2024-10-05 13:40:18   
437  1fwesa2  2024-10-05 09:03:36   
438  1fwelej  2024-10-05 08:53:51   
439  1fwe0df  2024-10-05 08:23:56   

                                               content  \
0    questionhelp hey everyon know long shot though...   
1    airpod left plane left airpod recent flight se...   
2    book ticket togeth im alist confirm day chang ...   
3    elliott southwest airlin begin settlement disc...   
4    silli question app book flight download app it...   
..                                                 ..

In [3]:
import pandas as pd
pd.DataFrame(test_with_dependencies.data).topic.value_counts()

topic
"Flight Integrity Concerns"    318
"Travel Experience"             91
"Travel Troubles"               13
"Travel Essentials"             10
"Sky Lounge Experience"          8
Name: count, dtype: int64

## Testing Datasets without dependencies

In [4]:
import pandas as pd
test_without_dependencies = JSON_Dataset(json_data, dataset_name='airline_test')
print(test_without_dependencies.data)

Parsing json objects
Parsed json objects of size (440, 3)
Preparing dataset for ABSA...
Preparing vectorizer...
Vectorizer saved as airline_test_vectorizer.pkl
Preparing LDA model...
LDA model saved as airline_test_lda_model.pkl
Topic dictionary saved as airline_test_topic_dict.pkl
Preparing VADER model...
VADER model saved as airline_test_vader_model.pkl
Dataset prepared for ABSA
Performing ABSA...
Extracting aspects...
Extracting aspects
Getting sentiment...
ABSA completed
          id                 date  \
0    1g7qgpv  2024-10-20 12:04:51   
1    1g7plv5  2024-10-20 11:12:20   
2    1g7o4ce  2024-10-20 09:44:42   
3    1g7nqmh  2024-10-20 09:23:06   
4    1g7no5o  2024-10-20 09:19:10   
..       ...                  ...   
435  1fwpz6b  2024-10-05 21:06:15   
436  1fwjhgn  2024-10-05 13:40:18   
437  1fwesa2  2024-10-05 09:03:36   
438  1fwelej  2024-10-05 08:53:51   
439  1fwe0df  2024-10-05 08:23:56   

                                               content  \
0    questionhelp

In [5]:
pd.DataFrame(test_with_dependencies.data).topic.value_counts()

topic
"Flight Integrity Concerns"    318
"Travel Experience"             91
"Travel Troubles"               13
"Travel Essentials"             10
"Sky Lounge Experience"          8
Name: count, dtype: int64
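
The log output of the run without dependencies shows four artifacts being saved under the dataset name (`airline_test_vectorizer.pkl`, `airline_test_lda_model.pkl`, `airline_test_topic_dict.pkl`, `airline_test_vader_model.pkl`). A later session could reload them instead of retraining. The sketch below assumes the files are ordinary pickles and that they sit in a directory you pass in; the log does not say where they are written, and the helper names `artifact_paths` and `load_artifacts` are hypothetical.

```python
import pickle
from pathlib import Path

# Suffixes taken from the "saved as" log lines; the directory is an assumption.
ARTIFACT_KINDS = ("vectorizer", "lda_model", "topic_dict", "vader_model")


def artifact_paths(dataset_name, directory="."):
    """Map each artifact kind to <directory>/<dataset_name>_<kind>.pkl."""
    return {
        kind: Path(directory) / f"{dataset_name}_{kind}.pkl"
        for kind in ARTIFACT_KINDS
    }


def load_artifacts(dataset_name, directory="."):
    """Unpickle every artifact file that exists; skip the ones that do not."""
    loaded = {}
    for kind, path in artifact_paths(dataset_name, directory).items():
        if path.is_file():
            with open(path, "rb") as f:
                loaded[kind] = pickle.load(f)
    return loaded
```

If all four files are found, the resulting dictionary has the same keys as the keyword arguments used in the cell that builds `test_with_dependencies`, so it could be passed along as `JSON_Dataset(json_data, **load_artifacts('airline_test'))`. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code.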