# API Authentication Setup and Token Retrieval

This segment of code sets up the API URL and credentials, initializes the `TokenManager` class, and retrieves an authentication token. The token is used for subsequent API requests.

### Code Explanation
- Import necessary modules from the `semtui_refactored` package.
- Define the API URL and user credentials.
- Initialize the `TokenManager` with the API URL, credentials, and required headers.
- Retrieve the authentication token using the `TokenManager`.


In [1]:
# Import necessary classes and functions from the semtui_refactored package
from semtui_refactored.data_manager import DataManager
from semtui_refactored.token_manager import TokenManager
from semtui_refactored.file_reader import FileReader
from semtui_refactored.extension_manager import ExtensionManager
from semtui_refactored.reconciliation_manager import ReconciliationManager
from semtui_refactored.utils import Utility
from semtui_refactored.dataset_manager import DatasetManager
from semtui_refactored.semtui_evals import EvaluationManager

# Set up the API URL and credentials
api_url = "http://localhost:3003/api/"  # The base URL for the API
username = "test"  # Username for authentication
password = "test"  # Password for authentication

# Initialize TokenManager
signin_data = {"username": username, "password": password}  # Payload for sign-in request
signin_headers = {
    "accept": "application/json",  # Specify the response format
    "content-type": "application/json"  # Specify the request content type
}
token_manager = TokenManager(api_url, signin_data, signin_headers)  # Create an instance of TokenManager

# Get token
token = token_manager.get_token()  # Retrieve the authentication token
#print(f"Token: {token}")  # Uncomment to print the token (useful for debugging)
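Hardcoded credentials are convenient for a local test server, but a shared notebook would usually read them from the environment instead. The sketch below shows one way to do that with the standard library. The variable names `SEMT_API_URL`, `SEMT_USERNAME`, and `SEMT_PASSWORD` and the helper `load_signin_config` are hypothetical and are not part of `semtui_refactored`.

```python
import os


def load_signin_config(env=os.environ):
    """Build the API URL and sign-in payload from environment variables.

    The variable names are hypothetical; the defaults mirror the local
    test values used in the cell above.
    """
    api_url = env.get("SEMT_API_URL", "http://localhost:3003/api/")
    signin_data = {
        "username": env.get("SEMT_USERNAME", "test"),
        "password": env.get("SEMT_PASSWORD", "test"),
    }
    return api_url, signin_data


api_url, signin_data = load_signin_config()
```

The returned values can then be passed to `TokenManager(api_url, signin_data, signin_headers)` exactly as in the cell above.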


# Initialization of Managers

This segment of code initializes various manager classes needed for handling data operations, reconciliation, dataset management, evaluations, and extensions. Each manager class is configured with the necessary API URL and authentication token.

### Code Explanation
- Initialize the `DataManager` for handling data-related operations.
- Initialize the `ReconciliationManager` for managing data reconciliation tasks.
- Initialize the `DatasetManager` for managing datasets.
- Initialize the `EvaluationManager` for handling evaluations.
- Initialize the `ExtensionManager` for managing extensions.


In [6]:
# Initialize DataManager
data_manager = DataManager(api_url, username, password)  # Create an instance of DataManager with API URL and credentials

# Initialize ReconciliationManager
reconciliation_manager = ReconciliationManager(api_url, token_manager)  # Create an instance of ReconciliationManager with API URL and token manager

# Initialize DatasetManager
dataset_manager = DatasetManager(api_url, token_manager)  # Create an instance of DatasetManager with API URL and token manager

# Initialize the EvaluationManager
evaluation_manager = EvaluationManager()  # Create an instance of EvaluationManager

# Initialize ExtensionManager
extension_manager = ExtensionManager(api_url, token)  # Create an instance of ExtensionManager with API URL and token


# Configuring Pandas Display Options

This segment of code configures the display options for Pandas DataFrames. Adjusting these settings gives better visibility and control over how DataFrames are presented in the notebook.

### Code Explanation
- Import the Pandas library.
- Set the display option to show all columns of a DataFrame.
- Cap the display at 20 rows; longer DataFrames are shown in truncated form.


In [3]:
import pandas as pd

# Set pandas display options
pd.set_option('display.max_columns', None)  # Display all columns
pd.set_option('display.max_rows', 20)  # Truncate the display of DataFrames longer than 20 rows
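`pd.set_option` changes the settings for the whole session. When the limits should apply to a single display only, `pd.option_context` restores the previous values on exit. The following is a minimal sketch; the helper name `render_preview` and the sample DataFrame are illustrative only.

```python
import pandas as pd


def render_preview(df, max_rows=20):
    """Return the text rendering of df under temporary display limits."""
    with pd.option_context("display.max_columns", None,
                           "display.max_rows", max_rows):
        return repr(df)


# A frame with more rows than the limit is rendered in truncated form
wide = pd.DataFrame({f"col_{i}": range(100) for i in range(30)})
preview_text = render_preview(wide)
```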

# Importing and Displaying CSV Data

This segment of code imports data from a CSV file using the `DataManager` class. It reads the CSV file into a Pandas DataFrame and displays the first few rows. Error handling is included to catch and report any issues that arise during the import.

### Code Explanation
- Define the path to the CSV file.
- Attempt to read the CSV file using the `DataManager` and store it in a DataFrame.
- Print a success message and display the first few rows of the DataFrame.
- Catch and print any errors that occur during the CSV import.


In [4]:
# Path to your CSV file
csv_file_path = "/Users/abubakarialidu/Documents/SEMT-py/semtui_refactored/JOT sample original.csv"  # Define the path to the CSV file

# Read CSV data using DataManager
try:
    df = data_manager.read_csv_data(csv_file_path)  # Read the CSV file into a DataFrame using DataManager
    print("CSV file imported successfully!")  # Print success message
    display(df.head())  # Display the first few rows of the DataFrame
except Exception as e:
    print(f"Error importing CSV file: {e}")  # Print error message if CSV import fails


File '/Users/abubakarialidu/Documents/SEMT-py/semtui_refactored/JOT sample original.csv' read successfully with encoding 'ISO-8859-1'
CSV file imported successfully!


Unnamed: 0,Fecha_id,Cuenta_id,Campaña_id,Grupo_id,Keyword_id,City_id,State_id,Country_id,Keyword,Impresiones,Clicks,Cpc,QualityScore,City,County,Country,Impresiones_q1,Impresiones_q3,Impresiones_med,Impresiones_iqr,Impresiones_max,Impresiones_min,Clicks_q1,Clicks_q3,Clicks_med,Clicks_iqr,Clicks_max,Clicks_min,Cpc_q1,Cpc_q3,Cpc_med,Cpc_iqr,Cpc_max,Cpc_min,Qs_q1,Qs_q3,Qs_med,Qs_iqr,Qs_max,Qs_min
0,20230701,1018571837,17309187968,138000000000.0,299000000000.0,1023619,21168,2840,5th third bank cd rates,4,5,9000000,,Chardon,Ohio,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
1,20230701,1018571837,17309187968,138000000000.0,299000000000.0,1023631,21168,2840,5th third bank cd rates,4,5,9000000,,Cleveland,Ohio,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
2,20230701,1018571837,17309187968,138000000000.0,299000000000.0,1016367,21147,2840,5th third bank cd rates,5,5,9000000,,Chicago,Illinois,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
3,20230701,1018571837,17309187968,138000000000.0,299000000000.0,1027744,21180,2840,5th third bank cd rates,4,5,9000000,,Seattle,Washington,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
4,20230701,1119272776,20325090181,154000000000.0,2240000000000.0,1016359,21147,2840,p n c bank c d rates,3,5,6130000,,Champaign,Illinois,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
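The output above reports that the file was read with encoding 'ISO-8859-1', which suggests that more than one encoding may be tried. How `DataManager` does this is not shown here, so the sketch below is only an assumption about the general technique: try each candidate encoding in order and keep the first one that decodes the file. The helper `detect_encoding` is hypothetical. ISO-8859-1 maps every byte to a character, so it works as a last resort.

```python
def detect_encoding(path, candidates=("utf-8", "ISO-8859-1")):
    """Return the first candidate encoding that decodes the whole file."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"None of {candidates} can decode {path!r}")
```

The detected encoding could then be passed on, for example as `pd.read_csv(csv_file_path, encoding=detect_encoding(csv_file_path))`.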


# Processing DataFrame for Date Conversion

This segment of code processes the DataFrame to convert the 'Fecha_id' column to ISO date format using the `DataManager` class. It includes error handling to manage any issues that arise during the data processing.

### Code Explanation
- Attempt to process the DataFrame to convert the 'Fecha_id' column to ISO format.
- Print a success message and display the first few rows of the processed DataFrame.
- Catch and print any errors that occur during the data processing.


In [5]:
# Process the DataFrame to convert 'Fecha_id' to ISO format
try:
    processed_df = data_manager.process_data(df, date_col='Fecha_id')  # Process the data to convert 'Fecha_id' to ISO format
    print("Data processed successfully!")  # Print success message
    display(processed_df.head())  # Display the first few rows of the processed DataFrame
except Exception as e:
    print(f"Error processing data: {e}")  # Print error message if data processing fails

Data processed successfully!


Unnamed: 0,Fecha_id,Cuenta_id,Campaña_id,Grupo_id,Keyword_id,City_id,State_id,Country_id,Keyword,Impresiones,Clicks,Cpc,QualityScore,City,County,Country,Impresiones_q1,Impresiones_q3,Impresiones_med,Impresiones_iqr,Impresiones_max,Impresiones_min,Clicks_q1,Clicks_q3,Clicks_med,Clicks_iqr,Clicks_max,Clicks_min,Cpc_q1,Cpc_q3,Cpc_med,Cpc_iqr,Cpc_max,Cpc_min,Qs_q1,Qs_q3,Qs_med,Qs_iqr,Qs_max,Qs_min
0,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1023619,21168,2840,5th third bank cd rates,4,5,9000000,,Chardon,Ohio,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
1,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1023631,21168,2840,5th third bank cd rates,4,5,9000000,,Cleveland,Ohio,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
2,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1016367,21147,2840,5th third bank cd rates,5,5,9000000,,Chicago,Illinois,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
3,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1027744,21180,2840,5th third bank cd rates,4,5,9000000,,Seattle,Washington,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
4,2023-07-01,1119272776,20325090181,154000000000.0,2240000000000.0,1016359,21147,2840,p n c bank c d rates,3,5,6130000,,Champaign,Illinois,United States,1,1,1,0,2794,0,0,0,0,0,148,0,160000,620000,300000,460000,999000000,10000,0,0,0,0,10,0
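The conversion from `20230701` to `2023-07-01` seen in the output can be illustrated with the standard library. This is a sketch of the transformation itself, not of the internals of `process_data`; the helper name `yyyymmdd_to_iso` is hypothetical.

```python
from datetime import datetime


def yyyymmdd_to_iso(value):
    """Convert a yyyymmdd integer or string (e.g. 20230701) to 'yyyy-mm-dd'."""
    return datetime.strptime(str(value), "%Y%m%d").date().isoformat()


print(yyyymmdd_to_iso(20230701))  # 2023-07-01
```

With a DataFrame, the same function could be applied column-wise, for example `df['Fecha_id'].map(yyyymmdd_to_iso)`.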


# Creating and Uploading a Zip File from the Processed DataFrame

This segment of code demonstrates how to create a zip file from a processed DataFrame and upload it to a server. It uses the `Utility` class to zip the DataFrame and the `DatasetManager` class to upload the dataset. The cell is wrapped in a triple-quoted string, so it is disabled; remove the surrounding `'''` lines to run it. Error handling is included to manage any issues during these processes.

### Code Explanation
- Create a zip file from the processed DataFrame.
- Print the path of the created zip file.
- Attempt to upload the zip file as a dataset to the server.
- Print success or failure messages based on the upload result.


In [6]:
'''

# Create a zip file from the processed DataFrame
try:
    zip_filename = 'processed_dataset.zip'  # Define the name for the zip file
    zip_path = Utility.create_zip_file(processed_df, zip_filename)  # Create a zip file from the processed DataFrame
    print(f"Zip file created at: {zip_path}")  # Print the path of the created zip file
except Exception as e:
    print(f"Error creating zip file: {e}")  # Print error message if zip file creation fails

# Add the dataset to the server
dataset_name = "Processed Dataset"  # Define the name for the dataset
try:
    success, result = dataset_manager.add_dataset(zip_path, dataset_name)  # Attempt to add the dataset to the server
    if success:
        print(f"Dataset added successfully with ID: {result}")  # Print success message with dataset ID
    else:
        print(f"Failed to add dataset: {result}")  # Print failure message with result details
except Exception as e:
    print(f"Error adding dataset: {e}")  # Print error message if dataset addition fails

'''

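The internals of `Utility.create_zip_file` are not shown in this notebook. As a rough illustration of the general idea, the sketch below serializes tabular rows to CSV and stores the result in a zip archive using only the standard library. The helper `write_rows_to_zip` and the member name `data.csv` are assumptions made for the example.

```python
import csv
import io
import zipfile


def write_rows_to_zip(rows, fieldnames, zip_path, member_name="data.csv"):
    """Serialize rows (a list of dicts) to CSV and store it in a zip archive."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, buffer.getvalue())
    return zip_path
```

For a DataFrame, `processed_df.to_csv(index=False)` returns the CSV text directly, which could be passed to `archive.writestr` in the same way.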

# Retrieving and Displaying the List of Datasets

This segment of code retrieves the list of datasets from the server using the `DatasetManager` class. It then displays the retrieved datasets in a DataFrame format. Error handling is included to manage any issues during the retrieval process.

### Code Explanation
- Attempt to retrieve the list of datasets using the `DatasetManager`.
- Print a success message and display the DataFrame if datasets are retrieved successfully.
- Print a failure message if the retrieval fails.
- Catch and print any errors that occur during the retrieval process.


In [7]:
# Get the list of datasets
try:
    df_datasets = dataset_manager.get_database_list()  # Retrieve the list of datasets
    if df_datasets is not None:
        print("Datasets retrieved successfully!")  # Print success message
        display(df_datasets)  # Display the DataFrame containing the datasets
    else:
        print("Failed to retrieve datasets.")  # Print failure message if no datasets are retrieved
except Exception as e:
    print(f"Error retrieving datasets: {e}")  # Print error message if dataset retrieval fails


Datasets retrieved successfully!


Unnamed: 0,id,userId,name,nTables,lastModifiedDate
0,0,0,Museums,6,2023-11-06T10:34:36.196Z
1,1,0,JOT BC,7,2023-11-06T13:11:29.481Z
2,2,0,SN BC,8,2023-11-15T09:51:35.102Z
3,3,0,InterTwino,7,2023-12-15T13:31:24.769Z
4,13,0,JOT_May2,3,2024-05-06T09:34:53.132Z
5,19,0,JOT_data_Updated,1,2024-05-13T16:07:21.307Z
6,20,0,New_JOT_Update,2,2024-05-17T10:33:20.178Z
7,21,0,All_Cases,10,2024-05-29T11:07:13.489Z
8,22,0,JOT_data_Updated,1,2024-05-20T15:55:08.590Z
9,28,0,JOT_Tutorial,1,2024-05-22T13:09:13.192Z
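Dataset names are not unique in the listing above: `JOT_data_Updated` appears with IDs 19 and 22. Since later operations take a dataset ID, it can help to look up every ID that shares a name before choosing one. The sketch below works on a plain list of records; the helper `find_dataset_ids` is hypothetical.

```python
def find_dataset_ids(records, name):
    """Return the IDs of every dataset whose name matches exactly."""
    return [record["id"] for record in records if record["name"] == name]


# A few rows copied from the listing above
datasets = [
    {"id": 19, "name": "JOT_data_Updated"},
    {"id": 20, "name": "New_JOT_Update"},
    {"id": 22, "name": "JOT_data_Updated"},
    {"id": 28, "name": "JOT_Tutorial"},
]
print(find_dataset_ids(datasets, "JOT_data_Updated"))  # [19, 22]
```

A DataFrame such as `df_datasets` can be turned into this record shape with `df_datasets.to_dict("records")`.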


# Adding a Table to a Dataset

This segment of code demonstrates how to add a table to an existing dataset on the server using the `DatasetManager` class. It specifies the dataset ID and table name, then attempts to add the DataFrame as a table to the specified dataset. Error handling is included to manage any issues that arise during this process.

### Code Explanation
- Define the dataset ID and table name.
- Attempt to add the DataFrame as a table to the specified dataset using the `DatasetManager`.
- Catch and print any errors that occur during the process.

In [8]:
# Add the table to the dataset
dataset_id = "30"  # Replace with the actual dataset ID
table_name = "New_Table3"  # Define the name of the new table to add

try:
    dataset_manager.add_table_to_dataset(dataset_id, df, table_name)  # Attempt to add the DataFrame as a table to the dataset
    print(f"Table '{table_name}' added to dataset ID {dataset_id} successfully.")  # Print success message
except Exception as e:
    print(f"Error adding table to dataset: {e}")  # Print error message if adding table fails


Table added successfully!
New table added: ID: 105, Name: New_Table3


# Listing Tables in a Dataset

This segment of code retrieves and lists the tables within a specified dataset using the `DatasetManager` class. It specifies the dataset ID and attempts to retrieve the list of tables. Error handling is included to manage any issues that arise during this process.

### Code Explanation
- Define the dataset ID.
- Attempt to list the tables in the specified dataset using the `DatasetManager`.
- Catch and print any errors that occur during the process.


In [3]:
# List tables in the dataset
dataset_id = "30"  # Replace with the actual dataset ID

try:
    dataset_manager.list_tables_in_dataset(dataset_id) # Attempt to list the tables in the dataset
except Exception as e:
    print(f"Error listing tables in dataset: {e}")

Tables in dataset 30:
ID: 102, Name: data
ID: 103, Name: New_Table
ID: 105, Name: New_Table3


# Deleting a Table from a Dataset

This segment of code demonstrates how to delete a specific table from an existing dataset on the server using the `DatasetManager` class. It specifies the table name to be deleted and reuses the `dataset_id` defined in an earlier cell, then attempts to remove the table. The cell is wrapped in a triple-quoted string, so it is disabled; remove the surrounding `'''` lines to run it. Error handling is included to manage any issues that arise during this process.

> **⚠️ Warning:**
> Deleting a table is a permanent action and cannot be undone. Ensure that you have verified the table name and dataset ID before performing this operation.

### Code Explanation
- Define the dataset ID and the table name to be deleted.
- Attempt to delete the specified table from the dataset using the `DatasetManager`.
- Catch and print any errors that occur during the deletion process.


In [10]:
'''
# Delete a table from the dataset
table_name_to_delete = "New_Table2"  # Replace with the actual table name

try:
    dataset_manager.delete_table(dataset_id, table_name_to_delete)  # Attempt to delete the specified table from the dataset
    print(f"Table '{table_name_to_delete}' deleted successfully from dataset ID {dataset_id}.")  # Print success message
except Exception as e:
    print(f"Error deleting table: {e}")  # Print error message if deleting the table fails

'''


# Deleting a Dataset

This segment of code demonstrates how to delete an entire dataset from the server using the `DatasetManager` class. It specifies the dataset ID and attempts to remove the dataset. The cell is wrapped in a triple-quoted string, so it is disabled; remove the surrounding `'''` lines to run it. Error handling is included to manage any issues that arise during this process.

> **⚠️ Warning:**
> Deleting a dataset is a permanent action and cannot be undone. Ensure that you have verified the dataset ID before performing this operation.

### Code Explanation
- Define the dataset ID to be deleted.
- Attempt to delete the specified dataset using the `DatasetManager`.
- Print the result message from the deletion attempt.
- Catch and print any errors that occur during the deletion process.


In [10]:
'''
# Delete the dataset
dataset_id = '12'  # Replace with the actual dataset ID

try:
    message = dataset_manager.delete_dataset(dataset_id)  # Attempt to delete the specified dataset
    print(message)  # Print the result message
except Exception as e:
    print(f"Error deleting dataset: {e}")  # Print error message if deleting the dataset fails

'''

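Because deletions are permanent, one option is to require the caller to retype the target before the delete call is made. The sketch below is a generic guard, not part of `semtui_refactored`; the helper `guarded_delete` is hypothetical, and a list's `append` method stands in for the real delete function so that the example runs without a server.

```python
def guarded_delete(delete_fn, target, confirmation):
    """Call delete_fn(target) only if the caller retyped the target exactly."""
    if confirmation != target:
        raise ValueError(
            f"Confirmation {confirmation!r} does not match target {target!r}; nothing deleted."
        )
    return delete_fn(target)


# Stand-in for a real delete call: record what would have been deleted
deleted = []
guarded_delete(deleted.append, "12", confirmation="12")
```

In the notebook, the first argument could be `dataset_manager.delete_dataset`, with the confirmation typed in by the user.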

# Retrieving a Table by Name from a Dataset

This segment of code demonstrates how to retrieve a specific table from an existing dataset on the server using the `DatasetManager` class. It specifies the dataset ID and table name, then attempts to fetch the table data. Error handling is included to manage any issues that arise during this process.

### Code Explanation
- Define the dataset ID and the table name to be retrieved.
- Attempt to retrieve the specified table from the dataset using the `DatasetManager`.
- Print a success message if the table is retrieved successfully.
- Print a failure message if the table is not found in the dataset.
- Catch and print any errors that occur during the retrieval process.


In [3]:
# Retrieve a table by name from a dataset
dataset_id = "30"  # Replace with the actual dataset ID
table_name = "New_Table"  # Define the name of the table to be retrieved

try:
    table_data = dataset_manager.get_table_by_name(dataset_id, table_name)  # Attempt to retrieve the specified table from the dataset
    if table_data:
        print(f"Table '{table_name}' retrieved successfully!")  # Print success message if table is retrieved
        # No need to display the DataFrame
    else:
        print(f"Table '{table_name}' not found in the dataset.")  # Print message if table is not found
except Exception as e:
    print(f"Error retrieving table '{table_name}': {e}")  # Print error message if retrieving the table fails


Table 'New_Table' retrieved successfully!


In [12]:
#table_data

# Retrieving the List of Reconciliators

This segment of code tests the retrieval of the list of reconciliators using the `ReconciliationManager` class. It attempts to fetch and display the reconciliators in a DataFrame. Error handling is included to manage any issues that arise during the retrieval process.

### Code Explanation
- Attempt to retrieve the list of reconciliators using the `ReconciliationManager`.
- Print a success message and display the DataFrame if reconciliators are retrieved successfully.
- Print a failure message if the retrieval fails.
- Catch and print any errors that occur during the retrieval process.


In [7]:
# Test get_reconciliators_list
try:
    reconciliators_list = reconciliation_manager.get_reconciliators_list()  # Attempt to retrieve the list of reconciliators
    if reconciliators_list is not None:
        print("Reconciliators retrieved successfully!")  # Print success message
        display(reconciliators_list.head())  # Display the first few rows of the DataFrame
    else:
        print("Failed to retrieve reconciliators.")  # Print failure message if no reconciliators are retrieved
except Exception as e:
    print(f"Error retrieving reconciliators: {e}")  # Print error message if retrieving reconciliators fails

Reconciliators retrieved successfully!


Unnamed: 0,id,relativeUrl,name
0,geocodingGeonames,/dataset,Geocoding: geo coordinates (GeoNames)
1,geocodingHere,/here,Geocoding: geo coordinates (HERE)
2,geonames,/dataset,Linking: GeoNames (GeoNames)
3,wikidataAlligator,/dataset,Linking: Wikidata (Alligator)
4,wikidataOpenRefine,/wikidata,Linking: Wikidata (OpenRefine)


# Retrieving Parameters for a Specific Reconciliator

This segment of code tests the retrieval of parameters for a specific reconciliator using the `ReconciliationManager` class. It specifies the reconciliator ID and attempts to fetch its parameters. The parameters are printed if the retrieval is successful. Error handling is included to manage any issues that arise during the retrieval process.

### Code Explanation
- Define the ID of the reconciliator.
- Attempt to retrieve the parameters for the specified reconciliator using the `ReconciliationManager`.
- Print a success message if the parameters are retrieved successfully.
- Print a failure message if the retrieval fails.
- Catch and print any errors that occur during the retrieval process.


In [6]:
# Test get_reconciliator_parameters
id_reconciliator = "geocodingHere"  # Replace with the actual reconciliator ID

# Get the reconciliator parameters
try:
    params = reconciliation_manager.get_reconciliator_parameters(id_reconciliator, print_params=True)  # Attempt to retrieve parameters
    if params:
        print(f"Parameters for reconciliator '{id_reconciliator}' retrieved successfully!")  # Print success message
    else:
        print(f"Failed to retrieve parameters for reconciliator '{id_reconciliator}'.")  # Print failure message if retrieval fails
except Exception as e:
    print(f"Error retrieving parameters for reconciliator '{id_reconciliator}': {e}")  # Print error message if retrieving parameters fails

Parameters for reconciliator 'geocodingHere':
Mandatory parameters:
- table (json): Mandatory
  Description: The table data in JSON format
- columnName (string): Mandatory
  Description: The name of the column to reconcile
- idReconciliator (string): Mandatory
  Description: The ID of the reconciliator to use

Optional parameters:
- secondPart (selectColumns): Optional
  Description: Optional column to add information to support reconciliation.
  Label: Select a column with information about the location to reconcile
  Info Text: 
- thirdPart (selectColumns): Optional
  Description: Optional column to add information to support reconciliation.
  Label: Select a column with information about the location to reconcile
  Info Text: 
- fourthPart (selectColumns): Optional
  Description: Optional column to add information to support reconciliation.
  Label: Select a column with information about the location to reconcile
  Info Text: 
Parameters for reconciliator 'geocodingHere' retrieved successfully!
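The parameter listing above can be turned into a simple check before a reconciliation request is made. The sketch below assembles a payload from the three mandatory parameter names shown in the output and rejects missing values. The helper `build_reconcile_payload` and the payload layout are assumptions; the actual request format used by `ReconciliationManager` is not shown in this notebook.

```python
# Mandatory parameter names taken from the output above
MANDATORY_PARAMS = ("table", "columnName", "idReconciliator")


def build_reconcile_payload(table, column_name, id_reconciliator, **optional):
    """Assemble a request payload and check the mandatory parameters."""
    payload = {
        "table": table,
        "columnName": column_name,
        "idReconciliator": id_reconciliator,
    }
    # Optional parameters such as secondPart are included only when provided
    payload.update({key: value for key, value in optional.items() if value is not None})
    missing = [name for name in MANDATORY_PARAMS if not payload.get(name)]
    if missing:
        raise ValueError(f"Missing mandatory parameters: {', '.join(missing)}")
    return payload
```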

# Reconciling a Column in a Table

This segment of code tests the reconciliation of a specific column in a table using the `ReconciliationManager` class. It specifies the table name, column name, and reconciliator ID, then attempts to reconcile the column in the `table_data` retrieved in the earlier cell. Error handling is included to manage any issues that arise during the reconciliation process.

### Code Explanation
- Define the table name and the column name to be reconciled.
- Define the ID of the reconciliator.
- Reuse the `table_data` retrieved in the earlier cell.
- Attempt to reconcile the specified column using the `ReconciliationManager`.
- Print a success message if the column is reconciled successfully.
- Print a failure message if the reconciliation fails.
- Catch and print any errors that occur during the reconciliation process.


In [8]:
# Test reconcile
#dataset_id = "30"  # Dataset ID is commented out as it's not used in the current context
table_name = "New_Table"  # Define the name of the table to be reconciled

column_name = "City"  # Define the column name to be reconciled
id_reconciliator = "geocodingHere"  # Define the ID of the reconciliator


# Reconcile the column
try:
    reconciled_table = reconciliation_manager.reconcile(table_data, column_name, id_reconciliator)  # Attempt to reconcile the specified column
    if reconciled_table:
        print("Column reconciled successfully!")  # Print success message
        # No need to display the table, just print a success message
    else:
        print("Failed to reconcile column.")  # Print failure message if reconciliation fails
except Exception as e:
    print(f"Error reconciling column: {e}")  # Print error message if reconciliation fails



Column reconciled successfully!


In [5]:
#reconciled_table

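# Pushing Reconciliation Data to the Backend

This segment of code pushes the reconciled table back to the server using the `ReconciliationManager` class. The cell has not been executed in this notebook. Error handling is included to manage any issues that arise during the push.

### Code Explanation
- Define the dataset ID, the table ID, and the reconciled data to push.
- Attempt to push the reconciliation data using the `ReconciliationManager`.
- Print a success message if the push returns a response.
- Catch and print any errors that occur during the push.


In [None]:
# Push reconciliation data to the backend
dataset_id = "30"  # Replace with the actual dataset ID
table_id = "103"  # ID of 'New_Table' in the table listing above; replace with the actual table ID
reconciled_data = reconciled_table  # Assumption: the table returned by reconcile() is what gets pushed

try:
    response = reconciliation_manager.push_reconciliation_data_to_backend(dataset_id, table_id, reconciled_data)  # Attempt to push the data
    if response:
        print("Reconciliation data pushed successfully!")  # Print success message
except Exception as e:
    print(f"Error pushing reconciliation data to the backend: {e}")  # Print error message if the push fails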

# Extracting Row Metadata from a Reconciled Table

This segment of code demonstrates how to extract metadata for a specific row from a reconciled table using the `EvaluationManager` class. It specifies the reconciled table, the columns of interest, and the row ID, then attempts to extract the metadata for that row. Error handling is included to manage any issues that arise during the extraction process.

### Code Explanation
- Reuse the reconciled table from the reconciliation step and define the list of reconciled columns.
- Define the row ID for which metadata is to be extracted.
- Attempt to extract metadata for the specified row using the `EvaluationManager`.
- Print a success message and the extracted metadata if the extraction is successful.
- Catch and print any errors that occur during the extraction process.


In [9]:
# Example data
# `reconciled_table` is the table returned by the reconciliation step above
reconciled_columns = ['City']  # List of reconciled columns
row_id = 'r0'  # Replace with the desired row ID

# Extract row metadata
try:
    row_metadata = evaluation_manager.extract_row_metadata(reconciled_table, row_id, reconciled_columns)  # Attempt to extract metadata for the specified row
    print("Row metadata extracted successfully!")  # Print success message
    print(row_metadata)  # Print the extracted metadata
except Exception as e:
    print(f"Error extracting row metadata: {e}")  # Print error message if metadata extraction fails


Row metadata extracted successfully!
{'City': [{'id': 'georss:41.58339,-81.20288', 'feature': [{'id': 'all_labels', 'value': 100}], 'name': {'value': 'Chardon, OH, United States', 'uri': 'http://www.google.com/maps/place/41.58339,-81.20288'}, 'score': 1, 'match': True, 'type': [{'id': 'wd:Q29934236', 'name': 'GlobeCoordinate'}, {'id': 'georss:point', 'name': 'point'}]}]}
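The metadata printed above is a list of candidates per column, each with an `id`, a `score`, and a `match` flag. The sketch below shows one way to pick the matched candidate with the highest score and to read the coordinates out of a `georss:` ID. Both helpers are hypothetical, and the sample dictionary is a shortened copy of the output above.

```python
def best_match(candidates):
    """Return the matched candidate with the highest score, or None."""
    matched = [c for c in candidates if c.get("match")]
    return max(matched, key=lambda c: c["score"], default=None)


def parse_georss_id(candidate_id):
    """Split an ID like 'georss:41.58339,-81.20288' into two floats."""
    prefix, _, coords = candidate_id.partition(":")
    if prefix != "georss":
        raise ValueError(f"Not a georss ID: {candidate_id!r}")
    first, second = coords.split(",")
    return float(first), float(second)


row_metadata = {"City": [{"id": "georss:41.58339,-81.20288", "score": 1, "match": True,
                          "name": {"value": "Chardon, OH, United States"}}]}
top = best_match(row_metadata["City"])
print(top["name"]["value"], parse_georss_id(top["id"]))
```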


# Analyzing Reconciled Data

This segment of code performs several analyses on the reconciled table using the `EvaluationManager` class. It counts the number of reconciled cells per column, the number of unique reconciled values per column, and calculates the percentage of reconciled cells per column. The results are printed for each analysis.

### Code Explanation
- Reuse the reconciled table and the columns of interest defined in the previous cells.
- Count the number of reconciled cells per column.
- Count the number of unique reconciled values per column.
- Calculate the percentage of reconciled cells per column.
- Print the results for each analysis.


In [19]:
# Count reconciled cells per column
reconciled_cell_counts = evaluation_manager.count_reconciled_cells_per_column(reconciled_table['raw'], reconciled_columns)  # Count reconciled cells
print("Reconciled cells per column:")
print(reconciled_cell_counts)  # Print the count of reconciled cells


Reconciled cells per column:
{'City': 18}


In [20]:
# Count unique reconciled values per column
unique_reconciled_values = evaluation_manager.count_unique_reconciled_values_per_column(reconciled_table['raw'], reconciled_columns)  # Count unique reconciled values
print("Unique reconciled values per column:")
print(unique_reconciled_values)  # Print the count of unique reconciled values


Unique reconciled values per column:
{'City': 86}


In [21]:
# Calculate percentage of reconciled cells per column
reconciled_cell_percentages = evaluation_manager.percentage_reconciled_cells_per_column(reconciled_table['raw'], reconciled_columns)  # Calculate percentage of reconciled cells
print("Percentage of reconciled cells per column:")
print(reconciled_cell_percentages)  # Print the percentage of reconciled cells

Percentage of reconciled cells per column:
{'City': 100.0}
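The percentage reported above is the share of cells in a column that carry reconciliation metadata, that is, 100 × reconciled cells / total cells. The sketch below illustrates that arithmetic on a simplified table. The row layout (`{"cells": {column: {"metadata": [...]}}}`) and the helper `reconciliation_stats` are assumptions made for the example and may differ from what `EvaluationManager` expects.

```python
def reconciliation_stats(rows, column):
    """Count reconciled cells and compute their share for one column.

    Assumes each row looks like {"cells": {column: {"metadata": [...]}}};
    a cell counts as reconciled when its metadata list is non-empty.
    """
    total = len(rows)
    reconciled = sum(
        1 for row in rows.values()
        if row["cells"].get(column, {}).get("metadata")
    )
    percentage = 100.0 * reconciled / total if total else 0.0
    return {"reconciled": reconciled, "total": total, "percentage": percentage}


sample_rows = {
    "r0": {"cells": {"City": {"metadata": [{"id": "georss:41.58339,-81.20288"}]}}},
    "r1": {"cells": {"City": {"metadata": []}}},
}
print(reconciliation_stats(sample_rows, "City"))
```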


# Retrieving the List of Extenders

This segment of code retrieves the list of extenders using the `ExtensionManager` class. It attempts to fetch and display the extenders in a DataFrame. Error handling is included to manage any issues that arise during the retrieval process.

### Code Explanation
- Attempt to retrieve the list of extenders using the `ExtensionManager`.
- Print a success message and display the DataFrame if extenders are retrieved successfully.
- Print a failure message if the retrieval fails.
- Catch and print any errors that occur during the retrieval process.


In [6]:
# Get Extender List
try:
    extenders_list = extension_manager.get_extenders_list()  # Attempt to retrieve the list of extenders
    if extenders_list is not None:
        print("Extenders retrieved successfully!")  # Print success message
        display(extenders_list.head())  # Display the first few rows of the DataFrame
    else:
        print("Failed to retrieve extenders.")  # Print failure message if no extenders are retrieved
except Exception as e:
    print(f"Error retrieving extenders: {e}")  # Print error message if retrieving extenders fails

Extenders retrieved successfully!


Unnamed: 0,id,relativeUrl,name
0,geoPropertiesWikidata,/wikidata/entities,Geo Properties (Wikidata)
1,geoRouteHere,,Geo Route (HERE)
2,meteoPropertiesOpenMeteo,,Meteo Properties (OpenMeteo)
3,reconciledColumnExt,,Annotation properties
4,reconciledColumnExtWikidata,/entity/labels,Annotation properties (Wikidata)


# Retrieving Parameters for a Specific Extender

This segment of code tests the retrieval of parameters for a specific extender using the `ExtensionManager` class. It specifies the extender ID and attempts to fetch its parameters. The parameters are printed if the retrieval is successful. Error handling is included to manage any issues that arise during the retrieval process.

### Code Explanation
- Define the ID of the extender.
- Attempt to retrieve the parameters for the specified extender using the `ExtensionManager`.
- Print a success message if the parameters are retrieved successfully.
- Print a failure message if the retrieval fails.
- Catch and print any errors that occur during the retrieval process.


In [7]:
# Test get_extender_parameters
id_extender = "meteoPropertiesOpenMeteo"  # ID of the extender to inspect (taken from the extender list)

# Get the extender parameters
try:
    params = extension_manager.get_extender_parameters(id_extender, print_params=True)  # Attempt to retrieve parameters
    if params:
        print(f"Parameters for extender '{id_extender}' retrieved successfully!")  # Print success message
    else:
        print(f"Failed to retrieve parameters for extender '{id_extender}'.")  # Print failure message if retrieval fails
except Exception as e:
    print(f"Error retrieving parameters for extender '{id_extender}': {e}")  # Print error message if retrieving parameters fails

Parameters for extender 'meteoPropertiesOpenMeteo':
Mandatory parameters:
- dates (selectColumns): Mandatory
  Description: Select a column with the days on which to retrieve the weather data:
  Label: Select a column with days in ISO8601 format (yyyy-mm-dd)
  Info Text: Only dates prior to 10 days are covered (ISO8601 format yyyy-mm-dd)
  Options: []

- weatherParams (checkbox): Mandatory
  Description: Select one or more <b>weather</b> parameters:
  Label: Weather parameters
  Info Text: Meteo parameters to extend the table
  Options: [{'id': 'apparent_temperature_max', 'label': 'Maximum daily apparent temperature in °C', 'value': 'apparent_temperature_max'}, {'id': 'apparent_temperature_min', 'label': 'Minimum daily apparent temperature in °C', 'value': 'apparent_temperature_min'}, {'id': 'precipitation_sum', 'label': 'Sum of daily precipitation (including rain, showers and snowfall) in mm', 'value': 'precipitation_sum'}, {'id': 'precipitation_hours', 'label': 'The number of hours w
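
The option IDs listed for `weatherParams` can be used to catch typos before calling the extender. The following is a minimal sketch under stated assumptions: the set contains only the four IDs visible in the truncated output above, so the real list is longer, and `unknown_params` is a hypothetical helper rather than a library function.

```python
# Option IDs visible in the (truncated) output above; the full list is longer.
known_option_ids = {
    "apparent_temperature_max",
    "apparent_temperature_min",
    "precipitation_sum",
    "precipitation_hours",
}


def unknown_params(requested, known):
    """Return the requested parameters that are absent from known, preserving order."""
    return [param for param in requested if param not in known]


weather_params = ["apparent_temperature_max", "apparent_temperature_min", "precipitation_sum"]
print(unknown_params(weather_params, known_option_ids))       # []
print(unknown_params(["temperature_max"], known_option_ids))  # ['temperature_max']
```

Because the known set is incomplete, a parameter flagged here may still be valid; the check is only a first filter.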

# Extending a Column with Additional Properties

This segment extends a column of the reconciled table using the `ExtensionManager` class. It specifies the column containing reconciled IDs, the properties to extend, the new column names, the date column, the extender ID, and the weather parameters to request. The requested values are added to the table as new columns. A `try`/`except` block reports any error raised during the extension.

### Code Explanation
- Define the column containing reconciled IDs.
- Specify the properties to be added and their corresponding new column names.
- Define the date column name.
- Define the ID of the extender.
- Define the weather parameters to retrieve.
- Attempt to extend the specified column using the `ExtensionManager`.
- Print a success message if the column is extended successfully.
- Print a failure message if the extension fails.
- Catch and print any errors that occur during the extension process.
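
Before calling the extender, it can help to check the inputs locally. The sketch below assumes two things that the source suggests but does not state as rules: that each weather parameter maps to exactly one new column name, and that the date column must hold strings in `yyyy-mm-dd` form (as the extender's parameter description indicates). `check_extension_inputs` is a hypothetical helper.

```python
import re
from datetime import date

ISO_DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_extension_inputs(dates, new_columns, weather_params):
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems = []
    # Assumption: one new column name per requested weather parameter.
    if len(new_columns) != len(weather_params):
        problems.append(
            f"{len(new_columns)} column names for {len(weather_params)} weather parameters"
        )
    for value in dates:
        try:
            if not ISO_DAY.match(value):
                raise ValueError("wrong layout")
            date.fromisoformat(value)  # rejects impossible dates such as 2023-02-30
        except ValueError:
            problems.append(f"not a yyyy-mm-dd date: {value!r}")
    return problems


columns = ["Apparent_Max_Temperature", "Apparent_Min_Temperature", "Total_Precipitation"]
params = ["apparent_temperature_max", "apparent_temperature_min", "precipitation_sum"]
print(check_extension_inputs(["2023-07-01", "01/07/2023"], columns, params))
```

An empty result does not guarantee that the extension succeeds; it only rules out these two mistakes.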


In [10]:
# Extend Column
reconciliated_column_name = 'City'  # Column that contains reconciled IDs
properties = ["property1", "property2"]  # Replace with actual properties to extend
new_columns_name = ["Apparent_Max_Temperature", "Apparent_Min_Temperature", "Total_Precipitation"]  # Names for the new columns
date_column_name = "Fecha_id"  # Column holding the dates (ISO 8601, yyyy-mm-dd)
id_extender = "meteoPropertiesOpenMeteo"  # ID for Open Meteo Properties extender
weather_params = ["apparent_temperature_max", "apparent_temperature_min", "precipitation_sum"]  # Weather parameters to retrieve

try:
    extended_table = extension_manager.extend_column(
        reconciled_table['raw'], 
        reconciliated_column_name, 
        id_extender, 
        properties, 
        new_columns_name, 
        date_column_name, 
        weather_params
    )  # Attempt to extend the specified column
    if extended_table:
        print("Column extended successfully!")  # Print success message
        # No need to display the table, just print a success message
    else:
        print("Failed to extend column.")  # Print failure message if extension fails
except Exception as e:
    print(f"Error extending column: {e}")  # Print error message if extending column fails


Column extended successfully!


In [13]:
extended_table

{'table': {'id': '103',
  'idDataset': '30',
  'name': 'New_Table',
  'nCols': 40,
  'nRows': 18,
  'nCells': 720,
  'nCellsReconciliated': 0,
  'lastModifiedDate': '2024-05-30T12:12:53.505Z'},
 'columns': {'Fecha_id': {'id': 'Fecha_id',
   'label': 'Fecha_id',
   'status': 'empty',
   'context': {},
   'metadata': []},
  'Cuenta_id': {'id': 'Cuenta_id',
   'label': 'Cuenta_id',
   'status': 'empty',
   'context': {},
   'metadata': []},
  'Campaña_id': {'id': 'Campaña_id',
   'label': 'Campaña_id',
   'status': 'empty',
   'context': {},
   'metadata': []},
  'Grupo_id': {'id': 'Grupo_id',
   'label': 'Grupo_id',
   'status': 'empty',
   'context': {},
   'metadata': []},
  'Keyword_id': {'id': 'Keyword_id',
   'label': 'Keyword_id',
   'status': 'empty',
   'context': {},
   'metadata': []},
  'City_id': {'id': 'City_id',
   'label': 'City_id',
   'status': 'empty',
   'context': {},
   'metadata': []},
  'State_id': {'id': 'State_id',
   'label': 'State_id',
   'status': 'empty',
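
The returned object is a nested dictionary with a `table` entry for metadata and a `columns` entry keyed by column ID. A minimal sketch of how such a structure might be summarized follows; `sample_table` is a trimmed stand-in built from the keys visible above, and `summarize_table` is a hypothetical helper.

```python
# Trimmed stand-in for the structure shown above (only two columns kept).
sample_table = {
    "table": {"id": "103", "name": "New_Table", "nCols": 40, "nRows": 18, "nCells": 720},
    "columns": {
        "Fecha_id": {"id": "Fecha_id", "label": "Fecha_id", "status": "empty", "context": {}, "metadata": []},
        "City_id": {"id": "City_id", "label": "City_id", "status": "empty", "context": {}, "metadata": []},
    },
}


def summarize_table(table_json):
    """Collect the table name, its shape, and the status of each column."""
    meta = table_json["table"]
    return {
        "name": meta["name"],
        "shape": (meta["nRows"], meta["nCols"]),
        # The metadata above satisfies nRows * nCols == nCells (18 * 40 == 720).
        "cells_consistent": meta["nRows"] * meta["nCols"] == meta["nCells"],
        "status_by_column": {
            col_id: col["status"] for col_id, col in table_json["columns"].items()
        },
    }


summary = summarize_table(sample_table)
print(summary["shape"], summary["cells_consistent"])  # (18, 40) True
```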
  

# Converting Extended Data to DataFrame

This segment converts the extended table, returned by `extend_column` as a JSON-like dictionary, into a Pandas DataFrame using `Utility.load_json_to_dataframe`. The call passes `georeference_data=True`, and the resulting DataFrame shows `City URI`, `Latitude`, and `Longitude` columns alongside the new weather columns. A `try`/`except` block reports any error raised during the conversion.

### Code Explanation
- Import the `Utility` class.
- Attempt to convert the JSON object containing extended data into a Pandas DataFrame using `Utility.load_json_to_dataframe`.
- Print a success message and display the first few rows of the DataFrame if the conversion is successful.
- Catch and print any errors that occur during the conversion process.


In [11]:
# Import the Utility class
from semtui_refactored.utils import Utility

# Assuming extended_table is the JSON object you got from the extend_column method
try:
    # Convert JSON to DataFrame
    extended_df = Utility.load_json_to_dataframe(extended_table, georeference_data=True)  # Load the extended data into a DataFrame
    print("Extended data loaded into DataFrame successfully!")  # Print success message
    
    # Display the DataFrame
    display(extended_df.head())  # Display the first few rows of the DataFrame
except Exception as e:
    print(f"Error loading extended data into DataFrame: {e}")  # Print error message if loading into DataFrame fails

Extended data loaded into DataFrame successfully!


Unnamed: 0,Fecha_id,Cuenta_id,Campaña_id,Grupo_id,Keyword_id,City_id,State_id,Country_id,Keyword,Impresiones,...,Qs_med,Qs_iqr,Qs_max,Qs_min,Apparent_Max_Temperature,Apparent_Min_Temperature,Total_Precipitation,City URI,Latitude,Longitude
0,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1023619,21168,2840,5th third bank cd rates,4,...,0,0,10,0,299,203,106,"http://www.google.com/maps/place/41.58339,-81....",41.58339,-81.20288
1,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1023631,21168,2840,5th third bank cd rates,4,...,0,0,10,0,307,224,51,"http://www.google.com/maps/place/41.50473,-81....",41.50473,-81.69074
2,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1016367,21147,2840,5th third bank cd rates,5,...,0,0,10,0,324,228,162,"http://www.google.com/maps/place/41.88425,-87....",41.88425,-87.63245
3,2023-07-01,1018571837,17309187968,138000000000.0,299000000000.0,1027744,21180,2840,5th third bank cd rates,4,...,0,0,10,0,25,113,0,"http://www.google.com/maps/place/47.60357,-122...",47.60357,-122.32945
4,2023-07-01,1119272776,20325090181,154000000000.0,2240000000000.0,1016359,21147,2840,p n c bank c d rates,3,...,0,0,10,0,349,231,122,"http://www.google.com/maps/place/40.1142,-88.2435",40.1142,-88.2435


# Analyzing Extended Data

This segment of code performs several analyses on the extended table using the `EvaluationManager` class. It counts the number of extended cells per column, the number of unique extended values per column, and calculates the percentage of extended cells per column. The results are printed for each analysis.

### Code Explanation
- Define the list of extended columns.
- Count the number of extended cells per column.
- Count the number of unique extended values per column.
- Calculate the percentage of extended cells per column.
- Print the results for each analysis.


In [16]:
# Define the list of extended columns
extended_columns = ['Apparent_Max_Temperature', 'Apparent_Min_Temperature', 'Total_Precipitation']

# Count extended cells per column
extended_cell_counts = evaluation_manager.count_extended_cells_per_column(extended_table, extended_columns)  # Count extended cells
print("Extended cells per column:")
print(extended_cell_counts)  # Print the count of extended cells


Extended cells per column:
{'Apparent_Max_Temperature': 18, 'Apparent_Min_Temperature': 18, 'Total_Precipitation': 18}


In [18]:
# Count unique extended values per column
unique_extended_values = evaluation_manager.count_unique_extended_values_per_column(extended_table, extended_columns)  # Count unique extended values
print("Unique extended values per column:")
print(unique_extended_values)  # Print the count of unique extended values

Unique extended values per column:
{'Apparent_Max_Temperature': 18, 'Apparent_Min_Temperature': 18, 'Total_Precipitation': 12}


In [19]:
# Calculate percentage of extended cells per column
extended_cell_percentages = evaluation_manager.percentage_extended_cells_per_column(extended_table, extended_columns)  # Calculate percentage of extended cells
print("Percentage of extended cells per column:")
print(extended_cell_percentages)  # Print the percentage of extended cells

Percentage of extended cells per column:
{'Apparent_Max_Temperature': 100.0, 'Apparent_Min_Temperature': 100.0, 'Total_Precipitation': 100.0}
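
To make the three metrics concrete, here is a small sketch of how they could be computed over plain row dictionaries. It is not the `EvaluationManager` implementation; it assumes that a cell counts as extended when it is neither `None` nor an empty string, and `extension_stats` and the sample rows are hypothetical.

```python
def extension_stats(rows, columns):
    """Compute extended-cell count, unique-value count, and fill percentage per column."""
    stats = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        # Assumption: None and "" mean "not extended"; 0 is a legitimate value.
        filled = [value for value in values if value not in (None, "")]
        stats[column] = {
            "extended": len(filled),
            "unique": len(set(filled)),
            "percentage": 100.0 * len(filled) / len(values) if values else 0.0,
        }
    return stats


sample_rows = [
    {"Total_Precipitation": 106},
    {"Total_Precipitation": 51},
    {"Total_Precipitation": 0},
    {"Total_Precipitation": 0},
    {"Total_Precipitation": None},
]
print(extension_stats(sample_rows, ["Total_Precipitation"]))
# {'Total_Precipitation': {'extended': 4, 'unique': 3, 'percentage': 80.0}}
```

Repeated values are why the unique count can be lower than the cell count, as with `Total_Precipitation` above (18 extended cells, 12 unique values).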


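# Saving the Extended Data as CSV

This segment saves the extended DataFrame to a CSV file using `DataFrame.to_csv`, omitting the index column. A `try`/`except` block reports any error raised while writing the file.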

In [12]:
# Define the CSV file path
csv_file_path = "extended_data.csv"

try:
    # Save the DataFrame to a CSV file
    extended_df.to_csv(csv_file_path, index=False)  # Save DataFrame to CSV
    print(f"Extended data saved to {csv_file_path} successfully!")  # Print success message for saving CSV
except Exception as e:
    print(f"Error saving extended data to CSV: {e}")  # Print error message if saving to CSV fails


Extended data saved to extended_data.csv successfully!
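
After saving, a quick check that the new columns made it into the file can be done with the standard library alone. This is only a sketch: it writes a tiny stand-in CSV to a temporary directory so that it runs without the notebook's data, and `missing_columns` is a hypothetical helper.

```python
import csv
import os
import tempfile

expected_columns = ["Apparent_Max_Temperature", "Apparent_Min_Temperature", "Total_Precipitation"]


def missing_columns(csv_path, expected):
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in expected if name not in header]


# Write a small stand-in file; in the notebook, point csv_path at extended_data.csv instead.
sample_path = os.path.join(tempfile.mkdtemp(), "extended_data.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["Fecha_id", "Apparent_Max_Temperature", "Apparent_Min_Temperature"])
    writer.writerow(["2023-07-01", 299, 203])

print(missing_columns(sample_path, expected_columns))  # ['Total_Precipitation']
```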
