# Watson Assistant Logs Notebook

## Introduction
This notebook demonstrates how to download Watson Assistant user-generated logs based on different criteria.

### Programming language and environment
Some familiarity with Python is recommended. This notebook runs on Python 3.6.

<a id="setup"></a>
## 1. Configuration and Setup

In this section, we add data and workspace access credentials and import the required libraries and functions.

### <a id="python"></a> 1.1 Install Assistant Improve Toolkit

In [None]:
!pip3 install --user --upgrade "assistant-improve-toolkit";

### <a id="function"></a> 1.2 Import functions used in the notebook

In [None]:
# Import Watson Assistant related functions
from ibm_cloud_sdk_core.authenticators import BasicAuthenticator
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import pandas as pd
import json
from ibm_watson import AssistantV1

from assistant_improve_toolkit.watson_assistant_func import get_logs
from assistant_improve_toolkit.watson_assistant_func import get_assistant_definition
from assistant_improve_toolkit.watson_assistant_func import load_logs_from_file
from assistant_improve_toolkit.watson_assistant_func import export_csv_for_intent_recommendation

## <a id="load"></a> 2. Load and format data 

### 2.1 Add Watson Assistant configuration

This notebook uses the Watson Assistant v1 API to access your skill definition and your logs. Provide your Watson Assistant credentials and the workspace ID that you want to fetch data from.

You can access the values you need for this configuration from the Watson Assistant user interface. Go to the Skills page and select View API Details from the menu of a skill tile.

- The string to set in the call to IAMAuthenticator is your API Key under Service Credentials.
- The string to set for version is a date in the format version=YYYY-MM-DD. The version date string determines which version of the Watson Assistant V1 API will be called. For more information about version, see [Versioning](https://cloud.ibm.com/apidocs/assistant/assistant-v1#versioning).
- The string to pass into assistant.set_service_url is the portion of the Legacy v1 Workspace URL that ends with /api. For example, https://gateway.watsonplatform.net/assistant/api. This value will be different depending on the location of your service instance. Do not pass in the entire Workspace URL.

In [None]:
# Provide credentials to connect to assistant
authenticator = IAMAuthenticator('API_KEY')
sdk_object = AssistantV1(version='2020-04-01', authenticator=authenticator)
sdk_object.set_service_url('URL')
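
The service URL rule above (keep only the part of the Legacy v1 Workspace URL that ends with `/api`) can be sketched as a small helper. This is an illustration only: `service_url_from_workspace_url` is a hypothetical name, not part of the toolkit or the SDK, and `WORKSPACE_ID` in the usage line is a placeholder.

```python
def service_url_from_workspace_url(workspace_url: str) -> str:
    """Return the portion of a Legacy v1 Workspace URL that ends with '/api'.

    Hypothetical helper for illustration; it only does string trimming.
    """
    marker = '/api'
    trimmed = workspace_url.rstrip('/')
    # The URL may already be the service URL.
    if trimmed.endswith(marker):
        return trimmed
    # Otherwise cut everything after the first '/api/' path segment.
    idx = trimmed.find(marker + '/')
    if idx == -1:
        raise ValueError("no '/api' segment found in {!r}".format(workspace_url))
    return trimmed[:idx + len(marker)]


# Usage with a placeholder workspace ID
service_url = service_url_from_workspace_url(
    'https://gateway.watsonplatform.net/assistant/api/v1/workspaces/WORKSPACE_ID/message')
```

The result could then be passed to `sdk_object.set_service_url(service_url)` in place of the literal `'URL'`.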

Add the identifiers of your assistant. To load the skill of an assistant in the next section, you need to provide either the Workspace ID or the Skill ID. The values can be found on the View API Details page. If you are using versioning in Watson Assistant, this ID represents the Development version of your skill definition.

For more information about authentication and finding credentials in the Watson Assistant UI, please see [Watson Assistant v1 API](https://cloud.ibm.com/apidocs/assistant/assistant-v1) in the offering documentation.


In [None]:
assistant_information = {'workspace_id' : '',
                         'skill_id' : '',
                         'assistant_id' : ''}
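
Because the next section needs either a Workspace ID or a Skill ID, a quick check before fetching can catch an empty configuration early. The sketch below is an assumption about how such a check might look; `has_skill_identifier` is a hypothetical helper and the toolkit may perform its own validation.

```python
def has_skill_identifier(info: dict) -> bool:
    """Return True when 'workspace_id' or 'skill_id' holds a non-empty string.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    return any(str(info.get(key) or '').strip()
               for key in ('workspace_id', 'skill_id'))


# The empty template from the cell above does not pass the check
empty_info = {'workspace_id': '', 'skill_id': '', 'assistant_id': ''}
template_is_ready = has_skill_identifier(empty_info)
```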

### 2.2 Fetch and load logs

- `num_logs`: number of logs to fetch
- Use `filename` to save the fetched logs as a JSON file (default: `None`)
- Apply `filters` while fetching logs (default: `[]`), e.g.,
    - removing empty input: `meta.summary.input_text_length_i>0`
    - fetching logs generated after a timestamp: `response_timestamp>=2018-09-18`
  
  Refer to [Filter query reference](https://cloud.ibm.com/docs/services/assistant?topic=assistant-filter-reference) for
  more information.
- Use `project` to specify project when using Watson Studio (default: `None`)
- Use `overwrite` to overwrite if `filename` exists (default: `False`)
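
As a minimal sketch of how the `filters` list might be assembled programmatically, the helper below builds the filter strings shown in this notebook from a few arguments. `build_filters` and its parameters are hypothetical names introduced for illustration; only the filter strings themselves come from this notebook.

```python
from datetime import date


def build_filters(language='en', since=None, non_empty_input=True):
    """Build a list of filter query strings for get_logs.

    Hypothetical helper; it only formats strings and calls no API.
    """
    filters = ['language::{}'.format(language)]
    if non_empty_input:
        # Drop logs whose input text is empty
        filters.append('meta.summary.input_text_length_i>0')
    if since is not None:
        # Keep logs with a response timestamp on or after the given date
        filters.append('response_timestamp>={}'.format(since.isoformat()))
    return filters


filters = build_filters(since=date(2020, 3, 1))
```

Building the date part from a `date` object avoids hand-typing the `YYYY-MM-DD` string.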


__A. Download all logs for a period of time (and save as a JSON file for Measure notebook)__

In [None]:
# Add filter queries
filters = ['language::en', # Logs in English
           'meta.summary.input_text_length_i>0', # Logs with non-empty input
           'response_timestamp>=2020-03-01'] # Logs with response timestamp on or after 2020-03-01

# Query 20,000 logs and save them to 'logs.json'
logs = get_logs(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='logs.json')

__B. Download and export logs for intent recommendation__

For intent recommendation, by default, an utterance is considered only when:
- it is the first user utterance in each conversation,
- its intent confidence (`response.intents:confidence`) is between 0.1 and 0.6 (exclusive),
- its token count is between 3 and 20 (exclusive), and
- it is not a duplicate of another utterance in the logs.

This example adds confidence filters when calling `get_logs`, and then exports the utterances to a CSV file by calling
`export_csv_for_intent_recommendation` with the token count filter and deduplication applied.
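
To make the selection criteria concrete, here is a rough pure-Python sketch that applies them to a handful of made-up `(text, confidence)` pairs. It is not the toolkit's implementation: `select_candidates` is a hypothetical function, and whitespace tokenization, strict bounds on both ends, and case-insensitive duplicate matching are all assumptions.

```python
def select_candidates(utterances, min_conf=0.1, max_conf=0.6,
                      min_length=3, max_length=20):
    """Filter first-turn (text, confidence) pairs by the criteria above.

    Assumptions: tokens are whitespace-separated, all bounds are strict,
    and duplicates are matched case-insensitively.
    """
    seen = set()
    selected = []
    for text, confidence in utterances:
        n_tokens = len(text.split())
        if not (min_conf < confidence < max_conf):
            continue  # confidence outside the open interval
        if not (min_length < n_tokens < max_length):
            continue  # too short or too long
        key = text.strip().lower()
        if key in seen:
            continue  # duplicate of an utterance already kept
        seen.add(key)
        selected.append(text)
    return selected


# Made-up sample data for illustration
sample = [
    ('how do I reset my account password', 0.42),
    ('how do I reset my account password', 0.35),        # duplicate
    ('hello', 0.30),                                     # too few tokens
    ('where can I find the store opening hours', 0.95),  # confidence too high
]
candidates = select_candidates(sample)
```

Only the first sample utterance passes all four checks.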


In [None]:
# Add filter queries
filters = ['language::en', # Logs in English
           'request.context.system.dialog_turn_counter::1', # The first user utterance in each conversation
           'response.intents:confidence<0.6', # Filter out utterances with high intent confidence
           'response.intents:confidence>0.1', # Filter out utterances with low intent confidence
          ]

# Query 20,000 logs using filename 'log_for_intent_recommendation.json'
logs = get_logs(sdk_object,
                assistant_information,
                num_logs=20000,
                filters=filters,
                filename='log_for_intent_recommendation.json')

# Or, load previously saved logs.
logs = load_logs_from_file(filename='log_for_intent_recommendation.json')

Export logs to a CSV file for intent recommendation:

- `logs`: the logs object from `get_logs` or `load_logs_from_file`
- `filename`: the CSV output filename
- Use `deduplicate` to specify if duplicate messages should be removed (default: `True`)
- Use `project` to specify project when using Watson Studio (default: `None`)
- Use `overwrite` to overwrite if `filename` exists (default: `False`)
- Use `min_length` to filter out utterances with fewer than a certain number of tokens (exclusive, default: `3`)
- Use `max_length` to filter out utterances with more than a certain number of tokens (exclusive, default: `20`)

In [None]:
export_csv_for_intent_recommendation(logs,
                                     filename='log_for_intent_recommendation.csv',
                                     deduplicate=True,
                                     min_length=3,
                                     max_length=20,
                                     overwrite=False)

__C. More examples__

Download logs of the first user utterance in each conversation for a period of time

In [None]:
# Add filter queries
filters = ['language::en', # Logs in English 
           'request.context.system.dialog_turn_counter::1', # The first user utterance in each conversation
           'response_timestamp>=2020-03-01'] # Logs with response timestamp on or after 2020-03-01

# Query 20,000 logs using filename 'log_first_utterances.json'
logs = get_logs(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_first_utterances.json')

Download logs containing specific input text

In [None]:
# Add filter queries
filters = ['language::en', # Logs in English
           'request.input.text::"Is there an article on how to make cherry pie?"'] # Logs with input text: "Is there an article on how to make cherry pie?"

# Query 20,000 logs using filename 'log_input.json'
logs = get_logs(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_input.json')

Download logs triggering a specific intent

In [None]:
# Add filter queries
filters = ['language::en', # Logs in English
           'response.intents:intent::"article_food"']  # Intent triggered: article_food
# Query 20,000 logs using filename log_intent.json
logs = get_logs(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_intent.json')

Download logs triggering a specific intent within a confidence range

In [None]:
# Add filter queries
filters = ['language::en', # Logs in English
           'response.intents:(intent:article_food,confidence<0.25)']  # Intent triggered: article_food with confidence below 0.25
# Query 20,000 logs using filename log_intent_confidence.json
logs = get_logs(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_intent_confidence.json')

Download logs that visited a specific node

In [None]:
# Fetch assistant definition and save to a JSON file
df_assistant = get_assistant_definition(sdk_object, assistant_information, filename='assistant_definition.json')

# Get all intents
assistant_intents = [intent['intent'] for intent in df_assistant['intents'].values[0]] 

# Get all dialog nodes
assistant_nodes = pd.DataFrame(df_assistant['dialog_nodes'].values[0])

# Find mappings between node name and node id
node_title_map = dict()
for idx, node in assistant_nodes.iterrows():
    if str(node['title']) != 'nan':
        node_title_map[node['title']] = node['dialog_node']
# Use a list for the column names: a set has no guaranteed order
node_df = pd.DataFrame(list(node_title_map.items()), columns=['node_name', 'node_id'])

# Add filter queries
filters = ['language::en', # Logs in English
           'response.output:nodes_visited::[{}]'.format(node_title_map['book_short_dialog'])]  # Visited node: book_short_dialog
# Query 20,000 logs using filename log_node.json
logs = get_logs(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_node.json')
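
The title-to-ID lookup used above can also be sketched without pandas, which may help when checking the mapping by hand. The snippet below works on a made-up list of dialog node dictionaries; `build_node_title_map`, `nodes_visited_filter`, and the sample node IDs are hypothetical, and only the `title` and `dialog_node` keys and the filter pattern come from the cell above.

```python
def build_node_title_map(dialog_nodes):
    """Map node title -> node ID, skipping nodes without a title."""
    return {node['title']: node['dialog_node']
            for node in dialog_nodes
            if node.get('title')}


def nodes_visited_filter(node_id):
    """Format the nodes_visited filter string for one node ID."""
    return 'response.output:nodes_visited::[{}]'.format(node_id)


# Made-up dialog nodes for illustration
sample_nodes = [
    {'dialog_node': 'node_1_1580000000000', 'title': 'book_short_dialog'},
    {'dialog_node': 'node_2_1580000000001', 'title': None},
    {'dialog_node': 'Welcome'},
]
title_map = build_node_title_map(sample_nodes)
node_filter = nodes_visited_filter(title_map['book_short_dialog'])
```

If two nodes share a title, the later one overwrites the earlier one in this mapping, as it does in the pandas version.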

Copyright © 2020 IBM. This notebook and its source code are released under the terms of the MIT License.