# Watson Assistant Logs Notebook

## Introduction
This notebook demonstrates how to download Watson Assistant user-generated logs based on different criteria.

### Programming language and environment
Some familiarity with Python is recommended. This notebook runs on Python 3.6.

<a id="setup"></a>
## 1. Configuration and Setup

In this section, we add data and workspace access credentials and import the required libraries and functions.

### <a id="python"></a> 1.1 Install required Python libraries

In [None]:
!pip3 install --user --upgrade "ibm-watson==4.1.0";
!pip3 install --user --upgrade "pandas==1.0.1";
!curl -O https://raw.githubusercontent.com/watson-developer-cloud/assistant-improve-recommendations-notebook/logs/src/main/python/watson_assistant_func.py

### <a id="function"></a> 1.2 Import functions used in the notebook

In [None]:
# Import Watson Assistant related functions
from ibm_cloud_sdk_core.authenticators import BasicAuthenticator
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import pandas as pd
import json
from ibm_watson import AssistantV1

from watson_assistant_func import get_logs_filter, get_assistant_definition

## <a id="load"></a> 2. Load and format data 

### 2.1 Add Watson Assistant configuration

Provide the service credentials used for accessing your Watson Assistant instance.  

- For more information about obtaining Watson Assistant credentials, see [Service credentials for Watson services](https://console.bluemix.net/docs/services/watson/getting-started-credentials.html#creating-credentials).
- API requests require a version parameter that takes a date in the format version=YYYY-MM-DD. For more information about version, see [Versioning](https://www.ibm.com/watson/developercloud/assistant/api/v1/curl.html?curl#versioning).
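
Because a malformed version string only surfaces as an error once a request is sent, it can help to check the value up front. The following is a minimal sketch; `is_valid_version` is a hypothetical helper and not part of the Watson SDK.

```python
from datetime import datetime


def is_valid_version(version):
    """Return True if `version` is a real calendar date written as YYYY-MM-DD.

    Hypothetical helper for illustration; not part of ibm-watson.
    """
    try:
        parsed = datetime.strptime(version, '%Y-%m-%d')
    except ValueError:
        return False
    # strptime also accepts unpadded values such as '2020-2-5', so compare
    # against the canonical zero-padded form.
    return parsed.strftime('%Y-%m-%d') == version


print(is_valid_version('2020-02-05'))
print(is_valid_version('02/05/2020'))
```

The check only confirms the format; whether a given date maps to a supported API version is decided by the service.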

In [None]:
# For stage testing workspace:
authenticator = BasicAuthenticator('apikey', 'EABmPgmkgrOZBWOzhfL5n_4TAuldiJ-5_4pgonWYr4jr')
sdk_object = AssistantV1(version='2020-02-05', authenticator=authenticator)
sdk_object.set_service_url('https://api.us-south.assistant.test.watson.cloud.ibm.com/')
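
If you prefer not to keep the API key in the notebook itself, one option is to read the settings from environment variables. This is a sketch under assumptions: the variable names `ASSISTANT_APIKEY`, `ASSISTANT_URL`, and `ASSISTANT_VERSION` and the helper `read_assistant_settings` are hypothetical and not defined by the Watson SDK.

```python
import os


def read_assistant_settings(env=None):
    """Collect connection settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    apikey = env.get('ASSISTANT_APIKEY')
    if not apikey:
        raise ValueError('Set ASSISTANT_APIKEY before running the notebook')
    return {
        'apikey': apikey,
        'url': env.get('ASSISTANT_URL'),
        'version': env.get('ASSISTANT_VERSION', '2020-02-05'),
    }


# Usage with the objects from the cell above (uncomment inside the notebook):
# settings = read_assistant_settings()
# authenticator = BasicAuthenticator('apikey', settings['apikey'])
# sdk_object = AssistantV1(version=settings['version'], authenticator=authenticator)
# sdk_object.set_service_url(settings['url'])
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.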

Specify your assistant information.

In [None]:
# For stage testing workspace:
assistant_information = {'workspace_id' : '',
                         'skill_id' : 'd56b5c28-bb69-471e-aa5b-f97bc0036efd',
                         'assistant_id' : '47f8cfcf-2f3b-46e9-be6f-faaaca7f1f7b'}

### 2.2 Fetch and load logs

- Use `num_logs` to specify the number of logs to download.
- Use `filename` to specify the name of the downloaded file, or omit it to use the default filename (format: `logs_assistant_[assistant_id]_skill_[skill_id].json`).
- Apply `filters` while fetching logs, e.g.,
    - to remove logs with empty input: `meta.summary.input_text_length_i>0`
    - to fetch logs generated on or after a given date: `response_timestamp>=2018-09-18`
- Logs are also stored as a pandas DataFrame in `df_logs`.

Refer to [Filter query reference](https://cloud.ibm.com/docs/services/assistant?topic=assistant-filter-reference) for more information.
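
The cells below pass the filters as a list of strings, one query per entry. As a minimal sketch, such a list can be assembled programmatically; `build_filters` is a hypothetical helper and not part of `watson_assistant_func`, and it only covers the three filter types listed above.

```python
from datetime import date


def build_filters(language='en', since=None, non_empty_input=True):
    """Build a list of filter query strings.

    Hypothetical helper for illustration only.
    """
    filters = ['language::{}'.format(language)]
    if non_empty_input:
        # Keep only logs whose input text is not empty
        filters.append('meta.summary.input_text_length_i>0')
    if since is not None:
        # date.isoformat() yields YYYY-MM-DD
        filters.append('response_timestamp>={}'.format(since.isoformat()))
    return filters


filters = build_filters(since=date(2020, 3, 1))
print(filters)
# → ['language::en', 'meta.summary.input_text_length_i>0', 'response_timestamp>=2020-03-01']
```

The resulting list has the same shape as the hand-written `filters` lists in the cells that follow.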

__A. Download all logs for a period of time__

In [None]:
print('Fetching logs ... ')
# Add filter queries
filters = ['language::en', # Logs in English 
           'meta.summary.input_text_length_i>0', # Logs with non-empty input 
           'response_timestamp>=2020-03-01'] # Logs with a response timestamp on or after 2020-03-01

# Query 20,000 logs using default filename 
df_logs = get_logs_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, reset=True, generate_csv=True)

__B. Download logs of the first user utterance in each conversation for a period of time__

In [None]:
print('Fetching logs ... ')
# Add filter queries
filters = ['language::en', # Logs in English 
           'request.context.system.dialog_turn_counter::1', # The first user utterance in each conversation 
           'response_timestamp>=2020-03-01'] # Logs with a response timestamp on or after 2020-03-01

# Query 20,000 logs using filename 'log_first_utterances.json'
df_logs = get_logs_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_first_utterances.json', generate_csv=True)

__C. Download logs containing specific input text__

In [None]:
print('Fetching logs ... ')
# Add filter queries
filters = ['language::en', # Logs in English
           'request.input.text::"Is there an article on how to make cherry pie?"'] # Logs with input text: "Is there an article on how to make cherry pie?"

# Query 20,000 logs using filename 'log_input.json'
df_logs = get_logs_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_input.json', generate_csv=True)

__D. Download logs triggering a specific intent__

In [None]:
print('Fetching logs ... ')
# Add filter queries
filters = ['language::en', # Logs in English
           'response.intents:intent::"article_food"']  # Intent triggered: article_food
# Query 20,000 logs using filename log_intent.json
df_logs = get_logs_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_intent.json', generate_csv=True)

__E. Download logs triggering a specific intent within a confidence range__

In [None]:
print('Fetching logs ... ')
# Add filter queries
filters = ['language::en', # Logs in English
           'response.intents:(intent:article_food,confidence<0.25)']  # Intent triggered: article_food with confidence below 0.25
# Query 20,000 logs using filename log_intent_confidence.json
df_logs = get_logs_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_intent_confidence.json', generate_csv=True)

__F. Download logs that visited a specific node__

In [None]:
# Fetch assistant definition and save to a JSON file
df_assistant = get_assistant_definition(sdk_object, assistant_information, filename='assistant_definition.json')

# Get all intents
assistant_intents = [intent['intent'] for intent in df_assistant['intents'].values[0]] 

# Get all dialog nodes
assistant_nodes = pd.DataFrame(df_assistant['dialog_nodes'].values[0])

# Find mappings between node name and node id
node_title_map = dict()
for idx, node in assistant_nodes.iterrows():
    if pd.notna(node['title']):
        node_title_map[node['title']] = node['dialog_node']
# Use a list for the column names: a set has no guaranteed order
node_df = pd.DataFrame(list(node_title_map.items()), columns=['node_name', 'node_id'])

print('Fetching logs ... ')
# Add filter queries
filters = ['language::en', # Logs in English
           'response.output:nodes_visited::[{}]'.format(node_title_map['book_short_dialog'])]  # Visited node: book_short_dialog
# Query 20,000 logs using filename log_node.json
df_logs = get_logs_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_node.json', generate_csv=True)
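
The lookup `node_title_map['book_short_dialog']` raises a `KeyError` when no dialog node carries that title. A minimal sketch of a more defensive lookup follows. It assumes the dialog nodes are available as plain dictionaries with a `dialog_node` key and an optional `title` key; the sample node IDs are made up for illustration.

```python
def build_node_title_map(dialog_nodes):
    """Map each node title to its node id, skipping nodes without a title."""
    title_map = {}
    for node in dialog_nodes:
        title = node.get('title')
        if title:
            title_map[title] = node['dialog_node']
    return title_map


# Made-up sample data standing in for the assistant definition
sample_nodes = [
    {'dialog_node': 'node_1_1580000000000', 'title': 'book_short_dialog'},
    {'dialog_node': 'node_2_1580000000001', 'title': None},
    {'dialog_node': 'Welcome'},
]

title_map = build_node_title_map(sample_nodes)
node_id = title_map.get('book_short_dialog')
if node_id is None:
    print('No node with that title')
else:
    print('response.output:nodes_visited::[{}]'.format(node_id))
```

Using `dict.get` lets the notebook report a missing title before any logs are requested.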