## Download logs for Improvement

### Introduction
This notebook demonstrates how to download Watson Assistant logs using different filters.

In [1]:
!pip3 install --user --upgrade "ibm-watson==4.1.0";
!pip3 install --user --upgrade "pandas==1.0.1";

# Import Watson Assistant related functions
from ibm_cloud_sdk_core.authenticators import BasicAuthenticator
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import pandas as pd
import json
from ibm_watson import AssistantV1

!curl -O https://raw.githubusercontent.com/watson-developer-cloud/assistant-improve-recommendations-notebook/logs/src/main/python/watson_assistant_func.py
from watson_assistant_func import get_logs_with_filter, get_assistant_definition


Requirement already up-to-date: ibm-watson==4.1.0 in /Users/zhezhang/.local/lib/python3.6/site-packages (4.1.0)
You are using pip version 18.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already up-to-date: pandas==1.0.1 in /Users/zhezhang/.local/lib/python3.6/site-packages (1.0.1)
You are using pip version 18.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11364  100 11364    0     0  11364      0  0:00:01 --:--:--  0:00:01 39458


### 1. Add Watson Assistant configuration

Provide the service credentials used for accessing your Watson Assistant instance.  

- For more information about obtaining Watson Assistant credentials, see [Service credentials for Watson services](https://console.bluemix.net/docs/services/watson/getting-started-credentials.html#creating-credentials).
- API requests require a version parameter that takes a date in the format version=YYYY-MM-DD. For more information about version, see [Versioning](https://www.ibm.com/watson/developercloud/assistant/api/v1/curl.html?curl#versioning).
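
Because the `version` value must be a date written as YYYY-MM-DD, it can be useful to check the string before passing it to the SDK. The following is a minimal sketch using only the standard library; `is_valid_version` is a hypothetical helper name, not part of the Watson SDK.

```python
from datetime import datetime


def is_valid_version(version):
    """Return True if `version` is a calendar date written exactly as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(version, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as "2020-2-5",
    # so compare against the canonical zero-padded form.
    return parsed.strftime("%Y-%m-%d") == version


print(is_valid_version("2020-02-05"))  # True
print(is_valid_version("02/05/2020"))  # False
```

The check only confirms the format of the date; whether the service supports a given version date is a separate question.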

In [2]:
# For dev testing workspace:
authenticator = BasicAuthenticator('apikey', 'LUbYW_TaiUQEgdXcQmYzUiMrzhuiUX__SF-6L1-LFjIb')
sdk_object = AssistantV1(version='2020-02-05', authenticator=authenticator)
sdk_object.set_service_url('https://api.us-south.assistant.dev.watson.cloud.ibm.com/')

Specify your assistant information.

In [3]:
# For dev testing workspace:
assistant_information = {'workspace_id' : '',
                         'skill_id' : 'c1ad23a8-37ea-40b8-bcfa-c329529774b2',
                         'assistant_id' : 'aae78540-40bb-4f7c-8d20-84a894d998b9'}

### 2. Fetch and load logs

- Use `num_logs` to specify the maximum number of logs to download
- Use `filename` to specify the downloaded file name
- Apply `filters` while fetching logs, e.g.,
    - removing empty input: `meta.summary.input_text_length_i>0`
    - fetching logs generated after a timestamp: `response_timestamp>=2018-09-18`
- The fetched logs are also returned as a pandas DataFrame, stored here in `df_logs`
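
The filter strings listed above can be assembled programmatically. The sketch below builds the same list-of-strings form that the cells in this notebook pass as `filters`; `timestamp_filter` and `build_filters` are hypothetical helper names introduced for illustration only.

```python
from datetime import date


def timestamp_filter(since):
    """Build a filter that keeps logs generated on or after the date `since`."""
    return "response_timestamp>={}".format(since.isoformat())


def build_filters(language="en", skip_empty_input=True, since=None):
    """Return a list of filter strings in the form used by this notebook."""
    filters = ["language::{}".format(language)]
    if skip_empty_input:
        # Drop logs whose input text is empty
        filters.append("meta.summary.input_text_length_i>0")
    if since is not None:
        filters.append(timestamp_filter(since))
    return filters


filters = build_filters(since=date(2018, 9, 18))
print(filters)
# ['language::en', 'meta.summary.input_text_length_i>0', 'response_timestamp>=2018-09-18']
```

Building the list in one place keeps the filter syntax out of individual cells, which may help when the same conditions are reused across several downloads.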


In [4]:
print('Fetching logs ... ')

# Filter to be applied while fetching logs
filters = ['language::en',
           'meta.summary.input_text_length_i>0']

df_logs = get_logs_with_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename=None)

Fetching logs ... 
20 logs retrieved
Loading 20 logs into dataframe ...
Saving logs into test.json ... 
Completed


### 3. More filter examples
#### 3.1 Get the beginning utterance of each conversation

In [5]:
print('Fetching logs ... ')

# Filter to be applied while fetching logs
filters = ['language::en',
           'request.context.system.dialog_turn_counter::1']

df_logs = get_logs_with_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_turn_1.json')

Fetching logs ... 
17 logs retrieved
Loading 17 logs into dataframe ...
Saving logs into log_turn_1.json ... 
Completed


#### 3.2 Get logs with specified text as input utterance

In [6]:
print('Fetching logs ... ')

# Filter to be applied while fetching logs
filters = ['language::en',
           'request.input.text::"I need a good science fiction book"']

df_logs = get_logs_with_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_input.json')

Fetching logs ... 
224 logs retrieved
Loading 224 logs into dataframe ...
Saving logs into log_input.json ... 
Completed
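
When the utterance comes from a variable rather than a literal, the quoting in the filter string needs some care. The sketch below wraps the text in double quotes as the cell above does. It additionally assumes that embedded double quotes and backslashes are escaped with a backslash; that assumption should be checked against the Watson Assistant filter query reference. `input_text_filter` is a hypothetical helper name.

```python
def input_text_filter(text):
    """Build a `request.input.text` filter for an exact utterance.

    Assumption: backslash escaping is used for embedded quotes and backslashes.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return 'request.input.text::"{}"'.format(escaped)


print(input_text_filter("I need a good science fiction book"))
# request.input.text::"I need a good science fiction book"
```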


#### 3.3 Get logs with a specified intent detected in the response

In [7]:
print('Fetching logs ... ')

# Filter to be applied while fetching logs
filters = ['language::en',
           'response.intents:intent::"book_scifi"']

df_logs = get_logs_with_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_intent.json')

Fetching logs ... 
2164 logs retrieved
Loading 2164 logs into dataframe ...
Saving logs into log_intent.json ... 
Completed
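
Once logs are downloaded, the detected intents can also be tallied offline. The sketch below assumes each log record carries a `response.intents` list of `{"intent", "confidence"}` entries, as in Watson Assistant V1 message responses. The sample records are hand-written for illustration and are not real log data.

```python
from collections import Counter


def top_intent(record):
    """Return the highest-confidence intent name of a log record, or None."""
    intents = record.get("response", {}).get("intents", [])
    if not intents:
        return None
    return max(intents, key=lambda item: item["confidence"])["intent"]


# Hypothetical sample records shaped like downloaded logs
sample_logs = [
    {"response": {"intents": [{"intent": "book_scifi", "confidence": 0.92}]}},
    {"response": {"intents": [{"intent": "hello", "confidence": 0.40},
                              {"intent": "book_scifi", "confidence": 0.55}]}},
    {"response": {"intents": []}},
]

# Count only the records where an intent was detected
counts = Counter(name for name in map(top_intent, sample_logs) if name is not None)
print(counts)  # Counter({'book_scifi': 2})
```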


#### 3.4 Get assistant definition and save to a JSON file

In [8]:
df_assistant = get_assistant_definition(sdk_object, assistant_information, filename='test_work.json')

# Get all intents
assistant_intents = [intent['intent'] for intent in df_assistant['intents'].values[0]] 

# Get all dialog nodes
assistant_nodes = pd.DataFrame(df_assistant['dialog_nodes'].values[0])

Loading skill definition using skill id: c1ad23a8-37ea-40b8-bcfa-c329529774b2
Assistant definition test_work.json exported
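
The shape of the data used above can be illustrated without pandas. The sketch below assumes a skill definition is a dictionary with an `intents` list and a `dialog_nodes` list, which matches how the cell above indexes `df_assistant`. The `workspace` fragment is hand-written for illustration, and the helper names are hypothetical.

```python
# Hypothetical fragment shaped like a skill definition
workspace = {
    "intents": [{"intent": "book_scifi"}, {"intent": "hello"}],
    "dialog_nodes": [
        {"dialog_node": "node_6_1582040347726", "title": "hello"},
        {"dialog_node": "node_5_1582050079858", "title": "book_short_dialog"},
        {"dialog_node": "response_1", "title": None},  # node without a title
    ],
}


def intent_names(definition):
    """Return the names of all intents in the definition."""
    return [item["intent"] for item in definition.get("intents", [])]


def title_to_node_id(definition):
    """Map each dialog node title to its node ID, skipping untitled nodes."""
    return {
        node["title"]: node["dialog_node"]
        for node in definition.get("dialog_nodes", [])
        if node.get("title")
    }


print(intent_names(workspace))  # ['book_scifi', 'hello']
print(title_to_node_id(workspace)["book_short_dialog"])  # node_5_1582050079858
```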


#### 3.5 Get logs that visited a specified dialog node

In [9]:
# Map each dialog node title to its node ID, skipping nodes without a title
node_title_map = dict()
for idx, node in assistant_nodes.iterrows():
    if pd.notna(node['title']):
        node_title_map[node['title']] = node['dialog_node']
# Name the columns with a list: a set has no guaranteed order and can swap the labels
node_df = pd.DataFrame(list(node_title_map.items()), columns=['node_name', 'node_id'])
node_df

| | node_name | node_id |
|---|---|---|
| 0 | article_food_dialog | node_2_1582134542638 |
| 1 | hello | node_6_1582040347726 |
| 2 | book_short_dialog | node_5_1582050079858 |
| 3 | article_politics_dialog | node_10_1582133662325 |
| 4 | book_computer | node_1_1582049209157 |
| 5 | book_politics_dialog | node_5_1582049713134 |
| 6 | article_selfimprove_dialog | node_1_1582134460437 |
| 7 | book_long_dialog | node_8_1582050051508 |
| 8 | article_fantasy_dialog | node_7_1582133509246 |
| 9 | book_history_dialog | node_2_1582049297943 |


In [10]:
print('Fetching logs ... ')

# Filter to be applied while fetching logs
filters = ['language::en',
           'response.output:nodes_visited::[{}]'.format(node_title_map['book_short_dialog'])]

df_logs = get_logs_with_filter(sdk_object, assistant_information, num_logs=20000, filters=filters, filename='log_node.json')

Fetching logs ... 
20 logs retrieved
Loading 20 logs into dataframe ...
Saving logs into log_node.json ... 
Completed
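
The same node check can be applied offline to logs that are already downloaded. The sketch below assumes each record stores the visited node IDs in `response.output.nodes_visited`, the field the filter above refers to. The sample records are hand-written for illustration, and `visited_node` is a hypothetical helper name.

```python
def visited_node(record, node_id):
    """Return True if the log record lists `node_id` among its visited nodes."""
    output = record.get("response", {}).get("output", {})
    return node_id in output.get("nodes_visited", [])


# Hypothetical sample records shaped like downloaded logs
sample_logs = [
    {"log_id": "a", "response": {"output": {
        "nodes_visited": ["node_6_1582040347726", "node_5_1582050079858"]}}},
    {"log_id": "b", "response": {"output": {
        "nodes_visited": ["node_6_1582040347726"]}}},
    {"log_id": "c", "response": {"output": {}}},  # no nodes recorded
]

target = "node_5_1582050079858"  # ID of `book_short_dialog` in the table above
matching_ids = [r["log_id"] for r in sample_logs if visited_node(r, target)]
print(matching_ids)  # ['a']
```

Filtering on the server, as in the cell above, reduces the amount of data downloaded. The local check is mainly useful for slicing one large download in several ways.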
