# Information Retrieval Simulation Logs

This notebook is used with the Application Insights dashboard described in the [readme](../README.md), but you can adapt this approach for other scenarios where synthetic data helps you develop the dashboards and workbooks you need for deep analysis.


## Introduction

In this notebook, we generate fake log events using Faker to simulate an information retrieval user experience. The log events are intended to be consumed in an Azure Application Insights workbook and dashboard. The user experience is composed of the following user events:

- _OnSearch_ - The user submits a query of interest into the system. The additional parameter is the query string: `{'query':'<your query string>'}`
- _OnResults_ - The list of documents returned by your search engine. The additional parameter for each result is the rank of the document.
  A rank in our case is the running index of the returned documents, starting at 1 for the first document: `{'indexRank':'<document rank>'}`
- _OnNavigate_ - An event where the user navigates to a particular document for further investigation.
- _OnSuccess_ - An indication that the user found the document relevant to the search query.

<center><img src="../images/search_funnel.png" width="70%" alt="user funnel"/></center>

Across all events, we log unique identifiers to distinguish between queries.<br>
Each document is identified by a unique `documentId`.<br>
Each submitted query is identified by a unique `correlationId`.<br>
The unique identifier of a query is the combination `sessionId` + `correlationId`. <br>The `sessionId` is generated automatically by AppInsights. See [User, session, and event analysis in Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/usage-segmentation) for details.<br>
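
The identifiers above can be pictured as plain dictionaries, one per event. The following is a minimal sketch rather than the notebook's logging code: `make_event`, the sample query, and the `doc-17` document ID are hypothetical, and the `uuid` values only stand in for the IDs that the notebook generates later with Faker. Event names use the lower camel case that the simulation code logs (`onSearch`, `onNavigate`, `onSuccess`).

```python
import uuid


def make_event(event, correlation_id, session_id, **fields):
    """Build one event record (hypothetical helper, for illustration only)."""
    record = {
        "event": event,
        "correlationId": correlation_id,
        "sessionId": session_id,
    }
    record.update(fields)
    return record


session_id = uuid.uuid4().hex      # stand-in for the session identifier
correlation_id = uuid.uuid4().hex  # one per submitted query

events = [
    make_event("onSearch", correlation_id, session_id, query="solar panel efficiency"),
    make_event("onNavigate", correlation_id, session_id, documentId="doc-17", indexRank=3),
    make_event("onSuccess", correlation_id, session_id, documentId="doc-17", indexRank=3),
]

# Every event that belongs to one query shares the same (sessionId, correlationId) pair
query_keys = {(e["sessionId"], e["correlationId"]) for e in events}
print(len(query_keys), "distinct query across", len(events), "events")
```

Grouping on the `(sessionId, correlationId)` pair is what lets a dashboard query follow a single search from submission to success.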


## Prerequisites

- Complete the [setup tasks described in the readme](../README.md). You should have an Azure subscription, Azure Application Insights, and an empty dashboard.

- Create a `.env` file in the root folder of the sample. Provide the following environment variables. Variables can be used as-is except for the instrumentation key, which you can copy from the Essentials section of the Application Insights Overview page in Azure portal.

  ```
  API_LOGGING_LEVEL = "DEBUG"
  LOGGER_NAME = "test_logger"
  APP_INSIGHTS_KEY = "<your-app-insights-instrumentation-key>"
  ```

- Install the packages listed in the `requirements.txt` file. Packages include [opencensus-ext-azure](https://pypi.org/project/opencensus-ext-azure/), [python-dotenv](https://pypi.org/project/python-dotenv/), and [Faker](https://pypi.org/project/Faker/).

  ```
  !pip install -r ../requirements.txt
  ```


## Import packages

You're now ready to begin running the code in this notebook. In the following cell, import the libraries used to generate the simulated logs.


In [8]:
import logging
import time
from src.app_insights_logging import CustomLogger
import datetime  # imported as a module; later cells use datetime.date and datetime.timedelta
import random

## Load Environment Variables


In [1]:
from dotenv import load_dotenv
import os

# take environment variables from .env.
load_dotenv(".env")

# add a custom name for your logger
logger_name = os.getenv("LOGGER_NAME")

# You can get the instrumentation key in the overview tab in the Azure Portal
appinsights_key = os.getenv("APP_INSIGHTS_KEY")
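
Because `os.getenv` returns `None` for a variable that isn't set, a missing or unedited `.env` file only surfaces later, when the logger is created. As a hedged sketch, a check like the following could run right after `load_dotenv`. The helper `find_missing_settings` is hypothetical and not part of the sample; it treats a value still wrapped in angle brackets as an unreplaced placeholder.

```python
import os

REQUIRED_VARIABLES = ("LOGGER_NAME", "APP_INSIGHTS_KEY")


def find_missing_settings(environ, required=REQUIRED_VARIABLES):
    """Return the names of required variables that are unset, empty,
    or still contain a <placeholder> value (hypothetical helper)."""
    missing = []
    for name in required:
        value = (environ.get(name) or "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


missing = find_missing_settings(os.environ)
if missing:
    print("Check your .env file; missing or placeholder values:", ", ".join(missing))
```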

## Initialize the Azure AppInsights Logger

AppInsights allows us to log into separate tables, such as traces, events, and metrics.<br>
In this example, we log into the _traces_ table and the _events_ table.


In [9]:
custom_logger = CustomLogger(appinsights_key, logger_name)
# This is a custom trace logger in AppInsights
trace_logger = custom_logger.setup_azure_trace_logging()

# This is a custom event logger in AppInsights
event_logger = custom_logger.setup_azure_event_logging()
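
The loggers created above are used later with the standard `logging` call style, `event_logger.info(message, extra=...)`. The sketch below uses only the standard library to show how `extra` attaches custom fields to a log record. It assumes that `CustomLogger.get_logging_dimensions` wraps its keyword arguments under a `custom_dimensions` key, which is the convention that opencensus-ext-azure handlers read; the stand-in function and the in-memory `ListHandler` are hypothetical and send nothing to Azure.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory instead of exporting them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def get_logging_dimensions(**kwargs):
    # Hypothetical stand-in for CustomLogger.get_logging_dimensions;
    # assumes the "custom_dimensions" convention described above
    return {"custom_dimensions": kwargs}


demo_logger = logging.getLogger("dimensions_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
demo_logger.addHandler(handler)

demo_logger.info(
    "onSearch",
    extra=get_logging_dimensions(event="onSearch", query="demo query"),
)

# Keys passed through `extra` become attributes on the log record
record = handler.records[0]
print(record.getMessage(), record.custom_dimensions)
```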

In [None]:
type(event_logger)

## Generate user feedback for search results

The following code simulates the incidence of feedback (not every user will provide feedback) and the date at which the user session occurred.


In [11]:
import random
import datetime


def probability_of_feedback(probability: float = 1 / 3) -> bool:
    """Returns True if random float < probability threshold, False otherwise

    Args:
        probability (float, optional): The probability for a user to mark the document as relevant to the search query. Defaults to 1/3.

    Returns:
        Boolean: True if random float < probability threshold, False otherwise
    """
    return random.random() < probability


def get_range_of_dates(start_date: datetime.date, day_count: int) -> list:
    """Generate `day_count` consecutive dates starting from `start_date` and randomly sample 70% of them.

    Args:
        start_date (datetime.date): First date of the period.
        day_count (int): Number of consecutive days to generate before sampling.

    Returns:
        list: ISO-formatted date strings, in random order.
    """
    date_range = [
        (start_date + datetime.timedelta(days=day)).isoformat()
        for day in range(day_count)
    ]
    return random.sample(date_range, int(day_count * 0.7))
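
To see how the two helpers behave, the following sketch repeats them in compact form and runs them with a fixed seed. The seed and the trial count are assumptions added only to make reruns repeatable; the notebook itself runs unseeded.

```python
import datetime
import random


def probability_of_feedback(probability=1 / 3):
    return random.random() < probability


def get_range_of_dates(start_date, day_count):
    date_range = [
        (start_date + datetime.timedelta(days=day)).isoformat()
        for day in range(day_count)
    ]
    return random.sample(date_range, int(day_count * 0.7))


random.seed(0)  # fixed seed only so that reruns are repeatable

# The share of True results approaches the probability threshold
trials = 10_000
feedback_rate = sum(probability_of_feedback(1 / 5) for _ in range(trials)) / trials
print(f"Observed feedback rate: {feedback_rate:.3f}")

# int(6 * 0.7) == 4, so four of the six dates are kept
sampled_dates = get_range_of_dates(datetime.date(2022, 9, 18), 6)
print(len(sampled_dates), "of 6 dates kept:", sampled_dates)
```

Note that `random.sample` returns the dates in random order, so the simulation loop doesn't necessarily process them chronologically. Wrap the result in `sorted()` if you need ordered output.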

## Generate queries

The following code uses [Faker](https://faker.readthedocs.io/en/master/index.html) to generate short random sentences (about three words each) that stand in for actual queries.


In [12]:
from faker import Faker

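# Number of user sessions simulated for each date
number_of_sessions_per_timestamp = 3
# Maximum number of queries per session (each session submits between 1 and this many)
number_of_queries_per_session = 4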


# The dates we are trying to simulate the logs for
# We recommend choosing a date from the recent past
start_date = datetime.date(2022, 9, 18)
date_range = get_range_of_dates(start_date, 6)

# initialize faker
faker = Faker()

## Run the simulation

In AppInsights, the time of each event or trace is logged in the `timestamp` field. Because this simulation generates a history of searches, we store the simulated time in a separate field, `mock_timestamp`.


In [None]:
import random
import string

for mock_timestamp in date_range:
    print("Logging sessions for date:", mock_timestamp)
    mock_timestamp = datetime.datetime.strptime(mock_timestamp, "%Y-%m-%d")
    for session in range(number_of_sessions_per_timestamp):
        # sample unique session_id
        session_id = faker.pystr_format()

        # sample how many queries were submitted in a single user session
        for _ in range(random.randint(1, number_of_queries_per_session)):
            correlation_id = faker.pystr_format()

            # sample working user
            user_id = faker.pystr(6)

            # log query event
            query = faker.sentence(nb_words=3)[:-1]
            query_logging_dimensions = custom_logger.get_logging_dimensions(
                event="onSearch",
                query=query,
                correlationId=correlation_id,
                sessionId=session_id,
                userId=user_id,
                mock_timestamp=str(mock_timestamp),
            )

            mock_timestamp += datetime.timedelta(seconds=10)
            event_logger.info("onSearch", extra=query_logging_dimensions)
            print("User submitted query:", query)

            # Log results the user navigates to
            for _ in range(random.randint(1, 10)):
                index_rank = random.randint(1, 20)
                document_id = "".join(random.sample(string.ascii_lowercase, 6))
                video_logging_dimensions = custom_logger.get_logging_dimensions(
                    event="onNavigate",
                    query=query,
                    correlationId=correlation_id,
                    sessionId=session_id,
                    userId=user_id,
                    indexRank=index_rank,
                    documentId=document_id,
                    mock_timestamp=str(mock_timestamp),
                )
                mock_timestamp += datetime.timedelta(seconds=5)
                event_logger.info("onNavigate", extra=video_logging_dimensions)
                print("Navigating to document:", document_id)

                # If session is successful
                if probability_of_feedback(1 / 5):
                    feedback_logging_dimensions = custom_logger.get_logging_dimensions(
                        event="onSuccess",
                        query=query,
                        correlationId=correlation_id,
                        sessionId=session_id,
                        userId=user_id,
                        documentId=document_id,
                        indexRank=index_rank,
                        mock_timestamp=str(mock_timestamp),
                    )

                    event_logger.info("onSuccess", extra=feedback_logging_dimensions)
                    print("Success on document with index:", index_rank)
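
Before checking the dashboard, it can help to know roughly how many events the loop above produces. The sketch below works out the expected counts from the parameters used in this notebook; `mean_of_randint` and `expected_event_counts` are hypothetical helpers written for this estimate. Actual counts vary from run to run because every step is random.

```python
def mean_of_randint(low, high):
    """Mean of random.randint(low, high), which is uniform over both endpoints."""
    return (low + high) / 2


def expected_event_counts(dates, sessions_per_date, max_queries,
                          max_navigations, success_probability):
    """Expected number of events of each type for one simulation run."""
    searches = dates * sessions_per_date * mean_of_randint(1, max_queries)
    navigations = searches * mean_of_randint(1, max_navigations)
    successes = navigations * success_probability
    return {"onSearch": searches, "onNavigate": navigations, "onSuccess": successes}


# Values from the cells above: 4 sampled dates (70% of 6), 3 sessions per date,
# 1 to 4 queries per session, 1 to 10 navigations per query, 1/5 success probability
expected = expected_event_counts(4, 3, 4, 10, 1 / 5)
print(expected)
```

With these settings, a run averages about 30 `onSearch`, 165 `onNavigate`, and 33 `onSuccess` events, which is a useful sanity check against the totals shown in the dashboard.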

## View the dashboard in Azure portal


Your data should now be loaded into the workbook in Application Insights.

1. In Azure portal, in your Application Insights resource, select **Application Dashboard**.

   ![Screenshot of the portal page, showing the location of the command at the top of the page.](../images/portal_appinsights_dashboard_cmd.png)

1. Select the name of the dashboard you provided for the deployment.

   ![Screenshot of the dashboard drop-down list at the top of the page.](../images/portal_appinsight_dashboard_selector.png)

1. The dashboard includes metrics that you can't find "out of the box" in the portal pages of Azure Cognitive Search. Custom metrics include the _Daily Reciprocal Rate_, the _Successful Over Inspected Rate_, and session metrics.

   ![Screenshot of a dashboard containing simulated data.](../images/portal_appinsight_dashboard.png)

1. To check the Kusto query that backs each visualization, select **Open Editing Pane** at the top of each tile.

   ![Screenshot of the Kusto query and visualization.](../images/portal_appinsight_open_edit_pane.png)

1. To use the custom metrics in new Kusto queries, open **Monitoring > Logs** and start a new query that pulls from the `customEvents` table.

   ![Screenshot of query editor.](../images/logs_example.png)
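
The authoritative definitions of the custom metrics are the Kusto queries behind each tile. As a rough offline illustration only, the sketch below computes one plausible reading of the two named metrics from a list of event dictionaries: the daily mean of `1 / indexRank` over `onSuccess` events, and the daily ratio of `onSuccess` to `onNavigate` events. These formulas, the function name `daily_metrics`, and the sample events are assumptions, not taken from the dashboard.

```python
from collections import defaultdict


def daily_metrics(events):
    """Per-day mean reciprocal rank of successes and success/navigation ratio
    (assumed definitions, for illustration only)."""
    navigations = defaultdict(int)
    reciprocal_ranks = defaultdict(list)
    for event in events:
        day = event["mock_timestamp"][:10]  # "YYYY-MM-DD" prefix of the timestamp string
        if event["event"] == "onNavigate":
            navigations[day] += 1
        elif event["event"] == "onSuccess":
            reciprocal_ranks[day].append(1 / event["indexRank"])

    metrics = {}
    for day, inspected in navigations.items():
        ranks = reciprocal_ranks.get(day, [])
        metrics[day] = {
            "mean_reciprocal_rank": sum(ranks) / len(ranks) if ranks else 0.0,
            "successful_over_inspected": len(ranks) / inspected,
        }
    return metrics


# Hypothetical sample events in the shape the simulation logs
sample_events = [
    {"event": "onNavigate", "indexRank": 2, "mock_timestamp": "2022-09-18 00:00:10"},
    {"event": "onSuccess", "indexRank": 2, "mock_timestamp": "2022-09-18 00:00:15"},
    {"event": "onNavigate", "indexRank": 4, "mock_timestamp": "2022-09-18 00:00:15"},
    {"event": "onSuccess", "indexRank": 4, "mock_timestamp": "2022-09-18 00:00:20"},
    {"event": "onNavigate", "indexRank": 7, "mock_timestamp": "2022-09-18 00:00:20"},
    {"event": "onNavigate", "indexRank": 5, "mock_timestamp": "2022-09-19 00:00:10"},
]

metrics = daily_metrics(sample_events)
for day, values in sorted(metrics.items()):
    print(day, values)
```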


## Next steps

Now that you've seen the dashboard, your next step is to replace simulated data with real-world data from your search application. Revisit the fields used by the dashboard to see which ones your application will need to collect. [Set up diagnostic logging](https://learn.microsoft.com/azure/search/monitor-azure-cognitive-search#enable-resource-logging) for your search service to begin collecting actual logged events in Azure Monitor.
