# Report - focus on extracting insights

Earlier, we used the `classify.ipynb` notebook to load the feedback, load a classification file, and classify the data using embeddings. In this notebook, we focus on extracting insights from the data: we use LLM calls to summarize each group of feedback and to identify common themes or issues.

## Pre-requisites

A JSON file with the feedback items and their classifications. It should look like this:

```json
[
  {
    "Id": "an id",
    "PartnerShortName": "FastTrackFeedback",
    "ServiceName": "Azure Data Factory - Data Movement",
    "Type": "Feature Request",
    "Title": "a title",
    "Blocking": "",
    "Description": "some description",
    "WorkaroundAvailable": "No",
    "Priority": "2",
    "CustomerName": "Customer name",
    "CustomerTpid": "",
    "WorkaroundDescription": "some workaround ",
    "UserStory": "a user story",
    "Embedding": [
    -0.010973175
    ...
    ],
    "ClassificationLevels": [
      "Performance Efficiency",
      "Data performance"
    ]
  },
  {
    "Id": "FastTrackFeedback_396325",
    ...
]
```
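
Before running the rest of the notebook, it can help to confirm that a file matches this shape. The following is a minimal sketch, not part of the original project: `FindShapeProblems` is a hypothetical helper, and it checks only the fields the report relies on (`Id`, `UserStory`, and `ClassificationLevels`).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of the notebook): lists the problems found in
// a JSON array of feedback records, checking only the fields the report uses.
static List<string> FindShapeProblems(string json)
{
    var problems = new List<string>();
    using JsonDocument doc = JsonDocument.Parse(json);

    if (doc.RootElement.ValueKind != JsonValueKind.Array)
    {
        problems.Add("Root element is not an array.");
        return problems;
    }

    int index = 0;
    foreach (JsonElement record in doc.RootElement.EnumerateArray())
    {
        if (record.ValueKind != JsonValueKind.Object)
        {
            problems.Add($"Record {index}: not a JSON object.");
            index++;
            continue;
        }

        foreach (string field in new[] { "Id", "UserStory" })
        {
            if (!record.TryGetProperty(field, out JsonElement value) ||
                value.ValueKind != JsonValueKind.String)
            {
                problems.Add($"Record {index}: '{field}' is missing or not a string.");
            }
        }

        if (!record.TryGetProperty("ClassificationLevels", out JsonElement levels) ||
            levels.ValueKind != JsonValueKind.Array ||
            levels.GetArrayLength() == 0)
        {
            problems.Add($"Record {index}: 'ClassificationLevels' is missing or empty.");
        }

        index++;
    }

    return problems;
}

// Usage with two made-up records; the second one has no classification.
string sample = @"[
  { ""Id"": ""1"", ""UserStory"": ""a user story"", ""ClassificationLevels"": [""Reliability""] },
  { ""Id"": ""2"", ""UserStory"": ""a user story"" }
]";

foreach (string problem in FindShapeProblems(sample))
{
    Console.WriteLine(problem);
}
// prints: Record 1: 'ClassificationLevels' is missing or empty.
```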

You will require LLM access. In this example, I am using Azure OpenAI.

In [None]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.12"
#r "nuget: DotNetEnv, 2.5.0"
// Reference the compiled console project so its model classes can be reused
// here instead of being redefined in the notebook
#r "../bin/Debug/net8.0/console.dll"

using Azure;
using Azure.AI.OpenAI;
using DotNetEnv;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using ProductLeaders.console.Models;


## Initialize OpenAI

The report relies on calls to OpenAI to summarize the feedback, so the client must be initialized with your endpoint and API key. Both are read from a `.env` file.


In [None]:

static string _configurationFile = @"../../../configuration/.env";
Env.Load(_configurationFile);

string oAiApiKey = Environment.GetEnvironmentVariable("AOAI_APIKEY") ?? "AOAI_APIKEY not found";
string oAiEndpoint = Environment.GetEnvironmentVariable("AOAI_ENDPOINT") ?? "AOAI_ENDPOINT not found";
string chatCompletionDeploymentName = Environment.GetEnvironmentVariable("CHATCOMPLETION_DEPLOYMENTNAME") ?? "CHATCOMPLETION_DEPLOYMENTNAME not found";
string embeddingDeploymentName = Environment.GetEnvironmentVariable("EMBEDDING_DEPLOYMENTNAME") ?? "EMBEDDING_DEPLOYMENTNAME not found";

AzureKeyCredential azureKeyCredential = new AzureKeyCredential(oAiApiKey);
OpenAIClient openAIClient = new OpenAIClient(new Uri(oAiEndpoint), azureKeyCredential);

Console.WriteLine($"OpenAI Client created: {oAiEndpoint} with: {chatCompletionDeploymentName} and {embeddingDeploymentName} deployments");
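
The cell above falls back to placeholder strings such as `"AOAI_ENDPOINT not found"` when a variable is missing, and that placeholder then reaches `new Uri(...)`, which fails with a less obvious `UriFormatException`. One possible alternative is to fail early with a clear message. The sketch below assumes a hypothetical helper named `RequireEnv`; it is not part of the original notebook, and the variable names in the usage are made up for the example.

```csharp
using System;

// Hypothetical helper (not in the original notebook): read a required
// environment variable and fail with a clear message when it is missing.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Environment variable '{name}' is not set. Check your .env file.");
    }
    return value;
}

// Usage with throwaway variable names chosen for this example.
Environment.SetEnvironmentVariable("DEMO_SETTING", "demo-value");
Console.WriteLine(RequireEnv("DEMO_SETTING")); // prints "demo-value"

try
{
    RequireEnv("DEMO_SETTING_THAT_IS_NOT_SET");
}
catch (InvalidOperationException ex)
{
    // The message names the missing variable.
    Console.WriteLine(ex.Message);
}
```

With such a helper, a line like `string oAiEndpoint = RequireEnv("AOAI_ENDPOINT");` would stop the notebook at the configuration step rather than at client creation.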

## Helper method :: Call OpenAI


In [None]:
async Task<string> CallOpenAI(string prompt, string systemMessage, bool jsonResponse = true)
{
    // Create ChatCompletionsOptions and set up the system and user messages
    ChatCompletionsOptions options = new ChatCompletionsOptions();
    
    // Add system message
    options.Messages.Add(new ChatRequestSystemMessage(systemMessage));
    
    // Add user message (the prompt generated from feedback)
    options.Messages.Add(new ChatRequestUserMessage(prompt));

    // Configure request properties
    options.MaxTokens = 250;
    options.Temperature = 0.7f;
    options.NucleusSamplingFactor = 0.95f;
    options.FrequencyPenalty = 0.0f;
    options.PresencePenalty = 0.0f;
    // options.StopSequences.Add("\n");
    options.DeploymentName = chatCompletionDeploymentName;
    // Request JSON mode so the reply can be deserialized directly
    if (jsonResponse) options.ResponseFormat = ChatCompletionsResponseFormat.JsonObject;

    // Make the API request to get the chat completions
    // add timing for this call 
    var watch = System.Diagnostics.Stopwatch.StartNew();
    Response<ChatCompletions> response = await openAIClient.GetChatCompletionsAsync(options);
    watch.Stop();
    var elapsedMs = watch.ElapsedMilliseconds;
    Console.WriteLine($"OpenAI API call took: {elapsedMs} ms");

    // Extract and return the first response from the choices
    ChatCompletions completions = response.Value;
    if (completions.Choices.Count > 0)
    {
        // Console.WriteLine($"Response generated: {completions.Choices[0].Message.Content}");
        return completions.Choices[0].Message.Content;
    }
    else
    {
        return "No response generated.";
    }
}

## LLM Call

The system message tells the LLM what to do with the feedback it receives. Currently, the call returns a specific JSON structure that must match the `LlmSummary` class. If you need additional insights from the call, extend the system message and update `LlmSummary`, along with the code that consumes it, to match.

In [None]:
const string CreateClusterSystemMessage = @"
You are a helpful assistant. When responding, follow these rules:
1. Output only valid JSON. Do not include any markdown, quotes, or extra text.
2. The JSON must have this structure:
{
  ""CommonElement"": ""<A concise phrase describing the common theme>"",
  ""Summary"": ""<A detailed explanation summarizing the feedback>""
}
3. Base your summary solely on the provided user stories.
4. If any user story is unrelated, you may still incorporate it in the summary if there's a consistent theme.
";

public class LlmSummary
{
    public string CommonElement { get; set; }
    public string Summary { get; set; }
}

string BuildUserPrompt(IEnumerable<string> userStories, string pillarName, string subCatName)
{
    // You might add some context about the subcategory:
    var sb = new StringBuilder();
    sb.AppendLine($"Pillar: {pillarName}");
    sb.AppendLine($"Subcategory: {subCatName}");
    sb.AppendLine("Here are user stories:");
    foreach (var story in userStories)
    {
        sb.AppendLine($"- {story}");
    }
    sb.AppendLine();
    sb.AppendLine("Generate the JSON based on these stories:");
    return sb.ToString();
}

In [None]:
using System.Text.Json;

public async Task<Dictionary<string, LlmSummary>> GenerateStructuredInsightsAsync(
    List<ProductLeaders.console.Models.FeedbackRecord> feedbackRecords)
{
    // Maps "pillar::subCat" -> LlmSummary;
    // both parts of the key come from ClassificationLevels
    var insightsDictionary = new Dictionary<string, LlmSummary>();

    // Group by the first two classification levels, e.g. [0] => Pillar, [1] => SubCat
    // If a record has only 1 level, treat subCat as "(None)".
    var grouped = feedbackRecords
        .Where(f => f.ClassificationLevels != null && f.ClassificationLevels.Count > 0)
        .GroupBy(f =>
        {
            // classificationLevels[0] => Pillar
            var level1 = f.ClassificationLevels[0];
            // classificationLevels[1] => SubCat (if it exists, otherwise "(None)")
            var level2 = f.ClassificationLevels.Count >= 2
                ? f.ClassificationLevels[1]
                : "(None)";
            return new { Pillar = level1, SubCat = level2 };
        })
        // Order pillars, then subcats
        .OrderBy(g => g.Key.Pillar)
        .ThenBy(g => g.Key.SubCat);

    // For each group, gather user stories & decide whether to call the LLM
    foreach (var group in grouped)
    {
        var pillarName = group.Key.Pillar;
        var subCatName = group.Key.SubCat;
        var count = group.Count();

        // We'll use "pillarName::subCatName" as the dictionary key
        string dictKey = $"{pillarName}::{subCatName}";

        // Only call LLM if > 5
        if (count > 5)
        {
            // Cap the prompt at the first 20 items of the group
            var itemSamples = group.Take(20).ToList();
            // Gather user stories
            var userStoryTexts = itemSamples.Select(f => f.UserStory);

            // Call the LLM to get structured summary
            LlmSummary summaryObj = await GetStructuredSummaryAsync(userStoryTexts, pillarName, subCatName);

            insightsDictionary[dictKey] = summaryObj;
        }
        else
        {
            // fewer than 6 items => no summary
            insightsDictionary[dictKey] = null;
        }
    }

    return insightsDictionary;
}

// ---------------------------------------------------------------------
// Helper method that calls the LLM and parses the JSON reply into an
// LlmSummary, returning a "ParsingError" placeholder if parsing fails.
// ---------------------------------------------------------------------
private async Task<LlmSummary> GetStructuredSummaryAsync(
    IEnumerable<string> userStories,
    string level1,
    string level2)
{
    // 1) Build user message for LLM (prompt)
    string userPrompt = BuildUserPrompt(userStories, level1, level2);

    // 2) Make the LLM call
    Console.WriteLine($"Generating summary for {level1} - {level2} ({userStories.Count()} items)...");
    string rawResponse = await CallOpenAI(userPrompt, CreateClusterSystemMessage, true);

    // 3) Attempt to parse the JSON
    try
    {
        var summary = JsonSerializer.Deserialize<LlmSummary>(rawResponse);
        if (summary == null)
        {
            return new LlmSummary
            {
                CommonElement = "ParsingError",
                Summary = "LLM returned null or invalid JSON"
            };
        }
        Console.WriteLine($"Summary generated: {summary.Summary}");
        return summary;
    }
    catch (JsonException ex)
    {
        // handle malformed JSON
        return new LlmSummary
        {
            CommonElement = "ParsingError",
            Summary = $"Failed to parse LLM output as JSON. Error: {ex.Message}"
        };
    }
}
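
Note that `JsonSerializer.Deserialize<LlmSummary>` does not fail when a property is missing or cased differently; it simply leaves that property `null`. If stricter checking is wanted, the reply can be validated before it is used. The sketch below is one way to do that under stated assumptions: `TryReadSummary` is a hypothetical helper, not part of the notebook, and it accepts a reply only when both expected keys are present as non-empty strings.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns true only when the raw LLM output is a JSON
// object with non-empty string values for both keys the report expects.
static bool TryReadSummary(string raw, out string commonElement, out string summary)
{
    commonElement = "";
    summary = "";
    try
    {
        using JsonDocument doc = JsonDocument.Parse(raw);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object) return false;

        // Property lookup is case-sensitive, matching the names in the system message.
        if (!root.TryGetProperty("CommonElement", out JsonElement ce) ||
            ce.ValueKind != JsonValueKind.String) return false;
        if (!root.TryGetProperty("Summary", out JsonElement s) ||
            s.ValueKind != JsonValueKind.String) return false;

        commonElement = ce.GetString() ?? "";
        summary = s.GetString() ?? "";
        return commonElement.Length > 0 && summary.Length > 0;
    }
    catch (JsonException)
    {
        // Not valid JSON at all (for example, plain text or a truncated reply).
        return false;
    }
}

// Usage with made-up replies.
string goodReply = @"{""CommonElement"":""Scaling"",""Summary"":""Illustrative text only.""}";
string wrongCase = @"{""commonElement"":""Scaling"",""summary"":""Illustrative text only.""}";

Console.WriteLine(TryReadSummary(goodReply, out _, out _)); // prints "True"
Console.WriteLine(TryReadSummary(wrongCase, out _, out _)); // prints "False"
Console.WriteLine(TryReadSummary("No response generated.", out _, out _)); // prints "False"
```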



## Markdown Report

The last step is to create a readable report in Markdown format. The report contains a few aggregate statistics and the insights for each group of feedback.

In [None]:
using System.Text;
using System.Linq;



public class MarkdownBuilder
{
    // Provide a method that builds the Markdown with:
    //  - A summary table of subcategories in each pillar
    //  - Detailed listing with optional LLM summaries
    public string BuildMarkdown(
        Dictionary<string, LlmSummary> summaries,
        Dictionary<string, List<ProductLeaders.console.Models.FeedbackRecord>> feedbacks)
    {
        var sb = new StringBuilder();
        sb.AppendLine("# WAF Feedback Report");
        sb.AppendLine();

        // -------------------------------------------------
        // 1) Group by pillar (top-level).
        // Key in feedbacks is "Pillar::Subcat"
        // We'll produce data structures for each pillar
        // plus a top-level summary table of pillars.
        // -------------------------------------------------
        var groupedByPillar = feedbacks
            .GroupBy(kv => kv.Key.Split("::", 2)[0]) // pillar = everything before "::"
            .Select(g => new
            {
                PillarName = g.Key,
                // Dictionary of subcatKey -> List<FeedbackRecord>
                SubcatDictionary = g.ToDictionary(x => x.Key, x => x.Value)
            })
            // Sort pillars by total feedback count desc
            .OrderByDescending(p => p.SubcatDictionary.Values.Sum(list => list.Count))
            .ToList();

        // 1A) Build a top-level table showing each pillar + total count
        sb.AppendLine("## Summary of All Pillars");
        sb.AppendLine();
        sb.AppendLine("| Pillar | Total Feedback Count |");
        sb.AppendLine("|--------|-----------------------|");

        foreach (var pillarGroup in groupedByPillar)
        {
            // Sum the subcategory counts to get total for pillar
            int totalCount = pillarGroup.SubcatDictionary.Values.Sum(list => list.Count);
            sb.AppendLine($"| {EscapeMd(pillarGroup.PillarName)} | {totalCount} |");
        }

        sb.AppendLine();

        // -------------------------------------------------
        // 2) Now, for each pillar, show details:
        //    - A subcategory summary table
        //    - The subcategories themselves with optional LLM summary
        // -------------------------------------------------
        foreach (var pillarGroup in groupedByPillar)
        {
            int pillarCount = pillarGroup.SubcatDictionary.Values.Sum(list => list.Count);

            sb.AppendLine($"## {pillarGroup.PillarName} (Total: {pillarCount})");
            sb.AppendLine();

            // 2A) Build a subcategory summary table
            sb.AppendLine("| Subcategory | Feedback Count |");
            sb.AppendLine("|-------------|----------------|");

            // Order subcategories by count desc
            var subCatEntries = pillarGroup.SubcatDictionary
                .OrderByDescending(sc => sc.Value.Count)
                .ToList();

            foreach (var subCatEntry in subCatEntries)
            {
                string key = subCatEntry.Key; // e.g. "Reliability::Scaling"
                var subcatRecords = subCatEntry.Value;
                int subcatCount = subcatRecords.Count;

                // subcat name is after "::"
                var parts = key.Split("::", 2);
                string subCatName = parts.Length > 1 ? parts[1] : "(None)";

                sb.AppendLine($"| {EscapeMd(subCatName)} | {subcatCount} |");
            }
            sb.AppendLine();

            // 2B) Detailed listing for each subcategory
            foreach (var subCatEntry in subCatEntries)
            {
                string key = subCatEntry.Key;
                var subcatRecords = subCatEntry.Value;
                int subcatCount = subcatRecords.Count;

                var parts = key.Split("::", 2);
                string subCatName = parts.Length > 1 ? parts[1] : "(None)";

                sb.AppendLine($"### {subCatName} (Count: {subcatCount})");
                sb.AppendLine();

                // Check for LLM summary
                if (summaries.TryGetValue(key, out var llmSummary) && llmSummary != null)
                {
                    sb.AppendLine("**LLM Summary**:");
                    sb.AppendLine();
                    sb.AppendLine($"- **CommonElement**: {EscapeMd(llmSummary.CommonElement ?? "")}");
                    sb.AppendLine($"- **Summary**: {EscapeMd(llmSummary.Summary ?? "")}");
                    sb.AppendLine();
                }
                else
                {
                    sb.AppendLine("> No summary was generated (≤ 5 feedback items or not processed).");
                    sb.AppendLine();
                }

                // // 2C) Collapsible details for each item
                // foreach (var record in subcatRecords)
                // {
                //     sb.AppendLine("<details>");
                //     sb.AppendLine($"<summary>{EscapeHtml(record.Title ?? "No Title")}</summary>");
                //     sb.AppendLine();
                //     sb.AppendLine($"**ID**: {EscapeHtml(record.Id ?? "")}");
                //     sb.AppendLine();
                //     sb.AppendLine($"**User Story**: {EscapeHtml(record.UserStory ?? "")}");
                //     sb.AppendLine();
                //     sb.AppendLine("</details>");
                //     sb.AppendLine();
                // }
            }
        }

        return sb.ToString();
    }
    // Minimal escaping for the subcat name (in table cells).
    private string EscapeMd(string text)
    {
        if (string.IsNullOrEmpty(text)) return "";
        // Escape | in table cells
        return text.Replace("|", "\\|");
    }

    // Minimal HTML escaping for <details> content
    private string EscapeHtml(string input)
    {
        if (string.IsNullOrEmpty(input)) return "";
        return input
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");
    }
}
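
`EscapeMd` above escapes only the pipe character. LLM-generated text may also contain line breaks, which would end a Markdown table row early. As a sketch of a slightly stricter variant (the name `EscapeTableCell` is hypothetical and not part of the notebook), line breaks can be collapsed to spaces before the pipes are escaped:

```csharp
using System;

// Hypothetical variant of EscapeMd: collapses line breaks to spaces and
// escapes pipes so the text stays inside a single Markdown table cell.
static string EscapeTableCell(string text)
{
    if (string.IsNullOrEmpty(text)) return "";
    return text
        .Replace("\r\n", " ")
        .Replace("\n", " ")
        .Replace("\r", " ")
        .Replace("|", "\\|");
}

// Usage with a made-up subcategory name containing a pipe and a line break.
Console.WriteLine($"| {EscapeTableCell("Read | Write\nthroughput")} | 7 |");
// prints: | Read \| Write throughput | 7 |
```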

## Enriching each classification

We can now call the LLM with a specific set of items to summarize them or extract other insights. The outcome is saved to a file.

In [None]:
// 1) Load the classified feedback
var feedbackList = JsonSerializer.Deserialize<List<ProductLeaders.console.Models.FeedbackRecord>>(File.ReadAllText("feedback_classified.json"));
Console.WriteLine($"Loaded {feedbackList.Count} feedback records.");

// 2) Call the LLM to generate the insights for each pillar/subcategory group
var insightsDict = await GenerateStructuredInsightsAsync(feedbackList);

// 3) Save the insights so the report can be rebuilt without calling the LLM again
var json = JsonSerializer.Serialize(insightsDict);
File.WriteAllText("pillar_subcat_insights.json", json);
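
Groups that were too small to summarize are stored with a `null` value, and the report builder later relies on that `null` to print its "no summary" note. The sketch below illustrates that these entries survive a serialize/deserialize round trip. It is an illustration only: a nested dictionary stands in for `LlmSummary` so that no extra types are needed, and the keys and texts are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-in for Dictionary<string, LlmSummary>: a nested dictionary plays the
// role of the summary object. Keys and texts are invented for this example.
var sampleInsights = new Dictionary<string, Dictionary<string, string>>
{
    ["Reliability::Scaling"] = new Dictionary<string, string>
    {
        ["CommonElement"] = "Autoscale limits",
        ["Summary"] = "Illustrative text only."
    },
    ["Security::(None)"] = null // group at or below the threshold: no summary
};

// WriteIndented is optional; it only makes the saved file easier to read.
var writeOptions = new JsonSerializerOptions { WriteIndented = true };
string savedJson = JsonSerializer.Serialize(sampleInsights, writeOptions);
Console.WriteLine(savedJson);

var roundTripped =
    JsonSerializer.Deserialize<Dictionary<string, Dictionary<string, string>>>(savedJson);

// The null entry is preserved, so a later null check still works.
Console.WriteLine(roundTripped["Security::(None)"] == null); // prints "True"
```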

## Prepare the report

The report builder requires both the feedback and the enriched classification, which are stored in two distinct files. The feedback list is unsorted, whereas the report builder expects the feedback grouped by classification, so the next step groups the feedback accordingly.

In [None]:
// 1) Load the LLM insights (Dictionary<string, LlmSummary>)
string insightsFile = "pillar_subcat_insights.json";
Dictionary<string, LlmSummary> insights =
    JsonSerializer.Deserialize<Dictionary<string, LlmSummary>>(File.ReadAllText(insightsFile));

// 2) Load the classified feedback (List<FeedbackRecord>)
string feedbackFile = "feedback_classified.json";
var feedbackList =
    JsonSerializer.Deserialize<List<ProductLeaders.console.Models.FeedbackRecord>>(File.ReadAllText(feedbackFile));

if (insights == null || feedbackList == null)
{
    Console.WriteLine("Failed to load insights or feedback data.");
    return;
}

In [None]:
// 3) Group feedback by "pillar::subcat" => List<FeedbackRecord> (only two classification levels are used here)
var feedbackByPillarSubcat = feedbackList
    .Where(f => f.ClassificationLevels != null && f.ClassificationLevels.Count > 0)
    .GroupBy(f =>
    {
        string level1 = f.ClassificationLevels[0];
        // if there's a second level, use it; otherwise "(None)"
        string level2 = (f.ClassificationLevels.Count > 1)
            ? f.ClassificationLevels[1]
            : "(None)";
        return $"{level1}::{level2}";
    })
    .ToDictionary(g => g.Key, g => g.ToList());
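
To make the resulting keys concrete, here is a small self-contained sketch of the same grouping applied to toy data. The lists of strings stand in for `FeedbackRecord.ClassificationLevels`, the helper `MakeGroupKey` is hypothetical, and the sample values are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring the key format used above.
static string MakeGroupKey(List<string> levels)
{
    string level2 = levels.Count > 1 ? levels[1] : "(None)";
    return $"{levels[0]}::{level2}";
}

// Toy data standing in for FeedbackRecord.ClassificationLevels.
var sampleLevels = new List<List<string>>
{
    new List<string> { "Reliability", "Scaling" },
    new List<string> { "Reliability", "Scaling" },
    new List<string> { "Reliability" },           // one level only -> "(None)"
    new List<string> { "Security", "Identity" },
    new List<string>(),                           // no levels -> filtered out
};

Dictionary<string, int> countsByKey = sampleLevels
    .Where(levels => levels.Count > 0)
    .GroupBy(MakeGroupKey)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var entry in countsByKey.OrderBy(kv => kv.Key, StringComparer.Ordinal))
{
    Console.WriteLine($"{entry.Key} -> {entry.Value}");
}
// prints:
// Reliability::(None) -> 1
// Reliability::Scaling -> 2
// Security::Identity -> 1
```

The same keys are used in the insights file, which is what lets the report builder look up the summary for each group.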

## Calling the report builder

We now have the artifacts required to call the report builder, which generates a Markdown file with the insights extracted from the feedback. The section that prints the individual feedback items is commented out because it can be too verbose, but it can be re-enabled if needed.

In [None]:
var builder = new MarkdownBuilder();
string mdContent = builder.BuildMarkdown(insights, feedbackByPillarSubcat);

File.WriteAllText("waf_report.md", mdContent);
Console.WriteLine("Report generated: waf_report.md");