# Enriching Filtered Events

In this notebook, we'll enrich the filtered events from the previous notebook with additional information, using the following technique:

1. Topic modeling using a Large Language Model (LLM) to extract topics from the posts


## Topic Modeling with Large Language Models
Topic modeling is a technique used to discover abstract topics in a collection of documents. In this notebook, we'll use a Large Language Model to extract topics from posts. This will allow us to categorize posts and make them more searchable.

### Setting Up the Ollama API Client
We'll use the Spring AI Ollama client to interact with the Ollama API.

Ollama is a tool that allows us to run large language models locally.

In [1]:
%use coroutines
@file:DependsOn("org.springframework.ai:spring-ai-ollama:1.0.0")

The prompt we'll use for the LLM is designed to extract software-related topics from posts. The prompt includes examples of how to format the output and what types of topics to include.

In [2]:
import java.io.File

val topicModelingSystemPrompt = File("resources/topic-extractor-prompt.txt").readText()

Create the Ollama Chat Model

In [3]:
import org.springframework.ai.ollama.OllamaChatModel
import org.springframework.ai.ollama.api.OllamaApi
import org.springframework.ai.ollama.api.OllamaApi.ChatRequest
import org.springframework.ai.ollama.api.OllamaApi.Message
import org.springframework.ai.ollama.api.OllamaApi.Message.Role
import org.springframework.ai.ollama.api.OllamaOptions

val ollamaApi = OllamaApi.builder()
    .baseUrl("http://localhost:11434")
    .build()

val ollamaOptions = OllamaOptions.builder().model("deepseek-coder-v2").build()

val ollamaChatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(ollamaOptions)
    .build()

### Creating a Topic Modeling Function
This function takes a post and a comma-separated list of existing topics as input and uses the Ollama chat model to extract topics from the post. It returns the model's reply, a string of comma-separated topics.

In [4]:
import org.springframework.ai.chat.messages.SystemMessage
import org.springframework.ai.chat.messages.UserMessage
import org.springframework.ai.chat.prompt.Prompt
import dev.raphaeldelio.*

fun extractTopics(post: String, existingTopics: String): String {
    val messages = listOf(
        SystemMessage(topicModelingSystemPrompt),
        UserMessage("Existing topics: $existingTopics"),
        UserMessage("Post: $post")
    )

    val response = ollamaChatModel.call(Prompt(messages))
    return response.result.output.text ?: ""
}

In [5]:
extractTopics("Kotlin is a great programming language for beginners who wants to build Agentic AI apps", "")

 “Kotlin, Programming Languages, Beginners, AI Applications”

In [6]:
extractTopics("Brazilian samba is a great music genre for dancing", "")

 ""

### Counting how many times a topic appears

Top-K is a probabilistic data structure that tracks the k most frequent items in a stream, together with approximate counts. It is particularly useful for finding the most common items in a large dataset without storing every item explicitly.
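To make concrete what a Top-K query answers, the sketch below computes the exact result in memory. It is only an illustration: `exactTopK` and the sample data are hypothetical, and Redis does not work this way internally, since it keeps an approximate, fixed-size summary rather than a full map of every item.

```kotlin
// Exact in-memory counterpart of a Top-K query (illustration only).
// Redis keeps an approximate, fixed-size summary instead of a full map like this.
fun exactTopK(items: List<String>, k: Int): List<Pair<String, Int>> =
    items.groupingBy { it }
        .eachCount() // topic -> number of occurrences
        .toList()
        // Most frequent first; ties are broken alphabetically for a stable result.
        .sortedWith(compareByDescending<Pair<String, Int>> { it.second }.thenBy { it.first })
        .take(k)

// Hypothetical sample stream of topics.
val sampleTopics = listOf("AI", "Kotlin", "AI", "Redis", "Kotlin", "AI")
println(exactTopK(sampleTopics, 2)) // prints [(AI, 3), (Kotlin, 2)]
```

The exact version needs memory proportional to the number of distinct topics, which is the cost the probabilistic structure avoids.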

In [7]:
import redis.clients.jedis.exceptions.JedisDataException
import java.time.LocalDateTime

fun createTopK(): String {
    try {
        jedisPooled.topkReserve("topics-topk", 15, 3000, 10, 0.9)
    } catch (_: JedisDataException) {
        println("TopK already exists")
    }

    return "topics-topk"
}

### Creating a Topic Extraction Handler
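This handler extracts topics from an event's text and stores them in Redis. The topics are stored as a pipe-separated string in the "topics" field of the event's hash. They are also added to the Top-K structure and to the "topics" set, which is passed back to the LLM as the list of existing topics.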

In [8]:
val extractTopics: (Event) -> Pair<Boolean, String> = { event ->
    val existingTopics = jedisPooled.smembers("topics")
    val topics = extractTopics(event.text, existingTopics.joinToString(", "))
        .replace("\"", "")
        .replace("“", "")
        .replace("”", "")
        .split(",")
        .map { it.trim() }
        .filter { it.isNotBlank() }

    val topKKey = createTopK()
    if (topics.isNotEmpty()) {
        jedisPooled.topkAdd(topKKey, *topics.toTypedArray())
        jedisPooled.hset("post:" + event.uri.replace("at://did:plc:", ""), mapOf("topics" to topics.joinToString("|")))
        jedisPooled.sadd("topics", *topics.toTypedArray())
    }
    Pair(true, "OK")
}
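The cleanup chain in the handler above can be tried in isolation, without Redis or the LLM. The following is a minimal sketch, assuming a hypothetical helper named `parseTopics` that mirrors those steps; the sample input is the model reply shown earlier in this notebook.

```kotlin
// Hypothetical helper mirroring the cleanup chain used in the handler:
// strip straight and curly quotes, split on commas, trim, drop blanks.
fun parseTopics(raw: String): List<String> =
    raw.replace("\"", "")
        .replace("“", "")
        .replace("”", "")
        .split(",")
        .map { it.trim() }
        .filter { it.isNotBlank() }

val parsed = parseTopics(" “Kotlin, Programming Languages, Beginners, AI Applications”")
// The value written to the "topics" hash field is the pipe-separated form.
println(parsed.joinToString("|")) // prints Kotlin|Programming Languages|Beginners|AI Applications

// An empty model reply yields no topics, so the handler skips the Redis writes.
println(parseTopics("\"\"").isEmpty()) // prints true
```

Joining with `|` rather than `,` matches the separator declared for the tag field in the search index, so each topic is indexed as a separate tag.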

In [9]:
createConsumerGroup("filtered-events", "topic-extraction-example")

In [11]:
runBlocking {
    consumeStream(
        streamName = "filtered-events",
        consumerGroup = "topic-extraction-example",
        consumer = "topic-extraction-1",
        handlers = listOf(printUri, extractTopics),
        ackFunction = ackFn(),
        count = 1,
        limit = 400
    )
}

topic-extraction-1: No new messages for 2 seconds. Stopping.


## Creating a Redis Search Index
In this section, we'll create a Redis Search index to make the enriched events searchable. Redis Search is a module that adds full-text search capabilities to Redis. It allows us to search for events based on their text, topics, and other fields.

### Creating the Index Schema in Code
Now we'll create the index schema in code. We'll use the Jedis client to create the schema and the index.

The following schema defines the fields that will be indexed. The schema includes:
- A tag field (`topics`) for exact matching
- A text field (`text`) for full-text search

```
FT.CREATE postIdx ON HASH PREFIX 1 post: SCHEMA
        topics        TAG SEPARATOR "|"
        text          TEXT
```

In [12]:
import redis.clients.jedis.search.IndexDefinition
import redis.clients.jedis.search.IndexOptions
import redis.clients.jedis.search.Schema

val schema = Schema()
    .addTagField("topics", "|")
    .addTextField("text", 1.0)

// Define index options (e.g., prefix)
val rule = IndexDefinition()
    .setPrefixes("post:")

// Create the index
try {
    jedisPooled.ftCreate("postIdx", IndexOptions.defaultOptions().setDefinition(rule), schema)
} catch (e: JedisDataException) {
    println("Index already exists")
}

### Searching the Index
Now that we have created the index, we can search for events based on their topics and text. In this example, we'll search for events with the topic "AI".

Redis Search has its own query syntax, in which a field is addressed with an `@` prefix. For example, to search for events with the topic "machine_learning", we would use the query `@topics:{machine_learning}`.
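Topics extracted in this notebook can contain spaces, for example "Task Automation". In tag queries it is generally recommended to escape spaces and punctuation with a backslash. The sketch below shows one way to build such a query string; `tagQuery` is a hypothetical helper, and the rule of escaping every character other than letters, digits, and underscores is a conservative assumption rather than the exact rule Redis applies.

```kotlin
// Hypothetical helper: builds a tag query, backslash-escaping every character
// that is not a letter, digit, or underscore (spaces included).
fun tagQuery(field: String, tag: String): String {
    val escaped = tag.replace(Regex("[^\\p{L}\\p{N}_]")) { "\\" + it.value }
    return "@$field:{$escaped}"
}

println(tagQuery("topics", "AI"))              // prints @topics:{AI}
println(tagQuery("topics", "Task Automation")) // prints @topics:{Task\ Automation}
```

The resulting string could then be passed as the query argument, for example `jedisPooled.ftSearch("postIdx", tagQuery("topics", "Task Automation"))`.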

Exact Matching Search

In [13]:
import redis.clients.jedis.params.ScanParams

val count = jedisPooled.topkListWithCount("topics-topk")
println(count)

{AI=2, Military Technology=1, Autonomous Weapons=1, Task Automation=1, Generation Z Skills=1}


In [14]:
//FT.SEARCH postIdx "@topics:{AI}"
val result = jedisPooled.ftSearch(
    "postIdx",
    "@topics:{AI}"
)

result.documents.forEach { post ->
    println(post.get("topics"))
    println(post.get("text"))
    println("\n")
}

AI|Task Automation|Generation Z Skills
Outside of coding using AI is worse than just finding a 15 year old to do tasks


AI|Autonomous Weapons|Military Technology
Using machines for violent colonialism. Palmer Luckey made his fortune with VR headsets, founding Oculus as a teen. Now he's focused on the future of warfare, developing autonomous weapons powered by AI for the U.S. military and its allies.

youtu.be/bWEXnph1ElI?...




Full Text Search

In [15]:
//FT.SEARCH postIdx "@text:by AI for"
val result = jedisPooled.ftSearch(
    "postIdx",
    "@text:by AI for"
)

result.documents.forEach { post ->
    println(post.get("text"))
    println("\n")
}

Outside of coding using AI is worse than just finding a 15 year old to do tasks


Using machines for violent colonialism. Palmer Luckey made his fortune with VR headsets, founding Oculus as a teen. Now he's focused on the future of warfare, developing autonomous weapons powered by AI for the U.S. military and its allies.

youtu.be/bWEXnph1ElI?...




Querying the TopK