# **Using Local LLMs for Object Suggestions**
One of my first forays into building tools on top of Capacities will be "object suggestions". The idea: given all of my Object types plus a particular Object (say, a Daily Note), I'll use a locally hosted LLM (served via `ollama`) to suggest new Objects that could be created from it.

# Setup
The cells below will set up the rest of the notebook.

I'll start by configuring the kernel: 

In [None]:
# Change the working directory 
%cd ..

# Enable the autoreload extension, which reloads modules before running code so that edits to project files are picked up automatically
%load_ext autoreload
%autoreload 2

Now I'll import some necessary modules:

In [25]:
# Third-party imports
from IPython.display import display, Markdown
from pydantic import BaseModel, Field
from ollama import chat
from tqdm import tqdm
import pandas as pd

# Project imports
from utils.data import parse_capacities_export_zip

# Configuring `ollama`
What local model should I use?

In [3]:
# Declare the model for Ollama
# Ones I've downloaded include:
# ["qwen3:4b", "gemma3:4b", "deepseek-r1:1.5b", "deepseek-r1:7b", "gemma3:12b"]
ollama_model = "gemma3:12b"

# Loading Data
First, I'm going to load in my Capacities export data!

In [4]:
# Declare the path to the Capacities export .zip file
capacities_export_zip_path = "D:/data/datasets/capacities-export-data/Daily Export (fd84f574)/Everything (2025-06-02 21-45-56).zip"

# Load the capacities data from the specified .zip file
capacities_data_df = parse_capacities_export_zip(capacities_export_zip_path)

# Defining the Prompt
Next up: I'll define the prompt that I'll use to identify objects within the entries.

In [12]:
object_types_of_interest = [
    "VideoGame",
    "TvShow",
    "Technology",
    "StageShow",
    "Person",
    "Event",
    "Book",
    "BoardGame",
]
# object_types_of_interest = [
#     object_type
#     for object_type in capacities_data_df["object_type"].unique()
#     if object_type not in ["DailyNote"]
# ]

prompt = f"""# Role
I'd like you to act as a "Named Entity Recognition" (NER) system of sorts. 

# Context
Users will provide you with a note from an export of their personal knowledge management system. 

"Linked entities" will be in Markdown format, like this:
```
[some text](EntityType/entity-id.md)
```

# Task

Your task is to identify "unlinked" entities in the note. 

These "unlinked" entities will be entities that ought to be a new PKM Object, but aren't currently "linked" to anything. 

As follows are the possible entity types you should be looking for: 

{chr(10).join("- " + entity_type for entity_type in object_types_of_interest)}

# Suggestions

- You can ignore any objects that are already linked in this way, as the user has already created a PKM Object for them.
- If you find an entity that doesn't match any of the object types above, you can ignore it.
- In your `reasoning` field, draw on your encyclopedic knowledge of the world to add any necessary context about the entity.

# Response Format
You'll respond with a JSON object whose `potential_objects` field is a list of `PotentialObject` dictionaries.
"""


class PotentialObject(BaseModel):
    """An object that could potentially be created in the PKM system"""

    text_snippet: str = Field(
        ...,
        description="A short, exact-text snippet (around ~10 words) from the note that contains the potential object",
    )

    new_object_title: str = Field(..., description="The title of the potential object")

    new_object_type: str = Field(..., description="The type of the potential object")

    reasoning: str = Field(
        ...,
        description="Roughly 1-2 sentences explaining why this is a potential object and why it should be created in the PKM system",
    )

    make_new_object: bool = Field(
        ...,
        description="Based on the `reasoning`, whether the user should create a new object for this potential object",
    )


class PotentialObjectsResponse(BaseModel):
    """A response containing a list of potential objects"""

    potential_objects: list[PotentialObject] = Field(
        ..., description="A list of potential objects that could be created"
    )
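
The prompt above describes linked entities as Markdown links shaped like `[some text](EntityType/entity-id.md)`. As a sanity check on the model's output, it can help to pull those links out of a note directly. Below is a minimal sketch of that idea using a regular expression. The pattern assumes links look exactly like the example in the prompt (a real export may use different paths or encoding), and `extract_linked_entities` and the sample note are hypothetical names and data for illustration.

```python
import re

# Assumed link shape, taken from the prompt: [some text](EntityType/entity-id.md)
LINKED_ENTITY_PATTERN = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?P<entity_type>[^/()\s]+)/(?P<entity_id>[^()\s]+)\.md\)"
)


def extract_linked_entities(note_text: str) -> list[dict[str, str]]:
    """Return the link text, entity type, and entity id of each linked entity."""
    return [match.groupdict() for match in LINKED_ENTITY_PATTERN.finditer(note_text)]


# Hypothetical note: "Hades" is already linked, "Dune" is not
sample_note = "Played [Hades](VideoGame/hades-123.md) with Sam, then started reading Dune."
linked_entities = extract_linked_entities(sample_note)
print(linked_entities)
```

Any suggestion whose title matches one of these link texts is probably something I've already created, so it could be dropped before review.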

# Generating Completions
Next up: we can use the aforementioned prompt to process my Capacities data!

In [None]:
# Parameterize the completion generation
max_chars_per_text = 7_500

# Filter the capacities_data_df to only show DailyNote objects
daily_notes_df = capacities_data_df.query("object_type == 'DailyNote'").copy()

# Iterate through each of the DailyNotes in the capacities_data_df and collect the potential objects
df_idx_to_potential_objects_map = {}
for idx, row in tqdm(
    iterable=list(daily_notes_df.iterrows()),
    desc="Generating potential objects for DailyNotes",
    total=len(daily_notes_df),
    unit="note",
):

    try:

        # Extract the text content of the note
        text_content = row["text_content"]

        # Generate suggestions using ollama
        response = chat(
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": text_content[:max_chars_per_text]},
            ],
            model=ollama_model,
            format=PotentialObjectsResponse.model_json_schema(),
            options={"num_ctx": 4096},
        )

        # Parse the response into a PotentialObjectsResponse object
        potential_object_list = PotentialObjectsResponse.model_validate_json(
            response.message.content
        )

        # Store the potential objects for this note, keyed by its DataFrame index
        df_idx_to_potential_objects_map[idx] = potential_object_list.potential_objects

    except Exception as e:
        # Log the failure and move on; this note simply ends up with no suggestions
        print(f"Error processing row {idx}: {e}")
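
The `max_chars_per_text` cap and the `num_ctx` option above interact: the system prompt, the truncated note, and the model's JSON output all have to fit in the same context window. Here's a rough back-of-the-envelope sketch of that budget. The ~4 characters per token ratio is only a common rule of thumb (real tokenizers vary by model and language), and `estimate_tokens`, `fits_in_context`, and the 1,024 tokens reserved for output are hypothetical names and values used for illustration.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Roughly estimate a token count from a character count (heuristic only)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    system_prompt: str,
    note_text: str,
    num_ctx: int = 4096,
    reserved_for_output: int = 1024,
) -> bool:
    """Check whether prompt + note + a reserved output budget fit within num_ctx."""
    needed = (
        estimate_tokens(system_prompt)
        + estimate_tokens(note_text)
        + reserved_for_output
    )
    return needed <= num_ctx


# A note truncated to 7,500 characters comes out to roughly 1,875 tokens
truncated_note = "a" * 7_500
print(estimate_tokens(truncated_note))
```

If the estimate comes out too tight for a given prompt, lowering `max_chars_per_text` or raising `num_ctx` are the two obvious knobs to turn.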

# Transforming Data

In [24]:
df_idx_to_modeldumped_potential_objects_map = {
    idx: [potential_object.model_dump() for potential_object in potential_objects]
    for idx, potential_objects in df_idx_to_potential_objects_map.items()
}

potential_objects_df = daily_notes_df.copy()

# Add the potential objects to the DataFrame
potential_objects_df["potential_objects"] = potential_objects_df.index.map(
    df_idx_to_modeldumped_potential_objects_map
)

# Save the potential_objects_df to a JSON file
potential_objects_df.to_json(
    "data/potential_objects.json",
    orient="records",
    force_ascii=False,
    indent=2,
)
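
The `text_snippet` field is described as an exact-text snippet from the note, but a model won't necessarily honor that. One cheap check is to confirm each snippet really appears in the note it came from. The sketch below works on plain dicts like the ones `model_dump()` produces. It compares text after lowercasing and collapsing whitespace, which is an assumption on my part about how strict "exact" needs to be, and `normalize_text` and `keep_grounded_suggestions` are hypothetical helper names.

```python
import re


def normalize_text(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def keep_grounded_suggestions(suggestions: list[dict], note_text: str) -> list[dict]:
    """Keep only suggestions whose `text_snippet` actually appears in the note."""
    normalized_note = normalize_text(note_text)
    grounded = []
    for suggestion in suggestions:
        snippet = normalize_text(suggestion.get("text_snippet", ""))
        # Skip empty snippets, which would trivially "match" any note
        if snippet and snippet in normalized_note:
            grounded.append(suggestion)
    return grounded


# Hypothetical note and suggestions for illustration
note = "Watched Severance tonight.\nThen played  Catan with Alex."
suggestions = [
    {"text_snippet": "played Catan with Alex", "new_object_title": "Catan"},
    {"text_snippet": "read Dune on the train", "new_object_title": "Dune"},
]
grounded_suggestions = keep_grounded_suggestions(suggestions, note)
print([s["new_object_title"] for s in grounded_suggestions])
```

Suggestions that fail this check aren't necessarily wrong, but they're worth a second look since the model may have paraphrased or invented the snippet.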

# Analyzing Data

In [None]:
exploded_potential_objects_df = potential_objects_df.explode(
    "potential_objects", ignore_index=True
).copy()

# Use pd.json_normalize to expand each potential object dict into columns;
# notes with no suggestions (NaN after the explode) are dropped first
normalized_potential_objects_df = pd.json_normalize(
    exploded_potential_objects_df["potential_objects"].dropna().tolist()
)

normalized_potential_objects_df.groupby(["new_object_type", "new_object_title"]).size().sort_values(
    ascending=False
).head(30)
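
The count above groups on the raw `new_object_title`, so small differences in casing or spacing would split one entity across several rows, and it also includes suggestions the model itself flagged with `make_new_object = False`. Here's a standard-library sketch of a slightly stricter tally over the same kind of dicts. The normalization rule (casefold plus whitespace collapsing) and the `count_recommended_objects` name are assumptions of mine, not something the data requires.

```python
from collections import Counter


def count_recommended_objects(suggestions: list[dict]) -> Counter:
    """Count (type, normalized title) pairs for suggestions marked make_new_object."""
    counts: Counter = Counter()
    for suggestion in suggestions:
        # Ignore suggestions the model itself decided against
        if not suggestion.get("make_new_object"):
            continue
        normalized_title = " ".join(suggestion["new_object_title"].split()).casefold()
        counts[(suggestion["new_object_type"], normalized_title)] += 1
    return counts


# Hypothetical suggestions for illustration
sample_suggestions = [
    {"new_object_type": "Book", "new_object_title": "Dune", "make_new_object": True},
    {"new_object_type": "Book", "new_object_title": " dune ", "make_new_object": True},
    {"new_object_type": "Person", "new_object_title": "Alex", "make_new_object": False},
]
object_counts = count_recommended_objects(sample_suggestions)
print(object_counts.most_common(30))
```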