To run this notebook, press _"Run All"_ <span style="opacity:.8;">(in Google Colab: <b>Runtime ▸ Run all</b>)</span>

<p align="center">
  <a href="https://docs.fenic.ai">Read the Docs</a> •
  <a href="https://discord.com/invite/GdqF3J7huR">Join Discord</a> •
  <a href="https://github.com/typedef-ai/fenic">⭐️ Star fenic</a>
</p>

To install fenic locally, follow the instructions in the [GitHub repo](https://github.com/typedef-ai/fenic).

**What you'll learn:**
- How to configure fenic with language and embedding models for semantic analysis.
- How to load and structure customer feedback data.
- How to generate semantic embeddings for feedback text.
- How to cluster feedback into thematic groups using semantic similarity.
- How to summarize each cluster to extract actionable insights.

By the end of this example, you'll see how fenic can help you quickly identify common issues, feature requests, and areas of praise in customer feedback, which supports data-driven decisions and faster responses to user needs.

If this notebook helps, please give <a href="https://github.com/typedef-ai/fenic" target="_blank" rel="noopener noreferrer">fenic</a> a ⭐️ — it really helps!

## Installation

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn
!pip install fenic matplotlib seaborn polars==1.30.0

In [None]:
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [16]:
import fenic as fc

In this step, we configure a fenic session with both language and embedding models. 

This setup specifies which models to use for semantic analysis and embeddings, as well as their rate limits. 

Initializing the session with these settings enables all subsequent semantic operations in the notebook.
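
To get a feel for what the `rpm` (requests per minute) and `tpm` (tokens per minute) limits imply, the sketch below estimates a lower bound on the wall-clock time for a batch of calls. It is plain Python arithmetic and not part of fenic: the helper name `min_minutes_for_batch`, the workload size, and the assumption of one request per feedback entry are all hypothetical, and real throughput also depends on latency, batching, and retries.

```python
def min_minutes_for_batch(num_requests: int, tokens_per_request: int,
                          rpm: int, tpm: int) -> float:
    """Lower bound on the minutes needed to send a batch under both limits."""
    minutes_by_requests = num_requests / rpm
    minutes_by_tokens = (num_requests * tokens_per_request) / tpm
    # Whichever limit is tighter determines the bound.
    return max(minutes_by_requests, minutes_by_tokens)


# Hypothetical workload: 10,000 feedback entries of roughly 100 tokens each,
# checked against the limits used in this notebook's session config.
language_model_minutes = min_minutes_for_batch(10_000, 100, rpm=500, tpm=200_000)
embedding_minutes = min_minutes_for_batch(10_000, 100, rpm=3000, tpm=1_000_000)
print(f"language model: at least {language_model_minutes:.1f} min")
print(f"embedding model: at least {embedding_minutes:.1f} min")
```

For the 12-entry dataset in this notebook, neither limit comes close to being reached; the estimate only matters once the input grows.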

In [None]:
# Configure session with both language models and embedding models
config = fc.SessionConfig(
    app_name="feedback_clustering",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAILanguageModel(
                model_name="gpt-4o-mini",
                rpm=500,
                tpm=200_000,
            )
        },
        embedding_models={
            "small": fc.OpenAIEmbeddingModel(
                model_name="text-embedding-3-small",
                rpm=3000,
                tpm=1_000_000
            )
        }
    ),
)

# Create session
session = fc.Session.get_or_create(config)

### Sample Customer Feedback Data

This section defines a sample dataset of customer feedback entries, each containing details such as customer name, feedback text, rating, and timestamp. 

The data is then loaded into a fenic DataFrame, which will be used for semantic analysis and clustering in the following steps.
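
Because the DataFrame is built from a list of dictionaries, it can help to confirm that every record has the same keys before loading. The following is a minimal sketch in plain Python, independent of fenic: `find_schema_problems` is a hypothetical helper, the two records are abbreviated stand-ins for the notebook's data, and the 1 to 5 rating range is an assumption based on the sample values.

```python
# Abbreviated, hypothetical sample in the same shape as the notebook's records.
sample_records = [
    {"feedback_id": "fb_001", "customer_name": "Alice Johnson",
     "feedback": "The mobile app crashes every time I try to upload a photo.",
     "rating": 1, "timestamp": "2024-01-15"},
    {"feedback_id": "fb_002", "customer_name": "Bob Smith",
     "feedback": "Love the new dark mode feature!",
     "rating": 5, "timestamp": "2024-01-16"},
]


def find_schema_problems(records):
    """Return a list of readable problems; an empty list means the records look consistent."""
    problems = []
    expected_keys = set(records[0]) if records else set()
    for index, record in enumerate(records):
        if set(record) != expected_keys:
            problems.append(f"record {index}: keys differ from the first record")
        rating = record.get("rating")
        # Assumption: ratings are integers from 1 to 5.
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            problems.append(f"record {index}: rating {rating!r} is outside 1-5")
    return problems


print(find_schema_problems(sample_records))
```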

In [None]:
# Sample customer feedback data with various themes
feedback_data = [
    {
        "feedback_id": "fb_001",
        "customer_name": "Alice Johnson",
        "feedback": "The mobile app crashes every time I try to upload a photo. Very frustrating experience!",
        "rating": 1,
        "timestamp": "2024-01-15"
    },
    {
        "feedback_id": "fb_002",
        "customer_name": "Bob Smith",
        "feedback": "Love the new dark mode feature! Much easier on the eyes during night time use.",
        "rating": 5,
        "timestamp": "2024-01-16"
    },
    {
        "feedback_id": "fb_003",
        "customer_name": "Carol Davis",
        "feedback": "The app is way too slow when loading my dashboard. Takes over 30 seconds every time.",
        "rating": 2,
        "timestamp": "2024-01-17"
    },
    {
        "feedback_id": "fb_004",
        "customer_name": "David Wilson",
        "feedback": "Please add a feature to export data to Excel. Really need this for my monthly reports.",
        "rating": 3,
        "timestamp": "2024-01-18"
    },
    {
        "feedback_id": "fb_005",
        "customer_name": "Emma Brown",
        "feedback": "The checkout process is so confusing. Too many steps to complete a simple purchase.",
        "rating": 2,
        "timestamp": "2024-01-19"
    },
    {
        "feedback_id": "fb_006",
        "customer_name": "Frank Miller",
        "feedback": "Amazing customer support team! They solved my billing issue in just minutes.",
        "rating": 5,
        "timestamp": "2024-01-20"
    },
    {
        "feedback_id": "fb_007",
        "customer_name": "Grace Lee",
        "feedback": "Button layouts are inconsistent across different screens. Looks unprofessional.",
        "rating": 2,
        "timestamp": "2024-01-21"
    },
    {
        "feedback_id": "fb_008",
        "customer_name": "Henry Clark",
        "feedback": "Would love to see integration with Google Calendar for appointment scheduling.",
        "rating": 4,
        "timestamp": "2024-01-22"
    },
    {
        "feedback_id": "fb_009",
        "customer_name": "Ivy Martinez",
        "feedback": "App constantly freezes when I try to edit my profile information. Please fix!",
        "rating": 1,
        "timestamp": "2024-01-23"
    },
    {
        "feedback_id": "fb_010",
        "customer_name": "Jack Taylor",
        "feedback": "The search functionality is excellent! Found exactly what I needed quickly.",
        "rating": 5,
        "timestamp": "2024-01-24"
    },
    {
        "feedback_id": "fb_011",
        "customer_name": "Karen White",
        "feedback": "Loading times are terrible. Sometimes the app doesn't respond for minutes.",
        "rating": 1,
        "timestamp": "2024-01-25"
    },
    {
        "feedback_id": "fb_012",
        "customer_name": "Leo Garcia",
        "feedback": "Add offline mode please! Need to access my data when traveling without internet.",
        "rating": 3,
        "timestamp": "2024-01-26"
    }
]

# Create DataFrame
feedback_df = session.create_dataframe(feedback_data)
print(f"Loaded {feedback_df.count()} customer feedback entries:")
feedback_df.select("customer_name", "feedback", "rating").show()

### Generating Semantic Embeddings

In this step, we generate semantic embeddings for each feedback entry using the configured embedding model. 

These embeddings capture the meaning of the feedback text in a numerical form, enabling semantic comparison and clustering in later steps. 

The resulting embeddings are added as a new column to the DataFrame.
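
To illustrate what "semantic comparison" means for such vectors, here is a small sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors are made up for illustration (real embeddings have far more dimensions), and the variable names are hypothetical; this is not how fenic exposes similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings.
crash_report = [0.9, 0.1, 0.0]
freeze_report = [0.8, 0.2, 0.1]
praise = [0.0, 0.2, 0.9]

similar_pair = cosine_similarity(crash_report, freeze_report)
dissimilar_pair = cosine_similarity(crash_report, praise)
print(f"crash vs freeze: {similar_pair:.3f}")
print(f"crash vs praise: {dissimilar_pair:.3f}")
```

Vectors that point in nearly the same direction score close to 1, while unrelated ones score close to 0, which is the property clustering relies on.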

In [None]:
# Step 1: Generate embeddings for the feedback text
feedback_with_embeddings = feedback_df.select(
    "*",
    fc.semantic.embed(fc.col("feedback")).alias("feedback_embeddings")
)

### Clustering Feedback into Semantic Themes

This step clusters the feedback entries into thematic groups based on the semantic similarity of their embeddings. 
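
As an illustration of how embedding-based clustering can work, the sketch below runs a tiny k-means loop in plain Python. K-means is one common algorithm for this kind of grouping; treat the code as a picture of the idea, not as a description of fenic's internals. The two-dimensional points, the fixed initial centroids, and the function names are all hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Return (labels, centroids) after a fixed number of k-means iterations."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up 2-D "embeddings": two points near (1, 0) and two near (0, 1).
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])
print(labels)
```

Fixing the initial centroids keeps this toy run deterministic; practical implementations usually pick starting points randomly, so cluster numbering can differ between runs.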

We request four clusters, anticipating themes such as bugs, performance, feature requests, and praise, and then aggregate each group to compute statistics such as feedback count, average rating, and the list of customer names. 
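
The per-cluster statistics can be pictured with a plain-Python group-and-aggregate. The rows and their cluster assignments below are hypothetical, chosen only to show the shape of the result; the actual notebook computes the same three statistics through fenic's `group_by` and `agg`.

```python
from collections import defaultdict

# Hypothetical rows after cluster labels have been assigned.
labeled_rows = [
    {"cluster_label": 0, "customer_name": "Alice Johnson", "rating": 1},
    {"cluster_label": 0, "customer_name": "Ivy Martinez", "rating": 1},
    {"cluster_label": 1, "customer_name": "Bob Smith", "rating": 5},
    {"cluster_label": 1, "customer_name": "Jack Taylor", "rating": 5},
    {"cluster_label": 1, "customer_name": "Henry Clark", "rating": 4},
]

# Group rows by their cluster label.
groups = defaultdict(list)
for row in labeled_rows:
    groups[row["cluster_label"]].append(row)

# Compute the same statistics the notebook requests for each cluster.
cluster_stats = {}
for label, rows in sorted(groups.items()):
    cluster_stats[label] = {
        "feedback_count": len(rows),
        "avg_rating": sum(r["rating"] for r in rows) / len(rows),
        "customer_names": [r["customer_name"] for r in rows],
    }

for label, stats in cluster_stats.items():
    print(label, stats)
```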

Additionally, we use a semantic reduction to generate a concise summary describing the main theme, common issues, and overall sentiment for each cluster.

In [None]:
# Step 2: Cluster feedback into semantic themes
# Assign each row to one of 4 clusters with semantic.with_cluster_labels,
# then group by cluster_label and apply semantic.reduce in the aggregation
feedback_clusters = feedback_with_embeddings.semantic.with_cluster_labels(
    fc.col("feedback_embeddings"),
    4  # Number of clusters - expecting themes like bugs, performance, features, praise
).group_by(
    "cluster_label"
).agg(
    fc.count("*").alias("feedback_count"),
    fc.avg("rating").alias("avg_rating"),
    fc.collect_list("customer_name").alias("customer_names"),
    fc.semantic.reduce(
        (
            "Analyze this cluster of customer feedback and provide a concise summary of the main theme, "
            "common issues, and sentiment."
        ),
        column=fc.col("feedback")
    ).alias("theme_summary")
)

### Displaying and Cleaning Up

In this final step, we display the results of the clustering analysis, showing each cluster’s label, the number of feedback entries, average rating, and a summary of the main theme. 

After presenting the results, we clean up by stopping the fenic session.

In [None]:
# Display detailed analysis for each cluster
feedback_clusters.select(
    "cluster_label",
    "feedback_count",
    "avg_rating",
    "theme_summary"
).sort("cluster_label").show()
print()

# Clean up
session.stop()
print("Analysis complete!")

## Next Steps

A few other examples to explore:

- Extract structured metadata from unstructured text [notebook](https://colab.research.google.com/github/typedef-ai/fenic/blob/main/examples/document_extraction/document_extraction.ipynb)
- Named Entity Recognition [notebook](https://colab.research.google.com/github/typedef-ai/fenic/blob/main/examples/named_entity_recognition/ner.ipynb)
- Classification of news articles for bias detection [notebook](https://colab.research.google.com/github/typedef-ai/fenic/blob/main/examples/news_analysis/news_analysis.ipynb)

---

<p align="center" style="margin:18px 0 6px; font-size:1.05em;">
  Enjoyed this? Help others find it.
</p>

<p align="center" style="margin:0 0 12px;">
<a href="https://github.com/typedef-ai/fenic">⭐️ Give fenic a Star ⭐️</a>
</p>



