# Semantic Joins with LLM Reasoning
This example demonstrates how to use semantic joins powered by large language models (LLMs) in Fenic to perform context-aware data matching and recommendations.

Semantic joins let you join tables based on natural language instructions, relying on LLM reasoning rather than exact key matches or embedding similarity.

In this notebook, you'll see two practical scenarios:

- **Matching users to relevant articles**: Join user profiles with articles by reasoning about their interests and the article descriptions.
- **Product recommendations**: Suggest products to customers based on their purchase history using semantic relationships between purchased and recommended items.

Because the join condition is written in natural language, the same pattern applies to data enrichment, recommendation, and matching workflows.
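
Conceptually, a semantic join can be pictured as a cross join filtered by a yes/no judgment on each pair of rows. The sketch below illustrates that idea in plain Python. It is not Fenic's implementation: `conceptual_semantic_join` and `shared_word_stub` are hypothetical names, and a trivial word-overlap check stands in for the LLM.

```python
from itertools import product
from typing import Callable


def conceptual_semantic_join(
    left: list[dict],
    right: list[dict],
    predicate: Callable[[dict, dict], bool],
) -> list[dict]:
    """Keep every (left, right) row pair for which the predicate holds."""
    return [
        {**left_row, **right_row}
        for left_row, right_row in product(left, right)
        if predicate(left_row, right_row)
    ]


def shared_word_stub(left_row: dict, right_row: dict) -> bool:
    # Hypothetical stand-in for the LLM's judgment: do the texts share a word?
    left_words = set(left_row["interests"].lower().split())
    right_words = set(right_row["description"].lower().split())
    return bool(left_words & right_words)


users = [
    {"name": "Sarah", "interests": "pasta recipes"},
    {"name": "Mike", "interests": "engine repair"},
]
articles = [
    {"title": "Cooking Pasta Recipes", "description": "delicious pasta recipes"},
    {"title": "Car Engine Maintenance", "description": "guide to engine care"},
]

matches = conceptual_semantic_join(users, articles, shared_word_stub)
for match in matches:
    print(match["name"], "->", match["title"])
```

A real semantic join replaces the word-overlap stub with a model's reading of the natural language instruction, which is what lets it match "working on cars" to "automobile engine care" even when no words overlap.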

## Setting Up the Fenic Session

Here we configure the session to use a language model for semantic joins. No embeddings are required; the join relies only on LLM reasoning for flexible, context-aware data matching.

In [1]:
import fenic as fc

# Demonstrate semantic join capabilities using LLM reasoning.
# Configure the session with a language model (no embeddings needed).
config = fc.SessionConfig(
    app_name="semantic_joins",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAIModelConfig(
                model_name="gpt-4o-mini",
                rpm=500,      # requests-per-minute limit
                tpm=200_000,  # tokens-per-minute limit
            )
        }
    ),
)

# Create session
session = fc.Session.get_or_create(config)
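
To get a rough feel for the `rpm` setting, the following back-of-the-envelope sketch may help. It rests on two assumptions that are not confirmed by this notebook: that a semantic join issues one LLM request per candidate row pair, and that `rpm` is a requests-per-minute limit. `estimate_join_requests` is a hypothetical helper, not part of Fenic.

```python
def estimate_join_requests(n_left: int, n_right: int, rpm: int) -> tuple[int, float]:
    """Return (candidate pairs, lower bound on minutes at the given rpm).

    Assumption: one LLM request per (left, right) row pair.
    """
    pairs = n_left * n_right
    return pairs, pairs / rpm


# 4 users x 4 articles, as in the first example of this notebook.
pairs, minutes = estimate_join_requests(4, 4, rpm=500)
print(f"{pairs} candidate pairs, at least {minutes:.3f} minutes at 500 rpm")

# 4 purchases x 8 products, as in the second example.
rec_pairs, rec_minutes = estimate_join_requests(4, 8, rpm=500)
print(f"{rec_pairs} candidate pairs, at least {rec_minutes:.3f} minutes at 500 rpm")
```

Under these assumptions the number of candidate pairs grows with the product of the two table sizes, so the toy tables here stay far below the configured limits while larger tables may not.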

## Creating Example DataFrames

In this section, we define sample user profiles and articles as lists of Python dictionaries and load them into Fenic DataFrames.

This data will be used to demonstrate how semantic joins can match users to relevant articles based on their interests and the article descriptions.

In [None]:
# Sample user profiles data
users_data = [
    {
        "user_id": "user_001",
        "name": "Sarah",
        "interests": "I love cooking Italian food and trying new pasta recipes"
    },
    {
        "user_id": "user_002",
        "name": "Mike",
        "interests": "I enjoy working on cars and fixing engines in my spare time"
    },
    {
        "user_id": "user_003",
        "name": "Emily",
        "interests": "Gardening is my passion, especially growing vegetables and flowers"
    },
    {
        "user_id": "user_004",
        "name": "David",
        "interests": "I'm interested in learning about car maintenance and automotive repair"
    }
]

# Sample content/articles data
articles_data = [
    {
        "article_id": "art_001",
        "title": "Cooking Pasta Recipes",
        "description": "Delicious pasta recipes including spaghetti carbonara and fettuccine alfredo"
    },
    {
        "article_id": "art_002",
        "title": "Car Engine Maintenance",
        "description": "Essential guide to automobile engine care and troubleshooting"
    },
    {
        "article_id": "art_003",
        "title": "Gardening for Beginners",
        "description": "Start your garden with basic techniques for growing vegetables and flowers"
    },
    {
        "article_id": "art_004",
        "title": "Advanced Automotive Repair",
        "description": "Comprehensive automotive repair instructions for experienced mechanics"
    }
]

# Create DataFrames
users_df = session.create_dataframe(users_data)
articles_df = session.create_dataframe(articles_data)

print("User Profiles:")
users_df.select("name", "interests").show()
print()

print("Available Articles:")
articles_df.select("title", "description").show()
print()


## Semantic Join: Matching Users to Relevant Articles

In this step, we use a semantic join to match each user with articles that align with their interests. 

The join leverages LLM reasoning and a natural language instruction to determine which articles would be relevant for each user, based on the content of their interests and the article descriptions.
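
The `{column:left}` and `{column:right}` placeholders in the instruction refer to columns of the left and right DataFrames. As a minimal sketch of how such a template might be filled in for one pair of rows (an illustration only: `render_instruction` is a hypothetical helper, not a Fenic function, and the library's actual prompt construction may differ):

```python
import re

# Matches placeholders of the form {column_name:left} or {column_name:right}.
PLACEHOLDER = re.compile(r"\{(\w+):(left|right)\}")


def render_instruction(template: str, left_row: dict, right_row: dict) -> str:
    """Substitute each placeholder with the value from the matching row."""
    rows = {"left": left_row, "right": right_row}

    def substitute(match: re.Match) -> str:
        column, side = match.group(1), match.group(2)
        return str(rows[side][column])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "A person with interests '{interests:left}' would be interested "
    "in reading about '{description:right}'"
)
rendered = render_instruction(
    template,
    {"interests": "Gardening"},
    {"description": "Growing vegetables"},
)
print(rendered)
```

Reading the rendered sentence for a few row pairs is a quick way to check that the instruction is a clear true-or-false statement before running the join.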


In [None]:
# Step 1: Semantic join to match users with relevant articles
print("Step 1: Matching users to relevant articles using semantic reasoning...")
print("-" * 70)

# Use semantic join to match users with articles based on their interests
user_article_matches = users_df.semantic.join(
    articles_df,
    join_instruction="A person with interests '{interests:left}' would be interested in reading about '{description:right}'"
)

print("User-Article Matches:")
user_article_matches.select(
    "name",
    "interests",
    "title",
    "description"
).show()
print()

## Preparing Data for Product Recommendations

In this section, we create sample data for customer purchase history and a product catalog. 

These are loaded into Fenic DataFrames and will be used to demonstrate how semantic joins can recommend new products to customers based on the relationships between their previous purchases and available products.

In [None]:
# Step 2: Product recommendation system using semantic joins
print("Step 2: Product recommendation system...")
print("-" * 50)

# Sample customer purchase history
purchases_data = [
    {
        "customer_id": "cust_001",
        "customer_name": "Alice", 
        "purchased_product": "Professional DSLR Camera"
    },
    {
        "customer_id": "cust_002",
        "customer_name": "Bob",
        "purchased_product": "Gaming Laptop"
    },
    {
        "customer_id": "cust_003",
        "customer_name": "Carol",
        "purchased_product": "Yoga Mat"
    },
    {
        "customer_id": "cust_004",
        "customer_name": "Dan",
        "purchased_product": "Coffee Maker"
    }
]

# Sample product catalog for recommendations
products_data = [
    {
        "product_id": "prod_001",
        "product_name": "Camera Lens Kit",
        "category": "Photography"
    },
    {
        "product_id": "prod_002",
        "product_name": "Tripod Stand", 
        "category": "Photography"
    },
    {
        "product_id": "prod_003",
        "product_name": "Gaming Mouse",
        "category": "Gaming"
    },
    {
        "product_id": "prod_004",
        "product_name": "Mechanical Keyboard",
        "category": "Gaming"
    },
    {
        "product_id": "prod_005",
        "product_name": "Yoga Blocks",
        "category": "Fitness"
    },
    {
        "product_id": "prod_006",
        "product_name": "Exercise Resistance Bands",
        "category": "Fitness"
    },
    {
        "product_id": "prod_007",
        "product_name": "Coffee Beans Premium Blend",
        "category": "Food & Beverage"
    },
    {
        "product_id": "prod_008",
        "product_name": "French Press",
        "category": "Food & Beverage"
    }
]

# Create DataFrames
purchases_df = session.create_dataframe(purchases_data)
products_df = session.create_dataframe(products_data)

print("Customer Purchase History:")
purchases_df.select("customer_name", "purchased_product").show()
print()

print("Available Products for Recommendation:")
products_df.select("product_name", "category").show()
print()

## Semantic Join: Recommending Products

As in the user-article example, a natural language instruction drives the join: each purchased product is compared with each catalog product to decide whether it makes a sensible follow-up purchase.


In [None]:
# Use semantic join for product recommendations
recommendations = purchases_df.semantic.join(
    products_df,
    join_instruction="A customer who bought '{purchased_product:left}' would also be interested in '{product_name:right}'"
)

print("Product Recommendations:")
recommendations.select(
    "customer_name",
    "purchased_product", 
    "product_name",
    "category"
).show()
print()

# Clean up
session.stop()
print("Session complete!")
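
Because the join can return several products per customer, downstream code often wants one entry per customer. The sketch below groups recommendations in plain Python, assuming the result has already been collected into a list of dictionaries. The rows shown are hypothetical placeholders shaped like the selected columns, not actual model output, and `group_recommendations` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; real output depends on the model.
example_rows = [
    {"customer_name": "Alice", "product_name": "Camera Lens Kit"},
    {"customer_name": "Alice", "product_name": "Tripod Stand"},
    {"customer_name": "Bob", "product_name": "Gaming Mouse"},
]


def group_recommendations(rows: list[dict]) -> dict[str, list[str]]:
    """Collect recommended product names under each customer name."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row["customer_name"]].append(row["product_name"])
    return dict(grouped)


by_customer = group_recommendations(example_rows)
for customer, products in by_customer.items():
    print(f"{customer}: {', '.join(products)}")
```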