# Demo: LLM-Powered Analytics with Snowflake Cortex


#### In this notebook, we analyze Mayagüez restaurant data scraped from Yelp. We will:
1) Read a public Parquet file containing the scraped data from an AWS S3 bucket
2) Ingest the data into Snowflake
3) Perform analytics using Snowflake Cortex AI (Meta Llama 3 8B model)

Note that this is a small dataset, so we will use an XS warehouse (i.e., a single node). For your projects, you can leverage Snowflake's distributed compute capabilities for larger datasets and switch to a larger warehouse if needed (e.g., an X-Large warehouse, which has 16 nodes and 128 CPU cores). A larger warehouse consumes more credits, so we encourage you to be mindful of credit usage! At the same time, do not panic: remember that you are getting a $400 free trial.
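To make the credit trade-off concrete, here is a minimal Python sketch of the arithmetic. It assumes the commonly documented rates for standard warehouses (1 credit per hour for XS, doubling with each size up to 16 for X-Large) and per-second billing with a 60-second minimum per start. The function name `estimate_credits` is hypothetical, and the dollar price of a credit depends on your edition and region, so the sketch stops at credits.

```python
# Assumed credits consumed per hour of running time, by warehouse size
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def estimate_credits(size, runtime_seconds):
    """Estimate the credits used by one continuous run of a warehouse."""
    # Assumption: billing is per second, with a 60-second minimum per start
    billed_seconds = max(runtime_seconds, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# Compare the same 10-minute workload on the smallest and a large warehouse
for size in ("XS", "XL"):
    print(f"10 minutes on {size}: {estimate_credits(size, 10 * 60):.2f} credits")
```

The same ten minutes costs 16 times more credits on an X-Large than on an XS, so a larger warehouse only pays off if it shortens the run enough to compensate.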

Also note that this notebook must be executed inside Snowflake. Running the code locally won't work. 

In [None]:
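import streamlit as st
from snowflake.snowpark.context import get_active_session

# Reuse the session that this Snowflake notebook is already running in
session = get_active_session()

# Read the pipeline diagram from the DEMO_ASSETS stage as raw bytes
image_data = session.file.get_stream("@DEMO_ASSETS/demo_pipeline.png", decompress=False).read()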

In [None]:
st.image(image_data)

# Step 1: Read Parquet Data From S3 

This is a public S3 bucket that we've made available for today's demos. In your projects, you can create your own S3 buckets to store your project's data and read/analyze it within Snowflake.

In [None]:
-- Using ACCOUNTADMIN for routine work is not good practice, but it is fine for today's demo
USE ROLE ACCOUNTADMIN;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE ACCOUNTADMIN;

-- Select a warehouse (it must already exist, but creating one is easy)
USE WAREHOUSE COMPUTE_WH;

-- Ensure the demo database exists, if not create it
CREATE DATABASE IF NOT EXISTS UPRM_BIG_DATA;
USE DATABASE UPRM_BIG_DATA;

-- Create a schema for the demo
CREATE SCHEMA IF NOT EXISTS AI_ANALYTICS_DEMO;
USE SCHEMA AI_ANALYTICS_DEMO;

CREATE OR REPLACE STAGE yelp_public_stage
  URL = 's3://uprm-2025-demo-yelp';
  
SELECT 'Successfully created Snowflake Stage from S3' AS note;

In [None]:
-- Verify that the CSV and Parquet files are listed in the Snowflake stage (the pattern hides JSON files)
-- Note that Parquet achieves ~2x compression compared to CSV
LIST @yelp_public_stage PATTERN='^(?!.*json).*';

# Step 2: Load Data into a Snowflake Table

In Step 1 we created a Snowflake stage that points at the raw files in S3, but the data does not exist in a table yet. Here we create a table and copy the data into it.

In [None]:
-------------------------------------------------
-- 2. DDL for Yelp Table
-------------------------------------------------
CREATE OR REPLACE TABLE yelp_reviews (
    business_url STRING,
    business_name STRING,
    average_rating FLOAT,
    total_reviews STRING,
    price_range STRING,
    business_address STRING,
    contact_number STRING,
    latest_reviewer_name STRING,
    review_avatar_url STRING,
    review_id STRING,
    latest_reviewer_location STRING,
    latest_reviewer_rating FLOAT,
    review_date DATE,
    review_text STRING,
    helpful_count INT,
    thanks_count INT,
    love_this_count INT,
    oh_no_count INT,
    response_author_name STRING,
    response_date STRING,
    response_content STRING
);

-------------------------------------------------
-- 3. Load data into Yelp Table
-------------------------------------------------
COPY INTO yelp_reviews
FROM @yelp_public_stage/dataset_yelp-reviews-scraper_2025-10-06_15-10-02-775.parquet
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
ON_ERROR = 'CONTINUE';

-------------------------------------------------
-- 4. Setup completion check
-------------------------------------------------
SELECT 'Successfully inserted data into Snowflake table' AS note;

In [None]:
SELECT * FROM yelp_reviews;

# Step 3: Analyze Sentiment with an LLM Using a SQL Query!

Yes, it's as simple as that! Snowflake manages the LLM infrastructure and provides a SQL API for LLM queries via `SNOWFLAKE.CORTEX` functions. Note that the Llama 3 8B model is not hosted in your virtual warehouse; it runs as a Snowflake-managed service.
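The query in this step builds one prompt per row by concatenating fixed instructions with the review text. As a rough illustration of what the model receives, here is the same assembly as a plain Python sketch. The helper name `build_sentiment_prompt` and the sample review are made up for illustration; returning `None` for a missing review mirrors how SQL `CONCAT` yields NULL when any argument is NULL.

```python
def build_sentiment_prompt(review_text):
    """Assemble the prompt that the SQL CONCAT call produces for one review."""
    # Mirror SQL CONCAT: a NULL review yields a NULL prompt
    if review_text is None:
        return None
    return (
        "You are a sentiment analysis assistant. "
        "Classify the following restaurant review as Positive, Negative, or Neutral. "
        "Only return one of those three words.\n\nReview: "
        + review_text
    )


# Hypothetical review text, not taken from the dataset
print(build_sentiment_prompt("Great mofongo, slow service."))
```

Printing a prompt like this is a cheap way to check the wording before spending credits on a full-table run.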

In [None]:
CREATE OR REPLACE TEMP TABLE yelp_reviews_classified AS
(
SELECT 
  review_id,
  business_name,
  review_date,
  review_text,
  SNOWFLAKE.CORTEX.AI_COMPLETE(
    'llama3-8b',
    CONCAT(
      'You are a sentiment analysis assistant. ',
      'Classify the following restaurant review as Positive, Negative, or Neutral. ',
      'Only return one of those three words.\n\nReview: ',
      review_text
    )
  ) AS llm_sentiment
FROM yelp_reviews);

SELECT * FROM yelp_reviews_classified;

In [None]:
-- Count reviews per business and sentiment label
SELECT business_name, llm_sentiment, COUNT(1) AS review_count
FROM yelp_reviews_classified
GROUP BY ALL
ORDER BY business_name, llm_sentiment DESC;

Snowflake also offers direct sentiment analysis functions (for example, `SNOWFLAKE.CORTEX.SENTIMENT`) that use cheaper, pre-trained ML models. The goal of this demo was to show how flexible this SQL API is: you can analyze your data with LLMs directly, without external tools.
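One caveat of the prompt-based approach is that the model returns free text. It may answer with different casing, trailing punctuation, or a full sentence, which would split the per-business counts across several near-identical labels. Below is a minimal Python sketch of one way to normalize such responses before aggregating. The function name `normalize_sentiment` and the `Unknown` fallback label are hypothetical choices, not part of Cortex.

```python
import re

VALID_LABELS = ("Positive", "Negative", "Neutral")


def normalize_sentiment(raw):
    """Map a raw LLM response to one of the three labels, or 'Unknown'."""
    if raw is None:
        return "Unknown"
    # Collect every valid label that appears as a whole word, ignoring case
    found = [
        label
        for label in VALID_LABELS
        if re.search(rf"\b{label}\b", raw, flags=re.IGNORECASE)
    ]
    # Accept the answer only when exactly one label is mentioned
    return found[0] if len(found) == 1 else "Unknown"


print(normalize_sentiment(" positive.\n"))
print(normalize_sentiment("Sentiment: NEGATIVE"))
print(normalize_sentiment("Could be Positive or Negative"))
```

Treating ambiguous answers as `Unknown` rather than guessing keeps the counts honest, and the size of the `Unknown` bucket is a quick signal of how well the prompt constrains the model.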

There are many other functions you can explore for your projects, which you can find here: https://docs.snowflake.com/user-guide/snowflake-cortex/aisql. Some of them may not be available on a Snowflake free trial, but most of them should be!