<h1>Load data into dataframe</h1>

In [None]:
import pandas as pd
df = pd.read_json('Software_5.json.gz', lines=True, compression='gzip')
df.head()

In [None]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

In [17]:
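# reviewText may contain NaN; fill with '' so every dataframe row still gets one embedding
review_embeddings = model.encode(df['reviewText'].fillna('').astype(str).tolist(), convert_to_tensor=True)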


In [18]:
query = "battery life"

query_embedding = model.encode(query, convert_to_tensor=True)



In [23]:
# Save the SentenceTransformer model and computed embeddings to disk
# Run this once after you have instantiated `model` and computed `review_embeddings`.

# 1) Save a local copy of the model (avoids re-downloading it later)
model.save('models/all-MiniLM-L6-v2')

# 2) Save embeddings and the dataframe for later use
import numpy as np

# If review_embeddings is a torch tensor (convert_to_tensor=True), move it to CPU and convert to NumPy
if hasattr(review_embeddings, 'cpu'):
    emb_numpy = review_embeddings.cpu().numpy()
else:
    # Already a NumPy array (or a list of vectors)
    emb_numpy = np.asarray(review_embeddings)

np.save('review_embeddings.npy', emb_numpy)

# Save the dataframe so that row i of the embeddings maps back to review i.
# This saves the entire dataframe; change to a subset if you prefer.
df.to_pickle('reviews_df.pkl')

print('Saved model -> models/all-MiniLM-L6-v2')
print('Saved embeddings -> review_embeddings.npy')
print('Saved dataframe -> reviews_df.pkl')

Saved model -> models/all-MiniLM-L6-v2
Saved embeddings -> review_embeddings.npy
Saved dataframe -> reviews_df.pkl
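
Because the embeddings and the dataframe are saved to separate files, it can be worth checking that the reloaded array still has one row per review. The following is a minimal sketch of such a check, using random stand-in vectors rather than the real embeddings; `roundtrip_embeddings`, `fake_embeddings`, and `n_reviews` are hypothetical names introduced only for illustration (in the notebook, `n_reviews` would be `len(df)`).

```python
import os
import tempfile

import numpy as np


def roundtrip_embeddings(embeddings, path):
    """Save embeddings with np.save, reload them, and return the reloaded array."""
    np.save(path, embeddings)
    return np.load(path)


# Stand-in data (assumption): 4 "reviews" with 384-dimensional vectors,
# the output size of all-MiniLM-L6-v2.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(4, 384)).astype(np.float32)
n_reviews = 4  # in the notebook this would be len(df)

with tempfile.TemporaryDirectory() as tmp:
    reloaded = roundtrip_embeddings(fake_embeddings, os.path.join(tmp, "emb.npy"))

# One embedding row per review, and values unchanged by the round trip
rows_match = reloaded.shape[0] == n_reviews
values_match = np.array_equal(reloaded, fake_embeddings)
print(rows_match, values_match)
```

If `rows_match` were ever false for the real files, the indices returned by the search cell would no longer point at the right reviews.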


In [60]:
from sentence_transformers import SentenceTransformer, util
import numpy as np
import pandas as pd

# 1) Load the local model (no download)
model = SentenceTransformer('models/all-MiniLM-L6-v2')

# 2) Load embeddings and dataframe
embeddings = np.load('review_embeddings.npy')
df = pd.read_pickle('reviews_df.pkl')

# 3) Encode query
query = "Is Photoshop Good?"
query_embedding = model.encode(query)

# 4) Compute cosine similarities
scores = util.cos_sim(query_embedding, embeddings)[0]  # tensor shape: (num_reviews,)

# 5) Convert to NumPy
scores = scores.cpu().numpy()

# 6) Get top-k indices (highest scores first)
topk = 5
topk_indices = np.argsort(-scores)[:topk]

# 7) Display results
for idx in topk_indices:
    print(f"Score: {scores[idx]:.4f}")
    print(f"Review: {df.iloc[idx]['reviewText']}\n")

print('✅ Model and embeddings loaded successfully.')


Score: 0.7615
Review: OK, so I admit that I know nothing about photoshop but I wondered if this would be a good subsitute. Honestly, I'm not sure that it is. However, for about thirty bucks, it's perfect for a novice like me. I take a lot of digital photos and most are pretty lousy. I am still learning all the features and functions, but I do have to say that I've made some pics a bit better. It's also easier to crop and resize than in other applications that I have, but I'm not sure that's enough to warrant a 30 dollar purchase.

Score: 0.7472
Review: I know there are fans of this software, and it's a good buy, but I prefer Photoshop Elements (Adobe) and can't get into the PS Pro interface. It's ok but I find it slower and clunkier than Photoshop Elements. I suppose it's what you are used to, that is, a matter of taste. However, it will do the job of editing photos. Meh.

Score: 0.7440
Review: It's been a while since I used PaintShop Pro, and it's as good as I remember. It has tools f

In [57]:
topk_indices

array([11030,  4400,  5172, 11554,  5984])
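
To make the ranking step less of a black box, here is a rough NumPy-only sketch of what the cosine-similarity and top-k steps compute. It is not the `sentence_transformers` implementation; `cosine_scores`, `top_k`, and the toy vectors are hypothetical and exist only for this illustration.

```python
import numpy as np


def cosine_scores(query_vec, matrix):
    """Cosine similarity between one query vector and every row of matrix."""
    query_vec = np.asarray(query_vec, dtype=np.float64)
    matrix = np.asarray(matrix, dtype=np.float64)
    q_norm = np.linalg.norm(query_vec)
    row_norms = np.linalg.norm(matrix, axis=1)
    # Guard against zero vectors to avoid division by zero
    denom = np.maximum(q_norm * row_norms, 1e-12)
    return (matrix @ query_vec) / denom


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    k = min(k, len(scores))
    # argpartition selects the k largest without sorting everything;
    # only those k candidates are then sorted.
    candidates = np.argpartition(-scores, k - 1)[:k]
    return candidates[np.argsort(-scores[candidates])]


# Toy 2-D "embeddings" (assumption: stand-ins for the 384-D review vectors)
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
toy_query = np.array([1.0, 0.0])

toy_scores = cosine_scores(toy_query, toy_embeddings)
best = top_k(toy_scores, 2)
print(best, toy_scores[best])
```

For a few thousand reviews a full `np.argsort` is perfectly adequate; the partition-based variant mainly pays off when the number of rows is much larger than `k`.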

In [58]:
df.iloc[11554]

overall                                                           4
verified                                                      False
reviewTime                                               11 3, 2015
reviewerID                                           A17HMM1M7T9PJ1
asin                                                     B0158RGNR8
style                                     {'Platform:': ' PC Disc'}
reviewerName                                       Timothy B. Riley
reviewText        I am a serious photographer and use several di...
summary           Very fitting for the advanced amateur photogra...
unixReviewTime                                           1446508800
vote                                                            NaN
image                                                           NaN
Name: 11554, dtype: object