Write a Python program to calculate the cosine similarity between two strings using the scikit-learn library. You can use the 'TfidfVectorizer' class to transform the text into vectors.
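For reference, the cosine similarity of two vectors u and v (here, the TF-IDF vectors of the two strings) is their dot product divided by the product of their Euclidean lengths:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2} \, \sqrt{\sum_i v_i^2}}
```

TF-IDF weights are never negative, so for this task the score ranges from 0 (no terms in common) to 1 (vectors pointing in the same direction).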

In [2]:
%pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp312-cp312-win_amd64.whl.metadata (15 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.6.1-cp312-cp312-win_amd64.whl (11.1 MB)


In [3]:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_sim(text1, text2):
    """
    Calculate the cosine similarity between two strings.

    Args:
        text1: The first string.
        text2: The second string.

    Returns:
        The cosine similarity of the two strings' TF-IDF vectors, a float
        between 0.0 and 1.0 (TF-IDF weights are never negative).

    Raises:
        ValueError: If both strings contain only stop words, because the
            vectorizer is then left with an empty vocabulary.
    """
    # Create a TfidfVectorizer that drops common English stop words
    vectorizer = TfidfVectorizer(stop_words='english')

    # Learn the vocabulary and IDF weights from both strings, then transform
    # them into a sparse TF-IDF matrix with one row per string
    tfidf_matrix = vectorizer.fit_transform([text1, text2])

    # Comparing two single-row matrices yields a 1x1 similarity array
    sim_matrix = cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])

    # Unwrap the single entry as a plain Python float
    return float(sim_matrix[0, 0])

# Example usage
text1 = "This is the first document."
text2 = "This document is the second document."

similarity_score = cosine_sim(text1, text2)
print("Cosine Similarity:", similarity_score)

Cosine Similarity: 0.8181802073667197
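As a sanity check, the score above can be reproduced by hand. This is a minimal sketch that assumes TfidfVectorizer's default settings (raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 normalization) and assumes scikit-learn's English stop-word list removes "this", "is", "the" and "first", leaving only "document" and "second". The helpers idf and tfidf are illustrative names, not part of scikit-learn.

```python
import math

# Tokens assumed to remain after lowercasing, dropping punctuation,
# and removing scikit-learn's English stop words
docs = [
    ["document"],                        # "This is the first document."
    ["document", "second", "document"],  # "This document is the second document."
]
vocab = sorted({term for doc in docs for term in doc})  # ['document', 'second']
n_docs = len(docs)

def idf(term):
    """Smoothed IDF, mirroring TfidfVectorizer's default smooth_idf=True."""
    df = sum(term in doc for doc in docs)
    return math.log((1 + n_docs) / (1 + df)) + 1

def tfidf(doc):
    """Raw term count times IDF, then scaled to unit length (norm='l2')."""
    weights = [doc.count(term) * idf(term) for term in vocab]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

v1 = tfidf(docs[0])
v2 = tfidf(docs[1])

# Both vectors have unit length, so the cosine similarity is just the dot product
score = sum(a * b for a, b in zip(v1, v2))
print("Manual cosine similarity:", score)
```

Here "document" appears in both strings, so its IDF is ln(3/3) + 1 = 1, while "second" appears only in text2 and gets ln(3/2) + 1 ≈ 1.405. The first vector is therefore [1, 0] and the second is proportional to [2, 1.405], giving 2 / sqrt(2² + 1.405²) ≈ 0.818, in line with the scikit-learn result.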
