<a href="https://colab.research.google.com/github/walkerjian/DailyCode/blob/main/Code_Craft_disambiguate_sentence_with_ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem:
Word sense disambiguation is the problem of determining which sense a word takes on in a particular setting, if that word has multiple meanings. For example, in the sentence "I went to get money from the bank", bank probably means the place where people deposit money, not the land beside a river or lake.

Suppose you are given a list of meanings for several words, formatted like so:

```
{
    "word_1": ["meaning one", "meaning two", ...],
    ...
    "word_n": ["meaning one", "meaning two", ...]
}
```

Given a sentence, most of whose words are contained in the meaning list above, create an algorithm that determines the likely sense of each possibly ambiguous word.

## Solution:
Creating an algorithm for word sense disambiguation (WSD) involves several steps, including understanding the context in which a word is used, comparing it to the known meanings, and determining the most likely meaning based on that context. For this task, we'll design a simple WSD algorithm that follows these general steps:

1. **Preprocess the Sentence**: Tokenize the sentence into words, and possibly perform lemmatization (to get the base form of words) if the meanings are provided in their base forms.

2. **Contextual Analysis**: For each potentially ambiguous word, analyze the surrounding words (context) to gather clues about the intended meaning.

3. **Meaning Selection**: Compare the context of the ambiguous word with the provided meanings to select the most appropriate one. This can be done through simple keyword matching, semantic similarity, or more advanced techniques.
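Steps 1 and 2 can be sketched as follows. This is a minimal sketch under two assumptions. First, a regular-expression tokenizer is used instead of splitting on spaces, so trailing punctuation such as "bank." does not hide a word. Second, the context is a fixed window of five words on each side. Lemmatization is omitted, and `tokenize` and `context_window` are hypothetical helper names.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and keep only runs of letters and apostrophes.

    Unlike str.split(), this drops punctuation, so "bank." becomes "bank".
    """
    return re.findall(r"[a-z']+", sentence.lower())


def context_window(words, i, size=5):
    """Return up to `size` words on each side of position i, excluding words[i]."""
    return words[max(0, i - size):i] + words[i + 1:i + 1 + size]


words = tokenize("I went to get money from the bank.")
print(words)
# ['i', 'went', 'to', 'get', 'money', 'from', 'the', 'bank']
print(context_window(words, words.index("bank")))
# ['to', 'get', 'money', 'from', 'the']
```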

For simplicity, our algorithm uses basic keyword matching, a simplified form of the Lesk algorithm. For each ambiguous word, it counts how many words of each candidate meaning also appear near that word in the sentence, then selects the meaning with the largest overlap.
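Plain overlap counting is easily skewed by function words: a meaning containing "a" or "the" can score a point from any sentence that also contains them. The sketch below is a minimal simplified-Lesk scorer that drops a small stopword list before counting. Several parts are assumptions chosen for illustration:

- the stopword set;
- the glosses in the usage example (the first is a hypothetical rewording that mentions "money");
- the helper names `content_words` and `best_sense`.

```python
import re

# A tiny, illustrative stopword list (an assumption; real systems use a fuller one)
STOPWORDS = {"a", "an", "the", "to", "of", "for", "from", "in", "on", "i", "and"}


def content_words(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def best_sense(context, senses):
    """Pick the sense whose gloss shares the most content words with the context.

    Returns None when no sense overlaps at all; ties keep the earlier sense.
    """
    context_set = content_words(" ".join(context))
    best, best_overlap = None, 0
    for sense in senses:
        overlap = len(content_words(sense) & context_set)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


# Hypothetical glosses: the first mentions "money", so it can overlap
# with the context window "to get money from the"
senses = ["institution where people deposit money", "land beside a river"]
print(best_sense(["to", "get", "money", "from", "the"], senses))
# institution where people deposit money
```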



## Implementation:
Here's a simplified version of the algorithm in Python:

```python
import re

def disambiguate_sentence(sentence, meanings_dict):
    # Tokenize the sentence: lowercase it and keep runs of letters/apostrophes,
    # so punctuation such as "bank." does not prevent a dictionary match.
    words = re.findall(r"[a-z']+", sentence.lower())

    # Function to find the best meaning based on keyword matching
    def find_best_meaning(word, context):
        possible_meanings = meanings_dict.get(word, [])
        best_meaning = None
        max_overlap = 0

        for meaning in possible_meanings:
            # Split meaning into lowercase keywords
            meaning_keywords = set(meaning.lower().split())
            # Calculate overlap with context
            overlap = len(meaning_keywords.intersection(context))
            # Strict ">" keeps the earlier meaning on ties
            if overlap > max_overlap:
                max_overlap = overlap
                best_meaning = meaning

        # None means no meaning shared a single word with the context
        return best_meaning

    # Determine the likely sense of each ambiguous word
    # (a word that appears twice keeps only its last result)
    disambiguated_words = {}
    for i, word in enumerate(words):
        if word in meanings_dict:
            # Use up to 5 words on each side of the current word as its context;
            # slicing past the end of a list is safe in Python
            context_window = words[max(0, i-5):i] + words[i+1:i+6]
            context_set = set(context_window)
            disambiguated_words[word] = find_best_meaning(word, context_set)

    return disambiguated_words

# Example usage:
meanings = {
    "bank": ["place for financial transactions", "land beside a river"],
    # Add more words and meanings as needed
}

sentence = "I went to get money from the bank"

disambiguated = disambiguate_sentence(sentence, meanings)
print(disambiguated)
```

This code defines a function `disambiguate_sentence` that takes a sentence and a dictionary mapping words to their possible meanings. It processes every word of the sentence that appears as a key in the dictionary. For each candidate meaning, it counts how many of the meaning's words also occur within five words on either side. It returns the highest-scoring meaning, or `None` if no meaning overlaps the context.
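`find_best_meaning` returns `None` whenever no meaning shares a word with the context window, as happens for the example sentence. A common WSD baseline in that situation is to fall back to the most frequent sense. The sketch below assumes, although the problem statement does not guarantee it, that each list of meanings is ordered from most to least common. It also reports whether the fallback was used, so callers can tell evidence from guesswork. `choose_sense` is a hypothetical helper name.

```python
def choose_sense(meanings, context):
    """Return (meaning, used_fallback) for one ambiguous word.

    meanings: glosses for the word, assumed (not guaranteed by the problem
    statement) to be ordered from most to least common sense.
    context: iterable of lowercase context words.
    """
    if not meanings:
        return None, False
    context_set = set(context)
    # Word overlap between each gloss and the context
    scores = [len(set(m.lower().split()) & context_set) for m in meanings]
    # max() returns the earliest index among equal scores
    best = max(range(len(meanings)), key=scores.__getitem__)
    if scores[best] == 0:
        # No evidence in the context: fall back to the first-listed sense
        return meanings[0], True
    return meanings[best], False


print(choose_sense(["place for financial transactions", "land beside a river"],
                   ["to", "get", "money", "from", "the"]))
# ('place for financial transactions', True)
```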

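Keyword overlap works only when the wording of a meaning happens to share words with the sentence. More accurate WSD usually relies on one of three approaches:

- NLP models;
- machine learning trained on sense-annotated text;
- semantic similarity learned from large text corpora.

The cells below try a large language model instead.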





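Running this example prints `{'bank': None}`. None of the words in either meaning (`place for financial transactions`, `land beside a river`) occurs in the context window around "bank" (`to get money from the`). Both meanings therefore score zero, and no meaning is selected.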


In [3]:
!pip install openai


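The function below calls `openai.ChatCompletion.create`, which was removed in version 1.0 of the `openai` package. The release installed above (1.12.0 at the time) is therefore replaced with the pre-1.0 version 0.28:

In [8]:
!pip install openai==0.28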



In [12]:
import re
import openai
from google.colab import userdata

def disambiguate_sentence_with_ai(sentence, meanings_dict):
    # Retrieve the OpenAI API key from Colab secrets
    openai_api_key = userdata.get('OPENAI_API_KEY')

    # Stop early if the key is missing or empty; the API call would fail anyway
    if not openai_api_key:
        raise ValueError("API key is not set. Please check your Colab secrets.")
    print("API key is set successfully.")

    # Set the OpenAI API key
    openai.api_key = openai_api_key

    # Only ask about dictionary words that actually occur in the sentence
    sentence_words = set(re.findall(r"[a-z']+", sentence.lower()))

    disambiguated_words = {}

    for word, meanings in meanings_dict.items():
        if word not in sentence_words:
            continue

        # Formulate a prompt asking the AI to choose the correct meaning
        prompt = f"Given the sentence: '{sentence}', what is the meaning of '{word}'? \n\n"
        prompt += "Options:\n"
        for i, meaning in enumerate(meanings, 1):
            prompt += f"{i}. {meaning}\n"

        # Legacy (pre-1.0) openai interface; this is why openai==0.28 is pinned
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a highly intelligent AI trained to understand and generate human-like text."},
                {"role": "user", "content": prompt}
            ]
        )

        # Store the model's reply verbatim. It is free text explaining the choice,
        # not necessarily the exact wording of one of the listed options.
        disambiguated_words[word] = response['choices'][0]['message']['content'].strip()

    return disambiguated_words

# Example usage:
meanings = {
    "bank": ["place for financial transactions", "land beside a river"],
    # Add more words and meanings as needed
}

sentence = "I went to get money from the bank"

disambiguated = disambiguate_sentence_with_ai(sentence, meanings)
print(disambiguated)


API key is set successfully.
{'bank': 'In the given sentence "I went to get money from the bank," the meaning of "bank" is option 1: a place for financial transactions. It refers to a financial institution where people deposit and withdraw money, manage accounts, and perform various financial transactions.'}
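The recorded reply is a full explanatory sentence rather than one of the listed meanings, so the result dictionary does not directly hold a sense label. One way to tighten this is to ask for the option number only and map the reply back to the list. The sketch below takes that approach; it is an assumption, not a tested recipe. `build_prompt` and `parse_choice` are hypothetical helpers, and the prompt wording is illustrative. The model may ignore the instruction, so `parse_choice` also accepts "option N" phrasing (as in the reply above) and returns `None` when the reply is ambiguous.

```python
import re


def build_prompt(sentence, word, meanings):
    """Build a prompt that asks for the option number only.

    The wording is an illustrative assumption, not a tested prompt.
    """
    lines = [
        f"Sentence: {sentence}",
        f"Which meaning of '{word}' is used in this sentence?",
        "Options:",
    ]
    lines += [f"{i}. {meaning}" for i, meaning in enumerate(meanings, 1)]
    lines.append("Reply with the option number only.")
    return "\n".join(lines)


def parse_choice(reply, meanings):
    """Map a model reply to one of the listed meanings, or None if unclear.

    Accepts a bare number ("1" or "1."), a phrase such as "option 1", or a
    reply that quotes exactly one of the meanings verbatim.
    """
    match = re.fullmatch(r"\s*(\d+)\.?\s*", reply) or re.search(
        r"\boption\s+(\d+)", reply, re.IGNORECASE
    )
    if match:
        k = int(match.group(1))
        return meanings[k - 1] if 1 <= k <= len(meanings) else None
    quoted = [m for m in meanings if m.lower() in reply.lower()]
    return quoted[0] if len(quoted) == 1 else None


bank_meanings = ["place for financial transactions", "land beside a river"]
print(build_prompt("I went to get money from the bank", "bank", bank_meanings))

# First sentence of the reply recorded above
reply = ('In the given sentence "I went to get money from the bank," the meaning '
         'of "bank" is option 1: a place for financial transactions.')
print(parse_choice(reply, bank_meanings))  # place for financial transactions
print(parse_choice("2", bank_meanings))    # land beside a river
```

In `disambiguate_sentence_with_ai`, `build_prompt` could replace the hand-built `prompt` string, and `parse_choice` could be applied to the reply before it is stored.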


## Testing: