# Word Sense Disambiguation with the Simplified Lesk Algorithm

In natural language, many words have multiple meanings (or "senses"). For example, the word **"bank"** can refer to a **financial institution** or the **side of a river**.

**Word Sense Disambiguation (WSD)** is the task of identifying which meaning of a word is used in a particular context.

In this notebook, we'll explore the **Simplified Lesk Algorithm**, a classic approach to WSD that uses dictionary definitions (called *glosses*) to identify the best-fitting sense of a word based on context.


## Prerequisites

In [28]:
!pip install -q nltk

import nltk
nltk.download('wordnet', quiet=True)
nltk.download('punkt_tab', quiet=True);

## How the Simplified Lesk Algorithm Works

The basic idea of the algorithm is:
- For each possible meaning (sense) of the target word:
  - Look at the definition (and example sentences) of that sense.
  - Count how many words from that definition overlap with the words in the context sentence.
- Choose the sense with the highest overlap.
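
Before turning to WordNet, the counting step above can be illustrated on a toy example. The sketch below is not the notebook's implementation: the two-sense inventory `TOY_SENSES`, its shortened glosses, the regex tokenizer `tokens`, and the function `toy_lesk` are all hypothetical names invented for illustration.

```python
import re

# Hypothetical two-sense inventory for "bank"; the glosses are shortened
# paraphrases written for this sketch, not WordNet entries.
TOY_SENSES = {
    "financial_institution": "an institution that accepts deposits and lends money",
    "river_side": "sloping land beside a river or lake",
}

def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_lesk(context_sentence, senses):
    """Return (best_sense_name, overlap_by_sense) for a dict of sense -> gloss."""
    context = tokens(context_sentence)
    # Overlap = number of distinct tokens shared by the context and the gloss
    overlaps = {name: len(context & tokens(gloss)) for name, gloss in senses.items()}
    # max() keeps the first sense with the highest overlap if there is a tie
    best = max(overlaps, key=overlaps.get)
    return best, overlaps

best, overlaps = toy_lesk("they walked along the river to the lake", TOY_SENSES)
print(best, overlaps)
# river_side {'financial_institution': 0, 'river_side': 2}
```

Here the context shares *river* and *lake* with the second gloss and nothing with the first, so the river-side sense is chosen.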

Now let's implement this with WordNet glosses and example sentences from NLTK.


In [31]:
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def similarities(word, sentence):
  """Yield (sense, overlap) for every WordNet sense of `word`."""
  # No stopword removal or lemmatization is applied here, although both
  # would make the overlap counts more meaningful
  context = set(word_tokenize(sentence.lower()))
  for sense in wn.synsets(word):
    # Sense context: tokens of the gloss plus all example sentences
    sense_context = set(word_tokenize(sense.definition().lower()))
    for example in sense.examples():
      sense_context.update(word_tokenize(example.lower()))

    # Overlap: number of distinct tokens shared by context and sense context
    overlap = len(context & sense_context)
    yield sense, overlap

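def simplified_lesk(word: str, sentence: str):
  """
  Simplified Lesk algorithm: chooses the sense of a word that has the most overlap
  with the context (surrounding words in the sentence).
  """
  scored = list(similarities(word, sentence))
  if not scored:
    # wn.synsets() returned nothing, so there is no sense to choose from
    raise ValueError(f"No WordNet senses found for '{word}'")
  # On ties, max() returns the first sense with the highest overlap
  return max(scored, key=lambda x: x[1])[0]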

# Helper: print the chosen sense and, if verbose, the overlap of every sense
def test(word: str, sentence: str, verbose: bool=False):
  sense = simplified_lesk(word, sentence)

  print(f"Best sense for '{word}':")
  print(f" - Definition: {sense.definition()}")
  print(f" - Examples: {sense.examples()}")

  if verbose:
    print()
    print("Overlaps:")
    ranked = sorted(similarities(word, sentence), key=lambda x: x[1], reverse=True)
    for candidate, overlap in ranked:
      print(f"[{overlap}] {candidate.definition()}")
      print(f"    {candidate.examples()}")

## Example: Disambiguating "bank"

Let's use the algorithm to determine the correct sense of the word **"bank"** in the sentence:

> He went to the bank to deposit some money into his account and keep it safe

We expect the algorithm to choose the **financial institution** meaning, based on context words such as *deposit*, *money*, *account*, and *safe*.


In [33]:
test("bank", "he went to the bank to deposit some money into his account and keep it safe", verbose=True)

Best sense for 'bank':
 - Definition: a financial institution that accepts deposits and channels the money into lending activities
 - Examples: ['he cashed a check at the bank', 'that bank holds the mortgage on my home']

Overlaps:
[6] a financial institution that accepts deposits and channels the money into lending activities
    ['he cashed a check at the bank', 'that bank holds the mortgage on my home']
[5] the funds held by a gambling house or the dealer in some gambling games
    ['he tried to break the bank at Monte Carlo']
[4] sloping land (especially the slope beside a body of water)
    ['they pulled the canoe up on the bank', 'he sat on the bank of the river and watched the currents']
[4] a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
    ['the plane went into a steep bank']
[3] a container (usually with a slot in the top) for keeping money at home
    ['the coin bank was empty']
[3] a building in which the business of banking t

In [34]:
test("bank", "the plane banked to the right")

Best sense for 'bank':
 - Definition: the funds held by a gambling house or the dealer in some gambling games
 - Examples: ['he tried to break the bank at Monte Carlo']
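
The gambling sense is not the intended one here. One plausible reason is that the overlap is dominated by function words such as *the* and *to*. The sketch below illustrates this on two of the senses printed earlier, with their gloss and example text copied from that output. It is a simplified stand-in for `similarities`, not the notebook's code: the regex tokenizer, the tiny `STOPWORDS` set, and the names `SENSES`, `tokens`, and `overlaps` are assumptions made for illustration.

```python
import re

# A tiny, hand-picked stopword list for this sketch only (an assumption;
# a real pipeline would use a fuller list).
STOPWORDS = {"a", "the", "to", "in", "at", "by", "or", "he", "its",
             "into", "some", "about"}

# Gloss plus example text for two senses, copied from the output above.
SENSES = {
    "gambling funds": "the funds held by a gambling house or the dealer in "
                      "some gambling games. he tried to break the bank at "
                      "Monte Carlo",
    "flight maneuver": "a flight maneuver; aircraft tips laterally about its "
                       "longitudinal axis (especially in turning). "
                       "the plane went into a steep bank",
}

def tokens(text, drop_stopwords=False):
    """Return the set of lowercase word tokens, optionally without stopwords."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS if drop_stopwords else words

def overlaps(sentence, drop_stopwords):
    """Map each sense name to its token overlap with the sentence."""
    context = tokens(sentence, drop_stopwords)
    return {name: len(context & tokens(text, drop_stopwords))
            for name, text in SENSES.items()}

sentence = "the plane banked to the right"
raw = overlaps(sentence, drop_stopwords=False)
filtered = overlaps(sentence, drop_stopwords=True)
print(raw)       # {'gambling funds': 2, 'flight maneuver': 2}
print(filtered)  # {'gambling funds': 0, 'flight maneuver': 1}
```

Without filtering, the two senses tie (*the* and *to* versus *the* and *plane*), and a tie goes to whichever sense is seen first. With the stopwords removed, only *plane* still matches, which favors the flight-maneuver sense.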


In [35]:
test("bank", "the pilot banked the aircraft to the right")

Best sense for 'bank':
 - Definition: tip laterally
 - Examples: ['the pilot had to bank the aircraft']


## Summary
- The Simplified Lesk Algorithm is a knowledge-based method for word sense disambiguation.
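- It uses definitions (glosses) and example sentences from **WordNet** to find overlaps with the sentence context.
- While simple, it can work well when the context shares content words with the correct gloss. As the "the plane banked to the right" example shows, raw overlap counts can also be swayed by function words such as *the* and *to*.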
