# Key Phrase Extraction

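Key phrase extraction identifies the main talking points in a piece of text. It pairs naturally with sentiment analysis, for example when processing a product review:

1. Use sentiment analysis to determine the sentiment of the review.
2. Use key phrase extraction to identify the important elements of the review.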

## Methods

Method|Best For|Pros|Cons
---|---|---|---
spaCy (Noun Chunks)|Simple phrase extraction|Fast, rule-based|No ranking, misses verbs
YAKE|Unsupervised keyword extraction|No training needed, good for short texts|No deep understanding of meaning
KeyBERT|Deep semantic key phrases|Context-aware, best for accuracy|Needs more resources
RAKE|Phrase-based extraction|Works well for long documents|Can pick up unimportant phrases

## 1. spaCy + noun chunks (basic method)

Extracts noun phrases (good for short key phrases).

**Pros:** Fast and simple\
**Cons:** Doesn't rank importance

In [17]:
sample_short_text = "Elon Musk founded SpaceX and Tesla, leading innovations in space travel and electric cars."

sample_long_text = """
    Artificial Intelligence (AI) has rapidly evolved over the past few decades, transforming industries, reshaping economies, and revolutionizing human interactions with technology. The journey of AI began in the mid-20th century when pioneers like Alan Turing and John McCarthy laid the theoretical foundations for machine intelligence. Early AI systems focused on rule-based approaches and expert systems, which, although powerful for specific tasks, lacked the adaptability of modern machine learning models.
    With the rise of deep learning in the 2010s, AI took a significant leap forward. Neural networks, inspired by the structure of the human brain, enabled machines to recognize speech, translate languages, and even generate realistic images. Companies like Google, OpenAI, and Tesla leveraged deep learning to create state-of-the-art AI applications. Self-driving cars, natural language processing (NLP), and recommendation algorithms became mainstream.
    Despite these advancements, AI still faces ethical and technical challenges. Bias in AI models, data privacy concerns, and the impact of automation on employment are widely debated topics. Researchers continue to develop responsible AI frameworks to ensure fairness, transparency, and accountability in AI-driven decision-making.
    Looking ahead, AI is expected to become even more integrated into daily life. Innovations in healthcare, education, and robotics promise to enhance human capabilities while raising important ethical considerations. As AI progresses, balancing innovation with ethical responsibility will be crucial for shaping a future where artificial intelligence benefits all of humanity.
"""

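# Collapse the indented block into a single line, keeping one space between
# paragraphs so that sentences are not fused together (e.g. "models.With")
sample_long_text = " ".join(
    line.strip() for line in sample_long_text.splitlines() if line.strip()
)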

In [19]:
import spacy

# Load the English language model for spaCy
nlp = spacy.load("en_core_web_sm")

# Process the texts
short_text_doc = nlp(sample_short_text)
long_text_doc = nlp(sample_long_text)

# Extract noun phrases from the texts
short_text_noun_phrases = [chunk.text for chunk in short_text_doc.noun_chunks]
long_text_noun_phrases = [chunk.text for chunk in long_text_doc.noun_chunks]

# Display the noun phrases
print("Noun Phrases in Short Text:")
print(short_text_noun_phrases)
print("\nNoun Phrases in Long Text:")
print(long_text_noun_phrases)

Noun Phrases in Short Text:
['Elon Musk', 'SpaceX', 'Tesla', 'leading innovations', 'space travel', 'electric cars']

Noun Phrases in Long Text:
['Artificial Intelligence', '(AI', 'the past few decades', 'transforming industries', 'economies', 'human interactions', 'technology', 'The journey', 'AI', 'the mid-20th century', 'pioneers', 'Alan Turing', 'John McCarthy', 'the theoretical foundations', 'machine intelligence', 'Early AI systems', 'rule-based approaches', 'expert systems', 'which', 'specific tasks', 'the adaptability', 'modern machine learning models', 'the rise', 'deep learning', 'AI', 'a significant leap', 'Neural networks', 'the structure', 'the human brain', 'machines', 'speech', 'translate languages', 'realistic images', 'Companies', 'Google', 'OpenAI', 'Tesla', 'deep learning', 'the-art', 'AI', 'Self-driving cars', 'natural language processing', 'NLP', 'recommendation algorithms', 'these advancements', 'AI', 'ethical and technical challenges', 'Bias', 'AI models', 'data 
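
Because noun chunks come back unranked, one simple way to get an ordering is to count how often each phrase occurs. The following is a minimal sketch of that idea, not part of spaCy: `rank_noun_phrases` and the `DETERMINERS` set are hypothetical helpers, and frequency is only a rough proxy for importance.

```python
from collections import Counter

# Leading determiners to strip so "the rise" and "rise" count as one phrase.
# This small set is an assumption for illustration, not a spaCy feature.
DETERMINERS = {"a", "an", "the", "these", "those", "this", "that"}


def rank_noun_phrases(phrases, top=5):
    """Return the `top` most frequent phrases after light normalization."""
    counts = Counter()
    for phrase in phrases:
        words = phrase.lower().split()
        # Drop a leading determiner, if any
        if words and words[0] in DETERMINERS:
            words = words[1:]
        if words:
            counts[" ".join(words)] += 1
    return counts.most_common(top)


# A handful of chunks taken from the long-text output above
chunks = ["AI", "the rise", "deep learning", "AI", "Tesla", "deep learning", "AI"]
print(rank_noun_phrases(chunks, top=2))
```

In the notebook, the same function could be called on `long_text_noun_phrases` directly.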

## 2. YAKE (unsupervised keyword extraction)

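Scores candidate phrases using statistical features of the text itself (such as casing, position, and frequency), so it needs no training data.

**Pros:** Ranks phrases by relevance, unlike noun chunks, and can include verbs (e.g. "Musk founded")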

In [12]:
from yake import KeywordExtractor

# Initialize YAKE: phrases of up to 2 words (n=2), keep the 5 best (top=5)
keyword_extractor = KeywordExtractor(n=2, top=5)

# Extract keywords from the texts
short_text_keywords = keyword_extractor.extract_keywords(sample_short_text)
long_text_keywords = keyword_extractor.extract_keywords(sample_long_text)

# Display the keywords
print("Keywords in Short Text:")
print([stk[0] for stk in short_text_keywords])
print("\nKeywords in Long Text:")
print([ltk[0] for ltk in long_text_keywords])

Keywords in Short Text:
['Elon Musk', 'Musk founded', 'leading innovations', 'electric cars', 'founded SpaceX']

Keywords in Long Text:
['transforming industries', 'reshaping economies', 'rapidly evolved', 'Alan Turing', 'Artificial Intelligence']
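
The cell above keeps only the phrase from each result, but `extract_keywords` returns `(phrase, score)` pairs, and in YAKE a lower score marks a more relevant phrase. Below is a minimal sketch of displaying the scores as well. `format_ranked` is a hypothetical helper, and the scores in `example` are made-up placeholders, not real YAKE output.

```python
def format_ranked(scored, lower_is_better=True):
    """Sort (phrase, score) pairs best-first and render them as text lines."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=not lower_is_better)
    return [f"{score:.3f}  {phrase}" for phrase, score in ordered]


# Placeholder scores for illustration only (NOT real YAKE output)
example = [("electric cars", 0.09), ("Elon Musk", 0.02), ("Musk founded", 0.05)]
for line in format_ranked(example):
    print(line)
```

KeyBERT also returns `(phrase, score)` pairs, but its scores are similarities where higher is better, so the same helper would be called with `lower_is_better=False`.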


## 3. KeyBERT (BERT-based for semantic key phrases)

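Embeds the document and its candidate phrases with a BERT-based model, then returns the candidates whose embeddings are most similar to the document's (by cosine similarity).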

**Pros:** Context-aware and captures meaning, not just frequency

In [15]:
from keybert import KeyBERT

# Initialize KeyBERT
keybert_model = KeyBERT()

# Extract keywords from the texts
short_text_keywords = keybert_model.extract_keywords(
    sample_short_text, keyphrase_ngram_range=(1, 2), stop_words="english"
)
long_text_keywords = keybert_model.extract_keywords(
    sample_long_text, keyphrase_ngram_range=(1, 2), stop_words="english"
)

# Display the keywords
print("Keywords in Short Text:")
print([stk[0] for stk in short_text_keywords])
print("\nKeywords in Long Text:")
print([ltk[0] for ltk in long_text_keywords])

Keywords in Short Text:
['musk founded', 'spacex tesla', 'elon musk', 'founded spacex', 'tesla']

Keywords in Long Text:
['ai driven', 'advancements ai', 'ai frameworks', 'responsible ai', 'ai']
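
The ranking idea behind this approach can be sketched without any model: compare each candidate's vector to the document's vector and keep the closest ones. The example below is a simplified illustration, not KeyBERT's actual implementation. The 3-dimensional vectors are hypothetical stand-ins for real embeddings, and `cosine` and `rank_candidates` are names chosen here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(doc_vec, candidate_vecs, top=2):
    """Rank candidate phrases by similarity to the document vector."""
    scored = [(phrase, cosine(doc_vec, vec)) for phrase, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Toy 3-d vectors standing in for real embeddings (hypothetical values)
doc_vec = [1.0, 1.0, 0.0]
candidates = {
    "responsible ai": [1.0, 0.9, 0.1],
    "daily life": [0.0, 0.2, 1.0],
    "ai frameworks": [0.8, 1.0, 0.3],
}
for phrase, score in rank_candidates(doc_vec, candidates):
    print(f"{phrase}: {score:.2f}")
```

Because every candidate is compared with the document as a whole, phrases built around the dominant topic tend to win, which is consistent with the long-text result above being dominated by "ai" variants.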


## 4. RAKE (Rapid Automatic Keyword Extraction)

Extracts multi-word key phrases by splitting the text at stop words and punctuation, then scoring each phrase from word frequency and co-occurrence.

**Pros:** Good for multi-word key phrases\
**Cons:** May not be as accurate as KeyBERT

In [16]:
from rake_nltk import Rake

# Initialize Rake (requires the NLTK stopwords and punkt tokenizer data to be downloaded)
r = Rake()

# Extract keywords from the texts
r.extract_keywords_from_text(sample_short_text)
short_text_keywords = r.get_ranked_phrases()

r.extract_keywords_from_text(sample_long_text)
long_text_keywords = r.get_ranked_phrases()

# Display the keywords
print("Keywords in Short Text:")
print(short_text_keywords[:5])
print("\nKeywords in Long Text:")
print(long_text_keywords[:5])

Keywords in Short Text:
['elon musk founded spacex', 'space travel', 'leading innovations', 'electric cars', 'tesla']

Keywords in Long Text:
['recommendation algorithms became mainstream', 'pioneers like alan turing', 'raising important ethical considerations', 'even generate realistic images', 'develop responsible ai frameworks']
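
To make the frequency and co-occurrence scoring concrete, here is a minimal from-scratch sketch of the RAKE idea. It is a simplification, not the `rake_nltk` implementation: the stop-word list is a tiny hypothetical one, and the tokenizer is a plain regular expression. Each word gets the score degree/frequency, where degree counts the words it shares a phrase with (itself included), and a phrase scores the sum over its words.

```python
import re
from collections import defaultdict

# Tiny stop-word list for illustration; real RAKE implementations use a full list
STOPWORDS = {"and", "in", "the", "of", "a", "to"}


def rake_scores(text):
    """Score candidate phrases with RAKE's degree/frequency ratio."""
    tokens = re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())
    # Split the token stream into phrases at stop words and punctuation
    phrases, current = [], []
    for token in tokens:
        if token in STOPWORDS or not token.isalnum():
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)

    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # co-occurrences within phrases, counting the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # A phrase's score is the sum of degree/frequency over its words
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text = "Elon Musk founded SpaceX and Tesla, leading innovations in space travel and electric cars."
for phrase, score in rake_scores(text):
    print(f"{score:4.1f}  {phrase}")
```

The scoring favors long phrases, since every word in a four-word phrase has degree 4. That explains why "elon musk founded spacex" ranks first and the single word "tesla" last in the short-text output above.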
