# Setting Up spaCy in Google Colab

This notebook walks through setting up spaCy in Google Colab and using its core NLP features: tokenization, part-of-speech (POS) tagging, named entity recognition (NER), dependency parsing, and similarity scoring.

## 1. Installing spaCy and Its Models

Let's install spaCy and download the language models used in this notebook: the small English pipeline and the transformer-based English pipeline.

In [None]:
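# Install spaCy and the small English pipeline (-q keeps pip output quiet)
!pip install spacy -q
!python -m spacy download en_core_web_sm
# Install transformer support, then the transformer-based English pipeline
!pip install spacy-transformers -q
!python -m spacy download en_core_web_trf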
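
Before moving on, it can help to confirm that both pipelines were actually installed. The following is a minimal sketch, assuming each downloaded pipeline is importable as a Python package under its own name. The helper missing_packages is hypothetical and not part of spaCy; it uses only the standard library.

```python
import importlib
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages.

    Hypothetical helper for this notebook, not a spaCy function.
    """
    # Refresh the import system's caches so packages installed
    # earlier in the same session are visible.
    importlib.invalidate_caches()
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["en_core_web_sm", "en_core_web_trf"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("Both pipelines are installed.")
```

If a pipeline is reported as missing, re-running the matching download command is usually enough.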

## 2. Loading spaCy Models

We'll load both the small model and the transformer-based model so their outputs can be compared side by side.

In [None]:
import spacy

nlp_sm = spacy.load('en_core_web_sm')
nlp_trf = spacy.load('en_core_web_trf')

## 3. Tokenization

Tokenization is the process of breaking text into individual tokens (words, punctuation, etc.).

In [None]:
text = "Natural Language Processing (NLP) enables computers to understand human language."

doc_sm = nlp_sm(text)
tokens_sm = [token.text for token in doc_sm]
print("Tokens (Small Model):", tokens_sm)

doc_trf = nlp_trf(text)
tokens_trf = [token.text for token in doc_trf]
print("Tokens (Transformer Model):", tokens_trf)
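
Tokenization in spaCy is rule-based, so it can be tried without a trained pipeline. Because both loaded pipelines rely on spaCy's English tokenizer, the two token lists above should match. The sketch below assumes spaCy v3 and uses spacy.blank to build a tokenizer-only pipeline; the names nlp_blank, doc_blank, and words are illustrative.

```python
import spacy

# A blank English pipeline contains only the rule-based tokenizer,
# so this runs without any downloaded model.
nlp_blank = spacy.blank("en")
sample = "Natural Language Processing (NLP) enables computers to understand human language."
doc_blank = nlp_blank(sample)

# Each token records its character offset and a few lexical flags.
for token in doc_blank:
    print(f"{token.idx:>3}  {token.text:<12} punct={token.is_punct}  alpha={token.is_alpha}")

# Keep only the word tokens, dropping punctuation.
words = [token.text for token in doc_blank if not token.is_punct]
print(words)
```

Filtering on token.is_punct is a common first step when only the words themselves matter.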

## 4. Part-of-Speech (POS) Tagging

POS tagging assigns a part of speech (e.g., noun, verb, adjective) to each token.

In [None]:
pos_tags_sm = [(token.text, token.pos_) for token in doc_sm]
print("POS Tags (Small Model):", pos_tags_sm)

pos_tags_trf = [(token.text, token.pos_) for token in doc_trf]
print("POS Tags (Transformer Model):", pos_tags_trf)
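
Two long lists of (text, tag) pairs are hard to compare by eye. The sketch below shows one way to list only the positions where two taggings differ. The helper tag_disagreements is hypothetical, and the example pairs are hand-written placeholders rather than real model output; in this notebook the function could be called with pos_tags_sm and pos_tags_trf instead.

```python
def tag_disagreements(pairs_a, pairs_b):
    """Return (text, tag_a, tag_b) for positions where two taggings differ.

    Assumes both lists cover the same tokens in the same order.
    """
    return [
        (text, tag_a, tag_b)
        for (text, tag_a), (_, tag_b) in zip(pairs_a, pairs_b)
        if tag_a != tag_b
    ]


# Hand-written placeholder pairs, NOT real model output.
example_a = [("human", "ADJ"), ("language", "NOUN"), (".", "PUNCT")]
example_b = [("human", "NOUN"), ("language", "NOUN"), (".", "PUNCT")]

print(tag_disagreements(example_a, example_b))
```

An empty result means the two taggings agree on every token.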

## 5. Named Entity Recognition (NER)

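NER identifies named entities such as people, places, and organizations in text. The sample sentence from the tokenization step mentions few such entities, so the lists below may be short or empty.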

In [None]:
entities_sm = [(ent.text, ent.label_) for ent in doc_sm.ents]
print("Entities (Small Model):", entities_sm)

entities_trf = [(ent.text, ent.label_) for ent in doc_trf.ents]
print("Entities (Transformer Model):", entities_trf)
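
On longer texts, a flat list of (text, label) pairs is easier to read once grouped by label. The following is a minimal sketch; group_by_label is a hypothetical helper, and the example pairs are hand-written placeholders rather than real model output. In this notebook it could be applied to entities_sm or entities_trf.

```python
from collections import defaultdict


def group_by_label(entity_pairs):
    """Group (text, label) pairs into a dict mapping label -> list of texts."""
    grouped = defaultdict(list)
    for text, label in entity_pairs:
        grouped[label].append(text)
    return dict(grouped)


# Hand-written placeholder pairs, NOT real model output.
example_entities = [("Google", "ORG"), ("Paris", "GPE"), ("OpenAI", "ORG")]

for label, texts in group_by_label(example_entities).items():
    print(f"{label}: {texts}")
```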

## 6. Visualizing Dependency Trees

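Dependency parsing identifies the grammatical relationships between words by linking each token to its syntactic head with a labeled arc. displaCy renders these arcs inline in the notebook.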

In [None]:
from spacy import displacy
displacy.render(doc_sm, style='dep', jupyter=True)
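
The same arcs can also be read as plain text through each token's dep_, head, and children attributes. The sketch below assumes spaCy v3, where a Doc can be built by hand with absolute head indices. The three-word parse is hand-annotated for illustration and is not real parser output; in this notebook the same loop could run over doc_sm.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated parse for illustration only, NOT real parser output.
# heads holds the absolute index of each token's head; the root points to itself.
doc_manual = Doc(
    Vocab(),
    words=["Computers", "understand", "language"],
    heads=[1, 1, 1],
    deps=["nsubj", "ROOT", "dobj"],
)

# Print every arc as: dependent --label--> head
for token in doc_manual:
    print(f"{token.text} --{token.dep_}--> {token.head.text}")

# The children of the root are the tokens attached directly to it.
root = next(token for token in doc_manual if token.head.i == token.i)
print("Root:", root.text, "| children:", [child.text for child in root.children])
```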

## 7. Word Similarity

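Word similarity estimates how close two words are in meaning, by default as the cosine similarity of their vectors. Note that en_core_web_sm ships without static word vectors, so the score below is computed from context-sensitive tensors and spaCy warns that it may not be useful. For more meaningful scores, use a pipeline that includes word vectors, such as en_core_web_md or en_core_web_lg.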

In [None]:
doc = nlp_sm("apple orange")
similarity = doc[0].similarity(doc[1])
print(f"Similarity between 'apple' and 'orange': {similarity:.2f}")
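
To make the similarity score less of a black box, the sketch below computes cosine similarity by hand in plain Python. The function cosine_similarity is a hypothetical helper, and the three-dimensional vectors are toy values chosen for illustration, not real word embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length, since the angle is undefined.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors, NOT real embeddings.
toy_apple = [1.0, 0.5, 0.0]
toy_orange = [0.9, 0.6, 0.1]
toy_car = [0.0, 0.1, 1.0]

print(f"apple vs orange: {cosine_similarity(toy_apple, toy_orange):.2f}")
print(f"apple vs car:    {cosine_similarity(toy_apple, toy_car):.2f}")
```

Vectors that point in nearly the same direction score close to 1, while unrelated directions score close to 0.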