<a href="https://colab.research.google.com/github/vinisilvanunes/P1PLN/blob/master/Aula_06_An%C3%A1lise_Sem%C3%A2ntica.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lesson 06** - Semantic Interpretation and Grammar

- Understand the concepts of **semantic interpretation** and why it matters in Natural Language Processing (NLP).
- Explore **grammatical structures** and their impact on text analysis.
- Apply **semantic analysis** techniques to a text corpus.
- Use NLP libraries to **extract meaning from text**.

## Example 1 - Representing the meaning of words and phrases with semantic networks

In [2]:
import nltk
from nltk.corpus import wordnet

nltk.download('wordnet')  # Lexical database that provides the synsets (sets of synonyms)
nltk.download('omw-1.4')  # Open Multilingual WordNet: links words in several languages to WordNet synsets

# Look up the synsets that contain the given word in the given language
sinonimos = wordnet.synsets("carro", lang="por")

print(sinonimos)  # Print the list of synsets found

for s in sinonimos:
    # s.lemmas(): list of lemmas (base word forms) in the current synset
    # [0]: take the first lemma in the list
    # .name(): the lemma's name (the synonym itself)
    # With no lang argument, lemmas() returns the English lemmas
    print(s.lemmas()[0].name())

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


[Synset('beach_wagon.n.01'), Synset('car.n.01'), Synset('car.n.02'), Synset('cart.n.01')]
beach_wagon
car
car
cart
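A semantic network links concepts through relations such as "is a" (hypernymy), which is also how WordNet organizes its synsets. The sketch below uses a small hand-written network to show the idea; the concepts and edges in `IS_A`, and the helper names `hypernym_path` and `shared_ancestor`, are illustrative assumptions and are not taken from WordNet.

```python
# Toy "is-a" network: each concept maps to its direct hypernym (a more general concept).
# Hand-written for illustration; these edges are not WordNet data.
IS_A = {
    "car": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "cart": "vehicle",
    "vehicle": "artifact",
}

def hypernym_path(concept):
    """Return the chain from a concept up to its most general ancestor."""
    path = [concept]
    while path[-1] in IS_A:
        path.append(IS_A[path[-1]])
    return path

def shared_ancestor(a, b):
    """Return the most specific concept that covers both a and b, or None."""
    ancestors_of_b = set(hypernym_path(b))
    for concept in hypernym_path(a):
        if concept in ancestors_of_b:
            return concept
    return None

print(hypernym_path("car"))
print(shared_ancestor("car", "cart"))
```

With NLTK, the comparable operations on real synsets are `synset.hypernyms()` and `synset.lowest_common_hypernyms(other)`.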


## Example 2 - Representing the meaning of words and phrases with vectors (embeddings)

In [3]:
!pip install spacy
!python -m spacy download pt_core_news_md

Collecting pt-core-news-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/pt_core_news_md-3.8.0/pt_core_news_md-3.8.0-py3-none-any.whl (42.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.4/42.4 MB 11.9 MB/s eta 0:00:00
Installing collected packages: pt-core-news-md
Successfully installed pt-core-news-md-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('pt_core_news_md')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [4]:
import spacy

nlp = spacy.load('pt_core_news_md')  # Load the pretrained Portuguese model, which includes word vectors

# Create Doc objects that carry each word's annotations and vector
palavra1 = nlp('rei')
palavra2 = nlp('rainha')

print(palavra1.similarity(palavra2))  # Cosine similarity between the two vectors
# Values closer to 1 = more similar words
# Values near 0 or below = unrelated words
# A low score does not identify antonyms: antonyms appear in similar contexts, so they often score high

0.6001228094100952
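By default, spaCy's `similarity` is the cosine similarity between the two objects' vectors. A minimal sketch of that formula in plain Python follows; the three-dimensional vectors are made up for illustration (real model vectors have hundreds of dimensions), and `cosine_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Convention chosen in this sketch for zero vectors
    return dot / (norm_u * norm_v)

# Made-up vectors, for illustration only
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_similarity([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])
print(same_direction, orthogonal, opposite)
```

The score depends only on the direction of the vectors, not on their length, which is why scaling a vector leaves the similarity unchanged.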


## Example 3 - Syntax Tree

In [5]:
!python -m spacy download pt_core_news_sm

Collecting pt-core-news-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.8.0/pt_core_news_sm-3.8.0-py3-none-any.whl (13.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.0/13.0 MB 40.1 MB/s eta 0:00:00
Installing collected packages: pt-core-news-sm
Successfully installed pt-core-news-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('pt_core_news_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [6]:
import spacy
from spacy import displacy  # Module for visualizing dependency parses

nlp = spacy.load("pt_core_news_sm")
frase = "O cachorro correu no parque"

doc = nlp(frase)

displacy.render(doc, style="dep", jupyter=True)

## Example 4 - Ontology

In [7]:
!pip install owlready2

Collecting owlready2
  Downloading owlready2-0.47.tar.gz (27.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.3/27.3 MB 30.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: owlready2
  Building wheel for owlready2 (pyproject.toml) ... done
  Created wheel for owlready2: filename=owlready2-0.47-cp311-cp311-linux_x86_64.whl size=24577496 sha256=efb7494da9f76a17ecd56e4ade0a539b495695433f5d56c984d8ee44fa782ef0
  Stored in directory: /root/.cache/pip/wheels/25/9a/a3/fb1ac6339caa859c8bb18d685736168b0b51d851af13d81d52
Successfully built owlready2
Installing collected packages: owlready2
Successfully installed owlready2-0.47


In [8]:
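from owlready2 import *

onto = get_ontology("http://exemplo.com/minha_ontologia.owl")  # Create a new ontology

# Classes declared inside this block belong to the ontology
with onto:
    class Animal(Thing): pass      # Top-level class
    class Mamifero(Animal): pass   # Mammal: subclass of Animal
    class Cachorro(Mamifero): pass # Dog: subclass of Mammal
    class Gato(Mamifero): pass     # Cat: subclass of Mammal

onto.save("minha_ontologia.owl")  # Write the ontology to a file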
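The value of an ontology is that the class hierarchy can be queried: every `Cachorro` is also a `Mamifero` and an `Animal`. The sketch below mirrors the same hierarchy with plain Python classes so that it runs without owlready2; the `Thing` class here is a stand-in, and `superclasses` is a hypothetical helper, not part of the library.

```python
# Stand-in for owlready2's Thing so the sketch runs without the library
# (an assumption for illustration; in the notebook these classes live inside `with onto:`).
class Thing: pass
class Animal(Thing): pass
class Mamifero(Animal): pass
class Cachorro(Mamifero): pass
class Gato(Mamifero): pass

def superclasses(cls):
    """Names of all classes that subsume cls, from most specific to most general."""
    return [c.__name__ for c in cls.__mro__ if c not in (cls, object)]

print(superclasses(Cachorro))
print(issubclass(Gato, Animal))    # A cat is an animal
print(issubclass(Gato, Cachorro))  # A cat is not a dog
```

In owlready2 itself, `Cachorro.is_a` and `Cachorro.ancestors()` expose comparable information for the real ontology classes.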

## Case Study 1 - Applying Semantic Analysis to a Corpus

In [9]:
# Import the required libraries
import spacy
import nltk
import pandas as pd

from nltk.corpus import wordnet as wn  # Lexical database that groups words into sets of synonyms (synsets)

In [11]:
nltk.download('wordnet')
nltk.download('omw-1.4')

nlp = spacy.load('en_core_web_sm')  # Load the English pipeline: tokenization, POS tagging, dependency parsing and named entity recognition

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [13]:
# Case study text
text = "Apple is looking at buying U.K. startup for $1 billion. Steve Jobs founded Apple in 1976."

# 1. Syntactic analysis
doc = nlp(text)
syntatic_data = []

for token in doc:
  syntatic_data.append({
      "Token": token.text,
      "POS tag": token.pos_,       # Coarse part-of-speech tag
      "Dependency": token.dep_,    # Dependency relation to the head
      "Head": token.head.text      # Token this one depends on
  })

# Convert to a DataFrame
df_syntatic = pd.DataFrame(syntatic_data)
print("\n Syntactic Analysis:")
print(df_syntatic)


 Syntactic Analysis:
      Token POS tag Dependency     Head
0     Apple   PROPN      nsubj  looking
1        is     AUX        aux  looking
2   looking    VERB       ROOT  looking
3        at     ADP       prep  looking
4    buying    VERB      pcomp       at
5      U.K.   PROPN      nsubj  startup
6   startup    VERB      ccomp   buying
7       for     ADP       prep  startup
8         $     SYM   quantmod  billion
9         1     NUM   compound  billion
10  billion     NUM       pobj      for
11        .   PUNCT      punct  looking
12    Steve   PROPN   compound     Jobs
13     Jobs   PROPN      nsubj  founded
14  founded    VERB       ROOT  founded
15    Apple   PROPN       dobj  founded
16       in     ADP       prep  founded
17     1976     NUM       pobj       in
18        .   PUNCT      punct  founded
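The dependency label and head columns in the table above encode a tree: each token points to its head, and the root points to itself. As a rough sketch, the rows for the second sentence can be turned back into an indented tree in plain Python. Keying nodes by token text is an assumption that only holds because no word repeats in this sentence; with spaCy itself, `token.i` and `token.children` avoid that limitation. The helper `render` is hypothetical.

```python
from collections import defaultdict

# (token, dependency label, head) for the second sentence, copied from the table above
rows = [
    ("Steve", "compound", "Jobs"),
    ("Jobs", "nsubj", "founded"),
    ("founded", "ROOT", "founded"),
    ("Apple", "dobj", "founded"),
    ("in", "prep", "founded"),
    ("1976", "pobj", "in"),
    (".", "punct", "founded"),
]

children = defaultdict(list)
root = None
for token, dep, head in rows:
    if dep == "ROOT":
        root = token  # The root token is its own head
    else:
        children[head].append((token, dep))

def render(token, dep="ROOT", depth=0):
    """Return the subtree under token as indented 'token (label)' lines."""
    lines = ["  " * depth + f"{token} ({dep})"]
    for child, child_dep in children[token]:
        lines.extend(render(child, child_dep, depth + 1))
    return lines

tree_lines = render(root)
print("\n".join(tree_lines))
```

Reading the result top-down gives the sentence structure: the verb `founded` governs its subject `Jobs`, its object `Apple`, and the prepositional phrase headed by `in`.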


In [14]:
# 2. Named Entity Recognition (NER)
entities_data = []

for ent in doc.ents:
  entities_data.append({
      "Entity": ent.text,
      "Type": ent.label_
  })

# Convert to a DataFrame
df_entities = pd.DataFrame(entities_data)
print("\n Named Entity Recognition:")
print(df_entities)


 Named Entity Recognition:
       Entity    Type
0       Apple     ORG
1        U.K.     GPE
2  $1 billion   MONEY
3  Steve Jobs  PERSON
4       Apple     ORG
5        1976    DATE
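A common next step after NER is to group the mentions by entity type and drop repeated mentions. The sketch below does this in plain Python over the pairs printed above; the variable name `by_label` is an illustrative choice.

```python
from collections import defaultdict

# (entity text, label) pairs copied from the output above
entities = [
    ("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY"),
    ("Steve Jobs", "PERSON"), ("Apple", "ORG"), ("1976", "DATE"),
]

by_label = defaultdict(list)
for text, label in entities:
    if text not in by_label[label]:  # Skip repeated mentions of the same entity
        by_label[label].append(text)

for label, texts in by_label.items():
    print(f"{label}: {', '.join(texts)}")
```

In the notebook, the same loop could run directly over `doc.ents`, and `spacy.explain(label)` returns a short description of what each label means.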


In [15]:
# 3. Semantic analysis with WordNet
semantic_data = []

for token in doc:
  synsets = wn.synsets(token.text)  # All WordNet synsets for the token, regardless of part of speech
  if synsets:
    semantic_data.append({
        "Word": token.text,
        "Meaning": synsets[0].definition(),  # Definition of the first synset
        "Example": synsets[0].examples()     # Usage examples of the first synset (may be empty)
    })

# Convert to a DataFrame
df_semantic = pd.DataFrame(semantic_data)
print("\n Semantic Analysis:")
print(df_semantic)


 Semantic Analysis:
       Word  ...                                            Example
0     Apple  ...                                                 []
1        is  ...          [John is rich, This is not a good answer]
2   looking  ...  [he went out to have a look, his look was fixe...
3        at  ...                                                 []
4    buying  ...  [buying and selling fill their days, shrewd pu...
5      U.K.  ...                                                 []
6   startup  ...    [repeated shutdowns and startups are expensive]
7         1  ...  [he has the one but will need a two and three ...
8   billion  ...                                                 []
9      Jobs  ...                  [he's not in my line of business]
10  founded  ...                    [She set up a literacy program]
11    Apple  ...                                                 []
12       in  ...                                                 []

[13 rows x 3 col
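The lookup above takes the first synset for each token regardless of its part of speech, so the sense can disagree with the parse: `looking` is tagged `VERB`, yet the example returned for it ("he went out to have a look") illustrates a noun. One possible refinement, sketched here under assumptions, is to pass the token's part of speech to `wn.synsets`. The mapping `SPACY_TO_WORDNET_POS` and the helpers `wordnet_pos` and `first_definition` are hypothetical names; leaving proper nouns out of the mapping is a design choice of this sketch, since NER already covers them.

```python
# Map spaCy's coarse POS tags to WordNet's single-letter POS codes.
# WordNet covers only nouns, verbs, adjectives and adverbs.
SPACY_TO_WORDNET_POS = {
    "NOUN": "n",
    "VERB": "v",
    "AUX": "v",
    "ADJ": "a",
    "ADV": "r",
}

def wordnet_pos(spacy_pos):
    """Return the WordNet POS code for a spaCy tag, or None if there is no match."""
    return SPACY_TO_WORDNET_POS.get(spacy_pos)

def first_definition(word, spacy_pos):
    """Definition of the first synset matching the word and its POS, or None.

    Needs NLTK and the 'wordnet' data; imported lazily so the mapping works without them.
    """
    pos = wordnet_pos(spacy_pos)
    if pos is None:
        return None  # Prepositions, punctuation, proper nouns, etc. are skipped
    from nltk.corpus import wordnet as wn
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].definition() if synsets else None

print(wordnet_pos("VERB"), wordnet_pos("ADP"))
```

Inside the notebook loop, `first_definition(token.text, token.pos_)` would replace the unconditional `synsets[0]` lookup and skip tokens such as `at` and `in`.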