In [19]:
import numpy as np
from scipy import spatial
import nltk

# Part 1: Unsupervised learning
## Meaning representation

Numerical data have a natural meaning for a computer: numbers can be compared, and the distance between two vectors can be measured.

In [9]:
a = [1, 5, 6]
b = [-2, -6, 1]
distance = spatial.distance.cosine(a, b)
distance

1.5156862774427124
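
To make the number above less opaque, here is a minimal sketch that recomputes it by hand, assuming the usual definition of cosine distance as one minus the cosine similarity. The helper name `cosine_distance` is introduced only for this illustration.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance: 1 - (u . v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    similarity = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - similarity

a = [1, 5, 6]
b = [-2, -6, 1]
distance = cosine_distance(a, b)
print(distance)
```

The value ranges from 0 (same direction) to 2 (opposite directions). A result above 1 means the two vectors point in roughly opposite directions.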

But this cannot easily be done for words.

In [11]:
a = ['Weather', 'is', 'good']
b = ['Sun', 'is', 'shining']
distance = None  # ? there is no obvious way to compute this for words
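
One naive attempt, shown here only as an assumption-laden sketch, is to compare the two sentences by token overlap (Jaccard distance). The helper `jaccard_distance` is hypothetical and not part of the notebook's libraries. It illustrates the problem rather than solving it: the only shared token is `is`, so the sentences look almost unrelated even though both describe nice weather.

```python
def jaccard_distance(tokens_a, tokens_b):
    """1 - |A ∩ B| / |A ∪ B| over lowercased token sets."""
    set_a = {t.lower() for t in tokens_a}
    set_b = {t.lower() for t in tokens_b}
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sentences are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)

a = ['Weather', 'is', 'good']
b = ['Sun', 'is', 'shining']
distance = jaccard_distance(a, b)
print(distance)  # 1 shared token out of 5 distinct ones
```

Surface overlap measures shared spelling, not shared meaning, which is why a representation of meaning is needed.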

Definition of **meaning**:
 1. the logical connotation of a word or phrase;
 2. what is intended to be, or actually is, expressed or indicated; signification;
 3. the thing that is conveyed especially by language.
 
Representing the meaning of human language is a complex problem. The same word can have different meanings in different contexts. Different words can also have similar meanings (synonyms). A word with a broader meaning (a hypernym) can cover the meanings of more specific words (its hyponyms).

A simple solution for meaning representation is to manually build a graph of relations between words, as WordNet does.
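
Before turning to WordNet itself, the idea of a hand-built relation graph can be sketched in a few lines. The `HYPERNYMS` dictionary below is a made-up toy example, not WordNet data, and the helper names are hypothetical.

```python
# Toy, hand-written hypernym links (child -> parent). Not real WordNet data.
HYPERNYMS = {
    'kitten': 'cat',
    'cat': 'feline',
    'feline': 'carnivore',
    'dog': 'canine',
    'canine': 'carnivore',
    'carnivore': 'mammal',
    'mammal': 'animal',
}

def hypernym_chain(word):
    """Follow parent links upward and return every broader term in order."""
    chain = []
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        chain.append(word)
    return chain

def lowest_common_hypernym(word_a, word_b):
    """Return the most specific term that covers both words, or None."""
    ancestors_b = set([word_b] + hypernym_chain(word_b))
    for candidate in [word_a] + hypernym_chain(word_a):
        if candidate in ancestors_b:
            return candidate
    return None

cat_chain = hypernym_chain('cat')
shared = lowest_common_hypernym('cat', 'dog')
print(cat_chain)
print(shared)
```

With such a graph, two different words can be related through a shared ancestor. The cost is that every link has to be curated by hand.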

In [40]:
from nltk.corpus import wordnet as wn  # needs the WordNet corpus: nltk.download('wordnet')

def synset_to_str(synset):
    """Format a synset as '(pos) lemma1, lemma2, ...'."""
    return '({}) {}'.format(synset.pos(), ', '.join(synset.lemma_names()))

In [41]:
# synonyms: each synset is one sense of 'evil' together with the words that share it
for synset in wn.synsets('evil'):
    print(synset_to_str(synset))

(n) evil, immorality, wickedness, iniquity
(n) evil
(n) evil, evilness
(a) evil
(s) evil, vicious
(s) malefic, malevolent, malign, evil


In [45]:
# hypernyms: all broader concepts above 'cat', found by following hypernym links transitively
hypernyms = lambda s: s.hypernyms()
cat = wn.synset('cat.n.01')
for synset in list(cat.closure(hypernyms)):
    print(synset_to_str(synset))

(n) feline, felid
(n) carnivore
(n) placental, placental_mammal, eutherian, eutherian_mammal
(n) mammal, mammalian
(n) vertebrate, craniate
(n) chordate
(n) animal, animate_being, beast, brute, creature, fauna
(n) organism, being
(n) living_thing, animate_thing
(n) whole, unit
(n) object, physical_object
(n) physical_entity
(n) entity


In [46]:
# hyponyms: all more specific concepts below 'cat', found by following hyponym links transitively
hyponyms = lambda s: s.hyponyms()
for synset in list(cat.closure(hyponyms)):
    print(synset_to_str(synset))

(n) domestic_cat, house_cat, Felis_domesticus, Felis_catus
(n) wildcat
(n) Abyssinian, Abyssinian_cat
(n) alley_cat
(n) Angora, Angora_cat
(n) Burmese_cat
(n) Egyptian_cat
(n) kitty, kitty-cat, puss, pussy, pussycat
(n) Maltese, Maltese_cat
(n) Manx, Manx_cat
(n) mouser
(n) Persian_cat
(n) Siamese_cat, Siamese
(n) tabby, tabby_cat
(n) tabby, queen
(n) tiger_cat
(n) tom, tomcat
(n) tortoiseshell, tortoiseshell-cat, calico_cat
(n) cougar, puma, catamount, mountain_lion, painter, panther, Felis_concolor
(n) European_wildcat, catamountain, Felis_silvestris
(n) jaguarundi, jaguarundi_cat, jaguarondi, eyra, Felis_yagouaroundi
(n) jungle_cat, Felis_chaus
(n) kaffir_cat, caffer_cat, Felis_ocreata
(n) leopard_cat, Felis_bengalensis
(n) lynx, catamount
(n) manul, Pallas's_cat, Felis_manul
(n) margay, margay_cat, Felis_wiedi
(n) ocelot, panther_cat, Felis_pardalis
(n) sand_cat
(n) serval, Felis_serval
(n) tiger_cat, Felis_tigrina
(n) blue_point_Siamese
(n) gib
(n) bobcat, bay_lynx, Lynx_rufus
(n)