# A simple demo for the OpenHowNet Python Package

To begin with, make sure you have installed **Python 3.X**. 

You also need to install [**anytree**](https://pypi.org/project/anytree/), which is the only dependency of OpenHowNet.

Next, follow the [installation instructions](https://github.com/thunlp/OpenHowNet#installation) to install the **OpenHowNet** API.

After that, you can import the module:

In [1]:
# Import the OpenHowNet module
import OpenHowNet

Then we can create a **HowNetDict** object:

In [2]:
# Initialize HowNetDict. Set init_sim=True to also initialize the similarity calculation module.
hownet_dict = OpenHowNet.HowNetDict(init_sim=False)

Initializing OpenHowNet succeeded!


Now the preparation work is all done. Let's explore some important features of HowNetDict.

## Basic Usage of OpenHowNet

### Get word annotations in HowNet

By default, the API searches for the target word in both the English and the Chinese annotations in HowNet, which adds significant search overhead. If the target word does not exist in the HowNet annotations, the API simply returns an empty list.

In [3]:
# Get the senses list annotated with "苹果".
result_list = hownet_dict.get_sense("苹果")
print("The number of retrievals: ", len(result_list))
print("An example of retrievals: ", result_list)

The number of retrievals:  8
An example of retrievals:  [No.244396|apple|苹果, No.244397|apple|苹果, No.244398|IPHONE|苹果, No.244399|apple|苹果, No.244400|iphone|苹果, No.244401|apple|苹果, No.244402|malus pumila|苹果, No.244403|orchard apple tree|苹果]


In the OpenHowNet package, the detailed information about senses and sememes in HowNet is wrapped in classes.

In [4]:
# Get the detailed information of the sense.
sense_example = result_list[0]
print("Sense example:", sense_example)
print("Sense id: ",sense_example.No)
print("English word in the sense: ", sense_example.en_word)
print("Chinese word in the sense: ", sense_example.zh_word)
print("HowNet annotation of the sense: ", sense_example.Def)
print("Sememe list of the sense: ", sense_example.get_sememe_list())

Sense example: No.244396|apple|苹果
Sense id:  000000244396
English word in the sense:  apple
Chinese word in the sense:  苹果
HowNet annotation of the sense:  {computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}
Sememe list of the sense:  [computer|电脑, bring|携带, SpeBrand|特定牌子, PatternValue|样式值, able|能]
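The `Def` string above nests sememes of the form `English|Chinese` inside braces. To illustrate how the flat sememe list relates to that annotation, the sketch below pulls the sememe labels out of the string with a regular expression. It is not the package's own parser, and the helper name `extract_sememes` is made up for this example; in practice `get_sememe_list()` is the method to use.

```python
import re

# The Def annotation printed above for sense No.244396.
DEF = ("{computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:"
       "scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}")

# A sememe looks like "{English|Chinese"; placeholders such as "{$}" contain no "|".
SEMEME_PATTERN = re.compile(r"\{([^{}|:=]+)\|([^{}|:=]+)")

def extract_sememes(def_string):
    """Return (english, chinese) pairs in the order they appear in the annotation."""
    return SEMEME_PATTERN.findall(def_string)

sememes = extract_sememes(DEF)
for en, zh in sememes:
    print(en, zh)
```

The pairs come out in annotation order, while `get_sememe_list()` above prints the same five sememes in a different order.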


In [5]:
# Get the detailed information of the sememe.
sememe_example = sense_example.get_sememe_list().pop()
print("Sememe example: ", sememe_example)
print("The English annotation of the sememe: ", sememe_example.en)
print("The Chinese annotation of the sememe: ", sememe_example.zh)
print("The frequency of occurrence of the sememe in HowNet: ", sememe_example.freq)

Sememe example:  able|能
The English annotation of the sememe:  able
The Chinese annotation of the sememe:  能
The frequency of occurrence of the sememe in HowNet:  2861


You can visualize the structured HowNet annotation (the "sememe tree") of a sense as follows:

In [6]:
sense_example.visualize_sememe_tree()

[sense]No.244396|apple|苹果
└── [None]computer|电脑
    ├── [modifier]PatternValue|样式值
    │   └── [CoEvent]able|能
    │       └── [scope]bring|携带
    │           └── [patient]$
    └── [patient]SpeBrand|特定牌子



Besides, you can get a list of Sememe instances by their English or Chinese annotation. Similarly, you can set the language of the input, or set `strict` to `False` to fuzzy-match the sememe.

In [7]:
sememe1 = hownet_dict.get_sememe('FormValue', language='en')
sememe2 = hownet_dict.get_sememe('圆', language='zh')
print("Retrieved sememes: ",sememe1, sememe2)

sememe3 = hownet_dict.get_sememe('值', strict=False)
print("Fuzzy match the sememes (retrieved {} results): ".format(len(sememe3)), sememe3[:5])

sememe_all = hownet_dict.get_all_sememes()
print("There are {} sememes in HowNet in total.".format(len(sememe_all)))

Retrieved sememes:  [FormValue|形状值] [round|圆]
Fuzzy match the sememes (retrieved 249 results):  [PropertyValue|特性值, FinenessValue|粗细值, AgeValue|年龄值, DistanceValue|距离值, PerformanceValue|性能值]
There are 2540 sememes in HowNet in total.
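The difference between strict and fuzzy matching can be pictured with plain strings. The sketch below assumes that fuzzy matching means substring containment in either half of the `English|Chinese` label. That is consistent with the results above, but it is an assumption about the package's behavior, and `match_sememes` is a hypothetical helper rather than part of OpenHowNet.

```python
# A few sememe labels taken from the outputs above.
LABELS = ["FormValue|形状值", "round|圆", "PropertyValue|特性值", "AgeValue|年龄值", "able|能"]

def match_sememes(labels, query, strict=True):
    """Strict: query equals the English or Chinese half.
    Fuzzy (assumed): query is a substring of either half."""
    hits = []
    for label in labels:
        en, zh = label.split("|", 1)
        if strict:
            matched = query in (en, zh)          # exact equality with one half
        else:
            matched = query in en or query in zh  # substring test
        if matched:
            hits.append(label)
    return hits

strict_hits = match_sememes(LABELS, "值")               # no label is exactly "值"
fuzzy_hits = match_sememes(LABELS, "值", strict=False)  # every label whose Chinese half contains "值"
print(strict_hits)
print(fuzzy_hits)
```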


To make the search more efficient, you can specify the language of the target word as follows.

In [8]:
print("The number of mixed search results:",len(hownet_dict.get_sense("X")))
print("The number of Chinese results:",len(hownet_dict.get_sense("X",language="zh")))
print("The number of English results:",len(hownet_dict.get_sense("X",language="en")))

The number of mixed search results: 3
The number of Chinese results: 3
The number of English results: 2
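One way to see why naming the language helps: if words are indexed per language, a mixed search has to consult both indexes and merge the results, whereas a language-specific search touches only one. The toy model below illustrates the idea. The index layout, the sense ids (`s1`, `s2`, `s3`) and the `lookup` function are invented for illustration and say nothing about how OpenHowNet actually stores its data.

```python
# Toy per-language indexes mapping a word to the ids of the senses annotated with it.
ZH_INDEX = {"X": ["s1", "s2", "s3"]}
EN_INDEX = {"X": ["s1", "s2"]}

def lookup(word, language=None):
    """Return sense ids for `word`; an unknown word yields an empty list."""
    if language == "zh":
        return list(ZH_INDEX.get(word, []))
    if language == "en":
        return list(EN_INDEX.get(word, []))
    # Mixed search: consult both indexes and drop duplicates, keeping first-seen order.
    merged = ZH_INDEX.get(word, []) + EN_INDEX.get(word, [])
    return list(dict.fromkeys(merged))

print(len(lookup("X")), len(lookup("X", "zh")), len(lookup("X", "en")))
print(lookup("not-in-index"))
```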


You can restrict the part of speech (POS) of the target word by setting `pos`. You can also set `strict` to `False` to perform a fuzzy match.

In [9]:

res = hownet_dict.get_sense("苹果", strict=False)
print("Fuzzy match: (The number of retrievals: {})".format(len(res)))
print("Retrievals: {}\n".format(res))
res = hownet_dict.get_sense("苹果",pos='adj', strict=False)
print("Fuzzy match and limit the POS to adj: (The number of retrievals: {})".format(len(res)))
print("Retrievals: {}".format(res))


Fuzzy match: (The number of retrievals: 32)
Retrievals: [No.244407|curry chicken with apple|苹果咖喱鸡, No.244408|apple orchard|苹果园, No.244409||苹果园西锦江之星, No.244413|apple|苹果树, No.244414|apple tree|苹果树, No.244415|apple juice|苹果汁, No.63141||北京双井苹果酒店, No.244416|apple pie|苹果派, No.244419|apple and fish soup|苹果煲生鱼汤, No.244420|MAC|苹果电脑, No.244421|mac|苹果电脑, No.63229||北京四季苹果酒店, No.244396|apple|苹果, No.244423|apple green|苹果绿, No.244424|apple gateau|苹果蛋糕, No.244397|apple|苹果, No.244425|cider|苹果酒, No.244398|IPHONE|苹果, No.244426|cyder|苹果酒, No.244399|apple|苹果, No.199378|baked apples|焗苹果, No.244427|hard cider|苹果酒, No.244400|iphone|苹果, No.244428|apple jam|苹果酱, No.244401|apple|苹果, No.244429|apple pie|苹果馅饼, No.244402|malus pumila|苹果, No.180902|French apple tart|法式苹果挞, No.244403|orchard apple tree|苹果, No.244404|Apple|苹果公司, No.244405|apple jelly|苹果冻, No.244406|apple-scented|苹果味]

Fuzzy match and limit the POS to adj: (The number of retrievals: 2)
Retrievals: [No.244423|apple green|苹果绿, No.244406|apple-scented|苹果味]


You can get all senses with the following API.

In [10]:
all_senses = hownet_dict.get_all_senses()
print("The number of all senses: {}".format(len(all_senses)))

The number of all senses: 237974


Besides, you can also get all the English or Chinese words in HowNet annotations.

In [11]:
zh_word_list = hownet_dict.get_zh_words()
en_word_list = hownet_dict.get_en_words()
print("Chinese words in HowNet: ",zh_word_list[:10])
print("English words in HowNet: ",en_word_list[:10])

Chinese words in HowNet:  ['', '下乘', '中国人民解放军', '分离器', '龙文区', '独木不成林', '牙痛', '镇静自若', '腹腔镜剖腹手术', '茫然不解']
English words in HowNet:  ['', 'lawn mower', 'hanging bed', 'Dresden bank', 'snares of love', 'MRA', 'whipsaw', 'ascend to heaven and become immortal', 'trade contacts', 'Android']


### Get Sememe Trees for a certain word in HowNet

You can get the sememes of a word in several forms of presentation. Detailed explanations of the parameters are given in our documentation.
First, you can retrieve all the senses annotated with the word, together with their sememes.

In [12]:
# Get the respective sememe list of the senses annotated with the word.
# The word can be English or Chinese or *
hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=False, expanded_layer=-1, K=None)

[{'sense': No.244396|apple|苹果,
  'sememes': [computer|电脑, bring|携带, SpeBrand|特定牌子, PatternValue|样式值, able|能]},
 {'sense': No.244397|apple|苹果, 'sememes': [fruit|水果]},
 {'sense': No.244398|IPHONE|苹果,
  'sememes': [tool|用具,
   bring|携带,
   SpeBrand|特定牌子,
   PatternValue|样式值,
   communicate|交流,
   able|能]},
 {'sense': No.244399|apple|苹果,
  'sememes': [tool|用具,
   bring|携带,
   SpeBrand|特定牌子,
   PatternValue|样式值,
   communicate|交流,
   able|能]},
 {'sense': No.244400|iphone|苹果,
  'sememes': [tool|用具,
   bring|携带,
   SpeBrand|特定牌子,
   PatternValue|样式值,
   communicate|交流,
   able|能]},
 {'sense': No.244401|apple|苹果, 'sememes': [reproduce|生殖, fruit|水果, tree|树]},
 {'sense': No.244402|malus pumila|苹果,
  'sememes': [reproduce|生殖, fruit|水果, tree|树]},
 {'sense': No.244403|orchard apple tree|苹果,
  'sememes': [reproduce|生殖, fruit|水果, tree|树]}]

The `display` parameter can be set to "tree"/"dict"/"list"/"visual", and the function returns its result in the corresponding form.
1. When set to "list", the sememes will be returned in the form of list as shown above.
2. When set to "dict", the function will return the sememe tree in the form of dict as shown below.

In [13]:
hownet_dict.get_sememes_by_word(word = '苹果', display='dict', merge=False, expanded_layer=-1, K=None)[0]

{'sense': No.244396|apple|苹果,
 'sememes': {'role': 'sense',
  'name': No.244396|apple|苹果,
  'children': [{'role': 'None',
    'name': computer|电脑,
    'children': [{'role': 'modifier',
      'name': PatternValue|样式值,
      'children': [{'role': 'CoEvent',
        'name': able|能,
        'children': [{'role': 'scope',
          'name': bring|携带,
          'children': [{'role': 'patient', 'name': '$'}]}]}]},
     {'role': 'patient', 'name': SpeBrand|特定牌子}]}]}}

3. When set to "tree", the function will return the senses and the root node of their respective sememe tree. 

In [14]:
t = hownet_dict.get_sememes_by_word(word = '苹果', display='tree', merge=False, expanded_layer=-1, K=None)[0]
print(t)
print("The type of the root node is:", type(t['sememes']))

{'sense': No.244396|apple|苹果, 'sememes': Node('/No.244396|apple|苹果', role='sense')}
The type of the root node is: <class 'anytree.node.node.Node'>


4. When set to "visual", the function will visualize the Top-K sememe trees. In this case, `K` controls the number of sememe trees that are visualized.

In [15]:
hownet_dict.get_sememes_by_word(word = '苹果', display='visual', merge=False, expanded_layer=-1, K=2)

Find 8 result(s)
Display #0 sememe tree
[sense]No.244396|apple|苹果
└── [None]computer|电脑
    ├── [modifier]PatternValue|样式值
    │   └── [CoEvent]able|能
    │       └── [scope]bring|携带
    │           └── [patient]$
    └── [patient]SpeBrand|特定牌子

Display #1 sememe tree
[sense]No.244397|apple|苹果
└── [None]fruit|水果



5. `merge` and `expanded_layer` only take effect when `display=="list"`. When `merge==True`, the sememe lists of all the senses retrieved by the word are merged into one. `expanded_layer` controls how many layers of the sememe tree are expanded; the default is -1 (expand all layers).

In [16]:
# Expand all layers and merge all the sememe list into one
hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=True, expanded_layer=-1, K=None)

[tool|用具,
 reproduce|生殖,
 tree|树,
 bring|携带,
 PatternValue|样式值,
 fruit|水果,
 computer|电脑,
 SpeBrand|特定牌子,
 communicate|交流,
 able|能]

In [17]:
# Expand the top2 layers and merge all the sememe list into one. Note that the first layer is the sense node. 
hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=True, expanded_layer=2, K=None)

[tool|用具, computer|电脑, tree|树, fruit|水果]
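The effect of `expanded_layer` can be reproduced on the "dict" form of a sememe tree shown earlier. The sketch below walks that nested dict and collects sememe labels down to a given layer, counting the sense node as layer 1 as noted above. Plain strings stand in for the Sense and Sememe objects, and `collect_sememes` is a hypothetical helper, not the package's implementation.

```python
# The sememe tree of sense No.244396 in the "dict" form shown earlier,
# with plain strings standing in for the Sense and Sememe objects.
TREE = {
    "role": "sense", "name": "No.244396|apple|苹果",
    "children": [{
        "role": "None", "name": "computer|电脑",
        "children": [
            {"role": "modifier", "name": "PatternValue|样式值",
             "children": [{"role": "CoEvent", "name": "able|能",
                           "children": [{"role": "scope", "name": "bring|携带",
                                         "children": [{"role": "patient", "name": "$"}]}]}]},
            {"role": "patient", "name": "SpeBrand|特定牌子"},
        ],
    }],
}

def collect_sememes(node, expanded_layer=-1, depth=1):
    """Collect sememe labels in pre-order down to `expanded_layer` (-1 means all layers)."""
    found = []
    # Skip the sense node itself and the "$" placeholder; everything else is a sememe.
    if node["role"] != "sense" and node["name"] != "$":
        found.append(node["name"])
    if expanded_layer == -1 or depth < expanded_layer:
        for child in node.get("children", []):
            found.extend(collect_sememes(child, expanded_layer, depth + 1))
    return found

print(collect_sememes(TREE, expanded_layer=2))
print(collect_sememes(TREE))
```

With `expanded_layer=2` only the root sememe of the sense is kept, which matches the merged result above containing one root sememe per distinct tree.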

### Get sememes via relations between sememes

There are various relations between sememes, as listed below. The package provides APIs to retrieve related sememes.
You can retrieve the relation between two sememes by their annotations.

In [18]:
all_sememe_relations = hownet_dict.get_all_sememe_relations()
print(all_sememe_relations)
# Get the relation between sememes. Please pay attention to the order of the sememes.
relations = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=False)
print(relations)
# You can get the triples in the form of (head_sememe, relation, tail_sememe)
triples = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=True)
print(triples)

['hypernym', 'hyponym', 'antonym', 'converse']
['hyponym']
[(FormValue|形状值, 'hyponym', round|圆)]


If you want the sememes that have a specific relation with a given sememe, you can do as below. Note that you can also get triples.

In [19]:
triples = hownet_dict.get_related_sememes('FormValue', relation='hyponym',return_triples=True)
print(triples)

[(FormValue|形状值, 'hyponym', unformed|不成形), (FormValue|形状值, 'hyponym', round|圆), (FormValue|形状值, 'hyponym', square|方), (FormValue|形状值, 'hyponym', formed|成形), (FormValue|形状值, 'hyponym', angular|角), (FormValue|形状值, 'hyponym', netlike|网)]
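Triples of this shape are easy to post-process. As a small sketch, the code below groups the triples from the output above by `(head, relation)` so that the tails can be looked up directly. Strings stand in for the Sememe objects, and `index_triples` is a hypothetical helper.

```python
from collections import defaultdict

# Triples copied from the output above, with strings standing in for Sememe objects.
TRIPLES = [
    ("FormValue|形状值", "hyponym", "unformed|不成形"),
    ("FormValue|形状值", "hyponym", "round|圆"),
    ("FormValue|形状值", "hyponym", "square|方"),
    ("FormValue|形状值", "hyponym", "formed|成形"),
    ("FormValue|形状值", "hyponym", "angular|角"),
    ("FormValue|形状值", "hyponym", "netlike|网"),
]

def index_triples(triples):
    """Map (head, relation) to the list of tails, preserving input order."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index

index = index_triples(TRIPLES)
tails = index[("FormValue|形状值", "hyponym")]
# The order of head and tail matters: these triples have no entry with round|圆 as the head.
reverse = index.get(("round|圆", "hyponym"), [])
print(len(tails), reverse)
```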


Besides, you can get related sememes directly by the sememe instance.

In [20]:
print("Take {} as example.".format(sememe1[0]))
print("The sememes that have the relation of hyponym with the sememe are:")
print(sememe1[0].get_related_sememes(relation='hyponym'))

Take FormValue|形状值 as example.
The sememes that have the relation of hyponym with the sememe are:
[unformed|不成形, angular|角, square|方, round|圆, formed|成形, netlike|网]


Moreover, you can get all the sememes that have any relation with a given sememe (regardless of the order of the two sememes).

In [21]:
print("The sememes that have relation with the sememe {} are:".format(sememe1[0]))
print(sememe1[0].get_related_sememes())

The sememes that have relation with the sememe FormValue|形状值 are:
[unformed|不成形, angular|角, round|圆, square|方, AppearanceValue|外观值, formed|成形, netlike|网]


## Advanced Feature #1: Word Similarity Calculation via Sememes

The following parts are mainly implemented by Jun Yan and integrated by Chenghao Yang. Our implementation is based on the paper:
> Jiangming Liu, Jinan Xu, Yujie Zhang. An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet. In Proceedings of IJCNLP

### Extra initialization
Because some additional files must be loaded for similarity calculation, the initialization overhead is larger than before. To begin with, you can initialize the hownet_dict object with the following code:

In [22]:
hownet_dict_anvanced = OpenHowNet.HowNetDict(init_sim=True)

Initializing OpenHowNet succeeded!
Initializing similarity calculation succeeded!


You can also postpone the initialization work of similarity calculation until use. The following code serves as an example.

In [23]:
hownet_dict.initialize_similarity_calculation()

Initializing similarity calculation succeeded!


### Get senses that have the same sememe list
You can retrieve the senses that have the same sememe list as a given sense. Note that the structural information is ignored.

In [24]:
print("Take sense {} as an example. Its sememes are: ".format(sense_example))
print(sense_example.get_sememe_list())
print("Senses that have the same sememe list are: ")
print(hownet_dict_anvanced.get_sense_synonyms(sense_example)[:10])

Take sense No.244396|apple|苹果 as an example. Its sememes are: 
[computer|电脑, bring|携带, SpeBrand|特定牌子, PatternValue|样式值, able|能]
Senses that have the same sememe list are: 
[No.16651|IBM|IBM, No.28840|Toshiba|东芝, No.65913|HUAWEI|华为, No.135177|Dell|戴尔, No.226574|Sony|索尼, No.235535|Lenovo|联想, No.244396|apple|苹果, No.244420|MAC|苹果电脑, No.244421|mac|苹果电脑]
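"Same sememe list, structure ignored" amounts to comparing sememe sets. The sketch below groups a few senses of 苹果 by the set of their sememes, using data abbreviated from the `get_sememes_by_word` output earlier in this demo. The helper names are hypothetical, and the sketch illustrates the idea only; it is not how `get_sense_synonyms` is implemented.

```python
from collections import defaultdict

PHONE = ["tool|用具", "bring|携带", "SpeBrand|特定牌子",
         "PatternValue|样式值", "communicate|交流", "able|能"]
TREE_FRUIT = ["reproduce|生殖", "fruit|水果", "tree|树"]

# Sense label -> sememe list, abbreviated from the earlier output.
SENSE_SEMEMES = {
    "No.244397|apple|苹果": ["fruit|水果"],
    "No.244398|IPHONE|苹果": PHONE,
    "No.244400|iphone|苹果": PHONE,
    "No.244401|apple|苹果": TREE_FRUIT,
    "No.244402|malus pumila|苹果": TREE_FRUIT,
}

def group_by_sememe_set(sense_sememes):
    """Group senses whose sememe lists contain the same sememes, ignoring order and structure."""
    groups = defaultdict(list)
    for sense, sememes in sense_sememes.items():
        groups[frozenset(sememes)].append(sense)
    return groups

def same_sememe_senses(sense, sense_sememes):
    """Return every sense sharing the sememe set of `sense`, including `sense` itself."""
    key = frozenset(sense_sememes[sense])
    return group_by_sememe_set(sense_sememes)[key]

print(same_sememe_senses("No.244398|IPHONE|苹果", SENSE_SEMEMES))
```

As in the output above, the queried sense appears in its own result list.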


### Get Top-K Nearest Words for the Given Word
Given a word, the function returns the Top-K nearest words in HowNet.
The HowNetDict first matches the senses in HowNet by the word and then gives the nearest words for each sense separately.
Note that you must set the language of the words, and that the calculation may take a long time.

In [25]:
hownet_dict_anvanced.get_nearest_words('苹果', language='zh',K=5)

{No.244396|apple|苹果: ['IBM', '东芝', '华为', '戴尔', '索尼'],
 No.244397|apple|苹果: ['丑橘', '乌梅', '五敛子', '凤梨', '刺梨'],
 No.244398|IPHONE|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244399|apple|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244400|iphone|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244401|apple|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树'],
 No.244402|malus pumila|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树'],
 No.244403|orchard apple tree|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树']}

You can get the similarity score as below:

In [26]:
hownet_dict_anvanced.get_nearest_words('苹果', language='zh',K=5,score=True)

{No.244396|apple|苹果: [('IBM', 1.0),
  ('东芝', 1.0),
  ('华为', 1.0),
  ('戴尔', 1.0),
  ('索尼', 1.0)],
 No.244397|apple|苹果: [('丑橘', 1.0),
  ('乌梅', 1.0),
  ('五敛子', 1.0),
  ('凤梨', 1.0),
  ('刺梨', 1.0)],
 No.244398|IPHONE|苹果: [('OPPO', 1.0),
  ('华为', 1.0),
  ('苹果', 1.0),
  ('智能手机', 0.9428571428571428),
  ('彩笔', 0.836074074074074)],
 No.244399|apple|苹果: [('OPPO', 1.0),
  ('华为', 1.0),
  ('苹果', 1.0),
  ('智能手机', 0.9428571428571428),
  ('彩笔', 0.836074074074074)],
 No.244400|iphone|苹果: [('OPPO', 1.0),
  ('华为', 1.0),
  ('苹果', 1.0),
  ('智能手机', 0.9428571428571428),
  ('彩笔', 0.836074074074074)],
 No.244401|apple|苹果: [('山梨', 1.0),
  ('山楂', 1.0),
  ('山楂树', 1.0),
  ('山里红', 1.0),
  ('开心果树', 1.0)],
 No.244402|malus pumila|苹果: [('山梨', 1.0),
  ('山楂', 1.0),
  ('山楂树', 1.0),
  ('山里红', 1.0),
  ('开心果树', 1.0)],
 No.244403|orchard apple tree|苹果: [('山梨', 1.0),
  ('山楂', 1.0),
  ('山楂树', 1.0),
  ('山里红', 1.0),
  ('开心果树', 1.0)]}

By setting `merge` to `True`, you can merge the word lists of the senses into one and get the Top-K words.

In [27]:
hownet_dict_anvanced.get_nearest_words('苹果', language='zh',K=5, merge=True)

['IBM', '东芝', '华为', '戴尔', '索尼']
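Merging per-sense results into one Top-K list can be pictured as follows. The sketch keeps the best score seen for each word and ranks the words by score. How the package breaks ties between equal scores is not stated here, so the first-seen ordering used below is an assumption, and `merge_top_k` is a hypothetical helper.

```python
# Per-sense (word, score) lists, copied from two of the senses in the scored output above.
PER_SENSE = {
    "No.244396|apple|苹果": [("IBM", 1.0), ("东芝", 1.0), ("华为", 1.0),
                           ("戴尔", 1.0), ("索尼", 1.0)],
    "No.244398|IPHONE|苹果": [("OPPO", 1.0), ("华为", 1.0), ("苹果", 1.0),
                            ("智能手机", 0.9428571428571428), ("彩笔", 0.836074074074074)],
}

def merge_top_k(per_sense, k):
    """Merge scored word lists, keep each word's best score, and return the k best words."""
    best = {}
    for scored in per_sense.values():
        for word, score in scored:
            if word not in best or score > best[word]:
                best[word] = score
    # sorted() is stable, so words with equal scores stay in first-seen order (an assumption).
    ranked = sorted(best.items(), key=lambda item: -item[1])
    return [word for word, _ in ranked[:k]]

print(merge_top_k(PER_SENSE, 5))
```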

Detailed explanation of params will be displayed in our documentation.

### Calculate the Similarity for the Given Two Words
If any of the given words does not exist in HowNet annotations, this function will return -1.

In [28]:
print('The similarity of 苹果 and 梨 is {}.'.format(hownet_dict_anvanced.calculate_word_similarity('苹果','梨')))

The similarity of 苹果 and 梨 is 1.0.
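Since -1 is a sentinel rather than a real score, it is easy to mix it into averages or rankings by accident. A thin wrapper such as the one sketched below can turn it into `None`. The wrapper name `similarity_or_none` and the stand-in function `fake_calculate` are hypothetical; only the "-1 means missing" contract comes from the text above.

```python
def similarity_or_none(calculate, word1, word2):
    """Call `calculate(word1, word2)` and map the -1 sentinel to None."""
    score = calculate(word1, word2)
    return None if score == -1 else score

# Stand-in for a real similarity function, for demonstration only.
def fake_calculate(word1, word2):
    known = {("苹果", "梨"): 1.0}
    return known.get((word1, word2), -1)

print(similarity_or_none(fake_calculate, "苹果", "梨"))
print(similarity_or_none(fake_calculate, "苹果", "not-a-word"))
```

With the dictionary created above, you would pass `hownet_dict_anvanced.calculate_word_similarity` as the `calculate` argument.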


## Advanced Feature #2: BabelNet Synset Search

### Extra initialization
Because more files must be loaded for the BabelNet dict, the initialization overhead is larger than before. You can initialize the hownet_dict object with the following code:

In [29]:
hownet_dict_anvanced = OpenHowNet.HowNetDict(init_babel=True)

Initializing OpenHowNet succeeded!
Initializing BabelNet Synset Dict succeeded!


Or you can use the following API to initialize the BabelNet dict.

In [30]:
hownet_dict.initialize_babelnet_dict()

Initializing BabelNet Synset Dict succeeded!


You can retrieve a synset instance and access the rich information in it using the following APIs.

In [31]:
syn_list = hownet_dict_anvanced.get_synset('黄色')
print("{} results are retrieved and take the first one as an example".format(len(syn_list)))
syn_example = syn_list[0]
print("Synset: {}".format(syn_example))
print("English synonyms: {}".format(syn_example.en_synonyms))
print("Chinese synonyms: {}".format(syn_example.zh_synonyms))
print("English glosses: {}".format(syn_example.en_glosses))
print("Chinese glosses: {}".format(syn_example.zh_glosses))

3 results are retrieved and take the first one as an example
Synset: bn:00081866n|yellow|黄色
English synonyms: ['yellow', 'yellowness', 'yellow_color', 'ffff00', 'color_yellow', 'rgb', 'dark_yellow', 'symbolism_of_yellow', 'yelow', 'yelloww', 'yellower', 'yellow_colour', '(255,_255,_0)', 'yellowy', 'royal_yellow', 'colour_yellow', 'y', 'yellowest']
Chinese synonyms: ['黄色', '黄', 'yellow', '黃色']
English glosses: ['Yellow color or pigment; the chromatic color resembling the hue of sunflowers or ripe lemons', 'Yellow is the color between orange and green on the spectrum of visible light.', 'In the CMYK color model', 'Color', 'Color evoked by light that stimulates both the long and medium-wavelength cone cells of the retina about equally, but does not significantly stimulate the short-wavelength cone cells.', 'The colour of gold, butter, or a lemon; the colour obtained by mixing green and red light, or by subtracting blue from white light.', 'Colour.']
Chinese glosses: ['黃色是由波長介於565至590奈米的光線

You can also get all the synsets and relations between synsets:

In [32]:
all_synsets = hownet_dict_anvanced.get_all_babel_synsets()
all_synset_relation = hownet_dict_anvanced.get_all_synset_relations()
print("There are {} synsets and {} relations".format(len(all_synsets),len(all_synset_relation)))

There are 15755 synsets and 403 relations


Also, you can search for the synsets that are related to a given synset.

In [33]:
related_synsets = syn_example.get_related_synsets()
print("There are {} synsets that have relation with the {}, they are: ".format(len(related_synsets), syn_example))
print(related_synsets[:10])

There are 756 synsets that have relation with the bn:00081866n|yellow|黄色, they are: 
[bn:00006911n|gold|金, bn:00003956n|andorra|安道尔, bn:00050669n|lenin|列宁, bn:00004108n|angiosperm|被子植物, bn:00055052n|mm|毫米, bn:00061778n|plague|鼠疫, bn:00043411n|wild_pansy|三色堇, bn:00010205n|bhutan|不丹, bn:00113968a|yellow|黄, bn:00011477n|blue|蓝色]


The package also supports looking up sememe lists via the BabelNet sememe annotations.
The API is similar to the HowNet APIs.

In [34]:
print(hownet_dict_anvanced.get_sememes_by_word_in_BabelNet('黄色'))
print(hownet_dict_anvanced.get_sememes_by_word_in_BabelNet('黄色',merge=True))

[{'synset': bn:00081866n|yellow|黄色, 'sememes': [yellow|黄]}, {'synset': bn:00113968a|yellow|黄, 'sememes': [yellow|黄]}, {'synset': bn:00101430a|dirty|淫秽的, 'sememes': [lascivious|淫, dirty|龊, despicable|卑劣, BadSocial|坏风气]}]
[despicable|卑劣, BadSocial|坏风气, dirty|龊, yellow|黄, lascivious|淫]
