# A simple demo for the HowNet Python Package

To begin, make sure you have **Python 3.X** installed. You also need to install the **anytree** dependency.

It is the only dependency you must install yourself, because the other Python packages used to build the **HowNet** Python Package are installed by default with Python 3.X.
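
A quick way to confirm these prerequisites before continuing is sketched below. It only uses the standard library; the helper name `missing_packages` is made up for this illustration and is not part of the HowNet package.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The demo targets Python 3.X, so stop early on anything older.
if sys.version_info < (3,):
    raise SystemExit("Python 3.X is required")

missing = missing_packages(["anytree"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required dependencies are available")
```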

Next, download the **HowNet** package. You can verify the installation by importing the **OpenHowNet** module, as in the following code:

In [1]:
import OpenHowNet
OpenHowNet.download()

openhownet_data.zip: 100%|██████████| 107433/107432.51 [00:11<00:00, 9259.61KB/s] 


After that we can build a **HowNet dict**:

In [2]:
hownet_dict = OpenHowNet.HowNetDict()

The preparation is now complete. Let's explore some important features of HowNetDict!

# Basic Usage of OpenHowNet Python Package

## Get word annotations in HowNet
<b> By default, the API searches for the target word in both the English and the Chinese annotations in HowNet, which causes significant search overhead. Note that if the target word does not exist in the HowNet annotations, this API simply returns an empty list. </b>

In [3]:
result_list = hownet_dict.get("苹果")
print("Number of results:",len(result_list))
print("Example result:",result_list[0])

Number of results: 6
Example result: {'Def': '{computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}', 'en_grammar': 'noun', 'ch_grammar': 'noun', 'No': '127151', 'syn': [{'id': '004024', 'text': 'IBM'}, {'id': '041684', 'text': '戴尔'}, {'id': '049006', 'text': '东芝'}, {'id': '106795', 'text': '联想'}, {'id': '156029', 'text': '索尼'}, {'id': '004203', 'text': 'iPad'}, {'id': '019457', 'text': '笔记本'}, {'id': '019458', 'text': '笔记本电脑'}, {'id': '019459', 'text': '笔记本电脑'}, {'id': '019460', 'text': '笔记本电脑'}, {'id': '019461', 'text': '笔记本电脑'}, {'id': '019463', 'text': '笔记簿电脑'}, {'id': '019464', 'text': '笔记簿电脑'}, {'id': '020567', 'text': '便携式电脑'}, {'id': '020568', 'text': '便携式计算机'}, {'id': '020569', 'text': '便携式计算机'}, {'id': '127224', 'text': '平板电脑'}, {'id': '127225', 'text': '平板电脑'}, {'id': '172264', 'text': '膝上型电脑'}, {'id': '172265', 'text': '膝上型电脑'}], 'ch_word': '苹果', 'en_word': 'apple'}


In [4]:
hownet_dict.get("test_for_non_exist_word")

[]

<b> You can visualize the retrieved HowNet structured annotations ("sememe trees") of the target word as follows: <br>
    (K=2 means that only 2 sememe trees are displayed) </b>

In [5]:
hownet_dict.visualize_sememe_trees("苹果", K=2)

Find 6 result(s)
Display #0 sememe tree
[sense]苹果
└── [None]computer|电脑
    ├── [modifier]PatternValue|样式值
    │   └── [CoEvent]able|能
    │       └── [scope]bring|携带
    │           └── [patient]$
    └── [patient]SpeBrand|特定牌子
Display #1 sememe tree
[sense]苹果
└── [None]fruit|水果
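
To make the layout of this output easier to follow, here is a minimal sketch that renders the first tree from a nested dict with `role`, `name` and `children` keys (the form returned with `structured=True` later in this demo). The function `render_tree` is a hypothetical helper written for this illustration; it is not how `visualize_sememe_trees` is implemented.

```python
# First sememe tree of "苹果", copied from the structured output in this demo.
tree = {
    "role": "sense",
    "name": "苹果",
    "children": [{
        "role": "None",
        "name": "computer|电脑",
        "children": [
            {"role": "modifier", "name": "PatternValue|样式值", "children": [
                {"role": "CoEvent", "name": "able|能", "children": [
                    {"role": "scope", "name": "bring|携带", "children": [
                        {"role": "patient", "name": "$"}]}]}]},
            {"role": "patient", "name": "SpeBrand|特定牌子"},
        ],
    }],
}


def render_tree(node, prefix="", is_last=True, is_root=True):
    """Return the text lines for a node and all of its descendants."""
    label = f"[{node['role']}]{node['name']}"
    if is_root:
        lines = [label]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + label]
        # Below a last child the column is blank; otherwise keep the vertical bar.
        child_prefix = prefix + ("    " if is_last else "│   ")
    children = node.get("children", [])
    for position, child in enumerate(children):
        last_child = position == len(children) - 1
        lines.extend(render_tree(child, child_prefix, last_child, False))
    return lines


print("\n".join(render_tree(tree)))
```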


<b> To make the search more efficient, you can specify the language of the target word as follows. </b>

In [6]:
result_list = hownet_dict.get("苹果", language="zh")
print("Monolingual result count:",len(result_list))
print("Example monolingual result:",result_list[0])
print("-------Mixed bilingual search test---------")
print("Mixed search result count:",len(hownet_dict.get("X")))
print("Chinese search result count:",len(hownet_dict.get("X",language="zh")))
print("English search result count:",len(hownet_dict.get("X",language="en")))

Monolingual result count: 6
Example monolingual result: {'Def': '{computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}', 'en_grammar': 'noun', 'ch_grammar': 'noun', 'No': '127151', 'syn': [{'id': '004024', 'text': 'IBM'}, {'id': '041684', 'text': '戴尔'}, {'id': '049006', 'text': '东芝'}, {'id': '106795', 'text': '联想'}, {'id': '156029', 'text': '索尼'}, {'id': '004203', 'text': 'iPad'}, {'id': '019457', 'text': '笔记本'}, {'id': '019458', 'text': '笔记本电脑'}, {'id': '019459', 'text': '笔记本电脑'}, {'id': '019460', 'text': '笔记本电脑'}, {'id': '019461', 'text': '笔记本电脑'}, {'id': '019463', 'text': '笔记簿电脑'}, {'id': '019464', 'text': '笔记簿电脑'}, {'id': '020567', 'text': '便携式电脑'}, {'id': '020568', 'text': '便携式计算机'}, {'id': '020569', 'text': '便携式计算机'}, {'id': '127224', 'text': '平板电脑'}, {'id': '127225', 'text': '平板电脑'}, {'id': '172264', 'text': '膝上型电脑'}, {'id': '172265', 'text': '膝上型电脑'}], 'ch_word': '苹果', 'en_word': 'apple'}
-------Mixed bilingual search test---------
Mixed search result count: 5
Chinese search result count: 3
English search result count: 2


In [7]:
hownet_dict.get("苹果", language="en")

[]
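
The behaviour above (an empty list for a missing word or a mismatched language, fewer lookups when a language is given) can be pictured with a toy two-index model. This is only an illustration under assumptions: the names `build_index` and `lookup` are hypothetical, the entries are cut down to three fields shown in this demo, and removing duplicates by `No` is a choice of this sketch rather than documented library behaviour.

```python
from collections import defaultdict

# Two entries reduced to fields that appear in this demo.
entries = [
    {"No": "127151", "ch_word": "苹果", "en_word": "apple"},
    {"No": "004024", "ch_word": "IBM", "en_word": "IBM"},
]


def build_index(entries, key):
    """Map each word found under `key` to the entries that contain it."""
    index = defaultdict(list)
    for entry in entries:
        index[entry[key]].append(entry)
    return index


zh_index = build_index(entries, "ch_word")
en_index = build_index(entries, "en_word")


def lookup(word, language=None):
    """Search one index when a language is given, both indexes otherwise."""
    indexes = []
    if language in (None, "zh"):
        indexes.append(zh_index)
    if language in (None, "en"):
        indexes.append(en_index)
    results, seen = [], set()
    for index in indexes:
        for entry in index.get(word, []):
            if entry["No"] not in seen:  # avoid listing one entry twice
                seen.add(entry["No"])
                results.append(entry)
    return results


print(lookup("苹果", language="zh"))
print(lookup("苹果", language="en"))
```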

## Get All Words annotated in HowNet

In [8]:
zh_word_list = hownet_dict.get_zh_words()
en_word_list = hownet_dict.get_en_words()

In [9]:
print(zh_word_list[:30])

['', '"', '#', '#号标签', '$', '%', "'", '(', ')', '*', '+', '-', '--', '...', '...出什么问题', '...底', '...底下', '...发生故障', '...发生了什么', '...何如', '...家里有几口人', '...检测呈阳性', '...检测呈阴性', '...来', '...内', '...为止', '...也同样使然', '...以来', '...以内', '...以上']


In [10]:
print(en_word_list[:30])

['A', 'An', 'Frenchmen', 'Frenchwomen', 'Ottomans', 'a', 'aardwolves', 'abaci', 'abandoned', 'abbreviated', 'abode', 'aboideaux', 'aboiteaux', 'abscissae', 'absorbed', 'acanthi', 'acari', 'accepted', 'acciaccature', 'acclaimed', 'accommodating', 'accompanied', 'accounting', 'accused', 'acetabula', 'acetified', 'aching', 'acicula', 'acini', 'acquired']
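
Both calls return plain lists, and the Chinese list shown above starts with an empty string. If you need repeated membership checks, one option is to build a set first. The sketch below works on short excerpts of the two outputs above; `to_vocabulary` is a hypothetical helper name.

```python
# Short excerpts of the two word lists printed above.
zh_word_list = ['', '"', '#', '#号标签', '$', '%']
en_word_list = ['A', 'An', 'Frenchmen', 'a', 'abandoned']


def to_vocabulary(words):
    """Drop empty entries and build a set for fast membership tests."""
    return {word for word in words if word.strip()}


zh_vocab = to_vocabulary(zh_word_list)
en_vocab = to_vocabulary(en_word_list)

print('#号标签' in zh_vocab)
print('abandoned' in en_vocab)
```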


## Get Flattened Sememe Trees for a certain word or all words in HowNet

<b> Caution: the parameters "lang", "merge" and "expanded_layer" only take effect when "structured=False". The main reason is that there are multiple ways to interpret these parameters when dealing with structured data, so we leave that choice to the end user. The next section shows how to use the structured data.

   See our documentation for a detailed explanation of the parameters.</b>

### Get the full merged sememe list from multi-sense words

In [11]:
hownet_dict.get_sememes_by_word("苹果",structured=False,lang="zh",merge=True)

{'交流', '携带', '树', '样式值', '水果', '特定牌子', '生殖', '用具', '电脑', '能'}

In [12]:
hownet_dict.get_sememes_by_word("apple",structured=False,lang="en",merge=True)

{'$',
 'PatternValue',
 'SpeBrand',
 'able',
 'bring',
 'communicate',
 'computer',
 'fruit',
 'reproduce',
 'tool',
 'tree'}

**Even if the specified language does not match the language of the target word, the API still works. It returns all word entries in the language you specified.**

In [13]:
hownet_dict.get_sememes_by_word("苹果",structured=False,lang="en",merge=True)

{'apple': {'$',
  'PatternValue',
  'SpeBrand',
  'able',
  'bring',
  'communicate',
  'computer',
  'fruit',
  'reproduce',
  'tool',
  'tree'},
 'malus pumila': {'fruit', 'reproduce', 'tree'},
 'orchard apple tree': {'fruit', 'reproduce', 'tree'}}

**Note that, in the latest version, if there is only one word entry, the API simply returns the set of sememes for convenience. See Out[11] for an example.**

<b> You can specify the number of expanded layers as follows:</b>

In [14]:
hownet_dict.get_sememes_by_word("苹果",structured=False,merge=True,expanded_layer=1)

set()

<b>You can also get the flattened sememe trees for all words, and specify the number of expanded layers at the same time:</b>

In [15]:
hownet_dict.get_sememes_by_word("*",structured=False,merge=True)

{'标点'}

<b> If you would like to see the sememe lists for the different senses of a particular word in HowNet, just set the param "merge" to False.</b>

In [16]:
hownet_dict.get_sememes_by_word("苹果",structured=False,lang="zh",merge=False)

[{'word': '苹果', 'sememes': {'携带', '样式值', '特定牌子', '电脑', '能'}},
 {'word': '苹果', 'sememes': {'水果'}},
 {'word': '苹果', 'sememes': {'交流', '携带', '样式值', '特定牌子', '用具', '能'}},
 {'word': '苹果', 'sememes': {'树', '水果', '生殖'}},
 {'word': '苹果', 'sememes': {'树', '水果', '生殖'}},
 {'word': '苹果', 'sememes': {'树', '水果', '生殖'}}]

In [17]:
hownet_dict.get_sememes_by_word("apple",structured=False,lang="en",merge=False)

[{'word': 'apple',
  'sememes': {'$', 'PatternValue', 'SpeBrand', 'able', 'bring', 'computer'}},
 {'word': 'apple', 'sememes': {'fruit'}},
 {'word': 'apple',
  'sememes': {'$',
   'PatternValue',
   'SpeBrand',
   'able',
   'bring',
   'communicate',
   'tool'}},
 {'word': 'apple', 'sememes': {'fruit', 'reproduce', 'tree'}},
 {'word': 'apple',
  'sememes': {'$',
   'PatternValue',
   'SpeBrand',
   'able',
   'bring',
   'communicate',
   'tool'}},
 {'word': 'apple', 'sememes': {'fruit', 'reproduce', 'tree'}},
 {'word': 'apple', 'sememes': {'fruit'}},
 {'word': 'apple', 'sememes': {'fruit'}}]
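
The relation between the per-sense lists and the merged result can be sketched as a union grouped by word. The helper `merge_senses` below is written for this illustration and is not the library's code; it follows the behaviour described above of returning the bare set when only one word entry is present. The input is the set of distinct senses from the "apple" output above.

```python
# Distinct senses of "apple" taken from the merge=False output above.
per_sense = [
    {'word': 'apple',
     'sememes': {'$', 'PatternValue', 'SpeBrand', 'able', 'bring', 'computer'}},
    {'word': 'apple', 'sememes': {'fruit'}},
    {'word': 'apple',
     'sememes': {'$', 'PatternValue', 'SpeBrand', 'able', 'bring',
                 'communicate', 'tool'}},
    {'word': 'apple', 'sememes': {'fruit', 'reproduce', 'tree'}},
]


def merge_senses(per_sense):
    """Union the sememe sets of all senses, grouped by word."""
    merged = {}
    for sense in per_sense:
        merged.setdefault(sense['word'], set()).update(sense['sememes'])
    if len(merged) == 1:
        # A single word entry: return just its set of sememes.
        return next(iter(merged.values()))
    return merged


print(sorted(merge_senses(per_sense)))
```

For this input the union contains the same eleven sememes as the merged "apple" result shown earlier.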

## Get Structured Sememe Trees for certain words in HowNet

In [18]:
hownet_dict.get_sememes_by_word("苹果",structured=True)[0]["tree"]

{'role': 'sense',
 'name': '苹果',
 'children': [{'role': 'None',
   'name': 'computer|电脑',
   'children': [{'role': 'modifier',
     'name': 'PatternValue|样式值',
     'children': [{'role': 'CoEvent',
       'name': 'able|能',
       'children': [{'role': 'scope',
         'name': 'bring|携带',
         'children': [{'role': 'patient', 'name': '$'}]}]}]},
    {'role': 'patient', 'name': 'SpeBrand|特定牌子'}]}]}
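
Since "lang" and "merge" are not applied to structured results, you may want to flatten a tree yourself. The following is one possible sketch, not the library's implementation: `iter_nodes` and `flatten` are hypothetical helpers, and the rule for nodes without a `|` (such as `$`) is an assumption chosen so that the result matches the first entries of the flattened outputs shown earlier.

```python
# The structured tree shown above for the first sense of "苹果".
tree = {
    'role': 'sense',
    'name': '苹果',
    'children': [{
        'role': 'None',
        'name': 'computer|电脑',
        'children': [
            {'role': 'modifier', 'name': 'PatternValue|样式值', 'children': [
                {'role': 'CoEvent', 'name': 'able|能', 'children': [
                    {'role': 'scope', 'name': 'bring|携带', 'children': [
                        {'role': 'patient', 'name': '$'}]}]}]},
            {'role': 'patient', 'name': 'SpeBrand|特定牌子'},
        ],
    }],
}


def iter_nodes(node):
    """Yield every node below the given node, depth first."""
    for child in node.get('children', []):
        yield child
        yield from iter_nodes(child)


def flatten(tree, lang='zh'):
    """Collect sememe names in one language from a structured tree."""
    names = set()
    for node in iter_nodes(tree):
        english, separator, chinese = node['name'].partition('|')
        if lang == 'en':
            names.add(english)   # names without '|' are kept as they are
        elif separator:
            names.add(chinese)   # names without '|' have no Chinese part
    return names


print(sorted(flatten(tree, lang='en')))
```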

<b> Two ways to see the corresponding annotation data </b>

In [19]:
hownet_dict.get_sememes_by_word("苹果",structured=True)[0]["word"]

{'Def': '{computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}',
 'en_grammar': 'noun',
 'ch_grammar': 'noun',
 'No': '127151',
 'syn': [{'id': '004024', 'text': 'IBM'},
  {'id': '041684', 'text': '戴尔'},
  {'id': '049006', 'text': '东芝'},
  {'id': '106795', 'text': '联想'},
  {'id': '156029', 'text': '索尼'},
  {'id': '004203', 'text': 'iPad'},
  {'id': '019457', 'text': '笔记本'},
  {'id': '019458', 'text': '笔记本电脑'},
  {'id': '019459', 'text': '笔记本电脑'},
  {'id': '019460', 'text': '笔记本电脑'},
  {'id': '019461', 'text': '笔记本电脑'},
  {'id': '019463', 'text': '笔记簿电脑'},
  {'id': '019464', 'text': '笔记簿电脑'},
  {'id': '020567', 'text': '便携式电脑'},
  {'id': '020568', 'text': '便携式计算机'},
  {'id': '020569', 'text': '便携式计算机'},
  {'id': '127224', 'text': '平板电脑'},
  {'id': '127225', 'text': '平板电脑'},
  {'id': '172264', 'text': '膝上型电脑'},
  {'id': '172265', 'text': '膝上型电脑'}],
 'ch_word': '苹果',
 'en_word': 'apple'}

In [20]:
hownet_dict["苹果"][0]

{'Def': '{computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}',
 'en_grammar': 'noun',
 'ch_grammar': 'noun',
 'No': '127151',
 'syn': [{'id': '004024', 'text': 'IBM'},
  {'id': '041684', 'text': '戴尔'},
  {'id': '049006', 'text': '东芝'},
  {'id': '106795', 'text': '联想'},
  {'id': '156029', 'text': '索尼'},
  {'id': '004203', 'text': 'iPad'},
  {'id': '019457', 'text': '笔记本'},
  {'id': '019458', 'text': '笔记本电脑'},
  {'id': '019459', 'text': '笔记本电脑'},
  {'id': '019460', 'text': '笔记本电脑'},
  {'id': '019461', 'text': '笔记本电脑'},
  {'id': '019463', 'text': '笔记簿电脑'},
  {'id': '019464', 'text': '笔记簿电脑'},
  {'id': '020567', 'text': '便携式电脑'},
  {'id': '020568', 'text': '便携式计算机'},
  {'id': '020569', 'text': '便携式计算机'},
  {'id': '127224', 'text': '平板电脑'},
  {'id': '127225', 'text': '平板电脑'},
  {'id': '172264', 'text': '膝上型电脑'},
  {'id': '172265', 'text': '膝上型电脑'}],
 'ch_word': '苹果',
 'en_word': 'apple'}

## Get the static synonyms of a certain word
<b>The similarity metrics are based on HowNet.</b>

In [21]:
hownet_dict["苹果"][0]["syn"]

[{'id': '004024', 'text': 'IBM'},
 {'id': '041684', 'text': '戴尔'},
 {'id': '049006', 'text': '东芝'},
 {'id': '106795', 'text': '联想'},
 {'id': '156029', 'text': '索尼'},
 {'id': '004203', 'text': 'iPad'},
 {'id': '019457', 'text': '笔记本'},
 {'id': '019458', 'text': '笔记本电脑'},
 {'id': '019459', 'text': '笔记本电脑'},
 {'id': '019460', 'text': '笔记本电脑'},
 {'id': '019461', 'text': '笔记本电脑'},
 {'id': '019463', 'text': '笔记簿电脑'},
 {'id': '019464', 'text': '笔记簿电脑'},
 {'id': '020567', 'text': '便携式电脑'},
 {'id': '020568', 'text': '便携式计算机'},
 {'id': '020569', 'text': '便携式计算机'},
 {'id': '127224', 'text': '平板电脑'},
 {'id': '127225', 'text': '平板电脑'},
 {'id': '172264', 'text': '膝上型电脑'},
 {'id': '172265', 'text': '膝上型电脑'}]

## Access a word by ID

In [22]:
hownet_dict["004024"]

[{'Def': '{computer|电脑:modifier={PatternValue|样式值:CoEvent={able|能:scope={bring|携带:patient={$}}}}{SpeBrand|特定牌子}}',
  'en_grammar': 'noun',
  'ch_grammar': 'noun',
  'No': '004024',
  'syn': [{'id': '041684', 'text': '戴尔'},
   {'id': '049006', 'text': '东芝'},
   {'id': '106795', 'text': '联想'},
   {'id': '127151', 'text': '苹果'},
   {'id': '156029', 'text': '索尼'},
   {'id': '004203', 'text': 'iPad'},
   {'id': '019457', 'text': '笔记本'},
   {'id': '019458', 'text': '笔记本电脑'},
   {'id': '019459', 'text': '笔记本电脑'},
   {'id': '019460', 'text': '笔记本电脑'},
   {'id': '019461', 'text': '笔记本电脑'},
   {'id': '019463', 'text': '笔记簿电脑'},
   {'id': '019464', 'text': '笔记簿电脑'},
   {'id': '020567', 'text': '便携式电脑'},
   {'id': '020568', 'text': '便携式计算机'},
   {'id': '020569', 'text': '便携式计算机'},
   {'id': '127224', 'text': '平板电脑'},
   {'id': '127225', 'text': '平板电脑'},
   {'id': '172264', 'text': '膝上型电脑'},
   {'id': '172265', 'text': '膝上型电脑'}],
  'ch_word': 'IBM',
  'en_word': 'IBM'}]

## Get all sememes

In [23]:
hownet_dict.get_all_sememes()

['模仿',
 '保护',
 '诱人性',
 '问候',
 '数量关系',
 '仅',
 '经济',
 '捣乱',
 '弊',
 '劝说',
 '宁乱',
 '得罪',
 '定期',
 '能力值',
 '辣',
 '漂',
 '友善性值',
 '泰国',
 '信息载体',
 '之间',
 '特殊性',
 '爱尔兰',
 '灰',
 '幅度',
 '不当',
 '程度值',
 '不稳',
 '表示坏情感',
 '构助',
 '上海',
 '发射',
 '善待',
 '注意',
 '仪态',
 '男',
 '怯',
 '卢旺达',
 '中非',
 '遭受',
 '首次',
 '制造',
 '毛里求斯',
 '扎',
 '规矩',
 '分析',
 '状况',
 '非',
 '变形状',
 '无礼',
 '有序',
 '浓',
 '拔出',
 '茂',
 '疼痛',
 '未成熟',
 '比较',
 '坐蹲',
 '摩洛哥',
 '暗',
 '废',
 '精',
 '乌干达',
 '足',
 '朝向',
 '采集',
 '全面',
 '使脏',
 '巴布亚',
 '平均',
 '搀扶',
 '不公正',
 '推荐',
 '积极',
 '安哥拉',
 '赠',
 '叙利亚',
 '使继续',
 '逻辑性',
 '印度尼西亚',
 '支撑',
 '托住',
 '示喜',
 '无望',
 '甜',
 '长时间',
 '不说',
 '准确性',
 '叮',
 '马达加斯加',
 '借入',
 '变空间位置',
 '扎伊尔',
 '文字',
 '无',
 '澳门',
 '柔',
 '好情',
 '低植',
 '气',
 '拉脱维亚',
 '抢',
 '事务',
 '贱',
 '埋入',
 '晚期',
 '赞比亚',
 '立场',
 '格林纳达',
 '利弊',
 '处理',
 '排泄',
 '有关',
 '清晰度',
 '超',
 '地方',
 '组织',
 '不重要',
 '臭名',
 '暴',
 '印度',
 '免除',
 '堵塞',
 '文莱',
 '形成',
 '独自',
 '房间',
 '喜悦',
 '穿戴',
 '胖瘦',
 '标点',
 '不讲理',
 '利用',
 '不适',
 '长度值',
 '丹麦',
 '体格',
 '犹豫',
 '降级',
 '集聚',
 '荣'

### Get Relationship Between Two Sememes

The output could be hypernym, hyponym, antonym or converse.

In [24]:
hownet_dict.get_sememe_relation("音量值", "尖声")

'hyponym'

In [25]:
hownet_dict.get_sememe_relation("音量值", "shrill")

'hyponym'

### Get sememes having a certain relation with the input sememe
The sememe you input can be in any language, but the relation must be in lowercase English. You can specify the language of the result; by default it is Chinese.

In [26]:
hownet_dict.get_sememe_via_relation("音量值", "hyponym")

['高声', '低声', '尖声', '沙哑', '无声', '有声']

In [27]:
hownet_dict.get_sememe_via_relation("音量值", "hyponym", lang="en")

['loud', 'LowVoice', 'shrill', 'hoarse', 'silent', 'talking']
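
To make the direction of the relation concrete, here is a toy lookup built only from the hyponym list above. The names `HYPONYMS`, `sememe_relation` and `sememes_via_relation` are hypothetical, and treating "hypernym" as the exact inverse of "hyponym" is an assumption of this sketch, not something the outputs above confirm.

```python
# Hyponyms of "音量值" as listed in the output above.
HYPONYMS = {"音量值": ["高声", "低声", "尖声", "沙哑", "无声", "有声"]}


def sememes_via_relation(sememe, relation):
    """Return the sememes that stand in `relation` to `sememe`."""
    if relation == "hyponym":
        return list(HYPONYMS.get(sememe, []))
    if relation == "hypernym":
        # Assumed inverse: every parent that lists `sememe` as a hyponym.
        return [parent for parent, children in HYPONYMS.items()
                if sememe in children]
    return []


def sememe_relation(first, second):
    """Describe `second` relative to `first`, or return None if unrelated."""
    if second in HYPONYMS.get(first, []):
        return "hyponym"
    if first in HYPONYMS.get(second, []):
        return "hypernym"
    return None


print(sememe_relation("音量值", "尖声"))
print(sememes_via_relation("尖声", "hypernym"))
```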

# Advanced Feature #1: Word Similarity Calculation via Sememes
<b>The following parts are mainly implemented by Jun Yan and integrated by Chenghao Yang. Our implementation is based on the paper: </b>

  >Jiangming Liu, Jinan Xu, Yujie Zhang. An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet. In Proceedings of IJCNLP

## Extra Initialization

<b> Similarity calculation requires some extra files to be loaded, so the initialization overhead is larger than before. To begin with, you can initialize the hownet_dict object as in the following code:</b>

In [29]:
hownet_dict_advanced = OpenHowNet.HowNetDict(use_sim=True)

<b>You can also postpone the initialization for similarity calculation until it is needed. The following code serves as an example; the return value indicates whether the extra initialization succeeded.</b>

In [30]:
hownet_dict.initialize_sememe_similarity_calculation()

True

## Get Top-K Nearest Words for the Given Word
<b>If the given word does not exist in HowNet annotations, this function will return an empty list.</b>

In [31]:
query_result = hownet_dict_advanced.get_nearest_words_via_sememes("苹果",20)
example = query_result[0]
print("word_name:",example["word"])
print("id:",example["id"])
print("synset and corresponding word&id&score:")
print(example["synset"])

word_name: 苹果
id: 127151
synset and corresponding word&id&score:
[{'id': 4024, 'word': 'IBM', 'score': 1.0}, {'id': 41684, 'word': '戴尔', 'score': 1.0}, {'id': 49006, 'word': '东芝', 'score': 1.0}, {'id': 106795, 'word': '联想', 'score': 1.0}, {'id': 156029, 'word': '索尼', 'score': 1.0}, {'id': 4203, 'word': 'iPad', 'score': 0.865}, {'id': 19457, 'word': '笔记本', 'score': 0.865}, {'id': 19458, 'word': '笔记本电脑', 'score': 0.865}, {'id': 19459, 'word': '笔记本电脑', 'score': 0.865}, {'id': 19460, 'word': '笔记本电脑', 'score': 0.865}, {'id': 19461, 'word': '笔记本电脑', 'score': 0.865}, {'id': 19463, 'word': '笔记簿电脑', 'score': 0.865}, {'id': 19464, 'word': '笔记簿电脑', 'score': 0.865}, {'id': 20567, 'word': '便携式电脑', 'score': 0.865}, {'id': 20568, 'word': '便携式计算机', 'score': 0.865}, {'id': 20569, 'word': '便携式计算机', 'score': 0.865}, {'id': 127224, 'word': '平板电脑', 'score': 0.865}, {'id': 127225, 'word': '平板电脑', 'score': 0.865}, {'id': 172264, 'word': '膝上型电脑', 'score': 0.865}, {'id': 172265, 'word': '膝上型电脑', 'score': 0.865}

In [32]:
hownet_dict_advanced.get_nearest_words_via_sememes("苹果",20)

[{'id': 127151,
  'word': '苹果',
  'synset': [{'id': 4024, 'word': 'IBM', 'score': 1.0},
   {'id': 41684, 'word': '戴尔', 'score': 1.0},
   {'id': 49006, 'word': '东芝', 'score': 1.0},
   {'id': 106795, 'word': '联想', 'score': 1.0},
   {'id': 156029, 'word': '索尼', 'score': 1.0},
   {'id': 4203, 'word': 'iPad', 'score': 0.865},
   {'id': 19457, 'word': '笔记本', 'score': 0.865},
   {'id': 19458, 'word': '笔记本电脑', 'score': 0.865},
   {'id': 19459, 'word': '笔记本电脑', 'score': 0.865},
   {'id': 19460, 'word': '笔记本电脑', 'score': 0.865},
   {'id': 19461, 'word': '笔记本电脑', 'score': 0.865},
   {'id': 19463, 'word': '笔记簿电脑', 'score': 0.865},
   {'id': 19464, 'word': '笔记簿电脑', 'score': 0.865},
   {'id': 20567, 'word': '便携式电脑', 'score': 0.865},
   {'id': 20568, 'word': '便携式计算机', 'score': 0.865},
   {'id': 20569, 'word': '便携式计算机', 'score': 0.865},
   {'id': 127224, 'word': '平板电脑', 'score': 0.865},
   {'id': 127225, 'word': '平板电脑', 'score': 0.865},
   {'id': 172264, 'word': '膝上型电脑', 'score': 0.865},
   {'id': 172
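
The `synset` entries carry integer ids, while the annotation data earlier in this demo uses zero-padded strings such as '004024'. A small post-processing sketch is shown below: it keeps the entries above a score threshold and formats the ids as strings. The helper `top_matches` is hypothetical, and the six-digit padding is an assumption based only on the ids visible in this demo.

```python
# A few entries copied from the synset output above.
synset = [
    {'id': 4024, 'word': 'IBM', 'score': 1.0},
    {'id': 41684, 'word': '戴尔', 'score': 1.0},
    {'id': 4203, 'word': 'iPad', 'score': 0.865},
    {'id': 19457, 'word': '笔记本', 'score': 0.865},
]


def top_matches(synset, min_score):
    """Return (id, word) pairs whose score is at least `min_score`."""
    return [
        (str(item['id']).zfill(6), item['word'])  # assumed six-digit ids
        for item in synset
        if item['score'] >= min_score
    ]


print(top_matches(synset, 1.0))
```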

## Calculate the Similarity for the Given Two Words
<b>If any of the given words does not exist in HowNet annotations, this function will return 0.</b>

In [33]:
hownet_dict_advanced.calculate_word_similarity("苹果","梨")

1.0
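
For intuition about sememe-based similarity, the sketch below uses a much simpler stand-in: Jaccard overlap between sememe sets, taking the best-matching pair of senses. This is explicitly not the hybrid hierarchical method of the paper cited above and will not reproduce the library's scores. The function names are hypothetical; the sememe sets are senses of "苹果" from the flattened output earlier in this demo.

```python
def jaccard(first, second):
    """Jaccard overlap of two sememe sets (0.0 when both are empty)."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def word_similarity(senses_a, senses_b):
    """Best Jaccard score over all sense pairs; 0.0 if a word has no senses."""
    if not senses_a or not senses_b:
        return 0.0
    return max(jaccard(a, b) for a in senses_a for b in senses_b)


# Two senses of "苹果" taken from the merge=False output earlier in this demo.
computer_sense = {'携带', '样式值', '特定牌子', '电脑', '能'}
tool_sense = {'交流', '携带', '样式值', '特定牌子', '用具', '能'}

print(round(jaccard(computer_sense, tool_sense), 3))
print(word_similarity([computer_sense], []))
```

Returning 0.0 for a word without senses mirrors the behaviour described above for words that are missing from the HowNet annotations.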