
# Dictionary (a.k.a. hash)

A list is a collection of data items arranged in order, and it can also be viewed as a function that associates a number (the index) with a value.

In [28]:
a = [3,1,4,1,5,9]
print(a[1]) # 1
print(a[3]) # 1

1
1


The value can be of any data type, such as a string or a list, but the index must be an integer that refers to a position in the list.

In contrast, a dictionary lets you use other types, such as a string, as the index.

In [29]:
a = dict()
a["Matsumoto"] = "vitroid@gmail.com"
a["Tanaka"]    = "htanakaa@okayama-u.ac.jp"
name = input("Name?")
print(a[name])

KeyError: ''

Dictionary keys (the content inside the brackets) can be of many types, such as a real number or a string, but there is a constraint: a key must be hashable, which in practice means immutable. A list cannot be used as a dictionary key because it is mutable (its contents can be changed later). A tuple is immutable, so it can be used as a dictionary key, provided its elements are themselves hashable.

In [30]:
a = dict()
a["四"] = "four" # the key is "四" and the value is "four"
a[2,3] = 5       # same as a[(2,3)]; the key is a tuple
a[1,2,3] = 6
print(a)

{'四': 'four', (2, 3): 5, (1, 2, 3): 6}


In [31]:
a[[2,3]] = 5     # Causes an error because a list is mutable and therefore not hashable.

TypeError: unhashable type: 'list'
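
If the data you want to use as a key happens to arrive as a list, one possible workaround is to convert it to a tuple first. The following is only a minimal sketch; the names `point` and `table` are made up for illustration.

```python
# A list cannot be a key, but a tuple built from it can.
point = [2, 3]            # hypothetical data that arrives as a list
table = dict()
table[tuple(point)] = 5   # tuple(point) is (2, 3), which is hashable
print(table)

# Keys are compared by value, so an equal tuple finds the same entry.
print(table[2, 3])
```

The conversion copies the list, so changing `point` afterwards does not affect the key already stored in the dictionary.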

A dictionary can also be thought of as a database. In the example above, "Matsumoto" is called the key and "vitroid@gmail.com" is called the value. With a dictionary, a phone book is easy to make.

The program above raises an error if you enter a name that does not exist, so we use the `in` operator to check whether the dictionary has a key for that name.

> In Python, `in` is used in several senses, which makes it a little confusing. Inside an `if` statement, as here, `in` is literally an operator that tests whether an element is contained in a collection (a set, string, list, and so on). In a `for` statement, on the other hand, `in` specifies the collection to iterate over and is not an operator. It is a rather unfortunate case of the ambiguity of the English word "in" carrying over directly into a programming language.

In [32]:
a = dict()
a["Matsumoto"] = "vitroid@gmail.com"
a["Tanaka"]    = "htanakaa@okayama-u.ac.jp"
while True:
    name = input("Name?")
    if name == "":
        print("Bye.")
        break
    if name in a:
        print(a[name])
    else:
        print("Sorry, the name '{0}' is not found in the directory.".format(name))
        email = input("Input his/her email address:")
        a[name] = email


Bye.
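
When you only need to look a name up, without prompting for a new entry, the `get()` method of a dict is a compact alternative to testing with `in`: it returns a default value instead of raising `KeyError`. A small sketch follows; the variable name `directory`, the missing name `Suzuki`, and the default text are made up for illustration.

```python
directory = {
    "Matsumoto": "vitroid@gmail.com",
    "Tanaka": "htanakaa@okayama-u.ac.jp",
}

# get() returns the value if the key exists, otherwise the second argument.
print(directory.get("Matsumoto", "not found"))
print(directory.get("Suzuki", "not found"))   # "Suzuki" is a made-up missing key
```

Unlike the loop above, `get()` never adds anything to the dictionary; it only reads from it.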


## Initialize a dict

There are several ways to write a dictionary that already contains some data. Here, we create a dictionary that maps uppercase letters to lowercase letters.

### 1. The simplest

In [33]:
D = dict()
D["A"] = "a"
D["B"] = "b"
# ...

D

{'A': 'a', 'B': 'b'}

### 2. Using curly brackets `{...}`

In [34]:
D = {"A": "a", "B": "b"}
D

{'A': 'a', 'B': 'b'}

### 3. Using dict()


In [35]:
D = dict(A="a", B="b")
D

{'A': 'a', 'B': 'b'}

### 4. Combining two lists

In [36]:
caps = ["A", "B"]
small = ["a", "b"]
# zip() pairs up the items of two lists.
for c, s in zip(caps, small):
    print(c,s)


A a
B b


In [37]:
D = dict(zip(caps, small))
D

{'A': 'a', 'B': 'b'}

In [38]:
# Treating a string as a sequence of characters, you can also write it like this.
D = dict(zip("ABCDEFG", "abcdefg"))
D

{'A': 'a', 'B': 'b', 'C': 'c', 'D': 'd', 'E': 'e', 'F': 'f', 'G': 'g'}

### 5. Combining two dicts

In [39]:
A = {"A": "a"}
B = {"B": "b"}
D = A | B
D

{'A': 'a', 'B': 'b'}
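
The `|` operator for dicts is available from Python 3.9 onward. On older versions, the sketch below shows two alternatives that should give the same result. The extra key `"X"` is made up to show what happens when both dicts contain the same key.

```python
A = {"A": "a", "X": "from A"}
B = {"B": "b", "X": "from B"}

# Unpacking both dicts into a new literal also works on Python versions before 3.9.
D = {**A, **B}
print(D)

# update() merges in place; here a copy of A is modified so that A itself is kept.
E = dict(A)
E.update(B)
print(E == D)
```

In both forms, as with `A | B`, the value from the right-hand dict wins when a key appears in both.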

### Exercise

Use any of the forms above to create a dictionary that maps every uppercase letter of the alphabet to its lowercase letter.

A dictionary can also be used in place of an array (list). In particular, for a sparse array in which most of the elements are zero, a dictionary wastes far less memory and is faster to process.

### Using a list

In [40]:
import time
now = time.time()                   # time.time() returns the current time in seconds as a float.

a = [0 for i in range(10000000)]    # A huge list of 10 million zeros; needs roughly 100 MB of memory.
a[0] = 1                            # Set just two items to one.
a[9999999] = 1

for i in range(10000000):           # Find the items whose value is one: a loop of 10 million iterations.
    if a[i] == 1:
        print(i)

print(time.time()-now," sec")

0
9999999
0.6838510036468506  sec


### Using a dict

In [41]:
import time
now = time.time()                   # time.time() returns the current time in seconds as a float.

a = dict()
a[-9999999] = 1                     # Set just two items to one.
a[9999999.9] = 1
for i in a:                         # Iterate over the keys of a; finding the items is trivial.
    print(i)

print(time.time()-now," sec")

-9999999
9999999.9
0.03784322738647461  sec


Dictionary keys can be negative integers or real numbers. Unused elements do not have to be stored as zeros, so both memory use and processing are kept to a minimum.
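
The difference in memory can be checked roughly with `sys.getsizeof()`. This is only a sketch: the size `n` is an arbitrary, smaller number chosen to keep the example quick, and `getsizeof()` measures the container itself rather than the objects it refers to, so the figures are indicative only.

```python
import sys

n = 100_000                      # smaller than the 10 million above, to keep it quick
dense = [0] * n                  # every element is stored, even the zeros
dense[0] = 1
dense[n - 1] = 1

sparse = {0: 1, n - 1: 1}        # only the non-zero elements are stored

# getsizeof() reports the size of the container itself, not of the items it refers to.
print(sys.getsizeof(dense), "bytes for the list")
print(sys.getsizeof(sparse), "bytes for the dict")

# A missing key stands for zero; get() supplies that default.
print(sparse.get(5, 0))
```

The price of the dict version is that a reader of the code must remember that a missing key means zero, which is why `get(key, 0)` is used for lookups.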

## Exercise

1. Create a new dictionary `D` with the uppercase letters of the alphabet as keys and the lowercase letters as values.
2. Create a new dictionary `E` whose keys and values are both lowercase letters.
3. Merge `D` and `E` into a new dict `F`.
4. Using the dictionary `F`, rewrite the mixed-case string `HelloWorld` in all lowercase.


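## Examples
Let us count the kinds of characters that appear in "Romeo and Juliet" and how many times each one occurs. The original text is available from Project Gutenberg at the following URL.
`https://www.gutenberg.org/ebooks/1513.txt.utf-8`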

In [42]:
import requests as req
url = "https://www.gutenberg.org/ebooks/1513.txt.utf-8"
RJ = req.get(url).text #, encoding='utf-8').text
print(RJ)


The Project Gutenberg eBook of Romeo and Juliet, by William Shakespeare

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United States, you
will have to check the laws of the country where you are located before
using this eBook.

Title: Romeo and Juliet

Author: William Shakespeare

Release Date: November, 1998 [eBook #1513]
[Most recently updated: May 11, 2022]

Language: English


Produced by: the PG Shakespeare Team, a team of about twenty Project Gutenberg volunteers.

*** START OF THE PROJECT GUTENBERG EBOOK ROMEO AND JULIET ***




THE TRAGEDY OF ROMEO AND JULIET



by William Shakespeare


Contents

THE PROLOGUE.

ACT I
Scene I. A public place.
Scene II. A Street.
Scene III. Room in Capulet’s Hous

The first `for` loop processes the text one character at a time.

In [43]:
# Prepare an empty dict.
lettercount = dict()

# letter takes each character of RJ in turn.
for letter in RJ:
    if letter not in lettercount:
        # Initialize the item if the character is not yet in the dict.
        lettercount[letter] = 0
    # Count up.
    lettercount[letter] += 1

# Print the result.
for letter in lettercount:
    print(letter,lettercount[letter])

 1
T 1146
h 7080
e 13648
  24842
P 436
r 6826
o 9326
j 161
c 2410
t 10237
G 297
u 3765
n 7066
b 1653
g 1992
B 342
k 966
f 2118
R 928
m 3218
a 8309
d 4316
J 221
l 4935
i 6730
, 2850
y 2817
W 464
S 596
s 6861
p 1603
 5634

 5634
w 2427
U 494
v 1100
. 2820
Y 175
- 198
L 595
I 1374
: 82
A 1061
D 183
N 561
1 60
9 9
8 10
[ 126
# 1
5 10
3 12
] 126
M 533
2 10
0 17
E 1178
* 16
O 916
F 310
H 271
C 560
K 14
’ 867
V 140
; 320
æ 1
z 30
_ 250
x 156
q 73
? 369
! 249
Q 3
Z 5
— 65
‘ 31
& 2
( 17
) 17
" 22
/ 6
7 4
4 8
6 7
% 1
X 2
' 7
$ 2


To make the procedure reusable, we define a function ([explanation](https://colab.research.google.com/github/vitroid/PythonTutorials/blob/master/2%20Advanced/026関数.ipynb)) named `count_members`.

In [44]:
def count_members(s):
    """
    Count the number of occurrences of each item in a collection of data s.
    """
    # Prepare an empty dict.
    count = dict()
    # c takes each item of s in turn.
    for c in s:
        if c not in count:
            count[c] = 0
        count[c] += 1
    # Return the dict as the value of the function.
    return count

lettercount = count_members(RJ)
for letter in lettercount:
    print(letter,lettercount[letter])

 1
T 1146
h 7080
e 13648
  24842
P 436
r 6826
o 9326
j 161
c 2410
t 10237
G 297
u 3765
n 7066
b 1653
g 1992
B 342
k 966
f 2118
R 928
m 3218
a 8309
d 4316
J 221
l 4935
i 6730
, 2850
y 2817
W 464
S 596
s 6861
p 1603
 5634

 5634
w 2427
U 494
v 1100
. 2820
Y 175
- 198
L 595
I 1374
: 82
A 1061
D 183
N 561
1 60
9 9
8 10
[ 126
# 1
5 10
3 12
] 126
M 533
2 10
0 17
E 1178
* 16
O 916
F 310
H 271
C 560
K 14
’ 867
V 140
; 320
æ 1
z 30
_ 250
x 156
q 73
? 369
! 249
Q 3
Z 5
— 65
‘ 31
& 2
( 17
) 17
" 22
/ 6
7 4
4 8
6 7
% 1
X 2
' 7
$ 2
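
For comparison, the standard library's `collections.Counter` does the same kind of counting as `count_members()`. The sketch below uses a short made-up string, `sample`, as a stand-in for `RJ` so that it runs without downloading anything.

```python
from collections import Counter

sample = "Romeo, Romeo, wherefore art thou Romeo?"   # a short stand-in for RJ

# Counter counts the items of any iterable; for a string, the items are characters.
lettercount = Counter(sample)
print(lettercount["o"])
print(lettercount["z"])      # a missing key gives 0 instead of raising KeyError
```

A `Counter` is a kind of dict, so it can be looped over and indexed in the same way as the dict returned by `count_members()`.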


Let us count words instead of letters.

We want to reuse the `count_members()` function we just wrote, so we first remove the symbols and numbers from the text.

In [45]:
RJ2 = ""
for letter in RJ:
    if letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ":
        RJ2 += letter
    else:
        RJ2 += " "

RJ2
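
The same filtering can also be written with `str.join()` and the `string.ascii_letters` constant, which avoids typing out the alphabet by hand. This is a sketch on a short made-up string, `sample`, rather than on `RJ`; as in the loop above, anything that is not an ASCII letter becomes a space.

```python
import string

sample = "It is the east, and Juliet is the sun!"   # a short stand-in for RJ

# Keep ASCII letters; turn every other character into a space.
cleaned = "".join(c if c in string.ascii_letters else " " for c in sample)
print(cleaned)
```

Note that letters outside the ASCII range, such as the `æ` that appears in the character counts above, are also replaced by spaces under this rule.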




Then we split it at whitespace.

In [46]:
words = RJ2.split()
words

['The',
 'Project',
 'Gutenberg',
 'eBook',
 'of',
 'Romeo',
 'and',
 'Juliet',
 'by',
 'William',
 'Shakespeare',
 'This',
 'eBook',
 'is',
 'for',
 'the',
 'use',
 'of',
 'anyone',
 'anywhere',
 'in',
 'the',
 'United',
 'States',
 'and',
 'most',
 'other',
 'parts',
 'of',
 'the',
 'world',
 'at',
 'no',
 'cost',
 'and',
 'with',
 'almost',
 'no',
 'restrictions',
 'whatsoever',
 'You',
 'may',
 'copy',
 'it',
 'give',
 'it',
 'away',
 'or',
 're',
 'use',
 'it',
 'under',
 'the',
 'terms',
 'of',
 'the',
 'Project',
 'Gutenberg',
 'License',
 'included',
 'with',
 'this',
 'eBook',
 'or',
 'online',
 'at',
 'www',
 'gutenberg',
 'org',
 'If',
 'you',
 'are',
 'not',
 'located',
 'in',
 'the',
 'United',
 'States',
 'you',
 'will',
 'have',
 'to',
 'check',
 'the',
 'laws',
 'of',
 'the',
 'country',
 'where',
 'you',
 'are',
 'located',
 'before',
 'using',
 'this',
 'eBook',
 'Title',
 'Romeo',
 'and',
 'Juliet',
 'Author',
 'William',
 'Shakespeare',
 'Release',
 'Date',
 'Novemb

Now count the words.

In [47]:
wordcount = count_members(words)
for word in wordcount:
    print(word,wordcount[word])

The 86
Project 85
Gutenberg 85
eBook 11
of 478
Romeo 153
and 551
Juliet 72
by 112
William 3
Shakespeare 4
This 44
is 331
for 186
the 783
use 25
anyone 5
anywhere 3
in 360
United 15
States 15
most 17
other 33
parts 4
world 21
at 77
no 81
cost 4
with 275
almost 5
restrictions 2
whatsoever 2
You 36
may 57
copy 12
it 205
give 36
away 22
or 109
re 4
under 9
terms 21
License 10
included 3
this 219
online 4
www 9
gutenberg 9
org 9
If 44
you 326
are 86
not 271
located 7
will 145
have 122
to 538
check 4
laws 10
country 5
where 31
before 22
using 6
Title 1
Author 1
Release 1
Date 1
November 1
Most 5
recently 1
updated 1
May 7
Language 1
English 1
Produced 1
PG 2
Team 1
a 465
team 2
about 19
twenty 11
volunteers 5
START 2
OF 11
THE 9
PROJECT 4
GUTENBERG 4
EBOOK 2
ROMEO 167
AND 4
JULIET 122
TRAGEDY 1
Contents 1
PROLOGUE 2
ACT 10
I 656
Scene 24
A 82
public 8
place 16
II 12
Street 8
III 12
Room 6
Capulet 63
s 313
House 12
IV 10
V 10
Hall 6
CHORUS 4
An 15
open 9
adjoining 2
Garden 8
Friar 43
Lawrence

The list is hard to read in this order, so we sort it by frequency of occurrence (in ascending order, so the rarest words come first).

In [48]:
for word in sorted(wordcount, key=wordcount.get):
    print(word,wordcount[word])

Title 1
Author 1
Release 1
Date 1
November 1
recently 1
updated 1
Language 1
English 1
Produced 1
Team 1
TRAGEDY 1
Contents 1
Dramatis 1
Person 1
ESCALUS 1
Nobleman 1
Order 1
Officer 1
relations 1
Guards 1
Watchmen 1
During 1
Fifth 1
Act 1
dignity 1
grudge 1
unclean 1
loins 1
foes 1
star 1
misadventur 1
overthrows 1
passage 1
continuance 1
nought 1
traffic 1
stage 1
strive 1
Sampson 1
armed 1
bucklers 1
coals 1
colliers 1
collar 1
moves 1
runn 1
weakest 1
weaker 1
vessels 1
push 1
masters 1
maidenheads 1
Me 1
tool 1
Abram 1
naked 1
sides 1
list 1
disgrace 1
Quarrel 1
washing 1
Part 1
Beats 1
heartless 1
hinds 1
coward 1
clubs 1
Clubs 1
bills 1
Beat 1
gown 1
Old 1
flourishes 1
Escalus 1
Rebellious 1
subjects 1
Profaners 1
stained 1
beasts 1
pernicious 1
purple 1
fountains 1
issuing 1
Throw 1
mistemper 1
prince 1
bred 1
thrice 1
Cast 1
wield 1
Canker 1
Free 1
judgement 1
Once 1
abroach 1
servants 1
adversary 1
fighting 1
instant 1
prepar 1
defiance 1
swung 1
hiss 1
interchanging 1
thrust
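
To see the most frequent words first, `sorted()` can be given `reverse=True`. The sketch below uses a handful of counts copied from the output above, in a dict named `sample_count` that is made up for this example.

```python
sample_count = {"Title": 1, "Romeo": 153, "the": 783, "and": 551}   # a few counts copied from above

# reverse=True puts the largest counts first; the slice keeps the top three.
for word in sorted(sample_count, key=sample_count.get, reverse=True)[:3]:
    print(word, sample_count[word])
```
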

Apart from thou, thee, and thy, which are rarely seen in modern English, few of the words are unfamiliar. The large number of occurrences of "Project" and "Gutenberg" shows that text other than the play itself is mixed in. For a more serious analysis, we would need to extract only the body of the play.


I believe that if you attached a voice recorder to a native English speaker, recorded all of the English they use in daily life for an entire week, listed the phrases in order of frequency, and memorized them from the top down, anyone could pick up a native speaker's turns of phrase. Perhaps someone could make a crash-course Japanese textbook for foreigners this way, although I suspect it would turn out foreigners who frequently say "Majika", "Sugee", or "Yabai".

Let us use a dictionary to analyze how words connect to one another.

In the example above we removed the symbols, but here we skip that step and split `RJ` directly into `words`.

In [49]:
words = RJ.split()
words

['\ufeffThe',
 'Project',
 'Gutenberg',
 'eBook',
 'of',
 'Romeo',
 'and',
 'Juliet,',
 'by',
 'William',
 'Shakespeare',
 'This',
 'eBook',
 'is',
 'for',
 'the',
 'use',
 'of',
 'anyone',
 'anywhere',
 'in',
 'the',
 'United',
 'States',
 'and',
 'most',
 'other',
 'parts',
 'of',
 'the',
 'world',
 'at',
 'no',
 'cost',
 'and',
 'with',
 'almost',
 'no',
 'restrictions',
 'whatsoever.',
 'You',
 'may',
 'copy',
 'it,',
 'give',
 'it',
 'away',
 'or',
 're-use',
 'it',
 'under',
 'the',
 'terms',
 'of',
 'the',
 'Project',
 'Gutenberg',
 'License',
 'included',
 'with',
 'this',
 'eBook',
 'or',
 'online',
 'at',
 'www.gutenberg.org.',
 'If',
 'you',
 'are',
 'not',
 'located',
 'in',
 'the',
 'United',
 'States,',
 'you',
 'will',
 'have',
 'to',
 'check',
 'the',
 'laws',
 'of',
 'the',
 'country',
 'where',
 'you',
 'are',
 'located',
 'before',
 'using',
 'this',
 'eBook.',
 'Title:',
 'Romeo',
 'and',
 'Juliet',
 'Author:',
 'William',
 'Shakespeare',
 'Release',
 'Date:',
 'Nov

We process this list into pairs consisting of a word and the word that follows it.

In [54]:
# first: words 
first = words
# second: words (except the first word)
second = words[1:]

print(first)
print(second)



We use a dictionary to tally which words come after a given word. Each key of the dictionary `nextwords` is a word, and the value is a list of the words that follow it.

In [55]:
nextwords = dict()

for word in first:
    nextwords[word] = []



`zip` is handy for iterating over two lists at the same time. Here is an example.


In [56]:
for w1, w2 in zip(first, second):
    print(w1,w2)


The Project
Project Gutenberg
Gutenberg eBook
eBook of
of Romeo
Romeo and
and Juliet,
Juliet, by
by William
William Shakespeare
Shakespeare This
This eBook
eBook is
is for
for the
the use
use of
of anyone
anyone anywhere
anywhere in
in the
the United
United States
States and
and most
most other
other parts
parts of
of the
the world
world at
at no
no cost
cost and
and with
with almost
almost no
no restrictions
restrictions whatsoever.
whatsoever. You
You may
may copy
copy it,
it, give
give it
it away
away or
or re-use
re-use it
it under
under the
the terms
terms of
of the
the Project
Project Gutenberg
Gutenberg License
License included
included with
with this
this eBook
eBook or
or online
online at
at www.gutenberg.org.
www.gutenberg.org. If
If you
you are
are not
not located
located in
in the
the United
United States,
States, you
you will
will have
have to
to check
check the
the laws
laws of
of the
the country
country where
where you
you are
are located
located before
before using
usin
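
Because `second` is one item shorter than `first`, it is worth confirming how `zip()` treats lists of unequal length. The sketch below uses a short made-up list, `sample_words`, instead of the full text.

```python
sample_words = ["The", "Project", "Gutenberg", "eBook"]   # a short stand-in for words

# zip() stops as soon as the shorter input runs out, so the last word has no partner.
pairs = list(zip(sample_words, sample_words[1:]))
print(pairs)
print(len(sample_words), len(pairs))
```

So a list of N words yields N-1 pairs, and the final word never appears as the first member of a pair.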

Each following word is appended to `nextwords`, one after another.


In [25]:
for w1, w2 in zip(first, second):
    # nextwords[w1] starts out as an empty list; append w2 to it.
    nextwords[w1].append(w2)

# Which words can follow "the"?
nextwords["the"]

['use',
 'United',
 'world',
 'terms',
 'Project',
 'United',
 'laws',
 'country',
 'PG',
 'Garden',
 'bed',
 'Capulets',
 'Prince',
 'Prince',
 'Capulets',
 'Montagues',
 'same',
 'greater',
 'Play',
 'Fifth',
 'fatal',
 'continuance',
 'two',
 'collar',
 'house',
 'wall',
 'weakest',
 'wall',
 'weaker',
 'wall',
 'wall',
 'wall',
 'men',
 'maids',
 'maids',
 'heads',
 'maids',
 'house',
 'law',
 'law',
 'peace',
 'word',
 'Capulets',
 'Montagues',
 'fire',
 'ground',
 'sentence',
 'quiet',
 'forfeit',
 'peace',
 'rest',
 'servants',
 'instant',
 'winds',
 'Prince',
 'worshipp',
 'golden',
 'east',
 'grove',
 'covert',
 'wood',
 'fresh',
 'all',
 'farthest',
 'cause',
 'cause',
 'bud',
 'air',
 'sun',
 'day',
 'fume',
 'siege',
 'way',
 'fair',
 'peace',
 'world',
 'change',
 'hopeful',
 'store',
 'heel',
 'shoemaker',
 'tailor',
 'fisher',
 'painter',
 'writing',
 'learned',
 'rank',
 'old',
 'letters',
 'language',
 'letter',
 'lively',
 'paper',
 'great',
 'house',
 'fair',
 'admir

This list can also be seen as statistics on which words appear after a given word, and how often. Using it, we pick the next word at random according to those frequencies. We use the `choice()` function, which picks one element from a list at random.

In [26]:
from random import choice
current = 'the'                 # a separate name, so that the list `first` is not overwritten
candid = nextwords[current]
print(candid)
nextword = choice(candid)
nextword

['use', 'United', 'world', 'terms', 'Project', 'United', 'laws', 'country', 'PG', 'Garden', 'bed', 'Capulets', 'Prince', 'Prince', 'Capulets', 'Montagues', 'same', 'greater', 'Play', 'Fifth', 'fatal', 'continuance', 'two', 'collar', 'house', 'wall', 'weakest', 'wall', 'weaker', 'wall', 'wall', 'wall', 'men', 'maids', 'maids', 'heads', 'maids', 'house', 'law', 'law', 'peace', 'word', 'Capulets', 'Montagues', 'fire', 'ground', 'sentence', 'quiet', 'forfeit', 'peace', 'rest', 'servants', 'instant', 'winds', 'Prince', 'worshipp', 'golden', 'east', 'grove', 'covert', 'wood', 'fresh', 'all', 'farthest', 'cause', 'cause', 'bud', 'air', 'sun', 'day', 'fume', 'siege', 'way', 'fair', 'peace', 'world', 'change', 'hopeful', 'store', 'heel', 'shoemaker', 'tailor', 'fisher', 'painter', 'writing', 'learned', 'rank', 'old', 'letters', 'language', 'letter', 'lively', 'paper', 'great', 'house', 'fair', 'admired', 'devout', 'world', 'matter', 'year', 'earthquake', 'days', 'year', 'sun', 'dovehouse', 'wor

'night'
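
The link between duplicates in the list and the probability of being chosen is easier to see on a tiny example. The sketch below builds the same kind of table from a made-up eight-word text, using `setdefault()` to create each list on first use, which is a different technique from the two-pass initialization above.

```python
from collections import Counter

sample_words = "the wall the wall the house the law".split()   # made-up toy text

# Build the same kind of table as nextwords, creating each list on first use.
table = dict()
for w1, w2 in zip(sample_words, sample_words[1:]):
    table.setdefault(w1, []).append(w2)

candidates = table["the"]
print(candidates)

# The relative frequency of each candidate is the probability that choice() picks it.
for word, n in Counter(candidates).items():
    print(word, n / len(candidates))
```

Here "wall" fills half of the list, so `choice()` would return it about half of the time.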

Let us repeat this in a loop. The result looks like English but makes no sense.

In [27]:
sentence = ""
# The first word is given: the sentence begins with "The".
word = 'The'
for i in range(300):
    sentence += word + " "
    # candid is the list of words that may come next.
    candid = nextwords[word]
    # Pick one of them.
    # candid contains duplicates, so a word that appears more often is more likely to be picked.
    word = choice(candid)
print(sentence)

The unreasonable fury O teach me go tell her grave man s seal d such a courtier s stone On Thursday sir which as mine be husband friend And bring you could not sir a fearful were tomorrow So Romeo whom thou hast vow I to him talk of mine and then I dwell on different terms of breath of a compilation copyright holder found MERCUTIO Good morrow father to you will remain after I will carry no trust No marry us forth in cloudy night sit nay or odd days How sound Why lamb why lady was ne er a villain didst bower the heart abhors To you Tybalt Tybalt the nipple Of her match Play in Putting on The fish if you ll dispose of modesty Still blush bepaint my unrest CAPULET S Hart was decreed Ascend her father to complying with you to bid me nightly in a Friar Lawrence s corse unto thee hence to wish to the prompter for any Project Gutenberg tm License available with her tomb And an ancient feast Whereto I it Where the more wit of the mouse a virtuous and loving black fate on my hands with my soul 

This way of determining the probability of the next word solely from the immediately preceding word is called a **Markov chain**. (Markov chain Monte Carlo simulation is a typical application of Markov chains.)
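
The loop can be wrapped in a function so that it is easy to rerun with a different starting word or length. The sketch below is one possible way to do this, not the notebook's own code: the function name `generate`, its parameters, and the toy text are all made up. It also stops early if it reaches a word that was never followed by anything, a case in which `choice()` would otherwise fail on an empty list.

```python
import random

def generate(table, start, length, rng=random):
    """Walk the table from `start`, picking each next word at random."""
    out = [start]
    word = start
    for _ in range(length - 1):
        candidates = table.get(word, [])
        if not candidates:        # dead end: this word was never followed by anything
            break
        word = rng.choice(candidates)
        out.append(word)
    return out

toy = "the wall the wall the house the law".split()   # made-up toy text
table = dict()
for w1, w2 in zip(toy, toy[1:]):
    table.setdefault(w1, []).append(w2)

random.seed(0)                    # fixed seed so that reruns give the same walk
print(" ".join(generate(table, "the", 10)))
```

Each step looks only at the current word, never at the words before it, which is exactly the Markov property described above.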


## Practice

If you use two words as the dictionary key and the third word as the value, the generated text becomes more plausible. That is, as in `nextwords[w1, w2]`, each key of `nextwords` is a pair of consecutive words; collect the words that follow each pair and use them to compose the text.

1. Prepare `third` in addition to `first` and `second`.
```python
third = words[2:]
```
2. When adding a chain of words to `nextwords`, use the pair `w1, w2` as the key and append `w3` to the list.
```python
for w1, w2, w3 in zip(first, second, third):
    # nextwords[w1,w2] starts out as an empty list; append w3 to it.
    nextwords[w1,w2].append(w3)
```

Combine these with the program parts introduced so far to complete the program. Then try composing text from a Project Gutenberg work other than "Romeo and Juliet".