- Take several pictures of red, blue, and green items with your phone or other digital camera (or download some from the internet, if a camera isn’t available).
    - Load each image, and convert it to a tensor.
    - For each image tensor, use the .mean() method to get a sense of how bright the image is.
    - Take the mean of each channel of your images. Can you identify the red, green, and blue items from only the channel averages?

In [1]:
import imageio
import torch

In [21]:
img_arr = imageio.imread('../data/p1ch4/exercise/Madonna_of_the_Magnificat.png')
img_arr.shape

(3216, 3212, 4)

- The 4th channel appears to be the alpha channel, which allows for transparency.

In [22]:
img_t = torch.from_numpy(img_arr)
img_t.shape

torch.Size([3216, 3212, 4])

In [23]:
img_t[:,:,3]

tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
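
The truncated printout only shows the corners of the alpha channel, so it cannot confirm that the whole channel is zero. A minimal sketch of summarizing the full channel instead is below; the `rgba` tensor is a hypothetical stand-in for `img_t`, built by hand for illustration.

```python
import torch

# Hypothetical H x W x 4 RGBA tensor of uint8 values standing in for img_t.
rgba = torch.zeros(6, 6, 4, dtype=torch.uint8)
rgba[2:4, 2:4, 3] = 255  # opaque patch in the middle, transparent border

alpha = rgba[:, :, 3]

# Summarize the whole channel rather than relying on the printed corners.
alpha_min, alpha_max = int(alpha.min()), int(alpha.max())
opaque_fraction = (alpha == 255).float().mean().item()
print(alpha_min, alpha_max, opaque_fraction)
```

If the minimum and maximum differ, the alpha channel carries real transparency information, and dropping it (as the next cell does) is a deliberate simplification.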

In [24]:
img_t_rgb = img_t[:,:,:3].float()
img_t_rgb.shape

torch.Size([3216, 3212, 3])

In [25]:
img_t_rgb.mean()

tensor(79.4028)

In [26]:
means = [torch.mean(img_t_rgb[:,:,i]) for i in range(3)]
means

[tensor(105.7936), tensor(80.7073), tensor(51.7076)]
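
One possible way to answer the last question of the exercise is to take the channel with the largest mean as the guess for the item's color. The sketch below does this on a synthetic tensor; the `img` values and the `channel_names` list are assumptions for illustration, not data from the notebook.

```python
import torch

# Synthetic stand-in for an H x W x 3 RGB image tensor (assumed values).
img = torch.zeros(4, 5, 3)
img[..., 0] = 200.0  # strong red channel
img[..., 1] = 40.0
img[..., 2] = 30.0

# Average over height and width in one call, leaving one mean per channel.
channel_means = img.mean(dim=(0, 1))

# Guess the color from the channel with the largest mean.
channel_names = ['red', 'green', 'blue']
dominant = channel_names[int(channel_means.argmax())]
print(channel_means, dominant)
```

Applied to the means printed above (roughly 105.8, 80.7, 51.7), the same rule would pick red. This heuristic is likely to be unreliable for images with large white or gray areas, where all three means are similar.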

- Select a relatively large file containing Python source code.
    - Build an index of all the words in the source file (feel free to make your tokenization as simple or as complex as you like; we suggest starting with replacing r"[^a-zA-Z0-9_]+" with spaces).
    - Compare your index with the one we made for Pride and Prejudice. Which is larger?
    - Create the one-hot encoding for the source code file.
    - What information is lost with this encoding? How does that information compare to what’s lost in the Pride and Prejudice encoding?

In [27]:
with open('../data/p1ch4/exercise/augmentation.py', encoding='utf8') as f:
    text = f.read()

In [63]:
import re

In [65]:
text_re = re.sub(r'[^a-zA-Z0-9_]+', ' ', text)

In [67]:
def clean_words(input_str):
    punctuation = '.,;:"!?”“_-'  # the curly quotes ”“ were copied from the book
    word_list = input_str.lower().replace('\n',' ').split()
    word_list = [word.strip(punctuation) for word in word_list]
    return word_list

In [71]:
text_words = clean_words(text_re)
word_list = sorted(set(text_words))
word2index_dict = {word: i for (i, word) in enumerate(word_list)}

In [72]:
len(word2index_dict), len(text_words), len(word_list)

(265, 1329, 265)
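
The index is small partly because this tokenization discards a lot: operators, indentation, letter case, and leading or trailing underscores all disappear before the index is built. The sketch below reuses the regex and `clean_words` logic from the cells above on two made-up snippets (`snippet_a` and `snippet_b` are illustrative, not from `augmentation.py`) to show that different code can produce the same token list.

```python
import re

def clean_words(input_str):
    punctuation = '.,;:"!?”“_-'
    word_list = input_str.lower().replace('\n', ' ').split()
    return [word.strip(punctuation) for word in word_list]

def tokenize(source):
    # Same two steps as the notebook: regex replacement, then clean_words.
    return clean_words(re.sub(r'[^a-zA-Z0-9_]+', ' ', source))

# Two made-up snippets that differ in operator, case, layout, and underscores.
snippet_a = "def __init__(self):\n    self.Angle = a - b\n"
snippet_b = "def init(self): self.angle = a + b"

tokens_a = tokenize(snippet_a)
tokens_b = tokenize(snippet_b)
print(tokens_a)
print(tokens_a == tokens_b)
```

Both snippets reduce to the same seven tokens, so anything computed from the index cannot tell them apart.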

In [70]:
word2index_dict

{'0': 0,
 '1': 1,
 '125': 2,
 '2': 3,
 '3': 4,
 '360': 5,
 '3d': 6,
 '4': 7,
 '5': 8,
 '8': 9,
 'a': 10,
 'abs': 11,
 'affine': 12,
 'after': 13,
 'and': 14,
 'angle': 15,
 'append': 16,
 'apply': 17,
 'args': 18,
 'around': 19,
 'as': 20,
 'assert': 21,
 'autograd': 22,
 'axes': 23,
 'axis_vector': 24,
 'b': 25,
 'back': 26,
 'backends': 27,
 'be': 28,
 'before': 29,
 'between': 30,
 'bilinear': 31,
 'blob': 32,
 'c': 33,
 'c1': 34,
 'can': 35,
 'ceil': 37,
 'center': 38,
 'center_list': 39,
 'clamp': 40,
 'clamphsv': 41,
 'clamps': 42,
 'clone': 43,
 'com': 44,
 'contiguous': 45,
 'coordinate': 46,
 'coordinates': 47,
 'coords': 48,
 'cos': 49,
 'cpu': 50,
 'crop_int': 51,
 'crop_list': 52,
 'croptoshape': 53,
 'cudnn': 54,
 'cval': 55,
 'debug': 56,
 'def': 57,
 'device': 58,
 'do': 59,
 'down': 60,
 'dtype': 61,
 'else': 62,
 'end_int': 63,
 'everything': 64,
 'expand_as': 65,
 'false': 66,
 'fill': 67,
 'filters': 68,
 'flatten': 69,
 'flip': 70,
 'float32': 71,
 'float64': 72,
 ...

In [74]:
word_t = torch.zeros(len(text_words), len(word2index_dict))
for i, word in enumerate(text_words):
    word_index = word2index_dict[word]
    word_t[i][word_index] = 1
    if i < 50:
        print('{:2} {:4} {}'.format(i, word_index, word))

print(word_t.shape)

 0   93 import
 1  118 math
 2   93 import
 3  160 random
 4   93 import
 6   93 import
 7  145 numpy
 8   20 as
 9  144 np
10   93 import
11  184 scipy
12  131 ndimage
13   93 import
14  208 torch
15   76 from
16  208 torch
17   22 autograd
18   93 import
19   78 function
20   76 from
21  208 torch
22   22 autograd
23   78 function
24   93 import
25  151 once_differentiable
26   93 import
27  208 torch
28   27 backends
29   54 cudnn
30   20 as
31   54 cudnn
32   76 from
33  217 util
34  111 logconf
35   93 import
36  112 logging
37  110 log
38  112 logging
39   80 getlogger
40  129 name
41  110 log
42  185 setlevel
43  112 logging
44  232 warn
45  110 log
46  185 setlevel
47  112 logging
48   96 info
49  110 log
torch.Size([1329, 265])
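
The loop above fills the one-hot matrix one row at a time. As an alternative sketch, `torch.nn.functional.one_hot` can build the same kind of matrix from a tensor of indices in one call. The short `text_words` list here is a toy example, not the tokens from the source file.

```python
import torch
import torch.nn.functional as F

# Toy token list standing in for the notebook's text_words.
text_words = ['import', 'math', 'import', 'random']
word_list = sorted(set(text_words))
word2index_dict = {word: i for i, word in enumerate(word_list)}

# One integer index per token, then expand to one-hot rows.
indices = torch.tensor([word2index_dict[w] for w in text_words])
word_t = F.one_hot(indices, num_classes=len(word2index_dict)).float()

# Each row has a single 1, so argmax recovers the original index sequence.
decoded = [word_list[i] for i in word_t.argmax(dim=1).tolist()]
print(word_t.shape, decoded)
```

Because the token sequence can be decoded back from the rows, the one-hot step itself does not lose information. The loss happens earlier, in the tokenization.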
