## Reading a txt file
* Read this [text file](https://raw.githubusercontent.com/vashineyu/slides_and_others/master/tutorial/examples/imagenet_urls_examples.txt)
* Link for copy-pasting: https://raw.githubusercontent.com/vashineyu/slides_and_others/master/tutorial/examples/imagenet_urls_examples.txt

### Hints: use [Requests](https://blog.gtwang.org/programming/python-requests-module-tutorial/) to fetch the data
### Hints: [string splitting](http://www.runoob.com/python/att-string-split.html)
### Hints: exception handling with [try-except](https://pydoing.blogspot.com/2011/01/python-try.html)
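Here is how the three hints fit together on a small scale. Each line of the file holds an ImageNet ID and a URL separated by a tab, so `str.split('\t')` separates them, and `try`/`except` handles lines that have no URL part. This is a minimal sketch on a made-up two-line sample; `sample_text` and `parse_line` are hypothetical names, not part of the assignment:

```python
sample_text = "n00015388_157\thttp://example.com/a.jpg\nbroken_line_without_tab"

def parse_line(line):
    """Split one 'ID<TAB>URL' line; return (id, None) if the URL part is missing."""
    parts = line.split("\t")
    try:
        return parts[0], parts[1]
    except IndexError:
        # No tab on this line, so there is no URL to return
        return parts[0], None

for line in sample_text.split("\n"):
    print(parse_line(line))
```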

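# [Assignment Goal]
- Read a list of image links from a web page, then load the images from the URLs in that list

# [Key Points]
- Read the link list from the web (In[1], In[2])
- Load images from the URLs in the list (In[6]~In[9], Out[6]~Out[9])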

In [3]:
## What if we don't want to download the data to our own computer?
# Put the link here
target_url = "https://raw.githubusercontent.com/vashineyu/slides_and_others/master/tutorial/examples/imagenet_urls_examples.txt"

In [4]:
import requests
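response = requests.get(target_url)
response.raise_for_status()  # stop early if the download failed (e.g. a 404)
data = response.text

# response.text is one long string: the file is not split into lines, each line break is just a '\n' character inside it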
print(len(data))
data[0:100]

784594


'n00015388_157\thttp://farm1.static.flickr.com/145/430300483_21e993670c.jpg\nn00015388_238\thttp://farm2'

In [5]:
# Split the string on the newline character; split() also removes the separator itself
split_tag = "\n"
data = data.split(split_tag)
print(len(data))
data[0]

9996


'n00015388_157\thttp://farm1.static.flickr.com/145/430300483_21e993670c.jpg'
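The output above doesn't show whether the file ends with a newline. If it does, `split("\n")` leaves an empty string as the last element of `data`. That is one reason the conversion code needs a fallback for lines without a tab. `str.splitlines()` does not produce that trailing empty element, and it also handles `\r\n` line endings. This sketch uses a small hypothetical sample:

```python
text = "id_1\turl_1\nid_2\turl_2\n"   # hypothetical sample that ends with a newline

print(text.split("\n"))   # ['id_1\turl_1', 'id_2\turl_2', '']
print(text.splitlines())  # ['id_1\turl_1', 'id_2\turl_2']

# Dropping empty lines explicitly works with either approach
lines = [line for line in text.split("\n") if line.strip()]
print(len(lines))  # 2
```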

## Converting the txt data into a pandas DataFrame

In [None]:
"""
# Method 1: slow, because every np.append call copies the whole array
import pandas as pd
import numpy as np
part1 = []
part2 = []

for x in range(len(data)):
    SplitResult = data[x].split('\t')
    part1 = np.append(part1,SplitResult[0])
    if len(SplitResult) > 1:
        part2 = np.append(part2,SplitResult[1])
    else:
        part2 = np.append(part2,' ')    
        
arrange_data = np.array([part1, part2]).T
df = pd.DataFrame(arrange_data)
df.head()  

"""

In [6]:
# Method 2: faster. The loop below overwrites every element of part3 and part4, so both can start out as copies of data
import pandas as pd
import numpy as np
part3 = np.array(data)
part4 = np.array(data)

for x in range(len(data)):
    SplitResult = data[x].split('\t')
    part3[x] = SplitResult[0]
    if len(SplitResult) > 1:
        part4[x] = SplitResult[1]
    else:
        part4[x] = ' '
        
arrange_data = np.array([part3, part4]).T
df = pd.DataFrame(arrange_data)
df.head()

|   | 0 | 1 |
|---|---|---|
| 0 | n00015388_157 | http://farm1.static.flickr.com/145/430300483_2... |
| 1 | n00015388_238 | http://farm2.static.flickr.com/1005/3352960681... |
| 2 | n00015388_304 | http://farm1.static.flickr.com/27/51009336_a96... |
| 3 | n00015388_327 | http://farm4.static.flickr.com/3025/2444687979... |
| 4 | n00015388_355 | http://img100.imageshack.us/img100/3253/forres... |
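pandas can also do the whole split without an explicit Python loop. `Series.str.split` with `expand=True` returns a DataFrame whose columns are `0` and `1`, matching the layout above. This is a sketch on the same kind of hypothetical sample lines:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the last line has no tab
lines = ["n00015388_157\thttp://example.com/a.jpg",
         "n00015388_238\thttp://example.com/b.jpg",
         ""]

s = pd.Series(lines)
# n=1: split on the first tab only; expand=True returns columns 0 and 1
df_vec = s.str.split("\t", n=1, expand=True)
# Rows without a tab have a missing value in column 1; use the same blank placeholder
df_vec[1] = df_vec[1].fillna(" ")
print(df_vec)
```

Another option is `pd.read_csv(target_url, sep="\t", header=None)`, which reads straight from the URL. It may need extra arguments if some lines in the file are malformed.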


## Reading images: load the first 5 images in the DataFrame above

In [15]:
print(df.loc[2,1])


http://farm1.static.flickr.com/27/51009336_a9663af3dd.jpg


In [7]:
from PIL import Image
from io import BytesIO
import numpy as np
import matplotlib.pyplot as plt

# Use df.loc[...] to get the link in the first row
first_link = df.loc[0,1]

response = requests.get(first_link)
img = Image.open(BytesIO(response.content))

# Convert img to a numpy array, shape (height, width[, channels])
img_arr = np.array(img)

plt.imshow(img_arr)
plt.show()

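`Image.open` expects a file-like object, so the downloaded bytes (`response.content`) are wrapped in `BytesIO`. Converting the result with `np.array` gives an array whose shape depends on the image mode: `(H, W, 3)` for RGB, `(H, W)` for grayscale and `(H, W, 4)` for RGBA. Calling `.convert("RGB")` first makes every image three-channel. The sketch below runs offline: a tiny image encoded in memory takes the place of a real download.

```python
from io import BytesIO

import numpy as np
from PIL import Image

# Stand-in for response.content: encode a small 4x3 grayscale image as JPEG bytes
buffer = BytesIO()
Image.new("L", (4, 3), color=128).save(buffer, format="JPEG")
fake_content = buffer.getvalue()

img = Image.open(BytesIO(fake_content))
print(img.mode, np.array(img).shape)          # L (3, 4)

img_rgb = img.convert("RGB")
print(img_rgb.mode, np.array(img_rgb).shape)  # RGB (3, 4, 3)
```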

In [8]:
print(df[0:5][1].values)

['http://farm1.static.flickr.com/145/430300483_21e993670c.jpg'
 'http://farm2.static.flickr.com/1005/3352960681_37b9c1d27b.jpg'
 'http://farm1.static.flickr.com/27/51009336_a9663af3dd.jpg'
 'http://farm4.static.flickr.com/3025/2444687979_bf7bc8df21.jpg'
 'http://img100.imageshack.us/img100/3253/forrest004fs9.jpg']
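`df[0:5][1]` first slices rows by position and then picks column `1`. You can select the same five URLs in one step, without chained indexing, in two ways. `.iloc` is position-based and excludes the end. `.loc` is label-based and includes the end. Here is a small check on a hypothetical six-row frame:

```python
import pandas as pd

df_demo = pd.DataFrame({0: ["id%d" % i for i in range(6)],
                        1: ["url%d" % i for i in range(6)]})

a = df_demo[0:5][1].values
b = df_demo.iloc[0:5, 1].values   # positions 0..4, end-exclusive
c = df_demo.loc[0:4, 1].values    # labels 0..4, end-inclusive
print(list(a) == list(b) == list(c))  # True
```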


In [None]:
def img2arr_fromURLs(url_list, resize=False):
    """Download each URL and return {list index: image as numpy array}.

    Links that fail to download or decode are reported and skipped,
    so the returned dict may have fewer entries than url_list.
    Note: `resize` is accepted but not used yet.
    """
    img_dict = {}

    for linkno, url in enumerate(url_list):
        try:
            # The request goes inside try too: dead links often fail
            # at the connection level, not only when decoding the image
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            img_temp = Image.open(BytesIO(response.content))
            img_dict[linkno] = np.array(img_temp)
        except Exception as e:
            print("Error link is %s (%s)" % (url, e))
    return img_dict
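The `resize` parameter of `img2arr_fromURLs` is accepted but never used. The sketch below shows one way it could work, and it runs on in-memory bytes so no network access is needed. It assumes a hypothetical `(width, height)` tuple passed to `Image.resize`. Any payload that PIL cannot decode is skipped, for example an HTML error page returned by a dead link. `decode_images`, `png_bytes` and `blobs` are illustrative names, not part of the assignment.

```python
from io import BytesIO

import numpy as np
from PIL import Image

def decode_images(blobs, resize=None):
    """Decode byte strings into {index: RGB numpy array}, skipping failures.

    resize: optional (width, height) tuple (a hypothetical version of the notebook's flag).
    """
    images = {}
    for i, blob in enumerate(blobs):
        try:
            img = Image.open(BytesIO(blob)).convert("RGB")
        except OSError as e:  # PIL's UnidentifiedImageError is a subclass of OSError
            print("Skipping item %i: %s" % (i, e))
            continue
        if resize is not None:
            img = img.resize(resize)
        images[i] = np.array(img)
    return images

def png_bytes(size):
    """Encode a solid red RGB image of the given (width, height) as PNG bytes."""
    buf = BytesIO()
    Image.new("RGB", size, color=(255, 0, 0)).save(buf, format="PNG")
    return buf.getvalue()

# Two valid PNGs with a bogus payload in between
blobs = [png_bytes((10, 8)), b"<html>404 Not Found</html>", png_bytes((6, 6))]
result_demo = decode_images(blobs, resize=(4, 4))
print(sorted(result_demo))    # [0, 2]
print(result_demo[2].shape)   # (4, 4, 3)
```

Keys keep the original positions, so a missing key shows exactly which input failed.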

In [None]:
result = img2arr_fromURLs(df[0:5][1].values)

print("Total images that we got: %i " % len(result)) # 如果不等於 5, 代表有些連結失效囉

#plt.imshow(result[0]) #debug purpose
#plt.show() #debug purpose

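# result's keys are the original row indices, and indices of failed links are missing,
# so iterate over items() instead of range(len(result))
for index, img in result.items():
    try:
        plt.imshow(img)
        plt.show()
    except Exception as e:
        print("Cannot display image %i: %s" % (index, e))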