In [1]:
def mapper() :
    import sys

    for line in sys.stdin :
        data = line.strip().split("\t")
        date, time, store, item, cost, payment = data
        print ("{0}\t{1}".format(store, cost))

    

The more data there is, the more anomalies it contains, so the Mapper must keep running no matter what input it receives.

It is therefore important to write MapReduce code defensively.

In [3]:
# Skip any line that does not have exactly 6 fields
def mapper() :
    import sys
    
    for line in sys.stdin :
        data = line.strip().split("\t")
        
        if len(data) == 6 :
            date, time, store, item, cost, payment = data
            print ("{0}\t{1}".format(store, cost))
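The field-count check still lets a non-numeric cost through, which would make `float()` fail later in the Reducer. A minimal sketch of one further defensive step follows; the helper name `parse_purchase` and the sample lines are assumptions made for illustration, not part of the original scripts.

```python
def parse_purchase(line):
    """Return (store, cost) for a valid line, or None if the line should be skipped."""
    data = line.strip().split("\t")
    if len(data) != 6:
        return None
    date, time, store, item, cost, payment = data
    try:
        return store, float(cost)
    except ValueError:
        # The cost field is not a number, so the line is skipped
        return None

# Hypothetical sample lines: one valid, one with a broken cost field
good = "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex"
bad = "2012-01-01\t09:00\tSan Jose\tMen's Clothing\tN/A\tAmex"
print(parse_purchase(good))
print(parse_purchase(bad))
```

Whether to skip such lines silently or count them somewhere is a design choice; skipping keeps the job running, which is the goal stated above.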


The Reducer receives (Key, Value) pairs through "Hadoop Streaming".

Hadoop performs the Shuffle & Sort step on its own.
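To see what the Shuffle & Sort step gives the Reducer, here is a small local sketch. It only imitates the idea in memory: the function name `shuffle_and_sort` is an assumption, and with Hadoop Streaming the Reducer actually receives a sorted stream of lines rather than grouped lists.

```python
from itertools import groupby

def shuffle_and_sort(mapper_output):
    """Group tab-separated "key<TAB>value" lines by key, in sorted key order."""
    pairs = [line.split("\t", 1) for line in mapper_output]
    pairs.sort(key=lambda pair: pair[0])  # sort by key before reducing
    # groupby only merges adjacent items, which is why the sort must come first
    return [(key, [value for _, value in group])
            for key, group in groupby(pairs, key=lambda pair: pair[0])]

mapper_output = ["San Jose\t214.05", "Fort Worth\t153.57", "Fort Worth\t213.88"]
grouped = shuffle_and_sort(mapper_output)
for store, costs in grouped:
    print(store, costs)
```

Because all values for one key arrive next to each other, the Reducer only needs to notice when the key changes.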

In [4]:
def reducer() :
    import sys
    
    salesTotal = 0
    oldKey = None

    for line in sys.stdin :
        data = line.strip().split("\t")
            # store, sales
        if len(data) != 2 :
            continue
            
        thisKey, thisSale = data
            # thisKey = store, thisSale = sale
        
        if oldKey and oldKey != thisKey :
            print ("{0}\t{1}".format(oldKey, salesTotal))
            
            salesTotal = 0
        
        oldKey = thisKey
        salesTotal += float(thisSale)
        
    # The last key has not been printed yet, so emit it once the loop has ended
    if oldKey != None :
        print ("{0}\t{1}".format(oldKey, salesTotal))

Before using Hadoop, we first check the Mapper and Reducer from the terminal.

In [21]:
# Use only the first 10 lines of purchases for the experiment
! head -n 10 purchases.txt > test.txt
! cat test.txt

2012-01-01	09:00	San Jose	Men's Clothing	214.05	Amex
2012-01-01	09:00	Fort Worth	Women's Clothing	153.57	Visa
2012-01-01	09:00	San Diego	Music	66.08	Cash
2012-01-01	09:00	Pittsburgh	Pet Supplies	493.51	Discover
2012-01-01	09:00	Omaha	Children's Clothing	235.63	MasterCard
2012-01-01	09:00	Stockton	Men's Clothing	247.18	MasterCard
2012-01-01	09:00	Austin	Cameras	379.6	Visa
2012-01-01	09:00	New York	Consumer Electronics	296.8	Cash
2012-01-01	09:00	Corpus Christi	Toys	25.38	Discover
2012-01-01	09:00	Fort Worth	Toys	213.88	Visa


In [22]:
! cat test.txt | python mapper.py

San Jose	214.05
Fort Worth	153.57
San Diego	66.08
Pittsburgh	493.51
Omaha	235.63
Stockton	247.18
Austin	379.6
New York	296.8
Corpus Christi	25.38
Fort Worth	213.88


In [23]:
# This time the Reducer is used as well
# Sorting is done by hand here (Hadoop does it automatically)
! cat test.txt | python mapper.py | sort | python reducer.py

Austin	379.6
Corpus Christi	25.38
Fort Worth	367.45
New York	296.8
Omaha	235.63
Pittsburgh	493.51
San Diego	66.08
San Jose	214.05
Stockton	247.18


##### - Implementation
The shell & Python code for Mapping, Sorting, and Reducing is as follows.

In [2]:
# Mapper

import sys


#with open('./purchases.txt.ignore') as f:
#        for line in f :
for line in sys.stdin :

    data = line.strip().split("\t")

    if len(data) == 6 :
        date, time, store, item, cost, payment = data
        print("{0}\t{1}".format(item, cost))


In [3]:
# Sorting

import sys

temp_list = list()

for line in sys.stdin :
    item = line.strip().split("\t")
    temp_list.append(tuple(item))

# Sort by key (the first field)
temp_list = sorted(temp_list, key = lambda pair : pair[0])

for item in temp_list :
    print(item)


In [4]:
# Reducer


import sys
import re

oldKey = None
salesTotal = 0

r = re.compile(r'\([\'\"]([\w \']+)[\'\"][ .,]*[\'\"](\d+\.?\d*)[\'\"]\)')

for line in sys.stdin :
    m = r.search(line.strip())

    # Skip lines that do not match the tuple format printed by the sorter
    if m is None :
        continue
    thisKey, thisSale = m.group(1,2)

    if oldKey and oldKey != thisKey :
        print ("{0}\t{1}".format(oldKey, salesTotal))

        salesTotal = 0

    oldKey = thisKey
    salesTotal += float(thisSale)

if oldKey != None :
    print ("{0}\t{1}".format(oldKey, salesTotal))

In [6]:
! cat purchases.txt.ignore | python3 Project1_mapper.py | python3 Project1_sorter.py | \
python3 Project1_reducer.py

Baby	57491808.43999965
Books	57450757.91000004
CDs	57410753.04000111
Cameras	57299046.64000087
Children's Clothing	57624820.94000126
Computers	57315406.319999866
Consumer Electronics	57452374.12999909
Crafts	57418154.50000017
DVDs	57649212.13999929
Garden	57539833.109999545
Health and Beauty	57481589.560001
Men's Clothing	57621279.04000138
Music	57495489.700000465
Pet Supplies	57197250.24000008
Sporting Goods	57599085.890000574
Toys	57463477.10999907
Video Games	57513165.5800005
Women's Clothing	57434448.96999881


The Hadoop command that performs the same job as above is as follows.

In [8]:
!hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar \
-file /home/cloudera/Repo/mapper.py -mapper mapper.py \
-file /home/cloudera/Repo/reducer.py -reducer reducer.py \
-input inputt/ -output outputdir

hadoop dfsadmin -safemode leave

Source: http://knight76.tistory.com/entry/Hadoop-Name-node-is-in-safe-mode-에러-해결 [김용환 블로그(2004-2017)]

/bin/sh: 1: cannot open jar: No such file
/bin/sh: 1: hadoop: not found


In [1]:
# Leave safe mode (a shell command needs the leading "!" in a notebook cell)
! hadoop dfsadmin -safemode leave

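This produces the same result.

In fact, the Python code used with Hadoop differs slightly from the Python code used in the shell:

because I implemented the sorting code myself instead of using the shell's built-in sort program, the Reducer code changed slightly as well!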
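The Reducer needs a regex only because the sorter prints Python tuples. As a hedged alternative, a sorter that keeps the tab-separated format would let the Reducer from the first exercise be reused unchanged. The function name `sort_lines` is an assumption, and the sample reads from an in-memory stream where a real script would read `sys.stdin`.

```python
import io

def sort_lines(lines):
    """Return "key<TAB>value" lines sorted by key, keeping the tab-separated format."""
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    records.sort(key=lambda record: record[0])
    return ["\t".join(record) for record in records]

# Stand-in for sys.stdin, using values from the test data above
sample = io.StringIO("Toys\t25.38\nMusic\t66.08\nCameras\t379.6\n")
sorted_lines = sort_lines(sample)
print("\n".join(sorted_lines))
```

Like the original sorter, this holds every record in memory, so it is only suitable for local testing.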

# Exercise 2

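Now let's look at an example that uses a different dataset.
The dataset used here is a web server log.

Each line holds fields such as the IP address, time, date, and page name, in the [Common Log Format](https://en.wikipedia.org/wiki/Common_Log_Format) described below.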



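- %h is the IP address of the client.
- %l is the identity of the client, shown as '-' when unknown.
- %u is the username of the client, shown as '-' when unknown.
- %t is the time at which the server finished processing the request, in the form [day/month/year:hour:min:sec zone].
- %r is the request line received from the client. It includes the method, path, query string, and protocol.
- %>s is the status code that the server sends back to the client. Common values include 200 (OK), 304 (Not Modified), and 404 (Not Found); [see the full list](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html).
- %b is the size in bytes of the object returned to the client, shown as '-' for a 304.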
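The fields listed above can be pulled apart with a regular expression. This is a minimal sketch, assuming the usual layout `%h %l %u %t "%r" %>s %b`; the names `CLF_PATTERN` and `parse_clf` are made up for illustration, and the pattern does not handle request lines that contain escaped quotes.

```python
import re

# Common Log Format: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)

def parse_clf(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None

# Sample line in the style of the Common Log Format documentation
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = parse_clf(sample)
print(fields["host"], fields["request"], fields["status"])
```

Returning `None` for lines that do not match keeps the defensive style used in the first exercise.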

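The problems to solve in this example are as follows.
1. Find the number of hits to a specific page.
2. Find the number of hits made by a specific IP address.
3. Find the path of the most popular file and its hit count.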

In [2]:
# As before, the code is split into three parts: Mapper, Sorter, and Reducer.

# Mapper

import sys

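for line in sys.stdin :
    # Fields after splitting on whitespace:
    # ip, identity, username, time, zone, method, path, protocol, status, size
    data = line.strip().split()

    # Skip malformed lines that have no path field
    if len(data) < 7 :
        continue
    ip, path = data[0], data[6]
    print(path, ip)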


In [3]:
# Not sure whether this Sorter is efficient!

import sys

temp_list = list()

for line in sys.stdin :
    path, ip = line.strip().split()
    temp_list.append([path, ip])

temp_list = sorted(temp_list, key = lambda temp_list : temp_list[0])

for item in temp_list :
    print(item)


In [4]:
# Reducer
import sys
import re

rule = re.compile(r'\[[\'\"](.*[\']?)[\'\"], \'(.*)\'\]')
oldKey = None
hitTotal = 0


for line in sys.stdin :
    match = rule.match(line.strip())

    # Stop and report the line if it does not match the sorter's list format
    if match is None :
        print("NOT WORKING", line)
        break
    thisKey = match.group(1)
    if oldKey and oldKey != thisKey :
        print("{0}\t{1}".format(oldKey, hitTotal))
        hitTotal = 0

    hitTotal += 1
    oldKey = thisKey

if oldKey != None :
    print("{0}\t{1}".format(oldKey, hitTotal))


In [3]:
! cat access_log.ignore | python3 Project2_mapper.py | python3 Project2_sorter.py| \
python3 Project2_reducer.py  | grep "/assets/js/the-associates.js"

/assets/js/the-associates.js	2456


The remaining problems can be solved with variations of the code above! Hadoop is fun!
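As one such variation, counting hits per IP address only requires the Mapper to emit the IP as the key. The sketch below runs the three steps in memory; the function names and the sample log lines are hypothetical, made up for illustration.

```python
def map_ip(log_lines):
    """Mapper variant: emit the client IP (first field) as the key."""
    for line in log_lines:
        data = line.strip().split()
        if len(data) >= 7:  # same defensive check as for the path
            yield data[0]

def reduce_counts(sorted_keys):
    """Reducer: count consecutive identical keys in a sorted stream."""
    old_key, hits = None, 0
    for key in sorted_keys:
        if old_key is not None and old_key != key:
            yield old_key, hits
            hits = 0
        old_key = key
        hits += 1
    if old_key is not None:
        yield old_key, hits  # emit the last key after the loop

# Hypothetical log lines, including one malformed line
sample_log = [
    '10.0.0.1 - - [01/Jan/2012:09:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2012:09:00:01 +0000] "GET /about.html HTTP/1.1" 200 256',
    '10.0.0.1 - - [01/Jan/2012:09:00:02 +0000] "GET /about.html HTTP/1.1" 304 -',
    'broken line',
]
ip_hits = list(reduce_counts(sorted(map_ip(sample_log))))
print(ip_hits)
```

The Reducer logic is the same as before; only the key chosen by the Mapper changes.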

In [4]:
# For the tricky last problem, sort by hit count in descending order as below,
! cat answer.txt | sort --key 2nr

/assets/css/combined.css	117348
/assets/js/javascript_combined.js	106818
/	99303
/assets/img/home-logo.png	98744
/assets/css/printstyles.css	93158
/images/filmpics/0000/3695/Pelican_Blood_2D_Pack.jpg	91933
/favicon.ico	66831
/robots.txt	51975
/images/filmpics/0000/3139/SBX476_Vanquisher_2d.jpg	39591
/assets/img/search-button.gif	38990
/assets/img/play_icon.png	34151
/images/filmmediablock/290/Harpoon_2d.JPG	32533
/assets/img/x.gif	29377
/images/filmpics/0000/1421/RagingPhoenix_2DSleeve.jpeg	29243
/release-schedule/	25937
/assets/img/release-schedule-logo.png	24292
/search/	23055
/assets/img/banner/ten-years-banner-grey.jpg	22129
/assets/img/banner/ten-years-banner-white.jpg	22121
/assets/img/banner/ten-years-banner.png	21930
/release-schedule	18940
/assets/img/banner/ten-years-banner-black.jpg	17208
/images/filmmediablock/293/NewsMakers_2DBluRay.jpeg	13466
/images/clientlogos/0000/0010/Manga.jpg	11503
/images/filmmediablock/39/bluray_pontypool2d_new.jpg	11318
[Remaining output omitted: the notebook stopped sending it with the notice "IOPub data rate exceeded".]


In [5]:
# Look up the most requested file as follows.
! cat answer.txt | grep "/assets/css/combined.css"

/assets/css/combined.css	117348
http://www.the-associates.co.uk/assets/css/combined.css	4


However, this is not a fundamental solution, and it is likely to contain many errors.
How should the case where http is prepended be expressed with a regex? Let's think about it later..
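One possible answer, offered as a sketch rather than a definitive fix: strip a leading scheme and host with a regex before counting, so that both forms of the path share one key. The names `ABSOLUTE_PREFIX`, `normalize_path`, and `merge_counts` are assumptions, and the sketch treats every absolute URL as pointing at the same site.

```python
import re
from collections import Counter

# Matches a leading scheme and host such as "http://www.example.com"
ABSOLUTE_PREFIX = re.compile(r'^https?://[^/]+')

def normalize_path(path):
    """Turn an absolute URL into a site-relative path; leave other paths as-is."""
    return ABSOLUTE_PREFIX.sub('', path) or '/'

def merge_counts(lines):
    """Sum hit counts from "path<TAB>count" lines after normalizing each path."""
    totals = Counter()
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip malformed lines
        totals[normalize_path(fields[0])] += int(fields[1])
    return totals

# Lines taken from the output shown above
sample = [
    "/assets/css/combined.css\t117348",
    "http://www.the-associates.co.uk/assets/css/combined.css\t4",
    "/assets/js/javascript_combined.js\t106818",
]
totals = merge_counts(sample)
print(totals.most_common(1))
```

The same normalization could be applied in the Mapper before the key is emitted, so that the Reducer counts both forms together.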