### Scrapy
- pip install scrapy
- Contents
    - Creating a Scrapy project
    - Basic structure of a Scrapy project
    - Writing Scrapy code
    - Running Scrapy
    - Configuring pipelines

In [1]:
# 1. Create a Scrapy project

In [2]:
!scrapy startproject crawler

New Scrapy project 'crawler', using template directory '/usr/local/anaconda3/lib/python3.7/site-packages/scrapy/templates/project', created in:
    /Users/rada/Documents/lecture/dss/dss_13/code/dss_13/04_scrapy/crawler

You can start your first spider with:
    cd crawler
    scrapy genspider example example.com


- mac : brew install tree
- ubuntu : sudo apt-get install tree

In [4]:
!tree crawler

crawler
├── crawler
│   ├── __init__.py
│   ├── __pycache__
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── __pycache__
└── scrapy.cfg

4 directories, 7 files


In [5]:
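# 2. Scrapy project structure

- spiders directory
    - Holds the code that specifies which sites to crawl and in what order
    - Multiple spiders can be created, each as its own module.
- items.py
    - Defines the data-structure classes used to store the data extracted from web pages
- pipelines.py
    - Holds the code that processes the crawled data
- settings.py
    - Settings for the Scrapy project, e.g. whether or not to obey robots.txt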
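As a rough illustration of the `pipelines.py` role, here is a minimal item pipeline sketch. Scrapy calls `process_item(item, spider)` on every enabled pipeline for each scraped item. The class name `PriceCleanPipeline`, the comma-stripping logic, and the `myproject` module path are hypothetical, and the sketch assumes items carry price strings such as `'59,800'`.

```python
class PriceCleanPipeline:
    """Hypothetical pipeline: turn price strings like '59,800' into integers."""

    def process_item(self, item, spider):
        for key in ("o_price", "s_price"):
            value = item.get(key)
            if isinstance(value, str):
                # Drop the thousands separators before converting.
                item[key] = int(value.replace(",", ""))
        # Returning the item passes it on to the next pipeline (or the exporter).
        return item


# To enable it, the class would be registered in settings.py, for example:
# ITEM_PIPELINES = {"myproject.pipelines.PriceCleanPipeline": 300}
```

The number in `ITEM_PIPELINES` sets the order in which pipelines run, with lower values running first.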

In [8]:
# 3. Writing the Scrapy code
# Crawl the product links > crawl the title, sale price, and original price from each detail page

In [9]:
# Procedure
# 1. Analyze the web page and find the selectors (XPath)
# 2. Create the Scrapy project and write the code (items > spider > pipeline)
# 3. Run Scrapy (crawl)

In [10]:
# Selector for the links of the 200 best-seller products

In [12]:
import requests
from scrapy.http import TextResponse

In [None]:
# XPath of the first item in the best-seller list:
# //*[@id="gBestWrap"]/div/div[3]/div[2]/ul/li[1]

In [13]:
req = requests.get("http://corners.gmarket.co.kr/Bestsellers")
response = TextResponse(req.url, body=req.text, encoding="utf-8")
response

<200 http://corners.gmarket.co.kr/Bestsellers>

In [17]:
links = response.xpath(
    '//*[@id="gBestWrap"]/div/div[3]/div[2]/ul/li/div[1]/a/@href').extract()
links[-3:]

['http://item.gmarket.co.kr/Item?goodscode=1373677338&ver=637293862628021789',
 'http://item.gmarket.co.kr/Item?goodscode=1791767432&ver=637293862628021789',
 'http://item.gmarket.co.kr/Item?goodscode=1515939133&ver=637293862628021789']

In [11]:
# Collect the title, sale price, and original price from the detail page

In [27]:
link = links[2]  # product at index 2: a case with no o_price

In [28]:
req = requests.get(link)
response = TextResponse(req.url, body=req.text, encoding="utf-8")
response

<200 http://item.gmarket.co.kr/Item?goodscode=1837751815&ver=637293862628021789>

In [30]:
# Extract the title and the sale price.
title = response.xpath('//*[@id="itemcase_basic"]/h1/text()')[0].extract().strip()
s_price = response.xpath('//*[@id="itemcase_basic"]/p/span/strong/text()')[0].extract()
try:
    o_price = response.xpath('//*[@id="itemcase_basic"]/p/span/span/text()')[0].extract()
except IndexError:
    # No original-price element on the page: fall back to the sale price.
    o_price = s_price
title, s_price, o_price

('[일월] 일월 보건용마스크 대형  KF80  60매', '59,800', '59,800')

In [None]:
# Create the project

In [31]:
!scrapy startproject gmarket

New Scrapy project 'gmarket', using template directory '/usr/local/anaconda3/lib/python3.7/site-packages/scrapy/templates/project', created in:
    /Users/rada/Documents/lecture/dss/dss_13/code/dss_13/04_scrapy/gmarket

You can start your first spider with:
    cd gmarket
    scrapy genspider example example.com


In [36]:
!tree gmarket/

gmarket/
├── gmarket
│   ├── __init__.py
│   ├── __pycache__
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── __pycache__
└── scrapy.cfg

4 directories, 7 files


In [35]:
# 1. Write items.py: title, product link, original price, sale price

In [38]:
%%writefile gmarket/gmarket/items.py
import scrapy

class GmarketItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    o_price = scrapy.Field()
    s_price = scrapy.Field()

Overwriting gmarket/gmarket/items.py


In [34]:
# 2. Write spider.py

In [58]:
%%writefile gmarket/gmarket/spiders/spider.py
import scrapy
from gmarket.items import GmarketItem

class Spider(scrapy.Spider):
    name = "GmarketBestsellers"
    allowed_domains = ["gmarket.co.kr"]
    start_urls = ["http://corners.gmarket.co.kr/Bestsellers"]

    def parse(self, response):
        # Collect the detail-page link of every product on the best-seller page.
        links = response.xpath(
            '//*[@id="gBestWrap"]/div/div[3]/div[2]/ul/li/div[1]/a/@href').extract()
        for link in links:
            yield scrapy.Request(link, callback=self.parse_content)

    def parse_content(self, response):
        # Extract the title and prices from a product detail page.
        item = GmarketItem()
        item["title"] = response.xpath('//*[@id="itemcase_basic"]/h1/text()')[0].extract().strip()
        item["s_price"] = response.xpath('//*[@id="itemcase_basic"]/p/span/strong/text()')[0].extract()
        try:
            item["o_price"] = response.xpath('//*[@id="itemcase_basic"]/p/span/span/text()')[0].extract()
        except IndexError:
            # No original-price element: fall back to the sale price.
            item["o_price"] = item["s_price"]
        item["link"] = response.url
        yield item

Overwriting gmarket/gmarket/spiders/spider.py


In [59]:
# 3. Run Scrapy
# Run $ scrapy crawl <spider name> from the directory that contains scrapy.cfg
# Steps
# $ cd gmarket
# $ scrapy crawl GmarketBestsellers

In [62]:
%%writefile run.sh
cd gmarket
scrapy crawl GmarketBestsellers

Overwriting run.sh


In [63]:
# windows: run directly in git bash
# mac: !source run.sh
# ubuntu: !./run.sh
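Because the way to launch `run.sh` differs by platform, one possible alternative is to start the crawl from Python with the standard library's `subprocess` module, passing the project directory as the working directory. This is only a sketch: the helper names `build_crawl_command` and `run_crawl` are hypothetical, and it assumes the `scrapy` executable is on the `PATH`.

```python
import subprocess


def build_crawl_command(spider_name, output_path=None):
    """Build the argument list for `scrapy crawl`, optionally with `-o <file>`."""
    cmd = ["scrapy", "crawl", spider_name]
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def run_crawl(project_dir, spider_name, output_path=None):
    """Run the crawl from the directory that contains scrapy.cfg."""
    return subprocess.run(
        build_crawl_command(spider_name, output_path),
        cwd=project_dir,   # equivalent to `cd gmarket` in run.sh
        check=True,        # raise CalledProcessError on a non-zero exit code
    )


# Usage (not executed here):
# run_crawl("gmarket", "GmarketBestsellers")
```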

In [67]:
# ubuntu: grant execute permission to the script
# !chmod 777 run.sh

In [64]:
!source run.sh

2020-07-03 16:00:13 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: gmarket)
2020-07-03 16:00:13 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.6 (default, Jan  8 2020, 13:42:34) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1  11 Sep 2018), cryptography 2.8, Platform Darwin-19.4.0-x86_64-i386-64bit
2020-07-03 16:00:13 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-07-03 16:00:13 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'gmarket',
 'NEWSPIDER_MODULE': 'gmarket.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['gmarket.spiders']}
2020-07-03 16:00:13 [scrapy.extensions.telnet] INFO: Telnet Password: b00d65ad346aee31
2020-07-03 16:00:13 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',


2020-07-03 16:00:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1824852880&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:17 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1824852880&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1824852880&ver=637293888132890118',
 'o_price': '17,700',
 's_price': '17,700',
 'title': 'Percussive 마사지건 해드 4개 글로벌코드 /무료배송'}
2020-07-03 16:00:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1797275897&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:17 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1797275897&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1797275897&ver=637293888132890118',
 'o_price': '35,000',
 's_price': '28,000',
 'title': '[프리메

2020-07-03 16:00:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1647468449&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1647468449&ver=637293888132890118',
 'o_price': '25,500',
 's_price': '25,500',
 'title': '[브리타] 막스트라 플러스 필터 3개월분 +1개월분 (총 4개월분)'}
2020-07-03 16:00:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1783319409&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1671062830&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1783319409&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1783319409&ver=637293888132890118',
 'o_price': '25,900',
 's_price': '16,900',
 'title': '

2020-07-03 16:00:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1295855591&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:23 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1430261089&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1430261089&ver=637293888132890118',
 'o_price': '26,000',
 's_price': '25,220',
 'title': '[반에이크] 모다아울렛 반에이크 백화점동일상품 여름티셔츠外 1만원대부터 무료배송까지'}
2020-07-03 16:00:23 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1295855591&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1295855591&ver=637293888132890118',
 'o_price': '23,000',
 's_price': '6,900',
 'title': '[에비수] 에비수 브랜드 반팔티/티셔츠/반바지/청바지'}
2020-07-03 16:00:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1686035302&ver=637293888132890118> (referer: ht

2020-07-03 16:00:26 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1540651504&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1540651504&ver=637293888132890118',
 'o_price': '45,000',
 's_price': '23,900',
 'title': '[인포피아] 글루코랩 혈당측정기 +시험지110+침110+솜100'}
2020-07-03 16:00:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1282969118&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1836438664&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:27 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1282969118&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1282969118&ver=637293888132890118',
 'o_price': '15,900',
 's_price': '13,900',
 'title': '[유

2020-07-03 16:00:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1677850959&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1677850959&ver=637293888132890118',
 'o_price': '16,900',
 's_price': '16,900',
 'title': '[농심] 김치사발면 86g 24개 한박스'}
2020-07-03 16:00:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1543095376&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1543095376&ver=637293888132890118',
 'o_price': '25,900',
 's_price': '22,900',
 'title': '[BYO] CJ BYO 20억 생유산균 30포 x 3개 (총90포)'}
2020-07-03 16:00:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1583017269&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1583017269&ver=637293888132890118>
{'link': 'http://item.gmark

2020-07-03 16:00:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1607479365&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1607479365&ver=637293888132890118',
 'o_price': '15,500',
 's_price': '13,900',
 'title': '[의성마늘햄] 롯데 의성마늘 프랑크 70gx20개'}
2020-07-03 16:00:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=399488337&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=399488337&ver=637293888132890118',
 'o_price': '23,000',
 's_price': '6,900',
 'title': '[젤리스푼] 아동복/여아의류/아동바지/득템찬스 6900원 균일가'}
2020-07-03 16:00:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1586873578&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:33 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1586873578&ver=637293888132890118>
{'link': 'http://item.gmark

2020-07-03 16:00:36 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=845078469&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=845078469&ver=637293888132890118',
 'o_price': '180,000',
 's_price': '99,000',
 'title': '씨투엠에듀  지오플릭 교구 (가이드북포함)'}
2020-07-03 16:00:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1678618377&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1674006850&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:36 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1678618377&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1678618377&ver=637293888132890118',
 'o_price': '48,000',
 's_price': '17,800',
 'title': '[페리오] 치약 뉴 후레쉬 

2020-07-03 16:00:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1830306098&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1830306098&ver=637293888132890118',
 'o_price': '36,900',
 's_price': '36,900',
 'title': '동아 전과 / 백점 국과사/ 백점 전과목 세트 2학기 선택구매 (1-2. 2-2. 3-2. 4-2. 5-2. 6-2)'}
2020-07-03 16:00:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1818149347&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1818149347&ver=637293888132890118',
 'o_price': '19,800',
 's_price': '15,840',
 'title': '[오랄비] 오랄비 왁스 치실 10개'}
2020-07-03 16:00:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1835596642&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1321028651&ver=637293888132890118>
{'

2020-07-03 16:00:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1833787752&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1832601541&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1677882918&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:42 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1833787752&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1833787752&ver=637293888132890118',
 'o_price': '34,900',
 's_price': '34,100',
 'title': '[대우] 대우 에어 써큘레이터DEF-KC1020스탠드선풍기 공기순환'}
2020-07-03 16:00:42 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.

2020-07-03 16:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1674335126&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1656817974&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1684656499&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:45 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1674335126&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1674335126&ver=637293888132890118',
 'o_price': '32,000',
 's_price': '16,900',
 'title': '터치세븐 디자인 마카 드로잉 마카펜 80색 / 전문가용'}
2020-07-03 16:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item

2020-07-03 16:00:49 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1617952407&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1617952407&ver=637293888132890118',
 'o_price': '32,900',
 's_price': '31,270',
 'title': '[크록스] 모다아울렛 바야밴드  클로그 공용 샌들 205089-4CC'}
2020-07-03 16:00:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1728087311&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:49 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1728087311&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1728087311&ver=637293888132890118',
 'o_price': '40,000',
 's_price': '20,000',
 'title': '[에뛰드하우스] Beauty BIG SALE 30~50% + 중복쿠폰발급'}
2020-07-03 16:00:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1805913808&ver=637293888132890118> (referer

2020-07-03 16:00:51 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1795133051&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1795133051&ver=637293888132890118',
 'o_price': '29,800',
 's_price': '11,900',
 'title': '미식상회 참기름 대용량 350ml'}
2020-07-03 16:00:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1833813652&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1829759718&ver=637293888132890118> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:00:51 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1833813652&ver=637293888132890118>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1833813652&ver=637293888132890118',
 'o_price': '6,440',
 's_price': '5,900',
 'title': '일회용 멜트 블로운 3중 필터 마스크 

In [65]:
# Save the results to CSV

In [68]:
%%writefile run.sh
cd gmarket
scrapy crawl GmarketBestsellers -o gmarket200.csv

Overwriting run.sh
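The `-o` option is the command-line way to export the scraped items. In Scrapy 2.1 and later the same export can also be declared in `settings.py` through the `FEEDS` setting. The fragment below is a sketch under that assumption, reusing the `gmarket200.csv` file name; the `encoding` entry is an optional addition, not something the original run used.

```python
# gmarket/gmarket/settings.py (fragment)
# Export every scraped item to a CSV file, relative to the directory
# from which `scrapy crawl` is run.
FEEDS = {
    "gmarket200.csv": {
        "format": "csv",
        "encoding": "utf-8",  # assumption: write the file as UTF-8
    },
}
```

With this in place, `scrapy crawl GmarketBestsellers` would write the file without needing `-o`.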


In [69]:
!source run.sh

2020-07-03 16:03:26 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: gmarket)
2020-07-03 16:03:26 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.6 (default, Jan  8 2020, 13:42:34) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1  11 Sep 2018), cryptography 2.8, Platform Darwin-19.4.0-x86_64-i386-64bit
2020-07-03 16:03:26 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-07-03 16:03:26 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'gmarket',
 'NEWSPIDER_MODULE': 'gmarket.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['gmarket.spiders']}
2020-07-03 16:03:26 [scrapy.extensions.telnet] INFO: Telnet Password: 7e71552f490d34bd
2020-07-03 16:03:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',


2020-07-03 16:03:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1829671455&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1829671455&ver=637293890068580782',
 'o_price': '59,800',
 's_price': '59,800',
 'title': '(행사) LEZEN 르젠 2세대 BLDC 선풍기 LZEF-DC02'}
2020-07-03 16:03:29 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1430266775&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1430266775&ver=637293890068580782',
 'o_price': '43,000',
 's_price': '12,900',
 'title': '시크루즈/빅세일20%/원피스/SET아이템/바지/빅사이즈'}
2020-07-03 16:03:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1229744343&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1797275897&ver=637293890068580782> (referer: http://cor

2020-07-03 16:03:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1647468449&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1647468449&ver=637293890068580782',
 'o_price': '25,500',
 's_price': '25,500',
 'title': '[브리타] 막스트라 플러스 필터 3개월분 +1개월분 (총 4개월분)'}
2020-07-03 16:03:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=373058782&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=373058782&ver=637293890068580782',
 'o_price': '33,000',
 's_price': '9,900',
 'title': '다온샵 20%빅세일 여름청바지/밴딩/린넨/반바지 3XL~'}
2020-07-03 16:03:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1783319409&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1783319409&ver=637293890068580782',
 'o_price': '25,900',
 's_price': '16,900',
 'title': '[스카트] 스카트 THE 보송 습기제거제 20개(1BOX)'}
2020-07-03 16:03:32 [scrapy.core.scrape

2020-07-03 16:03:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1295855591&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1295855591&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1295855591&ver=637293890068580782',
 'o_price': '23,000',
 's_price': '6,900',
 'title': '[에비수] 에비수 브랜드 반팔티/티셔츠/반바지/청바지'}
2020-07-03 16:03:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1430261089&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1566013256&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?g

2020-07-03 16:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1540651504&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1540651504&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1540651504&ver=637293890068580782',
 'o_price': '45,000',
 's_price': '23,900',
 'title': '[인포피아] 글루코랩 혈당측정기 +시험지110+침110+솜100'}
2020-07-03 16:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1836438664&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1836438664&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1836438664&ver=637293890068580782',
 'o_price': '29,900',
 's_price': '29,900',
 'title': 'KF

2020-07-03 16:03:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1677850959&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:41 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1677850959&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1677850959&ver=637293890068580782',
 'o_price': '16,900',
 's_price': '16,900',
 'title': '[농심] 김치사발면 86g 24개 한박스'}
2020-07-03 16:03:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1543095376&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=818391700&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:41 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscod

2020-07-03 16:03:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1607479365&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=399488337&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:44 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1607479365&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1607479365&ver=637293890068580782',
 'o_price': '15,500',
 's_price': '13,900',
 'title': '[의성마늘햄] 롯데 의성마늘 프랑크 70gx20개'}
2020-07-03 16:03:44 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=399488337&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=399488337&ver=637293890068580782',
 'o_price': '23,000',
 's_price': '6,900',
 'title': '[젤리스푼] 아동복/여아의

2020-07-03 16:03:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1731966908&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:46 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1731966908&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1731966908&ver=637293890068580782',
 'o_price': '18,000',
 's_price': '18,000',
 'title': 'Percussive 마사지건 국내220v  해드 4개'}
2020-07-03 16:03:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=845078469&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:47 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=845078469&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=845078469&ver=637293890068580782',
 'o_price': '180,000',
 's_price': '99,000',
 'title': '씨투엠에듀  지오플

2020-07-03 16:03:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1835596642&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=484828506&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1170196458&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1807209685&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:50 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1835596642&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1835596642&ver=637293890068

2020-07-03 16:03:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=775838866&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:53 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1677882918&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1677882918&ver=637293890068580782',
 'o_price': '99,000',
 's_price': '29,900',
 'title': '20%오가닉순면/레터링/린넨 티셔츠 7종세트'}
2020-07-03 16:03:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1833787752&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:53 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=775838866&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=775838866&ver=637293890068580782',
 'o_price': '96,000',
 's_price': '28,800',
 'title': '[닥터자르트] 빅세일 20%+

2020-07-03 16:03:56 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1656817974&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1656817974&ver=637293890068580782',
 'o_price': '37,900',
 's_price': '37,530',
 'title': '[에이지투웨니스] 에센스 커버팩트 오리지널 케이스1+리필3'}
2020-07-03 16:03:56 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1787960274&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1787960274&ver=637293890068580782',
 'o_price': '46,000',
 's_price': '13,800',
 'title': '브리치x한미미 20%+7%쿠폰 티셔츠/블라우스/원피스'}
2020-07-03 16:03:56 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=767588368&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=767588368&ver=637293890068580782',
 'o_price': '33,000',
 's_price': '9,900',
 'title': '14K귀걸이(Gold-Pin) 외 로즈골드 쥬얼리 특가전'}
2020-07-03 16:03:56 [scrapy.core.scraper] DEBUG

2020-07-03 16:03:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1835442554&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1835442554&ver=637293890068580782',
 'o_price': '50,000',
 's_price': '42,000',
 'title': '[웅진주니어] 똑똑한 유아 독해 어휘 시리즈 단계별 선택구매 (1/2/3 단계)'}
2020-07-03 16:03:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1617952407&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1617952407&ver=637293890068580782',
 'o_price': '32,900',
 's_price': '31,270',
 'title': '[크록스] 모다아울렛 바야밴드  클로그 공용 샌들 205089-4CC'}
2020-07-03 16:03:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=173366583&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:03:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=173366583&ver=637293890068580782>
{'link

2020-07-03 16:04:01 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1837171356&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1837171356&ver=637293890068580782',
 'o_price': '28,600',
 's_price': '28,600',
 'title': '(예약판매) 크래비티 CRAVITY 2020 썸머 패키지 COME TOGETHER (REST VER) / 발매일 : 7월 '
          '31일'}
2020-07-03 16:04:01 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1795133051&ver=637293890068580782>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1795133051&ver=637293890068580782',
 'o_price': '29,800',
 's_price': '11,900',
 'title': '미식상회 참기름 대용량 350ml'}
2020-07-03 16:04:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1829759718&ver=637293890068580782> (referer: http://corners.gmarket.co.kr/Bestsellers)
2020-07-03 16:04:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1833813652&ver=6372

In [70]:
!ls gmarket/

gmarket        gmarket200.csv scrapy.cfg


In [71]:
import pandas as pd

pd.read_csv("gmarket/gmarket200.csv")

Unnamed: 0,link,o_price,s_price,title
0,http://item.gmarket.co.kr/Item?goodscode=18332...,26900,26900,국내생산) 3중 MB필터 덴탈마스크 50매/일회용
1,http://item.gmarket.co.kr/Item?goodscode=15452...,120000,108000,설화수 자음 2종세트/윤조에센스 외 기획세트 모음
2,http://item.gmarket.co.kr/Item?goodscode=18353...,27500,9900,블랑슈 센시티브 아기물티슈 캡형 100매x10팩
3,http://item.gmarket.co.kr/Item?goodscode=16706...,12900,3900,비니수 기획특가 여름신상모음전
4,http://item.gmarket.co.kr/Item?goodscode=18113...,104000,52500,[라코스테] 라코스테 썸머 시즌오프 X 빅세일 티셔츠외/아이템 모
...,...,...,...,...
195,http://item.gmarket.co.kr/Item?goodscode=17006...,46300,13900,브리치X씨샵인더룸 20%+7%쿠폰 티셔츠/블라우스/팬츠
196,http://item.gmarket.co.kr/Item?goodscode=18277...,49000,36460,[네파키즈] (신세계강남점) 네파키즈 20SS 파스텔로 윈드자켓 KGD0610
197,http://item.gmarket.co.kr/Item?goodscode=15449...,20550,12900,슬라이스 쪽갈비 400g+400g+400g
198,http://item.gmarket.co.kr/Item?goodscode=16083...,80000,69720,(홈스쿨링 학습지특가) 원리한글 + 원리수학 SET / 영재와창의 (옵션선택)




ServerSelectionTimeoutError: your_public_ip:27017: [Errno 8] nodename nor servname provided, or not known