A web crawler that fetches K-pop song details and lyrics from top charts

kpop_crawler

kpop_crawler is a web crawler that fetches song details and lyrics from Korean top-chart websites. It is built on the Scrapy framework, and its spiders subclass scrapy.Spider.

Note: This project is in its early stages and is subject to (very) frequent changes. That said, feel free to use it or refer to it for your own purposes!

Output

The bugs_chart spider fetches information in the following structure:

| attribute | example value |
| --- | --- |
| date | 20170706 |
| rank | 1 |
| title | 팔레트 (Feat. G-DRAGON) |
| lyrics | 이상하게도 요즘엔 그냥 쉬운 게 좋아\r\n... |
| artist | 아이유(IU) |
| featuring | G-DRAGON |
| composer | 아이유(IU) |
| lyricist | 아이유(IU) |
| arranger | [empty] |
| album | Palette |
| time | 03:37 |
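
For downstream processing, it can help to turn the string values listed above into typed fields. The following is only a sketch: the `SongRecord` class and its `from_raw` helper are hypothetical and not part of kpop_crawler, and it assumes that `date` is always formatted as `YYYYMMDD` and `time` as `mm:ss`, as in the example values.

```python
import datetime as dt
from dataclasses import dataclass

# Attributes that are kept as plain strings.
TEXT_FIELDS = ("title", "lyrics", "artist", "featuring",
               "composer", "lyricist", "arranger", "album")


@dataclass
class SongRecord:
    """One chart entry, using the attribute names from the table above."""
    date: dt.date
    rank: int
    title: str
    lyrics: str
    artist: str
    featuring: str
    composer: str
    lyricist: str
    arranger: str
    album: str
    time: int  # track length in seconds

    @classmethod
    def from_raw(cls, raw):
        """Build a record from a dict of strings (hypothetical helper)."""
        # Assumption: "time" is always "mm:ss", e.g. "03:37".
        minutes, seconds = raw["time"].split(":")
        # Missing text attributes become empty strings.
        text = {name: raw.get(name, "") for name in TEXT_FIELDS}
        return cls(
            # Assumption: "date" is always "YYYYMMDD", e.g. "20170706".
            date=dt.datetime.strptime(raw["date"], "%Y%m%d").date(),
            rank=int(raw["rank"]),
            time=int(minutes) * 60 + int(seconds),
            **text,
        )
```

Converting `rank` and `time` to integers makes it straightforward to sort entries or add up track lengths, which is awkward with the raw strings.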

Sources

Information is scraped from the following websites:

Dependencies

Usage

From the project's base directory (the one containing scrapy.cfg), run the following Scrapy command in your terminal:

scrapy crawl bugs-chart -o lyrics.csv

To save the output in JSON format instead, run:

scrapy crawl bugs-chart -o lyrics.json
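
Either output file can then be read back with Python's standard library. The sketch below is not part of kpop_crawler; `load_songs` is a hypothetical helper. It assumes the files follow Scrapy's usual feed-export layout: a JSON array of objects for `.json`, and a UTF-8 CSV with a header row for `.csv`.

```python
import csv
import json
from pathlib import Path


def load_songs(path):
    """Return the crawled records as a list of dicts (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".json":
        # Assumption: the file holds a single JSON array of objects.
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if path.suffix == ".csv":
        # newline="" lets the csv module handle line breaks inside lyrics.
        with path.open(encoding="utf-8", newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported output format: {path.suffix}")


if __name__ == "__main__":
    for song in load_songs("lyrics.json"):
        print(song["rank"], song["title"], song["artist"])
```

Note that values read from CSV are always strings, whereas JSON preserves whatever types the spider emitted.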

Disclaimer

The crawler is meant to be used for non-commercial purposes only (e.g. for sating one's own curiosity), and the information it fetches should not be shared without the permission of its rightful owners. When using the crawler, send requests at a reasonable rate.
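
One way to keep the request rate reasonable is through Scrapy's built-in throttling settings. The excerpt below is a sketch with illustrative values, not the project's actual configuration. It assumes the settings live in `kpop_crawler/settings.py`, as in a standard Scrapy project layout.

```python
# kpop_crawler/settings.py (excerpt; values are illustrative assumptions)

# Respect the target site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait about two seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0

# Never send more than one request at a time to a given domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 30.0
```

With AutoThrottle enabled, `DOWNLOAD_DELAY` acts as a lower bound on the delay, so the crawler slows down when the server is slow but never goes faster than the configured minimum.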
