The Scrapy Movie Subtitles Scraper is a Python project built on Scrapy, a popular web crawling and scraping framework. It extracts movie data (titles, plots, and scripts) from the website subslikescript.com. The extracted data can be stored in either a MongoDB Atlas database or a SQLite database, which demonstrates how scraped items can be dumped into two different storage backends.
- Clone this repository:
git clone https://github.com/zararashraf/ScrapyMoviesSubtitlesScraper.git
- Install the required libraries:
pip install scrapy pymongo
- Configure the MongoDB connection string and the SQLite database settings in pipelines.py.
- Run the spider by executing the command:
scrapy crawl transcripts
- Check the output data in the configured MongoDB Atlas or SQLite database.
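For the last step, a quick way to confirm that rows actually landed in SQLite is to list each table with its row count. The helper below is only a sketch: the function name `summarize_tables` is hypothetical, and it makes no assumption about the table names the project's pipeline creates, since it reads them from the database itself.

```python
import sqlite3


def summarize_tables(conn):
    """Return {table_name: row_count} for every user table in the database."""
    cur = conn.cursor()
    # sqlite_master lists all schema objects; skip SQLite's internal tables.
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    counts = {}
    for (name,) in cur.fetchall():
        # Names come from the schema itself, so quoting them is enough here.
        cur.execute(f'SELECT COUNT(*) FROM "{name}"')
        counts[name] = cur.fetchone()[0]
    return counts
```

To use it, open the database file configured in the pipeline with `sqlite3.connect(...)`, pass the connection to `summarize_tables`, and print the result. An empty dictionary would suggest the SQLite pipeline did not run or wrote to a different file.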
The project consists of the following key components:
- spiders/transcripts.py: Contains the Scrapy spider for scraping movie data from subslikescript.com.
- pipelines.py: Includes two pipelines for dumping the scraped data into a MongoDB Atlas database and a SQLite database.
- requirements.txt: Lists all the required dependencies for the project.
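To illustrate what a SQLite pipeline of this kind might look like, here is a minimal sketch. It is not the project's actual code: the class name `SQLitePipeline`, the default file name `transcripts.db`, the table name, and the item fields `title`, `plot`, and `script` are all assumptions based on the data described above. The three methods follow Scrapy's item pipeline interface (`open_spider`, `process_item`, `close_spider`), which needs no base class.

```python
import sqlite3


class SQLitePipeline:
    """Sketch of an item pipeline that stores scraped movies in SQLite."""

    def __init__(self, db_path="transcripts.db"):
        # Hypothetical default; the real project may use another file name.
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: connect and ensure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "title TEXT, plot TEXT, script TEXT)"
        )
        self.conn.commit()

    def process_item(self, item, spider):
        # Called for every scraped item; parameters are bound, not interpolated.
        self.conn.execute(
            "INSERT INTO transcripts (title, plot, script) VALUES (?, ?, ?)",
            (item.get("title"), item.get("plot"), item.get("script")),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

A pipeline like this only runs if it is listed in the `ITEM_PIPELINES` setting in the project's settings.py, keyed by its import path with an integer priority.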
[Screenshot: data in the SQLite DB]
- Python 3.x
- Scrapy for web scraping
- pymongo for interacting with MongoDB
- sqlite3 for working with SQLite databases
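As a rough illustration of how pymongo from the list above could be used in a Scrapy pipeline, consider the sketch below. The class name `MongoPipeline`, the placeholder connection string, and the database and collection names are hypothetical and would need to be replaced with your own MongoDB Atlas values. The pymongo calls used (`MongoClient`, `insert_one`, `close`) are standard, but the project's real pipeline may be organized differently.

```python
class MongoPipeline:
    """Sketch of an item pipeline that stores scraped movies in MongoDB."""

    def __init__(
        self,
        uri="mongodb+srv://<user>:<password>@<cluster-host>/",  # placeholder
        db_name="movies",                # hypothetical database name
        collection_name="transcripts",   # hypothetical collection name
    ):
        self.uri = uri
        self.db_name = db_name
        self.collection_name = collection_name

    def open_spider(self, spider):
        # Imported here so the class can be defined without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def process_item(self, item, spider):
        # Convert the item to a plain dict and store it as one document.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```

Keeping the connection string out of source control, for example by reading it from an environment variable, is a common precaution when the target is a hosted Atlas cluster.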
The source code for this project can be found on GitHub: https://github.com/zararashraf/ScrapyMoviesSubtitlesScraper
- Scrapy: https://scrapy.org/
- pymongo: https://pypi.org/project/pymongo/
- SQLite: https://www.sqlite.org/index.html
This project is open-source and available under the MIT License. Feel free to use, modify, and distribute the code as needed.