


zararashraf/ScrapyMoviesSubtitlesScraper


Scrapy Movie Subtitles Scraper

Description

The Scrapy Movie Subtitles Scraper is a Python project built on Scrapy, a popular web crawling and web scraping framework. It extracts movie data, including titles, plots, and scripts, from subslikescript.com. The extracted data can be stored in a MongoDB Atlas database or a SQLite database through the project's item pipelines.

Installation

  1. Clone this repository and change into its directory: git clone https://github.com/zararashraf/ScrapyMoviesSubtitlesScraper.git && cd ScrapyMoviesSubtitlesScraper
  2. Install the required libraries: pip install scrapy pymongo (or pip install -r requirements.txt)
  3. Configure the MongoDB connection string and the SQLite database settings in pipelines.py.
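Scrapy only runs pipelines that are listed in the ITEM_PIPELINES setting of settings.py. A minimal sketch of that configuration follows. It assumes hypothetical class paths (subtitles.pipelines.MongoDBPipeline, subtitles.pipelines.SQLitePipeline) and hypothetical custom setting names (MONGO_URI, MONGO_DATABASE, SQLITE_PATH); the repository's actual names may differ. Reading the Atlas connection string from an environment variable keeps credentials out of version control.

```python
# settings.py (sketch): the class paths and custom setting names below are
# hypothetical; adjust them to match the actual project and pipelines.py.
import os

BOT_NAME = "subtitles"  # hypothetical project name

# Lower numbers run first (conventional range 0-1000); every enabled
# pipeline receives every scraped item.
ITEM_PIPELINES = {
    "subtitles.pipelines.MongoDBPipeline": 300,
    "subtitles.pipelines.SQLitePipeline": 400,
}

# Hypothetical custom settings that the pipelines read via crawler.settings.
# The MongoDB Atlas URI comes from the environment instead of being hard-coded.
MONGO_URI = os.environ.get("MONGO_URI", "mongodb://localhost:27017")
MONGO_DATABASE = "subtitles"
SQLITE_PATH = "transcripts.db"
```

Removing one entry from ITEM_PIPELINES stores the data in only the other database.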

Usage

  1. Run the spider from the project's root directory: scrapy crawl transcripts
  2. Check the output data in the configured MongoDB Atlas or SQLite database.
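To check the SQLite output from Python instead of a GUI tool, a short helper can preview the stored rows. This is a sketch: the function name preview_transcripts, the file name transcripts.db, and the table transcripts with title and plot columns are all hypothetical, so check pipelines.py for the real names.

```python
import sqlite3
from contextlib import closing


def preview_transcripts(db_path, table="transcripts", limit=5):
    """Return up to `limit` (title, plot) rows from the scraped-data table.

    Hypothetical helper: the table and column names are assumptions.
    """
    # SQLite cannot bind table names as parameters, so reject anything
    # that is not a plain identifier before formatting it into the query.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(
            f"SELECT title, plot FROM {table} ORDER BY rowid LIMIT ?", (limit,)
        )
        return cursor.fetchall()
```

For example, preview_transcripts("transcripts.db") returns the first five (title, plot) pairs in insertion order.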

Project Structure

The project consists of the following key components:

  • spiders/transcripts.py: Contains the Scrapy spider for scraping movie data from subslikescript.com.
  • pipelines.py: Includes two pipelines for dumping the scraped data into a MongoDB Atlas database and a SQLite database.
  • requirements.txt: Lists all the required dependencies for the project.

Images

Screenshot: the website in question (subslikescript.com).

Screenshot: the scraped data in the SQLite database.

Screenshot: the scraped data in MongoDB Atlas.

Libraries and Technologies Used

  • Python 3.x
  • Scrapy for web scraping
  • pymongo for interacting with MongoDB
  • sqlite3 for working with SQLite databases
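Scrapy calls open_spider, process_item, and close_spider on each enabled pipeline. The sketch below shows one way two such pipelines could use sqlite3 and pymongo. The class names, the table layout, and the SQLITE_PATH, MONGO_URI, and MONGO_DATABASE settings are hypothetical and are not taken from the repository's pipelines.py.

```python
import sqlite3


class SQLitePipeline:
    """Store each item as a row in a local SQLite file (sketch; names assumed)."""

    def __init__(self, db_path="transcripts.db"):
        self.db_path = db_path

    @classmethod
    def from_crawler(cls, crawler):
        # Read the hypothetical SQLITE_PATH setting, with a default.
        return cls(crawler.settings.get("SQLITE_PATH", "transcripts.db"))

    def open_spider(self, spider):
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            """CREATE TABLE IF NOT EXISTS transcripts (
                   title TEXT, plot TEXT, transcript TEXT, url TEXT UNIQUE)"""
        )

    def process_item(self, item, spider):
        # INSERT OR IGNORE skips pages already stored, since url is UNIQUE.
        self.connection.execute(
            "INSERT OR IGNORE INTO transcripts VALUES (?, ?, ?, ?)",
            (item.get("title"), item.get("plot"),
             item.get("transcript"), item.get("url")),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider):
        self.connection.close()


class MongoDBPipeline:
    """Insert each item into a MongoDB (e.g. Atlas) collection (sketch; names assumed)."""

    def __init__(self, mongo_uri, db_name="subtitles", collection="transcripts"):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are hypothetical custom settings.
        return cls(
            crawler.settings.get("MONGO_URI"),
            crawler.settings.get("MONGO_DATABASE", "subtitles"),
        )

    def open_spider(self, spider):
        # Imported here so the SQLite pipeline works even without pymongo.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def process_item(self, item, spider):
        # Insert a copy: insert_one adds an _id field to the dict it receives.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```

Declaring url as UNIQUE means re-running the crawl does not duplicate rows in SQLite; the MongoDB sketch has no such safeguard unless a unique index is created on the collection.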

Code Repository

The source code for this project is available on GitHub: https://github.com/zararashraf/ScrapyMoviesSubtitlesScraper

Credits

License

This project is open-source and available under the MIT License. Feel free to use, modify, and distribute the code as needed.
