khanosama783/RedditDataScraper

Reddit Scraper

This Playwright script scrapes posts from the /r/rust/new/ listing on Reddit and publishes them to an Amazon SQS queue. Only posts published within the last 24 hours are collected.
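The 24-hour cutoff can be illustrated with a small sketch. The repository's actual filtering logic is not shown here, so the helper names and the `createdAt` field below are assumptions made for illustration only.

```javascript
// Hypothetical helpers: names and the `createdAt` field are assumptions,
// not taken from the repository's code.
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true if `createdAt` (an ISO 8601 string or a Date) falls within
// the 24 hours before `now`. Future or unparseable timestamps are rejected.
function isWithinLast24Hours(createdAt, now = new Date()) {
  const created = new Date(createdAt).getTime();
  if (Number.isNaN(created)) return false;
  const age = now.getTime() - created;
  return age >= 0 && age <= DAY_MS;
}

// Keeps only the posts whose timestamp is recent enough.
function filterRecentPosts(posts, now = new Date()) {
  return posts.filter((post) => isWithinLast24Hours(post.createdAt, now));
}
```

Passing `now` as a parameter instead of reading the clock inside the function makes the cutoff easy to test with fixed timestamps.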

Playwright automates the browser session, and each scraped post is published to the SQS queue so that downstream consumers can process posts independently of the scraping run.
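One way to prepare scraped posts for publishing is to group them into batches, because the SQS `SendMessageBatch` operation accepts at most 10 entries per request. The sketch below only builds the batch entries; it does not call AWS, and the function names are hypothetical rather than taken from the repository.

```javascript
// SQS SendMessageBatch accepts at most 10 entries per request.
const SQS_MAX_BATCH_SIZE = 10;

// Splits an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: converts posts into batches of entries shaped like
// SendMessageBatch entries. `Id` only needs to be unique within one batch,
// so the position inside the batch is used.
function toSqsBatches(posts) {
  return chunk(posts, SQS_MAX_BATCH_SIZE).map((batch) =>
    batch.map((post, index) => ({
      Id: String(index),
      MessageBody: JSON.stringify(post),
    }))
  );
}
```

Each inner array could then be passed as the `Entries` parameter of a batch send request, together with the queue URL.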

Usage

To use the scraper, install Node.js and Playwright, set the environment variable described below, and then start the scraper with the following command:

node index.js

This starts the scraper, which publishes the scraped posts to the SQS queue.

Environment Variables

The scraper requires the following environment variables to be set:

CONNECTION_URL: The connection URL for the SQS queue.
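A minimal sketch of reading and validating this variable at startup is shown below. It assumes that `CONNECTION_URL` holds a full URL for the queue; `loadConfig` is a hypothetical helper and is not part of the repository.

```javascript
// Hypothetical helper: reads CONNECTION_URL and fails fast if it is
// missing or malformed. Assumes the value is a full URL.
function loadConfig(env = process.env) {
  const connectionUrl = env.CONNECTION_URL;
  if (!connectionUrl) {
    throw new Error('CONNECTION_URL environment variable is not set');
  }
  // The URL constructor throws a TypeError when the value cannot be parsed.
  new URL(connectionUrl);
  return { connectionUrl };
}
```

Calling `loadConfig()` before launching the browser means a missing or malformed value is reported immediately instead of after the scrape has finished.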

Benefits of Using Playwright

Playwright is a modern web automation tool that is well-suited for web scraping. It has the following advantages:

It supports multiple browsers, including Chromium, Firefox, and WebKit.
It is easy to use and has a well-documented API.
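It can run browsers in headless mode and supports isolated browser contexts, which helps keep resource usage low when scraping many pages.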

Conclusion

This script provides a compact way to collect recent Reddit posts with Playwright and hand them off to an SQS queue for further processing.

Additional Notes

The scraper can be modified to meet your needs. For example, you could scrape a different subreddit, change the time window, or extract different fields from each post.
The scraper can be deployed to a production environment using a service such as AWS Lambda, which would allow you to run it on demand without managing a server.
The scraper can be integrated with other systems, such as a data warehouse or a machine learning model, to process the scraped data further and extract insights from it.
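The first note above could be approached by moving the subreddit and time window into a small options object. The following is a sketch under that assumption; `DEFAULT_OPTIONS`, `buildListingUrl`, and `maxAgeMs` are hypothetical names and do not come from the repository.

```javascript
// Hypothetical defaults that mirror the behavior described in this README.
const DEFAULT_OPTIONS = { subreddit: 'rust', sort: 'new', maxAgeHours: 24 };

// Builds the listing URL to scrape, for example /r/rust/new/.
function buildListingUrl(options = {}) {
  const { subreddit, sort } = { ...DEFAULT_OPTIONS, ...options };
  return `https://www.reddit.com/r/${encodeURIComponent(subreddit)}/${encodeURIComponent(sort)}/`;
}

// Converts the configured time window into milliseconds.
function maxAgeMs(options = {}) {
  const { maxAgeHours } = { ...DEFAULT_OPTIONS, ...options };
  return maxAgeHours * 60 * 60 * 1000;
}
```

With this shape, switching to another subreddit or a shorter window is a matter of passing something like `{ subreddit: 'node', maxAgeHours: 6 }` rather than editing the scraping logic.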
