Find a place to live on Craigslist: filter by price, commute, etc and email the poster automatically.

zhehaowang/Craigslist-nyc-apartment

Craigslist-nyc-apartment

This is a script that crawls recent Craigslist postings for apartments in New York City, filters them by some (hard-coded) criteria, and emails the posters of the listings we are interested in.

The project was built and tested in the week of 04/27/17, and will not work if Craigslist updates their website (e.g. changes element names, classes, layout, etc.)!

Current standards are:

  • Cheaper than $1500
  • Has a location in the listing, and the commute from that location to 731 Lexington Avenue takes less than 25 minutes (as estimated by Google Maps, queried through the Google API)
  • Does not explicitly mention "female only"
  • Not a repost (a repost has the same title or body hash as a listing already seen)
  • Not already inquired about (no email has been sent for it yet)
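The price, commute and wording checks above could be sketched roughly as follows. This is an illustration, not the code in spider.py: the `listing` dict, its keys, the `passes_filters` name, and the choice to reject listings with a missing price are all assumptions, and the commute time is taken as already computed.

```python
import re

MAX_PRICE = 1500          # dollars, from the criteria above
MAX_COMMUTE_MINUTES = 25  # minutes to 731 Lexington Avenue

# Matches "female only", "females only", "female-only" (case-insensitive).
FEMALE_ONLY = re.compile(r"\bfemales?[\s-]+only\b", re.IGNORECASE)


def passes_filters(listing):
    """Return True if a listing meets the price, commute and wording criteria.

    `listing` is a hypothetical dict with keys `price` (int or None),
    `commute_minutes` (float or None; None means no location was found),
    `title` and `body` (str).
    """
    price = listing.get("price")
    # Assumption: a listing without a price is rejected.
    if price is None or price >= MAX_PRICE:
        return False

    commute = listing.get("commute_minutes")
    if commute is None or commute >= MAX_COMMUTE_MINUTES:
        return False

    text = f"{listing.get('title', '')} {listing.get('body', '')}"
    if FEMALE_ONLY.search(text):
        return False

    return True
```

Repost and already-inquired checks are left out here because they depend on state kept between runs.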

What it does afterwards:

  • The emailer folder contains a Flask app that sends email to the recipients listed in the HTTP GET parameter. If the (hard-coded) email option is on, the crawler tells the emailer app to send an email from a hard-coded account to a designated account.
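On the crawler side, asking the emailer app to send a message amounts to building a GET URL that carries the recipients. A minimal sketch, assuming a hypothetical host, port, route (`/send`) and parameter names (`to`, `listing`); the real names in emailer.py may differ:

```python
from urllib.parse import urlencode

# Hypothetical address of the locally running emailer app.
EMAILER_URL = "http://localhost:5000/send"


def build_emailer_request(recipients, listing_url):
    """Build the GET URL that asks the emailer app to contact `recipients`.

    `recipients` is a list of email addresses; they are joined with commas
    into a single (hypothetical) `to` parameter and URL-encoded.
    """
    query = urlencode({"to": ",".join(recipients), "listing": listing_url})
    return f"{EMAILER_URL}?{query}"
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen`.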

How to run:

  • Replace YOUR.API.KEY in spider.py, and replace the email credentials and headers in emailer.py and settings.py
  • Meant to run as a cron job (e.g. every 10 minutes), so that posters get your email soon after their listing appears, and so that the script does not need to handle moving to the next page, since new listings do not appear that fast. Sending the email sooner also boosts the response rate.
  • To run once, consult cron.sh

Workflow:

Logs and experience:

  • This ran for about a week before Craigslist effectively blocked it with a CAPTCHA (which, unfortunately, did happen after about a week)
  • This script keeps several logs:
    • actions.txt: the email addresses we are about to send inquiries to; if a listing also has a phone number on file, we record that too
    • body-hashes.txt: SHA-256 hashes of the bodies and titles of listings we already queried (used to filter out user reposts)
    • logs.txt: warnings / errors from parsing each listing (for example, housing type not present, expected tags not present, etc.)
    • results.txt: well-structured (JSON) data extracted from all listings we saw
    • starred.txt: well-structured (JSON) data from listings that match the filter criteria
    • urls.txt: listing URLs we already checked (and won't check again)
  • The experiment was conducted between 04/27/17 and 05/04/17. The script ended up sending about 200 emails, which fetched about 40 replies and led to about 30 scheduled FaceTime / Skype sessions scattered over two weeks. I would have been happy to go ahead with most of those 30 (the script does represent my criteria well :P), but I ended up going to NYC in person and renting a different place.
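The repost filter backed by body-hashes.txt could look roughly like the sketch below. It assumes one hex digest per line and hashes the raw title and body text without any normalization; the function names are hypothetical and the actual script may store or compute the hashes differently.

```python
import hashlib
from pathlib import Path


def sha256_hex(text):
    """Return the SHA-256 hex digest of a string (UTF-8 encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_hashes(path):
    """Load previously seen hashes, one per line; empty set if no file yet."""
    p = Path(path)
    if not p.exists():
        return set()
    return {line.strip() for line in p.read_text().splitlines() if line.strip()}


def is_repost(title, body, seen):
    """A listing is a repost if its title OR its body was seen before."""
    return sha256_hex(title) in seen or sha256_hex(body) in seen


def remember(title, body, seen, path):
    """Add the listing's hashes to `seen` and append the new ones to the log."""
    new = {sha256_hex(title), sha256_hex(body)} - seen
    with open(path, "a", encoding="utf-8") as f:
        for digest in sorted(new):
            f.write(digest + "\n")
    seen |= new
```

Because the log is append-only, each cron run can call `load_hashes` once at startup and then check every listing against the in-memory set.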

Inspired by:

http://mherman.org/blog/2012/11/05/scraping-web-pages-with-scrapy/#.WQE8plPysUs

Dependency:

  • Scrapy

License:

MIT

Contact:

Drop Zhehao (zhehao@cs.ucla.edu) an email if interested!

Big thanks to all the people that kindly responded.
