Email Scraper is an open-source TypeScript library and CLI tool for extracting email addresses from webpages and entire websites. It helps developers automate email collection with ready-to-use modules that fetch pages either through plain HTTP requests or through headless browsers (Playwright/Puppeteer), so both static and JavaScript-rendered sites are covered.
Whether you need to scrape emails from a single contact page or crawl an entire website to collect email addresses, this tool provides a simple, reliable solution for email extraction and web scraping tasks.
- Email Extraction from Single Pages: Scrape email addresses from any webpage URL using HTTP requests or headless browsers
- Website Crawling: Automatically crawl entire websites to extract emails from multiple pages with configurable depth and page limits
- Multiple Scraping Methods: Choose between fast HTTP requests and full browser automation (Playwright/Puppeteer) for JavaScript-heavy sites
- TypeScript Support: Fully typed API for better developer experience and IDE autocomplete
- CLI Tool: Command-line interface for quick email scraping without writing code
- Library API: Import and use in your own Node.js projects for custom email scraping workflows
- Flexible Configuration: Customize timeout, crawl depth, page limits, and cross-domain crawling options
- Email Normalization: Automatically normalizes and deduplicates extracted email addresses
- Clone this repository:
git clone https://github.com/web-scraping-apis/email-scraper
cd email-scraper
- Install dependencies:
npm install
- Build the project:
npm run build
For scraping JavaScript-rendered pages, install one of the supported browser automation libraries:
Using Playwright (the second command downloads the Chromium browser binary):
npm install playwright
npx playwright install chromium
Using Puppeteer:
npm install puppeteer
The email scraper CLI makes it easy to extract emails from a single webpage:
# Using HTTP requests (fast, for static pages)
npm run dev page https://example.com/contact
# Using headless browser (for JavaScript-rendered pages)
npm run dev -- page https://example.com/contact --browser
# After building, use the compiled version:
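npm run scrape page https://example.com/contact
CLI Options for Single Page Scraping:
- -t, --timeout <ms>: Request timeout in milliseconds (default: 10000)
- -b, --browser: Use a headless browser (Playwright/Puppeteer) instead of HTTP requests
- --wait-until <event>: Wait until the load, domcontentloaded, or networkidle event (default: load; browser mode only)
When passing options through npm run, put -- before the CLI arguments so npm forwards them to the script instead of parsing them itself. For example: npm run dev -- page https://example.com/contact --timeout 20000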
Crawl and extract emails from multiple pages on a website:
# Using HTTP requests (default)
npm run dev website https://example.com
# Using headless browser
npm run dev -- website https://example.com --browser
# After building:
npm run scrape website https://example.com
CLI Options for Website Crawling:
- -d, --max-depth <number>: Maximum crawl depth (default: 3)
- -p, --max-pages <number>: Maximum number of pages to crawl (default: 50)
- --cross-domain: Allow crawling to different domains (default: false)
- -b, --browser: Use a headless browser (Playwright/Puppeteer) instead of HTTP requests
- --wait-until <event>: Wait until the load, domcontentloaded, or networkidle event (default: load; browser mode only)
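One plausible reading of how the crawl limits interact is a breadth-first crawl. Under that reading, a page is visited only if it lies within max-depth link hops of the start URL, and crawling stops once max-pages pages have been visited. The sketch below shows that reading only and is not the library's crawler. It walks a hypothetical in-memory link map instead of fetching pages. It treats the start URL as depth 0 and takes "same domain" to mean an identical hostname. crawlOrder and CrawlLimits are illustrative names.

```typescript
// Hypothetical sketch, not the library's implementation: a breadth-first crawl
// over an in-memory link map, showing how depth, page, and domain limits combine.
interface CrawlLimits {
  maxDepth: number;        // link hops allowed from the start URL (start URL = depth 0)
  maxPages: number;        // stop once this many pages have been visited
  sameDomainOnly: boolean; // skip links whose hostname differs from the start URL's
}

function crawlOrder(
  startUrl: string,
  links: Record<string, string[]>, // page URL -> URLs it links to (stands in for fetching)
  limits: CrawlLimits,
): string[] {
  const startHost = new URL(startUrl).hostname;
  const visited: string[] = [];
  const queued = new Set<string>([startUrl]); // never enqueue a URL twice
  const queue: Array<{ url: string; depth: number }> = [{ url: startUrl, depth: 0 }];

  while (queue.length > 0 && visited.length < limits.maxPages) {
    const { url, depth } = queue.shift()!;
    visited.push(url); // a real crawler would fetch the page and extract emails here

    if (depth >= limits.maxDepth) continue; // do not follow links beyond maxDepth
    for (const next of links[url] ?? []) {
      if (queued.has(next)) continue;
      if (limits.sameDomainOnly && new URL(next).hostname !== startHost) continue;
      queued.add(next);
      queue.push({ url: next, depth: depth + 1 });
    }
  }
  return visited;
}

// Example link map: the home page links to an internal page and an external site.
const site: Record<string, string[]> = {
  'https://example.com/': ['https://example.com/about', 'https://other.org/'],
  'https://example.com/about': ['https://example.com/team'],
  'https://example.com/team': ['https://example.com/deep'],
};

console.log(crawlOrder('https://example.com/', site, { maxDepth: 1, maxPages: 50, sameDomainOnly: true }));
// ['https://example.com/', 'https://example.com/about']
```

With maxDepth 1, the crawl stops following links after the first hop. The external link is skipped because sameDomainOnly is true, which corresponds to running without --cross-domain.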
Import and use the email scraper in your own TypeScript/JavaScript projects:
Extract emails from a webpage using simple HTTP requests:
import { scrapeEmailsFromUrl } from 'email-scraper';
const emails = await scrapeEmailsFromUrl('https://example.com/contact');
console.log(Array.from(emails));
// Example output: ['contact@example.com', 'info@example.com']
For JavaScript-rendered pages, use Playwright with the email scraper:
import { chromium } from 'playwright';
import { scrapeEmailsFromPage } from 'email-scraper';
const browser = await chromium.launch();
try {
  const page = await browser.newPage();
  const emails = await scrapeEmailsFromPage(page, 'https://example.com/contact');
  console.log(Array.from(emails));
} finally {
  // Close the browser even if scraping throws
  await browser.close();
}
Alternatively, use Puppeteer for browser automation:
import puppeteer from 'puppeteer';
import { scrapeEmailsFromPage } from 'email-scraper';
const browser = await puppeteer.launch();
try {
  const page = await browser.newPage();
  const emails = await scrapeEmailsFromPage(page, 'https://example.com/contact');
  console.log(Array.from(emails));
} finally {
  // Close the browser even if scraping throws
  await browser.close();
}
Use the website crawler to extract emails from multiple pages:
import { scrapeEmailsFromWebsite } from 'email-scraper';
const emails = await scrapeEmailsFromWebsite('https://example.com', {
  maxDepth: 3,
  maxPages: 50,
  sameDomainOnly: true,
  useBrowser: false, // Set to true to use a browser
  // browser: browserInstance, // Optional: pass an existing browser instance
});
console.log(`Found ${emails.size} unique emails`);
For a more object-oriented approach, use the EmailScraper class:
import { EmailScraper } from 'email-scraper';
import { chromium } from 'playwright';
// With a browser
const browser = await chromium.launch();
const browserScraper = new EmailScraper(browser);
const siteEmails = await browserScraper.scrapeFromWebsite('https://example.com');
await browser.close();
// Without a browser (HTTP only); distinct names avoid redeclaring the const bindings above
const httpScraper = new EmailScraper();
const pageEmails = await httpScraper.scrapeFromUrl('https://example.com/contact');
Note: The examples above import from 'email-scraper', which assumes the package is installed from npm. When working from a local clone, import from ./src/index instead, or from ./dist/index after running npm run build.
- scrapeEmailsFromUrl(url: string, options?: HttpScraperOptions): Promise<Set<string>>
  - Scrapes emails from a single webpage using HTTP requests
  - Returns a Set of unique email addresses
- scrapeEmailsFromPage(page: Page, url: string, options?: WebpageScraperOptions): Promise<Set<string>>
  - Scrapes emails from a webpage using a browser page instance
  - Works with Playwright or Puppeteer page objects
- scrapeEmailsFromWebsite(url: string, options?: WebsiteCrawlerOptions): Promise<Set<string>>
  - Crawls a website within the configured depth and page limits and extracts emails from every visited page
  - Supports configurable depth, page limits, and domain restrictions
- extractEmails(text: string): Set<string>
  - Extracts email addresses from plain text using regex patterns
- normalizeEmail(email: string): string
  - Normalizes an email address to a standard format
- EmailScraper
  - Main scraper class with convenience methods, such as scrapeFromUrl() and scrapeFromWebsite(), for both HTTP and browser-based scraping
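The reference says extractEmails() is regex-based and normalizeEmail() produces a standard format, but it does not publish the pattern or the normalization rules. The minimal sketch below shows the general approach under stated assumptions. It uses a simplified pattern that is deliberately not full RFC 5322, and it takes normalization to mean "trim and lowercase". EMAIL_PATTERN, normalizeEmailSketch, and extractEmailsSketch are hypothetical names, and the library's actual behavior may differ.

```typescript
// Hypothetical, simplified pattern (NOT full RFC 5322): local part, "@",
// one or more dot-separated domain labels, and an alphabetic TLD of 2+ letters.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}/g;

// Assumed normalization: trim whitespace and lowercase the whole address.
// RFC 5321 permits case-sensitive local parts, although most providers ignore case.
function normalizeEmailSketch(email: string): string {
  return email.trim().toLowerCase();
}

function extractEmailsSketch(text: string): Set<string> {
  // match() with a /g pattern returns every match, or null when there are none
  const matches = text.match(EMAIL_PATTERN) ?? [];
  // Normalizing before inserting lets the Set drop case-only duplicates
  return new Set(matches.map(normalizeEmailSketch));
}

const emails = extractEmailsSketch(
  'Write to Sales@Example.com or support@example.co.uk; sales@example.com also works.',
);
console.log(Array.from(emails)); // ['sales@example.com', 'support@example.co.uk']
```

A Set only removes identical strings, so normalization has to happen before insertion for "Sales@Example.com" and "sales@example.com" to count as one address.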
A: You can use the email scraper CLI or library. For a single page, use npm run dev page <url>. For an entire website, use npm run dev website <url>. The tool automatically extracts email addresses from HTML content and text.
A: HTTP scraping is faster and works well for static HTML pages. Browser scraping (Playwright/Puppeteer) is necessary for JavaScript-rendered content where emails are loaded dynamically. Use the --browser flag to enable browser mode.
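One way to combine the two modes is to try the fast HTTP path first and fall back to a browser only when it finds nothing. The sketch below is a hypothetical helper and is not part of the library. The two scraping functions are passed in so the logic can run without network access. In real use you would pass wrappers around scrapeEmailsFromUrl() and scrapeEmailsFromPage().

```typescript
// Hypothetical helper type: any async function that returns the emails found at a URL.
type Scrape = (url: string) => Promise<Set<string>>;

// Try the fast HTTP scrape first; fall back to the slower browser scrape only when
// HTTP finds nothing or throws (for example, when emails are injected by JavaScript).
async function scrapeWithFallback(
  url: string,
  httpScrape: Scrape,
  browserScrape: Scrape,
): Promise<{ emails: Set<string>; mode: 'http' | 'browser' }> {
  try {
    const emails = await httpScrape(url);
    if (emails.size > 0) return { emails, mode: 'http' };
  } catch {
    // Ignore the HTTP error and try the browser instead
  }
  return { emails: await browserScrape(url), mode: 'browser' };
}

// Possible wiring with this library and Playwright (inside an async context):
// const browser = await chromium.launch();
// const page = await browser.newPage();
// const result = await scrapeWithFallback(
//   'https://example.com/contact',
//   (u) => scrapeEmailsFromUrl(u),
//   (u) => scrapeEmailsFromPage(page, u),
// );
// await browser.close();
```

An empty HTTP result does not always mean the page is JavaScript-rendered. The page may simply contain no addresses, and in that case this approach pays for a browser load without finding anything more.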
A: Use the --browser option with the CLI, or call scrapeEmailsFromPage() with a Playwright or Puppeteer page. This lets the email scraper wait for JavaScript to execute before extracting emails.
A: Yes. You can run the website crawler with the --cross-domain flag so it follows links onto other domains, or write a script that calls scrapeEmailsFromUrl() or scrapeEmailsFromWebsite() once per URL.
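For the script approach, a minimal sketch can loop over the URLs and merge the resulting Sets. It makes two assumptions. Requests run one at a time, which is gentler on the target servers. A failing URL is logged and skipped instead of aborting the run. scrapeMany is a hypothetical name. The scrape function is injected, so you can pass scrapeEmailsFromUrl or a stub.

```typescript
// Hypothetical helper type: any async function returning the emails found at a URL,
// such as the library's scrapeEmailsFromUrl.
type UrlScraper = (url: string) => Promise<Set<string>>;

// Scrape each URL in turn and merge the results into one Set.
// A failing URL is recorded and skipped instead of aborting the whole run.
async function scrapeMany(
  urls: string[],
  scrape: UrlScraper,
): Promise<{ emails: Set<string>; failed: string[] }> {
  const emails = new Set<string>();
  const failed: string[] = [];
  for (const url of urls) {
    // Sequential on purpose: only one request in flight at a time
    try {
      const found = await scrape(url);
      found.forEach((email) => emails.add(email)); // Set union
    } catch (err) {
      failed.push(url);
      console.error(`Skipping ${url}:`, err);
    }
  }
  return { emails, failed };
}

// Possible usage (inside an async context):
// import { scrapeEmailsFromUrl } from 'email-scraper';
// const { emails, failed } = await scrapeMany(
//   ['https://example.com/contact', 'https://example.org/about'],
//   (url) => scrapeEmailsFromUrl(url),
// );
```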
A: The scraper automatically deduplicates emails by collecting them in a Set. All scraping and extraction functions return a Set<string> that contains only unique email addresses.
A: This is a tool for extracting publicly available email addresses from websites. Always respect website terms of service, robots.txt files, and applicable laws (such as GDPR, CAN-SPAM) when scraping emails. Use responsibly and ethically.
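Part of the robots.txt advice can be automated before each request. Below is a deliberately minimal sketch. It assumes the scraper sends no specific user agent and therefore follows the "User-agent: *" group. It checks a URL path against that group's Disallow prefixes only. It ignores Allow rules, * and $ wildcards, and agent-specific groups, so a complete RFC 9309 parser is preferable for real crawls. The function names are hypothetical and are not part of this library.

```typescript
// Hypothetical, deliberately minimal robots.txt check. It only honours plain
// "Disallow" path prefixes in groups that include "User-agent: *"; it ignores
// Allow rules, * and $ wildcards, and agent-specific groups.
function disallowedPrefixes(robotsTxt: string): string[] {
  const prefixes: string[] = [];
  let agents: string[] = [];   // user agents of the group currently being read
  let groupHasRules = false;   // a User-agent line after rules starts a new group

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === 'user-agent') {
      if (groupHasRules) {
        agents = [];
        groupHasRules = false;
      }
      agents.push(value.toLowerCase());
    } else if (field === 'disallow' || field === 'allow') {
      groupHasRules = true;
      // An empty Disallow value means "nothing is disallowed"
      if (field === 'disallow' && value !== '' && agents.indexOf('*') !== -1) {
        prefixes.push(value);
      }
    }
  }
  return prefixes;
}

function isPathAllowed(robotsTxt: string, path: string): boolean {
  return !disallowedPrefixes(robotsTxt).some((prefix) => path.startsWith(prefix));
}

// In real use, fetch the file first, e.g. (Node 18+, inside an async context):
// const robotsTxt = await (await fetch(new URL('/robots.txt', 'https://example.com'))).text();
```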
A: After cloning the repo and running npm install and npm run build, you can import from ./dist/index. Once the package is published to npm, it can be installed with npm install email-scraper.
A: The email scraper recognizes standard email formats (user@domain.com) and handles various edge cases. It normalizes emails and validates them against common email patterns.
A: Yes, this email scraper can be used for lead generation by extracting contact emails from websites. However, ensure compliance with anti-spam laws and best practices for email marketing.
# Install dependencies
npm install
# Build TypeScript to JavaScript
npm run build
# Development mode (uses ts-node, no build needed)
npm run dev page https://example.com
npm run dev website https://example.com
# Production mode (uses compiled JavaScript)
npm run scrape page https://example.com
npm run scrape website https://example.com
Project structure:
- /src: TypeScript source code
- /scrapers: Scraping implementations (HTTP, webpage, website crawler)
- /utils: Utility functions (email extraction, browser factory)
- /types: TypeScript type definitions
- /dist: Compiled JavaScript output
- cli.ts: Command-line interface implementation
Contributions are welcome! If you find a bug or have a feature request, please open an issue on GitHub.
This project is licensed under the MIT License - see the LICENSE file for details.