DHT crawling #58

Closed
piedshag opened this issue Mar 29, 2015 · 5 comments

Comments

@piedshag

Hi, just a quick question. Is there any way I can crawl the BitTorrent DHT using this package? I have seen various implementations, but they do not use this package. I am stuck on how I can acquire the infohashes of the swarms of various BitTorrent peers. I have 0.05 BTC available for an appropriate solution. Any help would be much appreciated. Cheers.

@feross
Member

feross commented Apr 1, 2015

You can't really query a DHT node to find out what torrents it is tracking (as far as I'm aware). You can do the reverse, though: look up the peers for a given infohash, then build a map of ip -> infohash as you crawl the DHT.
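
The ip -> infohash map mentioned above could look something like the following minimal sketch. `PeerIndex` and its method names are hypothetical (they are not part of bittorrent-dht); the sketch only assumes that each lookup result gives a peer host and an infohash as a hex string.

```javascript
// Hypothetical helper (not part of bittorrent-dht): records which
// infohashes each IP address has been seen announcing or serving.
class PeerIndex {
  constructor() {
    this.byIp = new Map(); // ip -> Set of infohash hex strings
  }

  // Record that `host` was returned as a peer for `infoHashHex`.
  // Returns true only the first time this (ip, infohash) pair is seen.
  add(host, infoHashHex) {
    let hashes = this.byIp.get(host);
    if (!hashes) {
      hashes = new Set();
      this.byIp.set(host, hashes);
    }
    if (hashes.has(infoHashHex)) return false;
    hashes.add(infoHashHex);
    return true;
  }

  // All infohashes observed for an IP, as a sorted array.
  infoHashesFor(host) {
    return [...(this.byIp.get(host) ?? [])].sort();
  }
}
```

With bittorrent-dht, a 'peer' event handler could presumably feed this with something like index.add(peer.host, infoHash.toString('hex')).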

@feross feross closed this as completed Apr 1, 2015
@feross
Member

feross commented Apr 6, 2015

@fbodz Looks like they're just issuing repeated, pseudo-random find_node or get_peers queries to the DHT and printing out a message whenever they find a new infohash.
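
For reference, here is a rough sketch of what one such find_node query looks like on the wire, following the KRPC message layout in BEP 5. The tiny `bencode` encoder and the `findNodeQuery` helper are illustrative names, not functions from bittorrent-dht, and the encoder only covers the types this message needs.

```javascript
// Minimal bencode encoder covering only what a KRPC query needs:
// byte strings (Buffer or string), integers, and dictionaries.
function bencode(value) {
  if (Buffer.isBuffer(value) || typeof value === 'string') {
    const bytes = Buffer.from(value);
    return Buffer.concat([Buffer.from(`${bytes.length}:`), bytes]);
  }
  if (Number.isInteger(value)) return Buffer.from(`i${value}e`);
  // Dictionaries: keys must be emitted in sorted order.
  // (Plain sort() is enough here because all keys are ASCII.)
  const parts = [Buffer.from('d')];
  for (const key of Object.keys(value).sort()) {
    parts.push(bencode(key), bencode(value[key]));
  }
  parts.push(Buffer.from('e'));
  return Buffer.concat(parts);
}

// Build a find_node query: `t` is the transaction ID, `y: 'q'` marks a
// query, and `a` carries our own node ID plus the ID being searched for.
function findNodeQuery(transactionId, ownNodeId, target) {
  return bencode({
    t: transactionId,
    y: 'q',
    q: 'find_node',
    a: { id: ownNodeId, target },
  });
}
```

In a crawl loop the target would be a fresh 20-byte random value each time (for example from crypto.randomBytes(20)), and the resulting buffer would be sent to a known node over UDP.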

So, in this way, you can discover infohashes. Then, if you combine that with a torrent client to connect and download the metadata (via the ut_metadata extension, supported in webtorrent btw) then you can learn what the torrent actually contains.
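
As a rough illustration of the metadata step: under BEP 9 the peer reports a metadata_size in its extended handshake, the metadata is split into 16 KiB pieces, and each piece is requested with a small bencoded message. The sketch below only computes those request payloads; `metadataRequests` is a hypothetical helper, and in practice a library such as ut_metadata handles this exchange.

```javascript
// BEP 9: metadata is transferred in 16 KiB pieces.
const METADATA_PIECE_SIZE = 16 * 1024;

// Return the bencoded request payload (msg_type 0) for every metadata
// piece, given the metadata_size announced by the peer.
function metadataRequests(metadataSize) {
  if (!Number.isInteger(metadataSize) || metadataSize <= 0) {
    throw new RangeError('metadata_size must be a positive integer');
  }
  const pieceCount = Math.ceil(metadataSize / METADATA_PIECE_SIZE);
  const requests = [];
  for (let piece = 0; piece < pieceCount; piece++) {
    // Bencoded form of { msg_type: 0, piece: <index> }.
    requests.push(`d8:msg_typei0e5:piecei${piece}ee`);
  }
  return requests;
}
```

Once all pieces have arrived, the SHA-1 of the reassembled metadata should match the infohash before the result is trusted.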

@ralyodio

ralyodio commented Jun 8, 2020

Has anyone written a tutorial or script on how to do this?

@ralyodio

ralyodio commented May 8, 2023

Can someone help me with this code? I'm trying to crawl the DHT (using ChatGPT to help me, but it's pretty buggy):

import DHT from 'bittorrent-dht';
import bencode from 'bencode';
import Protocol from 'bittorrent-protocol';
import net from 'net';
import Tracker from 'bittorrent-tracker';
import crypto from 'crypto';
import dotenv from 'dotenv-flow';
import Surreal from 'surrealdb.js';
import BaseController from './base.js';
import { Account } from '../../src/models/account.js';

dotenv.config()
const { DB_RPC_URL, DB_USER, DB_PASS, DB_NS, DB_DB, DB_PORT } = process.env;

export default class DHTCrawler extends BaseController {
    constructor(targetNodes = 1000) {
        super();
        // this.db = new Surreal(DB_RPC_URL);
        // this.account = new Account(this.db)
        this.targetNodes = targetNodes;
        this.dht = new DHT();
        this.discoveredInfoHashes = new Set();
    }

    async init() {
        await new Promise((resolve) => {
            this.dht.on('ready', () => {
                console.log('DHT is ready');
                resolve();
            });
        });

        this.dht.on('announce', (peer, infoHash) => {
            const { host, port } = peer;
            console.log(`announce: ${host}:${port} ${infoHash.toString('hex')}`);
        });

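        this.dht.on('peer', (peer, infoHash, from) => {
            const infoHashHex = infoHash.toString('hex');
            // `peer` is a { host, port } object, so format it explicitly.
            console.log(`peer: ${infoHashHex} ${peer.host}:${peer.port}`);

            if (!this.discoveredInfoHashes.has(infoHashHex)) {
                this.discoveredInfoHashes.add(infoHashHex);
                console.log(`Discovered infohash: ${infoHashHex}`);
                this.fetchMetadata(infoHash, peer).catch((err) => {
                    console.error(`Metadata fetch failed for ${infoHashHex}:`, err.message);
                });
                // No extra lookup here: this event already comes from a lookup
                // of `infoHash`, and lookupNext() reschedules itself.
            }
        });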

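        // Node IDs are not infohashes, so they are tracked in their own set.
        // The 'response' and 'find_node' event names are kept from the original
        // code; whether the DHT instance emits them is an unverified assumption.
        this.discoveredNodeIds = new Set();

        this.dht.on('response', (node) => {
            const nodeIdHex = node.r.id.toString('hex');
            if (!this.discoveredNodeIds.has(nodeIdHex)) {
                this.discoveredNodeIds.add(nodeIdHex);
                console.log(`Discovered response node: ${nodeIdHex}`);
            }
        });

        this.dht.on('find_node', (node) => {
            const nodeIdHex = node.toString('hex');
            if (!this.discoveredNodeIds.has(nodeIdHex)) {
                this.discoveredNodeIds.add(nodeIdHex);
                console.log(`Discovered find_node: ${nodeIdHex}`);
            }
        });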


        // Add a well-known bootstrap node to the routing table.
        this.dht.addNode({
            host: 'router.bittorrent.com',
            port: 6881
        });

        console.log('Bootstrap node added');
        await this.lookupNext();
    }

    async fetchMetadata(infoHash, peer) {
        // Loaded here so this method is self-contained. Assumes the `ut_metadata`
        // package (the BEP 9 extension for bittorrent-protocol) is installed.
        const { default: utMetadata } = await import('ut_metadata');

        const socket = new net.Socket();
        const wire = new Protocol();
        // bittorrent-dht exposes a node ID, not a BitTorrent peer ID, so generate one.
        const peerId = crypto.randomBytes(20);
        // bencode may return Buffers or Uint8Arrays; normalize before decoding text.
        const toText = (bytes) => Buffer.from(bytes).toString('utf-8');

        const onMetadata = (metadata) => {
            const torrent = bencode.decode(metadata);
            console.log('Torrent metadata:', {
                infoHash: infoHash.toString('hex'),
                name: toText(torrent.info.name),
                files: torrent.info.files
                    // Each `path` is a list of path segments.
                    ? torrent.info.files.map((file) => file.path.map(toText).join('/'))
                    : [],
            });

            this.getSeedersAndLeechers(infoHash);
            socket.destroy();
        };

        // Register the extension; it handles the ut_metadata message exchange.
        wire.use(utMetadata());
        wire.ut_metadata.on('metadata', onMetadata);
        wire.ut_metadata.on('warning', (err) => {
            console.error('ut_metadata warning:', err.message);
        });

        socket.setTimeout(5000, () => {
            socket.destroy();
        });

        // Unreachable peers are common; an unhandled 'error' event would crash the process.
        socket.on('error', () => {
            socket.destroy();
        });

        socket.connect(peer.port, peer.host, () => {
            socket.pipe(wire).pipe(socket);
            wire.handshake(infoHash, peerId, { dht: true });
        });

        wire.on('handshake', () => {
            // Ask the peer for the metadata once the handshake has completed.
            wire.ut_metadata.fetch();
        });

        wire.on('timeout', () => {
            socket.destroy();
        });

        wire.on('close', () => {
            socket.destroy();
        });
    }

    getSeedersAndLeechers(infoHash) {
        const infoHashHex = infoHash.toString('hex');
        const client = new Tracker({
            infoHash: infoHash,
            // bittorrent-dht has no `peerId`; generate a random 20-byte peer ID.
            peerId: crypto.randomBytes(20),
            announce: ['udp://tracker.openbittorrent.com:80'],
            // Port reported to the tracker; 6881 is an assumed placeholder.
            port: 6881,
        });

        client.start();

        client.once('update', (data) => {
            console.log('Torrent seeders and leechers:', {
                infoHash: infoHashHex,
                seeders: data.complete,
                leechers: data.incomplete,
            });
            client.stop();
        });

        client.on('error', (err) => {
            console.error(`Error getting seeders and leechers for ${infoHashHex}:`, err.message);
            client.stop();
        });
    }

    async lookupNext(infoHash) {
        if (this.dht.nodes.count() >= this.targetNodes) {
            console.log('Reached target node count');
            return;
        }

        if (!infoHash) {
            infoHash = crypto.randomBytes(20);
        }
        try {
            await new Promise((resolve, reject) => {
                this.dht.lookup(infoHash, (err) => {
                    if (err) {
                        reject(err);
                    } else {
                        resolve();
                    }
                });
            });
        } catch (err) {
            console.error('Error during lookup:', err);
        }


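        // Pass no argument so the next iteration picks a fresh random target
        // instead of repeating the same lookup forever.
        setTimeout(() => this.lookupNext(), 1000);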
    }
}

const crawler = new DHTCrawler();
crawler.init();
