How to get meta data without downloading? #758

Closed
jimliang opened this issue Apr 17, 2016 · 12 comments

@jimliang jimliang commented Apr 17, 2016

I just want to get the metadata only, not download the files.

@rom1504 rom1504 commented Apr 17, 2016

What do you mean by metadata? Getting the torrent file?

@jimliang jimliang commented Apr 17, 2016

@rom1504 yes

@Tercus Tercus commented Apr 17, 2016

I'd guess you could remove the torrent as soon as you have the metadata.

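var torrent = client.add(torrentId)
torrent.on('metadata', function () {
  // The 'metadata' event passes no arguments, so use the torrent returned by client.add().
  // Read torrent.infoHash, torrent.name, torrent.files, etc. here.
  torrent.destroy()
})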

I didn't test that though...

@jimliang jimliang commented Apr 17, 2016

@Tercus I tried that before, but it still downloads files.

@DiegoRBaquero DiegoRBaquero commented Apr 20, 2016

The torrent will auto-start once the metadata is found. What @Tercus suggested is the way to go at the moment.

@emschwartz emschwartz commented Apr 27, 2016

I'd also like to be able to get the metadata / full torrent file before downloading the file contents.

(I'm working on a fork of webtorrent that includes payments. The torrent file contains some license information needed to pay for the right to download a file. The downloader needs to pay for the license before an uploader will seed the file to them. In order to do this the downloader needs the full torrent file first.)

I might work on having a way to block all requests except those for metadata if there isn't a better solution out there.

@nightwolfz nightwolfz commented Apr 30, 2016

+1

Having this feature would simplify my code so much.

@feross feross added the enhancement label May 4, 2016

@feross feross commented May 4, 2016

@nightwolfz What does your code currently look like?

@feross feross added question and removed enhancement labels May 4, 2016

@emschwartz emschwartz commented May 6, 2016

If I'm not mistaken, torrent metadata is sent even when peers are choked (because only "requests" are dropped when choking, whereas extended messages, which are used by ut_metadata, are unaffected). This means that if you want to get the metadata first you should be able to just keep the peer choked until after the 'metadata' event.
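To make the choke behaviour described above concrete, here is a toy model of an uploader's message handling. It is only a sketch of the idea and not the bittorrent-protocol API: `createUploader`, `handle`, `served`, and the message shapes are all hypothetical names made up for illustration.

```javascript
// Toy model only: these names are hypothetical, not a real library API.
function createUploader () {
  return {
    amChoking: true, // the remote peer starts out choked
    served: [],
    handle: function (message) {
      // Extended messages (the kind ut_metadata uses) are processed
      // regardless of the choke state.
      if (message.type === 'extended') {
        this.served.push('metadata piece ' + message.piece)
        return true
      }
      // Piece requests are silently dropped while the peer is choked.
      if (message.type === 'request') {
        if (this.amChoking) return false
        this.served.push('data piece ' + message.index)
        return true
      }
      return false
    }
  }
}

var uploader = createUploader()
uploader.handle({ type: 'request', index: 0 })  // dropped: peer is still choked
uploader.handle({ type: 'extended', piece: 0 }) // served: metadata still flows
uploader.amChoking = false                      // unchoke once the peer has the metadata
uploader.handle({ type: 'request', index: 0 })  // served now
console.log(uploader.served)
```

In this model the data request only succeeds after the unchoke, which is the ordering the payment use case would need.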

@Ohge Ohge commented May 23, 2016

@feross I can't speak for @nightwolfz but here is what I'm using to pull and emit metadata:

    function cleanUp(path){
        if(fs.existsSync(path)){
            var files = fs.readdirSync(path);
            files.forEach(function(file,index){
                var curPath = path + "/" + file;
                if(fs.lstatSync(curPath).isDirectory()){
                    cleanUp(curPath);
                } else {
                    fs.unlinkSync(curPath);
                }
            });
            fs.rmdirSync(path);
        }
    }
    function pullMetadata(magnet){
        var torrent = wt.add(magnet);
        console.log('Checking: ' + torrent.infoHash);
        torrent.eto = setTimeout(function(){
            torrent.destroy(function(){
                console.log('Timeout: ' + torrent.infoHash);
            });
        }, 120000); // 2 minutes, to allow for slow connections, small swarms, and/or torrents with large metadata
        torrent.on('metadata', function(){
            torrent.pause();
            clearTimeout(torrent.eto);
            var fileid = 0;
            torrent.files.forEach(function(file){
                file.deselect();
                var data = {infohash: torrent.infoHash, fileid: fileid, path: file.path, length: file.length};
                self.emit('file', data);
                fileid += 1;
            });
            torrent.destroy(function(){
                console.log('Completed: ' + torrent.infoHash);
                cleanUp(loc + torrent.infoHash);
            });
        });
    }

Having args in wt.add that would disable the creation of the file storage, automatically destroy torrents that are timing out, and deselect all files by default would eliminate most of the above code (the cleanUp, setTimeout, torrent.pause, and file.deselect) and likely reduce the overall resource usage for high-volume scraping. In addition, the recursive delete used to clean up the file storage is potentially dangerous for less experienced developers.

Here is what I think the simplified code might look like if the features were implemented:

    function pullMetadata(magnet){
        var opts = {onlyMetadata: true, timeout: 120000}; // just an example
        var torrent = wt.add(magnet, opts);
        console.log('Checking: ' + torrent.infoHash);
        torrent.on('timeout', function(){
            console.log('Timeout: ' + torrent.infoHash);
        });
        torrent.on('metadata', function(){
            var fileid = 0;
            torrent.files.forEach(function(file){
                var data = {infohash: torrent.infoHash, fileid: fileid, path: file.path, length: file.length};
                self.emit('file', data);
                fileid += 1;
            });
            torrent.destroy(function(){
                console.log('Completed: ' + torrent.infoHash);
            });
        });
    }

I dunno maybe it makes more sense as a wrapper like webtorrent-metadata or something like that.

@feross feross commented May 23, 2016

You can use the rimraf package to have an easier time recursively deleting files.

Or better yet, use the store option to client.add to pass a custom chunk store. You can use memory-chunk-store and no files should be created.
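A minimal sketch of the store suggestion, assuming the caller passes in a WebTorrent client and a chunk-store constructor such as the one exported by memory-chunk-store. The helper name `fetchMetadata` and the `timeoutMs` option are hypothetical and exist only for this example; `client.add`, the `store` option, the 'metadata' event, `file.deselect()`, and `torrent.destroy()` are the pieces already used in this thread.

```javascript
// Sketch: resolve with basic metadata, then destroy the torrent.
// `client` is expected to be a WebTorrent instance and `opts.store` a
// chunk-store constructor (e.g. require('memory-chunk-store')), both
// supplied by the caller. `fetchMetadata` and `timeoutMs` are made-up names.
function fetchMetadata (client, torrentId, opts) {
  opts = opts || {}
  var timeoutMs = opts.timeoutMs || 120000

  return new Promise(function (resolve, reject) {
    // With an in-memory store, no files should be created on disk.
    var torrent = client.add(torrentId, { store: opts.store })

    // Give up if no peer delivers the metadata in time.
    var timer = setTimeout(function () {
      torrent.destroy(function () {
        reject(new Error('Timed out waiting for metadata'))
      })
    }, timeoutMs)

    torrent.on('metadata', function () {
      clearTimeout(timer)
      var files = torrent.files.map(function (file, index) {
        file.deselect() // do not prioritize this file's pieces
        return { fileid: index, path: file.path, length: file.length }
      })
      var result = { infoHash: torrent.infoHash, name: torrent.name, files: files }
      torrent.destroy(function () { resolve(result) })
    })
  })
}
```

A call might look like `fetchMetadata(new WebTorrent(), magnetURI, { store: require('memory-chunk-store') })`. Some piece data may still be fetched into memory in the short window between the 'metadata' event and `destroy()`, but nothing should need cleaning up afterwards.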

@homeryan homeryan commented Dec 6, 2016

Thanks to feross, bittorrent-protocol, torrent-discovery and ut_metadata provide all the plumbing you need.

First, use torrent-discovery to find peers from the infohash alone: discovery = new Discovery(opts), making sure the { dht: true } option is included. As soon as you have a peer address from the discovery.on('peer', function (peer) {}) event, create a client connection (the ut_metadata usage example is a server) with wire.use(ut_metadata()) and send the peer a handshake with the { dht: true } option. After the peer responds with its own handshake, call wire.ut_metadata.fetch(). Then listen for the ut_metadata.on('metadata', function (metadata) {}) event to receive the metadata. That's basically it.

I implemented a bep9-metadata-dl module using the modules mentioned above, with callback and Promise interfaces plus options for a timeout and for concurrent metadata download connections. Thanks again to feross. Without his work, it would have taken months if not years to make it work.

The following is proof of concept code to fetch ubuntu-16.04.1-server-amd64.iso's metadata:

const Discovery = require('torrent-discovery');
const Protocol = require('bittorrent-protocol');
const ut_metadata = require('ut_metadata');
const addrToIPPort = require('addr-to-ip-port');
const bencode = require('bencode');
const net = require('net');

const SELF_HASH = '4290a5ff50130a90f1de64b1d9cc7822799affd5';   // Random infohash
const INFO_HASH = '90289fd34dfc1cf8f316a268add8354c85334458';   // ubuntu-16.04.1-server-amd64.iso

new Discovery({ infoHash: INFO_HASH, peerId: SELF_HASH, port: 6881, dht: true })
.on('peer', function (peer) {
  const peerAddress = { address: addrToIPPort(peer)[0], port: addrToIPPort(peer)[1] };
  console.log(`download metadata from peer ${peerAddress.address}:${peerAddress.port}`);
  getMetadata(peerAddress, INFO_HASH);
});

const getMetadata = (peerAddress, infoHash) => {
  const socket = new net.Socket();
  socket.setTimeout(5000);
  socket.connect(peerAddress.port, peerAddress.address, () => {
    const wire = new Protocol();

    socket.pipe(wire).pipe(socket);
    wire.use(ut_metadata());

    wire.handshake(infoHash, SELF_HASH, { dht:true });
    wire.on('handshake', function (infoHash, peerId) {
      wire.ut_metadata.fetch();
    })

    wire.ut_metadata.on('metadata', function (rawMetadata) {
      const metadata = bencode.decode(rawMetadata).info;          // Got it!
      console.log(`${metadata.name.toString('utf-8')}:`);
      console.log(metadata);
      process.exit(0);
    })
  });
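  socket.on('timeout', () => { socket.destroy(); });   // setTimeout() alone does not close an idle socket
  socket.on('error', err => { socket.destroy(); });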
}
@feross feross closed this Jan 18, 2017
@lock lock bot locked as resolved and limited conversation to collaborators May 25, 2018