IPNS publishing performance #3

Open
benhylau opened this issue May 19, 2018 · 24 comments
Labels: ipfs (Related to IPFS)

@benhylau
Member

IPNS publishing takes more than a minute to propagate, which is currently a blocker.

@darkdrgn2k
Contributor

Running IPFS v.15 on a Debian-based system.

When I try to publish, it takes over 2 minutes.
Additionally, when I try to access the published IPNS name through any gateway, it also takes ~2 minutes.

I have tried the following, with no change:

  • using 127.0.0.1:8080 as a gateway where the data resides
  • checked swarm peers (over 600 all the time)
  • tried on both NATed and non-NATed IPFS servers

Example: Publish takes ~ 2 minutes

ipfs@IP:/var/www/html/hls$ date > date.txt
ipfs@IP:/var/www/html/hls$ ipfs add date.txt
added QmXxk7Mn2LaKjsbFQHvCLtaXUChMfqEmqJrKxRdL6y9R1S date.txt
ipfs@IP:/var/www/html/hls$ time ipfs name publish QmPn6i5JxPnobYcHqdieDPH82nXnhmCHLqkgfL95jQ4j3W
Published to QmU6NeD2Uu34WKest1NZGvGmScLhN1zVo66K35GeE6Jft2: /ipfs/QmPn6i5JxPnobYcHqdieDPH82nXnhmCHLqkgfL95jQ4j3W

real	2m0.267s
user	0m0.044s
sys	0m0.004s
ipfs@IP:/var/www/html/hls$

Example: Request of IPNS content takes 2 mins

root@HTPC:~# time wget https://ipfs.io/ipns/QmSp4Do1bwo8HjgQXwUmicqkg8ddET1DZKBiCzxw2TdCyo
--2018-05-19 12:42:13--  https://ipfs.io/ipns/QmSp4Do1bwo8HjgQXwUmicqkg8ddET1DZKBiCzxw2TdCyo
Resolving ipfs.io (ipfs.io)... 209.94.78.78
Connecting to ipfs.io (ipfs.io)|209.94.78.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 221 [text/plain]
Saving to: ‘QmSp4Do1bwo8HjgQXwUmicqkg8ddET1DZKBiCzxw2TdCyo’

QmSp4Do1bwo8HjgQXwU 100%[===================>]     221  --.-KB/s    in 0s      

2018-05-19 12:44:04 (13.8 MB/s) - ‘QmSp4Do1bwo8HjgQXwUmicqkg8ddET1DZKBiCzxw2TdCyo’ saved [221/221]

real	1m50.735s
user	0m0.008s
sys	0m0.004s

However, another user trying this had no problems; e.g.
https://ipfs.io/ipns/Qmbk1vmcgSczUjiz6KPjV3DoHXirPKhBbQKEttCz2gvxFR/hls/date.txt
takes only seconds.

@benhylau
Member Author

@darkdrgn2k are you starting the wget immediately after the ipfs name publish completes? If you run wget a second time, does it take another 2 minutes, or does it become immediate?

@darkdrgn2k
Contributor

Could be a second or an hour; it does not have an impact.

I tried a day later with the same results.

Even after the first successful try, the second one still takes just as long.

@benhylau
Member Author

@flyingzumwalt shall we link this into an ipfs repo?

@whyrusleeping

Hey @benhylau, just now seeing this issue. To get good latencies, you will want to use IPNS over pubsub. There are some docs here: https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipns-pubsub

Also see the ipfs name pubsub subcommand for some control commands.

You will need to have this feature enabled on both the publishers and the resolvers (your gateway nodes, ideally) and they will need to be connected in some sort of topology (to receive pubsub messages from a publisher, you have to be connected through some graph of subscribed nodes).
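
A minimal sketch of that setup (assuming go-ipfs with the experimental feature available on every node involved; <publisher-multiaddr> is a placeholder for the publisher's full multiaddress):

# On the publishing node AND on every resolving/gateway node:
ipfs daemon --enable-namesys-pubsub &

# Confirm the feature is active, and see which IPNS names this node follows:
ipfs name pubsub state
ipfs name pubsub subs

# Pubsub messages only travel between connected peers, so make sure the
# resolver can reach the publisher (directly, or through other subscribed peers):
ipfs swarm connect <publisher-multiaddr>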

Both @lgierth and @flyingzumwalt are out on vacation right now, so you can ping me if you need anything until they are back.

@darkdrgn2k
Contributor

@whyrusleeping we looked at this a while ago but it does not seem to have any effect.
Are we missing something, maybe?

ipfs daemon --enable-namesys-pubsub
ipfs name pubsub state

enabled

time ipfs name publish QmXZUTbf28Ra85GZ1gHqURYh1v3vAh9ZBXMs1Te15h6vGp
Published to QmTVrsREgGQ7QN5wGhDJshAmPXPV4ZVcw7XQ9X7wDSwEWc: /ipfs/QmXZUTbf28Ra85GZ1gHqURYh1v3vAh9ZBXMs1Te15h6vGp

real	4m26.323s
user	0m0.080s
sys	0m0.036s

@whyrusleeping

@darkdrgn2k ah, yeah. Under the hood it publishes to both pubsub and the slow DHT. Subscribers should receive updates very quickly (less than a second, ideally). We should probably add an option to publish only to pubsub, but doing that would prevent non-pubsub users from resolving the content (the DHT is a fallback).

If you just want to test out the pubsub stuff working quickly, you could add a --timeout=2s flag to your publish command, and see how things work.
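
For example, something like this (a sketch, reusing the hash from the publish above; the command will report a timeout error once the 2s deadline hits, but the pubsub side of the publish has already gone out by then):

time ipfs name publish --timeout=2s QmXZUTbf28Ra85GZ1gHqURYh1v3vAh9ZBXMs1Te15h6vGp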

@darkdrgn2k
Contributor

How are remote clients subscribed? Is it just a matter of enabling it on the client?

I would assume that if we are asking for the data from a gateway, the gateway needs to do it as well? And ipfs.io probably doesn't run experimental flags?

Also, another user is able to publish via regular IPNS in a fraction of the time.

What contributes to IPNS speed?

@whyrusleeping

Yeah, if the client has the feature enabled, they will use it. The first resolve will still be slow, as that's when the subscription starts, and since it doesn't have any value at the beginning of the subscription it has to fall back to the DHT, but subsequent lookups for updates will be fast.

If you are asking for data from the gateway, the gateway will need this feature enabled. And yes, our ipfs.io gateways do not run experimental flags, though you might be able to ask @lgierth nicely when he comes back ;)
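
One way to see that first-slow, then-fast behaviour on a pubsub-enabled resolver (a sketch; <peer-id> is a placeholder for the publisher's IPNS name):

time ipfs name resolve <peer-id>   # first resolve: subscription starts, falls back to the DHT (slow)
time ipfs name resolve <peer-id>   # once updates arrive over pubsub, later resolves return quickly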

@whyrusleeping

What contributes to IPNS speed?

Standard (non-pubsub) IPNS is a simple DHT 'Put' operation. The main reason it is slow is that connectivity across the network is rather poor right now (too many clients behind NATs). The way our DHT (Kademlia) works is that it selects the K closest peers (we use 20, as suggested by the paper) to the key you are publishing, by XOR metric, and sends them the value. With how many badly NATed clients we're seeing, the majority of these dials time out, causing the 'Put' operation to be slow. To address this, we are working on a few different things, but the two main ones are: first, an option for clients to auto-detect whether they are behind a bad NAT and, if so, not participate in the DHT; second, better relay logic, so we can ask other peers to relay connections for us to those NATed peers.
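
A rough way to see this from the command line (a sketch, assuming a local daemon; ipfs dht query lists the peers the DHT considers closest to a key, which is the kind of peer set a publish has to dial):

# Peers the DHT considers closest to this node's ID (a rough stand-in for the IPNS record key):
peerid=$(ipfs id -f '<id>')
ipfs dht query "$peerid" | head -n 20

# Compare with how long a publish takes; most of the wall-clock time is spent
# dialing peers from a set like the one above, many of which are unreachable:
time ipfs name publish /ipfs/<some-hash-you-have-added>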

@darkdrgn2k
Contributor

Thanks for the explanation.

If I understand correctly, it is currently pure luck whether IPNS will be fast or slow.

Suppose I have a closed network (CJDNS) that has no access to the Internet,
I place an IPFS server on a node that can route to both networks (Internet and CJDNS),
and I bootstrap the nodes on the CJDNS network with that server's address.

Would the CJDNS nodes be populated with unreachable Internet IP addresses as well, and as such still see the IPNS delays we have on the regular network?

@whyrusleeping

If I understand correctly, it is currently pure luck whether IPNS will be fast or slow.

Yes, as things stand now, it's pretty much just luck that determines DHT publishing speed.

If you ran ipfs entirely on a cjdns network (with no egress to the normal internet), then you wouldn't have a problem. The current issues are only happening because so many users are running ipfs nodes at home without any NAT traversal or port forwarding.

@darkdrgn2k
Contributor

IPNS over pubsub seems to be the way to go after all.

We time out the publish so it doesn't spend time on the DHT.

The first fetch of the IPNS name is still a bit slow, but I think that only affects the "first" person pulling from that gateway.

@benhylau
Member Author

@darkdrgn2k before we close this one:

  1. Is our 1s timeout a little aggressive?
  2. Can we now drop our chunk duration to 10s?

Thanks for the help @whyrusleeping :)

@whyrusleeping

@benhylau Not a problem :) Glad to hear things are working. Please, please, please let me know how it goes or if anything goes wrong. I'll be watching the stream when it's up!

@darkdrgn2k
Contributor

I have been running the stream for several hours now and it works well
(http://tomesh.org/stream/)

The only issue so far is that IPNS sets a cache of 1 minute, creating a "stall" of new segments.

@whyrusleeping

@darkdrgn2k ah, interesting. That happens only on new streams, right? Things run smoothly once it gets started?

The one-minute timeout is a server-side namesys cache. I think we should actually not be using that cache when we're using the pubsub routing (as the pubsub router effectively maintains its own cache). I'm gonna call this one a bug and file it for you.
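
(If it helps while debugging: when resolving from the CLI rather than through the gateway, ipfs name resolve also has a --nocache flag that skips cached entries; whether that avoids this particular stall is an assumption worth testing.)

ipfs name resolve --nocache <peer-id>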

@darkdrgn2k
Contributor

Spinning up a production rig for the stream, we found some issues with pubsub name publishing.

Background:
HLS creates ~15-second chunks of video that are hashed into IPFS.
A playlist with the sequence of chunks is kept in a text file called an M3U8 file.
This file is constantly re-downloaded by the player to learn of new chunks.

We publish the m3u8 text file every ~15 seconds (after every new chunk is created and added to IPFS).

We add and publish it as follows:

m3u8hash=$(ipfs add current.m3u8 | awk '{print $2}')
ipfs name publish --timeout=5s $m3u8hash &
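
(For reference, an equivalent commented sketch, assuming bash; -Q just prints the hash directly instead of going through awk:)

m3u8hash=$(ipfs add -Q current.m3u8)            # add the playlist, keep only its hash
ipfs name publish --timeout=5s "$m3u8hash" &    # background it so the encoder isn't blocked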

We found that IPNS would at times become stuck for a little while before updating the list.
The data returned would be stale, and sometimes as many as 10 updates would pass before it returned to normal.

At other times it would seem to work decently.

A few things to possibly consider:

  • Is a 5s timeout enough to publish over IPNS pubsub?
  • Is it still publishing over the DHT and maybe being picked up by the gateway?

After switching to a plain HTTP-hosted version of the m3u8 file, things seem to stream much better.

@benhylau
Member Author

benhylau commented Jul 2, 2018

First, we need to move the IPNS publishing into a separate process. I am doing it manually to observe the results, using publish and resolve scripts. They look like:

publish

grep SEQUENCE live/current.m3u8 && ipfs add live/current.m3u8 | awk '{print $2}' | ipfs name publish --timeout=15s 2>/dev/null || true

resolve

ipfs id | jq .ID | ipfs name resolve --timeout=15s 2>/dev/null | ipfs cat | grep SEQUENCE

A publish-resolve run looks like this:

root@ipfs-server:~# ./publish
#EXT-X-MEDIA-SEQUENCE:572
root@ipfs-server:~# ./resolve
Error: argument "ipfs-path" is required <-- Immediately after there is this awkward time
root@ipfs-server:~# ./resolve
#EXT-X-MEDIA-SEQUENCE:572 <-- Yay we resolved to the new playlist
root@ipfs-server:~# ./resolve
#EXT-X-MEDIA-SEQUENCE:542 <-- Huh? How did we resolve back to the old one?
root@ipfs-server:~# ./resolve
#EXT-X-MEDIA-SEQUENCE:542 <-- And it kept like that for a while, this is an IPFS bug?
...
root@ipfs-server:~# ./resolve
#EXT-X-MEDIA-SEQUENCE:542
root@ipfs-server:~# ./resolve
Error: argument "ipfs-path" is required <-- We enter these awkward times again with our 15s timeout
root@ipfs-server:~# ./resolve
Error: argument "ipfs-path" is required <-- This is when our players crap out we don't want to be here
root@ipfs-server:~# ./resolve
#EXT-X-MEDIA-SEQUENCE:572 <-- Okay we are back to the current playlist
root@ipfs-server:~# ./resolve
#EXT-X-MEDIA-SEQUENCE:572 <-- These return in < 1s, we want our player to see this!

Let's see across multiple nodes:

root@ipfs-server:~# date +%M:%S && grep SEQUENCE live/current.m3u8 && ipfs add live/current.m3u8 | awk '{print $2}' | ipfs name publish --timeout=5s || true && grep SEQUENCE live/current.m3u8 && ipfs id | jq .ID | ipfs name resolve | ipfs cat | grep SEQUENCE && curl -s "http://ipfs-mirror-0.mesh.world:8080/ipns/$(ipfs id | jq .ID | tr -d \")" | grep SEQUENCE && date +%M:%S
46:46
#EXT-X-MEDIA-SEQUENCE:631 <-- The sequence number of the current.m3u8 I am publishing
Error: Post http://127.0.0.1:5001/api/v0/name/publish?encoding=json&key=self&lifetime=24h&resolve=true&stream-channels=true&timeout=5s: context deadline exceeded
#EXT-X-MEDIA-SEQUENCE:631 <-- By the time I am done publishing ffmpeg is here
#EXT-X-MEDIA-SEQUENCE:572 <-- My local resolves to old value still, IPFS bug?
#EXT-X-MEDIA-SEQUENCE:631 <-- On the ipfs-mirror-0 node it picks up the new playlist immediately
46:52

If we put in those 15s timeouts in both publish and resolve:

root@ipfs-server:~# date +%M:%S && grep SEQUENCE live/current.m3u8 && ipfs add live/current.m3u8 | awk '{print $2}' | ipfs name publish --timeout=15s 2>/dev/null || true && grep SEQUENCE live/current.m3u8 && ipfs id | jq .ID | ipfs name resolve --timeout=15s 2>/dev/null | ipfs cat | grep SEQUENCE && curl -s "http://ipfs-mirror-0.mesh.world:8080/ipns/$(ipfs id | jq .ID | tr -d \")" | grep SEQUENCE && date +%M:%S
52:53
#EXT-X-MEDIA-SEQUENCE:655 <-- Published
#EXT-X-MEDIA-SEQUENCE:656 <-- Current
#EXT-X-MEDIA-SEQUENCE:648 <-- Local resolve
#EXT-X-MEDIA-SEQUENCE:655 <-- Remote mirror resolve
53:08

I think the way forward is to publish less frequently and resolve more aggressively, so that most of the time resolutions by the player are fast-returning. A few things will trip up the player:

  • Going back to an old playlist
  • Slow returns that time out

So we should aim for minimal downtime and have the playlist always available in < 1s.
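
A minimal sketch of that pattern (bash assumed; the paths, interval, and <peer-id> are placeholders rather than the exact production setup):

# Publisher: publish on a slower, fixed cadence in its own process.
while true; do
  hash=$(ipfs add -Q live/current.m3u8)
  ipfs name publish --timeout=15s "/ipfs/$hash" || true   # keep looping even if the DHT put times out
  sleep 60
done

# Reader: retry short resolves so one timed-out attempt doesn't stall the player.
for i in 1 2 3; do
  path=$(ipfs name resolve --timeout=5s <peer-id> 2>/dev/null) && break
done
[ -n "$path" ] && ipfs cat "$path" | grep SEQUENCE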

@benhylau
Member Author

benhylau commented Jul 2, 2018

Sometimes we'd run into this bug where on one node you resolve:

root@ipfs-server:~# echo "QmWCNF1APg3nbsLyQDcrMTsd7hnjyGL1ddoi9MNrRdPQ2A" | ipfs name resolve --timeout=15s 2>/dev/null | ipfs cat
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:15
#EXT-X-MEDIA-SEQUENCE:752
#EXT-X-START:TIME-OFFSET=-60
#EXTINF:18.033000,
http://ipfs-server.mesh.world:8080/ipfs/QmZdMYtK2eVwVS3aeCfzPLuWHr9hw1Xg9uGFcYc8jpCR1y
#EXTINF:18.667000,
http://ipfs-server.mesh.world:8080/ipfs/QmS3gvcguGtdGAFhQ6rkgv35NCQfbSbNxmACbnMzRF2qTy
#EXTINF:9.766000,
http://ipfs-server.mesh.world:8080/ipfs/QmUyrb2522df9f1YxBUHzsy76MmQhXHeP6uw9zvdiEcYMc
#EXTINF:13.800000,
http://ipfs-server.mesh.world:8080/ipfs/QmVYXzkaj8dNXoEgmyYLnS8mvr2Xum3YV2AQyq2qJHs3UD
#EXTINF:19.734000,
http://ipfs-server.mesh.world:8080/ipfs/QmRRyWHD4QCd1RXuMMupPHWAnXRC758JLh7v1HgnZzy31k
#EXTINF:16.666000,
http://ipfs-server.mesh.world:8080/ipfs/QmNRdG631Zy4reatckDdJuPAo8x1zj4fjhmDDGc3q8T54A
#EXTINF:8.334000,
http://ipfs-server.mesh.world:8080/ipfs/QmUT7CZHAN7ans8B5UMG8KkwzsRy8FePQdUzYynvAssz6G
#EXTINF:16.666000,
http://ipfs-server.mesh.world:8080/ipfs/QmSBocE4LXR6r9uQBkB7pqnecFHznvRRS1UYahzgbKYYW9
#EXTINF:16.667000,
http://ipfs-server.mesh.world:8080/ipfs/QmPmv4qZ7LDRaxsMDRhe6Ye9ECrJCT5UpvqPdVAj1tMTi8
#EXTINF:17.200000,
http://ipfs-server.mesh.world:8080/ipfs/QmRLTdboXxhEeHqfcjCD8JBLs6YBsdoxSLRiCc5cjizFNP
#EXTINF:16.333000,
http://ipfs-server.mesh.world:8080/ipfs/QmeGuuVjECqdJ6hcN6AuR5JS1472JthpyBnarwG38kgCXF
#EXTINF:13.934000,
http://ipfs-server.mesh.world:8080/ipfs/Qmd4zwWYYgS8eGMKoY1ngWrMiGn6eMaGWi63vigN1BmbQ1
#EXTINF:16.666000,
http://ipfs-server.mesh.world:8080/ipfs/QmREnHmTSQGNCeqsMzpbeSVgpYohVUVhgSZtmBVCBqcYDc
#EXTINF:8.000000,
http://ipfs-server.mesh.world:8080/ipfs/QmbEGY7x59o5ZhwSDXmpXtbDvkKGbqRNC4XLW9u2EnZorT
#EXTINF:16.067000,
http://ipfs-server.mesh.world:8080/ipfs/QmNkAaLAemCbiHdbjFFzT9kTksvcXvh71JPfhoEddZKDsv
#EXTINF:16.200000,
http://ipfs-server.mesh.world:8080/ipfs/QmaTNvn7mzS2odSx8FqQnrzEuBiytpyfUcuLng3RxHssSJ
#EXTINF:19.767000,
http://ipfs-server.mesh.world:8080/ipfs/QmdmhRqv1X1nLPoARKA48AdU9ZUKSWaXyfMh1BLdS7ugGh
#EXTINF:12.500000,
http://ipfs-server.mesh.world:8080/ipfs/QmeDjuZq4L44SKrMtKANUD7wkXB2NU3kzMVebus19ydY5L
#EXTINF:11.233000,
http://ipfs-server.mesh.world:8080/ipfs/QmUPYoUJ5mcsDbSPiL5spCefVZ6mTJBDRRreMmQ6EW96iq
#EXTINF:20.067000,
http://ipfs-server.mesh.world:8080/ipfs/QmPx6D5Xiqhxo6SNx2gqHbLfgEuKzj62G9F4KwE2mgknwy
#EXT-X-ENDLIST

And on another node (in this case the mirror):

root@ipfs-mirror:~# echo "QmWCNF1APg3nbsLyQDcrMTsd7hnjyGL1ddoi9MNrRdPQ2A" | ipfs name resolve --timeout=15s 2>/dev/null | ipfs cat
Error: this dag node is a directory
root@ipfs-mirror:~# echo "QmWCNF1APg3nbsLyQDcrMTsd7hnjyGL1ddoi9MNrRdPQ2A" | ipfs name resolve --timeout=15s 2>/dev/null | ipfs cat
Error: argument "ipfs-path" is required

It is not a transient thing; ipfs-mirror will resolve like this indefinitely (it has been like this for at least half an hour).

cc. @whyrusleeping @lgierth

@whyrusleeping

@benhylau very interesting... I wonder if it's a connectivity thing. If a pubsub IPNS node gets disconnected from the swarm of the publisher, it won't be able to resolve until they get connected again (it's not receiving any updates). Could you verify with ipfs swarm peers that the two are connected while the failure is occurring?
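
Something like this on the resolving node would confirm it (a sketch; <publisher-peer-id> is a placeholder for the publishing node's peer ID):

ipfs swarm peers | grep <publisher-peer-id> \
  && echo "connected to publisher" \
  || echo "no connection to publisher"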

@benhylau benhylau mentioned this issue Jul 2, 2018
@benhylau
Member Author

benhylau commented Jul 8, 2018

@whyrusleeping we unfortunately have to remove the IPNS dependency for v1.0, which is the tag we intend to use for Our Networks. Perhaps we can have a sprint with @lgierth to debug these IPNS issues we are experiencing :)

@whyrusleeping

@benhylau I'm pretty sure I know the issue now. The pubsub code doesn't add any connection manager tags, so ipfs doesn't know to keep connections to pubsub topic friends alive, meaning they will randomly be closed. I'm going to try to hack that in this week and we can see how it helps.

@benhylau benhylau added the ipfs Related to IPFS label Jul 13, 2018
@benhylau
Member Author

This will re-enable IPNS for testing purposes: https://github.com/tomeshnet/ipfs-live-streaming/tree/enable-ipns
