
Load images from IPFS #360

Closed

geir54 opened this issue Dec 16, 2016 · 31 comments · 10 participants
geir54 commented Dec 16, 2016

It would be great if Mastodon used IPFS to load and store images. That way, if a server goes down, the content would stay up as long as someone else had loaded it. It would also reduce the bandwidth load on the server hosting the original content.

To elaborate: only the backend needs to run IPFS. Users could run it too, but that should be optional.

ineffyble (Collaborator) commented Jan 18, 2017

Related to #477 but with a more limited scope (just media), which sounds more feasible, as alternative media backends already exist (e.g., S3).

ProgVal (Contributor) commented Apr 3, 2017

Although I'm a big fan of the ideas behind IPFS, I would prefer it not be a dependency of Mastodon.
Mastodon already depends heavily on Ruby and Node.js; adding IPFS would significantly complicate instance setup, because it requires a Go environment and a rather large set of dependencies.

@swaldie swaldie added the enhancement label Apr 4, 2017

@Dragnucs Dragnucs referenced this issue Apr 13, 2017

Closed

IPFS Backend? #1678

mecab (Contributor) commented May 9, 2017

I'm now adding IPFS media backend support by writing a custom Paperclip storage (mecab/paperclip-ipfs-storage) and integrating it into Mastodon. The storage is built on top of hjoest/ruby-ipfs-api, so it does not depend on Golang. 😊

If you are interested, you can try it from my fork (note that you should use the ipfs branch). It has an issue with unstable connections under Docker, so I will need to fix that before raising a PR.

Here is a screen capture; you can see the image being served via the IPFS gateway.

[screenshot: image served via the IPFS gateway]

Anyway, I just posted here to show current progress. Any suggestions and questions are welcome 😆

Possibly related issues: #477, #1847
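For anyone curious about the shape of such a storage backend: Paperclip storage modules implement `flush_writes` and `flush_deletes`, which is the surface mecab's gem plugs into. Below is a hedged, self-contained sketch of that shape; the `FakeIpfs` client and every name outside those two callbacks are illustrative stand-ins, not the gem's actual code:

```ruby
require "stringio"

# Stand-in for a real IPFS client (e.g. one built on hjoest/ruby-ipfs-api).
class FakeIpfs
  attr_reader :unpinned

  def initialize
    @counter  = 0
    @unpinned = []
  end

  def add(_data)
    @counter += 1
    "QmFake#{@counter}" # a real client returns the content hash (CID)
  end

  def unpin(hash)
    @unpinned << hash
  end
end

module Paperclip
  module Storage
    module Ipfs
      # Push each queued file to the node and remember its hash.
      def flush_writes
        @ipfs_hashes ||= {}
        @queued_for_write.each do |style, io|
          @ipfs_hashes[style] = @ipfs.add(io.read)
        end
        @queued_for_write = {}
      end

      # "Delete" means unpin; the node's GC later drops the blocks.
      def flush_deletes
        @queued_for_delete.each { |hash| @ipfs.unpin(hash) }
        @queued_for_delete = []
      end
    end
  end
end

# Tiny host object standing in for a Paperclip attachment.
class Attachment
  include Paperclip::Storage::Ipfs
  attr_reader :ipfs_hashes

  def initialize(ipfs)
    @ipfs              = ipfs
    @queued_for_write  = {}
    @queued_for_delete = []
  end

  def queue_write(style, io)
    @queued_for_write[style] = io
  end

  def queue_delete(hash)
    @queued_for_delete << hash
  end
end
```

A real module would also need `exists?` and gateway-facing URL generation; those are omitted here.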

Gargron (Member) commented May 9, 2017

I'm worried about not being able to delete files. What does everyone else think?

nightpool (Collaborator) commented May 9, 2017

I would really like to take this and work on it at some point. I think that running an IPFS server alongside a mastodon instance as an optional enhancement (instead of using S3) makes a ton of sense—decentralized file storage for decentralized social media.

@Gargron I think that not deleting things is probably a red herring as far as our application goes. If the status gets deleted then there will be no references to the media file, and there will be no way to find it except by knowing the hash. Furthermore, it can (and should) also get unpinned by any of the nodes that are pinning it, causing them to garbage-collect it, meaning the only way it would still be accessible is if a non-Mastodon IPFS node had mirrored it (same as status deletion, basically).

we should not use the IPFS gateway though—instead, mastodon instances should have their own local gateways and we should document setting that up. There's some bikeshedding to be had here about how to make these gateways non-public (some nginx magic?)
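The unpin-then-garbage-collect lifecycle described above can be sketched in miniature. This is an illustrative model of the behaviour, not go-ipfs or Mastodon code:

```ruby
require "digest"
require "set"

# Toy content-addressed store with pinning: unpinned blocks survive
# only until the next garbage-collection pass, mirroring how an IPFS
# node treats unpinned content.
class PinningStore
  def initialize
    @blocks = {}        # content hash => data
    @pins   = Set.new   # hashes protected from GC
  end

  def add(data, pin: true)
    hash = Digest::SHA256.hexdigest(data)
    @blocks[hash] = data
    @pins.add(hash) if pin
    hash
  end

  def unpin(hash)
    @pins.delete(hash)
  end

  # Drop every block that is no longer pinned.
  def gc!
    @blocks.keep_if { |hash, _| @pins.include?(hash) }
  end

  def fetch(hash)
    @blocks[hash]
  end
end
```

Deleting a status would then translate to `unpin`; the bytes linger only until `gc!` runs, which matches the point that deletion here is best-effort rather than immediate.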

nightpool (Collaborator) commented May 9, 2017

re: deletion, I'll try asking the IPFS team if they'd consider best-effort deletion as an improvement to the protocol.

EDIT: this may be impractical; I don't know if IPFS nodes currently have any way of tracking the "originator" of content, so there's no real way to authenticate a deletion request. More investigation needed.

mecab (Contributor) commented May 10, 2017

@Gargron Yes, I understand that not being able to delete files could be a problem, but I agree with @nightpool that the files are effectively deleted once the hash is forgotten.

One concern is that some bad actor could spread the media's URL (i.e., with the hash) elsewhere after the toot has been deleted. However, in that case they could just as easily repost the original content (instead of the URL) even if we don't use IPFS.

mecab (Contributor) commented May 10, 2017

Avoiding the public gateway and instead setting up a special gateway that hides the hash, checks that the original toot is still available, and only then proxies the content from the IPFS network could solve the problem.
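A minimal sketch of that checking gateway, assuming a hypothetical server-side mapping from toot id to content hash (all names are illustrative; this is not Mastodon code):

```ruby
# The gateway never exposes the raw content hash. It resolves media by
# toot id, confirms the toot still exists, and only then proxies the
# bytes out of the (stubbed) IPFS-backed store.
class CheckingGateway
  def initialize(toot_media:, store:)
    @toot_media = toot_media # toot id => content hash (kept server-side)
    @store      = store      # content hash => bytes; stands in for an IPFS node
  end

  def serve(toot_id)
    hash = @toot_media[toot_id]
    return nil if hash.nil? # toot was deleted: refuse to proxy
    @store[hash]
  end
end
```

Once the toot is deleted, the mapping entry goes away and the gateway serves nothing, even though the bytes may still linger on the IPFS network for a while.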

davidak commented May 10, 2017

You often see screenshots of tweets (in the media), and I download pictures that I like, so you can't really delete something from the internet.

The idea of IPFS is to archive the internet for as long as people care about the content (e.g., by pinning it).

mecab (Contributor) commented May 10, 2017

@davidak Yes, that is what I wanted to say. My position is that having no way to delete the media may not be a big problem.

harukasan pushed a commit to pixiv/mastodon that referenced this issue May 18, 2017

ProgVal (Contributor) commented Jun 16, 2017

@mecab Does your implementation support unpinning files that should be deleted?

ProgVal (Contributor) commented Jun 16, 2017

@mecab Other question (sorry for the notification spam): is it possible to migrate an existing Mastodon instance to the IPFS storage, without breaking existing URLs?

wxcafe (Contributor) commented Sep 26, 2017

Is this branch still compatible with the current state of mastodon? If not, would you be willing to port it @mecab?

mecab (Contributor) commented Sep 27, 2017

@ProgVal Oops, I found your mention now, sorry.

Does your implementation support unpinning files that should be deleted?

Not currently, but I think it would be possible to implement.

is it possible to migrate an existing Mastodon instance to the IPFS storage, without breaking existing URLs?

It could be, but it would need a script to do it.

mecab (Contributor) commented Sep 27, 2017

@wxcafe

Is this branch still compatible with the current state of mastodon?

I'm sorry, but I'm not sure about that, since I haven't had enough time for development recently. If IPFS support is urgent, I have no problem with you implementing it using pieces of my code, or starting from scratch.

I think it would need only a few modifications to adapt to the current state, even if there were conflicts. But please note that I still have not resolved this issue: mecab/paperclip-ipfs-storage#1.

wxcafe (Contributor) commented Sep 29, 2017

It's not urgent (the last comment before mine was over 3 months ago ^^), just checking. Thanks for the work you've already done on this 👍

mecab (Contributor) commented Sep 29, 2017

I see, thanks!

abcang added a commit to pixiv/mastodon that referenced this issue Oct 19, 2017

Merge pull request tootsuite#360 from pixiv/pictures
Remove invalid example in the spec of Api::V1::TracksController
ghost commented Mar 25, 2018

IPFS should not be used in my opinion.

I want my stuff to be deleted if I want it to be deleted (even if rogue instances or people can copy it).

Never being able to delete pictures should not be the default behavior. (As in, it shouldn't assume that I want it to be archived forever and ever using IPFS)

This could be an option in the settings if possible.

ProgVal (Contributor) commented Mar 25, 2018

@lionirdeadman IPFS does not automatically replicate your content. Unless someone manually pins the content, the content will only be cached temporarily by other nodes that access it.

ghost commented Mar 26, 2018

But that still means it's forever on the instance my account is on. Does it not?

(I may have misunderstood the talk above)

ProgVal (Contributor) commented Mar 26, 2018

@lionirdeadman It's forever on that instance, unless the instance unpins it. (search for "unpin" in previous messages in the thread)

ghost commented Mar 26, 2018

If I understood correctly, unpinning only forgets the location of the data and not the data itself.

So my data still exists there and I don't want that.

(Please correct me if I'm wrong)

ProgVal (Contributor) commented Mar 26, 2018

Unpinning causes your instance to eventually forget the data.

ghost commented Mar 26, 2018

Eventually? How much time would it generally take for it to be completely gone/overwritten/deleted after being unpinned?

I feel that if it depends on instance activity, it's a bad idea because you can't guarantee to the user that the data will be deleted in any kind of timeframe and this could be legal trouble too.

petersjt014 commented Mar 27, 2018

It looks like scheduled garbage collection is disabled by default, but collection after reaching the watermark is not.

A bit on your earlier comment:

IPFS is content-addressed. When you ask the network to find a file, you are giving it the content (or at least a 'fingerprint' of it) and obtaining locations to download from. This is roughly the reverse of more traditional/common location-addressed systems like website urls.

If you add something to your repo, it's 'private' as long as only you know the hash. As soon as one other person knows, keeping it private relies on trusting them (and this doesn't apply if it's leaked in public, of course).

If others know the hash but the file is unique (meaning only you hold it), then they can't get it unless you bring your client online while the file is in your repo. And if the file is small enough that someone could 'guess' it via brute force, then it's also no longer private. That isn't likely unless they already have a significant fragment of it and know how you chunked it (if you didn't use the default); if not, even 1 KiB represents 2^8192 possible files.

Nearly every possible file has a unique hash (or at least the chances of collision are extremely low). As an example, I hashed "lol" (with a linefeed at the end, because echo adds one on Unix) while offline on one machine, then queried the DHT for the resulting hash on a different one with ipfs dht findprovs QmQsZSD... and found that it already existed on nine other hosts. You can do this for any possible value, but well-known and/or short values will be much more likely to pop up.

There's more I could say here, but this comment is already pretty long. Hope this helps and that it isn't too pedantic.
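The content-addressing property described above, that the same bytes always map to the same address on any machine, can be demonstrated with a plain digest. Real IPFS wraps the digest in a multihash/CID (the `Qm...` strings); bare SHA-256 here is only a stand-in:

```ruby
require "digest"

# Content addressing in miniature: the address is a pure function of
# the bytes, so anyone who hashes "lol\n" derives the same address,
# which is why the value already existed on other hosts.
def content_address(data)
  Digest::SHA256.hexdigest(data)
end
```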

ProgVal (Contributor) commented Mar 27, 2018

Eventually? How much time would it generally take for it to be completely gone/overwritten/deleted after being unpinned?

I feel that if it depends on instance activity, it's a bad idea because you can't guarantee to the user that the data will be deleted in any kind of timeframe and this could be legal trouble too.

The instance can trigger a flush of its cache (or remove only your content from the cache).

If you add something to your repo, it's 'private' as long as only you know the hash. As soon as one other person knows, keeping it private relies on trusting them (and this doesn't apply if it's leaked in public, of course).

No. As for most systems with a DHT, IPFS "gossips", which allows other people in the network to get the hash of the content. See: ipfs/faq#181

Edit: I just realized this line is actually a very strong argument against using IPFS on Mastodon. Too bad :(

ghost commented Mar 27, 2018

Why have this in the first place if other hosts can simply cache the image and content? Why use IPFS if it can leak data (which may or may not be personal in the case of a direct toot)?

I feel there is little benefit and that users should be in control of whether or not they want their data handled this way.

ProgVal (Contributor) commented Mar 27, 2018

Why have this in the first place if other hosts can simply cache the image and content?

Why have what? Pinning?
Pinning makes sure that even if the original uploader disappears, someone else will still have the content, even if it hasn't been accessed in a while.

Why use IPFS if it can leak data (which may or may not be personal in the case of a direct toot)?

IPFS can deduplicate content across instances and prevents an instance from being a single point of failure.

I feel there is little benefit and that users should be in control of whether or not they want their data handled this way.

Yes. One possibility would be to only push content to IPFS if the toot is public (or unlisted?)
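That opt-in rule could be as simple as a visibility gate. A hypothetical helper, not actual Mastodon code:

```ruby
# Push media to IPFS only when the user opted in AND the toot's
# visibility is in the allowed set (public, and perhaps unlisted).
IPFS_ELIGIBLE_VISIBILITIES = %w[public].freeze # add "unlisted" if desired

def push_to_ipfs?(visibility, user_opted_in)
  user_opted_in && IPFS_ELIGIBLE_VISIBILITIES.include?(visibility)
end
```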

petersjt014 commented Mar 27, 2018

Ah, right. That makes sense. I was wondering why there's almost always a trickle of traffic--that'd be the gossip then.

For distributed content replication, something key-based would probably be best. There's at least one standard for this, and Zeronet is a working example (I think they implement the BEP or at least something like it).

ghost commented Mar 28, 2018

Yeah, I think pushing it for public would be good but it should still be optional to the user.

As for unlisted, I'm not sure.


@tootsuite tootsuite deleted a comment from zumbatan Mar 29, 2018

Gargron (Member) commented May 17, 2018

It seems that in many ways the current implementation already behaves very similarly to what IPFS would offer. IPFS pinning means the original instance is still on the hook for the required storage. One benefit is file-based deduplication, but that is possible without IPFS too. Both with and without IPFS it's obviously an imperfect solution, because the same image can end up with a different hash depending on compression level, dimensions, or format. There is an outstanding issue for deduplication, #2317 (which is unfortunately difficult due to the data migrations required on quite large tables). IPFS and an IPFS gateway would also be an additional deployment dependency. Closing this.

@Gargron Gargron closed this May 17, 2018

@nightpool nightpool referenced this issue May 28, 2018

Closed

Feature: file storage plugin of IPFS #7657
