IPIP-504: `provider` query parameter as hint for HTTP Gateways #504

vasco-santos · 2025-05-26T15:13:58Z

Based on initial exploration https://github.com/vasco-santos/provider-hinted-uri/blob/main/EXPLORATION.md

SgtPooki

First read-through: makes sense, and seems like it could really benefit the ecosystem. One problem I can imagine is that people start encoding providers with dynamic IP addresses in URLs, and those URLs quickly become useless, so we should probably call that out.

SgtPooki · 2025-05-29T12:25:40Z

IPIP/0504-provider-query-parameter.md

+
+The CID is the core of a Provider-Hinted URI. Clients MUST extract the CID before evaluating any hints. The format is designed to be compatible with current IPFS like URIs, while explicitly defining how to locate the CID and interpret `provider` query parameters.
+
+#### CID Extraction Rules


I feel like the details below should link to another spec; I'm sure we have one written somewhere.

Mind pointing out where it lives? I do not know about it and could not find anything when working on this.

SgtPooki · 2025-05-29T12:27:24Z

IPIP/0504-provider-query-parameter.md

+#### Query Parameter: `provider`
+
+- Name: `provider`
+- Type: URI Query Parameter (repeating allowed)


Why a repeating parameter instead of a comma-delimited one?

Probably to make the spec simpler to implement. Some reasons to repeat instead of comma-separating:

Future-proofing, in case future values include a comma.

URL-escaping in many standard libraries: in Go, items=apple,banana,orange gets turned into a request for https://example.com/?items=apple%2Cbanana%2Corange. This means your server code needs to URL-decode before splitting at `,`. Repeated parameters avoid this complexity.

Prior art: Magnet links do not support comma-separated values; magnet:?tr=udp://tracker1:80&tr=udp://tracker2:80 is how one specifies multiple trackers.
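The escaping point can be checked with a short Go sketch that uses only `net/url` from the standard library. The multiaddr values are placeholders, and `commaEncoded`, `encodeHints`, and `parseHints` are hypothetical helpers, not an existing API. It also shows that parsing keeps repeated values in order of appearance, which is what a left-to-right evaluation of hints would rely on.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// commaEncoded shows the comma-delimited alternative: url.Values.Encode
// percent-encodes every "," as %2C, so servers must decode before splitting.
func commaEncoded(items []string) string {
	v := url.Values{}
	v.Set("items", strings.Join(items, ","))
	return v.Encode()
}

// encodeHints emits one repeated `provider` parameter per hint.
func encodeHints(hints []string) string {
	v := url.Values{}
	for _, h := range hints {
		v.Add("provider", h)
	}
	return v.Encode()
}

// parseHints returns the decoded `provider` values in order of appearance.
func parseHints(rawQuery string) ([]string, error) {
	v, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	return v["provider"], nil
}

func main() {
	fmt.Println(commaEncoded([]string{"apple", "banana", "orange"}))

	q := encodeHints([]string{
		"/dns/a.example.com/tcp/443/https",
		"/dns/b.example.com/tcp/443/https",
	})
	fmt.Println(q)

	hints, err := parseHints(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(hints)
}
```

With repeated parameters, no splitting or extra decoding step is needed: the standard parser already hands back one decoded value per hint.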

SgtPooki · 2025-05-29T12:29:07Z

IPIP/0504-provider-query-parameter.md

+  - Ignore all `provider` parameters (if unsupported).
+  - Evaluate hints in order of appearance (left-to-right).
+  - Evaluate hints in parallel.
+  - Apply their own prioritization or fallback strategies. If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available. Or even rely on discovery strategies in parallel.


Or even rely on discovery strategies in parallel.

I feel like if there are provider hints, we SHOULD attempt to process them before IPNI/DHT, to avoid negative cascading effects on the network.

Why would a multiaddr with additional info be a problem for DHT/IPNI? They should not try to match by multiaddr or anything like that.
Anyway, if that is a concern and they do not support it, nothing prevents removing this info before passing it to those systems.

SgtPooki · 2025-05-29T12:31:51Z

IPIP/0504-provider-query-parameter.md

+  - Evaluate hints in parallel.
+  - Apply their own prioritization or fallback strategies. If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available. Or even rely on discovery strategies in parallel.
+
+Note that the `multiaddr` string should point to the `origin` server where given CID is provided, and not include the actual CID in the Hint multiaddr as a subdomain/path.


Are CIDs valid in multiaddr at all (besides Peer ID)? Might be worth linking to the multiaddr spec: https://github.com/libp2p/specs/blob/master/addressing/README.md#multiaddr-in-libp2p

I am a bit confused: the CID is not encoded in the multiaddr.
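To make the "point at the origin, not the CID" rule from the quoted draft concrete, here is a small Go sketch (standard library only) that builds an `ipfs://` URI with repeated `provider` hints and refuses hints that embed the CID. The CID is an example value, the hostnames are placeholders, and `buildHintedURI` is a hypothetical helper, not part of any existing library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildHintedURI is a hypothetical helper: it appends one repeated
// `provider` parameter per hint to an ipfs:// URI. Each hint should
// address the origin server, so hints containing the CID are rejected.
func buildHintedURI(cid string, hints []string) (string, error) {
	q := url.Values{}
	for _, h := range hints {
		if strings.Contains(h, cid) {
			return "", fmt.Errorf("hint %q embeds the CID; point it at the origin instead", h)
		}
		q.Add("provider", h)
	}
	u := url.URL{Scheme: "ipfs", Host: cid, RawQuery: q.Encode()}
	return u.String(), nil
}

func main() {
	const cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"

	uri, err := buildHintedURI(cid, []string{"/dns/example.com/tcp/443/https"})
	fmt.Println(uri, err)

	// Rejected: the CID leaked into the hint as a subdomain.
	_, err = buildHintedURI(cid, []string{"/dns/" + cid + ".ipfs.example.com/tcp/443/https"})
	fmt.Println(err)
}
```

The substring check is deliberately crude; it only illustrates the rule and is not a substitute for parsing the multiaddr.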

lidel · 2025-05-29T15:37:09Z

IPIP/0504-provider-query-parameter.md

+
+TODO
+
+How will end users benefit from this work?


Things to add here:

Empower users to exfiltrate/migrate data from providers that do not announce data on the Amino DHT, and/or during IPNI (cid.contact) outages

Improved initial seeding: opportunistically append a /p2p/peerid hint to links generated by IPFS Desktop (if publicly diallable)

@lidel I wrote some use cases that may be interesting to add here: https://github.com/vasco-santos/provider-hinted-uri/blob/main/EXPLORATION.md#use-cases

What do you think?

Yes, feel free to copy 1-3; they make sense to include here (unsure about 4, but we can discuss once it's copied here).

aschmahmann · 2025-05-28T17:50:45Z

IPIP/0504-provider-query-parameter.md

+
+## Motivation
+
+Content-addressable systems, such as IPFS, allow data to be identified by the hash of its contents (CID), enabling verifiable, immutable references. However, retrieving content typically relies on side content discovery systems (e.g. DHT, IPNI), even when a client MAY know one (or more) provider of the bytes. A provider in this context is any node, peer, gateway, or service that can serve content identified by a CID.


While I understand that there's a latency improvement to be had by hard-coding a provider into the URL, in practice, when I've seen this come up in the past, it's been due to people not wanting to use the "mainnet" routing systems while sort of pretending that the content is available via mainnet (e.g. a pinning service not wanting to advertise their data to the Amino DHT or IPNI, but instead having their users use URIs like ipfs://bafyfoo?provider=<the-pinning-service>). In this light, this proposal seems more likely to harm than help the IPFS ecosystem.

Some examples:

URIs become ephemeral. While ipfs://bafyfoo was not ephemeral, ipfs://bafyfoo?provider=<pinning-service-that-had-the-cid-when-the-link-was-made> is. Yes, you could fall back to ignoring the provider, but:

If the ecosystem has come to rely on them for routing, then the link is just broken.

If the ecosystem has come to rely on them for a performance boost, then users encoding ipfs://bafyfoo?provider=<pinning-service-that-had-the-data-when-the-link-was-made> into their applications, smart contracts, etc. will now need to figure out how to update the provider part of the URI, whereas previously the link never needed updating.

This incentivizes pinning-service lock-in: moving data off a given provider means latency goes up for all related links unless every place the link has been shared is updated.

Some alternatives to this approach that seem to resolve many of the problems for users:

Invest in improving the routing system(s) for mainnet, which have received very little investment over the past several years and could really use it.

Application developers can hard-code additional routing systems into their applications. For example, if the developers of a given dApp are already hosting all their data and/or paying a pinning service to host it, telling the app to check that endpoint first seems fine. It is in line with how many of them already operate, by hard-coding a gateway endpoint provided by the pinning service they pay for storage.

It'd be useful to understand why the benefits of this outweigh the associated ecosystem risks.

I've seen many situations in which you want to share content-addressed data with all the benefits of verification and p2p, without necessarily caring about long-term persistence. If we don't formalise this in a spec, we end up pushing users to solve this in user/application space (see example), leading to fragmentation, and the ecosystem gets no benefit from a conventional approach.

Application developers can hard-code additional routing systems into their applications. For example, if the developers for a given dApp are already hosting all their data and/or paying a pinning service to host the data, saying to check that endpoint first seems fine and is in line with how many of them already operate by hard-coding a gateway endpoint provided by the pinning service they pay for storage

This would typically be done using the Delegated Routing API, for which we have a lot of tooling and support in implementations. In practice, this would involve configuring an application-specific endpoint somewhere, which, just like a specific provider maddr, can go down or eventually become stale. But for as long as that delegated routing endpoint is up, it helps the app map CIDs to provider maddrs.
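For comparison, a minimal Go sketch of the Delegated Routing alternative: the app keeps its own endpoint and asks it for providers of a CID. It assumes the Delegated Routing V1 HTTP API's `GET /routing/v1/providers/{cid}` path; `https://routing.example.com` and `providersRequest` are hypothetical, and the sketch only builds the request without sending it or parsing a response.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// providersRequest builds a "find providers" request against a
// hypothetical app-specific delegated routing endpoint.
func providersRequest(endpoint, cid string) (*http.Request, error) {
	target := strings.TrimSuffix(endpoint, "/") + "/routing/v1/providers/" + url.PathEscape(cid)
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	// Ask for the JSON form of the provider records.
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	req, err := providersRequest("https://routing.example.com",
		"bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

As the comment notes, this endpoint can go stale just like a provider maddr; the difference is that it lives in app configuration rather than in the shared link.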

It seems to me that insisting this is the recommended way to do it, over encoding the hint along with the CID, is overly pedantic, given that the two approaches are not all that dissimilar, with the one exception that you're mixing "permanent" information (the CID) with impermanent information (a specific provider maddr).

URIs become ephemeral. While encoding ipfs://bafyfoo used to not be ephemeral ipfs://bafyfoo?provider= is now ephemeral.

This is the crux of the debate here; and ultimately a question of where this boundary between ephemeral and permanent should be delineated. I happen to think it maps elegantly to query parameters.

But more broadly, content routing is hard and adds a performance tax that undercuts adoption in scenarios where the benefits of verification and content-addressing are desired.

Optimising for successful retrieval by CID should be a higher-level goal, and I think the broad strokes proposed here advance it. Moreover, it paves a path for incremental adoption of content addressing.

So my take is that we should just encode all of these caveats into this IPIP, e.g. strongly recommend against persisting the provider hint in long-term storage such as on-chain.

Thanks for writing this @2color . This is exactly what I feel like! I wrote some use cases that can benefit directly from this https://github.com/vasco-santos/provider-hinted-uri/blob/main/EXPLORATION.md#use-cases

Note that some of them actually depend on a follow-up to this (adding tags to the multiaddr), which for now is a separate conversation, since tags are an optional way to expand on this even further.

aschmahmann · 2025-05-28T18:02:35Z

IPIP/0504-provider-query-parameter.md

+
+- Name: `provider`
+- Type: URI Query Parameter (repeating allowed)
+- Value: Multiaddr string (`?provider=multiaddr`).


Independent of my main objection to the idea of `provider`, I'm not sure multiaddr is a great idea here. Maybe it's the best we have, but my suspicion is that importing the not-really-existent multiaddr spec (due to many years of neglect on the libp2p side) into the gateway spec is a pretty unfortunate dependency.

In Go we saw so many badly written parsers of multiaddrs that there's an in progress regex-like library to try to make them easier to work with.

Whoa, that parser library is neat! Would it be possible to just retro-specify how that library does it and call THAT the spec, or at least a starting point for it? I'm not sure this IPIP needs to be blocked on that specification process, but it seems as good an occasion as any to finally nail down multiaddr (and make it easier to build provably interoperable parser libraries for other languages!)

Not a fan of pulling libp2p concepts (Peer ID, multiaddr) into the gateway spec here.
The HTTP gateway spec should not come with hard lock-in to libp2p (RASL does not).

What if in the future everything will be HTTP providers – are we cosplaying libp2p with multiaddrs and fake PeerIDs still?

Maybe, as the bare minimum to future-proof this, state that this field is an opaque string that should be parsed as a multiaddr (if it starts with /) OR a URI (everything else). This keeps the door open for a sane implementation of webseeds.

I find it surprising that multiaddr is assumed to be a libp2p concept. I see room for multiaddr usage outside of libp2p, as with any other multiformat.

Anyway, after syncing with @bumblefudge last week, I totally agree that we should also accept a URI string as the provider, while keeping multiaddr if the value starts with /.
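The "opaque string" rule proposed in this thread is easy to dispatch on. A minimal Go sketch, assuming the rule is exactly "leading `/` means multiaddr, anything else must be an absolute URI with a host"; `classifyHint` and the `hintKind` values are hypothetical names, and only the dispatch is shown, not real multiaddr validation.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

type hintKind string

const (
	kindMultiaddr hintKind = "multiaddr"
	kindURI       hintKind = "uri"
	kindInvalid   hintKind = "invalid"
)

// classifyHint treats the provider value as an opaque string: a leading "/"
// means multiaddr, anything else must parse as an absolute URI with a host.
// A real client would still run a proper multiaddr parser on the first case.
func classifyHint(s string) hintKind {
	if strings.HasPrefix(s, "/") {
		return kindMultiaddr
	}
	u, err := url.Parse(s)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return kindInvalid
	}
	return kindURI
}

func main() {
	for _, h := range []string{
		"/dns/example.com/tcp/443/https",
		"https://storage.provider",
		"storage.provider", // no scheme, so not an absolute URI
	} {
		fmt.Printf("%-32s %s\n", h, classifyHint(h))
	}
}
```

Checking the first character keeps the two syntaxes unambiguous, since an absolute URI always begins with a scheme rather than `/`.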

lidel · 2025-06-06T17:55:26Z

IPIP/0504-provider-query-parameter.md

+  - Ignore all `provider` parameters (if unsupported).
+  - Evaluate hints in order of appearance (left-to-right).
+  - Evaluate hints in parallel.
+  - Apply their own prioritization or fallback strategies. If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available. Or even rely on discovery strategies in parallel.


If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available.

The idea that hints should be executed first, with the regular routing system used as a fallback only when all hints fail, is, in my opinion, an antipattern:

⚠️ it only benefits users in short term

⚠️ creates a latency and cost center over time

I am having a hard time seeing the "User benefit" here unless the spec actually states that ?provider SHOULD be a fallback evaluated AFTER the regular routing system.

Tried to clarify my position with example below.

On why acting on ?provider= should never supersede regular routing, and why this has to be a fallback thing

If ?provider is used exclusively first, that looks good only short term:

I made a viral content and pinned it to a storage provider (like https://storacha.network, https://pinata.cloud, https://filebase.com) and paid for a month of storage.

I shared this content as https://gateway.example.com/ipfs/cid?provider=https://storage.provider or ipfs://cid?provider=https://storage.provider. It became popular: people put it in blog posts and chats, Wikipedia put it in citation references, web crawlers consumed it, and LLMs now have this URL with the hardcoded hint.

Everyone is happy, because the routing system is skipped: third-party gateways or clients instantly fetch data from the "webseed" at https://storage.provider and don't need to ask DHT/IPNI.

Then I stopped paying the storage provider, and https://storage.provider/ipfs/cid started returning HTTP 404 Not Found, because they will not host my content for free.

What does the IPFS experience look like now, and in the long term?

(A) If ?provider= is evaluated FIRST, and the rest of routing is fallback

🔴 Every time a user opens the hardcoded address with ?provider=https://storage.provider, there is a delay, because the request to the webseed needs to fail before the regular routing system engages.

🔴 This effectively puts users hostage: someone will have to keep paying the specific storage provider to keep latency low. This also means nobody will have an incentive to work on improving routing systems.

🔴 But what if the hardcoded storage provider goes out of business? The latency is there forever.

This is user-hostile, and is why ?provider can't be implemented in a way that uses the regular routing system only as a fallback when all URL hints fail.

(B) If ?provider= is evaluated in PARALLEL to regular routing system

It is better for user-perceived latency, but still not the best for either party:

The user wastes a request to https://storage.provider/ipfs/cid; however, they also queried routing in parallel and eventually found a provider. In this scenario:

🟠 the user did not pay the cost of extra latency, but sent a parallel HTTP request that always fails.

🟠 storage provider will still get requests for CID they no longer pin.. forever.

(C) If ?provider= is evaluated as a fallback (AFTER regular routing system)

In my mind, this was always the only sensible way of introducing these ad-hoc routing hints.
Use them as a fallback (or after some delay), when regular routing system does not provide better provider.

🟢 user does not send unnecessary requests if there are other providers

🟢 storage provider is not receiving requests for content they don't host anymore

This is a good point to raise. While I tend to see this from a different perspective, I understand that there may be multiple correct approaches.

What feels more accurate to me is to let clients decide implementation details, or to allow configuring this, rather than have the spec mandate a certain implementation behaviour. The current proposal addresses this by stating:

Clients MAY:

...

Apply their own prioritization or fallback strategies. If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available. Or even rely on discovery strategies in parallel.

In other words, this framing allows a client to implement A, B, or C, or even to expose configuration that lets the user choose a non-default path.

What feels more accurate to me is to allow clients to decide implementation details, or to allow configuration of this, rather than the spec mandate a certain implementation behaviour.

Reality is, leaving the "evaluation order" implementation details out of the spec IS effectively leading to (A): even if someone initially implements (C), over time, motivated people will eliminate "annoying latency" by switching client code from (C) to (B), and then, to "remove unnecessary lookups", from (B) to (A).

Next steps:

the IPIP should clearly address risks that come with evaluation order choice, and inform reader about the externalized ecosystem cost listed above ( 🟠 🔴 ).

Do not pretend the downsides do not exist; document them and make a compelling case in "User benefit" that the benefits outweigh the identified cost to the IPFS ecosystem in the long term.

the Gateway spec should not ignore the topic of ?provider= evaluation order, but address it in the spec head-on:

a "SHOULD" recommendation for (C) with rationale of 🟢

a "MAY" for (B) with 🟠 cost documented

a "MUST NOT" for (A) with 🔴 risks documented as reason why it is considered harmful

We can't block people from YOLOing, but with this, whenever an implementation or part of the ecosystem wants to shoot itself and its users in the foot long term, we can always point at the spec they ignored.

SgtPooki · 2025-06-10T15:03:21Z

IPIP/0504-provider-query-parameter.md

+### Security
+
+While guiding client-side resolution, there are no relevant security considerations to have. However, there may be privacy implications if these hints are forwarded to the servers under certain circumstances. The semantics of hint placement influence visibility and use:
+
+- If the `provider` parameter is included in the **query** (`?...`), it MAY be communicated to the server depending on the client parsing the parameter.
+- If the `provider` is encoded as a **fragment** (`#...`), it is only accessible to the client (browsers do not send fragments to the server).
+
+This distinction allows URI publishers to tailor behavior:
+
+- **Client-only mode:** Use a fragment (`#provider=...`) to ensure the server remains unaware of hint data. This is useful for privacy-preserving client apps or when hints are intended to guide only the client.
+- **Server-assisted mode:** Use query parameters (`?provider=...`) to allow the server to parse and act on provider hints. This may enable proxy behavior, similar to existing IPFS gateways like `ipfs.io` or `dweb.link`.
+
+Publishers of such URIs should consider the **security profile** and **trust assumptions** of their environment when deciding how to encode hints.
+
+This flexibility supports a spectrum of use cases—from fully local client-side fetch strategies to cooperative client-server resolution pipelines.


Let's call out client security/privacy concerns here as well, i.e. if a malicious or nosy tracking entity (an NFT marketplace?) starts encoding provider hints into all its URLs, every single client can easily be tracked. This is another point for not prioritizing provider hints over the routing system, as called out by @lidel.
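To illustrate the query-versus-fragment distinction in the quoted Security section, a small Go sketch using `net/url`. Hints in the query are part of the request target an HTTP client sends, while hints in the fragment are not. `hintsFromURL` is a hypothetical helper and the URL is a placeholder. This only hides hints from the gateway; it does nothing about the tracking concern, since the client still contacts whatever provider the hint names.

```go
package main

import (
	"fmt"
	"net/url"
)

// hintsFromURL separates server-visible hints (query) from client-only
// hints (fragment). It also returns the request target an HTTP client
// would put on the request line, which never includes the fragment.
func hintsFromURL(raw string) (server, clientOnly []string, target string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, nil, "", err
	}
	server = u.Query()["provider"]
	// The fragment is parsed with query syntax, as in "#provider=...".
	frag, err := url.ParseQuery(u.EscapedFragment())
	if err != nil {
		return nil, nil, "", err
	}
	return server, frag["provider"], u.RequestURI(), nil
}

func main() {
	raw := "https://gateway.example.com/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi" +
		"?provider=/dns/a.example.com/tcp/443/https" +
		"#provider=/dns/b.example.com/tcp/443/https"
	server, clientOnly, target, err := hintsFromURL(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("server sees: ", server)
	fmt.Println("client only: ", clientOnly)
	fmt.Println("request line:", target)
}
```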

lidel · 2025-06-17T17:45:55Z

IPIP/0504-provider-query-parameter.md

+- **Reduced Latency on First Fetch**
+  - By embedding direct provider hints, clients can skip discovery lookups and go straight to fetching bytes—significantly lowering time to first byte, especially in cold-start scenarios.
+-	**Faster Initial Seeding**
+  - Clients like IPFS Desktop can opportunistically add hints to links they generate (e.g., using their peer ID or trusted gateway), enabling faster and more deterministic bootstrapping when others use those links.


💭 An alternative approach for this use case is for IPFS Desktop to manually provide the CID being shared on the Amino DHT. This would not require hardcoding anything in the shared URL.

ipip: provider query parameter

c5e690a

vasco-santos force-pushed the ipip/provider-query-parameter branch from 2d56874 to c5e690a May 26, 2025 15:26

vasco-santos mentioned this pull request May 26, 2025

feat: support provider query parameter ipfs/helia-verified-fetch#242

Open

3 tasks

SgtPooki reviewed May 29, 2025

lidel changed the title ~~ipip: provider query parameter~~ IPIP-504: provider query parameter as hint for HTTP Gateways May 29, 2025

lidel reviewed May 29, 2025

lidel mentioned this pull request May 29, 2025

chore: add tag code multiformats/multicodec#380

Draft

aschmahmann reviewed May 29, 2025

lidel reviewed Jun 6, 2025

chore: add user benefit section and minimal additions for other sections

25ba9b2

SgtPooki reviewed Jun 10, 2025

lidel reviewed Jun 17, 2025

2color commented Jun 4, 2025 (edited)
