ads.txt #201

Closed
torgo opened this issue Sep 26, 2017 · 15 comments

@torgo
Member

torgo commented Sep 26, 2017

Hello TAG!

I'm requesting a TAG review of:

Further details (optional):

  • This issue has been opened up by the TAG
@torgo
Member Author

torgo commented Sep 26, 2017

Some of the issues we have identified in our discussion at our Nice f2f:

  • This spec defines a well-known, hard-coded URL. There is now a standard for placing these paths within a .well-known prefix, see https://tools.ietf.org/html/rfc5785
  • The spec does not define the format using a formal syntax grammar, e.g. ABNF, which makes it very hard to tell which inputs are valid instances of this format. For example, there is no specification for which whitespace characters are acceptable as separators. For examples of good grammar specifications, see https://www.w3.org/TR/tabular-data-model/
  • The spec requires that the ads.txt file is published on a 'root domain'. There is no technical definition of 'root domain' in web architecture, and sites with authority and control over an origin may reasonably not have control over the parent origin.
  • It appears possible that this document is allowing for parseable content to follow on from a comment on the same line as the comment text. This would be so unusual that we suspect that this is not actually the intent of the authors.
  • The document specifies that ads.txt should be available on HTTP and HTTPS. This is enormously concerning, especially since some sites are moving away from listening for HTTP traffic at all, and requiring the use of HTTP for any web specification should be considered contrary to the very principles of good web architecture and detrimental to the future development of the web. See the TAG finding on securing the web
  • The document contains a normative reference to w3schools regarding URL encoding, which is a site generally regarded as a poor source of information about the web, and certainly not a primary source on any subject. On this point, https://tools.ietf.org/html/rfc3986 would be the correct normative reference.
  • Google has a system called App links and we are wondering why a mechanism like that is not appropriate for this use case.
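
To illustrate the first point, here is a minimal Python sketch comparing the hard-coded location with a location under the `.well-known` prefix. The function name `candidate_ads_txt_urls` is hypothetical, and `/.well-known/ads.txt` is an assumption made for illustration only: no such path is defined by the ads.txt spec or registered as a well-known URI.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_ads_txt_urls(page_url: str) -> dict:
    """Return the hard-coded location and a hypothetical well-known one."""
    parts = urlsplit(page_url)
    origin = (parts.scheme, parts.netloc)
    return {
        # Location used by the spec under review: a fixed path at the root.
        "current": urlunsplit((*origin, "/ads.txt", "", "")),
        # Hypothetical: "ads.txt" is not a registered well-known URI suffix.
        "well_known": urlunsplit((*origin, "/.well-known/ads.txt", "", "")),
    }


urls = candidate_ads_txt_urls("https://example.com/news/story?id=1")
print(urls["current"])
print(urls["well_known"])
```

Both URLs are derived from the origin of the page alone, so neither depends on a notion of 'root domain'.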

We are happy to engage with the authors, and we appreciate the importance of the problem that this is trying to solve. Making this more compatible with web architecture would be appreciated and will help the authors get better buy in from the web community.

(most of the words in this comment by @triblondon)

@torgo torgo added the "Progress: pending external feedback" label Sep 26, 2017
@slightlyoff
Member

The author of the document responded to a private ping, noting there's an updated version of the document here.

The 1.0.1 update indicates that crawlers should follow redirects within the same CNAME entry (although the language is woolly regarding "root domain"); e.g. it allows redirects between https://example.com and http://example.com, enabling downgrade of connection security.

There appear to be additions for "SUBDOMAIN" which is a redirect type. It does not appear to be well-specified and it's unclear why redirects with an eTLD+1 policy aren't being used instead.

@tantek

tantek commented Oct 17, 2017

@torgo re: "On this point, https://tools.ietf.org/html/rfc3986 would be the correct normative reference." Why not https://url.spec.whatwg.org/ instead, which I believe more and more W3C RECs are citing? E.g. https://www.w3.org/TR/webmention/#normative-references and https://www.w3.org/TR/websub/#normative-references (the latter a PR, hopefully soon to be a REC).

@slightlyoff
Member

We've re-visited this at the London F2F meeting. Most of the issues remain. I'm pinging the authors via private mail.

@triblondon

triblondon commented Feb 1, 2018

Up to date list of concerns, referencing the 1.0.1 version of the doc:

  • This spec defines a well-known, hard-coded URL. There is now a standard for placing these paths within a .well-known prefix, see https://tools.ietf.org/html/rfc5785
  • The spec does not define the format using a formal syntax grammar, e.g. ABNF, which makes it very hard to tell which inputs are valid instances of this format. For example, there is no specification for which whitespace characters are acceptable as separators. For examples of good grammar specifications, see https://www.w3.org/TR/tabular-data-model/
  • The spec requires that the ads.txt file is published on a 'root domain'. There is no technical definition of 'root domain' in web architecture, and sites with authority and control over an origin may reasonably not have control over the parent origin.
  • The document specifies that ads.txt should be available on "HTTP and/or HTTPS". This is enormously concerning, especially since some sites are moving away from listening for HTTP traffic at all, and suggesting the use of HTTP for any web specification should be considered contrary to the very principles of good web architecture and detrimental to the future development of the web. See the TAG finding on securing the web
  • The document contains a normative reference to w3schools regarding URL encoding. W3Schools is a site which has been widely regarded as a poor source of information about the web, and certainly not a primary source on any subject. On this point, https://tools.ietf.org/html/rfc3986 or https://url.spec.whatwg.org/ would be the correct normative reference.
  • The doc indicates that crawlers should follow redirects within the same CNAME entry (although the language is woolly regarding "root domain"); e.g. it allows redirects between https://example.com and http://example.com, enabling downgrade of connection security.
  • There appear to be additions for "SUBDOMAIN" which is a redirect type. It does not appear to be well-specified and it's unclear why redirects with an eTLD+1 policy aren't being used instead.
  • Google has a system called App links and we are wondering why a mechanism like that is not appropriate for this use case.
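
The redirect concern in the list above can be made concrete with a minimal Python sketch of how a crawler might detect a security downgrade in a redirect chain. The helper names `is_downgrade` and `first_downgrade` are hypothetical, and the chain is assumed to be a list of absolute URLs the crawler has already collected; none of this comes from the ads.txt spec.

```python
from urllib.parse import urlsplit


def is_downgrade(from_url: str, to_url: str) -> bool:
    """True when a redirect moves from https to any non-https scheme."""
    return (
        urlsplit(from_url).scheme.lower() == "https"
        and urlsplit(to_url).scheme.lower() != "https"
    )


def first_downgrade(chain):
    """Return the first (source, target) hop that downgrades, or None."""
    for src, dst in zip(chain, chain[1:]):
        if is_downgrade(src, dst):
            return (src, dst)
    return None


chain = [
    "https://example.com/ads.txt",
    "http://example.com/ads.txt",
]
print(first_downgrade(chain))
```

A crawler following this approach would stop at the first downgrading hop instead of fetching the HTTP target.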

@plinss plinss added this to the tag-telcon-2018-02-20 milestone Feb 1, 2018
@triblondon

Alex and I have pinged IAB people, and we'll follow up on a telcon.

@slightlyoff
Member

Met with George several times in February, debriefed in Tokyo. Just pinged again to understand if they plan to publish a new version which will address our concerns.

@wseltzer

wseltzer commented Feb 5, 2019

A venue for further discussion could be the Improving Web Advertising BG which has active participation from IAB TechLab.

@cynthia cynthia removed the extra time label Feb 7, 2019
@torgo
Member Author

torgo commented Mar 1, 2019

@wseltzer just following up on this. Does the Web Advertising BG hold regular calls? Could we tee up this discussion point so that members of the TAG can join for that session?

@wseltzer

wseltzer commented Mar 6, 2019

@torgo yes, the group meets every 2 weeks, with upcoming calls planned for March 14 and March 28.

@plinss plinss removed this from the 2019-04-03-telcon milestone Apr 8, 2019
@plinss plinss added this to the 2019-04-17-telcon milestone Apr 8, 2019
@torgo torgo assigned alice and unassigned slightlyoff May 20, 2019
@ylafon
Member

ylafon commented May 21, 2019

As of version 1.0.2, we notice that most comments have not been addressed yet, apart from a clarification in the redirect section. In that section, codes other than 302 are now allowed, but 308 is missing from the updated list. Section 5.3 would greatly benefit from a clarification of the parsing model, the whitespace definition, etc.

We are still concerned about the possible "downgrade redirect" issue, as the current specification still allows redirects from https to http. In general the specification should mandate the use of https only (and MAY default to http if https is not available, with the trust issues associated with its use).

Also, as the document defines a document format, it would be better for it to have a proper media type definition rather than using text/plain. At worst, using the generic text/csv would be better.
Note that the RFC defining the text/csv media type also defines its grammar (see the comment on section 5.3): https://tools.ietf.org/html/rfc4180
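
As a rough illustration of what reusing a CSV grammar could look like, here is a minimal Python sketch that parses one line with the standard `csv` module. The function name `parse_record` and the sample line are hypothetical. Treating `#` as the start of a comment and trimming whitespace around fields are assumptions, since the spec does not define these precisely. Python's `csv` module follows quoting rules similar to RFC 4180 but is not a strict implementation of it.

```python
import csv


def parse_record(line: str):
    """Split one line into trimmed fields, ignoring anything after '#'."""
    # Assumption: '#' always starts a comment, even inside a quoted field.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank line or comment-only line
    # csv.reader handles comma separation and double-quoted fields.
    fields = next(csv.reader([data], skipinitialspace=True))
    return [field.strip() for field in fields]


print(parse_record("adnetwork.example, 12345, DIRECT # illustrative line"))
print(parse_record("# comment only"))
```

With a formal grammar in the spec, the comment and whitespace handling above would be dictated by the grammar instead of guessed by each implementer.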

@torgo
Member Author

torgo commented May 21, 2019

@wseltzer since we haven't made enough progress on this issue, we are thinking it should be migrated over to the advertising BG. Would the BG be a good forum for discussing ads.txt and feeding back on its design? Let us know and maybe we can migrate the issue over this week.

@torgo torgo added the "Progress: propose closing" and "Progress: stalled" labels and removed the "Progress: pending external feedback" label Jan 21, 2020
@plinss plinss removed this from the 2020-01-20-week milestone Jan 27, 2020
@cmlight

cmlight commented Jan 29, 2020

Hi, ads.txt working group member here. Yes, it would be great to get these concerns addressed in the next ads.txt (and related specs) version update. Items I had previously written down that I'm hoping to make more technically precise include:

  • Character encoding: we see files published in various character encodings which may not be properly interpreted by all platforms. We should specify a character encoding such as UTF-8 for the file content so that validators can consistently flag issues
  • Byte-order mark headers: we see files that have non-visible byte order marks (https://en.wikipedia.org/wiki/Byte_order_mark) which can trip up parsing if not interpreted properly. We should include specifics in the spec about whether these are allowed or not
  • Line endings: the spec does not specify which byte sequences are considered line endings. We've encountered files encoded using atypical (or containing a mix of) line ending types which could trip up parsers. We should update the spec to include specifics of what byte sequences (0a; 0d0a; 0d; etc) are considered valid, parseable line endings.
  • Public suffix list specificity: the publicsuffix.org list contains two sections: an ICANN section and a private section. The ads.txt spec doesn't specify whether the private section is valid for use.
  • SUBDOMAIN= directive specificity and limitations: I'd like to make the spec provide more detail and examples about how SUBDOMAIN= directives behave and interact with each other, along with potentially defining a limit to the number of levels.
  • Security: I'd like to see if we can be more precise in the standard about how to treat HTTPS URLs, when it is permissible to fall back to HTTP, what validations the crawler should perform (e.g. SSL certificate validation), and the valid transport security protocols accepted. We should consider security risks that should be mitigated with precise rules.
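
The first three items above (character encoding, byte order marks, and line endings) could be handled along these lines. This is a minimal Python sketch under stated assumptions, not behaviour defined by the spec: it assumes UTF-8 is the required encoding, that a leading UTF-8 BOM is tolerated and dropped, and that LF, CRLF, and CR all count as line endings. The function name `decode_ads_txt` is hypothetical.

```python
import codecs


def decode_ads_txt(raw: bytes):
    """Decode bytes as UTF-8, drop a leading BOM, split on LF, CRLF, or CR."""
    # Assumption: a leading UTF-8 byte order mark is allowed and ignored.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Raises UnicodeDecodeError if the content is not valid UTF-8.
    text = raw.decode("utf-8")
    # Normalise CRLF first so that it is not counted as two line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.split("\n")


print(decode_ads_txt(b"\xef\xbb\xbffirst\r\nsecond\rthird\nfourth"))
```

A validator could catch the `UnicodeDecodeError` to flag files published in other encodings, which is the kind of consistent check the first item asks for.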

I will work with @slightlyoff on this.

Stepping back from the specific recommendations in this thread, I was wondering if you have any pointers to documents that explain how to write a good spec, if such a thing exists? Also, I would like to somehow put together a compatibility testing suite that participants can use to confirm that their crawlers and parsers were implemented correctly. If you have any tips on this or examples of well-written solutions that do this, that would be great to learn from.

@torgo
Member Author

torgo commented Mar 4, 2020

Hi @cmlight -

First of all, thanks for the visibility on some of the issues you are tackling. It seems like there is active work happening on a new spec. I think what needs to happen is that when a new spec is ready for review, someone files a new design review issue here with us.

Regarding how to write a good spec, we can provide feedback, but we are not really equipped to help write the spec itself, and some of the answer to that question is venue specific. One approach might be to bring this work to a venue where you have greater opportunity to bring in expertise in spec development and in related web technologies. For example, a W3C Community Group could be a good low-friction venue. In general, successful web specifications tend to be developed in an open environment and according to a transparent process.

@torgo torgo closed this as completed Mar 4, 2020
@torgo torgo removed the "Progress: propose closing" and "Progress: stalled" labels Mar 4, 2020