Token format broken by incompatible characters in comment (such as &) #107

barneym opened this Issue May 25, 2011 · 10 comments



barneym commented May 25, 2011

Ran into an issue where an & (ampersand) in a method summary was not getting escaped/translated, which broke NSXMLParser when it went back to read the Token.xml files. I suspect this will happen with any XML-restricted character.
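For anyone wanting to reproduce the failure mode outside appledoc, here is a minimal sketch in Python (not appledoc's own Objective-C). It assumes a made-up two-element fragment; the `<Token>` and `<Abstract>` names are used for illustration only and are not meant to mirror the real tokens file schema. It shows that a bare & makes the XML ill-formed, while the escaped form parses and round-trips to the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_token_xml(abstract: str, escape_text: bool) -> str:
    """Build a tiny XML fragment (element names are illustrative only)."""
    body = escape(abstract) if escape_text else abstract
    return f"<Token><Abstract>{body}</Abstract></Token>"


def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "Compares this & that"
unescaped_ok = parses(make_token_xml(raw, escape_text=False))
escaped_ok = parses(make_token_xml(raw, escape_text=True))
```

`escape` also handles `<` and `>`, which are the other characters likely to cause the same kind of breakage.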

I just started using AppleDoc about an hour ago so can't provide much more information on where the problem is occurring or where to fix it, but expect it could bite people with a large amount of existing doc-comments switching from another tool.


jgavris commented May 25, 2011

indeed, i had to deal with this as well. if there's an option that makes the ampersands compatible or something that'd be cool.


tomaz commented May 25, 2011

This is probably related to docsetutil crashing when indexing. You mean text like this & that?

After implementing full Markdown support, I had issues with docsetutil not accepting HTML-escaped chars, &ndash; for example. It seemed docsetutil doesn't handle these properly (it complains that symbols are missing or something like that), so I changed the abstract to use plain text instead of HTML. Although this required some cleanup code of its own, it was less work, and as Xcode quick help doesn't respect any formatting, it seemed reasonable. That being said, it leaves the door open for new issues which will emerge with different usages. But on the other hand, these should be very simple to fix once identified :)


tomaz commented May 25, 2011

Oh, about reporting: if you get a crash (Oops message), appledoc will spit out the current stack frame, which will help pinpoint the method where it occurred, so posting that will help.

Generally, you can use the --verbose cmd line switch to get more messages in the output. It requires a numeric value in the range 1-6, with a greater value meaning more verbose. Using 6 will be reeeeaaaaly verbose, but it provides a lot of clues for debugging. In that case, run the tool and post just the relevant part(s), as the whole output will probably be too much. If you're having trouble with specific files, run just those. Don't use verbose 4 or greater for normal use though, as it will significantly slow down runtime! There is some quick help available on http://tomaz.github.com/appledoc (there's an easy to miss menu in the top/right corner which gives you some commenting and cmd line settings examples).


tomaz commented May 27, 2011

Hm, upon testing I can confirm it's docsetutil that crashes. Although it would be very simple to convert all free standing &'s to &amp;, this would look weird in Xcode quick help (which is the only place that uses these docs to my knowledge). I need to do some more testing, but probably the quickest solution would be converting & to and...

barneym commented May 28, 2011

My understanding is that Apple's own documents go through an intermediary XML state. It would seem like it would properly decode the &amp; back to the correct character in display. Is this not the case?

I do agree that converting & to and is the quickest solution, but I would keep it an open issue because the first time someone writes ..."The reference variable &myVar is used to...", well, let's just say it's a "no win" scenario. :P

Thanks again for digging into this.
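To make the trade-off concrete, here is a rough Python sketch of the "& to and" idea that only touches an ampersand surrounded by whitespace (or sitting at the start/end of the text). The function name and the regex are illustrative assumptions, not what appledoc does. A pointer-style `&myVar` or an entity such as `&amp;` is left alone, which also means those cases remain unsolved.

```python
import re

# An '&' with whitespace or a string boundary on both sides.
STANDALONE_AMP = re.compile(r"(?<!\S)&(?!\S)")


def replace_standalone_ampersands(text: str) -> str:
    """Replace free-standing '&' with 'and'; leave '&name' and '&name;' untouched."""
    return STANDALONE_AMP.sub("and", text)


prose = replace_standalone_ampersands("text like this & that")
code_ref = replace_standalone_ampersands("The reference variable &myVar is used to...")
```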


tomaz commented May 29, 2011

Sure, to get docsetutil indexing working, you need to supply it xml files describing the "folder" structure (nodes.xml) and quick help descriptions (tokens*.xml). Unless Apple generates index directly, they also go through docsetutil, so they also generate intermediate nodes and tokens xmls.

I haven't worked on this in detail yet. Quick testing showed that &amp; is accepted, but I haven't installed the docset yet to see what happens in Xcode, and I haven't tried switching back to HTML description mode in the tokens files. There were no such issues reported until now (although that may only indicate no one used such combinations).

I completely agree that my quick suggestion is limited at best...


tomaz commented Jun 2, 2011

Also see #114 for update on related issue.


tomaz commented Jun 9, 2011

I closed #114 as it's related and will be addressed with the same patch, so here's an excerpt from that thread: basically it's docsetutil that crashes whenever it finds & or &symbol; (like &asterisk; or &hyphen; and similar) in the abstract description. I've filed a bug report with Apple (rdar://9574081), as I can't do much about it except changing standalone & to and and converting all possible HTML symbols to their character representation. However, both would require substantial effort, and I expect many more new issues each time someone hits an HTML symbol that's not yet covered... Or shortly: maintenance nightmare :)

What I will do for the moment is let appledoc continue with indexing the remaining files even though docsetutil crashes on one. This will at least get the documentation ready as much as possible. What will be missing from it is quick help integration for all symbols in the offending file, starting from the object containing the offending symbol. Or perhaps I'll simply remove all these symbols from abstracts (this would only affect quick help; the full documentation in the document viewer will be fine). Then wait to see what Apple will say and act accordingly.
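As an illustration of the "convert HTML symbols to their character representation" option, here is a small Python sketch using the standard library's entity table. This is only an assumption about how such a conversion could look; appledoc itself is Objective-C and would need its own lookup table, which is where the maintenance cost mentioned above comes from. The helper name is hypothetical.

```python
import html


def entities_to_characters(abstract: str) -> str:
    """Decode named and numeric HTML character references into plain characters."""
    return html.unescape(abstract)


dash = entities_to_characters("pages 1&ndash;5")
amp = entities_to_characters("this &amp; that")
```

Note that decoding `&amp;` yields a bare & again, so this step would still need to be combined with some handling of standalone ampersands.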

Will keep the issue open for now, expect mentioned "fix" shortly.

tomaz added a commit that referenced this issue Jun 9, 2011

Addressing `&` and `&token;` issues when indexing documentation set, …
…partially easing #107.

Note: unfortunately it wasn't possible to continue indexing remaining files after encountering an error - docsetutil takes path to docset bundle, not individual files. So all symbols in the offending file AND all symbols in subsequent files will not be indexed.

Also note that appledoc will log a warning and hence exit with code 1 in case it gets these errors! You'll probably want to use `--exit-threshold 2` or more if you're running from Xcode build script.

For the moment, the only "workaround" is to stay clear of using these symbols in your own documentation. If you're using third party libraries or frameworks containing these symbols, and you only run them through appledoc to get similar looking docs, you can limit the run to the `--create-html` phase. If you want to have it integrated in Xcode, then go ahead and experiment to see how much of it is usable...

I'm getting this issue too, with method signatures containing pointer-address ampersands, like withFormat:&format;. Tomaz, you mention "What I will do for the moment is letting appledoc continue with indexing remaining files". Is that what's happening when I read this warning?

WARN | docsetutil failed to index the documentation set, continuing with what was indexed...

Would it also be an option to wait for that error and only replace the HTML characters in that file and re-process it? Could you maybe point to the class & line where you use the docsetutil, then I can try to nudge at it a bit and see how it reacts.


tomaz commented Jun 27, 2011

Yes, that's it. At first I wanted to implement something like you propose, but then realized docsetutil indexing runs over the whole docset (or more precisely, over the tokens.xml files which contain the extracted documentation; run with --keep-intermediate-files to inspect these, they are generated inside docset folders). It's not invoked on individual files (which kind of makes sense, as it should generate one index file). So when it fails, anything beyond that point doesn't get indexed.

The whole of docset handling is performed inside GBDocSetOutputGenerator class, indexing is handled by -processTokensXml: which generates tokens.xml files and -indexDocSet: which invokes docsetutil over it.

To actually get the file, you would need to scan the error string returned from docsetutil (i.e. inside the runCommand:arguments:block: block), fix the file and restart the indexing. Make sure you don't end up in an infinite loop, failing on the same file again and again :)
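The retry idea could be sketched roughly like this, in Python rather than appledoc's Objective-C. All three callables are hypothetical stand-ins: `run_indexer` for the docsetutil invocation (returning an error string, or None on success), `find_offending_file` for the error-string scan, and `fix_file` for whatever cleanup is applied. The set of already-fixed paths plus the attempt cap is what guards against looping forever on the same file.

```python
from typing import Callable, Optional, Set


def index_with_retries(
    run_indexer: Callable[[], Optional[str]],
    find_offending_file: Callable[[str], Optional[str]],
    fix_file: Callable[[str], None],
    max_attempts: int = 10,
) -> bool:
    """Re-run indexing after fixing each offending file; True if indexing succeeds."""
    already_fixed: Set[str] = set()
    for _ in range(max_attempts):
        error = run_indexer()
        if error is None:
            return True  # indexing completed without errors
        path = find_offending_file(error)
        if path is None or path in already_fixed:
            # Either the error names no file, or the fix did not help:
            # give up instead of failing on the same file again and again.
            return False
        fix_file(path)
        already_fixed.add(path)
    return False
```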

tomaz closed this in 7d08ea7 Dec 15, 2011
