XML-parser chokes on some usernames #205
Thanks for the report; this is indeed a rather major problem. I've reported it upstream to the osm4j library (issue #11) and hope we can collaborate to resolve it. I've also attached a small example that reproduces the problem to the upstream issue.
Hey, I've added a test to osm4j as described in the upstream issue and was not able to reproduce the problem. I've taken a look at OSM2World's code, and it looks like you're completely rewriting the file here if it detects that the file was created by JOSM.
Is it possible that something goes wrong with the charsets during that conversion? Could you take a look at the temporary file created from the example file?
Looks like that piece of code converts the emoji to this:
I can still read that modified file in my test, though...
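The charset question can be illustrated with a minimal, self-contained Java sketch. It is not OSM2World's actual conversion code; ISO-8859-1 simply stands in for any charset that does not match, or cannot represent, the data. The class and method names are hypothetical. It shows two ways the emoji from the reported username can be corrupted, plus the round trip that preserves it.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // U+1F60E SMILING FACE WITH SUNGLASSES, the character in the reported username.
    // Java stores it as two UTF-16 code units (a surrogate pair).
    static final String EMOJI = new String(Character.toChars(0x1F60E));

    // Pitfall 1: bytes written as UTF-8 but read back as ISO-8859-1.
    // Each of the four UTF-8 bytes turns into a separate, unrelated character.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Pitfall 2: text written in a charset that cannot represent the character.
    // String.getBytes substitutes a replacement byte ('?'), so the original is lost.
    static String writtenAsLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Correct: the same charset (UTF-8) on both the writing and the reading side.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf8ReadAsLatin1(EMOJI).length());     // 4
        System.out.println(writtenAsLatin1(EMOJI).equals(EMOJI)); // false
        System.out.println(utf8RoundTrip(EMOJI).equals(EMOJI));   // true
    }
}
```

Either pitfall silently changes the username before the parser ever sees it, which is why checking the intermediate file for the exact bytes of the emoji is a useful first step.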
That's interesting! I'm indeed rewriting files that have been generated by JOSM because (I believe) osm4j does not have built-in support for the JOSM dialect of OSM XML, which has additional attributes such as … For some reason, I did not initially have that on my radar as a possible cause, but I now believe it is likely involved, because OSM2World is able to read the problem files if I change the … I'll investigate further and report the results.
Thanks!
Maybe we could improve osm4j to at least parse files from JOSM, even if it cannot currently store the additional data in its data model. I'll need to find a suitable test file... I've created topobyte/osm4j#12 to track this.
I've created a fix in 2ae6340. It writes the transformed XML to a Java String (UTF-16 internally) and relies on Apache Commons' IOUtils to output it as correctly encoded UTF-8. As of 26e9bbd, I've also used the opportunity to replace the temporary files with an in-memory representation. I've added a unit test and did some manual testing as well, and it appears to work fine now. It's not the most elegant approach, but for a good long-term solution I'd rather help improve osm4j. 🙂️ The next OSM2World build will contain the fix.
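The in-memory approach described in the fix (rewritten XML held as a Java String, then emitted as UTF-8 without a temporary file) can be sketched with the JDK alone. This is an illustration under assumptions, not the code from 2ae6340 or 26e9bbd: the class and method names are hypothetical, the XML snippet is a made-up minimal node, and `String.getBytes(StandardCharsets.UTF_8)` stands in for whichever Apache Commons IO call the fix uses (Commons IO's `IOUtils.toInputStream(String, Charset)` performs a comparable conversion).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class InMemoryUtf8 {

    // Turns an already-rewritten XML document (a Java String, i.e. UTF-16 internally)
    // into an in-memory UTF-8 byte stream, so no temporary file is needed.
    static InputStream toUtf8Stream(String xml) {
        // The explicit charset matters: String.getBytes() without an argument
        // uses the platform default charset, which is not guaranteed to be UTF-8.
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the stream back as UTF-8, standing in for the XML parser consuming it.
    static String readUtf8(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical minimal node element carrying the reported username.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<osm version=\"0.6\"><node id=\"1\" lat=\"0\" lon=\"0\""
                + " user=\"osm-pt-account \uD83D\uDE0E\"/></osm>";
        String roundTripped = readUtf8(toUtf8Stream(xml));
        System.out.println(roundTripped.equals(xml)); // true
    }
}
```

Keeping the data in memory removes one place where a file could be written or read with the wrong charset; the declared `encoding="UTF-8"` in the prolog and the bytes actually produced then agree by construction.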
Some usernames contain characters that make the XML parser refuse to load the file. If such a username appears on a relation, a large portion of the map cannot be rendered. Example:
Everything by the user https://www.openstreetmap.org/user/osm-pt-account%20%f0%9f%98%8e causes issues.
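The percent-encoded profile URL shows that the username ends with the bytes F0 9F 98 8E, the UTF-8 encoding of U+1F60E (😎). That character lies outside the Basic Multilingual Plane, so Java stores it as a surrogate pair, and any step that handles it char by char or with the wrong charset can split or mangle it. The short Java sketch below (class and method names are hypothetical) decodes the username and prints its lengths in the different units; it illustrates the data and does not diagnose where the failure occurs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UsernameDecode {

    // Decodes a percent-encoded username taken from a profile URL, interpreting the bytes as UTF-8.
    // Note: URLDecoder follows application/x-www-form-urlencoded rules, so a literal '+'
    // would also become a space; that does not matter for this username.
    static String decodeUsername(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = decodeUsername("osm-pt-account%20%f0%9f%98%8e");
        System.out.println(name.length());                                // 17 (UTF-16 code units)
        System.out.println(name.codePointCount(0, name.length()));        // 16 (code points)
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length); // 19 (UTF-8 bytes)
        System.out.printf("U+%X%n", name.codePointAt(name.length() - 2)); // U+1F60E
    }
}
```

The three different lengths for the same username are exactly where bugs tend to hide: code that assumes one char per character, or one byte per char, will get this name wrong.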