GFF2XML -> XML2GFF (invalid base64 length) #67

Closed
drake127 opened this issue Dec 1, 2020 · 5 comments

Comments

@drake127
Contributor

drake127 commented Dec 1, 2020

Hi,

it looks like xml2gff doesn't work for XML files that contain longer base64 strings: it fails with "Invalid length for a base64-encoded string".

I checked the string itself and it is correct; it just contains whitespace (indentation and newlines) from the XML formatting.

@DrMcCoy
Member

DrMcCoy commented Dec 1, 2020

Can you give me an example file to check against?

@drake127
Contributor Author

drake127 commented Dec 1, 2020

Sure, here you are. The only thing I did differently was to use --encoding 0=cp-1250, but I don't think it should matter in this case.

It fails in countLength(Ustring) with this string (please note the newline and the leading spaces):


      RHJvYm7saprtIHphaGFsZW7hIHBvc3RhdmEgdiBwcm9zdP1jaCBsZXNu7WNoIJph
      dGVjaCBzIG1l6GVtIHUgcGFzdS4KCg==

(url)
serialized.zip
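
To illustrate why a strict length check trips over this value, here is a minimal Python sketch (not the xoreos-tools code, which is C++). `naive_length_ok` is a hypothetical stand-in for a check that counts every character, and the exact whitespace around the value is an assumption; only the base64 text itself is taken from the report above.

```python
# The value roughly as it sits in the XML: six spaces of indentation per line
# and a newline in between (the exact surrounding whitespace is an assumption).
RAW = (
    "      RHJvYm7saprtIHphaGFsZW7hIHBvc3RhdmEgdiBwcm9zdP1jaCBsZXNu7WNoIJph\n"
    "      dGVjaCBzIG1l6GVtIHUgcGFzdS4KCg=="
)


def naive_length_ok(text: str) -> bool:
    """Hypothetical strict check: every character counts towards the length."""
    return len(text) % 4 == 0


# Length including the formatting whitespace vs. base64 characters only.
total_chars = len(RAW)
base64_chars = sum(1 for ch in RAW if not ch.isspace())

print("all characters:", total_chars, "multiple of 4:", naive_length_ok(RAW))
print("base64 characters:", base64_chars, "multiple of 4:", base64_chars % 4 == 0)
```

With the whitespace counted, the length is no longer a multiple of four, even though the base64 payload on its own is well-formed.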

@DrMcCoy
Member

DrMcCoy commented Dec 1, 2020

Thanks, I'll have a look at it later in the evening.

@drake127
Contributor Author

drake127 commented Dec 1, 2020

FYI - I modified decodeBase64 to strip whitespace characters and it is working now.

Just one unrelated question, though: the resulting GFF is 5 KB smaller than the original. When I convert it back to XML, the result is the same. Is that expected? The header looks entirely different:
(image: screenshot of the file headers)

DrMcCoy added a commit that referenced this issue Dec 1, 2020
Since we're outputting whitespace to indent and break up long base64
strings in gff2xml and we're then also skipping those when reading
them back in in xml2gff, we need to skip them as well when counting
the lengths of a base64'd string.

See issue #67.
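
The round trip described in this commit message can be sketched in Python as follows. This is only an illustration under assumptions, not the actual change in c6315f7: `encode_wrapped`, `count_length` and `decode_lenient` are hypothetical names, and the line width of 64 and the six-space indent are guesses based on the sample string earlier in the thread.

```python
import base64


def encode_wrapped(data: bytes, width: int = 64, indent: str = "      ") -> str:
    """Encode to base64, break it into lines of `width` characters and indent each line."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(indent + line for line in lines)


def count_length(text: str) -> int:
    """Count the characters of a base64 string, skipping whitespace."""
    return sum(1 for ch in text if not ch.isspace())


def decode_lenient(text: str) -> bytes:
    """Decode base64 that may contain indentation and line breaks."""
    if count_length(text) % 4 != 0:
        raise ValueError("Invalid length for a base64-encoded string")
    # Drop all whitespace, then decode strictly so other stray characters still fail.
    return base64.b64decode("".join(text.split()), validate=True)


payload = bytes(range(256))
wrapped = encode_wrapped(payload)
print(decode_lenient(wrapped) == payload)
```

The point of the sketch is that the writer and the reader have to agree: whatever whitespace the encoder adds for readability must be ignored both when validating the length and when decoding.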
@DrMcCoy
Member

DrMcCoy commented Dec 1, 2020

Ah, yes, found the issue. Should be fixed with c6315f7, thanks for reporting! :)

Yeah, the file being shorter is okay. Our GFF code tries harder to consolidate the same string data (*). For example, the original file contains the string "STAMINA_MAX" multiple times, while the GFF produced by xml2gff contains the string just once, and all fields using that same string just reference this one instance.

That also explains the differences in the header, because those values there are offsets to the different sections in the GFF, one of them being the string table and another the external field data table. We're creating files that are logically identical, i.e. that contain the exact same information, not files that are byte-by-byte identical.

(*) Technically, it just throws each extended field data value into a map, and duplicates get consolidated that way.
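
As a rough illustration of that consolidation, here is a small Python sketch. It does not reproduce the real GFF layout: `build_string_table` is a hypothetical helper, and the null-terminated storage is an assumption made only to keep the example short.

```python
def build_string_table(values):
    """Store each distinct string once and return the table plus one offset per input value."""
    offsets = {}          # string -> offset of its single stored copy
    table = bytearray()   # concatenated, deduplicated string data
    refs = []             # for each input value, the offset its field would reference
    for value in values:
        if value not in offsets:
            offsets[value] = len(table)
            table += value.encode("utf-8") + b"\x00"
        refs.append(offsets[value])
    return bytes(table), refs


table, refs = build_string_table(["STAMINA_MAX", "HP", "STAMINA_MAX"])
print(len(table), refs)
```

Both uses of "STAMINA_MAX" end up pointing at the same offset, so the table is smaller than one that stores every occurrence, and any section offsets that come after it shift accordingly. The information is the same, but the bytes are not.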

@DrMcCoy DrMcCoy closed this as completed Dec 1, 2020