Projects with large numbers of files and long filenames cannot be pushed or pulled #25

Closed
jjr-rh opened this issue Jan 16, 2013 · 13 comments

Comments


jjr-rh commented Jan 16, 2013

I maintain the Fedora Installation Guide and Installation Quick Start Guide, and last year we had to move translation to another platform because of insurmountable difficulties with Transifex.

There are two problems:

  1. Transifex can't seem to handle the long filenames that the sheer size of the Installation Guide necessitates, so we would have to maintain some kind of mapping between the filename and the Transifex module names. This would require a massive amount of manual handling.

One example of a long filename in the guide: Specialized_Storage_Devices_common-variablelist-1.xml

  2. Pushing and pulling the 800+ small files that comprise the Installation Guide across the 20+ languages that we have partial translations for is prohibitive as the connection keeps failing. Only 368 files are currently in the Transifex project.

https://fedora.transifex.com/projects/p/fedora-install-guide/

Can the limitations on filename length and the number of files be extended so that Transifex can accept a project of this size?

Contributor

mpessas commented Jan 16, 2013

Hi,

We can do the first (increasing the filename length). What are your requirements?

Regarding the second issue, we have a limit on the number of API requests performed within an hour. Had you hit that limit? Or was there a different issue, e.g. unresponsive servers?

So, email us at support at transifex dot com and let us know.

Apostolis

Contributor

glezos commented Jan 25, 2013

Kind ping to this.

Contributor

mpessas commented Jan 31, 2013

Ping? Any updates here?


pmkovar commented Feb 4, 2013

Tried to push the Fedora IG source files with tx push -s --skip (used --skip to ignore non-existing files present in the config file). The push finished at around 5 PM CEST.

For reference, the config file is here:

http://git.fedorahosted.org/cgit/docs/install-guide.git/tree/.tx/config?h=f18

The source POT files are here:

http://git.fedorahosted.org/cgit/docs/install-guide.git/tree/pot?h=f18

The push seemed to be successful (I seem to have gotten errors only on non-existing files), but this probably needs more testing by others.

The list of updated IG resources (seems to contain the newly created resources):

https://fedora.transifex.com/projects/p/fedora/language/en/?project=2177


pmkovar commented Feb 4, 2013

Tx devs, dglezos mentioned elsewhere that the tx client could feature a command which would find any .po/.pot files with no entries in the config file, etc., and/or auto-create config file entries based on the PO/POT file location in the directory structure.

Given the huge number of source POT files in the IG project, we really seem to need such a feature to make the project reasonably manageable for IG maintainers.
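
To make the requested check concrete, here is a minimal Python sketch of the comparison such a command might perform. It is not part of the tx client. It assumes that each resource section in .tx/config carries a source_file entry relative to the project root and that the POT files live in a pot/ directory; the function names are hypothetical.

```python
import configparser
from pathlib import Path


def listed_source_files(config_text):
    """Collect the source_file value of every config section that has one."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {
        parser.get(section, "source_file")
        for section in parser.sections()
        if parser.has_option(section, "source_file")
    }


def compare_config_with_disk(config_text, project_root, pot_subdir="pot"):
    """Return (unlisted, missing) as sorted lists of project-relative paths.

    unlisted: POT files on disk that have no entry in the config.
    missing:  config entries whose POT file no longer exists on disk.
    """
    root = Path(project_root)
    on_disk = {
        path.relative_to(root).as_posix()
        for path in (root / pot_subdir).glob("*.pot")
    }
    listed = listed_source_files(config_text)
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

The first list would be the candidates for new config entries after a release; the second corresponds to the entries that currently need --skip during a push.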

Do you want me to file a separate ticket for that?

Contributor

glezos commented Feb 4, 2013

Please do open a new ticket, thank you.

Since the Transifex Client is an open-source project, it'd be great to get someone with Python skills to provide a patch for this. We would be happy to help, review and make it live.

Author

jjr-rh commented Feb 5, 2013

Thanks for testing the push, Petr, and I'm glad you've had such success. It does indeed seem to have worked: based on that 737 figure, all the files in the config file are likely there, minus the files newly created for F18. Files deleted for F18, which would be those you reported weren't found, have remained where they were in the Transifex project.

In contrast, I've been trying to push for the last two days but the following files were being rejected and stopped the push:

  • files with filenames longer than 50 characters
  • files with no strings to translate (namely files comprising only xi:include tags)
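
Both kinds of file can be listed before a push rather than discovered one at a time. The following is a rough Python sketch, not a tx client feature. It assumes the 50-character limit applies to the full filename including the extension, and it treats a POT file as empty when no entry has a non-empty msgid; the function names are hypothetical.

```python
from pathlib import Path

# Assumed limit, taken from the 50-character figure reported in this thread.
MAX_NAME_LENGTH = 50


def has_translatable_strings(pot_text):
    """Return True if any entry has a non-empty msgid (the header's msgid is empty)."""
    in_msgid = False
    for line in pot_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            in_msgid = True
            value = line[len("msgid "):]
        elif in_msgid and line.startswith('"'):
            value = line  # continuation line of a multi-line msgid
        else:
            in_msgid = False
            continue
        if value.strip('"'):
            return True
    return False


def find_problem_files(pot_dir):
    """Return (too_long, empty): names over the limit, and files with nothing to translate."""
    too_long, empty = [], []
    for path in sorted(Path(pot_dir).glob("*.pot")):
        if len(path.name) > MAX_NAME_LENGTH:
            too_long.append(path.name)
        if not has_translatable_strings(path.read_text(encoding="utf-8")):
            empty.append(path.name)
    return too_long, empty
```

Running find_problem_files on the pot directory once would give the full list of entries to review in the config file, instead of restarting the push after each rejection.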

After initially pushing from the f18 branch, I switched to f17 because the discrepancies between the pot files and the config file were stopping the push even more. (These pushes are just tests so we can address f18 later.)

So I had to manually remove the entries for all of these files from the config file. Because I don't have a bird's-eye view of the files that have no strings to translate, I had to restart the push after each was encountered, hence the delay.

I did not encounter either of these problems last year prior to the shift to Zanata. Although we suspected the long filenames were a problem, I don't recall the Transifex client specifically stating this as it does now. Perhaps the code at Transifex's end has been updated.

The f18 config file Petr used is the same as the one I've been using for f17, and doesn't include the new files in the f18 branch. So I'm unsure how he managed to push all the files, particularly those whose entries I've had to remove from my config file. Is there any more light you can shed on this, Petr?

Because Petr succeeded, I'm hesitant to state that these two problems need to be fixed. If they do, however, then the filename character limit could be increased to 75 or 100 or more to avoid stopping the push, as the longest Installation Guide filename is 58. Transifex would also have to accept or ignore pot files that contain no translatable strings (which behaviour is implemented would depend on whether the Fedora Docs Project requires the number of files in a Transifex project to match the number in the git repo).

I did encounter a timeout though. 121 files from the end (out of 777), the following message appeared:

Exception: Remote server replied: Connection timed out

I assume this isn't a matter of unresponsive servers because I had already pushed so many files. If there is a limit on API requests performed within an hour, what is that limit? Should it be cross-referenced with the number of files I am pushing? I see on http://help.transifex.com/features/client/ that we are recommended to use --resource or --language instead when dealing with large numbers of files. Perhaps scripting a series of individual pushes for each language is the best approach. But then again, Petr doesn't seem to have encountered this.
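
As an illustration of the scripted approach, here is a minimal Python sketch that builds one source push per resource. It assumes .tx/config has a [main] section plus one section per resource, named with the resource identifier passed to -r; the function names are hypothetical. It only builds the command lines and does not run them.

```python
import configparser


def resource_ids(config_text):
    """Return every section name in a .tx/config except the assumed [main] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return [name for name in parser.sections() if name != "main"]


def source_push_commands(config_text, skip=True):
    """Build one `tx push -r <resource> -s` command per resource, without running it."""
    commands = []
    for resource in resource_ids(config_text):
        command = ["tx", "push", "-r", resource, "-s"]
        if skip:
            command.append("--skip")
        commands.append(command)
    return commands
```

Each command could then be run on its own (for example with subprocess.run), so that a timeout interrupts a single resource and the script can resume from that point instead of restarting the whole push.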

Plus, Petr has raised the other key problem: how to update the config file for a new release. Petr, could you let me know when you create that issue and I'll add my thoughts to it? Or I can go ahead and create it myself.

Contributor

mpessas commented Feb 5, 2013

Hi,

The limit on the slug was a bug with the API code. Uploads from the
web interface would work (the limit is 200 characters and we are about
to increase it further). This has been fixed and it will be pushed
later today.

Regarding the issue with empty resources, that is expected behavior;
Transifex does not create resources, if there is no translatable
content in the file.

Keep in mind there is the --skip option, so that the client will
skip over any errors it encounters, when it pushes files.

Regarding the timeout, could you send some more information? Do you
remember what time it was more or less, so that we can check the logs?

The limit on the API right now is 3600 requests/hour IIRC. But, if it
is too low, we could bump it up.
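
For a sense of scale, the arithmetic can be sketched in Python. The 3600 requests/hour figure is the one quoted above; the assumption of exactly one API request per file per language is a simplification for illustration, since the real number of requests the client makes per file is not stated in this thread.

```python
# Limit quoted in this thread (stated from memory, so treat it as approximate).
REQUESTS_PER_HOUR = 3600


def min_interval_seconds(requests_per_hour=REQUESTS_PER_HOUR):
    """Smallest average gap between requests that stays within the hourly limit."""
    return 3600 / requests_per_hour


def min_push_hours(files, languages=1, requests_per_file=1,
                   requests_per_hour=REQUESTS_PER_HOUR):
    """Lower bound on push duration if every file costs requests_per_file requests.

    requests_per_file=1 is an assumption, not a documented client behaviour.
    """
    total_requests = files * languages * requests_per_file
    return total_requests / requests_per_hour
```

Under these assumptions the limit allows one request per second on average. A source-only push of 777 files would fit well within one hour, while 800 files across 20 languages would need roughly 4.4 hours at minimum.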

Lastly, regarding the update of the configuration file, we could have
a set --update-local option that tries to update the .tx/config
file with new files in the directory. But let's track that in the new
ticket.


pmkovar commented Feb 5, 2013

As mpessas points out, I ran tx push with the --skip option, so that's why missing files didn't stop the push. Also, I only pushed the source POT files (the -s option).

Today, I successfully pushed a filename longer than 50 characters:

tx push -r fedora-install-guide.Adding_Partitions-section-2-ilist-1-litem-2a -s
Pushing translations for resource fedora-install-guide.Adding_Partitions-section-2-ilist-1-litem-2a:
Pushing source file (pot/Adding_Partitions-section-2-itemizedlist-1-listitem-2a.pot)
Done.

So this seems to work for me. Jack, could it be that you are using an old version of the tx client?


pmkovar commented Feb 5, 2013

I created a new ticket to track the feature request re: update config file entries:

#27

Thanks!

Author

jjr-rh commented Feb 6, 2013

I was indeed using an outdated version: 0.7. But 0.8 is not in the Fedora repos (http://koji.fedoraproject.org/koji/packageinfo?packageID=11647), so I mistakenly assumed that I was up-to-date until I checked the Transifex documentation site. Is this absence expected?

Hopefully this means that the slug limit problem is indeed moot.

I've installed 0.8 from the Python Package Index and ran tx push -s --skip (thanks for the tip about --skip, although fortunately I was already running -s). But for some reason it's much slower and more memory intensive this time, so I'll need to run it overnight.

Regarding --skip though, if this is required to skip both files that don't exist and files with no translatable content, I would suggest that the latter be default behaviour rather than requiring an option. As far as I can see, there is no utility to being notified about this each time, at least there isn't for this project. If this needs to be known, perhaps a list of these files could be reported after the push. However, just disregard this if there are benefits I'm unaware of.

mpessas, the timeout occurred at around 15:30 Brisbane time, which is UTC/GMT +10 hours. Perhaps this is moot now given the outdated client situation though, at least until the result of the next push is known.
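
For checking server logs, the reported time can be converted to UTC. This small Python sketch uses the fixed UTC+10 offset stated above; the calendar date in the example is a placeholder, since the exact day of the timeout is not given here.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset as stated in the comment above (UTC/GMT +10 hours).
BRISBANE = timezone(timedelta(hours=10))


def to_utc(local_time):
    """Convert a timezone-aware datetime to UTC."""
    return local_time.astimezone(timezone.utc)


# Placeholder date: only the 15:30 local time comes from the report.
reported = datetime(2013, 2, 5, 15, 30, tzinfo=BRISBANE)
print(to_utc(reported).strftime("%H:%M UTC"))  # prints "05:30 UTC"
```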

I'll update tomorrow with that result.

Author

jjr-rh commented Feb 8, 2013

The push was successful. All pot files in the config file were pushed, barring those excluded by --skip.

I then pushed the translations successfully too, also using --skip.

So this issue appears to be resolved, assuming future pushes are this consistent, which is great. Thank you both for your help. The next hurdle will be issue #27.

Contributor

mpessas commented Feb 11, 2013

Great, closing this ticket then.

@mpessas mpessas closed this as completed Feb 11, 2013