8-bit bytestrings vs. Unicode strings in Python: fix this once and for all #62
Comments
So I tried a simple fix of adding
to arbitrator.py at line 623 (though it should probably be done earlier). Now I get these errors, which probably means yes, it has to happen earlier, or maybe we have to convert to non-Unicode altogether to use those processors and transporters.
2011-06-10 17:08:29,332 - Arbitrator.Transporter - ERROR - The transporter 'mosso' has failed while transporting the file '/tmp/daemon/var/www/webroot/sites/default/files/imagefield_thumbs/featured_img/WoodyHallæ°åç_0_1306282603.jpg' (action: 1). Error: 'u'\u6c0f''.
and google to find this update
One more error cropping up now as well, then I'll be quiet.
Exception in thread Thread-1:
Thank you for the fix above. However, after those 3 lines of code fixed the original problem, File Conveyor runs for a little while and then I get this output. I am not very familiar with Python, but any help would be appreciated. Thanks.
2011-06-20 15:32:50,795 - Arbitrator - WARNING - Arbitrator is initializing.
IMHO true fix should involve http://www.gerixsoft.com/sites/gerixsoft.com/files/fileconveyor-utf8.patch fixed the problem for me.
Uploaded the patch you linked to to gist.github.com, in case your site goes offline: https://gist.github.com/1118004.
The patch posted by andriy-gerasika is definitely interesting. Look at the documentation of However, the log output provided at #25 suggests this is not the right way to solve the problem: Clearly, this implies that we should be using proper Unicode strings in Python. Whatever that may be, because that's absolutely not clear. (If anything is messy in Python, it's unicode strings.) I guess a good starting point is http://docs.python.org/howto/unicode.html.
This one just won't die, huh? I did a bunch of research on this before, but I really don't remember at which point we should be intercepting this. If I remember correctly, we should be handling it at the point where we harvest the path names, before they go into the DB. I think I tried dbcon.text_factory = str, and IIRC it didn't work. I can't remember why, but I think it had something to do with the point at which it was trying to massage the text. Sorry I can't be more help, I'm kinda foggy on this one.
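For readers following along, here is a minimal sketch of what `text_factory` controls. It is written for Python 3 rather than the Python 2.6 used in this thread (an assumption made so the example runs on a current interpreter), so it does not reproduce the `ProgrammingError` itself. The `synced_files` table and `input_file` column come from the traceback in this issue; the `fetch_paths` helper and the sample file name are hypothetical.

```python
import sqlite3


def fetch_paths(text_factory=None):
    """Store one non-ASCII path and read it back (hypothetical helper).

    text_factory: optional callable installed on the connection; it decides
    which Python type TEXT values are converted to when rows are fetched.
    """
    con = sqlite3.connect(":memory:")
    if text_factory is not None:
        con.text_factory = text_factory
    # Table and column names are taken from the traceback in this issue.
    con.execute("CREATE TABLE synced_files (input_file TEXT)")
    con.execute(
        "INSERT INTO synced_files VALUES (?)",
        ("WoodyHall\u6c0f_0.jpg",),  # sample name, for illustration only
    )
    rows = [row[0] for row in con.execute("SELECT input_file FROM synced_files")]
    con.close()
    return rows
```

With the default setting, `fetch_paths()` returns text; with `fetch_paths(bytes)` the same row comes back as UTF-8 encoded bytes. In Python 3, `text_factory` only affects values read from the database, so it is not a substitute for deciding where in the pipeline paths become text.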
That's what we'd tried, Jacob. I don't specifically remember if we'd implemented the
@jacobSingh: Thanks for weighing in! :) It's clear that additional attention will be needed to solve this once and for all. And no clear solution is available at the moment.
I'll email you some of the files that were causing the issues we were seeing.
I've read through Python's entire Unicode HOWTO (which seems to be authoritative). The Unicode filenames section is especially interesting. The Combined with There's also this daunting post on Stack Overflow, which was also fairly informative. From effbot.org's "Python Unicode Objects", I discovered the need to use Further, I changed the After all, the initial call to Next, I had to ensure all strings received through Because we're now using Unicode strings everywhere in File Conveyor (as it should be), we'll need to encode them to byte strings to be able to use certain functions (like this: Finally, I was having problems with This covers Unicode issues 99% of the way, but there's still the potential problem of not knowing the encoding of the file system of the destination; for that, I just created #75. Phew. That was not easy! I hope I didn't forget to mention anything. P.S.: some more interesting functions:
I still get errors (using release d1c55b8):
/usr/lib/python2.6/urllib.py:1222: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
and later (maybe for a different file, I am not sure):
2011-08-27 02:27:17,199 - Arbitrator - ERROR - Unhandled exception of type 'ProgrammingError' detected, arguments: '('You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.',)'.
Did you start from scratch with File Conveyor or did you upgrade from a previous File Conveyor installation? In the latter case, did you run the upgrade script?
I am using release d1c55b8. I want to use File Conveyor to sync my Drupal site to Rackspace Cloud Files. The server is Ubuntu 10.04 with Python 2.6.5. I got the following error, which is the same as EricB1021's:
Exception in thread FSMonitorThread:
Then I tried with Python 2.5, but no luck:
2011-09-16 04:06:58,628 - Arbitrator - WARNING - Fully up and running now.
Then I replaced dependencies/cloudfiles with the latest Rackspace python-cloudfiles (https://github.com/rackspace/python-cloudfiles), as suggested in #75. Still no luck. I checked the following in the Python console:
Any way to solve the problem?
Same problem using Amazon CloudFront.
I found that the problem is related to the uploaded file name. For example, if I have some CCK images with spaces in their file names, those files cannot be synced, and I noticed that the spaces are converted into %20, as shown in the Drupal CDN module statistics. Anyway, I tried to solve the problem by converting the string from to utf-8. Everything seems to work fine, but there are still some errors in daemon.log, so the issue is probably not completely resolved in the right way. I forked the code and committed my changes; see if it works for you.
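To illustrate the %20 observation: URL quoting operates on bytes, so a file name has to be encoded (UTF-8 here) before spaces and non-ASCII characters can be percent-escaped. This is only a sketch using Python 3's standard library, not the change made in the fork mentioned above; `quoted_name` is a hypothetical helper and the sample names are made up.

```python
from urllib.parse import quote, unquote


def quoted_name(filename):
    """Percent-encode a file name for use in a URL (hypothetical helper).

    quote() encodes the text as UTF-8 and escapes every byte that is not
    allowed to appear unescaped, including the space character.
    """
    return quote(filename, safe="", encoding="utf-8")


def original_name(quoted):
    """Reverse quoted_name(): turn the percent-escapes back into text."""
    return unquote(quoted, encoding="utf-8")
```

For example, `quoted_name("my image.jpg")` gives `my%20image.jpg`, and `original_name` maps it back. A comparison between a quoted and an unquoted form of the same name will fail, which is one way names with spaces can end up looking unsynced.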
Thanks ykyuen, your fork worked perfectly for me.
The (hopefully) last Unicode problem has been fixed at #90.
thanks @wimleers =)
@ykyuen Please let me know if you have any more suggestions or bugs :)
I'm actually still getting the same error
@woutrbe Which version of File Conveyor? Which particular file name triggers this error?
Sorry about the late reply, I'm using the latest version here on github, installed with
I tried outputting the file name, but it seems it doesn't even get to that point.
Can you enable
It's not giving that much more information when I enable debug logging for both I've printed the path in pathscanner.py, but that doesn't seem to be outputting anything.
Wow. The error you're getting doesn't occur in File Conveyor; it occurs in Python's internals!
I'm afraid there's not much I can do there then. Some googling led me to these things:
fails
As per the latter link, I'm convinced this is the solution:
Could you please try that? If that doesn't work, can you do this on your system and report back your output (mine is inline):
In this case I think that the solution might be to do this in
Hi, I know this is an old post, but I'm still having this issue... I followed your steps in the post above but it seems to just throw up this error:
A quick googling reveals that it's essentially evil to call
Hi @wimleers, thank you very much for this fantastic tool. Unfortunately, I still have a problem after applying your changes to fsmonitor_inotify.py. Here's my output; I hope you can help. Thank you.
/var/fileconveyor/fileconveyor/filter.py:10: DeprecationWarning: the sets module is deprecated
D'oh, the mention of Can you try that?
FYI
Exception in thread FSMonitorThread:
Sigh :( I won't have time any time soon to dive deeper into this. Sorry.
I haven't been able to resolve the For my server setup (Ubuntu 10.04, Python 2.6.5 and latest File Conveyor), I encountered two First issue: The daemon would throw an exception before it attempted to transfer any files. If you encounter the same problem, check your server's locale settings with
Adding Second issue: The daemon would throw an exception when it attempted to transfer a filename which contained special characters. I tried in vain to fix this but I was unsuccessful (until today I hadn't written a single line of Python code). Creating a solution that worked with Rackspace Cloud Files was non-negotiable, so I set out to create an acceptable workaround for a Drupal 7 site that has over 50GB of images. I patched
diff --git a/fileconveyor/arbitrator.py b/fileconveyor/arbitrator.py
index 394b4b4..6fa3e5a 100644
--- a/fileconveyor/arbitrator.py
+++ b/fileconveyor/arbitrator.py
@@ -347,6 +347,7 @@ class Arbitrator(threading.Thread):
while self.discover_queue.qsize() > 0:
# Discover queue -> pipeline queue.
(input_file, event) = self.discover_queue.get()
item = self.pipeline_queue.get_item_for_key(key=input_file)
# If the file does not yet exist in the pipeline queue, put() it.
@@ -400,6 +401,16 @@ class Arbitrator(threading.Thread):
(input_file, event) = self.filter_queue.get()
self.lock.release()
+ # Skip filenames which we know will not work with File Conveyor or the CDN module.
+ import re
+ path, filename = os.path.split(input_file)
+ regexp = re.compile(r'^[a-zA-Z0-9_ .-]+$')
+ if regexp.search(filename) is None:
+ import codecs
+ output_file = codecs.open('skipped_files.txt', 'a', 'utf8')
+ output_file.write(input_file + '\n')
+ continue
+
# The file may have already been deleted, e.g. when the file was
# moved from the pipeline list into the pipeline queue after the
# application was interrupted. When that's the case, drop the
With this patch in, the daemon doesn't throw an exception and will transfer the files that it is able to transfer. The patch also logs the problematic files to
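The filter in the patch above can also be tried in isolation. The following is a standalone sketch for Python 3 that reuses the patch's regular expression; the function name `is_transferable` and the sample paths are hypothetical, and it only classifies names rather than logging or skipping anything.

```python
import os
import re

# Same whitelist as in the patch: ASCII letters, digits, underscore,
# space, dot and hyphen. Anything else causes the file to be skipped.
SAFE_FILENAME = re.compile(r'^[a-zA-Z0-9_ .-]+$')


def is_transferable(input_file):
    """Return True if the file name (not the directory part) is whitelisted."""
    filename = os.path.basename(input_file)
    return SAFE_FILENAME.search(filename) is not None
```

Compiling the pattern once at module level avoids recompiling it for every file in the queue. Note that this is a workaround: every file whose name contains a character outside the whitelist is silently left out of the sync.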
I've seen issue #25 and pulled the most recent version with the related update, but I'm still getting similar errors, now in arbitrator.py.
I believe this is one of the problem files: WoodyHallæ°åç_0.jpg
Python 2.6.5
Ubuntu 10.04.1 LTS
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "arbitrator.py", line 289, in run
    self.__process_db_queue()
  File "arbitrator.py", line 634, in __process_db_queue
    self.dbcur.execute("SELECT COUNT(*) FROM synced_files WHERE input_file=? AND server=?", (input_file, server))
ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
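One way to avoid this class of error is to decode paths to text at the boundary, before they are bound as SQL parameters. The following is a sketch under assumptions, not File Conveyor's actual fix: it targets Python 3 (where `str` is the Unicode type), assumes the paths are UTF-8 encoded, and `to_text` and `count_synced` are hypothetical helpers. The query is the one from the traceback.

```python
import sqlite3


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string.

    The UTF-8 default is an assumption; the real file system encoding
    may differ and should be checked on the machine in question.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def count_synced(con, input_file, server):
    """Count matching rows, binding only text parameters."""
    cur = con.execute(
        "SELECT COUNT(*) FROM synced_files WHERE input_file=? AND server=?",
        (to_text(input_file), to_text(server)),
    )
    return cur.fetchone()[0]
```

Calling `count_synced(con, path, server)` then behaves the same whether `path` arrived as bytes from the file system or as text from elsewhere in the program, as long as the assumed encoding is right.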