This repository has been archived by the owner on Feb 20, 2024. It is now read-only.

Avoiding the GIL #8

Closed
rogerdowning opened this issue Feb 3, 2014 · 8 comments

Comments

@rogerdowning

Hi,
STFC in the UK are moving to XRoot to access their Castor system, replacing RFIO. I have a service that interfaces with Castor to deposit experimental data into an archive. The service is driven by a multi-threaded TCPServer written in Python, so I was pleased to find your bindings.
The service receives large numbers of small files and concatenates them into large files before sending them to Castor for storage on tape. On retrieval from tape, we perform a stager_get and then we run multiple retrieval jobs copying the large files back to disk, and we serve files to the client from there. We do this to attempt to avoid long delays waiting for files to be staged from tape, and it works well for us.
I modified the backend to use these bindings for copying the large files to Castor (previously it shelled out to rfcp), but I found that when I called FileSystem.copy() on a large file, it hung the whole process until the copy finished. I assume this is because copy() is implemented in C and is therefore not subject to the timeslicing done by the Python interpreter (2.6.6)?
I'm aware of the CopyProcess() functionality you provide, but would that also pause until all the jobs are complete?
If I were to perform a FileSystem.copy() asynchronously with a callback, would that allow other threads of execution to carry on in the meantime?
I could use a loop and File.write() but since we're not concerned with writing or reading portions of the files it would seem preferable to just deal with a put/get style of operation.
I have sorted this for the time being by just shelling out to xrdcp for the copy to Castor, but I would really like to use these bindings. Is there any strategy I can adopt that would circumvent the locking I think I see?

Thanks in advance,

Roger Downing
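For reference, the interim workaround described above (shelling out to xrdcp from worker threads) can be sketched in Python 3 as follows; the service itself runs on Python 2.6.6, where subprocess.run() does not exist. The helper names are hypothetical, and only the plain "xrdcp SOURCE DEST" form is used, with no extra flags. While a thread waits for the child process, CPython releases the GIL, so other threads in the server keep running.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def xrdcp_command(src, dst, xrdcp="xrdcp"):
    """Build the argument list for a plain 'xrdcp SOURCE DEST' copy (hypothetical helper)."""
    return [xrdcp, src, dst]


def copy_with_xrdcp(src, dst, xrdcp="xrdcp"):
    """Run one copy in a child process and return its exit code.

    While subprocess.run() waits for the child, CPython releases the GIL,
    so other Python threads (e.g. TCPServer handlers) keep running.
    """
    return subprocess.run(xrdcp_command(src, dst, xrdcp)).returncode


def copy_many(pairs, max_workers=4, xrdcp="xrdcp"):
    """Copy (src, dst) pairs concurrently; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(copy_with_xrdcp, s, d, xrdcp) for s, d in pairs]
        return [f.result() for f in futures]
```

A non-zero exit code means that copy failed. Passing xrdcp="true" (the POSIX no-op command) exercises the threading without needing a Castor endpoint.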

@jlsalmon
Contributor

jlsalmon commented Feb 3, 2014

Hi Roger,

I don't believe that the new XRootD client (upon which pyxrootd is based) currently supports asynchronous copy jobs (@ljanyst please correct me if I'm wrong?), which is why pyxrootd doesn't support them either: both FileSystem.copy() and CopyProcess will block.

I know threading in Python is a bit of a nightmare, but (tentative suggestion) you could try FileSystem.copy() in a separate "thread"?

Cheers,
Justin
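A minimal sketch of that suggestion, assuming copy_func is any blocking callable taking (src, dst); a small wrapper around FileSystem.copy() could be passed in, but that wrapper is not shown because its exact signature is not given in this thread. The helper name copy_in_background is hypothetical, and shutil.copyfile on local files is used as a stand-in for the remote copy. Note that a worker thread only lets the rest of the program run if the blocking call releases the GIL while it waits.

```python
import os
import shutil
import tempfile
import threading


def copy_in_background(copy_func, src, dst, on_done):
    """Run copy_func(src, dst) in a daemon thread and return the Thread.

    on_done(error) is called from the worker thread with None on success,
    or with the exception that copy_func raised.
    """
    def worker():
        try:
            copy_func(src, dst)
        except Exception as exc:  # hand the failure back to the caller
            on_done(exc)
        else:
            on_done(None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    # Local stand-in for a remote copy: shutil.copyfile between temp files.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.dat")
        dst = os.path.join(tmp, "dst.dat")
        with open(src, "wb") as f:
            f.write(b"payload")
        results = []
        copy_in_background(shutil.copyfile, src, dst, results.append).join()
        print(results)  # [None]
```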

@rogerdowning
Author

Hi there,
Thanks for responding! Unfortunately, FileSystem.copy() already runs in its own thread, but it still locks the whole process because the Python 2.x interpreter won't interrupt it :-( For now, I'm OK with shelling out to xrdcp for the parallel copies. In future I hope to support direct streaming of data to and from Castor, because we're seeing high contention on the RAID array where data lands from Castor (800 MB/s inbound tends to kill outbound performance!). That will involve moving to File.write() operations, which should work better because the write loops can be interleaved.

Cheers,

Roger Downing

STFC Daresbury Laboratory,
Keckwick Lane,
Warrington
WA4 4AD
UK

tel: +44 1925 603937
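The File.write() approach planned above amounts to a loop that pushes one chunk per call. A sketch, assuming a write(data, offset) callable that is meant to mirror the (buffer, offset) arguments of pyxrootd's File.write(); the adapter that would call the real method and check its returned status is not shown. The helper names are hypothetical and the 4 MiB default chunk size is an arbitrary choice.

```python
import io


def chunked_copy(reader, write, chunk_size=4 * 1024 * 1024):
    """Copy from a file-like reader via write(data, offset), one chunk per call.

    Returns the total number of bytes written. Each write covers a single
    chunk, so the interpreter regains control between chunks and can
    switch to other Python threads there, even if one write holds the GIL.
    """
    offset = 0
    while True:
        data = reader.read(chunk_size)
        if not data:
            break
        write(data, offset)
        offset += len(data)
    return offset


def bytes_writer(buf):
    """Hypothetical stand-in for a remote file: write(data, offset) into a BytesIO."""
    def write(data, offset):
        buf.seek(offset)
        buf.write(data)
    return write


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 10 + b"y" * 5)
    dst = io.BytesIO()
    print(chunked_copy(src, bytes_writer(dst), chunk_size=4))  # 15
    print(dst.getvalue() == src.getvalue())  # True
```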

@bbockelm

bbockelm commented Feb 4, 2014

@jussy - I think you want to look at this:

http://docs.python.org/2/c-api/init.html#threads

You want to wrap the copy job invocation with Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.

Otherwise, FileSystem.copy() holds the global interpreter lock, and no other Python threads can run.
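The effect can be reproduced without XRootD. ctypes can call the same C function with the GIL released (ctypes.CDLL) or held (ctypes.PyDLL); in this sketch libc's usleep() stands in for a long copy job, and a busy Python thread counts how far it gets during the call. It only illustrates the mechanism; it is not the pyxrootd fix, which per the suggestion above wraps the copy call in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS inside the extension. It assumes a POSIX system and a standard CPython build with a GIL; progress_during_call is a hypothetical helper.

```python
import ctypes
import threading
import time

# POSIX-only: loading None gives access to symbols already in the process,
# including libc's usleep().
LIBC_RELEASES_GIL = ctypes.CDLL(None)  # GIL dropped for the duration of each call
LIBC_HOLDS_GIL = ctypes.PyDLL(None)    # GIL kept for the duration of each call


def progress_during_call(lib, micros=200_000):
    """Return how many increments a busy Python thread makes while lib.usleep() blocks."""
    counter = [0]
    stop = threading.Event()

    def spin():
        while not stop.is_set():
            counter[0] += 1

    spinner = threading.Thread(target=spin)
    spinner.start()
    try:
        time.sleep(0.05)       # let the spinner start running
        before = counter[0]
        lib.usleep(micros)     # stand-in for a long C/C++ copy job
        after = counter[0]
    finally:
        stop.set()
        spinner.join()
    return after - before


if __name__ == "__main__":
    print("GIL released:", progress_during_call(LIBC_RELEASES_GIL))
    print("GIL held:    ", progress_during_call(LIBC_HOLDS_GIL))
```

Expect the GIL-releasing call to let the counter advance far more than the GIL-holding one; exact counts depend on the machine.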

@ljanyst
Contributor

ljanyst commented Feb 4, 2014

@jussy @bbockelm is right; do you have time to fix it, or should I?

@jlsalmon
Contributor

jlsalmon commented Feb 4, 2014

Ok, I misunderstood the problem. I am using those macros for the async stuff in File/FileSystem but didn't think to use them here. @bbockelm thanks for the correct suggestion.

@ljanyst I will do it, I just about have time :)

@ljanyst
Contributor

ljanyst commented Feb 4, 2014

Great, thanks!

@rogerdowning
Author

This is brilliant, thanks so much guys!

Roger Downing

@jlsalmon
Contributor

jlsalmon commented Feb 4, 2014

@rogerdowning This is now fixed in HEAD. Thanks for reporting!

You can get the RPMs from TeamCity at:
https://teamcity-dss.cern.ch:8443/viewType.html?buildTypeId=bt80

Cheers,
Justin
