I operate a CMS Tier-2 center. I am contacting you about an issue with the throttle directive.
Since our site subscribes to the global federation redirection of the CMS experiment, we are expected to serve data requests from other CMS sites. This service is called CMS AAA.
The problem arises when a particular user submits an enormous number of grid jobs requesting data that exists only at our site.
We use dCache as the backend and run an xrootd daemon in front of it for the federation subscription.
We set "throttle.throttle concurrency 30" in this xrootd daemon's configuration, but we found that all of the transfer requests were still delivered to dCache.
In some cases the count rises momentarily and requests are blocked, but what appears to be counted is xrootd handing the request to dCache rather than the data transfer itself, so the number of concurrent jobs drops as soon as the request has been delivered. As a result, tens of thousands of requests wait on the dCache side, and xrootd itself holds tens of thousands of TCP sessions, which is a huge burden on the server.
Because I configured pss instead of redirecting, I believe the xrootd server carries the actual transfer load. I think it is a fatal bug that these transfers are left out of the throttle count.
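To illustrate the behaviour described above, here is a minimal Python sketch. It is a toy tick-based simulation, not XRootD or dCache code, and the function name `peak_backend_load` and its parameters are hypothetical. It compares a throttle slot that is released as soon as a request is handed to the backend with a slot that is held until the transfer finishes.

```python
def peak_backend_load(n_requests, limit, release_on_handoff, transfer_ticks=5):
    """Return the peak number of transfers in flight at the backend.

    Toy model (assumption): every request needs `transfer_ticks` ticks
    of backend time, and the throttle admits requests while fewer than
    `limit` slots are held.
    """
    waiting = n_requests
    holding = 0      # throttle slots currently held
    in_flight = []   # remaining ticks for each active backend transfer
    peak = 0
    while waiting or in_flight:
        # Admit as many waiting requests as the throttle allows.
        while waiting and holding < limit:
            waiting -= 1
            holding += 1
            in_flight.append(transfer_ticks)
            if release_on_handoff:
                # Slot is freed once the request is forwarded,
                # even though the transfer is still running.
                holding -= 1
        peak = max(peak, len(in_flight))
        # Advance time by one tick and retire finished transfers.
        in_flight = [t - 1 for t in in_flight]
        finished = sum(1 for t in in_flight if t == 0)
        in_flight = [t for t in in_flight if t > 0]
        if not release_on_handoff:
            # Slot is freed only when the transfer has completed.
            holding -= finished
    return peak


if __name__ == "__main__":
    print(peak_backend_load(1000, 30, release_on_handoff=True))
    print(peak_backend_load(1000, 30, release_on_handoff=False))
```

In this model, releasing the slot at handoff lets every waiting request reach the backend at once regardless of the limit, which matches what we observe; holding the slot for the whole transfer keeps the backend load at the configured limit.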
Please review it.
Also, I would like to ask whether a queueing mechanism could be added to the throttle so that site stability checks, such as the SAM tests, still pass while the throttle is in effect.
Regards,
Yes, this is equivalent to a DoS attack, but the problem is that you may
be the only site that has the file of interest. So, do we fail the jobs
or let them through? There are two mitigations. The one you are already
using is the throttle plugin. The other is to limit the share of
requests your site accepts: set the percentage of requests your AAA site
is willing to serve (the default is 100%). See
https://xrootd.slac.stanford.edu/doc/dev54/cms_config.htm#_Toc53611076
Note that this will not be effective if your site is the only site that
has the file of interest. In that case, further throttle restrictions
would be advised.
Andy
On Wed, 23 Nov 2022, Geonmo Ryu wrote:
Hello, XRootD Developers,