Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xrdcl-http stating wrong path #1394

Closed
pmusset opened this issue Feb 4, 2021 · 16 comments
Closed

xrdcl-http stating wrong path #1394

pmusset opened this issue Feb 4, 2021 · 16 comments
Assignees

Comments

@pmusset
Copy link

pmusset commented Feb 4, 2021

I am trying to use xrdcl-http with xcache on xrootd 5.1-rc6 and he seems that when it is trying to access to a file at the origin, it is putting the filepath twice. For example, if I try to access https://origin.in2p3.fr:2880/path/of/the/file, it seems like it is trying to access https:/origin.in2p3.fr:2880/path/of/the/file/path/of/the/file

[2021-02-04 13:55:16.526021 +0000][Debug  ][XrdClHttp         ] HttpFileSystemPlugIn constructed with URL: https://p0@ccdcalitest10.in2p3.fr:2880//pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out.
[2021-02-04 13:55:16.526095 +0000][Debug  ][XrdClHttp         ] HttpFileSystemPlugIn::Stat - path = https://ccdcalitest10.in2p3.fr:2880//pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out/pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out, timeout = 0
[2021-02-04 13:55:16.654836 +0000][Error  ][XrdClHttp         ] Stat failed: [ERROR] Internal error: invalid cross-device link: Result HTTP 404 : File not found  after 3 attempts
@wyang007
Copy link
Member

wyang007 commented Feb 5, 2021

I can't repeat this error. Is this from the xrootd log? Can I take a look at your xcache config file, and the script you used to starte the xcache?

@pmusset
Copy link
Author

pmusset commented Feb 9, 2021

Yes it is from the xrootd logs here
logtest.txt

I use this command when I see this problem

curl  --capath /etc/grid-security/certificates/ --cacert $X509_USER_PROXY --cert $X509_USER_PROXY --output /dev/null https://ccosfip00282.in2p3.fr:1094//https://ccdcalitest10.in2p3.fr:2880/pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out

I get the problem only when the cache is empty. When the file was fetch through xroot and is already present on the cache and I use the same commend as above it works.

Here are the xcache and xrdcl http config files
xcache-config.cfg.txt
xrdcl-http-plugin.conf.txt

I am launching xcache with xrootd -d -c /etc/xrootd/xcache-config.cfg -n xcache from a container built with this dockerfile

@wyang007
Copy link
Member

wyang007 commented Feb 9, 2021

Thanks for more info. I was having difficulty to have my log showing the "[XrdClHttp ] HttpFileSystemPlugIn...", until I realized I was using root://xcache//https://data_source. Those info will show up in the log if I use https://xcache//https//data_source.

This is a known problem in xrdcl-http. It has been fixed (xrootd/xrdcl-http@323be8f#diff-5c4ea2f2572ee031fc0af08f4074c2d2c7c8aeb39751d20efacedc20b5a47e3dL152), and is now committed to the xrdcl-http.

For a while I was a bit hesitate to commit the change. There are a few other places like this in HttpFileSysmtePlugIn.cc (and I don't understand the original author's intention, some said it was there to handle relative paths).

@pmusset
Copy link
Author

pmusset commented Feb 10, 2021

Seems to work in RC7 but it doesn't looks like it was what I thought was the origin of another problem.
When I use http(s) to access the origin and the cache is empty, the clients only get a few ko and then the transfer stops.
Here is the output I get :

curl  --capath /etc/grid-security/certificates/ --cacert $X509_USER_PROXY --cert $X509_USER_PROXY --output /dev/null https://ccosfip00282.in2p3.fr:1094//https://ccdcalitest10.in2p3.fr:2880/pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  768M    0   238    0     0    533      0  17d 11h --:--:--  17d 11h   533
curl: (18) transfer closed with 805306130 bytes remaining to read

And in the xcache logs I get

210210 17:04:48 107 XrdPfc_File: error Read() io 0x7f9ecc010840, block 2 finished with error 42 no message of desired type /pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out
210210 17:04:48 107 XrdPfc_IO: warning Read() error in File::Read(), exit status=-42, error=no message of desired type https://u26@ccdcalitest10.in2p3.fr:2880//pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out?

Full logs here

Should I open a new issue ?

@wyang007
Copy link
Member

Does this happen to all files or just this file? will it work with files from other site?

I have a similar setup using 5.1.0.rc7 and it works for me, thouhgh it is very slow (<500kb/s) to get the 768MB file

@abh3
Copy link
Member

abh3 commented Feb 10, 2021 via email

@pmusset
Copy link
Author

pmusset commented Feb 11, 2021

Really strange. No I dont have the problem with three of the four sites I tried yesterday. Still happens with an EOS instance at Cern
Normally nothing has changed since yesterday. Or I didnt pull correctly my docker image with the rc7 and it only worked today.

Which debug level should I rise ?

Is there still an issue where xrdcl-http doesnt check in /etc/grid-security/certificates for the certificates ? Could it be linked to that ? On XrootD 4 I know I had to do

RUN cp /etc/grid-security/certificates/*.pem /etc/pki/ca-trust/source/anchors/ && update-ca-trust extract

@abh3
Copy link
Member

abh3 commented Feb 11, 2021 via email

@abh3
Copy link
Member

abh3 commented Feb 23, 2021

This seems to have stalled and this bug is rated as critical. Perhaps it's not critical and we should downgrade it. What's the status?

@pmusset
Copy link
Author

pmusset commented Feb 23, 2021

Sorry I was in holiday last week.
I've send you an email yesterday to abh@stanford.edu with the logs

@osschar
Copy link
Contributor

osschar commented Feb 23, 2021

@pmusset is this completely reproducible? Can you try running with a higher xrdcl log-level, like this:

# To debug connections to the fedration (5 Dump, 4 Debug, 3 Error, 2 Warning, 1 Info)
pss.setopt DebugLevel 4
pss.trace  all

This will produce a lot of output, so start with lower level (say, 1 or 2) if this takes time to happen (note, the numbers are not progressive in the log level).

It seems this should be reproducible by just running xrdcp with the same remote, i.e.,

xrdcp -A -d2 -f https://ccdcalitest10.in2p3.fr:2880//pnfs/in2p3.fr/data/escape/cc_in2p3_dcache/NESSIE/35/6b/test_iorandom_768MB_8.out /dev/null

I'm puzzled by the errno 42, do you have hitchhiker's hiding around your server room?

@abh3
Copy link
Member

abh3 commented Feb 23, 2021 via email

@wyang007
Copy link
Member

@pmusset, from the xrootd server log file you sent to Andy, i see these lines:

[2021-02-22 14:53:53.567882 +0000][Debug ][XrdClHttp ] Opened: https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?
Posix::Stat(https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?)
[2021-02-22 14:53:53.605503 +0000][Debug ][XrdClHttp ] Stat-ed URL: https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?
210222 14:53:53 106 XrdPfc_Cache: info Attach() https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?
210222 14:53:53 106 XrdPfc_Cache: debug GetFile /eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out, io 0x7f1fdc180250
210222 14:53:53 106 XrdPfc_IO: debug initCachedStat got stat from client res = 0, size = 268435456 https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?

This tells me that the xrdcl-http stat() request was successful because it got the correct file size. I can do the same thing using xrdfs root://eoseulake.cern.ch/ stat /eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out (with my ATLAS or dTeam certificate).

BTW, xrdcl-http open() was also successful, but I guess open() is probably meaningless in HTTP protocol. It is only meaningful in the Davix library.

Later in the xrootd server log, I see:

210222 14:53:53 106 XrdPfc_IO: dump Read() 0x7f1fdc180250 off: 0 size: 262144 https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?
...
[2021-02-22 14:53:53.638951 +0000][Debug ][XrdClHttp ] Read 0 bytes, at offset 0, from URL: https://u24@eoseulake.cern.ch:443//eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out?
...
210222 14:53:53 105 XrdPfc_File: error ProcessBlockResponse Wrong number of bytes received, assuming remote/local file size mismatch, unlinking local files and initiating shutdown of File object /eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out
...
210222 14:53:53 106 XrdPfc_File: error Read() io 0x7f1fdc180250, block 0 finished with error 121 remote I/O error /eos/eulake/tests/rucio_test/eulake_1/ESCAPE_CCIN2P3_RAHAL_test/1f/50/tes2_iorandom_256MB_1.out

After seeing this, I tried to use curl to copy (with my ATLAS and dTeam certificates). Though I got a "HTTP/1.1 200 OK", the content I got back is not your data but a html file telling me "There was an error loading the page you requested — This page may have been deleted or moved." along with a few links to CERN and a CERN logo.

So, question: what certificate should I use to access this file. If I don't have the right x509 certificate, I expect the web server NOT to tell me "HTTP/1.1 200 OK" but something like "HTTP/1.1 403" (btw, if I don't have an x509 certificate at all, and use the -k option with curl, I do get "HTTP/1.1 403 token authorization failed").

So what a mess in the world of industry standard?

@abh3
Copy link
Member

abh3 commented Feb 25, 2021

OK, what the last log shows me is that this is an error that happens server-side when doing async I/O and is related to it running out of AIO objects in certain cases. This has been fixed just a couple of days ago and the fix will appear in 5.1.1. That problem has been in the code since day one and only rears it's ugly head when either the server is overloaded or when the client is relatively slow. Anyway, I assume the "wrong path" problem has been fixed and this is simply a new problem, yes?

@pmusset
Copy link
Author

pmusset commented Feb 25, 2021

Yes. The ticket can be closed I think

@abh3
Copy link
Member

abh3 commented Feb 25, 2021

The initial problem has been resolved.

@abh3 abh3 closed this as completed Feb 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants