Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More resilient IPv6/IPv4 dual stack handling #644

Closed
rynge opened this issue Jan 12, 2018 · 13 comments
Closed

More resilient IPv6/IPv4 dual stack handling #644

rynge opened this issue Jan 12, 2018 · 13 comments

Comments

@rynge
Copy link

rynge commented Jan 12, 2018

One of our OSG submit hosts, xd-login.opensciencegrid.org, has a somewhat odd IPv6 configuration. IPv6 stack is enabled, the main external interface only has a IPv4 address and a secondary internal interface has both IPv4 and IPv6 enabled. There is no default IPv6 route. Detailed interface and route information below.

The result is that xrootd client think this host is IPv6 enabled, and only tries IPv6 connections. For example:

[2018-01-10 18:26:56.053031 +0000][Debug  ][PostMaster        ] [redirector.opensciencegrid.org:1094] Found 4 address(es): [2001:18e8:2:6::242]:1094, [2001:18e8:2:6::241]:1094, [::ffff:129.79.53.79]:1094, [::ffff:129.79.53.62]:1094
[2018-01-10 18:26:56.053067 +0000][Debug  ][AsyncSock         ] [redirector.opensciencegrid.org:1094 #0.0] Attempting connection to [2001:18e8:2:6::242]:1094
[2018-01-10 18:26:56.053086 +0000][Error  ][AsyncSock         ] [redirector.opensciencegrid.org:1094 #0.0] Unable to initiate the connection: [ERROR] Socket error: Network is unreachable

We understand that this host configuration is nor perfect, and we will work on improving the setup. However, we have no indicators that other tools using the network having issues with the configuration, and we believe the reason is that most tools handle dual-stack more gracefully. For example, from the wget's man page: "By default, an IPv6-aware Wget will use the address family specified by the host’s DNS record. If the DNS responds with both IPv4 and IPv6 addresses, Wget will try them in sequence until it finds one it can connect to.". Maybe xrootd should use a similar approach?

Please note that it is not really this host we are concerned about, but for example compute nodes outside our administrative domain, which could cause a lot of job failures.

To be more explicit, we would like xrootd to be made more resilient by trying address families in order (IPv6, and if that fails IPv4). If there rrdns is used, the order of the addresses can be randomized, but only within the ordered address families.

More detailed debugging information:

$ ifconfig 
em1       Link encap:Ethernet  HWaddr EC:F4:BB:C4:06:60                                             
          inet addr:129.79.53.198  Bcast:129.79.53.255  Mask:255.255.255.0                          
          inet6 addr: fe80::eef4:bbff:fec4:660/64 Scope:Link                                        
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                        
          RX packets:2897670688 errors:0 dropped:12277 overruns:0 frame:0                           
          TX packets:4462670781 errors:0 dropped:0 overruns:0 carrier:0                             
          collisions:0 txqueuelen:10000                                                             
          RX bytes:2356243862322 (2.1 TiB)  TX bytes:5335218056178 (4.8 TiB)                        
                                                                                                    
em3       Link encap:Ethernet  HWaddr EC:F4:BB:C4:06:64                                             
          inet addr:192.168.96.31  Bcast:192.168.99.255  Mask:255.255.252.0                                                    
          inet6 addr: fe80::eef4:bbff:fec4:664/64 Scope:Link                                                                                             
          inet6 addr: fd2f:6feb:37::1f/48 Scope:Global                                                                                                                              
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                                                
          RX packets:1283502 errors:0 dropped:0 overruns:0 frame:0                                                                                                                                          
          TX packets:674102 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                                                                  
          collisions:0 txqueuelen:1000                                                                                                                                                                                               
          RX bytes:503129914 (479.8 MiB)  TX bytes:241147274 (229.9 MiB)
          Memory:dae80000-daefffff 
$ route -A inet6
Kernel IPv6 routing table
Destination                                 Next Hop                                Flags Metric Ref    Use Iface
radioflyer.goc/128                          radioflyer.goc                          UC    0      2        1 em3     
fd2f:6feb:37::/48                           *                                       U     256    0        0 em3     
fe80::eef4:bbff:fec4:664/128                fe80::eef4:bbff:fec4:664                UC    0      2        0 em1     
fe80::/64                                   *                                       U     256    0        0 em1     
fe80::/64                                   *                                       U     256    0        0 em3     
localhost/128                               *                                       U     0      1884803       6 lo      
xd-login.goc/128                            *                                       U     0      162669       1 lo      
fe80::eef4:bbff:fec4:660/128                *                                       U     0      37392       1 lo      
fe80::eef4:bbff:fec4:664/128                *                                       U     0      37622       1 lo      
ff00::/8                                    *                                       U     256    0        0 em1     
ff00::/8                                    *                                       U     256    0        0 em3  
$ xrdcp -d 3 -f root://hcc-marian.unl.edu//tmp/test.1M /dev/null
[2018-01-12 16:54:34.311025 +0000][Dump   ][App               ] Chunk size: 16777216, parallel chunks 4, streams: 1
[2018-01-12 16:54:34.311069 +0000][Dump   ][App               ] Processing source entry: root://hcc-marian.unl.edu//tmp/test.1M, type xroot, target file: /dev/null
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu//tmp/test.1M
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311114 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:54:34.311193 +0000][Dump   ][Utility           ] Adding job with properties: 'checkSumMode' = 'none', 'checkSumPreset' = '', 'checkSumType' = '', 'chunkSize' = '16777216', 'coerce' = '0', 'dynamicSource' = '0', 'force' = '1', 'initTimeout' = '600', 'makeDir' = '0', 'parallelChunks' = '4', 'posc' = '0', 'source' = 'root://hcc-marian.unl.edu//tmp/test.1M', 'target' = '/dev/null', 'thirdParty' = 'none', 'tpcTimeout' = '1800', 'xcp' = '0', 'xcpBlockSize' = '134217728', 'zipArchive' = '0'
[2018-01-12 16:54:34.311215 +0000][Debug  ][Utility           ] CopyProcess: 2 jobs to prepare
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu//tmp/test.1M
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311228 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] URL: /dev/null
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311248 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu//tmp/test.1M
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311272 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] URL: /dev/null
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311289 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:54:34.311301 +0000][Debug  ][Utility           ] Creating a classic copy job, from root://hcc-marian.unl.edu:1094//tmp/test.1M to file://localhost/dev/null
[2018-01-12 16:54:34.311323 +0000][Debug  ][Utility           ] Monitor library name not set. No monitoring
[2018-01-12 16:54:34.311373 +0000][Debug  ][Poller            ] Available pollers: built-in
[2018-01-12 16:54:34.311383 +0000][Debug  ][Poller            ] Attempting to create a poller according to preference: built-in
[2018-01-12 16:54:34.311389 +0000][Debug  ][Poller            ] Creating poller: built-in
[2018-01-12 16:54:34.311397 +0000][Debug  ][Poller            ] Creating and starting the built-in poller...
[2018-01-12 16:54:34.311517 +0000][Debug  ][Poller            ] Using 1 poller threads
[2018-01-12 16:54:34.311528 +0000][Debug  ][TaskMgr           ] Starting the task manager...
[2018-01-12 16:54:34.311553 +0000][Debug  ][TaskMgr           ] Task manager started
[2018-01-12 16:54:34.311562 +0000][Debug  ][JobMgr            ] Starting the job manager...
[2018-01-12 16:54:34.311627 +0000][Debug  ][JobMgr            ] Job manager started, 3 workers
[2018-01-12 16:54:34.311637 +0000][Debug  ][TaskMgr           ] Registering task: "FileTimer task" to be run at: [2018-01-12 16:54:34 +0000]
[2018-01-12 16:54:34.311646 +0000][Debug  ][Utility           ] Opening root://hcc-marian.unl.edu:1094//tmp/test.1M for reading
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu:1094//tmp/test.1M
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311672 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu:1094//tmp/test.1M
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311713 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:54:34.311727 +0000][Debug  ][File              ] [0x178fef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Sending an open command
[2018-01-12 16:54:34.311744 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Sending message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] URL: hcc-marian.unl.edu:1094
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:54:34.311763 +0000][Dump   ][Utility           ] Path:      
[2018-01-12 16:54:34.311780 +0000][Debug  ][PostMaster        ] Creating new channel to: hcc-marian.unl.edu:1094 1 stream(s)
[2018-01-12 16:54:34.311797 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2018-01-12 16:54:34.311871 +0000][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: hcc-marian.unl.edu:1094" to be run at: [2018-01-12 16:54:49 +0000]
[2018-01-12 16:54:34.311891 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Sending message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) (0x1790f20) through substream 0 expecting answer at 0
[2018-01-12 16:54:34.312289 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094] Found 2 address(es): [2600:900:6:1301:5054:ff:fe1c:ccec]:1094, [::ffff:129.93.244.231]:1094
[2018-01-12 16:54:34.312313 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Attempting connection to [2600:900:6:1301:5054:ff:fe1c:ccec]:1094
[2018-01-12 16:54:34.312329 +0000][Error  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Unable to initiate the connection: [ERROR] Socket error: Network is unreachable
[2018-01-12 16:54:34.312337 +0000][Error  ][XRootD            ] [hcc-marian.unl.edu:1094] Unable to send the message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [FATAL] Socket error: Network is unreachable
[0B/0B][100%][==================================================][0B/s]  
Run: [FATAL] Socket error: Network is unreachable
[2018-01-12 16:54:34.312403 +0000][Debug  ][JobMgr            ] Stopping the job manager...
[2018-01-12 16:54:34.312409 +0000][Dump   ][JobMgr            ] Stopping worker #0...
[2018-01-12 16:54:34.312557 +0000][Dump   ][JobMgr            ] Worker #0 stopped
[2018-01-12 16:54:34.312566 +0000][Dump   ][JobMgr            ] Stopping worker #1...
[2018-01-12 16:54:34.312594 +0000][Dump   ][JobMgr            ] Worker #1 stopped
[2018-01-12 16:54:34.312600 +0000][Dump   ][JobMgr            ] Stopping worker #2...
[2018-01-12 16:54:34.312626 +0000][Dump   ][JobMgr            ] Worker #2 stopped
[2018-01-12 16:54:34.312632 +0000][Debug  ][JobMgr            ] Job manager stopped
[2018-01-12 16:54:34.312636 +0000][Debug  ][TaskMgr           ] Stopping the task manager...
[2018-01-12 16:54:34.312684 +0000][Debug  ][TaskMgr           ] Task manager stopped
[2018-01-12 16:54:34.312691 +0000][Debug  ][Poller            ] Stopping the poller...
[2018-01-12 16:54:34.312735 +0000][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: hcc-marian.unl.edu:1094"
[2018-01-12 16:54:34.312750 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Closing the socket
[2018-01-12 16:54:34.312777 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Destroying stream
[2018-01-12 16:54:34.312785 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Closing the socket

$ XRD_NETWORKSTACK=IPv4 xrdcp -d 3 -f root://hcc-marian.unl.edu//tmp/test.1M /dev/null 2>&1  
[2018-01-12 16:55:05.202827 +0000][Dump   ][App               ] Chunk size: 16777216, parallel chunks 4, streams: 1
[2018-01-12 16:55:05.202868 +0000][Dump   ][App               ] Processing source entry: root://hcc-marian.unl.edu//tmp/test.1M, type xroot, target file: /dev/null
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu//tmp/test.1M
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.202893 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:55:05.202962 +0000][Dump   ][Utility           ] Adding job with properties: 'checkSumMode' = 'none', 'checkSumPreset' = '', 'checkSumType' = '', 'chunkSize' = '16777216', 'coerce' = '0', 'dynamicSource' = '0', 'force' = '1', 'initTimeout' = '600', 'makeDir' = '0', 'parallelChunks' = '4', 'posc' = '0', 'source' = 'root://hcc-marian.unl.edu//tmp/test.1M', 'target' = '/dev/null', 'thirdParty' = 'none', 'tpcTimeout' = '1800', 'xcp' = '0', 'xcpBlockSize' = '134217728', 'zipArchive' = '0'
[2018-01-12 16:55:05.202983 +0000][Debug  ][Utility           ] CopyProcess: 2 jobs to prepare
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu//tmp/test.1M
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.202996 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] URL: /dev/null
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.203014 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu//tmp/test.1M
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.203036 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] URL: /dev/null
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.203054 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.203066 +0000][Debug  ][Utility           ] Creating a classic copy job, from root://hcc-marian.unl.edu:1094//tmp/test.1M to file://localhost/dev/null
[2018-01-12 16:55:05.203087 +0000][Debug  ][Utility           ] Monitor library name not set. No monitoring
[2018-01-12 16:55:05.203133 +0000][Debug  ][Poller            ] Available pollers: built-in
[2018-01-12 16:55:05.203140 +0000][Debug  ][Poller            ] Attempting to create a poller according to preference: built-in
[2018-01-12 16:55:05.203146 +0000][Debug  ][Poller            ] Creating poller: built-in
[2018-01-12 16:55:05.203153 +0000][Debug  ][Poller            ] Creating and starting the built-in poller...
[2018-01-12 16:55:05.203233 +0000][Debug  ][Poller            ] Using 1 poller threads
[2018-01-12 16:55:05.203242 +0000][Debug  ][TaskMgr           ] Starting the task manager...
[2018-01-12 16:55:05.203267 +0000][Debug  ][TaskMgr           ] Task manager started
[2018-01-12 16:55:05.203274 +0000][Debug  ][JobMgr            ] Starting the job manager...
[2018-01-12 16:55:05.203321 +0000][Debug  ][JobMgr            ] Job manager started, 3 workers
[2018-01-12 16:55:05.203331 +0000][Debug  ][TaskMgr           ] Registering task: "FileTimer task" to be run at: [2018-01-12 16:55:05 +0000]
[2018-01-12 16:55:05.203339 +0000][Debug  ][Utility           ] Opening root://hcc-marian.unl.edu:1094//tmp/test.1M for reading
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu:1094//tmp/test.1M
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.203357 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] URL: root://hcc-marian.unl.edu:1094//tmp/test.1M
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.203382 +0000][Dump   ][Utility           ] Path:      /tmp/test.1M
[2018-01-12 16:55:05.203770 +0000][Debug  ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Sending an open command
[2018-01-12 16:55:05.203789 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Sending message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] URL: hcc-marian.unl.edu:1094
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] Protocol:  root
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] Host Name: hcc-marian.unl.edu
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.203808 +0000][Dump   ][Utility           ] Path:      
[2018-01-12 16:55:05.203827 +0000][Debug  ][PostMaster        ] Creating new channel to: hcc-marian.unl.edu:1094 1 stream(s)
[2018-01-12 16:55:05.203843 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Stream parameters: Network Stack: IPv4, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2018-01-12 16:55:05.203915 +0000][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: hcc-marian.unl.edu:1094" to be run at: [2018-01-12 16:55:20 +0000]
[2018-01-12 16:55:05.203935 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Sending message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) (0x2342f20) through substream 0 expecting answer at 0
[2018-01-12 16:55:05.204191 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094] Found 1 address(es): [::ffff:129.93.244.231]:1094
[2018-01-12 16:55:05.204220 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Attempting connection to [::ffff:129.93.244.231]:1094
[2018-01-12 16:55:05.204245 +0000][Debug  ][Poller            ] Adding socket 0x2347050 to the poller
[2018-01-12 16:55:05.226257 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Async connection call returned
[2018-01-12 16:55:05.226284 +0000][Debug  ][XRootDTransport   ] [hcc-marian.unl.edu:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2018-01-12 16:55:05.226313 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Wrote a message:  (0x4c000950), 44 bytes
[2018-01-12 16:55:05.248319 +0000][Dump   ][XRootDTransport   ] [msg: 0x4c000950] Expecting 8 bytes of message body
[2018-01-12 16:55:05.248336 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message header, size: 8
[2018-01-12 16:55:05.248345 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received a message of 16 bytes
[2018-01-12 16:55:05.248354 +0000][Debug  ][XRootDTransport   ] [hcc-marian.unl.edu:1094 #0.0] Got the server hand shake response (type: server [], protocol version 310)
[2018-01-12 16:55:05.248364 +0000][Dump   ][XRootDTransport   ] [msg: 0x4c000950] Expecting 8 bytes of message body
[2018-01-12 16:55:05.248369 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message header, size: 8
[2018-01-12 16:55:05.248375 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received a message of 16 bytes
[2018-01-12 16:55:05.248381 +0000][Debug  ][XRootDTransport   ] [hcc-marian.unl.edu:1094 #0.0] kXR_protocol successful (type: server [], protocol version 310)
[2018-01-12 16:55:05.248553 +0000][Debug  ][XRootDTransport   ] [hcc-marian.unl.edu:1094 #0.0] Sending out kXR_login request, username: rynge, cgi: ?xrd.cc=us&xrd.tz=0&xrd.appname=xrdcp&xrd.info=&xrd.hostname=xd-login.opensciencegrid.org&xrd.rn=v4.8.0, dual-stack: true, private IPv4: false, private IPv6: false
[2018-01-12 16:55:05.248588 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Wrote a message:  (0x4c000a60), 127 bytes
[2018-01-12 16:55:05.270569 +0000][Dump   ][XRootDTransport   ] [msg: 0x4c000950] Expecting 16 bytes of message body
[2018-01-12 16:55:05.270587 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message header, size: 8
[2018-01-12 16:55:05.270595 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received a message of 24 bytes
[2018-01-12 16:55:05.270604 +0000][Debug  ][XRootDTransport   ] [hcc-marian.unl.edu:1094 #0.0] Logged in, session: 08000000661300001700000008000000
[2018-01-12 16:55:05.270613 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Stream 0 connected.
[2018-01-12 16:55:05.270635 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Wrote a message: kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) (0x2342f20), 36 bytes
[2018-01-12 16:55:05.270645 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Successfully sent message: kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) (0x2342f20).
[2018-01-12 16:55:05.270656 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) has been successfully sent.
[2018-01-12 16:55:05.270665 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0.0] All messages consumed, disable uplink
[2018-01-12 16:55:05.293123 +0000][Dump   ][XRootDTransport   ] [msg: 0x4c000a40] Expecting 41 bytes of message body
[2018-01-12 16:55:05.293142 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message header for 0x4c000a40 size: 8
[2018-01-12 16:55:05.293157 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message 0x4c000a40 of 49 bytes
[2018-01-12 16:55:05.293165 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Handling received message: 0x4c000a40.
[2018-01-12 16:55:05.293227 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Got a kXR_ok response to request kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
[2018-01-12 16:55:05.293250 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Parsing the response to kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) as OpenInfo
[2018-01-12 16:55:05.293256 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Parsing StatInfo in response to kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
[2018-01-12 16:55:05.293285 +0000][Debug  ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Open has returned with status [SUCCESS] 
[2018-01-12 16:55:05.293291 +0000][Debug  ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] successfully opened at hcc-marian.unl.edu:1094, handle: 0x0, session id: 1
[2018-01-12 16:55:05.293360 +0000][Debug  ][Utility           ] Opening file://localhost/dev/null?oss.asize=20 for writing
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] URL: file://localhost/dev/null?oss.asize=20
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.293392 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] URL: file://localhost/dev/null?oss.asize=20
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.293416 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.293429 +0000][Debug  ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Sending an open command
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] URL: file://localhost/dev/null?oss.asize=20
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.293447 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] URL: file://localhost/dev/null?oss.asize=20
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.293481 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.293529 +0000][Debug  ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Open has returned with status [SUCCESS] 
[2018-01-12 16:55:05.293538 +0000][Debug  ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] successfully opened at localhost, handle: 0xa, session id: 1
[2018-01-12 16:55:05.293556 +0000][Debug  ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Sending a read command for handle 0x0 to hcc-marian.unl.edu:1094
[2018-01-12 16:55:05.293576 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Sending message kXR_read (handle: 0x00000000, offset: 0, size: 20)
[2018-01-12 16:55:05.293589 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Sending message kXR_read (handle: 0x00000000, offset: 0, size: 20) (0x23488b0) through substream 0 expecting answer at 0
[2018-01-12 16:55:05.293616 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Wrote a message: kXR_read (handle: 0x00000000, offset: 0, size: 20) (0x23488b0), 32 bytes
[2018-01-12 16:55:05.293629 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Successfully sent message: kXR_read (handle: 0x00000000, offset: 0, size: 20) (0x23488b0).
[2018-01-12 16:55:05.293635 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Message kXR_read (handle: 0x00000000, offset: 0, size: 20) has been successfully sent.
[2018-01-12 16:55:05.293641 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0.0] All messages consumed, disable uplink
[2018-01-12 16:55:05.315657 +0000][Dump   ][XRootDTransport   ] [msg: 0x4c0009b0] Expecting 20 bytes of message body
[2018-01-12 16:55:05.315673 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message header for 0x4c0009b0 size: 8
[2018-01-12 16:55:05.315679 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Will use the raw handler to read body of message 0x4c0009b0
[2018-01-12 16:55:05.315690 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message 0x4c0009b0 of 28 bytes
[2018-01-12 16:55:05.315695 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Handling received message: 0x4c0009b0.
[2018-01-12 16:55:05.315759 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Got a kXR_ok response to request kXR_read (handle: 0x00000000, offset: 0, size: 20)
[2018-01-12 16:55:05.315774 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Parsing the response to kXR_read (handle: 0x00000000, offset: 0, size: 20) as ChunkInfo
[2018-01-12 16:55:05.315784 +0000][Dump   ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Got state response for message kXR_read (handle: 0x00000000, offset: 0, size: 20)
[2018-01-12 16:55:05.315808 +0000][Debug  ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Sending a write command for handle 0xa to localhost
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] URL: file://localhost/dev/null?oss.asize=20
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.315849 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.315873 +0000][Dump   ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Got state response for message kXR_write (handle: 0x0a000000, offset: 0, size: 20)
[20B/20B][100%][==================================================][20B/s]  [2018-01-12 16:55:05.315896 +0000][Debug  ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Sending a close command for handle 0xa to localhost
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] URL: file://localhost/dev/null?oss.asize=20
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] Protocol:  file
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] User Name: 
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] Password:  
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] Host Name: localhost
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] Port:      1094
[2018-01-12 16:55:05.315916 +0000][Dump   ][Utility           ] Path:      /dev/null
[2018-01-12 16:55:05.315939 +0000][Debug  ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Close returned from localhost with: [SUCCESS] 
[2018-01-12 16:55:05.315946 +0000][Dump   ][File              ] [0x2347300@file://localhost/dev/null?oss.asize=20] Items in the fly 0, queued for recovery 0
[2018-01-12 16:55:05.315966 +0000][Debug  ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Sending a close command for handle 0x0 to hcc-marian.unl.edu:1094
[2018-01-12 16:55:05.315977 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Sending message kXR_close (handle: 0x00000000)
[2018-01-12 16:55:05.315986 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Sending message kXR_close (handle: 0x00000000) (0x23467d0) through substream 0 expecting answer at 0
[2018-01-12 16:55:05.316033 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Wrote a message: kXR_close (handle: 0x00000000) (0x23467d0), 24 bytes
[2018-01-12 16:55:05.316049 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Successfully sent message: kXR_close (handle: 0x00000000) (0x23467d0).
[2018-01-12 16:55:05.316056 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Message kXR_close (handle: 0x00000000) has been successfully sent.
[2018-01-12 16:55:05.316062 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0.0] All messages consumed, disable uplink
[2018-01-12 16:55:05.338088 +0000][Dump   ][XRootDTransport   ] [msg: 0x4c000a40] Expecting 0 bytes of message body
[2018-01-12 16:55:05.338109 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message header for 0x4c000a40 size: 8
[2018-01-12 16:55:05.338116 +0000][Dump   ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Received message 0x4c000a40 of 8 bytes
[2018-01-12 16:55:05.338123 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Handling received message: 0x4c000a40.
[2018-01-12 16:55:05.338143 +0000][Dump   ][XRootD            ] [hcc-marian.unl.edu:1094] Got a kXR_ok response to request kXR_close (handle: 0x00000000)
[2018-01-12 16:55:05.338161 +0000][Debug  ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Close returned from hcc-marian.unl.edu:1094 with: [SUCCESS] 
[2018-01-12 16:55:05.338168 +0000][Dump   ][File              ] [0x2341ef0@root://hcc-marian.unl.edu:1094//tmp/test.1M] Items in the fly 0, queued for recovery 0
[20B/20B][100%][==================================================][20B/s]  
[2018-01-12 16:55:05.338274 +0000][Debug  ][JobMgr            ] Stopping the job manager...
[2018-01-12 16:55:05.338287 +0000][Dump   ][JobMgr            ] Stopping worker #0...
[2018-01-12 16:55:05.338385 +0000][Dump   ][JobMgr            ] Worker #0 stopped
[2018-01-12 16:55:05.338396 +0000][Dump   ][JobMgr            ] Stopping worker #1...
[2018-01-12 16:55:05.338424 +0000][Dump   ][JobMgr            ] Worker #1 stopped
[2018-01-12 16:55:05.338433 +0000][Dump   ][JobMgr            ] Stopping worker #2...
[2018-01-12 16:55:05.338459 +0000][Dump   ][JobMgr            ] Worker #2 stopped
[2018-01-12 16:55:05.338468 +0000][Debug  ][JobMgr            ] Job manager stopped
[2018-01-12 16:55:05.338475 +0000][Debug  ][TaskMgr           ] Stopping the task manager...
[2018-01-12 16:55:05.338535 +0000][Debug  ][TaskMgr           ] Task manager stopped
[2018-01-12 16:55:05.338546 +0000][Debug  ][Poller            ] Stopping the poller...
[2018-01-12 16:55:05.338589 +0000][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: hcc-marian.unl.edu:1094"
[2018-01-12 16:55:05.338601 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Closing the socket
[2018-01-12 16:55:05.338610 +0000][Debug  ][Poller            ] <[::ffff:129.79.53.198]:56350><--><[::ffff:129.93.244.231]:1094> Removing socket from the poller
[2018-01-12 16:55:05.338634 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Destroying stream
[2018-01-12 16:55:05.338645 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Closing the socket
@simonmichal
Copy link
Contributor

Hi Mats,

What's the version of xrdcp you were using?
A similar problem has been reported in #625 and fixed in 4e8d6c9
This fix is available only in 4.8.0.

Long story short, before 4.8.0, there was a single Connection Window for all the interfaces. The default size of the connection window was equal to the default connection timeout, meaning that by default there was time only to try one interface. This has been fixed in 4.8.0, and now the connection window is applied per interface.

BTW, from what I see you are running with number of retries set to 1, right?

Cheers,
Michal

@rynge
Copy link
Author

rynge commented Jan 16, 2018

The problem sounds like the same one, but we have 4.8.0:

$ xrdcp -V
v4.8.0

This comes from the OSG package xrootd-client-4.8.0-1.osg34.el6.x86_64

@simonmichal
Copy link
Contributor

OK, after a careful examination I can see that your problem is quite different than the one in #625
Actually in your case the IPv6 address is garbage from the the client point of view:

[2018-01-12 16:54:34.312329 +0000][Error  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Unable to initiate the connection: [ERROR] Socket error: Network is unreachable
[2018-01-12 16:54:34.312337 +0000][Error  ][XRootD            ] [hcc-marian.unl.edu:1094] Unable to send the message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [FATAL] Socket error: Network is unreachable

Currently, this is considered by XRootD client as a fatal and unrecoverable error, it has nothing to do with the Connection Window.

I'll provide a complementary patch.

BTW

I tried running:
xrdcp -d 3 -f root://hcc-marian.unl.edu//tmp/test.1M /dev/null

and now it works just fine, so I suppose you guys have fixed your setup,
could you undo the fix so I can test the patch?

Michal

@bbockelm
Copy link
Contributor

Hi Michal,

The problem is client-side above. The job landed on a host with a valid IPv6 public address -- but the network the host is attached to is not actually routing IPv6 packets.

I think the request is to have the Xrootd client behave a bit more like wget (too often, the sysadmin does wget https://foo.com and goes "yup, it works").

wget will try the addresses in the order returned by getaddrinfo on the host. In this case, the IPv4 addresses are returned first, then IPv6 addresses.

Since Xrootd actually wants to randomize address order (as this is how load-balancing is implemented), the idea is to randomize only within address families.

So, if there are two IPv4 addresses followed by two IPv6 addresses, the randomization should be applied first to the two IPv4 addresses then the two IPv6 addresses - leaving the ordering of the address families the same.

Brian

@simonmichal
Copy link
Contributor

Hi Brian,

By policy we try first IPv6, and I don't think that only because of wget (which is used by lazy sysadmin), we should change our ways ;-)
I don't agree also that this is a pure client-side problem, as the DNS returned an IP address that cannot be reached (simple posix connect fails), so I would rather say that your system is miss configured ;-)

On the other hand, I don't see why the client couldn't give it a try and use the next IP address in line.

Michal

@bbockelm
Copy link
Contributor

Well, I certainly agree with the principle -- it's silly to deploy an IPv6 address but not the corresponding network. Further, IIRC, the various RFCs clearly state that IPv6 addresses should be tried first. It's also been how I've advocated to approach the problem previously.

However, I think Mats has convinced me that the correct way may not be the best way because:

  • The end-user shouldn't be punished at $REMOTE_GRID_SITE_XYZ because of a misconfigured worker node. Currently, the only workaround is to tell the user to ban the site (undesirable) or unconditionally disable IPv6 (also undesirable).
  • It'd be useful to have the behavior hew closer to other common CLI clients to avoid admins recommending "use curl instead of xrdcp because it works more often". In this case, we could always recommend the user to utilize wget / curl to talk to the Xrootd server for the purpose of reliability ... which would just be a bizarre thing to say!

@simonmichal
Copy link
Contributor

My proposition is to modify the client so it is not aborting after this kind of error, instead the client should try all the IP addresses on the list (this way IPv6 still gets tried first, and if it's bogus we move to the next entry, possibly IPv4).

I suppose this should make you guys happy, wouldn't it? ;-)

@rynge : could you provide a host were I can test a patch?

@rynge
Copy link
Author

rynge commented Jan 16, 2018

@simonmichal - Yes, we left the host as it was to be able to test. Please email me at rynge@isi.edu your preferred username and and ssh key, and will get you an account.

@simonmichal
Copy link
Contributor

Perfect :-)

@simonmichal
Copy link
Contributor

I have tested the patch, and it looks all good:

8-01-18 10:26:42.227838 +0000][Debug  ][PostMaster        ] Creating new channel to: hcc-marian.unl.edu:1094 1 stream(s)
[2018-01-18 10:26:42.227856 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2018-01-18 10:26:42.227938 +0000][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: hcc-marian.unl.edu:1094" to be run at: [2018-01-18 10:26:57 +0000]
[2018-01-18 10:26:42.227963 +0000][Dump   ][PostMaster        ] [hcc-marian.unl.edu:1094 #0] Sending message kXR_open (file: /tmp/test.1M, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) (0x21fd4c0) through substream 0 expecting answer at 0
[2018-01-18 10:26:42.228342 +0000][Debug  ][PostMaster        ] [hcc-marian.unl.edu:1094] Found 2 address(es): [2600:900:6:1301:5054:ff:fe1c:ccec]:1094, [::ffff:129.93.244.231]:1094
[2018-01-18 10:26:42.228367 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Attempting connection to [2600:900:6:1301:5054:ff:fe1c:ccec]:1094
[2018-01-18 10:26:42.228387 +0000][Error  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Unable to initiate the connection: [ERROR] Socket error: Network is unreachable
[2018-01-18 10:26:42.228399 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Attempting connection to [::ffff:129.93.244.231]:1094
[2018-01-18 10:26:42.228419 +0000][Debug  ][Poller            ] Adding socket 0x21fe140 to the poller
[2018-01-18 10:26:42.250404 +0000][Debug  ][AsyncSock         ] [hcc-marian.unl.edu:1094 #0.0] Async connection call returned
[2018-01-18 10:26:42.250455 +0000][Debug  ][XRootDTransport   ] [hcc-marian.unl.edu:1094 #0.0] Sending out the initial hand shake + kXR_protocol

I also added a new envar XRD_PREFERIPV4 (by default set to 0), if set the client will try first with IPv4.

I suppose I can close this one :-)

@zvada
Copy link

zvada commented Jan 18, 2018

Michal, just curiosity, what differs between XRD_PREFERIPV4 versus XRD_NETWORKSTACK=IPv4? That's what actually worked when we used XRD_NETWORKSTACK=IPv4 xrdcp ... from Mat's client.

@xrootd-dev
Copy link

xrootd-dev commented Jan 18, 2018 via email

@zvada
Copy link

zvada commented Jan 19, 2018

Thanks Andy, well explained!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants