Erratic file/directory creation when using xrootd mounted via fuse #192

Closed
malleite opened this issue Jan 15, 2015 · 5 comments

@malleite

Hi,

I have seen an issue when using xrootd fs mounted via automount. I have one redirector and 8 data servers.

When I try to create a directory or a file via mkdir or touch (or cp), I get the following behavior:

touch 8TeV
(1 or 2 seconds later, this message appears):
touch: setting times of `8TeV': No such file or directory

Then I try again with mkdir:

mkdir 8TeV
mkdir: cannot create directory `8TeV': File exists

An ls -l of the directory shows no sign of an 8TeV file or directory having been created, even after I unmount and remount the directory.

Then I look at what is there, this time using xrd connected to the redirector:
root://lipnode01:1094//> dirlist /data/xrootdfs/SM
Error 3011: Unable to open directory /data/xrootdfs/SM; no such file or directory

In server lipnode01:1094 or in some of its child nodes.
drwx(051) 4096 2015-01-15 20:34:14 /data/xrootdfs/SM/8TeVxx
drwx(051) 4096 2015-01-15 20:30:31 /data/xrootdfs/SM/8TeV
drwx(051) 4096 2015-01-15 20:32:34 /data/xrootdfs/SM/8TeVx
drwx(051) 4096 2014-12-10 15:02:59 /data/xrootdfs/SM/WWW
drwx(051) 4096 2014-11-03 02:21:04 /data/xrootdfs/SM/Wplusenu
drwx(051) 4096 2014-11-30 02:51:57 /data/xrootdfs/SM/DAOD_STDM4
drwx(051) 4096 2015-01-14 19:25:18 /data/xrootdfs/SM/13TeV
drwx(051) 4096 2015-01-14 19:26:15 /data/xrootdfs/SM/13TeV
drwx(051) 4096 2014-11-30 06:02:39 /data/xrootdfs/SM/Wplusenu
drwx(051) 4096 2014-11-30 02:52:02 /data/xrootdfs/SM/DAOD_STDM4
drwx(051) 4096 2014-11-18 21:58:12 /data/xrootdfs/SM/Wminenu
drwx(051) 4096 2014-11-27 15:58:04 /data/xrootdfs/SM/DAOD_STDM4
drwx(051) 4096 2015-01-14 19:23:12 /data/xrootdfs/SM/13TeV
drwx(051) 4096 2014-11-30 02:51:07 /data/xrootdfs/SM/Wminenu
drwx(051) 4096 2014-11-30 06:00:24 /data/xrootdfs/SM/Wplusenu

And I see the 8TeV directories in there (8TeVx and 8TeVxx are from other attempts).

The mount options for autofs are as below:
/atlas -fstype=fuse,rw,uid=496,allow_other,rdr=root://lipnode01:1094//data/xrootdfs :xrootdfs.sh

I saw this behavior in xrootd 4.0.4, and just migrated to 4.1.1 and I still have the same problem.
This is running on a stock SLC6 system. The network is fast enough, as I can consistently move files around with xrdcp at about 90 MB/s. I suspect this has to do with some timeout, but I am not sure, as the problem is intermittent (sometimes the system works fine!).

Thanks for any clues ...

Marco

@xrootd-dev

Hi Marco,

Usually this happens when there is a data server node that is known to the redirector but not to xrootdfs. To see if this is the case, can you run the following on the machine that runs xrootdfs?

xrd lipnode01:1094 locateall /data/xrootdfs (this gives you the list of data servers known by the redirector)
getfattr -n xrootdfs.fs.dataserverlist --only-values /atlas (this gives you the list of data servers known by xrootdfs)

xrd lipnode01:1094 locateall /data/xrootdfs/SM/8TeV (this tells you which data server(s) have /data/xrootdfs/SM/8TeV)

regards,
Wei Yang | yangw@slac.stanford.edu | 1-650-926-3338

@malleite
Author

Hi Wei,

Thanks for the fast feedback!

I increased the timeouts, and it seems to help (but as I observed, the behavior is intermittent, so I am not sure this is a real remedy):
/atlas -fstype=fuse,rw,uid=496,allow_other,entry_timeout=8,attr_timeout=8,debug,rdr=root://lipnode01:1094//data/xrootdfs :xrootdfs.sh

Thanks,
Marco

Here we go

[root@lipnode01 leite]# xrd lipnode01:1094 locateall /data/xrootdfs

------------- Location #1
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.9:1094'
------------- Location #2
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.2:1094'
------------- Location #3
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.5:1094'
------------- Location #4
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.6:1094'
------------- Location #5
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.3:1094'
------------- Location #6
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.7:1094'
------------- Location #7
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.8:1094'
------------- Location #8
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.4:1094'

[root@lipnode01 leite]# getfattr -n xrootdfs.fs.dataserverlist —only-value /atlas
getfattr: —only-value: No such file or directory
getfattr: Removing leading '/' from absolute path names

# file: atlas

xrootdfs.fs.dataserverlist="lipnode09:1094\012lipnode02:1094\012lipnode05:1094\012lipnode06:1094\012lipnode03:1094\012lipnode07:1094\012lipnode08:1094\012lipnode04:1094\012"
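
The attribute value above is printed with `\012` (octal for newline) between entries, which makes it hard to compare by eye with the redirector's list. A minimal Python sketch for turning that line into a plain list of servers follows; the helper name `parse_dataserverlist` is hypothetical, and the sketch assumes the value is wrapped in double quotes and uses three-digit octal escapes, as in the output pasted here.

```python
import re

def parse_dataserverlist(getfattr_line):
    """Extract host:port entries from a getfattr name="value" line."""
    # Keep only the text between the first and the last double quote.
    value = getfattr_line.split('"', 1)[1].rsplit('"', 1)[0]
    # Replace each three-digit octal escape (e.g. \012) with its character.
    decoded = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), value)
    # Entries are newline-separated; drop the empty trailing entry.
    return [entry for entry in decoded.split("\n") if entry]

# The line pasted in this thread, split only to keep the source readable.
line = ('xrootdfs.fs.dataserverlist="lipnode09:1094\\012lipnode02:1094\\012'
        'lipnode05:1094\\012lipnode06:1094\\012lipnode03:1094\\012'
        'lipnode07:1094\\012lipnode08:1094\\012lipnode04:1094\\012"')

servers = parse_dataserverlist(line)
for server in servers:
    print(server)
```

Note that xrootdfs reports host names while the redirector reports IP addresses, so a direct comparison of the two lists would still need a name lookup on the local network.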

[root@lipnode01 leite]# xrd lipnode01:1094 locateall /data/xrootdfs/SM/8TeV

------------- Location #1
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.9:1094'
------------- Location #2
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.2:1094'
------------- Location #3
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.5:1094'
------------- Location #4
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.3:1094'
------------- Location #5
InfoType: kXrdcLocDataServer
CanWrite: true
Location: '192.168.0.4:1094'
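
The two `locateall` listings above differ: the base path is reported by eight data servers, while /data/xrootdfs/SM/8TeV is reported by only five. The following Python sketch computes that difference from the pasted output. The helper name `locations` is hypothetical, the input strings are reduced to the `Location:` lines shown in this thread, and whether the difference is related to the failure is not established here.

```python
import re

def locations(locateall_output):
    """Return the set of host:port strings found in `xrd ... locateall` output."""
    return set(re.findall(r"Location:\s*'([^']+)'", locateall_output))

# Location lines for /data/xrootdfs, as pasted above.
base_output = """
Location: '192.168.0.9:1094'
Location: '192.168.0.2:1094'
Location: '192.168.0.5:1094'
Location: '192.168.0.6:1094'
Location: '192.168.0.3:1094'
Location: '192.168.0.7:1094'
Location: '192.168.0.8:1094'
Location: '192.168.0.4:1094'
"""

# Location lines for /data/xrootdfs/SM/8TeV, as pasted above.
subdir_output = """
Location: '192.168.0.9:1094'
Location: '192.168.0.2:1094'
Location: '192.168.0.5:1094'
Location: '192.168.0.3:1094'
Location: '192.168.0.4:1094'
"""

# Servers that report the base path but not the new directory.
missing = sorted(locations(base_output) - locations(subdir_output))
print(missing)
```

With the output in this thread, the servers that report the base path but not the 8TeV directory are 192.168.0.6, 192.168.0.7 and 192.168.0.8.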

@wyang007
Member

Thanks, Marco. However, we can't reproduce this error. Do any of your machines (data servers, redirector, host running xrootdfs) have multiple network interfaces? On the host running xrootdfs, can you ssh to lipnode02-9?

Wei Yang | yangw@slac.stanford.edu | 1-650-926-3338

@malleite
Author

Thanks Wei.
Yes, they have a dual Gb interface bonded to provide load balancing.
I can connect to any machine without problems, and all machines are practically idle now (no load or network traffic, and they are on a private network).

Thanks,
Marco

@wyang007
Member

I guess this ticket is too old. If the problem persists, please re-open this ticket or open a new one.
