Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server reports all pages with bad checksum when first page of a file is corrupted #1864

Closed
amadio opened this issue Dec 14, 2022 · 0 comments · Fixed by #1865
Closed

Server reports all pages with bad checksum when first page of a file is corrupted #1864

amadio opened this issue Dec 14, 2022 · 0 comments · Fixed by #1865
Assignees

Comments

@amadio
Copy link
Member

amadio commented Dec 14, 2022

This bug was discovered by @simonmichal, and can be reproduced by applying the change below on the client, then trying to copy a file with xrdcp:

--- a/src/XrdCl/XrdClFileStateHandler.cc
+++ b/src/XrdCl/XrdClFileStateHandler.cc
@@ -1431,6 +1431,8 @@ namespace XrdCl
           }
         } );
 
+    cksums.front() = 0;
+
     auto st = PgWriteImpl( self, offset, size, buffer, cksums, 0, h, timeout );
     if( !st.IsOK() )
     {

The change above just sets the checksum of the first page to 0, but the server reports all pages having a bad checksum. After some debugging, we reached the conclusion that there is a bug in XrdOucPgrwUtils::csVer (shown below):

bool XrdOucPgrwUtils::csVer(dataInfo &dInfo, off_t &bado, int &badc)
{
   int pgOff = dInfo.offs & pgPageMask;

// Make sure we have something to do
//
   fprintf(stderr, "dInfo: %p, %ld %d\n", dInfo.data, dInfo.offs, dInfo.count);

   if (dInfo.count <= 0) return true;

// If this is unaligned, the we must verify the checksum of the leading bytes
// to align them to the next page boundary if one exists.
//
   if (pgOff)
      {off_t tempsave;
       int chkLen = pgPageSize - pgOff;
       if (dInfo.count < chkLen) {chkLen = dInfo.count; dInfo.count = 0;}
          else dInfo.count -= chkLen;

       bool aOK = XrdOucCRC::Ver32C((void *)dInfo.data, chkLen, dInfo.csval[0]);

       dInfo.data += chkLen;
       tempsave    = dInfo.offs;
       dInfo.offs += chkLen;
       dInfo.csval++;

       if (!aOK)
          {bado = tempsave;
           badc = chkLen;
           return false;
          }
      }

// Verify the remaining checksums, if any are left (offset is page aligned)
//
   if (dInfo.count > 0)
      {uint32_t valcs;
       int pgNum = XrdOucCRC::Ver32C((void *)dInfo.data,  dInfo.count,
                                             dInfo.csval, valcs);
       if (pgNum >= 0)
          {bado = dInfo.offs + (pgPageSize * pgNum);
           int xlen = (bado - dInfo.offs);
           dInfo.offs  += xlen;
           dInfo.count -= xlen;
           badc = (dInfo.count <= pgPageSize ? dInfo.count : pgPageSize);
           dInfo.offs  += badc;
           dInfo.count -= badc;
           dInfo.csval += (pgNum+1);
           return false;
          }
      }

// All sent well
//
   return true;
}

When if (pgNum >=0) is entered, dInfo.csval is updated, but the pointer dInfo.data remains the same, so subsequent pages are checked against the wrong checksum, which results in bad offsets being reported for all pages.

amadio added a commit to amadio/xrootd that referenced this issue Dec 14, 2022
amadio added a commit to amadio/xrootd that referenced this issue Dec 14, 2022
amadio added a commit to amadio/xrootd that referenced this issue Dec 15, 2022
amadio added a commit to amadio/xrootd that referenced this issue Dec 15, 2022
amadio added a commit to amadio/xrootd that referenced this issue Dec 16, 2022
amadio added a commit to amadio/xrootd that referenced this issue Dec 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants