New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large-ish data set; REBATCH breaks MTZ file which causes POINTLESS to choke #155

Closed
graeme-winter opened this Issue Jul 6, 2017 · 12 comments

Comments

Projects
None yet
2 participants
@graeme-winter
Contributor

graeme-winter commented Jul 6, 2017

xia2 pipeline=dials on 1200 degree data set;

<a name="outREBATCH"><h3>Output File</h3></a>
<!--SUMMARY_END--></FONT></B>
Error: file on 1 has no batch 1 ! 
 !!!! Orientation data for batch       2 found when expecting data for batch       1
<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->
 REBATCH:  *** Invalid orientation data ***
 REBATCH:  *** Invalid orientation data ***
Times: User:       0.8s System:    0.0s Elapsed:     0:01  

looks to be root cause of error

  • why was this not trapped?
  • how can it be avoided

Part of process was chopping down data set to something smaller for analysis:

      0     0     0     0     0

 * Space group = 'P222' (number     16)

 Data line--- batch 0 to 1 reject
 Data line--- batch 1802 to 1000000 reject
<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->
@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

Assert: this is not a bug in xia2 or REBATCH (I think)

Contributor

graeme-winter commented Jul 6, 2017

Assert: this is not a bug in xia2 or REBATCH (I think)

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

FALSE:

>>> from dials.array_family import flex
>>> import cPickle as pickle
>>> f = open('14_integrated.pickle', 'rb')
>>> p = pickle.load(f)
>>> xyzcal = p['xyzcal.px']
>>> z = xyzcal.parts()[2]
>>> flex.min(z)
-13.370289332707275
>>> flex.max(z)
12009.996905019354

i.e. reflections are correctly predicted outside the scan range... however

mtzdump dials_integrated.mtz:

 Col Sort    Min    Max    Num      %     Mean     Mean   Resolution   Type Column
 num order               Missing complete          abs.   Low    High       label 

   1 NONE     0      36      0  100.00     15.0     15.0  43.70   1.21   H  H
   2 NONE     0      39      0  100.00     16.6     16.6  43.70   1.21   H  K
   3 NONE     0      49      0  100.00     18.2     18.2  43.70   1.21   H  L
   4 NONE     1       8      0  100.00      4.4      4.4  43.70   1.21   Y  M/ISYM
   5 ASC      4   11999      0  100.00   6000.7   6000.7  43.70   1.21   B  BATCH
   6 NONE   -1.6   894.4     0  100.00    14.29    14.30  43.70   1.21   J  IPR
   7 NONE    0.0    16.5     0  100.00     1.92     1.92  43.70   1.21   Q  SIGIPR
   8 NONE   -5.8   983.2     0  100.00    15.97    16.10  43.70   1.21   J  I
   9 NONE    0.0    17.4     0  100.00     2.38     2.38  43.70   1.21   Q  SIGI
  10 NONE    0.0   140.2     0  100.00     6.31     6.31  43.70   1.21   R  BG
  11 NONE    0.0    13.4     0  100.00     2.60     2.60  43.70   1.21   R  SIGBG
  12 BOTH    1.0     1.0     0  100.00     1.00     1.00  43.70   1.21   R  FRACTIONCALC
  13 NONE    7.0  2062.9     0  100.00  1032.76  1032.76  43.70   1.21   R  XDET
  14 NONE    6.9  2160.1     0  100.00  1084.28  1084.28  43.70   1.21   R  YDET
  15 NONE    0.1  1199.7     0  100.00   599.67   599.67  43.70   1.21   R  ROT
  16 NONE    0.0     0.7     0  100.00     0.44     0.44  43.70   1.21   R  LP
  17 NONE    0.8     0.9     0  100.00     0.87     0.87  43.70   1.21   R  DQE

got some empty batches at the start, none at end?

Contributor

graeme-winter commented Jul 6, 2017

FALSE:

>>> from dials.array_family import flex
>>> import cPickle as pickle
>>> f = open('14_integrated.pickle', 'rb')
>>> p = pickle.load(f)
>>> xyzcal = p['xyzcal.px']
>>> z = xyzcal.parts()[2]
>>> flex.min(z)
-13.370289332707275
>>> flex.max(z)
12009.996905019354

i.e. reflections are correctly predicted outside the scan range... however

mtzdump dials_integrated.mtz:

 Col Sort    Min    Max    Num      %     Mean     Mean   Resolution   Type Column
 num order               Missing complete          abs.   Low    High       label 

   1 NONE     0      36      0  100.00     15.0     15.0  43.70   1.21   H  H
   2 NONE     0      39      0  100.00     16.6     16.6  43.70   1.21   H  K
   3 NONE     0      49      0  100.00     18.2     18.2  43.70   1.21   H  L
   4 NONE     1       8      0  100.00      4.4      4.4  43.70   1.21   Y  M/ISYM
   5 ASC      4   11999      0  100.00   6000.7   6000.7  43.70   1.21   B  BATCH
   6 NONE   -1.6   894.4     0  100.00    14.29    14.30  43.70   1.21   J  IPR
   7 NONE    0.0    16.5     0  100.00     1.92     1.92  43.70   1.21   Q  SIGIPR
   8 NONE   -5.8   983.2     0  100.00    15.97    16.10  43.70   1.21   J  I
   9 NONE    0.0    17.4     0  100.00     2.38     2.38  43.70   1.21   Q  SIGI
  10 NONE    0.0   140.2     0  100.00     6.31     6.31  43.70   1.21   R  BG
  11 NONE    0.0    13.4     0  100.00     2.60     2.60  43.70   1.21   R  SIGBG
  12 BOTH    1.0     1.0     0  100.00     1.00     1.00  43.70   1.21   R  FRACTIONCALC
  13 NONE    7.0  2062.9     0  100.00  1032.76  1032.76  43.70   1.21   R  XDET
  14 NONE    6.9  2160.1     0  100.00  1084.28  1084.28  43.70   1.21   R  YDET
  15 NONE    0.1  1199.7     0  100.00   599.67   599.67  43.70   1.21   R  ROT
  16 NONE    0.0     0.7     0  100.00     0.44     0.44  43.70   1.21   R  LP
  17 NONE    0.8     0.9     0  100.00     0.87     0.87  43.70   1.21   R  DQE

got some empty batches at the start, none at end?

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

Assert: number of times BATCH number incremented in export_mtz > 1 i.e. we increment to prevent batch == 0 too many times (and possibly inconsistent number of times, too)

Contributor

graeme-winter commented Jul 6, 2017

Assert: number of times BATCH number incremented in export_mtz > 1 i.e. we increment to prevent batch == 0 too many times (and possibly inconsistent number of times, too)

@graeme-winter graeme-winter self-assigned this Jul 6, 2017

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
Contributor

graeme-winter commented Jul 6, 2017

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

Having "fixed" this now get


<a name="outREBATCH"><h3>Output File</h3></a>
<!--SUMMARY_END--></FONT></B>

 *** Wrong batch! batch number in file is       3, batch number from internal lookup        1

<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->
 REBATCH:   *** Crazy batch number ***
 REBATCH:   *** Crazy batch number ***
Times: User:       0.8s System:    0.0s Elapsed:     0:01  
</pre>
</html>
<!--SUMMARY_END--></FONT></B>
Contributor

graeme-winter commented Jul 6, 2017

Having "fixed" this now get


<a name="outREBATCH"><h3>Output File</h3></a>
<!--SUMMARY_END--></FONT></B>

 *** Wrong batch! batch number in file is       3, batch number from internal lookup        1

<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->
 REBATCH:   *** Crazy batch number ***
 REBATCH:   *** Crazy batch number ***
Times: User:       0.8s System:    0.0s Elapsed:     0:01  
</pre>
</html>
<!--SUMMARY_END--></FONT></B>
@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

Deeper investigation of this suggests that REBATCH is simply ... broken. Reimplement REBATCH in cctbx Python? #discuss

Contributor

graeme-winter commented Jul 6, 2017

Deeper investigation of this suggests that REBATCH is simply ... broken. Reimplement REBATCH in cctbx Python? #discuss

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

Looks like I use REBATCH for:

          rb.set_hklin(hklin)
          rb.set_first_batch(counter * max_batches + 1)
          rb.set_project_info(pname, xname, dname)
          rb.set_hklout(hklout)

Quite possibly this is all; can code this up in Python

Contributor

graeme-winter commented Jul 6, 2017

Looks like I use REBATCH for:

          rb.set_hklin(hklin)
          rb.set_first_batch(counter * max_batches + 1)
          rb.set_project_info(pname, xname, dname)
          rb.set_hklout(hklout)

Quite possibly this is all; can code this up in Python

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 6, 2017

Contributor

Exhibit 'A':

first = 12345
pname = 'pname'
xname = 'xname'
dname = 'dname'

from iotbx import mtz

m = mtz.object('hklin.mtz')
for c in m.crystals():
  if c.name() == 'HKL_base':
    continue
  c.set_project_name(pname)
  c.set_name(xname)

batches = [b.num() for b in m.batches()]

start = min(batches)

for b in m.batches():
  b.set_num(b.num() - start + first)

batch_col = m.get_column('BATCH')
batch_vals = batch_col.extract_values()
batch_vals += (first - start)
batch_col.set_values(batch_vals)
batch_col.mtz_dataset().set_name(dname)

m.write('hklout.mtz')
Contributor

graeme-winter commented Jul 6, 2017

Exhibit 'A':

first = 12345
pname = 'pname'
xname = 'xname'
dname = 'dname'

from iotbx import mtz

m = mtz.object('hklin.mtz')
for c in m.crystals():
  if c.name() == 'HKL_base':
    continue
  c.set_project_name(pname)
  c.set_name(xname)

batches = [b.num() for b in m.batches()]

start = min(batches)

for b in m.batches():
  b.set_num(b.num() - start + first)

batch_col = m.get_column('BATCH')
batch_vals = batch_col.extract_values()
batch_vals += (first - start)
batch_col.set_values(batch_vals)
batch_col.mtz_dataset().set_name(dname)

m.write('hklout.mtz')

graeme-winter added a commit that referenced this issue Jul 7, 2017

graeme-winter added a commit that referenced this issue Jul 7, 2017

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 7, 2017

Contributor
For AUTOMATIC/DEFAULT/NATIVE                 Overall    Low     High
High resolution limit                           1.21    3.28    1.21
Low resolution limit                           43.70   43.73    1.23
Completeness                                   75.8   100.0     9.6
Multiplicity                                   41.1    45.4     9.8
I/sigma                                        23.8    84.0     1.9
Rmerge(I)                                     0.114   0.046   0.726
Rmerge(I+/-)                                  0.113   0.045   0.696
Rmeas(I)                                      0.115   0.046   0.763
Rmeas(I+/-)                                   0.115   0.046   0.754
Rpim(I)                                       0.017   0.007   0.226
Rpim(I+/-)                                    0.024   0.009   0.285
CC half                                       1.000   1.000   0.827
Wilson B factor                               5.349
Anomalous completeness                         74.8   100.0     5.6
Anomalous multiplicity                         21.6    25.5     6.4
Anomalous correlation                         0.068   0.153   0.130
Anomalous slope                               0.964
Total observations                          2009551  158475    3004
Total unique                                  48934    3489     306
Assuming spacegroup: P 21 21 21
Unit cell (with estimated std devs):
54.20323(12) 57.92484(13) 66.58784(16)
90.0         90.0         90.0        

works :)

Contributor

graeme-winter commented Jul 7, 2017

For AUTOMATIC/DEFAULT/NATIVE                 Overall    Low     High
High resolution limit                           1.21    3.28    1.21
Low resolution limit                           43.70   43.73    1.23
Completeness                                   75.8   100.0     9.6
Multiplicity                                   41.1    45.4     9.8
I/sigma                                        23.8    84.0     1.9
Rmerge(I)                                     0.114   0.046   0.726
Rmerge(I+/-)                                  0.113   0.045   0.696
Rmeas(I)                                      0.115   0.046   0.763
Rmeas(I+/-)                                   0.115   0.046   0.754
Rpim(I)                                       0.017   0.007   0.226
Rpim(I+/-)                                    0.024   0.009   0.285
CC half                                       1.000   1.000   0.827
Wilson B factor                               5.349
Anomalous completeness                         74.8   100.0     5.6
Anomalous multiplicity                         21.6    25.5     6.4
Anomalous correlation                         0.068   0.153   0.130
Anomalous slope                               0.964
Total observations                          2009551  158475    3004
Total unique                                  48934    3489     306
Assuming spacegroup: P 21 21 21
Unit cell (with estimated std devs):
54.20323(12) 57.92484(13) 66.58784(16)
90.0         90.0         90.0        

works :)

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 7, 2017

Contributor

Longer term: should start replacing all the ccp4 jiffy program junk with cctbx Python; less code than calling external programs...

Contributor

graeme-winter commented Jul 7, 2017

Longer term: should start replacing all the ccp4 jiffy program junk with cctbx Python; less code than calling external programs...

@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 7, 2017

Contributor

Still working on this one. Had issues with the removal of blank images; now working; many cherries annoyingly though

[gw56@ws133 xia2]$ git push
Counting objects: 22, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), 1.68 KiB, done.
Total 16 (delta 13), reused 0 (delta 0)
remote: Resolving deltas: 100% (13/13), completed with 6 local objects.
To git@github.com:xia2/xia2.git
   43f6b08..03c0407  master -> master
Contributor

graeme-winter commented Jul 7, 2017

Still working on this one. Had issues with the removal of blank images; now working; many cherries annoyingly though

[gw56@ws133 xia2]$ git push
Counting objects: 22, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), 1.68 KiB, done.
Total 16 (delta 13), reused 0 (delta 0)
remote: Resolving deltas: 100% (13/13), completed with 6 local objects.
To git@github.com:xia2/xia2.git
   43f6b08..03c0407  master -> master
@graeme-winter

This comment has been minimized.

Show comment
Hide comment
@graeme-winter

graeme-winter Jul 7, 2017

Contributor

@Anthchirp + release notes for next point release; fixes for Eiger data sets with > 10,000 frames

Contributor

graeme-winter commented Jul 7, 2017

@Anthchirp + release notes for next point release; fixes for Eiger data sets with > 10,000 frames

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment