Tom White
tomwhite

Organizations

@cloudera @jclouds
Jul 22, 2016
tomwhite commented on issue bigdatagenomics/adam#1003

@jpdna @heuermh that's pretty old now - I think using Spark to do the partitioning is the way forward, and Impala supports nested types so flatteni…

Jul 22, 2016
tomwhite commented on issue bigdatagenomics/adam#651

@heuermh I would actually favour the Spark dataframes/datasets route to doing partitioning, since it's better supported than Kite. Also, flattening…

Jul 22, 2016
tomwhite commented on pull request broadinstitute/hail#480

Is there a Hive CLI equivalent of LIKE PARQUET <file>? I can't figure out how to get Hive to infer the schema from the Parquet file rather than sp…

Jul 20, 2016
@tomwhite
  • @tomwhite 38e383b
    Add docs on querying a variant store with Impala.
Jul 20, 2016
@tomwhite
  • @tomwhite d4b9bf7
    Add docs on querying a variant store with Impala.
tomwhite created branch tw_sql at broadinstitute/hail
Jul 20, 2016
Jul 18, 2016
tomwhite commented on pull request broadinstitute/gatk#1947

I agree with @tedsharpe that ctx.defaultParallelism() is the recommended way to get the number of cores. If there's a race condition, then that is …

Jul 14, 2016
tomwhite commented on pull request broadinstitute/hail#450

@cseed Thanks for updating to use AnnotationImpex (not sure about that name :). The changes look good to me and can be merged from my point of view. …

Jul 12, 2016
tomwhite commented on issue broadinstitute/gatk#1988

@droazen yes. "the prototype currently uses an inefficient groupBy operation". Can you point me in the direction of the code for this please, @lberg…

Jul 11, 2016
tomwhite commented on pull request broadinstitute/hail#464

Sorry to come late to this, but it might be better to have a more explicit name for the flag, like --min-split-size. "Block" is a pretty overloaded…

Jul 7, 2016
@tomwhite

I've created a fix in #113.

Jul 7, 2016
@tomwhite
Delegate to htsjdk for finding BAM index files. Recognizes .bai and .…
1 commit with 8 additions and 5 deletions
Jul 7, 2016
tomwhite commented on pull request broadinstitute/gatk#1963

I added the test you suggested, @akiezun, and verified that it fails against head.

Jul 7, 2016
@tomwhite
  • @tomwhite 2167f0b
    Add support for block gzipped files with a .bgz suffix (as well as
  • @akiezun 575984f
    Optimizations for GenotypeGVCFs + porting synchronized caches from ga…
  • 15 more commits »
Jul 7, 2016
@tomwhite

Thanks for the report @jamesemery. How are you accessing the files, and what is the error that you get?

Jul 6, 2016
tomwhite commented on pull request broadinstitute/hail#426

@cseed, no, BGZFCodec doesn't write empty gzip blocks at the end of files. See https://github.com/HadoopGenomics/Hadoop-BAM/blob/master/src/main/j…

Jul 6, 2016
@tomwhite
  • @tomwhite 6af7a03
    Add SAMFileMerger for merging sharded BAM or CRAM files.
Jul 6, 2016
tomwhite commented on pull request HadoopGenomics/Hadoop-BAM#111

@fnothaft, that would be great - thanks!

Jul 6, 2016
tomwhite commented on pull request HadoopGenomics/Hadoop-BAM#111

@fnothaft would you (or someone else on the ADAM team) be able to review this? I tried running GATK with this change and it passes tests - would be…

Jul 6, 2016
tomwhite commented on pull request HadoopGenomics/Hadoop-BAM#111

Addresses #48, #60, and #61.