Version 0.17.0

@simonvh simonvh released this 22 Dec 14:05

Changelog

[0.17.0] - 2021-12-22

Added

  • Added --genomes_dir argument to gimme motif2factors.
  • Added --version flag.
  • Function sample() for fast sequence sampling from a Motif() instance.
  • Added JASPAR 2022 motif databases.
  • Updated Homer motif database.
  • Operators:
    • + - combine two motifs (average) based on the pfm, so motifs with higher counts are weighted more heavily.
    • & - combine two motifs (average) based on the ppm, so both motifs are weighted equally.
    • << - "shift" motif left (adding a non-informative position to the right side)
    • >> - "shift" motif right (adding a non-informative position to the left side)
    • ~ - reverse complement
    • * - multiply the pfm by a value
  • Progress bar for scanning.
  • list_installed_libraries() to list available motif libraries.
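
To make the difference between the new operators concrete, here is a minimal sketch in plain Python. It does not use the library itself: the function names (add_counts, average_probs, shift_left, shift_right, reverse_complement, sample) are hypothetical stand-ins, matrices are plain lists with columns assumed to be in A, C, G, T order, and the two motifs are assumed to be already aligned and of equal length. The real Motif() class may handle alignment, pseudocounts and edge cases differently.

```python
import random

BASES = "ACGT"  # column order assumed for every matrix below


def to_ppm(pfm):
    """Turn per-position counts into per-position probabilities."""
    return [[count / sum(row) for count in row] for row in pfm]


def add_counts(pfm_a, pfm_b):
    """Analogue of `+`: pool the counts, so the deeper motif dominates."""
    pooled = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(pfm_a, pfm_b)]
    return to_ppm(pooled)


def average_probs(pfm_a, pfm_b):
    """Analogue of `&`: average the probabilities, weighting both equally."""
    ppm_a, ppm_b = to_ppm(pfm_a), to_ppm(pfm_b)
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(ppm_a, ppm_b)]


def shift_left(ppm):
    """Analogue of `<<`: append a uniform position on the right side."""
    return ppm + [[0.25] * 4]


def shift_right(ppm):
    """Analogue of `>>`: prepend a uniform position on the left side."""
    return [[0.25] * 4] + ppm


def reverse_complement(ppm):
    """Analogue of `~`: reverse the positions and swap A<->T, C<->G."""
    return [row[::-1] for row in reversed(ppm)]


def sample(ppm, rng=random):
    """Draw one sequence, picking each base with its position's probability."""
    return "".join(rng.choices(BASES, weights=row)[0] for row in ppm)


deep = [[90, 0, 0, 10]]    # one position, 100 observations
shallow = [[0, 10, 0, 0]]  # one position, 10 observations

print(add_counts(deep, shallow))     # A keeps most of the weight
print(average_probs(deep, shallow))  # A and C end up close to each other
print(sample(shift_right(to_ppm(deep))))
```

With these toy inputs, pooling counts gives A a probability of 90/110, while averaging probabilities gives it 0.45, which is the practical difference between + and & described above.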

Changed

  • Motif() class completely restructured:
    • Split into multiple files with coherent function.
    • Uses numpy.array internally.
    • All functions that mention pwm renamed to ppm (position-probability matrix), as the definition of a PWM is usually a log-odds matrix, not a probability matrix.
      • to_pwm() is deprecated, use to_ppm() instead.
    • Changed functions pwm_min_score() and pwm_max_score() to properties min_score and max_score.
    • All internal data is correctly updated when Motif() is changed, for instance by trimming (#218).
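
The pwm/ppm renaming and the move from score functions to properties can be illustrated with a small sketch. TinyMotif below is a hypothetical stand-in, not the gimmemotifs Motif() class; the uniform background of 0.25, the pseudo-probability of 0.01 and the use of log2 are assumptions chosen only to keep the example short. It shows a probability matrix (ppm) being converted to a log-odds matrix (what is usually meant by a PWM), with the score bounds exposed as properties rather than method calls.

```python
import math


class TinyMotif:
    """Hypothetical stand-in, not the gimmemotifs Motif class."""

    def __init__(self, ppm, background=0.25, pseudo=0.01):
        self.ppm = ppm                # position-probability matrix
        self.background = background  # assumed uniform base frequency
        self.pseudo = pseudo          # avoids log(0) for zero probabilities

    @property
    def logodds(self):
        """Log-odds matrix derived from the ppm."""
        return [
            [math.log2((p + self.pseudo) / self.background) for p in row]
            for row in self.ppm
        ]

    @property
    def max_score(self):
        """Best achievable score: take the highest log-odds at each position."""
        return sum(max(row) for row in self.logodds)

    @property
    def min_score(self):
        """Worst achievable score: take the lowest log-odds at each position."""
        return sum(min(row) for row in self.logodds)


motif = TinyMotif([[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]])
print(motif.min_score, motif.max_score)  # attribute access, no parentheses
```

Because the properties are computed from the current ppm on every access, they cannot go stale when the matrix changes, which is one simple way to get the kind of consistency described in the last item above.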

Fixed

  • gimme motif2factors can now unzip genome fastas.
  • gimme motif2factors will sanitize genome names.
  • Fixed bugs related to partial rerun of gimme motif2factors.
  • Fixed unhandled OSError during installation on Mac.
  • Fixed bug related to RFE() (#226).
  • The position-probability matrix now sums to 1 at every position (#209).
  • Fixed issue with pandas >= 1.3.
  • Fixed issue with non_reducing_slice import from pandas.
  • Fixed threshold calculation when more than 20,000 sequences are supplied.
  • Fixed issue with config file getting corrupted.
  • Fixed FPR threshold calculation.