
Releases: vanheeringen-lab/gimmemotifs

Version 0.18.0

11 Jan 12:51

[0.18.0] - 2023-01-11

Added

  • gimme scan and gimme maelstrom now accept a random seed for (most) operations
    • for the most deterministic behaviour possible, delete the cache and then run the command with a seed
  • Scanner now accepts a np.random.RandomState and a progress argument on init.
    • progress=None (the default) should print progress bars to the command line only, not to a file.
  • Scanner.set_genome now accepts the optional argument genomes_dir
  • gimmemotifs.maelstrom.Moap.create now accepts a np.random.RandomState.
  • gimmemotifs.maelstrom.run_maelstrom now accepts a np.random.RandomState.
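A minimal sketch of why accepting a seed or a np.random.RandomState makes sampling reproducible. It follows a common NumPy pattern (turn an int or None into a RandomState, or reuse one that is passed in). The helpers `as_random_state` and `sample_background` are hypothetical and are not part of the GimmeMotifs API.

```python
import numpy as np


def as_random_state(seed=None):
    """Return a RandomState from None, an int seed, or an existing RandomState (hypothetical helper)."""
    if isinstance(seed, np.random.RandomState):
        return seed  # reuse the caller's stream so several steps share one source of randomness
    return np.random.RandomState(seed)


def sample_background(n_regions, chrom_size, width, random_state=None):
    """Draw n_regions random (start, end) windows on a single chromosome (hypothetical helper)."""
    rs = as_random_state(random_state)
    starts = rs.randint(0, chrom_size - width, size=n_regions)
    return [(int(s), int(s) + width) for s in starts]


# The same seed yields the same regions.
a = sample_background(5, 1_000_000, 200, random_state=42)
b = sample_background(5, 1_000_000, 200, random_state=42)
```

Passing one RandomState object instead of an int lets consecutive calls draw from a single stream, so a whole run is reproducible from one seed.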

Changed

  • gimme diff (diff_plot() to be exact) will now print to stdout, like all other functions
  • now using the logger instead of print/sys.stderr.write in many more places
  • string formatting now (mostly) done with f-strings
  • refactored Fasta class
  • split scanner.py into 3 submodules:
    • scanner/__init__.py with the exported functions
    • scanner/base.py with the Scanner class
    • scanner/utils.py with the rest
  • gimmemotifs/maelstrom.py renamed to gimmemotifs/maelstrom/__init__.py
    • rank.py and moap.py are now submodules of maelstrom.

Fixed

  • gimme maelstrom works with or without xgboost (but will give a warning without xgboost)
  • fixed warning "in validate_matrix(): Row sums in df are not close to 1. Reormalizing rows..."
  • fixed multiprocess.Pool warnings
  • fixed a pandas SettingWithCopyWarning (in gc_bin_bedfile() to be exact)
  • fixed warnings when leaving files open
  • fixed deprecation warning in maelstrom (and in tests)
  • fixed FutureWarning in report.py
  • silenced warnings from external tools in motif prediction (pp_predict_motifs() to be exact)
  • updated last references from Motif.pwm_scan and Motif.pwm_scan_all to Motif.scan and Motif.scan_all respectively
  • typo in gimme motifs output ("%matches background" to "% matches background")
  • Scanner now uses a cheaper method to determine a genome's identity
    • (filesize + name instead of the md5sum of the whole genome's contents)
  • gimme motifs gives an informative error when fraction is not within 0-1.
  • gimme threshold works again
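The cheaper genome-identity check described above can be pictured as a key built from the file name and its size in bytes, which requires no reading of the file, compared with an MD5 over the whole FASTA. This is a sketch under that assumption; `genome_key` and `genome_md5` are hypothetical helpers, not GimmeMotifs functions.

```python
import hashlib
import os


def genome_key(path):
    """Cheap identity: base name plus size in bytes (file contents are never read)."""
    return (os.path.basename(path), os.path.getsize(path))


def genome_md5(path, chunk_size=1 << 20):
    """Expensive identity: MD5 of the full file contents, read in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The trade-off: two different genomes with the same file name and byte size would share a key, whereas the MD5 would tell them apart, but the key is instant even for multi-gigabyte FASTA files.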

Removed

  • removed old python2 code (scanning with MOODS & import shenanigans)

Version 0.17.2

12 Oct 15:18

[0.17.2] - 2022-10-12

Changed

  • made xgboost an optional dependency (to save space on bioconda)
  • an existing config will now update available tools when accessed (e4b3275)
  • applied the bioconda patch to compile_externals.py (11b0c2c)
  • coverage_table and combine_peaks now list their positional arguments under "positional arguments" in the help output (20819ee)
  • coverage_table should be slightly faster now (20819ee)
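Making a package such as xgboost optional usually means guarding its import and degrading gracefully, which matches the later note that gimme maelstrom warns when xgboost is missing. The snippet is a generic sketch of that pattern, not GimmeMotifs' code; `HAS_XGBOOST`, `available_methods` and the "baseline" label are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

try:
    import xgboost  # noqa: F401  (only needed for the xgboost-based method)
    HAS_XGBOOST = True
except ImportError:
    HAS_XGBOOST = False


def available_methods():
    """Return the method labels that can run here (labels are placeholders)."""
    methods = ["baseline"]
    if HAS_XGBOOST:
        methods.append("xgboost")
    else:
        logger.warning("xgboost is not installed; skipping the xgboost-based method.")
    return methods
```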

Fixed

  • biofluff dependency back in requirements
  • pinned conda and mamba versions in .travis.yaml
    • temp fix until conda>=4.12 can install mamba properly
  • documentation is working again!
  • gimmemotifs now supports pandas >=1.30

Removed

  • pyarrow dependency

Version 0.17.1

02 Jun 11:32

Changelog

[0.17.1] - 2022-06-02

Fixed

  • motifs are required to have unique IDs when clustering, thanks @akmorrow13!
  • motif2factors removes apostrophes so it won't crash :)
  • removed a print

Version 0.17.0

22 Dec 14:05

Changelog

[0.17.0] - 2021-12-22

Added

  • Added --genomes_dir argument to gimme motif2factors.
  • Added --version flag.
  • Function sample() for fast sequence sampling from a Motif() instance.
  • Added JASPAR 2022 motif databases.
  • Updated Homer motif database.
  • Operators:
    • +: combine two motifs (average) based on the pfm, so motifs with higher counts are weighted more heavily.
    • &: combine two motifs (average) based on the ppm, so both motifs are weighted equally.
    • <<: "shift" the motif left (adding a non-informative position to the right side)
    • >>: "shift" the motif right (adding a non-informative position to the left side)
    • ~: reverse complement
    • *: multiply the pfm by a value
  • Progress bar for scanning.
  • list_installed_libraries() to list available motif libraries.
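To illustrate what the new operators and sample() do to the underlying matrices, here is a plain-NumPy sketch. It assumes motifs are N x 4 arrays with columns ordered A, C, G, T and that the two motifs are already aligned and of equal length (the Motif class handles alignment itself). All function names are hypothetical and this is not the Motif implementation; `*` corresponds simply to `pfm * value`.

```python
import numpy as np

# Assumption: N x 4 arrays, columns A, C, G, T, motifs pre-aligned and equally long.
BACKGROUND = np.full((1, 4), 0.25)  # one non-informative position


def to_ppm(pfm):
    """Convert counts to probabilities so every position sums to 1."""
    pfm = np.asarray(pfm, dtype=float)
    return pfm / pfm.sum(axis=1, keepdims=True)


def combine_pfm(pfm_a, pfm_b):
    """Like `+`: pool the counts, so the motif with more counts weighs more."""
    return to_ppm(np.asarray(pfm_a, dtype=float) + np.asarray(pfm_b, dtype=float))


def combine_ppm(pfm_a, pfm_b):
    """Like `&`: average the probabilities, so both motifs weigh equally."""
    return (to_ppm(pfm_a) + to_ppm(pfm_b)) / 2


def shift_left(ppm):
    """Like `<<`: append a non-informative position on the right."""
    return np.vstack([ppm, BACKGROUND])


def shift_right(ppm):
    """Like `>>`: prepend a non-informative position on the left."""
    return np.vstack([BACKGROUND, ppm])


def reverse_complement(ppm):
    """Like `~`: reverse positions; reversing A,C,G,T columns swaps A<->T and C<->G."""
    return np.asarray(ppm)[::-1, ::-1]


def sample(ppm, n, seed=None):
    """Draw n sequences, picking each base independently from its position's probabilities."""
    rng = np.random.default_rng(seed)
    ppm = np.asarray(ppm, dtype=float)
    cols = np.array([rng.choice(4, size=n, p=row) for row in ppm])  # shape (length, n)
    return ["".join("ACGT"[i] for i in cols[:, j]) for j in range(n)]


pfm_a = [[10, 0, 0, 0]]  # 10 sequences, all A
pfm_b = [[0, 0, 0, 90]]  # 90 sequences, all T
pooled = combine_pfm(pfm_a, pfm_b)    # [[0.1, 0, 0, 0.9]]: dominated by the larger motif
averaged = combine_ppm(pfm_a, pfm_b)  # [[0.5, 0, 0, 0.5]]: equal weight
```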

Changed

  • Motif() class completely restructured:
    • Split into multiple files with coherent function.
    • Uses numpy.array internally.
    • All functions that mention pwm renamed to ppm (position-probability matrix), as the definition of a PWM is usually a log-odds matrix, not a probability matrix.
      • to_pwm() is deprecated, use to_ppm() instead.
    • Changed functions pwm_min_score() and pwm_max_score() to properties min_score and max_score.
    • All internal data is correctly updated when Motif() is changed, for instance by trimming (#218).
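To make the ppm/PWM distinction concrete: a PWM is commonly the log-odds of each probability against a background frequency, and the highest and lowest attainable scores are the sums of the per-position maxima and minima. This sketch assumes a uniform background, a small pseudocount and base-2 logs; GimmeMotifs' own min_score and max_score may use different conventions, and the function names are hypothetical.

```python
import numpy as np


def ppm_to_pwm(ppm, background=(0.25, 0.25, 0.25, 0.25), pseudo=1e-3):
    """Log2-odds of each base against the background (the pseudocount avoids log(0))."""
    ppm = np.asarray(ppm, dtype=float) + pseudo
    ppm = ppm / ppm.sum(axis=1, keepdims=True)  # renormalise after adding the pseudocount
    return np.log2(ppm / np.asarray(background))


def max_score(pwm):
    """Best possible score: the best base at every position."""
    return float(np.asarray(pwm).max(axis=1).sum())


def min_score(pwm):
    """Worst possible score: the worst base at every position."""
    return float(np.asarray(pwm).min(axis=1).sum())
```

A position that is almost always A scores close to log2(1 / 0.25) = 2 for A, while a uniform position contributes 0.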

Fixed

  • gimme motif2factors can now unzip genome fastas.
  • gimme motif2factors will sanitize genome names.
  • Fixed bugs related to partial rerun of gimme motif2factors.
  • Fixed unhandled OSError during installation on Mac.
  • Fixed bug related to RFE() (#226).
  • Position probability matrices now sum to 1 at every position (#209).
  • Fixed issue with pandas >= 1.3.
  • Fixed issue with non_reducing_slice import from pandas.
  • Fix threshold calculation if more than 20,000 sequences are supplied.
  • Fix issue with config file getting corrupted.
  • Fix FPR threshold calculation.
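One common way to define a motif score threshold at a given false positive rate (FPR) is to score background sequences and take the cutoff that only a fraction `fpr` of background scores exceed, i.e. the (1 - fpr) quantile. The sketch assumes that definition and takes pre-computed background scores as input; `fpr_threshold` is hypothetical and not GimmeMotifs' implementation.

```python
import numpy as np


def fpr_threshold(background_scores, fpr=0.01):
    """Score cutoff such that roughly `fpr` of background scores lie above it."""
    if not 0 < fpr < 1:
        raise ValueError("fpr must be between 0 and 1")
    scores = np.asarray(background_scores, dtype=float)
    return float(np.quantile(scores, 1 - fpr))
```

With more background sequences the quantile estimate gets more stable, which is why thresholds are usually computed on large background sets.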

Version 0.16.1

28 Jun 09:44

[0.16.1] - 2021-06-28

Bugfix release.

Added

  • Added warning when the number of sequences used for de novo motif prediction is low.

Fixed

  • Fixed bug with gimme motif2factors.
  • Fixed "Motif does not occur in motif database when running maelstrom" (#192).
  • Fixed bugs related to runs where no (significant) motifs are found.

Version 0.16.0

28 May 14:31

[0.16.0] - 2021-05-28

Many bugfixes, thanks to @kirbyziegler, @irzhegalova, @wangmhan, @ClarissaFeuersteinAkgoz and @fgualdr for reporting and proposing solutions!
Thanks to @Maarten-vd-Sande for the speed improvements.

Added

  • gimme motif2factors command to annotate a motif database with TFs from different species
    based on orthogroups.
  • Informative error message with link to fix when cache is corrupted (running on a cluster).
  • Print an informative error message if the input file is not in the correct format.

Changed

  • Speed improvements to motif scanning, which is now up to 2X faster!
  • Size of input regions is now automatically adjusted (#123, #128, #129)
  • Quantile normalization in coverage_table now uses multiple CPUs.
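For reference, quantile normalization makes every column (sample) share the same value distribution: sort each column, average the sorted values across columns rank by rank, then put those averages back in each column's original order. This single-process NumPy sketch is not coverage_table's implementation; `quantile_normalize` is hypothetical and ties are broken by position rather than averaged.

```python
import numpy as np


def quantile_normalize(matrix):
    """Rows are regions, columns are samples; returns a float array of the same shape."""
    x = np.asarray(matrix, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")         # row indices sorted per column
    ranks = np.argsort(order, axis=0, kind="stable")     # rank of every value within its column
    mean_sorted = np.sort(x, axis=0).mean(axis=1)        # average value at each rank
    return mean_sorted[ranks]                            # map each value to its rank's average
```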

Fixed

  • Fixes issue where % of motif occurrence would be incorrectly reported in gimme maelstrom output (#162).
  • Fix issues with running Trawler (#181)
  • Fix issues with running YAMDA (#180)
  • Fix issues with parsing XXmotif output (#178)
  • Fix issue where command-line arguments (such as single strand) were ignored (#177)
  • Fix pyarrow dependency (#176)
  • The correct % of regions with motif is now reported (#162)
  • Fix issue with running gimme motifs with the HOMER database (#135)
  • Fix issue with the --size parameter in gimme motifs, which now works as expected (#128)

Version 0.15.3

01 Feb 18:54

[0.15.3] - 2021-02-01

Fixed

  • _non_reducing_slice vs non_reducing_slice for pandas>=1.2 (#168)
  • When using original region size, skip regions smaller than 10bp and warn if no
    regions are left.
  • Fixed crash with KeyError: 'Factor' when creating the statistics report (#170)
  • Fixed bug with creating GC bins for a genome with unusual GC% (like Plasmodium).
  • Fixed bug that occurs when upgrading pyarrow with an existing GimmeMotifs
    cache.

Version 0.15.2

26 Nov 07:04

[0.15.2] - 2020-11-26

Changed

  • Refactoring to make coverage_table and combine_peaks available via API.

Fixed

  • Fix issue with -s parameter of gimme motifs (#146)
  • Fix issues (hopefully) with scanning large input files.

Version 0.15.1

07 Oct 06:54

[0.15.1] - 2020-10-07

Bugfix release.

Added

  • Motif.plot_logo() accepts an ax argument.

Fixed

  • Support for pandas>=1.1
  • coverage_table doesn't add a newline at the end of the file.

Version 0.15.0

30 Sep 05:35

[0.15.0] - 2020-09-29

Added

  • Added additional columns to gimme maelstrom output for better interpretation (correlation of motif to signal and % of regions with motif).
  • Added support for multi-species input in genome@chrom:start-end format.
  • gimme maelstrom warns if data is not row-centered and will center by default.
  • gimme maelstrom selects a set of non-redundant (or less redundant) motifs by default.
  • Added SVR regressor for gimme maelstrom.
  • Added quantile normalization to coverage_table.
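The multi-species genome@chrom:start-end region format can be split with a single regular expression. This is a sketch, not GimmeMotifs' parser: `parse_region` is hypothetical, it treats the "genome@" prefix as optional so plain chrom:start-end still parses, and it assumes chromosome names contain neither ":" nor "@".

```python
import re

# Hypothetical pattern: optional "genome@" prefix, then chrom:start-end.
REGION_RE = re.compile(
    r"^(?:(?P<genome>[^@:]+)@)?(?P<chrom>[^:@]+):(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_region(region):
    """Split 'genome@chrom:start-end' (or 'chrom:start-end') into (genome, chrom, start, end)."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"not a valid region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if end <= start:
        raise ValueError(f"end must be larger than start: {region!r}")
    return match["genome"], match["chrom"], start, end
```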

Removed

  • Removed the lightning classifiers and regressors as the package is no longer actively maintained.

Changed

  • Visually improved HTML output.
  • Score of maelstrom is now an aggregate z-score based on combining z-scores from individual methods using Stouffer's method. The z-scores of individual methods are generated using the inverse normal transform.
  • Reorganized some classes and functions.
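The new maelstrom score can be sketched in two steps: a rank-based inverse normal transform turns each method's scores into z-scores, and Stouffer's method combines the k per-method z-scores of a motif as Z = (z_1 + ... + z_k) / sqrt(k). The Blom offset c = 3/8 and the tie handling (ties broken by position) are assumptions, and both function names are hypothetical rather than GimmeMotifs' API.

```python
import numpy as np
from statistics import NormalDist


def inverse_normal_transform(values, c=3 / 8):
    """Map values to normal quantiles of their ranks: Phi^-1((rank - c) / (n - 2c + 1))."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort(kind="stable").argsort(kind="stable") + 1  # 1-based ranks
    n = len(values)
    quantiles = (ranks - c) / (n - 2 * c + 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])


def stouffer(z_matrix):
    """Rows are motifs, columns are methods; returns one combined z-score per motif."""
    z = np.asarray(z_matrix, dtype=float)
    return z.sum(axis=1) / np.sqrt(z.shape[1])


# Toy scores for four motifs from two methods on very different scales.
method_1 = [0.2, 1.5, -0.3, 0.9]
method_2 = [10.0, 40.0, 5.0, 30.0]
z = np.column_stack([inverse_normal_transform(method_1), inverse_normal_transform(method_2)])
combined = stouffer(z)
```

Because the transform only uses ranks, methods whose raw scores live on different scales contribute equally to the combined score.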

Fixed

  • Fixed minor issues with sorting columns in HTML output.
  • gimme motifs doesn't crash when no motifs are found.
  • Fixed error with Ensembl chromosome names in combine_peaks.