Commit

Merge branch 'feature/sync' into 'main'
rebase main

See merge request msaid/inferys/mokapot!37
Siegfried Gessulat authored and gessulat committed Feb 27, 2024
2 parents 0b4fdc5 + 0fd515b commit 7cfe4d2
Showing 8 changed files with 92 additions and 25 deletions.
33 changes: 18 additions & 15 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,25 +1,28 @@
# Changelog for mokapot

## [Unreleased]

## [v0.10.1] - 2023-09-11
### Breaking changes
- Mokapot now uses `numpy.random.Generator` instead of the deprecated `numpy.random.RandomState` API.
New `rng` arguments have been added to functions and classes that rely on randomness in lieu of setting a global random seed with `np.random.seed()`. Thanks @sjust-seerbio!
New `rng` arguments have been added to functions and classes that rely on randomness in lieu of setting a global random seed with `np.random.seed()`. Thanks @sjust-seerbio! (#55)
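
As a rough illustration of the `rng` pattern this entry describes (a sketch only, not mokapot's actual code; `shuffled_indices` is a hypothetical helper), a function can accept `None`, an integer seed, or an existing `numpy.random.Generator` and normalize it with `np.random.default_rng()` instead of relying on `np.random.seed()`:

```python
import numpy as np


def shuffled_indices(n_items, rng=None):
    """Return a random permutation of range(n_items).

    Hypothetical helper: `rng` may be None, an int seed, or a
    numpy.random.Generator. np.random.default_rng() accepts all three
    and returns an existing Generator unaltered.
    """
    rng = np.random.default_rng(rng)
    return rng.permutation(n_items)


# The same seed gives the same permutation, with no global state involved.
first = shuffled_indices(10, rng=42)
second = shuffled_indices(10, rng=42)

# A shared Generator can also be passed through several calls.
shared = np.random.default_rng(1)
third = shuffled_indices(10, rng=shared)
```

Passing the generator explicitly keeps each call reproducible without touching NumPy's global random state.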

### Changed
- Added linting with Ruff to tests and pre-commit hooks (along with others)!

### Fixed
- The PepXML reader, which broke due to a Pandas update.
- Potential bug if lowercase peptide sequences were used and protein-level confidence estimates were enabled
- Multiprocessing led to the same training set being used for all splits (#104).

## [0.9.1] - 2022-12-14
## [v0.9.1] - 2022-12-14
### Changed
- Cross-validation classes are now detected by looking for inheritance from the `sklearn.model_selection._search.BaseSearchCV` class.

### Fixed
- Fixed backward compatibility issue for Python <3.10.

## [0.9.0] - 2022-12-02
## [v0.9.0] - 2022-12-02
### Added
- Support for plugins, allowing mokapot to use new models.
- Added a custom Docker image with optional dependencies.
@@ -31,11 +34,11 @@
- Updated GitHub Actions.
- Migrated to a full pyproject.toml setuptools build. Thanks @jspaezp!

## [0.8.3] - 2022-07-20
## [v0.8.3] - 2022-07-20
### Fixed
- Fixed the reported mokapot score when group FDR is used.

## [0.8.2] - 2022-07-18
## [v0.8.2] - 2022-07-18
### Added
- `mokapot.Model()` objects now recorded the CV fold that they were fit on.
This means that they can be provided to `mokapot.brew()` in any order
@@ -45,7 +48,7 @@
- Resolved issue where models were required to have an intercept term.
- The PepXML parser would sometimes try and log transform features with `0`'s, resulting in missing values.

## [0.8.1] - 2022-06-24
## [v0.8.1] - 2022-06-24

### Added
- Support for previously trained models in the `brew()` function and the CLI
@@ -56,7 +59,7 @@
`min_length-1`.
- Links to example datasets in the documentation.

## [0.8.0] - 2022-03-11
## [v0.8.0] - 2022-03-11

Thanks to @sambenfredj, @gessulat, @tkschmidt, and @MatthewThe for
PR #44, which made these things happen!
@@ -72,17 +75,17 @@ PR #44, which made these things happen!
- Parallelization within `mokapot.brew()` now uses `joblib`
instead of `concurrent.futures`.

## [0.7.4] - 2021-09-03
## [v0.7.4] - 2021-09-03
### Changed
- Improved documentation and added warnings for `--subset_max_train`. Thanks
@jspaezp!

## [0.7.3] - 2021-07-20
## [v0.7.3] - 2021-07-20
### Fixed
- Fixed bug where the `--keep_decoys` did not work with `--aggregate`. Also,
added tests to cover this. Thanks @jspaezp!

## [0.7.2] - 2021-07-16
## [v0.7.2] - 2021-07-16
### Added
- `--keep_decoys` option to the command line interface. Thanks @jspaezp!
- Notes about setting a random seed to the Python API documentation. (Issue #30)
@@ -96,12 +99,12 @@ PR #44, which made these things happen!
### Changed
- Updates to unit tests. Warnings are now treated as errors for system tests.

## [0.7.1] - 2021-03-22
## [v0.7.1] - 2021-03-22
### Changed
- Updated the build to align with
[PEP517](https://www.python.org/dev/peps/pep-0517/)

## [0.7.0] - 2021-03-19
## [v0.7.0] - 2021-03-19
### Added
- Support for downstream peptide and protein quantitation with
[FlashLFQ](https://github.com/smith-chem-wisc/FlashLFQ). This is accomplished
@@ -127,23 +130,23 @@ PR #44, which made these things happen!
`importlib.metadata` to the standard library, saving a few hundred
milliseconds.

## [0.6.2] - 2021-03-12
## [v0.6.2] - 2021-03-12
### Added
- Now checks to verify there are no debugging print statements in the code
base when linting.

### Fixed
- Removed debugging print statements.

## [0.6.1] - 2021-03-11
## [v0.6.1] - 2021-03-11
### Fixed
- Parsing Percolator tab-delimited files with a "DefaultDirection" line.
- `Label` column is now converted to boolean during PIN file parsing.
Previously, problems occurred if the `Label` column was of dtype `object`.
- Parsing modifications from pepXML files were indexed incorrectly on the
peptide string.

## [0.6.0] - 2021-03-03
## [v0.6.0] - 2021-03-03
### Added
- Support for parsing PSMs from PepXML input files.
- This changelog.
4 changes: 2 additions & 2 deletions mokapot/brew.py
@@ -272,13 +272,13 @@ def brew(
descs = [True] * len(psms)

if using_best_feat:
logging.warning(
LOGGER.warning(
"Learned model did not improve over the best feature. "
"Now scoring by the best feature for each collection "
"of PSMs."
)
elif reset:
logging.warning(
LOGGER.warning(
"Learned model did not improve upon the pretrained "
"input model. Now re-scoring each collection of PSMs "
"using the original model."
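
The change from `logging.warning(...)` to `LOGGER.warning(...)` sends these messages through a named logger instead of the root logger. The diff does not show how `LOGGER` is defined, so the sketch below assumes the common `logging.getLogger(...)` pattern with a hypothetical name `mokapot.brew`; `ListHandler` is also hypothetical and exists only to record what arrives at the package-level `mokapot` logger:

```python
import logging

# Assumption: a module-level logger named after the module, as is conventional.
LOGGER = logging.getLogger("mokapot.brew")

# Configuring the package-level logger affects every "mokapot.*" child.
package_logger = logging.getLogger("mokapot")
package_logger.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Hypothetical handler that stores (logger name, message) pairs."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.name, record.getMessage()))


handler = ListHandler()
package_logger.addHandler(handler)

# Propagates from "mokapot.brew" up to the "mokapot" handler.
LOGGER.info("Loading the Percolator model.")

# A logger outside the "mokapot" hierarchy never reaches that handler.
logging.getLogger("other").info("not captured")
```

Under these assumptions, a level or handler set on the `mokapot` logger (as the `caplog.set_level(..., logger="mokapot")` fixture in `tests/conftest.py` does) applies to messages from the child loggers, whereas module-level `logging.info(...)` calls go to the root logger and are not governed by that setting.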
4 changes: 2 additions & 2 deletions mokapot/model.py
@@ -511,7 +511,7 @@ def load_model(model_file, data_to_rescale=None):
# Try a percolator model first:
try:
weights = pd.read_csv(model_file, sep="\t", nrows=2).loc[1, :]
logging.info("Loading the Percolator model.")
LOGGER.info("Loading the Percolator model.")

weight_cols = [c for c in weights.index if c != "m0"]
model = Model(estimator=LinearSVC(), scaler=StandardScaler())

@@ -524,7 +524,7 @@ def load_model(model_file, data_to_rescale=None):

# Then try loading it with pickle:
except (KeyError, UnicodeDecodeError):
logging.info("Loading mokapot model.")
LOGGER.info("Loading mokapot model.")
with open(model_file, "rb") as mod_in:
model = pickle.load(mod_in)

19 changes: 17 additions & 2 deletions tests/conftest.py
@@ -1,7 +1,7 @@
"""
This file contains fixtures that are used at multiple points in the tests.
"""

import logging
import pytest
import numpy as np
import pandas as pd
@@ -10,6 +10,12 @@
from mokapot.qvalues import tdc


@pytest.fixture(autouse=True)
def set_logging(caplog):
    """Add logging to everything."""
    caplog.set_level(level=logging.INFO, logger="mokapot")


@pytest.fixture(scope="session")
def psm_df_6():
    """A DataFrame containing 6 PSMs"""
@@ -81,6 +87,9 @@ def psm_df_1000(tmp_path):
"score": np.concatenate(
[rng.normal(3, size=200), rng.normal(size=300)]
),
"score2": np.concatenate(
[rng.normal(3, size=200), rng.normal(size=300)]
),
"filename": "test.mzML",
"ret_time": rng.uniform(0, 60 * 120, size=500),
"charge": rng.choice([2, 3, 4], size=500),
@@ -89,6 +98,12 @@ def psm_df_1000(tmp_path):
decoys = {
"specid": np.arange(500, 1000),
"target": [False] * 500,
"spectrum": np.arange(500),
"group": rng.choice(2, size=500),
"peptide": [_random_peptide(5, rng) for _ in range(500)],
"score": rng.normal(size=500),
"score2": rng.normal(size=500),
"filename": "test.mzML",
"scannr": np.random.randint(0, 1000, 500),
"calcmass": rng.uniform(500, 2000, size=500),
"expmass": rng.uniform(500, 2000, size=500),
@@ -124,7 +139,7 @@ def psms(psm_df_1000):
target_column="target",
spectrum_columns="spectrum",
peptide_column="peptide",
feature_columns="score",
feature_columns=["score", "score2"],
filename_column="filename",
scan_column="spectrum",
calcmass_column="calcmass",
3 changes: 0 additions & 3 deletions tests/system_tests/test_system.py
@@ -15,9 +15,6 @@

logging.basicConfig(level=logging.INFO)

# Warnings are errors for these tests
pytestmark = pytest.mark.filterwarnings("error")


def test_compare_to_percolator(tmp_path):
    """Test that mokapot get almost the same answer as Percolator"""
11 changes: 10 additions & 1 deletion tests/unit_tests/test_brew.py
@@ -96,7 +96,11 @@ def test_brew_test_fdr_error(psms_ondisk, svm):
# @pytest.mark.skip(reason="Not currently working, at least on MacOS.")
def test_brew_multiprocess(psms_ondisk, svm):
    """Test that multiprocessing doesn't yield an error"""
    mokapot.brew(psms_ondisk, svm, test_fdr=0.05, max_workers=2)
    _, models, _, _ = mokapot.brew(psms_ondisk, svm, test_fdr=0.05, max_workers=2)
    # The models should not be the same:
    assert_not_close(models[0].estimator.coef_, models[1].estimator.coef_)
    assert_not_close(models[1].estimator.coef_, models[2].estimator.coef_)
    assert_not_close(models[2].estimator.coef_, models[0].estimator.coef_)


def test_brew_trained_models(psms_ondisk, svm):
@@ -141,3 +145,8 @@ def test_brew_using_non_trained_models_error(psms_ondisk, svm):
"One or more of the provided models was not previously trained"
in str(err)
)


def assert_not_close(x, y):
    """Assert that two arrays are not equal"""
    np.testing.assert_raises(AssertionError, np.testing.assert_allclose, x, y)
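
The `assert_not_close` helper inverts `np.testing.assert_allclose`: it passes only when the comparison itself raises `AssertionError`. A small standalone sketch of that behavior follows; the input arrays are made-up values, not model coefficients from the test suite:

```python
import numpy as np


def assert_not_close(x, y):
    """Pass only if assert_allclose(x, y) raises AssertionError."""
    np.testing.assert_raises(AssertionError, np.testing.assert_allclose, x, y)


# Arrays that differ beyond the default tolerance: the helper passes silently.
assert_not_close(np.array([1.0, 2.0]), np.array([1.0, 2.5]))

# Identical arrays: assert_allclose succeeds, so the helper itself fails.
try:
    assert_not_close(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
    identical_arrays_rejected = False
except AssertionError:
    identical_arrays_rejected = True
```

Because the check relies on `assert_allclose`, two arrays that differ only within its default tolerances would still count as "close" and make the helper fail.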
19 changes: 19 additions & 0 deletions tests/unit_tests/test_writer_flashlfq.py
@@ -6,6 +6,25 @@
import pandas as pd


def test_sanity(psms, tmp_path):
    """Run simple sanity checks"""

    # conf = psms.assign_confidence(eval_fdr=0.05)
    # test1 = conf.to_flashlfq(tmp_path / "test1.txt")
    # mokapot.to_flashlfq(conf, tmp_path / "test2.txt")
    # test3 = mokapot.to_flashlfq([conf, conf], tmp_path / "test3.txt")
    # with pytest.raises(ValueError):
    #     mokapot.to_flashlfq("blah", tmp_path / "test4.txt")
    #
    # df1 = pd.read_table(test1)
    # df3 = pd.read_table(test3)
    # assert 2 * len(df1) == len(df3)
    # assert len(df1.columns) == 7

    # TODO needs to be adapted to OnDisk confidence assignment
    pass


def test_basic(mock_conf, tmp_path):
    """Test that the basic output works"""
    conf = mock_conf
24 changes: 24 additions & 0 deletions tests/unit_tests/test_writer_txt.py
@@ -4,6 +4,30 @@
import pandas as pd


def test_sanity(psms, tmp_path):
    """Run simple sanity checks"""
    # conf = psms.assign_confidence(eval_fdr=0.05)
    # test1 = conf.to_txt(dest_dir=tmp_path, file_root="test1")
    # mokapot.to_txt(conf, dest_dir=tmp_path, file_root="test2")
    # test3 = mokapot.to_txt([conf, conf], dest_dir=tmp_path, file_root="test3")
    # with pytest.raises(ValueError):
    #     mokapot.to_txt("blah", dest_dir=tmp_path)
    #
    # test4 = mokapot.to_txt(conf, dest_dir=tmp_path, decoys=True)
    # assert len(test1) == 2
    # assert len(test4) == 4
    #
    # fnames = [Path(f).name for f in test1]
    # assert fnames == ["test1.mokapot.psms.txt", "test1.mokapot.peptides.txt"]
    #
    # df1 = pd.read_table(test1[0])
    # df3 = pd.read_table(test3[1])
    # assert 2 * len(df1) == len(df3)

    # TODO needs to be adapted to OnDisk confidence assignment
    pass


def test_columns(mock_conf, tmp_path):
    """Test other specific things"""
    conf = mock_conf