Feature/improve speed and limit memory (#11) #100

Open
wants to merge 45 commits into base: main

Commits on Apr 12, 2023

  1. Feature/improve speed and limit memory (#11)

    Improve speed and limit memory consumption
    
    - stream input files for inference
    - add feature: skip deduplication
    - add feature: ensemble model
    - add feature: rescale input before inference with pre-trained models
    sambenfredj committed Apr 12, 2023
    2f879e5
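
The "stream input files for inference" item can be illustrated with a minimal sketch. This is not the code from this commit: `iter_chunks`, the column names, and the sample data are hypothetical, and the standard library `csv` module stands in for whatever reader the project uses. It only shows the general idea of processing a tab-delimited file in bounded-size chunks so that memory use does not grow with file size.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows from a tab-delimited stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        # islice pulls only the next `chunk_size` rows from the reader,
        # so at most one chunk is held in memory at a time.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical PIN-like input: a header row followed by five PSM rows.
example = io.StringIO(
    "SpecId\tLabel\tscore\n"
    "a\t1\t2.5\n"
    "b\t-1\t0.3\n"
    "c\t1\t1.9\n"
    "d\t-1\t0.1\n"
    "e\t1\t3.2\n"
)

chunk_sizes = []
n_targets = 0
for chunk in iter_chunks(example, chunk_size=2):
    chunk_sizes.append(len(chunk))
    # Per-chunk work goes here; this sketch just counts target rows.
    n_targets += sum(1 for row in chunk if row["Label"] == "1")

print(chunk_sizes, n_targets)
```

With pandas, the comparable mechanism is the `chunksize` argument of `read_csv`, which returns an iterator of DataFrames instead of one large frame.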

Commits on Apr 17, 2023

  1. 💄 linting (#12)

    💄 fix linting
    gessulat committed Apr 17, 2023
    bf0c2ce

Commits on Apr 20, 2023

  1. Fix bugs (#17)

    - fix bug: member variables were not assigned when the model was not trained
    - raise an error when the input file is malformed: stop skipping bad lines in the pandas read function
    gessulat committed Apr 20, 2023
    3ad792c
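
The malformed-input change trades silent row loss for an early failure. The sketch below shows that behavior with the standard library only; `read_rows_strict` and `MalformedRowError` are hypothetical names, not part of this PR. In pandas itself, the corresponding switch is the `on_bad_lines` argument of `read_csv` (`"error"` raises, `"skip"` drops the row).

```python
import csv
import io
from typing import Iterator, List, TextIO


class MalformedRowError(ValueError):
    """Raised when a row's field count differs from the header's."""


def read_rows_strict(handle: TextIO) -> Iterator[List[str]]:
    """Yield data rows, raising instead of skipping when a row is malformed."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise MalformedRowError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        yield row


good = io.StringIO("SpecId\tLabel\tscore\na\t1\t2.5\nb\t-1\t0.3\n")
bad = io.StringIO("SpecId\tLabel\tscore\na\t1\t2.5\nb\t-1\n")  # last row lacks a field

good_rows = list(read_rows_strict(good))

try:
    list(read_rows_strict(bad))
    error_message = None
except MalformedRowError as exc:
    error_message = str(exc)

print(good_rows)
print(error_message)
```

Failing loudly here means a truncated or corrupted input file cannot quietly produce results based on fewer PSMs than the user supplied.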

Commits on May 5, 2023

  1. 29f5549

Commits on May 9, 2023

  1. Fix test brew: (#20)

    - Create new object of OnDiskPsmDataset to use for brew tests
    - Update brew function outputs and assert statements
    sambenfredj committed May 9, 2023
    ae7f880

Commits on May 11, 2023

  1. fix test datasets: (#19)

    - remove assign confidence tests because datasets don't have assign confidence methods anymore
    - add eval_fdr value to the _update_labels function
    sambenfredj committed May 11, 2023
    4e7235b
  2. Fix test confidence (#22)

    * Fix test confidence:
    - fix bugs for grouped confidence
    - fix test_one_group : create file using psm_df_1000 to create OnDiskPsmDataset.
    - remove test_pickle because confidence does not return dataframe results anymore.
    - add test_multi_groups to test that different group results are saved correctly.
    
    * fix bugs:
    - overwrite default fdr for update_labels function
    - return dataframe for psm_df_1000 to use with LinearPsmDataset
    sambenfredj committed May 11, 2023
    3e7dda9

Commits on May 15, 2023

  1. Fix cli tests: (#28)

    - Remove test_cli_pepxml because xml files don't work with streaming
    - Replace old output file names
    - Add random generator 'rng' variable to confidence since it is required for proteins
    - Remove subset_max_train from PluginModel
    - Fix bug: convert targets column after reading in chunks
    - Fix peptide column name for confidence
    - Fix test cli plugins: replace DecisionTreeClassifier with LinearSVC because DecisionTreeClassifier returns scores of only 0 or 1
    sambenfredj committed May 15, 2023
    c5d158a

Commits on May 16, 2023

  1. Fix system tests: (#29)

    - Refactor test structure : Separate brew and confidence functions, read results from output files.
    - Fix bugs: fix output columns for proteins, sort proteins data by score
    sambenfredj committed May 16, 2023
    1d2fdf0

Commits on May 17, 2023

  1. Fix parser pin test: (#30)

    - Add label value to initial direction because it has to be a numeric value
    - Read pin does not return dataframe anymore
    - Compare output of read_pin function to example dataframe
    sambenfredj committed May 17, 2023
    6e08b70

Commits on May 22, 2023

  1. Add tests: (#31)

    - Add skip_deduplication flag test
    - Add ensemble flag test
    - Add rescale flag test
    - Fix bug: remove target_column variable from read file for read_data_for_rescale
    sambenfredj committed May 22, 2023
    d16cedc
  2. Fix writer tests: (#32)

    - Remove writer tests with confidence object because LinearPsmDataset no longer has an assign_confidence method and results are streamed to output files while computing confidence
    sambenfredj committed May 22, 2023
    40f8394

Commits on Aug 4, 2023

  1. fix error no psms found during training : if no psms passed the fdr v…

    …alue then raise error that model performed worse (#33)
    sambenfredj committed Aug 4, 2023
    531d4ae
  2. Introduce new executable and bug fixes

    * Create new executable to aggregate psms to peptides.
    * Fix bugs:
    - fix "no psms found during training" error: if no psms pass the fdr value, raise an error stating that the model performed worse
    - raise error when pep values are all equal to 1
    - prefix paths with dest_dir to avoid polluting the workdir
    - catch all errors so that error traces do not break structured logging
    - limit parallelism in parse_in_chunks to max_workers
    - fix nondeterminism
    - fixed small column chunk bug
    - fix bug when using multiple input files
    * Fix and add tests:
    - remove writer tests with confidence object because LinearPsmDataset no longer has an assign_confidence method and results are streamed to output files while computing confidence
    - add test for the new function "get_unique_peptides_from_psms"
    - add cli test for aggregatePsmsToPeptides
    sambenfredj committed Aug 4, 2023
    84c427b
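
Two of the fixes above (capping parallelism in parse_in_chunks at max_workers, and removing nondeterminism) can be illustrated together. The following is a rough sketch under stated assumptions, not the project's implementation: `parse_chunk` and `parse_in_chunks_sketch` are hypothetical stand-ins, and a thread pool is assumed purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def parse_chunk(lines: Sequence[str]) -> int:
    """Hypothetical per-chunk work: count the non-empty lines."""
    return sum(1 for line in lines if line.strip())


def parse_in_chunks_sketch(
    lines: Sequence[str], chunk_size: int, max_workers: int
) -> List[int]:
    """Process fixed-size chunks with at most `max_workers` concurrent workers."""
    chunks = [lines[i : i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order regardless of which
        # worker finishes first, so the output is deterministic.
        return list(pool.map(parse_chunk, chunks))


lines = ["a", "b", "", "c", "d", "e", ""]
counts = parse_in_chunks_sketch(lines, chunk_size=3, max_workers=2)
print(counts)
```

Passing `max_workers` explicitly keeps the pool from scaling with the machine's core count, which matters when each worker holds a chunk of a large input file in memory.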

Commits on Feb 16, 2024

  1. ✨ force ci re-run

    gessulat committed Feb 16, 2024
    b85d176

Commits on Feb 22, 2024

  1. Merge branch 'develop' into 'main'

    dev to main
    
    See merge request msaid/inferys/mokapot!36
    Siegfried Gessulat committed Feb 22, 2024
    58e8481
  2. 💄 lint mokapot

    gessulat committed Feb 22, 2024
    74f91f1
  3. 💄 lints tests

    gessulat committed Feb 22, 2024
    2985a7f
  4. 💄 fixes format with ruff

    - adds line break in dataset.py
    - updates call of ruff in CI
    - updates pyproject.toml according to new ruff api
    gessulat committed Feb 22, 2024
    12ebe26
  5. 💄 fixes format with ruff

    - adds line break in dataset.py
    - updates call of ruff in CI
    - updates pyproject.toml according to new ruff api
    gessulat committed Feb 22, 2024
    49608e1
  6. 6ccc88e
  7. 0b4fdc5

Commits on Feb 27, 2024

  1. Feature/improve speed and limit memory (#11)

    Improve speed and limit memory consumption
    
    - stream input files for inference
    - add feature: skip deduplication
    - add feature: ensemble model
    - add feature: rescale input before inference with pre-trained models
    sambenfredj authored and gessulat committed Feb 27, 2024
    f595804
  2. 💄 linting (#12)

    💄 fix linting
    gessulat committed Feb 27, 2024
    46fbf6b
  3. Fix bugs (#17)

    - fix bug: member variables were not assigned when the model was not trained
    - raise an error when the input file is malformed: stop skipping bad lines in the pandas read function
    gessulat committed Feb 27, 2024
    ee95fbd
  4. f3d50c8
  5. Fix test brew: (#20)

    - Create new object of OnDiskPsmDataset to use for brew tests
    - Update brew function outputs and assert statements
    sambenfredj authored and gessulat committed Feb 27, 2024
    4293410
  6. fix test datasets: (#19)

    - remove assign confidence tests because datasets don't have assign confidence methods anymore
    - add eval_fdr value to the _update_labels function
    sambenfredj authored and gessulat committed Feb 27, 2024
    623b7d8
  7. Fix test confidence (#22)

    * Fix test confidence:
    - fix bugs for grouped confidence
    - fix test_one_group : create file using psm_df_1000 to create OnDiskPsmDataset.
    - remove test_pickle because confidence does not return dataframe results anymore.
    - add test_multi_groups to test that different group results are saved correctly.
    
    * fix bugs:
    - overwrite default fdr for update_labels function
    - return dataframe for psm_df_1000 to use with LinearPsmDataset
    sambenfredj authored and gessulat committed Feb 27, 2024
    8f417dd
  8. Fix cli tests: (#28)

    - Remove test_cli_pepxml because xml files don't work with streaming
    - Replace old output file names
    - Add random generator 'rng' variable to confidence since it is required for proteins
    - Remove subset_max_train from PluginModel
    - Fix bug: convert targets column after reading in chunks
    - Fix peptide column name for confidence
    - Fix test cli plugins: replace DecisionTreeClassifier with LinearSVC because DecisionTreeClassifier returns scores of only 0 or 1
    sambenfredj authored and gessulat committed Feb 27, 2024
    2e1723e
  9. Fix system tests: (#29)

    - Refactor test structure : Separate brew and confidence functions, read results from output files.
    - Fix bugs: fix output columns for proteins, sort proteins data by score
    sambenfredj authored and gessulat committed Feb 27, 2024
    6355834
  10. Fix parser pin test: (#30)

    - Add label value to initial direction because it has to be a numeric value
    - Read pin does not return dataframe anymore
    - Compare output of read_pin function to example dataframe
    sambenfredj authored and gessulat committed Feb 27, 2024
    296fb73
  11. Add tests: (#31)

    - Add skip_deduplication flag test
    - Add ensemble flag test
    - Add rescale flag test
    - Fix bug: remove target_column variable from read file for read_data_for_rescale
    sambenfredj authored and gessulat committed Feb 27, 2024
    096b07f
  12. Fix writer tests: (#32)

    - Remove writer tests with confidence object because LinearPsmDataset no longer has an assign_confidence method and results are streamed to output files while computing confidence
    sambenfredj authored and gessulat committed Feb 27, 2024
    d497fcc
  13. fix error no psms found during training : if no psms passed the fdr v…

    …alue then raise error that model performed worse (#33)
    sambenfredj authored and gessulat committed Feb 27, 2024
    d241adb
  14. Introduce new executable and bug fixes

    * Create new executable to aggregate psms to peptides.
    * Fix bugs:
    - fix "no psms found during training" error: if no psms pass the fdr value, raise an error stating that the model performed worse
    - raise error when pep values are all equal to 1
    - prefix paths with dest_dir to avoid polluting the workdir
    - catch all errors so that error traces do not break structured logging
    - limit parallelism in parse_in_chunks to max_workers
    - fix nondeterminism
    - fixed small column chunk bug
    - fix bug when using multiple input files
    * Fix and add tests:
    - remove writer tests with confidence object because LinearPsmDataset no longer has an assign_confidence method and results are streamed to output files while computing confidence
    - add test for the new function "get_unique_peptides_from_psms"
    - add cli test for aggregatePsmsToPeptides
    sambenfredj authored and gessulat committed Feb 27, 2024
    41ed445
  15. ✨ force ci re-run

    gessulat committed Feb 27, 2024
    ac43547
  16. 💄 lint mokapot

    gessulat committed Feb 27, 2024
    4a9872f
  17. 💄 lints tests

    gessulat committed Feb 27, 2024
    346a0c0
  18. 💄 fixes format with ruff

    - adds line break in dataset.py
    - updates call of ruff in CI
    - updates pyproject.toml according to new ruff api
    gessulat committed Feb 27, 2024
    f543166
  19. 💄 fixes format with ruff

    - adds line break in dataset.py
    - updates call of ruff in CI
    - updates pyproject.toml according to new ruff api
    gessulat committed Feb 27, 2024
    0742dc2
  20. a2602df
  21. f12a43d
  22. Merge branch 'main' into 'feature/sync'

    # Conflicts:
    #   tests/conftest.py
    #   tests/system_tests/test_system.py
    #   tests/unit_tests/test_brew.py
    #   tests/unit_tests/test_writer_flashlfq.py
    #   tests/unit_tests/test_writer_txt.py
    Siegfried Gessulat committed Feb 27, 2024
    0fd515b
  23. Merge branch 'feature/sync' into 'main'

    rebase main
    
    See merge request msaid/inferys/mokapot!37
    Siegfried Gessulat authored and gessulat committed Feb 27, 2024
    6726dea