MOAgent is a graphical user interface (GUI) that uses artificial intelligence to search mass spectrometry (MS) data for biomarker candidates, with no coding skills required. An early version of the MOBiceps algorithms used in MOAgent was applied in Nature Communications, 2023 to identify phenotype-specific proteins of myeloproliferative neoplasms (blood cancer).
Quick links:
- MOAgent Installation Instructions
- Get started in three steps
- Advanced installation outside of a Virtual Machine (VM)
- MOAgent Tutorial
- Demo
- Example files and file requirements
- General notes
Requirement | Specification |
---|---|
Operating System | Ubuntu 18.04/20.04/22.04, Windows 8.1/10/11, macOS 10.15 to 14 |
CPU Architecture | x86_64 (64-bit Intel or AMD CPUs with Virtualization support) |
CPU Cores | 4 cores |
RAM | 8 GB RAM |
Storage | 50 GB |
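As a quick pre-flight check, the requirements in the table can be compared against the local machine. The following is a minimal sketch using only the Python standard library; the helper name `check_requirements` is hypothetical and not part of MOAgent. RAM is not checked because the standard library has no portable call for it.

```python
import os
import platform
import shutil

# Thresholds taken from the requirements table above.
MIN_CORES = 4
MIN_FREE_GB = 50
SUPPORTED_ARCHS = {"x86_64", "amd64"}  # Windows reports x86_64 as "AMD64"


def check_requirements(path="."):
    """Return a dict mapping each checked requirement to True or False."""
    cores = os.cpu_count() or 0
    free_gb = shutil.disk_usage(path).free / 1024**3
    arch = platform.machine().lower()
    return {
        "cpu_cores": cores >= MIN_CORES,
        "storage": free_gb >= MIN_FREE_GB,
        "architecture": arch in SUPPORTED_ARCHS,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'below requirement'}")
```

The storage check looks at the free space of the drive that holds `path`, so point it at the location where you plan to extract the VM.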
Follow the steps below to install and use MOAgent:
- Install VirtualBox.
- Download and extract MOAgentVM.
- Open the MOAgentVM.vbox file with VirtualBox (default username: moagent, password: 123).

After login, MOAgent starts automatically. If it does not, use the MOAgent desktop shortcut.
- MOBiceps
- Anaconda (we recommend using conda or another virtual Python environment)
- Docker (only if you want to convert raw data)
- Install Anaconda

  Download and install Anaconda 23.3.1 from here.

  For Ubuntu 20.04 LTS users: open a terminal in the folder that contains the downloaded Anaconda installer and execute:

      $ bash Anaconda3-2023.03-1-Linux-x86_64.sh

  During the installation, answer yes to install, press Enter to accept the default path, and answer yes to initialize Anaconda and apply the changes to your bash configuration.
- Create the Conda Environment

  Open a terminal in the git folder where the .yml file is located and execute the following command:

      $ conda env create -f moagent.yml
- Install the iq R package

  Activate the environment, start R, and install the package from the R prompt:

      $ conda activate MOAgent
      $ R
      > install.packages("iq")
- Start MOAgent

  Open a terminal and execute:

      $ conda activate MOAgent
      $ python MOAgent.py
For Spectronaut users: use the iq.rs report schema in the repo to extract the columns MOAgent needs. For FragPipe users: we tested the DIA-NN and IonQuant output of the LF-MBR and DIA_SpecLib_Quant workflows. In general, you can use any feature expression matrix in which rows represent samples and columns represent features.

This tutorial guides you through the MOAgent graphical user interface (GUI). The GUI lets you convert data formats, generate feature tables, and use RFE++ for feature selection.
The GUI is organized into three main sections, which are accessible through the "Workflow" dropdown menu:
- Data Convert: This section is used for converting data from one format to another.
- Feature ML Table: This section is used for generating a feature table.
- RFE++: This section is used for Recursive Feature Elimination (RFE).
The MOAgent GUI currently offers a data conversion function that converts raw, mzXML and mzML files into png, mzML or mzXML format.
Alternatively, you can run the conversion from the command line by using the MOBiceps data_convert.py script directly.
To use the data conversion workflow in the GUI:
- Select "Data Convert" from the "Workflow" menu. This will open the Data Convert section.
- Choose the input format (for example mzML).
- Choose the output format (for example mzXML).
- Specify the number of processing threads (optional).
- Hit the Start button.

You will find two directories in the specified input directory: one containing the original files, the other the converted files.
The script accepts the following parameters:
- `--p` or `--path_to_folder`: Absolute path to the folder containing all files to be converted. (Default: Current working directory)
- `--s` or `--orig_format`: Source file format. (Default: 'raw')
- `--f` or `--file_format`: Target file format. (Default: 'mzML')
- `--c` or `--core_number`: Number of threads used to convert all files. -1 uses all available threads. (Default: -1)
To execute the script, navigate to its location in your terminal and use the following command:
python /path/to/MOBiceps/data_convert.py --p /path/to/folder --s original_format --f target_format --c number_of_cores
or within your code if you installed MOBiceps via pip
import MOBiceps as mob
mob.convertRAWMP(original_format,target_format,number_of_cores)
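If you want to launch the conversion script from another Python program, the documented flags can be assembled into an argument list. This is a minimal sketch, not part of MOBiceps: the helper names `build_convert_command` and `run_conversion` are hypothetical, and the paths are the same placeholders used in the command above.

```python
import subprocess
import sys


def build_convert_command(script, folder, source="raw", target="mzML", cores=-1):
    """Assemble the argument list for data_convert.py from the documented flags."""
    return [
        sys.executable, str(script),
        "--p", str(folder),
        "--s", source,
        "--f", target,
        "--c", str(cores),
    ]


def run_conversion(cmd):
    """Run the assembled command and raise if the script exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_convert_command(
    "/path/to/MOBiceps/data_convert.py", "/path/to/folder",
    source="mzML", target="mzXML", cores=4,
)
print(" ".join(cmd))
# Replace the placeholder paths, then call run_conversion(cmd).
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.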
The "Feature ML Table" functionality converts search output data into a feature expression table that is compatible with the machine learning (ML) process. It also provides the option to impute missing values. You can access this functionality via the MOAgent GUI or from the command line through the expression_table.py script.
To use the "Feature ML Table" feature in the GUI:
- Select "Feature ML Table" from the "Workflow" menu. This will open the Feature ML Table section.
- In the 'Search output' field, specify the file containing your search output. You can use the 'Browse' button to navigate to the file.
- In the 'Class annotations' field, specify the file that contains your class annotations. You can use the 'Browse' button to navigate to the file.
- In the 'Output path' field, specify the directory where you want to save your output. You can use the 'Browse' button to navigate to the directory.
- In the 'Imputation' dropdown menu, select the imputation method to be used. Currently "mean", "median", "zero" and "gaussian" are supported. We recommend either using no imputation or applying a more sophisticated approach from a dedicated imputation package.
- In the 'Feature level' dropdown menu, select whether the feature table should be constructed at the peptide or protein level.
- Click the 'Start' button to start the data conversion process.
Alternatively, you can use the "Feature ML Table" function from the command line by executing the MOBiceps expression_table.py script. The following parameters are accepted:
- `--s`: Path to search output. Currently Spectronaut and DIA-NN output are supported. (Default: Current working directory)
- `--c`: Path to class annotation file. (Default: Current working directory)
- `--o`: Output path. (Default: Current working directory)
- `--m`: Imputation method. Currently "mean", "median", "zero", "gaussian" are supported. (Default: "none")
- `--f`: Feature level. "peptide" and "protein" are supported. (Default: "peptide")
To execute the script, navigate to its location in your terminal and use the following command:
python /path/to/MOBiceps/expression_table.py --s /path/to/search/output --c /path/to/class/annotation --o /path/to/output
or within your code if you installed MOBiceps via pip
import MOBiceps as mob
mob.build_expression_table(path_to_search_output, path_to_class_annotation, path_to_output)
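To make the simpler imputation options concrete, here is a minimal sketch of what "mean", "median" and "zero" imputation do to a single feature column. It follows the textbook definitions and is not the MOBiceps implementation; the function name `impute` is hypothetical, and the "gaussian" option is left out because its exact definition is not described here.

```python
import statistics


def impute(values, method="none"):
    """Fill missing entries (None) in one feature column.

    Illustrative only: textbook mean/median/zero imputation,
    not the code MOBiceps runs internally.
    """
    if method == "none":
        return list(values)
    observed = [v for v in values if v is not None]
    if method == "zero":
        fill = 0.0
    elif method == "mean":
        fill = statistics.mean(observed)
    elif method == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unsupported method: {method}")
    return [fill if v is None else v for v in values]


column = [10.0, None, 14.0, 30.0]
print(impute(column, "mean"))    # missing value replaced by 18.0
print(impute(column, "median"))  # missing value replaced by 14.0
print(impute(column, "zero"))    # missing value replaced by 0.0
```

The example shows how strongly the choice of method can shift a value (18.0 vs. 14.0 vs. 0.0), which is one reason to prefer no imputation or a dedicated package when the downstream feature ranking matters.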
You can use the RFE++ feature through the MOAgent GUI or directly from the command line.
To use the RFE++ feature in the GUI:
- Select "RFE++" from the "Workflow" menu. This will open the RFE++ section.
- In the 'Search output or expression table' field, specify the file you want to use. You can use the 'Browse' button to navigate to the file.
- In the 'Class annotations' field, specify the file that contains your class annotations. You can use the 'Browse' button to navigate to the file.
- In the 'Replicate annotations' field, specify the file that contains your replicate annotations. You can use the 'Browse' button to navigate to the file.
- In the 'Output directory' field, specify the directory where you want to save your output. You can use the 'Browse' button to navigate to the directory.
- In the 'Bootstrap augmentation' check box, specify whether bootstrapping should be applied to augment the samples. The 'Noisy augmentation' check box specifies whether class-dependent Gaussian noise learned from the data should be applied during augmentation.
- In the 'Feature level' dropdown menu, select whether the analysis should be done at the peptide or protein level.
- In the 'GPU support' check box, specify whether a GPU should be used. By default, a GPU is not used. (Not available in the VM. Only available if a GPU is installed; for experienced users.)
- In the 'Force handable amount of features' check box, specify whether the selection should be reduced to fewer than 30 features, even if the optimal phenotype classification is achieved with far more than 30 features.
- In the 'Considered classes' field, enter a comma-separated list of the classes to include in the analysis. If the field is empty, all available classes are considered.
- Click the 'Start' button to estimate the most contributing phenotype-specific features.
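The two augmentation options can be pictured as follows: bootstrapping draws existing samples with replacement, and noisy augmentation additionally perturbs each drawn sample with Gaussian noise whose spread is estimated per class. The sketch below is one possible reading of that idea using only the standard library; the function `augment` and its parameters are hypothetical, and MOAgent's own augmentation may differ in detail.

```python
import random
import statistics
from collections import defaultdict


def augment(samples, labels, n_new, noisy=False, seed=0):
    """Bootstrap-resample rows; optionally add per-class Gaussian noise.

    samples: list of feature vectors, labels: one class label per sample.
    Hypothetical helper, not the MOAgent implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    # Per-class, per-feature standard deviation estimated from the data.
    sigma = {
        label: [statistics.pstdev(col) for col in zip(*rows)]
        for label, rows in by_class.items()
    }

    new_samples, new_labels = [], []
    for _ in range(n_new):
        i = rng.randrange(len(samples))  # draw with replacement
        row, label = samples[i], labels[i]
        if noisy:
            row = [x + rng.gauss(0.0, s) for x, s in zip(row, sigma[label])]
        new_samples.append(list(row))
        new_labels.append(label)
    return new_samples, new_labels


X = [[1.0, 5.0], [1.2, 5.5], [3.0, 9.0], [3.4, 9.5]]
y = ["A", "A", "B", "B"]
X_aug, y_aug = augment(X, y, n_new=6, noisy=True)
print(len(X_aug), len(y_aug))  # 6 6
```

Each augmented sample keeps the class label of the sample it was drawn from, so the class annotation stays consistent.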
Alternatively, you can use the RFE++ feature from the command line by executing the rfePlusPlusWF.py script. The script accepts the following parameters:
- `--i`: Path to the folder containing the search output of Spectronaut or DIA-NN. (Default: Current working directory)
- `--c`: Path to the class annotation file.
- `--s`: Path to the sample annotation file. (Optional)
- `--o`: Output path. (Default: Current working directory)
- `--b`: Use bootstrapping augmentation. (Default: False)
- `--m`: Imputation method. Currently "mean", "median", "zero", "frequent" and "none" are supported. (Default: 'none')
- `--f`: Feature level. "peptide" and "protein" are supported. (Default: 'peptide')
- `--g`: Support for GPU if set to True. (Default: False)
- `--n`: Bootstrapping with noisy resampling. (Default: False)
- `--h`: Force the reduction to a manageable number of features. (Default: True)
- `--p`: Classes to be considered in the analysis.
To execute the script, navigate to its location in your terminal and use the following command:
python /path/to/MOBiceps/rfePlusPlusWF.py --i /path/to/search/output --c /path/to/class/annotation --o /path/to/output --p classA classB classC
or within your code if you installed MOBiceps via pip
import MOBiceps as mob
mob.execute_rfePP(path_to_search_output, path_to_class_annotation, path_to_output, phenotype_class_list)
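For readers unfamiliar with the underlying idea, the sketch below shows plain Recursive Feature Elimination: fit a model, drop the feature with the smallest absolute weight, and repeat. It is only an illustration of the basic RFE loop with a simple perceptron as a stand-in model; it is not the RFE++ algorithm in MOBiceps, and the function names and toy data are made up for this example.

```python
def train_perceptron(X, y, epochs=50):
    """Fit a simple perceptron; y must contain +1 / -1 labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, row)) + b
            if target * score <= 0:  # misclassified: update weights
                w = [wi + target * xi for wi, xi in zip(w, row)]
                b += target
    return w


def rfe(X, y, feature_names, n_keep):
    """Refit the model each round and drop the feature with the smallest |weight|."""
    active = list(range(len(feature_names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_perceptron(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [feature_names[j] for j in active]


# Toy data: only the middle feature separates the two classes.
names = ["noise_a", "signal", "noise_b"]
X = [
    [0.1, 2.0, -0.1], [-0.1, 2.2, 0.1], [0.0, 1.8, 0.0],
    [0.1, -2.0, 0.1], [-0.1, -2.1, -0.1], [0.0, -1.9, 0.0],
]
y = [1, 1, 1, -1, -1, -1]
print(rfe(X, y, names, n_keep=1))  # ['signal']
```

Because the model is refitted after every removal, the ranking can change from round to round, which is what distinguishes RFE from a one-off univariate filter.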
To test MOAgent outside of the provided MOAgentVM hosted on Zenodo, you can use the input files provided in the Demo folder.
- Set the `--i` parameter of the RFE++ script `rfePlusPlusWF.py`, or the `path_to_search_output` parameter of `execute_rfePP` from the `MOBiceps` package, to `MOAgent/Demo/input/metabolite_expression_table.csv`, or select this file for the `Search output or expression table` field in the GUI via the `Browse` button.
- Additionally, set the `--c` parameter of the RFE++ script `rfePlusPlusWF.py`, or the `path_to_class_annotation` parameter of `execute_rfePP` from the `MOBiceps` package, to `MOAgent/Demo/input/class_annotations.csv`, or select this file for the `Class annotations` field in the GUI via the `Browse` button.
- Finally, set the `--o` parameter of the RFE++ script `rfePlusPlusWF.py`, or the `path_to_output` parameter of `execute_rfePP` from the `MOBiceps` package, to `MOAgent/Demo/output`, or select this path for the `Output directory` field in the GUI via the `Browse` button, and hit the `Start` button.
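Before starting the demo run, it can help to confirm that both demo input files are where the steps above expect them. The following is a small sketch under the assumption that the repository root contains the `Demo/input` and `Demo/output` folders named above; the helper `prepare_demo` is hypothetical. Note that it creates the output directory if it does not exist yet.

```python
from pathlib import Path

# File names taken from the demo steps above.
DEMO_INPUTS = ("metabolite_expression_table.csv", "class_annotations.csv")


def prepare_demo(repo_root):
    """Return (input paths, output dir); raise if a demo input is missing."""
    demo = Path(repo_root) / "Demo"
    inputs = [demo / "input" / name for name in DEMO_INPUTS]
    missing = [str(p) for p in inputs if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing demo inputs: " + ", ".join(missing))
    output = demo / "output"
    output.mkdir(parents=True, exist_ok=True)
    return inputs, output


# Example (run from the folder that contains the MOAgent checkout):
# (expression_table, class_annotations), output_dir = prepare_demo("MOAgent")
```

The returned paths can then be passed to `--i`, `--c` and `--o`, or to the corresponding arguments of `execute_rfePP`.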
The results of the MOAgent analysis will be available in the folder “<..>/Demo/output/”. They will be similar to the reported results which can be found here More case studies.
The class annotation and sample annotation files need to have a specific structure as shown in the following.
In the class annotation file, make sure the column names are files and class. The file type should be .csv. The class labels should not be purely numerical.
In the sample annotation file, make sure the column names are files and PatientID. The file type should be .csv.
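The class annotation rules can be checked automatically before starting an analysis. The sketch below is an assumption-laden example, not part of MOAgent: `validate_class_annotation` is a hypothetical helper, the sample file content is invented, and "not just numerical" is read here as "not every class label parses as a number".

```python
import csv
import io


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def validate_class_annotation(handle):
    """Return a list of problems found in a class annotation CSV (empty if fine)."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in ("files", "class") if c not in columns]
    if problems:
        return problems
    classes = {(row["class"] or "").strip() for row in reader}
    if classes and all(is_number(c) for c in classes):
        problems.append("class labels are purely numerical")
    return problems


# Invented example content; real file names and classes will differ.
good = "files,class\nsample_01.raw,healthy\nsample_02.raw,disease\n"
bad = "files,class\nsample_01.raw,0\nsample_02.raw,1\n"
print(validate_class_annotation(io.StringIO(good)))  # []
print(validate_class_annotation(io.StringIO(bad)))   # ['class labels are purely numerical']
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object. The same pattern applies to the sample annotation file with the columns files and PatientID.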
Spectronaut supports report exports in csv and tsv format. Change the file ending by renaming the file to .txt.
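If you have many reports, the renaming step can be scripted. This is a minimal sketch with a hypothetical helper `to_txt`; it renames the file in place, so keep a copy if you still need the original file name.

```python
from pathlib import Path


def to_txt(report_path):
    """Rename a .csv or .tsv report to .txt and return the new path."""
    path = Path(report_path)
    if path.suffix.lower() not in {".csv", ".tsv"}:
        raise ValueError(f"expected a .csv or .tsv report, got: {path.name}")
    return path.rename(path.with_suffix(".txt"))


# Example with a placeholder path:
# new_path = to_txt("/path/to/report.tsv")
```

Only the file ending changes; the content of the report is left untouched.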
The DIA-NN main report file from a default label-free quantification workflow in FragPipe is a .tsv file and should be used as input.
- General help on VirtualBox (e.g. how to mount local drives for MOAgent) can be found in the UserManual.
- We recommend accessing your data using FileZilla, ssh (scp, sftp) and, if necessary, the VPN functionality within the VM in combination with the file browser.
- Have fun and please leave a star if you find this repo helpful!
- If you have any questions, please do not hesitate to contact jsettelmeier@ethz.ch.