Merge branch 'dev'
Zhuoqing Fang committed Dec 26, 2017
2 parents c6568cc + 92c142e commit f88e3b8
Showing 13 changed files with 176 additions and 177 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
*.eggs
*.EGG
*.EGG-INFO
+*.idea
bin
build
develop-eggs
6 changes: 6 additions & 0 deletions .idea/vcs.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 7 additions & 7 deletions README.rst
@@ -30,7 +30,7 @@ GSEAPY: Gene Set Enrichment Analysis in Python.



-The main documentation for GSEAPY can be found at http://gseapy.rtfd.io/
+The main documentation for GSEApy can be found at http://gseapy.rtfd.io/

An example to use gseapy, please click here: `Example <http://gseapy.readthedocs.io/en/master/gseapy_example.html>`_

@@ -39,7 +39,7 @@ An example to use gseapy, please click here: `Example <http://gseapy.readthedocs
GSEAPY is a python wrapper for **GSEA** and **Enrichr**.
--------------------------------------------------------------------------------------------

-GSEAPY could be used for **RNA-seq, ChIP-seq, Microarry** data. It's used for convenient GO enrichments and produce **publishable quality figures** in python.
+GSEAPY could be used for **RNA-seq, ChIP-seq, Microarry** data. It's used for convenient GO enrichment and produce **publishable quality figures** in python.


GSEAPY has five sub-commands available: ``gsea``, ``prerank``, ``ssgsea``, ``replot`` ``enrichr``.
@@ -49,7 +49,7 @@ GSEAPY has five sub-commands available: ``gsea``, ``prerank``, ``ssgsea``, ``rep
:prerank: The ``prerank`` module produce **Prerank tool** results. The input expects a pre-ranked gene list dataset with correlation values, which in .rnk format, and gene_sets file in gmt format. ``prerank`` module is an API to `GSEA` pre-rank tools.
:ssgsea: The ``ssgsea`` module perform **single sample GSEA(ssGSEA)** analysis. The input expects a pd.Series (indexed by gene name), or pd.DataFrame (include ``GCT`` file) with expression values and ``GMT`` file. For multi sample input, ssGSEA reconigze gct format, too. ssGSEA enrichment score for the gene set as described by `D. Barbie et al 2009 <http://www.nature.com/nature/journal/v462/n7269/abs/nature08460.html>`_.

-:replot: The ``replot`` module reproduce GSEA desktop version results. The only input for GSEAPY is the location to ``GSEA`` Desktop output results.
+:replot: The ``replot`` module reproduce GSEA desktop version results. The only input for GSEApy is the location to ``GSEA`` Desktop output results.

:enrichr: The ``enrichr`` module enable you perform gene set enrichment analysis using ``Enrichr`` API. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr . It runs very fast.

@@ -71,7 +71,7 @@ do gene set enrichment analysis in python. So, here is my reason:

* **Running inside python interactive console without switch to R!!!**
* User friendly for both wet and dry lab usrers.
-* Produce or reproduce pubilishable figures.
+* Produce or reproduce publishable figures.
* Perform batch jobs easy.
* Easy to use in bash shell or your data analysis workflow, e.g. snakemake.

@@ -229,12 +229,12 @@ see detail here: `Example <http://gseapy.readthedocs.io/en/master/gseapy_example
.. code:: python
-# assign dataframe, and use enrichr libary data set 'KEGG_2016'
+# assign dataframe, and use enrichr library data set 'KEGG_2016'
expression_dataframe = pd.DataFrame()
sample_name = ['A','A','A','B','B','B'] # always only two group,any names you like
-# assign gene_sets parameter with enrichr library name or gmt file on your local computor.
+# assign gene_sets parameter with enrichr library name or gmt file on your local computer.
gseapy.gsea(data=expression_dataframe, gene_sets='KEGG_2016', cls= sample_names, outdir='test')
# using prerank tool
@@ -263,7 +263,7 @@ see detail here: `Example <http://gseapy.readthedocs.io/en/master/gseapy_example
GSEAPY supported gene set libaries :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-To see the full list of gseapy supported gene set librarys, please click here: `Library <http://amp.pharm.mssm.edu/Enrichr/#stats>`_
+To see the full list of gseapy supported gene set libraries, please click here: `Library <http://amp.pharm.mssm.edu/Enrichr/#stats>`_

Or use ``get_library_name`` function inside python console.

12 changes: 6 additions & 6 deletions docs/gseapy_tutorial.rst
@@ -4,11 +4,11 @@
A Protocol to Prepare files for GSEAPY
======================================

-As a biological reseacher, I like protocols, so as other reseachers, too.
+As a biological researcher, I like protocols, so as other researchers, too.

Here is an short tutorial to walk you through gseapy.

-For file format explaination, please see `here <http://software.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html.>`_
+For file format explanation, please see `here <http://software.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html.>`_

In order to run gseapy successfully, install gseapy use pip.

@@ -27,7 +27,7 @@ Use ``gsea`` command, or :func:`gsea`
Follow the steps blow.

One thing you should know is that the gseapy input files are totally the same as
-``GSEA`` desktop requried. You can use these files below to run ``GSEA`` desktop, too.
+``GSEA`` desktop required. You can use these files below to run ``GSEA`` desktop, too.


1. Prepare an tabular text file of gene expression like this:
@@ -44,7 +44,7 @@ commands below:
df = pd.read_table('./test/gsea_data.txt')
df.head()
-#or assign df to the paramter 'data'
+#or assign df to the parameter 'data'
.. raw:: html
@@ -157,7 +157,7 @@ An example of cls file looks like below.

All you need to do is to download gene set database file from ``GSEA`` website.

-Or you could use enrichr library. In this case, just provide libarary name to parameter 'gene_sets'
+Or you could use enrichr library. In this case, just provide library name to parameter 'gene_sets'

If you would like to use you own gene_sets.gmts files, build such a file use excel,
and then rename to gene_sets.gmt.
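
Reviewer note: as a rough illustration of the gene set step in this hunk, a GMT file can also be written from Python rather than Excel. The sketch below assumes the usual GMT layout (one gene set per line: set name, description, then gene symbols, all tab-separated). The helpers `write_gmt` and `read_gmt` and the example gene sets are hypothetical and are not part of gseapy.

```python
import csv
import os
import tempfile


def write_gmt(gene_sets, path):
    """Write {set_name: (description, [genes])} as a tab-separated GMT file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, (description, genes) in gene_sets.items():
            # GMT row: name, description, then one column per gene symbol.
            writer.writerow([name, description] + list(genes))


def read_gmt(path):
    """Read a GMT file back into {set_name: [genes]} (description dropped)."""
    with open(path) as handle:
        rows = (line.rstrip("\n").split("\t") for line in handle)
        return {fields[0]: fields[2:] for fields in rows if len(fields) > 2}


# Hypothetical gene sets, for illustration only.
example_sets = {
    "SET_A": ("made-up description", ["CSF1", "CITED1", "CMBL"]),
    "SET_B": ("na", ["IL13RA1", "TACSTD2"]),
}
gmt_path = os.path.join(tempfile.mkdtemp(), "gene_sets.gmt")
write_gmt(example_sets, gmt_path)
parsed = read_gmt(gmt_path)
```

Reading the file back is only a sanity check that each row keeps the name/description/genes order.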
@@ -243,7 +243,7 @@ Use ``ssgsea`` command, or :func:`ssgsea`
Use ``enrichr`` command, or :func:`enrichr`
===============================================================

-The only thing you need to prepeare is a gene list file.
+The only thing you need to prepare is a gene list file.

**Note**: Enrichr uses a list of Entrez gene symbols as input.

8 changes: 4 additions & 4 deletions docs/index.rst
@@ -24,8 +24,8 @@ GSEAPY: Gene Set Enrichment Analysis in Python.

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
:target: https://img.shields.io/badge/license-MIT-blue.svg
-.. image:: https://img.shields.io/badge/python-3.5-blue.svg
-:target: https://img.shields.io/badge/python-3.5-blue.svg
+.. image:: https://img.shields.io/badge/python-3.6-blue.svg
+:target: https://img.shields.io/badge/python-3.6-blue.svg
.. image:: https://img.shields.io/badge/python-2.7-blue.svg
:target: https://img.shields.io/badge/python-2.7-blue.svg

@@ -69,8 +69,8 @@ I would like to use Pandas to explore my data, but I did not find a convenient
do gene set enrichment analysis in python. So, here is my reason:

* **Running inside python interactive console without switch to R!!!**
-* User friendly for both wet and dry lab usrers.
-* Produce and reproduce pubilishable figures.
+* User friendly for both wet and dry lab users.
+* Produce and reproduce publishable figures.
* Perform batch jobs easy(using for loops).
* Easy to use in bash shell or your data analysis workflow, e.g. snakemake.

12 changes: 6 additions & 6 deletions docs/introduction.rst
@@ -21,8 +21,8 @@ GSEAPY: Gene Set Enrichment Analysis in Python.

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
:target: https://img.shields.io/badge/license-MIT-blue.svg
-.. image:: https://img.shields.io/badge/python-3.5-blue.svg
-:target: https://img.shields.io/badge/python-3.5-blue.svg
+.. image:: https://img.shields.io/badge/python-3.6-blue.svg
+:target: https://img.shields.io/badge/python-3.6-blue.svg
.. image:: https://img.shields.io/badge/python-2.7-blue.svg
:target: https://img.shields.io/badge/python-2.7-blue.svg

@@ -56,7 +56,7 @@ do gene set enrichment analysis in python. So, here is my reason:

* **Running inside python interactive console without switch to R!!!**
* User friendly for both wet and dry lab usrers.
-* Produce pubilishable figures.
+* Produce publishable figures.
* Perform batch jobs easy(using for loops).
* Easy to use in bash shell or your data analysis workflow, e.g. snakemake.

@@ -101,11 +101,11 @@ A graphical introduction of Enrichr

**Note**: Enrichr uses a list of Entrez gene symbols as input. You should convert all gene names to uppercase.

-For example, both a list object and txt file are supported for ``enrchr`` API
+For example, both a list object and txt file are supported for ``enrichr`` API

.. code:: python
-# if you perfer to run gseapy.enrchr() inside python console, you could assign a list object to
+# if you prefer to run gseapy.enrchr() inside python console, you could assign a list object to
# gseapy like this.
gene_list = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1',
'CSF1', 'CITED1', 'SYNPO2L']
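
Reviewer note: the note in this hunk says gene names should be converted to uppercase before they go to Enrichr. A minimal sketch of that cleanup step follows. The helper name `normalize_gene_symbols` is hypothetical, and dropping blanks and duplicates is an extra assumption made here for tidiness, not something the documentation requires.

```python
def normalize_gene_symbols(symbols):
    """Strip whitespace and uppercase each symbol; drop blanks and repeats.

    Order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for symbol in symbols:
        upper = symbol.strip().upper()
        if upper and upper not in seen:
            seen.add(upper)
            cleaned.append(upper)
    return cleaned


# Mixed-case input, as might come from a mouse annotation or a hand-edited file.
raw_list = ["Scara3", " cmbl", "CLIC6", "Il13ra1", "cmbl", ""]
gene_list = normalize_gene_symbols(raw_list)
```

The resulting `gene_list` is a plain Python list, which is the kind of object the comment in this hunk describes passing to the ``enrichr`` API.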
@@ -148,7 +148,7 @@ Installation
# for windows users
$ conda install -c bioninja gseapy
-# or use pip to install the lastest release
+# or use pip to install the latest release
$ pip install gseapy
| You may instead want to use the development version from Github, by running
16 changes: 8 additions & 8 deletions gseapy/__main__.py
@@ -14,7 +14,7 @@
__version__ = '0.9.1'

def main():
-"""The Main function/pipeline for GSEAPY."""
+"""The Main function/pipeline for GSEApy."""

# Parse options...
argparser = prepare_argparser()
@@ -175,7 +175,7 @@ def add_prerank_parser(subparsers):
# group for input files
prerank_input = argparser_prerank.add_argument_group("Input files arguments")
prerank_input.add_argument("-r", "--rnk", dest="rnk", action="store", type=str, required=True,
-help="ranking dataset file in .rnk format.Same with GSEA.")
+help="ranking metric file in .rnk format.Same with GSEA.")
prerank_input.add_argument("-g", "--gmt", dest="gmt", action="store", type=str, required=True,
help="Gene set database in GMT format. Same with GSEA.")
prerank_input.add_argument("-l", "--label", action='store', nargs=2, dest='label',
@@ -224,11 +224,11 @@ def add_singlesample_parser(subparsers):

# group for General options.
group_opt = argparser_gsea.add_argument_group("GSEA advanced arguments")
-group_opt.add_argument("--norm-method", dest = "norm", action="store", type=str,
+group_opt.add_argument("--nm", "--norm-method", dest = "norm", action="store", type=str,
default='rank', metavar='normalize',
choices=("rank", "log", "log_rank", "custom"),
help="Sample normalization method. Choose from {'rank', 'log', 'log_rank','custom'}. Default: rank")
-group_opt.add_argument("--no-scale", action='store_false', dest='scale', default=True,
+group_opt.add_argument("--ns", "--no-scale", action='store_false', dest='scale', default=True,
help="If the flag was set, don't normalize the enrichment scores by number of genes.")
group_opt.add_argument("-n", "--permu-num", dest = "n", action="store", type=int, default=1000, metavar='perNum',
help="Number of random permutations. For calculating esnulls. Default: 1000")
@@ -275,18 +275,18 @@ def add_enrichr_parser(subparsers):
enrichr_opt.add_argument("-i", "--input-list", action="store", dest="gene_list", type=str, required=True, metavar='geneSymbols',
help="Enrichr uses a list of Entrez gene symbols as input.")
enrichr_opt.add_argument("-g", "--gene-sets", action="store", dest="library", type=str, required=True, metavar='gmt',
-help="Enrichr library name required. see online tool for libary names.")
-enrichr_opt.add_argument("--description", action="store", dest="descrip", type=str, default='enrichr', metavar='strings',
+help="Enrichr library name required. see online tool for library names.")
+enrichr_opt.add_argument("--ds", "--description", action="store", dest="descrip", type=str, default='enrichr', metavar='strings',
help="It is recommended to enter a short description for your list so that multiple lists \
can be differentiated from each other if you choose to save or share your list.")
-enrichr_opt.add_argument("--cut-off", action="store", dest="thresh", metavar='float', type=float, default=0.05,
+enrichr_opt.add_argument("--cut", "--cut-off", action="store", dest="thresh", metavar='float', type=float, default=0.05,
help="Adjust-Pval cutoff, used for generating plots. Default: 0.05.")
enrichr_opt.add_argument("-t", "--top-term", dest="term", action="store", type=int, default=10, metavar='int',
help="Numbers of top terms showed in the plot. Default: 10")
#enrichr_opt.add_argument("--scale", dest = "scale", action="store", type=float, default=0.5, metavar='float',
# help="scatter dot scale in the dotplot. Default: 0.5")
enrichr_opt.add_argument("--no-plot", action='store_true', dest='no_plot', default=False,
-help="Suppress the plot output.This is useful only if data are intrested. Default: False.")
+help="Suppress the plot output.This is useful only if data are interested. Default: False.")


enrichr_output = argparser_enrichr.add_argument_group("Output figure arguments")
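
Reviewer note: several hunks in this file add a short alias in front of an existing long option (``--nm``/``--norm-method``, ``--ns``/``--no-scale``, ``--cut``/``--cut-off``). The standalone sketch below shows how `argparse` treats such aliases: every option string passed to one `add_argument` call writes to the same `dest`. The parser here is a reduced stand-in built for illustration (the `prog` name is made up); it is not the project's actual parser.

```python
import argparse

# Reduced stand-in parser; option strings and defaults mirror the diff above.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--nm", "--norm-method", dest="norm", type=str, default="rank",
                    choices=("rank", "log", "log_rank", "custom"))
parser.add_argument("--ns", "--no-scale", dest="scale",
                    action="store_false", default=True)
parser.add_argument("--cut", "--cut-off", dest="thresh", type=float, default=0.05)

# The short and the long spellings fill the same attributes.
short = parser.parse_args(["--nm", "log", "--ns", "--cut", "0.01"])
long_ = parser.parse_args(["--norm-method", "log", "--no-scale", "--cut-off", "0.01"])
defaults = parser.parse_args([])
```

Because the old long spellings are kept alongside the new aliases, command lines written against the previous option names should keep parsing the same way.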