About

Here are some Python scripts I wrote. Most of them process fasta/fastq/gb files.

From now on, I will add some descriptions for each program.

You can type

python3 pyfile -h

or

python3 pyfile

to print the usage of each program.

You can ask me any question about these programs via wpwupingwp@outlook.com.

Requirement

python3

Be sure to install Python 3 rather than Python 2.7. In addition, subprocess.run() is used, so you should install Python 3.5 or above.
biopython
BLAST Suite

Note that all scripts were tested only on Linux, although in theory they should also work on Windows.

Batch

Many programs in this repository support batch mode. See the examples below. Note that "*.fasta" stands for the files you want to process, and i is a variable name that you can change if you want. The parameters of the program are omitted.

Microsoft Windows

for %i in (*.fasta) do python program.py %i

Linux

for i in *.fasta; do python3 program.py "$i"; done
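
If you prefer one loop that behaves the same on Windows and Linux, the batch run can also be written in Python. This is only a minimal sketch and not part of the repository: build_commands and run_batch are hypothetical helper names, and program.py stands for whichever script you want to run.

```python
import glob
import subprocess
import sys


def build_commands(pattern, program="program.py"):
    """Return one argument list per file that matches the glob pattern."""
    # sys.executable is the running interpreter, so the same code works
    # whether it is installed as "python" or "python3".
    return [[sys.executable, program, name]
            for name in sorted(glob.glob(pattern))]


def run_batch(pattern, program="program.py"):
    """Run the program once per matching file and return the exit codes."""
    return [subprocess.run(command).returncode
            for command in build_commands(pattern, program)]
```

Calling run_batch("*.fasta") would then process every fasta file in the current folder, one after another.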

Help information

Just type:

python3 program.py -h

This folder

parallel.py

Run other programs in parallel.

Usage

python3 parallel.py "command %i" "file"

Make sure you do not omit the quotation marks.

The "%i" in "command" stands for the filename. You can use a glob pattern in "file".

Example

python3 parallel.py "python3 gb2fasta.py %i" "*.gb"
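
To illustrate the idea of a "%i" placeholder combined with a glob pattern, here is a rough sketch. It is not the code of parallel.py; expand and run_all are hypothetical names, the worker count is arbitrary, and the quoting assumes a POSIX shell.

```python
import glob
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def expand(template, pattern):
    """Replace "%i" in the template with each file matching the pattern."""
    # shlex.quote protects filenames that contain spaces (POSIX shells).
    return [template.replace("%i", shlex.quote(name))
            for name in sorted(glob.glob(pattern))]


def run_all(commands, workers=4):
    """Run shell commands concurrently; return exit codes in input order."""
    def run_one(command):
        return subprocess.run(command, shell=True).returncode

    # Threads are enough here because each one only waits for a subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

With these helpers, run_all(expand("python3 gb2fasta.py %i", "*.gb")) would mimic the example above.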

split.py

Split fasta or fastq files according to given "-s".

Usage

python3 split.py -i input_file -s 10000000 -o output_path

It only supports fasta and fastq files. The option "-s" sets how many sequences you want in one file. The default value is 100000. You can change the output folder with "-o".

Example

python3 split.py -i pe150.fastq -s 100000 -o pe150_split
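
The splitting itself can be pictured as grouping records into batches of "-s" sequences. The sketch below uses only the standard library and handles fasta only; read_fasta and batches are hypothetical names, and split.py itself may work differently.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, size):
    """Group records into lists holding at most `size` items each."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch would then be written to its own file in the output folder.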

convert.py

Convert file format.

Usage

python3 convert.py old_file_name old_format new_file_name new_format

Example

python3 convert.py Zea.nex nexus Zea.fasta fasta

xml2fasta.py

Convert xml format BLAST result to fasta format and output result table.

Usage

BLAST your sequences.
Download the result in xml format.
Run

python3 xml2fasta.py BlastResult.xml

python3 xml2fasta.py BlastResult.xml -s

python3 xml2fasta.py BlastResult.xml -ss

If you use the option "-s", it will only process the first hsp of each hit for every query sequence.

If you use the option "-ss", it will only process the first hsp of the first hit for each query sequence.

Note that Microsoft Windows users may need to replace "python3" with "python".

Result

fasta file. The first sequence is your query sequence, and the others are the matched fragments for that query.
tsv file. A table for simple analysis.
NotFound.log. A list of the query sequences for which BLAST found no match.
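
The difference between the default mode, "-s" and "-ss" can be sketched with the standard library XML parser. This is an illustration rather than the code of xml2fasta.py: pick_hsps and its mode argument are hypothetical, while the tag names are those of the BLAST XML output format.

```python
import xml.etree.ElementTree as ET


def pick_hsps(xml_text, mode="all"):
    """Return (query, hit_id, hit_sequence) tuples from BLAST XML text.

    mode "all": every hsp; "s": first hsp of each hit;
    "ss": first hsp of the first hit of each query.
    """
    rows = []
    for iteration in ET.fromstring(xml_text).iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = iteration.findall("Iteration_hits/Hit")
        if mode == "ss":
            hits = hits[:1]  # keep only the first hit
        for hit in hits:
            hsps = hit.findall("Hit_hsps/Hsp")
            if mode in ("s", "ss"):
                hsps = hsps[:1]  # keep only the first hsp
            for hsp in hsps:
                rows.append((query, hit.findtext("Hit_id"),
                             hsp.findtext("Hsp_hseq")))
    return rows
```

A query whose Iteration has no Hit elements produces no rows, which is the case NotFound.log reports.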

trim.py

Trim a fragment from the sequences in the given fasta file, or replace the trimmed bases with 'N'.

Usage

python3 trim.py input.fasta from:to

Here from and to are integers that define the region you want to cut off. If you want to cut the tail of a sequence and do not know its exact length, you can use a negative from with a large to. For instance, "-20:10000" cuts the last 20 bases, assuming that every sequence you give is shorter than 10000.

Example

Cut middle

python3 trim.py rbcL.fasta 100:150

Cut head

python3 trim.py rbcL.fasta 1:24

Cut tail

python3 trim.py rbcL.fasta "-5:1000000"
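
The from:to logic can be sketched on a plain string. This is an assumption-laden illustration, not the code of trim.py: cut is a hypothetical function, positive positions are assumed to be 1-based and inclusive, and a negative from is assumed to count from the end of the sequence.

```python
def cut(sequence, region, fill=None):
    """Remove the region "from:to" from sequence, or mask it with fill."""
    start, end = (int(part) for part in region.split(":"))
    # Assumption: positive positions are 1-based and inclusive; a negative
    # "from" counts from the end of the sequence, like a Python slice.
    left = start if start < 0 else start - 1
    head, middle, tail = sequence[:left], sequence[left:end], sequence[end:]
    if fill is None:
        return head + tail
    # Mask instead of cutting: same length, trimmed bases replaced.
    return head + fill * len(middle) + tail
```

Under these assumptions, cut("ABCDEFGHIJ", "-3:1000") drops the last three letters because the slice simply stops at the end of the string.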

no_same.py

Remove identical sequences from the given fasta/nexus file. The new file will be written to ".new" in the same format as the input file.

Duplicated sequences will be printed on screen.

Usage

python3 no_same.py input_file

Example

python3 no_same.py cbs.fasta
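
A minimal sketch of the deduplication step follows. It assumes that "identical" means the same sequence string ignoring case and that the first copy is kept; remove_identical is a hypothetical name and no_same.py may use different rules.

```python
def remove_identical(records):
    """Split (name, sequence) pairs into unique records and duplicates."""
    seen = {}  # maps an upper-cased sequence to the first name that had it
    unique, duplicates = [], []
    for name, sequence in records:
        key = sequence.upper()
        if key in seen:
            # Remember which earlier record this one duplicates.
            duplicates.append((name, seen[key]))
        else:
            seen[key] = name
            unique.append((name, sequence))
    return unique, duplicates
```

The duplicates list pairs each removed record with the record it repeats, which is the kind of information that could be printed on screen.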

vlookup_assistant.py

Expand a given table according to range.

Input table (CSV format) looks like this:

A,B,C

It will generate a new table:

D,E

where D is expanded from range(B, C) and E is related to A.
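
As a rough illustration of the expansion, the sketch below turns every row A,B,C into one row per value of D. It rests on assumptions: B and C are integers, C is excluded as in Python's range(), and E is simply the value of A. The name expand_table is hypothetical.

```python
import csv
import io


def expand_table(csv_text):
    """Turn each row A,B,C into one (D, E) row per value in range(B, C)."""
    rows = []
    for a, b, c in csv.reader(io.StringIO(csv_text)):
        # Assumption: B and C are integers and C is excluded, as in range().
        for d in range(int(b), int(c)):
            rows.append((d, a))
    return rows
```

The resulting table has one key per row, which is the shape a spreadsheet VLOOKUP needs.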

add_gene_name.py

Rename fasta files in one directory according to the gene info provided by the first record in each file.

pick.py

Pick fasta records according to an id list.
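
The selection can be sketched as a simple filter. This assumes that the record id is the first space-separated word of the fasta header; pick is a hypothetical function, not the code of pick.py.

```python
def pick(records, wanted_ids):
    """Return the (header, sequence) pairs whose id is in wanted_ids."""
    wanted = set(wanted_ids)  # set lookup keeps the filter fast
    # Assumption: the id is the first space-separated word of the header.
    return [(header, sequence) for header, sequence in records
            if header.partition(" ")[0] in wanted]
```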

screen.py

Screen sequences assembled by SPAdes according to the sequence length and coverage info in the sequence id.

Warning: This program uses a regular expression to recognize the information, so it may generate wrong output when it is used on sequence ids in another format.
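
The sketch below shows what such a regular expression can look like for SPAdes contig names of the form "NODE_1_length_5000_cov_35.2". It is not the code of screen.py; keep and both thresholds are hypothetical.

```python
import re

# SPAdes names contigs like "NODE_1_length_5000_cov_35.2".
PATTERN = re.compile(r"NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def keep(sequence_id, min_length=1000, min_coverage=10.0):
    """Return True if the id passes both thresholds, False otherwise."""
    match = PATTERN.search(sequence_id)
    if match is None:
        # Unrecognized id format: reject rather than guess.
        return False
    length, coverage = int(match.group(2)), float(match.group(3))
    return length >= min_length and coverage >= min_coverage
```

Rejecting ids that do not match is one way to avoid the wrong output mentioned in the warning.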

nex_for_mb.py

Remove illegal characters in sequence ids for MrBayes.

Only nexus format is supported. Sequence ids longer than 90 characters will be truncated.

python3 nex_for_mb.py nexus_file_name
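
The cleaning step can be sketched as follows. Which characters count as illegal is an assumption here (everything except letters, digits and underscores is replaced with an underscore), and clean_id is a hypothetical name rather than the function used in nex_for_mb.py.

```python
import re


def clean_id(sequence_id, max_length=90):
    """Replace unsafe characters and truncate the id to max_length."""
    # Assumption: only letters, digits and "_" are safe; the real script
    # may use a different set of illegal characters.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", sequence_id)
    return cleaned[:max_length]
```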

fasta2nexus.py

Combine fasta files into one nexus file with partition information.

python3 fasta2nexus.py input_files -o output_filename
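
Partition information in a nexus file is usually written as charset lines, whose ranges follow from the aligned length of each gene. The sketch below shows that arithmetic under the assumption that the genes are concatenated in the given order; charsets is a hypothetical name and the output of fasta2nexus.py may be formatted differently.

```python
def charsets(genes):
    """Return nexus charset lines for (name, aligned_length) pairs."""
    lines, start = [], 1
    for name, length in genes:
        end = start + length - 1  # positions are 1-based and inclusive
        lines.append(f"charset {name} = {start}-{end};")
        start = end + 1
    return lines
```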

old

Some old code.

cp

Some programs that deal with GenBank files, most of them related to chloroplasts.

plot

Use matplotlib to draw figures for my master's thesis.

inhibitor

Some code to analyze data from microreader, written for the cystathionine beta-synthase inhibitor project.

Template

Some useful code fragments.

1kp

Programs for 1kp.
