feat (encodeFasta): encodeFasta added to encode fasta header #389

Merged: 2 commits into main from czhu-feat-encode-fasta, Feb 12, 2022


@zhuchcn (Member) commented Feb 12, 2022

This PR adds the encodeFasta command, which replaces each FASTA header with a UUID. It takes one input file (-i) and one output file (-o). For example:

moPepGen encodeFasta -i variant.fasta -o encoded.fasta

Example of encoded FASTA:

>563744e3-f6c9-4dcf-8436-2f54bf670294
FKFMTRR
>ce6f97ce-34a9-413b-b94d-55b9732ff36a
ERERLYLCGVTGSPTENCAK
>27456aec-1705-4aa7-860d-aff91082c67b
GQQPCTVAEGRCLTCEPGWNRTK
>fc79213d-bc3b-485a-b246-baf5065f6bce
KPLVVDISER
>62529d60-5d4c-476b-a302-c7638f0c57fc
MFKFMAR
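A minimal Python sketch of what the encoding step does, for readers who want to reproduce or inspect it outside moPepGen. It is not the code from this PR: the helper names read_fasta, encode_records, and encode_fasta are hypothetical, and it assumes the dict file is tab-separated (the dict listing in this PR may show tabs rendered as spaces). Each header is swapped for a random UUID4, the sequence is kept unchanged, and the UUID-to-header mapping is written to the output path with .dict appended.

```python
import uuid
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs; the leading '>' is stripped."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def encode_records(
    records: List[Tuple[str, str]]
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Swap each header for a random UUID4; return encoded records and a UUID -> header map."""
    encoded: List[Tuple[str, str]] = []
    mapping: Dict[str, str] = {}
    for header, seq in records:
        key = str(uuid.uuid4())
        encoded.append((key, seq))
        mapping[key] = header
    return encoded, mapping


def encode_fasta(input_path: str, output_path: str) -> None:
    """Write the encoded FASTA to output_path and the mapping to output_path + '.dict'."""
    with open(input_path) as fh:
        records = read_fasta(fh)
    encoded, mapping = encode_records(records)
    with open(output_path, "w") as out:
        for key, seq in encoded:
            out.write(f">{key}\n{seq}\n")
    with open(output_path + ".dict", "w") as out:
        for key, header in mapping.items():
            out.write(f"{key}\t{header}\n")  # assumed tab separator
```

Calling encode_fasta("variant.fasta", "encoded.fasta") would write encoded.fasta and encoded.fasta.dict, mirroring the CLI example. Keeping parsing and encoding as separate pure functions makes the mapping easy to check without touching the filesystem.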

It also writes a dict file that maps each UUID back to the original header. Its name is the specified output path with the suffix .dict appended, so in the example above, the dict file is encoded.fasta.dict:

$ head encoded.fasta.dict
563744e3-f6c9-4dcf-8436-2f54bf670294    CIRC-ENST00000614167.2-E1-E2|27
ce6f97ce-34a9-413b-b94d-55b9732ff36a    CIRC-ENST00000614167.2-E1-E2|77
27456aec-1705-4aa7-860d-aff91082c67b    ENST00000622235.5|RES-202-G-A|3
fc79213d-bc3b-485a-b246-baf5065f6bce    CIRC-ENST00000614167.2-E1-E2|58
62529d60-5d4c-476b-a302-c7638f0c57fc    CIRC-ENST00000614167.2-E1-E2|RES-101-A-G|1 CIRC-ENST00000614167.2-E1-E2|RES-101-A-G|2
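Because the dict is a plain two-column text file, it can be loaded directly as a lookup table. A hedged sketch, assuming the first whitespace-delimited field is the UUID and the remaining fields are labels that contain no whitespace themselves; as the last line above shows, one UUID can map to several space-separated labels. read_dict is a hypothetical helper name, not part of moPepGen.

```python
from typing import Dict, Iterable, List


def read_dict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each encoded UUID to its list of original labels.

    Assumes each line looks like '<uuid><whitespace><label>[ <label> ...]'
    and that individual labels contain no whitespace.
    """
    lookup: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        key, labels = fields[0], fields[1:]
        lookup[key] = labels
    return lookup
```

For example, after `with open("encoded.fasta.dict") as fh: lookup = read_dict(fh)`, looking up 62529d60-5d4c-476b-a302-c7638f0c57fc should return a two-element list holding both CIRC-ENST00000614167.2-E1-E2|RES-101-A-G labels.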

Closes #365

@lydiayliu (Collaborator) commented

Just thought of this: do we need to pair this with a decodeFasta? Or should anybody be able to just use the dict as a lookup table? XD

@zhuchcn (Member, Author) commented Feb 12, 2022

I thought about this! I'm just not sure how useful it would be. It's really easy to implement, so maybe we can add it once we find it's useful.
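No decodeFasta command was added in this PR. If one turns out to be needed, the dict-as-lookup-table approach discussed above takes only a few lines. The sketch below assumes the dict value (everything after the first whitespace run) is the full original header; load_mapping and decode_fasta are hypothetical names, not moPepGen functions. Headers not found in the dict are passed through unchanged.

```python
from typing import Dict, Iterable, Iterator


def load_mapping(dict_lines: Iterable[str]) -> Dict[str, str]:
    """Map UUID -> original header (everything after the first whitespace run)."""
    mapping: Dict[str, str] = {}
    for line in dict_lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def decode_fasta(fasta_lines: Iterable[str], dict_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with UUID headers swapped back to the original headers."""
    mapping = load_mapping(dict_lines)
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            key = line[1:].strip()
            line = ">" + mapping.get(key, key)  # unknown UUIDs are kept as-is
        yield line
```

Streaming the FASTA lines through a generator keeps memory use bounded by the dict size rather than the FASTA size, which matters for large variant peptide databases.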

@zhuchcn merged commit 9c1a972 into main on Feb 12, 2022
@lydiayliu deleted the czhu-feat-encode-fasta branch on February 23, 2022 01:46
Linked issue: rmats + variants CPCG0395 invalid aa
2 participants