HyPhy command line documentation #1630

Closed
casparbein opened this issue Jul 26, 2023 · 8 comments

@casparbein

Hi,

Thanks for developing and maintaining the HyPhy suite!

I have a general question/comment: As seen in this issue, sometimes there seem to be command line options in HyPhy that I could not find in the documentation anywhere. Since we are running several HyPhy tools over a large number of alignments, interactive command line operations are not feasible. Is there an exhaustive manual, where options like ENV or others are explained? Overall, the command line functionality seems to be not as powerful as the interactive mode, since for example in absrel, the output will be only a json file (as opposed to the additional CSV and Nexus file of the interactive mode).

Thanks a lot!

@spond
Member

spond commented Jul 26, 2023

Dear @casparbein,

  1. Unfortunately, there is no comprehensive list of environment variables. You can find references to them in the issues here (a fair number of them were created in response to those issues). I'll return to this issue in a bit and create a short list of the most relevant environment variables.
  2. I am not sure what you mean by the interactive/command line functionality. Can you elaborate? I understand "interactive" as simply the command line mode where the user is prompted interactively for options.

Best,
Sergei

@casparbein
Author

Hi Sergei,

thanks for your quick reply. As to your second question, yes, I mean the command line mode where you have to interact with the program as opposed to a statement where you specify all parameters in advance. In absrel, for example, I would run something like this:
hyphy absrel --alignment sequence.fa --tree tree.nh ENV='TOLERATE_NUMERICAL_ERRORS=1;' --output out.json
where the output is a json file. Here, under the absrel section, four output files are listed. The json alone is fine; I was just wondering if I missed a command line option (something like --format) where I could specify more things.

Cheers,
Bernhard

@spond
Member

spond commented Jul 26, 2023

Dear @casparbein,

Ah, I understand now. The tutorial itself is a bit out-of-date (it's from ~2017), so some of the options have disappeared. With the newer HyPhy analyses, i.e. the ones that take --key value type arguments, you have three types of arguments: required, optional, and conditional.

You can see most analysis arguments by typing

$ hyphy absrel --help

code
	Which genetic code should be used
	default value: Universal

alignment [required]
	An in-frame codon alignment in one of the formats supported by HyPhy

tree [conditionally required]
	A phylogenetic tree (optionally annotated with {})
	applies to: Please select a tree file for the data:

branches
	Branches to test
	default value: All

multiple-hits
	Include support for multiple nucleotide substitutions
	default value: None

srv
	Include synonymous rate variation
	default value: No

syn-rates
	The number alpha rate classes to include in the model [1-10, default 3]
	default value: absrel.synonymous_rate_classes [computed at run time]

output
	Write the resulting JSON to this file (default is to save to the same path as the alignment file + 'ABSREL.json')
	default value: absrel.codon_data_info[terms.json.json] [computed at run time]

The conditional arguments, like --tree, apply only in certain cases, for example if the alignment file does not contain a tree in it.

The optional arguments, like --branches, will have default values (All) that are used unless overridden.
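
For batch use over many alignments, the --key value arguments listed above can be supplied in a loop, so that the analysis should not need to prompt for anything. The following is only a sketch under stated assumptions: the .fa extension, the single shared tree file, the output naming, and the helper names run_absrel and run_batch are made up for illustration; the flags themselves are the ones shown in the help output. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run aBSREL non-interactively over every alignment in a directory.
# Assumptions (not from the thread): alignments end in .fa, share one tree
# file, and results go to <outdir>/<alignment name>.ABSREL.json.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run.

run_absrel() {
    # $1 = alignment, $2 = tree, $3 = output JSON path
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'hyphy absrel --alignment %s --tree %s --branches All --output %s\n' "$1" "$2" "$3"
    else
        hyphy absrel --alignment "$1" --tree "$2" --branches All --output "$3"
    fi
}

run_batch() {
    # $1 = alignment directory, $2 = tree file, $3 = output directory
    # Create the output directory only when commands are actually executed.
    [ "${DRY_RUN:-1}" = 1 ] || mkdir -p "$3"
    for aln in "$1"/*.fa; do
        [ -e "$aln" ] || continue   # the glob matched nothing
        run_absrel "$aln" "$2" "$3/$(basename "$aln").ABSREL.json"
    done
}
```

Calling run_batch alignments tree.nh results prints one command per alignment; running the same call with DRY_RUN=0 set in the environment executes them instead.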

Because hyphy absrel --help works by scanning the script file for absrel for specific commands, it may not always detect every available option, especially if they are defined in script files that are loaded at run time by absrel.

One more reason to develop better docs. Unfortunately, as you well know, I am sure, documentation is the last thing that academic s/w developers usually focus on.

I'll create a list of key environment variables in this issue and ask @stevenweaver to also post a version of it on our main website.

Best,
Sergei

@spond
Member

spond commented Jul 26, 2023

PS. We have been trying to standardize common analysis outputs to be a single JSON file.

@casparbein
Author

Dear Sergei,

thank you for your detailed reply, I appreciate your efforts to extend the documentation. Also, I want to reiterate that the programs in the HyPhy suite we use work really well for our purposes. The issue I raised was more to convince ourselves that we are not missing important parameters that we simply could not find in the manual or the website.

Thanks again for taking this seriously.
Cheers,
Bernhard

@spond
Member

spond commented Jul 31, 2023

Here are some environment variables that may be of general use. Note that some of the analyses provide their own values for some of these variables, and those will take precedence over whatever is specified on the command line.

Variable: TOLERATE_NUMERICAL_ERRORS
Description: What should be done when internal diagnostics indicate that likelihood function calculations may be subject to numerical error/instability. In most cases, these issues are encountered if the optimizer arrives at a set of parameter values that are in some sense extreme (close to the lowest/highest values).
Value range:
  • 0 or FALSE (default): treat numerical stability issues as errors and terminate when encountered
  • 1 or TRUE: treat numerical stability issues as warnings, print warnings to console, and continue execution

Variable: TOLERATE_CONSTRAINT_VIOLATION
Description: What should be done when :=, :< or :> constraints on model parameters cannot be satisfied (no feasible solution can be automatically found). In most cases, these issues are encountered if the optimizer arrives at a set of parameter values that are in some sense extreme, or if a set of mutually contradictory constraints is specified (e.g. x2 :< x1; x2 :> x3; x3 := x1 + 1;).
Value range:
  • 0 or FALSE (default): treat constraint violation issues as errors and terminate when encountered
  • 1 or TRUE: treat constraint violation issues as warnings, print warnings to console, and continue execution if possible

Variable: NORMALIZE_SEQUENCE_NAMES
Description: Ask HyPhy to automatically convert sequence names to valid identifiers by replacing spaces and other "inadmissible" characters with _. This is done because HyPhy needs to be able to create parameter names like tree.node.parameter, where node is a sequence name for leaf nodes. If node has spaces, arithmetic operation symbols, etc., this will lead to run-time errors.
Value range:
  • 1 or TRUE (default): convert all sequence names to valid HyPhy identifiers
  • 0 or FALSE: keep sequence names as is

Variable: COUNT_GAPS_IN_FREQUENCIES
Description: If set to TRUE, - will contribute 1/N to character counts for base frequency estimation; for example, ACGT- will count 1.25 for each base.
Value range:
  • 1 or TRUE (default): use gaps to estimate empirical base frequencies
  • 0 or FALSE: skip gaps when estimating frequencies

Variable: SKIP_OMISSIONS
Description: If set to TRUE, then any site in a multiple sequence alignment with gaps or N-fold degeneracies will be automatically filtered out.
Value range:
  • 1 or TRUE: automatically filter sites with N-fold degeneracies
  • 0 or FALSE (default): keep all sites

Variable: USE_MEMORY_SAVING_DATA_STRUCTURES
Description: Any alignment with more than this many sites will not generate some of the additional information (maps of duplicate site patterns, etc.). This may trigger error messages in many standard analyses, which expect those objects to be present. Set to a larger value to override this behavior.
Value range: An integer >1; 100000 by default

Variable: DATA_FILE_PRINT_FORMAT
Description: Whenever a dataset / datafilter is written out, use this format.
Value range: An integer from the following list
  • kFormatMEGASequential = 0
  • kFormatMEGAInterleaved = 1
  • kFormatPHYLIPSequential = 2
  • kFormatPHYLIPInterleaved = 3
  • kFormatNEXUSLabelsSequential = 4
  • kFormatNEXUSLabelsInterleaved = 5
  • kFormatNEXUSSequential = 6 (default)
  • kFormatNEXUSInterleaved = 7
  • kFormatCharacterList = 8
  • kFormatFASTASequential = 9
  • kFormatFASTAInterleaved = 10
  • kFormatPAML = 11
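
As a rough illustration of how these variables might be passed on the command line, the sketch below assembles an ENV argument in the same form as the ENV='TOLERATE_NUMERICAL_ERRORS=1;' example earlier in this thread. It assumes that several assignments can be chained in one ENV string, each terminated by a semicolon; that is not confirmed in this thread. The helper names make_env and format_code are hypothetical, and the integer codes are copied from the list above.

```shell
#!/bin/sh
# Sketch: build the ENV argument from NAME=VALUE pairs.
# Assumption: several assignments can be chained in one ENV string, each
# terminated by ';' as in the single-variable example in this thread.

make_env() {
    env_string=
    for pair in "$@"; do
        env_string="${env_string}${pair};"
    done
    printf 'ENV=%s\n' "$env_string"
}

# Integer codes for DATA_FILE_PRINT_FORMAT, copied from the list above.
# Prints the code for a known format name; returns 1 for an unknown name.
format_code() {
    case $1 in
        kFormatMEGASequential)         echo 0 ;;
        kFormatMEGAInterleaved)        echo 1 ;;
        kFormatPHYLIPSequential)       echo 2 ;;
        kFormatPHYLIPInterleaved)      echo 3 ;;
        kFormatNEXUSLabelsSequential)  echo 4 ;;
        kFormatNEXUSLabelsInterleaved) echo 5 ;;
        kFormatNEXUSSequential)        echo 6 ;;
        kFormatNEXUSInterleaved)       echo 7 ;;
        kFormatCharacterList)          echo 8 ;;
        kFormatFASTASequential)        echo 9 ;;
        kFormatFASTAInterleaved)       echo 10 ;;
        kFormatPAML)                   echo 11 ;;
        *) return 1 ;;
    esac
}
```

The result is meant to be passed as a single quoted argument, for example hyphy absrel --alignment sequence.fa --tree tree.nh "$(make_env TOLERATE_NUMERICAL_ERRORS=1)" --output out.json. As noted above, an analysis that sets its own value for a variable will override whatever is given this way.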

@casparbein
Author

Dear Sergei,

thank you very much for this list! I am sure that it will be helpful for us and others.

Cheers,
Bernhard

@spond
Member

spond commented Aug 4, 2023

Dear @casparbein,

I'll keep adding to it; there's quite a few more. Although I think it might be better done including specific examples.

Best,
Sergei
