Discover the algorithms underlying a variety of bioinformatics topics: computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, string algorithms and others.

探索各种生物信息学相关的算法：计算质谱，比对，动态规划，基因组装配，基因组重排，系统发育，概率，字符串算法等。

# Counting DNA Nucleotides

# 计算DNA核苷酸

## A Rapid Introduction to Molecular Biology

## 分子生物学快速入门

Making up all living material, the **cell** is considered to be the building block of life. The **nucleus**, a component of most **eukaryotic** cells, was identified as the hub of cellular activity 150 years ago. Viewed under a light microscope, the nucleus appears only as a darker region of the cell, but as we increase magnification, we find that the nucleus is densely filled with a stew of macromolecules called **chromatin**. During **mitosis** (eukaryotic cell division), most of the chromatin condenses into long, thin strings called **chromosomes**. See Figure 1 for a drawing of cells in different stages of mitosis.

**细胞**作为构成所有生物原料被认为是生命的基石。**细胞核**是大多数**真核细胞**的组成部分，150年前被确定为细胞活动的中心。在光学显微镜下观察，细胞核仅作为细胞的较暗区域出现，但随着我们增加放大倍数，我们发现细胞核密集地充满了称为**染色质**的大分子物质。在**有丝分裂**期间（真核细胞分裂），大多数染色质浓缩成长而细的细胞串，称为**染色体**。有关有丝分裂不同阶段的细胞图见下图。

![Figure 1. A 1900 drawing by Edmund Wilson of onion cells at different stages of mitosis. The sample has been dyed, causing chromatin in the cells (which soaks up the dye) to appear in greater contrast to the rest of the cell.](Images/001.png)

**Figure 1.** A 1900 drawing by Edmund Wilson of onion cells at different stages of mitosis. The sample has been dyed, causing chromatin in the cells (which soaks up the dye) to appear in greater contrast to the rest of the cell.

**图1.** 在1900年Emmund Wilson在有丝分裂不同阶段绘制的洋葱细胞图。由于样品已被染色，导致细胞中的染色质（吸收染料）与细胞的其他部分形成鲜明对比。

One class of the macromolecules contained in chromatin is called **nucleic acids**. Early 20th century research into the chemical identity of nucleic acids culminated with the conclusion that nucleic acids are **polymers**, or repeating chains of smaller, similarly structured molecules known as **monomers**. Because of their tendency to be long and thin, nucleic acid polymers are commonly called **strands**.

染色质中含有的一类大分子称为**核酸**。20世纪早期对核酸化学特性的研究最终得出结论：核酸是**聚合物**，或者将这种重复结构的称为**单体**。由于它们倾向于长而薄，核酸聚合物通常被称为**链**。

The nucleic acid monomer is called a **nucleotide** and is used as a unit of strand length (abbreviated to nt). Each nucleotide is formed of three parts: a **sugar** molecule, a negatively charged **ion** called a phosphate, and a compound called a **nucleobase** ("base" for short). Polymerization is achieved as the sugar of one nucleotide bonds to the phosphate of the next nucleotide in the chain, which forms a **sugar-phosphate backbone** for the nucleic acid strand. A key point is that the nucleotides of a specific type of nucleic acid always contain the same sugar and phosphate molecules, and they differ only in their choice of base. Thus, one strand of a nucleic acid can be differentiated from another based solely on the order of its bases; this ordering of bases defines a nucleic acid's **primary structure**.

核酸单体称为**核苷酸**，并作为链长度的单位（缩写为*nt*）。每个核苷酸由三部分组成：**糖分子**，带有负离子的**磷酸盐**，和**核碱基**化合物（简称“碱基”）。当一个核苷酸的糖与链中下一个核苷酸的磷酸键合时开始聚合，其形成核酸链的**糖-磷酸骨架**。关键点在于特定类型核酸的核苷酸总是含有相同的糖和磷酸盐分子，它们的区别仅在于它们对碱基的选择。因此，核酸的一条链可以仅基于其碱基的顺序与另一条链区分开;碱基的这种排序定义了核酸的**主要结构**。

For example, Figure 2 shows a strand of **deoxyribose nucleic acid** (DNA), in which the sugar is called **deoxyribose**, and the only four choices for nucleobases are molecules called **adenine** (A), **cytosine** (C), **guanine** (G), and **thymine** (T).

例如，图2显示了**脱氧核糖核酸**（DNA）链，其中糖被称为**脱氧核糖**，分别有四种碱基：**腺嘌呤**（A），**胞嘧啶**（C），**鸟嘌呤**（G）和**胸腺嘧啶**（T）。

![Figure 2. A sketch of DNA's primary structure.](Images/002.png)

**Figure 2.** A sketch of DNA's primary structure.


**图2.** DNA的主要结构草图。

For reasons we will soon see, DNA is found in all living organisms on Earth, including bacteria; it is even found in many viruses (which are often considered to be nonliving). Because of its importance, we reserve the term **genome** to refer to the sum total of the DNA contained in an organism's chromosomes.

DNA存在于地球上的所有生物体中，包括细菌;它甚至存在于许多病毒中（通常被认为是非生命的）。由于其重要性，我们使用“**基因组**”来指代生物体染色体中包含的DNA的总和。

## Problem

## 问题

A **string** is simply an ordered collection of symbols selected from some **alphabet** and formed into a word; the **length** of a string is the number of symbols that it contains.

**字符串**只是从某些**字母表**中选择的符号的有序集合，并形成一个单词;字符串的**长度**是它包含的符号数。 

An example of a length 21 **DNA string** (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG".

长度为21的**DNA串**（其字母包含符号'A'，'C'，'G'和'T'）的示例是“ATGCTTCAGAAAGGTCTTACG”。

**Given:** A DNA string s of length at most 1000 nt.

**Return:** Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s.

## Sample Dataset

## 样本数据集

```
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
```

## Sample Output
 
## 样本输出

```
20 12 17 21
```

```python
def count_DNA(string):
    # Count each nucleotide in the order A, C, G, T.
    return string.count("A"), string.count("C"), string.count("G"), string.count("T")

print(count_DNA("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"))
```

(20, 12, 17, 21)


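```python
# Read the downloaded dataset. The file handle gets its own name so that it
# is not shadowed by the string read from it.
with open("Bioinformatics_Stronghold/data/rosalind_dna.txt", "r") as handle:
    r_dna = handle.read()

print(count_DNA(r_dna))
```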

(243, 228, 221, 231)


```python
# Spot-check a single count against the tuple above.
r_dna.count("A")
```

243
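
As an alternative sketch, the same tally can be made in a single pass with `collections.Counter` from the standard library. The function name `count_nucleotides` is hypothetical (it is not part of the notebook above), and the result is formatted as space-separated integers on the assumption that this is the form the problem's **Return** line asks for.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the counts of A, C, G and T in `dna` as a space-separated string."""
    # strip() drops a trailing newline when the string comes from a file.
    counts = Counter(dna.strip())
    # A Counter yields 0 for symbols that never occur, so no KeyError here.
    return " ".join(str(counts[base]) for base in "ACGT")


print(count_nucleotides("ATGCTTCAGAAAGGTCTTACG"))
```

Compared with four separate `str.count` calls, this walks the string once; for inputs of at most 1000 nt the difference is negligible, so the choice is mostly a matter of taste.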

# Transcribing DNA into RNA

# DNA 转录为 RNA

## The Second Nucleic Acid

## 第二种核酸

In “**Counting DNA Nucleotides**”, we described the **primary structure** of a **nucleic acid** as a polymer of **nucleotide** units, and we mentioned that the omnipresent nucleic acid **DNA** is composed of a varied sequence of four bases.

在“**计数DNA核苷酸**”中，我们描述了**核酸**的**一级结构**作为**核苷酸**单位的聚合物，我们提到了无处不在的核酸**DNA**由四个碱基的不同序列组成。

Yet a second nucleic acid exists alongside DNA in the **chromatin**; this molecule, which possesses a different sugar called **ribose**, came to be known as **ribose nucleic acid**, or RNA. RNA differs further from DNA in that it contains a base called **uracil** in place of **thymine**; structural differences between DNA and RNA are shown in Figure 1. Biologists initially believed that RNA was only contained in plant **cells**, whereas DNA was restricted to animal cells. However, this hypothesis dissipated as improved chemical methods discovered both nucleic acids in the cells of all life forms on Earth.

然而，第二种核酸与**染色质**中的DNA一起存在;这种分子具有不同的糖，称为**核糖**，后来被称为**核糖核酸**或RNA。RNA与DNA的不同之处在于它含有一种叫做尿嘧啶的**碱基代替**胸腺嘧啶**;DNA和RNA之间的结构差异如图1所示。生物学家最初认为RNA仅包含在**植物细胞**中，而DNA仅限于动物细胞。然而，随着改进的化学方法在地球上所有生命形式的细胞中发现了两种核酸，这一假设消失了。

![Figure 1. Structural differences between RNA and DNA](Images/003.png)

**Figure 1.** Structural differences between RNA and DNA

**图1. ** RNA和DNA之间的结构差异

The **primary structure** of DNA and RNA is so similar because the former serves as a blueprint for the creation of a special kind of RNA molecule called **messenger RNA**, or mRNA. mRNA is created during RNA transcription, during which a **strand** of DNA is used as a template for constructing a strand of RNA by copying nucleotides one at a time, where uracil is used in place of thymine.

DNA和RNA的**主要结构**是如此相似，因为前者是创建**信使RNA**或mRNA这种特殊RNA分子的蓝图。mRNA在RNA转录期间产生，在此期间，DNA的**链**用作构建RNA链的模板，通过一次复制一个核苷酸，其中使用尿嘧啶代替胸腺嘧啶。

In eukaryotes, DNA remains in the **nucleus**, while RNA can enter the far reaches of the cell to carry out DNA's instructions. In future problems, we will examine the process and ramifications of RNA transcription in more detail.

在真核生物中，DNA存在于**细胞核**中，而RNA可以进入细胞的远端以执行DNA的命令。在以后的问题中，我们将更详细地研究RNA转录的过程和分枝。

## Problem

An **RNA string** is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'.

**RNA串**是由包含'A'，'C'，'G'和'U'的字母组成的字符串。

Given a DNA string t corresponding to a coding strand, its transcribed RNA string u is formed by replacing all occurrences of 'T' in t with 'U' in u.

给定对应于编码链的DNA串*t*，其转录的RNA串*u*通过用*u*中的'U'替换t中所有出现的'T'而形成。

**Given:** A DNA string t having length at most 1000 nt.

**Return:** The transcribed RNA string of t.

## Sample Dataset

```
GATGGAACTTGACTACGTAAATT
```

## Sample Output

```
GAUGGAACUUGACUACGUAAAUU
```

```python
def transcribing_RNA(string):
    # Transcribe a coding strand by replacing every thymine with uracil.
    return string.replace("T", "U")

print(transcribing_RNA("GATGGAACTTGACTACGTAAATT"))
```

GAUGGAACUUGACUACGUAAAUU


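```python
# Read the downloaded dataset. The file handle gets its own name so that it
# is not shadowed by the string read from it.
with open("../Bioinfo/Bioinformatics_Stronghold/data/rosalind_rna.txt") as handle:
    rna = handle.read()

print(transcribing_RNA(rna))
```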

AACUGCGGCAUCUUAAUCGUGCACUCUUCACAAUGACUACAUGAACAUCAAUUCAGGACGAGGUCUUAUAGCCGGUACUAUGCUUGUCCUGUGAAGGUGCCAUGAGAACAUUGAGAAUAACGCCCCUGGCGCCUUUCACAAUCAUUUCGGGUCACUCCCCAUAUCCGCUAGGGCAACGGUGACGUCUUUCACGAAAUUCAUAGGUAAAGACCGACUUUCAAGCUUGCUAUACGAAUCGCCAGGUCCCUAUUAAACACUAGUAUACAUACACCUCCAGGUGGACCGCGAGUCAAAACAACCAAUUACCUUAGCCUGCAAUCGACCGAGUUAUGGCAGUCCGGAGGAUACGCCGCUCCUCGACCGCAUUUAACUGGUUUGUUGUCACACGAACGCGAUUCUACUGGUAAUUUAAUAUUCUAGAUGCUCUAAGAGCACUUUCUGUGAUGUGAUGCGAAAGGCAUGAACGCUAAACAACUGCCUCGCACAUCACUGUUCACAAGUAGAGCGAUGCCGUGUACUCACAUCAUGCCGUAGUUCUGGUGAUGUUCAGUGCCAGUACAAUGCAUCCUUGGCGCCCGCACGAGCUCCUUGAUAACACUGUGACAGAUAAGGCUAUCUGUAUACCGUCUUCGCGUCCUUCAGGCCUUCAGGGGAAGAGCGCUCAGAGAUACUUGAUCACGAUUCCGCCGGGCUCUGACGGAAUCCAACAGACACAAUUCUAGGCGGUAACCGGCCUUACUUGCGUAUGUGAGUUUCCUGAAAAUGCAUUUUCUAUUGCACCAUGAAUGCCUGGAGAAGUAAUCUCGUCGUCACUCCCAGUCCGACAAGCCAAUAAUCUACCCGCUUUGACUGUGACUAACAACAAUUCUCGCGGGCCGAACGGCAGGACCGGGGUUCGAGACACGAAUCAGCAGGAACAGGCCAGGCCUAGGUAAUGGCUAUGUUCUUGUCG



# Complementing a Strand of DNA

# DNA的互补链

## The Secondary and Tertiary Structures of DNA

## DNA的二级和三级结构

In “Counting DNA Nucleotides”, we introduced nucleic acids, and we saw that the **primary structure** of a nucleic acid is determined by the ordering of its **nucleobases** along the **sugar-phosphate backbone** that constitutes the bonds of the nucleic acid **polymer**. Yet primary structure tells us nothing about the larger, 3-dimensional shape of the molecule, which is vital for a complete understanding of nucleic acids.

在“计数DNA核苷酸”中，我们引入了核酸，并且我们看到核酸的**一级结构**由其核碱基沿着构成核酸聚合物键的**糖-磷酸**主链的有序性决定。然而，初级结构没有告诉我们关于分子的更大的三维形状，这对于完全理解核酸是至关重要的。

The search for a complete chemical structure of nucleic acids was central to molecular biology research in the mid-20th century, culminating in 1953 with a publication in *Nature* of fewer than 800 words by James Watson and Francis Crick. Consolidating a high resolution X-ray image created by Rosalind Franklin and Raymond Gosling with a number of established chemical results, Watson and Crick proposed the following structure for DNA:

寻找完整的核酸化学结构是20世纪中叶分子生物学研究的核心，最终于1953年在詹姆斯·沃森和弗朗西斯·克里克的“自然”杂志上发表了不到800字的文章。结合由Rosalind Franklin和Raymond Gosling创建的高分辨率X射线图像以及许多已确定的化学结果，Watson和Crick提出了以下DNA结构：

1. The DNA molecule is made up of two strands, running in opposite directions.

2. Each base bonds to a base in the opposite strand. Adenine always bonds with thymine, and cytosine always bonds with guanine; the complement of a base is the base to which it always bonds; see Figure 1.

3. The two strands are twisted together into a long spiral staircase structure called a double helix; see Figure 2.


1. DNA分子由两条链组成，以相反的方向运行。 

2. 每个碱基与相反链中的碱基键合。腺嘌呤总是与胸腺嘧啶结合，胞嘧啶总是与鸟嘌呤结合;碱基互补是它始终联系的基础;参见图1. 

3. 将两股绞合成一个称为双螺旋的长螺旋楼梯结构;见图2。

![Figure 1. Base pairing across the two strands of DNA.](Images/004.png)

**Figure 1.** Base pairing across the two strands of DNA.

![Figure 2. The double helix of DNA on the molecular scale.](Images/005.png)

**Figure 2.** The double helix of DNA on the molecular scale.

Because they dictate how bases from different strands interact with each other, (1) and (2) above compose the secondary structure of DNA. (3) describes the 3-dimensional shape of the DNA molecule, or its tertiary structure.

因为它们决定了来自不同链的碱基如何相互作用，上述（1）和（2）构成了DNA的二级结构。（3）描述了DNA分子的三维形状或其三级结构。

In light of Watson and Crick's model, the bonding of two complementary bases is called a **base pair** (bp). Therefore, the length of a DNA molecule will commonly be given in bp instead of nt. By complementarity, once we know the order of bases on one strand, we can immediately deduce the sequence of bases in the complementary strand. These bases will run in the opposite order to match the fact that the two strands of DNA run in opposite directions.

根据Watson和Crick的模型，两个互补碱基的键合称为**碱基对**（bp）。因此，DNA分子的长度通常以bp而不是nt给出。通过互补性，一旦我们知道一条链上碱基的顺序，我们就可以立即推断出互补链中的碱基序列。这些碱基将以相反的顺序运行以匹配两条DNA链以相反方向运行的事实。

## Problem

In DNA strings, symbols 'A' and 'T' are complements of each other, as are 'C' and 'G'.

在DNA字符串中，符号“A”和“T”是彼此的互补，“C”和“G”也是如此。

The reverse complement of a DNA string $s$ is the string $s^c$ formed by reversing the symbols of $s$, then taking the complement of each symbol (e.g., the reverse complement of "GTCA" is "TGAC").

DNA串*s*的反向互补是通过反转*s*的符号形成的串*sc*，然后取每个符号的补码（例如，“GTCA”的反向补码是“TGAC”）。

**Given:** A DNA string s of length at most 1000 bp.

**Return:** The reverse complement $s^c$ of $s$.

## Sample Dataset

```
AAAACCCGGT
```

## Sample Output


```
ACCGGGTTTT
```

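```python
def complement_strand(string):
    # Map each base to its complement; "\n" maps to "" so that a trailing
    # newline read from a file is dropped.
    rules = {"A":"T", "T":"A", "C":"G", "G":"C", "\n":""}
    # Walk the string in reverse and complement each symbol.
    return "".join(rules[i] for i in string[::-1])

print(complement_strand("AAAACCCGGT"))
```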

ACCGGGTTTT


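```python
# Read the downloaded dataset. The file handle gets its own name so that it
# is not shadowed by the string read from it.
with open("../Bioinfo/Bioinformatics_Stronghold/data/rosalind_revc.txt", "r") as handle:
    revc = handle.read()

print(complement_strand(revc))
```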

ACGAGAGGCCTTTCATACTGAATTCGCTCCTTTACCGATGCTGAAGGTTCGCGTAGGCATGGCAATTGAAGACGCTCCGCATTGACCCCTCTCGCGTTAACTCAAACAAGCTGGGCGTGCCGTAGGAGACTTTCAGCTACTGACCTTGCTCTTTCGGACTGGCAAGAAGGTAAGTCTGCTAAAGTCTTTCAGAACGTCCCCCTAAGTACGGAAGGGTTCGTTATTACGAGGATAGATATCGGCAATCTGGAGAGTCCAGAGTTATTGGCATTCGAGGGGATTCGAGAGTGCGTCCTGGCATGAACGATCCAGTCGGGTACTCCGGATAGCCCAAAGATCTGTATTAATGGCGCAGATGACCTGACCGGTCGGAGTCTGGCTCACCCAATGGAGCCGATGGTCAAACTAGGCGGAACATATTTTAGAGGACCGTGTAATCCAAGTCAAGTCTTCAGCAGGTATTAGGGCGAGCTGTATCTAGGCGGAGCTGCTATGAGTAGTCTCGCTTTCCGTCGTCTCGTCTTGCTAATCGATTTGTCATTGCTCGAGCAAGTTATTCCAGACCAACTACTAGCTCCAAAACGTAGTCGAGACCTGGTTATAGCGTTGTAGCTCTACCCTCATACAAGTGTTTGACGCTGAATGATCGTAAATGAAGCTTAGATTATCAGCTTGTCGTCAATATCTTAGGTGCAGAAACGAGAGAGTCTACAGTGTGTTCTATATCAGCGTACCATGATCGTCTCCCGCTGCCCAATGCAGCATTGTCAGGTGGAATCATTGTCTAATGACTTTCGATCAGTCGCCGGTGGCC
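
A slightly more defensive variant can be sketched with `str.maketrans` and `str.translate`, both built-in string methods. The names `COMPLEMENT` and `reverse_complement` are hypothetical, and removing all whitespace before translating is an assumption about how the input file is laid out.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    # split() with no arguments discards newlines and any other whitespace.
    cleaned = "".join(dna.split())
    # Complement every symbol, then reverse the result.
    return cleaned.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

Note that `translate` leaves symbols outside the table unchanged rather than raising a `KeyError`, so unexpected characters pass through silently; whether that is preferable depends on how strictly the input should be validated.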


# Mendel's First Law 

# 孟德尔第一定律

## Introduction to Mendelian Inheritance

## 孟德尔遗传定律简介

Modern laws of inheritance were first described by Gregor Mendel (an Augustinian friar) in 1865. The prevailing hereditary model of the time, called **blending inheritance**, stated that an organism must exhibit a blend of its parents' traits. This rule is obviously violated both empirically (consider the huge number of people who are taller than both their parents) and statistically (over time, blended traits would simply blend into the average, severely limiting variation).

现代的遗传定律首先由格雷戈尔·孟德尔（奥古斯丁·弗莱尔）在1865年描述。当时的遗传模型，称为**混合遗传**，表明有机体必须表现出其父母特征的混合。这个规则显然在经验上都被违反（考虑到比他们父母都高的人数）和统计学（随着时间的推移，混合特征会简单地融入平均，严重限制的变化）。

Mendel, working with thousands of pea plants, believed that traits should not be viewed as continuous processes but should instead be divided into discrete building blocks called **factors**. Furthermore, he proposed that every factor possesses distinct forms, called **alleles**.

孟德尔研究了成千上万的豌豆植物，认为不应将特征视为连续过程，而应将其划分为称为**因子**的离散构建块。此外，他提出每个因素都有不同的形式，称为**等位基因**。

In what has come to be known as his **first law** (also known as the law of segregation), Mendel stated that every organism possesses a pair of alleles for a given factor. If an individual's two alleles for a given factor are the same, then it is **homozygous** for the factor; if the alleles differ, then the individual is **heterozygous**. The first law concludes that for any factor, an organism randomly passes one of its two alleles to each offspring, so that an individual receives one allele from each parent.

在后来被称为他的**第一定律**（也称为分离定律）的事件中，孟德尔说每个生物体都拥有一对特定因子的等位基因。如果个体的两个等位基因对于给定因子是相同的，则该因子是**纯合的**;如果等位基因不同，那么个体是**杂合的**。第一定律得出结论，对于任何因素，有机体随机地将其两个等位基因中的一个传递给每个后代，以便个体从每个母体接收一个等位基因。

Mendel also believed that any factor corresponds to only two possible alleles, the **dominant** and **recessive** alleles. An organism only needs to possess one copy of the dominant allele to display the trait represented by the dominant allele. In other words, the only way that an organism can display a trait encoded by a recessive allele is if the individual is homozygous recessive for that factor.

孟德尔还认为，任何因子只对应于两个可能的等位基因，**显性**和**隐性**等位基因。生物体仅需要拥有一个显性等位基因就可以显示由显性等位基因代表的性状。换句话说，生物体能够显示由隐性等位基因编码的性状的唯一方式是个体是否是该因子的纯合隐性。

We may encode the dominant allele of a factor by a capital letter (e.g., A) and the recessive allele by a lower case letter (e.g., a). Because a heterozygous organism can possess a recessive allele without displaying the recessive form of the trait, we henceforth define an organism's **genotype** to be its precise genetic makeup and its **phenotype** as the physical manifestation of its underlying traits.

我们可以用大写字母（例如A）编码因子的显性等位基因，用小写字母（例如a）编码隐性等位基因。因为杂合生物可以具有隐性等位基因而不显示性状的隐性形式，所以我们今后将生物体的**基因型**定义为其精确的基因组成，并将其**表型**定义为其潜在性状的物理表现。

The different possibilities describing an individual's inheritance of two alleles from its parents can be represented by a **Punnett square**; see Figure 1 for an example.

描述个体从父母那里继承两个等位基因的不同可能性可以用**旁氏表**表示;有关示例，请参见图1。

![Figure 1. A Punnett square representing the possible outcomes of crossing a heterozygous organism (Yy) with a homozygous recessive organism (yy); here, the dominant allele Y corresponds to yellow pea pods, and the recessive allele y corresponds to green pea pods.](Images/006.png)

**Figure 1.** A Punnett square representing the possible outcomes of crossing a heterozygous organism (Yy) with a homozygous recessive organism (yy); here, the dominant allele Y corresponds to yellow pea pods, and the recessive allele y corresponds to green pea pods.

**图1.**一个旁氏表，表示杂合生物（Yy）与纯合隐性生物（yy）杂交的可能结果;这里，显性等位基因Y对应于黄豌豆荚，而隐性等位基因y对应于绿豌豆荚。
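
A Punnett square can be enumerated directly, as in the following minimal sketch. The function name `punnett_square` is hypothetical, and the code assumes that each parent is written as a two-letter genotype string and passes on either allele with probability 1/2, as the first law states.

```python
from collections import Counter
from itertools import product


def punnett_square(parent1, parent2):
    """Return the genotype probabilities for the offspring of two parents."""
    offspring = Counter()
    # Every cell of the square pairs one allele from each parent.
    for a, b in product(parent1, parent2):
        # Sorting puts the uppercase (dominant) allele first: "Yy", not "yY".
        offspring["".join(sorted(a + b))] += 1
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


print(punnett_square("Yy", "yy"))  # {'Yy': 0.5, 'yy': 0.5}
```

For the cross in Figure 1, half of the offspring are heterozygous (yellow pods) and half are homozygous recessive (green pods).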

## Problem

**Probability** is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a **random variable**, which is simply a variable that can take any of several distinct **outcomes** depending on the result of an underlying random process.

**概率**是研究随机发生现象的数学方法。我们将使用**随机变量**对这种现象进行建模，**随机变量**只是一个变量，它可以根据潜在随机过程的结果取得许多不同的**结果**。

For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let $X$ represent the random variable corresponding to the color of a drawn ball, then the **probability** of each of the two outcomes is given by $Pr(X=red)=\frac{3}{5}$ and $Pr(X=blue)=\frac{2}{5}$.

例如，假设我们有一个包含3个红球和2个蓝色球的包。如果我们让$X$代表对应于绘制球颜色的随机变量，则两个结果中每一个的**概率**由$Pr(X=red)=\frac{3}{5}$和$Pr(X=blue)=\frac{2}{5}$。

Random variables can be combined to yield new random variables. Returning to the ball example, let $Y$ model the color of a second ball drawn from the bag (without replacing the first ball). The probability of $Y$ being red depends on whether the first ball was red or blue. To represent all outcomes of $X$ and $Y$, we therefore use a **probability tree diagram**. This branching diagram represents all possible individual probabilities for $X$ and $Y$, with outcomes at the endpoints ("leaves") of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree; see Figure 2 for an illustrative example.

随机变量可以组合以产生新的随机变量。回到球的例子，让$Y$模拟从球袋中抽出的第二个球的颜色（第一个球不放回）。 $Y$变红的概率取决于第一球是红色还是蓝色。为了表示$X$和$ Y $的所有结果，我们因此使用**概率树图**。该分支图表示$X$和$Y$的所有可能的个体概率，其结果在树的端点（“叶子”）处。任何结果的概率都是从树的开始沿路径的概率乘积给出的;有关说明性示例，请参见图2。

![Figure 2. The probability of any outcome (leaf) in a probability tree diagram is given by the product of probabilities from the start of the tree to the outcome. For example, the probability that X is blue and Y is blue is equal to (2/5)(1/4), or 1/10.](Images/008.png)

**Figure 2.** The probability of any outcome (leaf) in a probability tree diagram is given by the product of probabilities from the start of the tree to the outcome. For example, the probability that X is blue and Y is blue is equal to (2/5)(1/4), or 1/10.

**图2.** 概率树图中任何结果（叶）的概率由从树的开始到结果的概率的乘积给出。例如，X为蓝色且Y为蓝色的概率等于（2/5）（1/4）或1/10。
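
The leaves of the probability tree for the ball example can be computed with exact fractions, as in this minimal sketch. The function name `two_draw_tree` is hypothetical, and the code assumes the bag holds at least two balls.

```python
from fractions import Fraction


def two_draw_tree(red, blue):
    """Return leaf probabilities for drawing two balls without replacement."""
    bag = {"red": red, "blue": blue}
    total = red + blue
    leaves = {}
    for first, n_first in bag.items():
        p_first = Fraction(n_first, total)
        for second, n_second in bag.items():
            # One ball of the first color is already gone from the bag.
            remaining = n_second - (1 if second == first else 0)
            p_second = Fraction(remaining, total - 1)
            # A leaf's probability is the product along its path.
            leaves[(first, second)] = p_first * p_second
    return leaves


for outcome, probability in two_draw_tree(3, 2).items():
    print(outcome, probability)
```

The leaf `('blue', 'blue')` comes out as 1/10, matching the value quoted in the caption of Figure 2, and the four leaves sum to 1.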


An **event** is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let $A$ be the event "$Y$ is blue." $Pr(A)$ is equal to the sum of the probabilities of two different outcomes: 

$$
Pr(X=blue\ and\ Y=blue)+Pr(X=red\ and\ Y=blue)
$$

or $\frac{1}{10}+\frac{3}{10}=\frac{2}{5}$ (see Figure 2 above).

**事件**只是结果的集合。由于结果是截然不同的，事件的概率可以写成其组成结果概率的总和。对于我们的彩球示例，让$A$成为“$Y$为蓝色”的事件。 

$$
Pr(X=blue\ and\ Y=blue)+Pr(X=red\ and\ Y=blue)
$$

或者$\frac{3}{10}+\frac{1}{10}=\frac{2}{5}$（见上图2）。
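
Written out step by step, each term is the product of the probabilities along its path in the tree, using the bag of 3 red and 2 blue balls introduced above:

$$
\begin{aligned}
Pr(X=blue\ and\ Y=blue) &= \frac{2}{5}\cdot\frac{1}{4}=\frac{1}{10}\\
Pr(X=red\ and\ Y=blue) &= \frac{3}{5}\cdot\frac{2}{4}=\frac{3}{10}\\
Pr(A) &= \frac{1}{10}+\frac{3}{10}=\frac{2}{5}
\end{aligned}
$$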

**Given:** Three positive integers $k$, $m$, and $n$, representing a population containing $k+m+n$ organisms: $k$ individuals are homozygous dominant for a factor, $m$ are heterozygous, and $n$ are homozygous recessive.

**给定：**三个正整数$k$，$m$和$n$，代表一个含有$k+m+n$有机体的人口：$k$个体是纯合子占优势的因子，$m$是杂合子，$n$是纯合的隐性。

**Return:** The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.

**返回：** 两个随机选择的交配生物将产生具有显性等位基因的个体（从而显示显性表型）的概率。假设任何两种生物都可以交配。

## Sample Dataset

```
2 2 2
```

## Sample Output

```
0.78333
```
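
One way to approach this problem, sketched here under stated assumptions, is to compute the probability of the complementary event (a homozygous recessive offspring) and subtract it from 1. The function name `dominant_probability` is hypothetical; the code assumes the two parents are drawn without replacement and that each parent passes on either allele with probability 1/2.

```python
def dominant_probability(k, m, n):
    """Probability that a random mating pair yields a dominant-allele offspring.

    k: homozygous dominant, m: heterozygous, n: homozygous recessive.
    """
    total = k + m + n
    # Number of ordered ways to pick two distinct organisms.
    pairs = total * (total - 1)
    # Only these crosses can give a homozygous recessive offspring:
    #   aa x aa always, Aa x aa (either order) with 1/2, Aa x Aa with 1/4.
    recessive = n * (n - 1) + 2 * m * n * 0.5 + m * (m - 1) * 0.25
    return 1 - recessive / pairs


print(round(dominant_probability(2, 2, 2), 5))  # 0.78333
```

For the sample dataset the recessive weight is $2+4+0.5=6.5$ out of 30 ordered pairs, so the answer is $1-\frac{6.5}{30}\approx 0.78333$.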