# Using AlphaFold on Vertex AI Workbench

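[Vertex AI Workbench](https://cloud.google.com/vertex-ai/docs/workbench) provides an end-to-end, notebook-based production environment that can be preconfigured with the runtime dependencies needed to run AlphaFold. With [user-managed notebooks](https://cloud.google.com/vertex-ai/docs/workbench/user-managed/introduction), you can configure a GPU accelerator to run AlphaFold, which uses JAX, without installing and managing drivers or JupyterLab instances yourself. This notebook lets you easily predict the structure of a protein using a slightly simplified version of AlphaFold v2.1.0.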

## ![](https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/community-content/alphafold_on_workbench/vertexai_40.png) [Launch this notebook in Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/GoogleCloudPlatform/vertex-ai-samples/raw/main/community-content/alphafold_on_workbench/AlphaFold.ipynb)

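**Differences from AlphaFold v2.1.0**

Compared with AlphaFold v2.1.0, this notebook uses **no templates (homologous structures)** and only a selected portion of the [BFD database](https://bfd.mmseqs.com/). We have validated these changes on several thousand recent PDB structures. While accuracy will be close to that of the full AlphaFold system on many targets, a small fraction of targets show a large drop in accuracy because of the smaller MSA and the lack of templates. For best reliability, we recommend using the [full open source AlphaFold](https://github.com/deepmind/alphafold/) or the [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/) instead.

**Compared with a local AlphaFold installation, this notebook has a small drop in average accuracy for multimers. For full multimer accuracy, it is highly recommended to run [AlphaFold locally](https://github.com/deepmind/alphafold#running-alphafold).** In addition, AlphaFold-Multimer has to search for an MSA for every unique sequence in the complex, so it is considerably slower. If the notebook times out because of a slow multimer MSA search, we recommend running AlphaFold locally.

Please note that this notebook is an early-access prototype and not a finished product. It is provided for theoretical modelling only, and caution should be exercised in its use.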

**Citing this work**

Any publication that discloses findings arising from the use of this notebook should [cite](https://github.com/deepmind/alphafold/#citing-this-work) the [AlphaFold paper](https://doi.org/10.1038/s41586-021-03819-2).

**License**

This notebook uses the [AlphaFold model parameters](https://github.com/deepmind/alphafold/#model-parameters-license), which are subject to the Creative Commons Attribution 4.0 International ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)) license. The notebook itself is provided under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). See the full license statement below.

**More information**

You can find more information about how AlphaFold works in the following papers:

* [AlphaFold methods paper](https://www.nature.com/articles/s41586-021-03819-2)
* [AlphaFold predictions of the human proteome paper](https://www.nature.com/articles/s41586-021-03828-1)
* [AlphaFold-Multimer paper](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1)

Answers to frequently asked questions about how to interpret AlphaFold predictions are available [here](https://alphafold.ebi.ac.uk/faq).

## Download the AlphaFold data

In [None]:
import os
import subprocess
import sys

import alphafold.common
import tqdm.notebook
from IPython.utils import io

TQDM_BAR_FORMAT = (
    "{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]"
)

SOURCE_URL = (
    "https://storage.googleapis.com/alphafold/alphafold_params_colab_2022-01-19.tar"
)
PARAMS_DIR = "alphafold/data/params"
PARAMS_PATH = os.path.join(PARAMS_DIR, os.path.basename(SOURCE_URL))
ALPHAFOLD_COMMON_DIR = os.path.dirname(alphafold.common.__file__)

try:
    with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
        with io.capture_output() as captured:

            # Download and store stereo_chemical_props.txt
            !mkdir -p ~/content/alphafold/alphafold/common
            !mkdir -p /opt/conda/lib/python3.7/site-packages/alphafold/common/
            !wget -q -P ~/content/alphafold/alphafold/common https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
            pbar.update(18)
            !cp -f ~/content/alphafold/alphafold/common/stereo_chemical_props.txt "{ALPHAFOLD_COMMON_DIR}"

            # Download the parameter archive named in SOURCE_URL
            !mkdir --parents "{PARAMS_DIR}"
            !wget -O "{PARAMS_PATH}" "{SOURCE_URL}"
            pbar.update(27)

            # Un-tar the downloaded parameter archive
            !tar --extract --verbose --file="{PARAMS_PATH}" --directory="{PARAMS_DIR}" --preserve-permissions
            # !rm "{PARAMS_PATH}"
            pbar.update(55)

except subprocess.CalledProcessError:
    print(captured)
    raise
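
The shell commands in the cell above download and unpack the parameter archive. Purely as an illustration, the unpacking step could also be written with Python's standard `tarfile` module. The helper name `extract_archive` and the placeholder file used in the demo are assumptions made for this sketch; they are not part of the notebook.

```python
import os
import tarfile
import tempfile
from typing import List


def extract_archive(tar_path: str, dest_dir: str) -> List[str]:
    """Extracts tar_path into dest_dir and returns the sorted regular-file names."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as archive:
        members = archive.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest_dir.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        archive.extractall(dest_root)
    return sorted(m.name for m in members if m.isfile())


# Demo on a throwaway archive; "placeholder.npz" is a made-up file name.
with tempfile.TemporaryDirectory() as tmp:
    source_file = os.path.join(tmp, "placeholder.npz")
    with open(source_file, "wb") as f:
        f.write(b"not real parameters")
    demo_tar = os.path.join(tmp, "demo.tar")
    with tarfile.open(demo_tar, "w") as archive:
        archive.add(source_file, arcname="placeholder.npz")

    extracted = extract_archive(demo_tar, os.path.join(tmp, "params"))
    print(extracted)
```

In the notebook, the arguments would presumably be `PARAMS_PATH` and `PARAMS_DIR`.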

## Configure GPU acceleration

In [None]:
# Confirm accelerator configuration
import jax

if jax.local_devices()[0].platform == "tpu":
    raise RuntimeError(
        "TPU runtime not supported. Please configure GPU acceleration on the VM."
    )
elif jax.local_devices()[0].platform == "cpu":
    print(
        "CPU-only runtime is not recommended, because prediction execution will be slow. For better performance, consider GPU acceleration on the VM."
    )
else:
    print(f"Running with {jax.local_devices()[0].device_kind} GPU")

# Make sure all necessary environment variables are set.
import os

os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "2.0"
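
The platform check above can also be expressed as a small function that takes the platform string as an argument, which makes the three branches easy to exercise without an accelerator. This is only a sketch: the function name `describe_accelerator` is hypothetical, and the device name in the demo is an example value.

```python
def describe_accelerator(platform: str, device_kind: str = "") -> str:
    """Returns a status message for the platform, or raises for TPUs."""
    if platform == "tpu":
        raise RuntimeError(
            "TPU runtime not supported. Please configure GPU acceleration on the VM."
        )
    if platform == "cpu":
        return (
            "CPU-only runtime is not recommended, because prediction "
            "execution will be slow."
        )
    return f"Running with {device_kind} GPU"


# In the notebook this would be called with the first local JAX device:
#   device = jax.local_devices()[0]
#   print(describe_accelerator(device.platform, device.device_kind))
print(describe_accelerator("gpu", "Tesla T4"))  # "Tesla T4" is an example value
```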

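## Making a prediction

Paste the sequence of your protein into the cell below, then run the remaining cells via "Run" > "Run Selected Cell and All Below". You can also run the cells one at a time by pressing the "Play" button on the left.

Note that the genetic database search and the actual prediction can take some time, from minutes to hours, depending on the length of the protein and on the type of GPU you are allocated (see the FAQ below).

Before you start, enter the amino acid sequence(s) to fold ⬇️

If you enter only one sequence, the monomer model is used. If you enter multiple sequences, the multimer model is used.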

In [None]:
# Input sequences (type: str)
sequence_1 = "MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH"
sequence_2 = ""
sequence_3 = ""
sequence_4 = ""
sequence_5 = ""
sequence_6 = ""
sequence_7 = ""
sequence_8 = ""

In [None]:
from alphafold.notebooks import notebook_utils

input_sequences = (
    sequence_1,
    sequence_2,
    sequence_3,
    sequence_4,
    sequence_5,
    sequence_6,
    sequence_7,
    sequence_8,
)

# If folding a complex target and all the input sequences are
# prokaryotic then set `is_prokaryote` to `True`. Set to `False`
# otherwise or if the origin is unknown.

is_prokaryote = False  # @param {type:"boolean"}

MIN_SINGLE_SEQUENCE_LENGTH = 16
MAX_SINGLE_SEQUENCE_LENGTH = 2500
MAX_MULTIMER_LENGTH = 2500

# Validate the input.
sequences, model_type_to_use = notebook_utils.validate_input(
    input_sequences=input_sequences,
    min_length=MIN_SINGLE_SEQUENCE_LENGTH,
    max_length=MAX_SINGLE_SEQUENCE_LENGTH,
    max_multimer_length=MAX_MULTIMER_LENGTH,
)
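
To make the rules described above concrete (one sequence selects the monomer model, several select the multimer model, and lengths are bounded), here is a simplified, standalone sketch of such a check. It is not the implementation of `notebook_utils.validate_input`; the function name `clean_and_validate`, the error messages, and the exact cleaning steps are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_and_validate(
    input_sequences: Sequence[str],
    min_length: int = 16,
    max_length: int = 2500,
    max_multimer_length: int = 2500,
) -> Tuple[List[str], str]:
    """Returns the cleaned, non-empty sequences and the model type to use."""
    sequences = []
    for raw in input_sequences:
        # Drop all whitespace and normalise to upper case.
        sequence = "".join(raw.split()).upper()
        if not sequence:
            continue  # Empty slots are ignored.
        invalid = set(sequence) - AMINO_ACIDS
        if invalid:
            raise ValueError(f"Non-amino-acid letters found: {sorted(invalid)}")
        if len(sequence) < min_length:
            raise ValueError(f"Sequence shorter than {min_length} residues.")
        if len(sequence) > max_length:
            raise ValueError(f"Sequence longer than {max_length} residues.")
        sequences.append(sequence)

    if not sequences:
        raise ValueError("No input amino acid sequence provided.")
    if len(sequences) == 1:
        return sequences, "monomer"

    total_length = sum(len(s) for s in sequences)
    if total_length > max_multimer_length:
        raise ValueError(f"Total multimer length {total_length} is too large.")
    return sequences, "multimer"


example = "MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH"
print(clean_and_validate([example, "", ""])[1])
print(clean_and_validate([example, example.lower()])[1])
```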

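## Search against genetic databases

Once this cell has been executed, you will see statistics about the multiple sequence alignment (MSA) that AlphaFold will use. In particular, you will see how well each residue is covered by similar sequences in the MSA.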
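
As a rough illustration of what per-residue coverage means, the sketch below counts, for each alignment column, how many aligned sequences have a residue rather than a gap. The function name `msa_coverage` and the toy alignment are made up for this example; the statistics shown by the notebook itself may be computed differently.

```python
from typing import List, Sequence


def msa_coverage(aligned_sequences: Sequence[str], gap: str = "-") -> List[int]:
    """Counts the non-gap residues in every column of an alignment."""
    if not aligned_sequences:
        raise ValueError("The alignment is empty.")
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("All aligned sequences must have the same length.")
    # zip(*rows) iterates over the alignment column by column.
    return [
        sum(1 for residue in column if residue != gap)
        for column in zip(*aligned_sequences)
    ]


toy_msa = [
    "MKTA",  # query
    "MK-A",
    "-KTA",
    "M--A",
]
print(msa_coverage(toy_msa))  # [3, 3, 2, 4]
```

Columns with low counts are positions where few related sequences align, so the model has less evolutionary information there.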

In [None]:
import collections
import copy
import random
from concurrent import futures
from urllib import request

import matplotlib.pyplot as plt
import numpy as np
import py3Dmol
from alphafold.common import protein
from alphafold.data import (feature_processing, msa_pairing, pipeline,
                            pipeline_multimer)
from alphafold.data.tools import jackhmmer
from alphafold.model import config, data, model
from alphafold.relax import relax, utils
from IPython import display
from ipywidgets import GridspecLayout, Output

# Color bands for visualizing plddt
PLDDT_BANDS = [
    (0, 50, "#FF7D45"),
    (50, 70, "#FFDB13"),
    (70, 90, "#65CBF3"),
    (90, 100, "#0053D6"),
]

# --- Find the closest source ---
test_url_pattern = (
    "https://storage.googleapis.com/alphafold-colab{:s}/latest/uniref90_2021_03.fasta.1"
)
ex = futures.ThreadPoolExecutor(3)


def fetch(source):
    request.urlretrieve(test_url_pattern.format(source))
    return source


fs = [ex.submit(fetch, source) for source in ["", "-europe", "-asia"]]
source = None
for f in futures.as_completed(fs):
    source = f.result()
    ex.shutdown()
    break

JACKHMMER_BINARY_PATH = "/usr/bin/jackhmmer"
DB_ROOT_PATH = f"https://storage.googleapis.com/alphafold-colab{source}/latest/"
# The z_value is the number of sequences in a database.
MSA_DATABASES = [
    {
        "db_name": "uniref90",
        "db_path": f"{DB_ROOT_PATH}uniref90_2021_03.fasta",
        "num_streamed_chunks": 59,
        "z_value": 135_301_051,
    },
    {
        "db_name": "smallbfd",
        "db_path": f"{DB_ROOT_PATH}bfd-first_non_consensus_sequences.fasta",
        "num_streamed_chunks": 17,
        "z_value": 65_984_053,
    },
    {
        "db_name": "mgnify",
        "db_path": f"{DB_ROOT_PATH}mgy_clusters_2019_05.fasta",
        "num_streamed_chunks": 71,
        "z_value": 304_820_129,
    },
]

# Search UniProt and construct the all_seq features only for heteromers, not homomers.
if model_type_to_use == notebook_utils.ModelType.MULTIMER and len(set(sequences)) > 1:
    MSA_DATABASES.extend(
        [
            # Swiss-Prot and TrEMBL are concatenated together as UniProt.
            {
                "db_name": "uniprot",
                "db_path": f"{DB_ROOT_PATH}uniprot_2021_03.fasta",
                "num_streamed_chunks": 98,
                "z_value": 219_174_961 + 565_254,
            },
        ]
    )

TOTAL_JACKHMMER_CHUNKS = sum(cfg["num_streamed_chunks"] for cfg in MSA_DATABASES)

MAX_HITS = {
    "uniref90": 10_000,
    "smallbfd": 5_000,
    "mgnify": 501,
    "uniprot": 50_000,
}


def get_msa(fasta_path):
    """Searches for MSA for the given sequence using chunked Jackhmmer search."""

    # Run the search against chunks of genetic databases.
    raw_msa_results = collections.defaultdict(list)
    with tqdm.notebook.tqdm(
        total=TOTAL_JACKHMMER_CHUNKS, bar_format=TQDM_BAR_FORMAT
    ) as pbar:

        def jackhmmer_chunk_callback(i):
            pbar.update(n=1)

        for db_config in MSA_DATABASES:
            db_name = db_config["db_name"]
            pbar.set_description(f"Searching {db_name}")
            jackhmmer_runner = jackhmmer.Jackhmmer(
                binary_path=JACKHMMER_BINARY_PATH,
                database_path=db_config["db_path"],
                get_tblout=True,
                num_streamed_chunks=db_config["num_streamed_chunks"],
                streaming_callback=jackhmmer_chunk_callback,
                z_value=db_config["z_value"],
            )
            # Group the results by database name.
            raw_msa_results[db_name].extend(jackhmmer_runner.query(fasta_path))

    return raw_msa_results


features_for_chain = {}
raw_msa_results_for_sequence = {}
for sequence_index, sequence in enumerate(sequences, start=1):
    print(f"\nGetting MSA for sequence {sequence_index}")

    fasta_path = f"target_{sequence_index}.fasta"
    with open(fasta_path, "wt") as f:
        f.write(f">query\n{sequence}")

    # Don't do redundant work for multiple copies of the same chain in the multimer.
    if sequence not in raw_msa_results_for_sequence:
        raw_msa_results = get_msa(fasta_path=fasta_path)
        raw_msa_results_for_sequence[sequence] = raw_msa_results
    else:
        raw_msa_results = copy.deepcopy(raw_msa_results_for_sequence[sequence])

    # Extract the MSAs from the Stockholm files.
    # NB: deduplication happens later in pipeline.make_msa_features.
    single_chain_msas = []
    uniprot_msa = None
    for db_name, db_results in raw_msa_results.items():
        merged_msa = notebook_utils.merge_chunked_msa(
            results=db_results, max_hits=MAX_HITS.get(db_name)
        )
        if merged_msa.sequences and db_name != "uniprot":
            single_chain_msas.append(merged_msa)
            msa_size = len(set(merged_msa.sequences))
            print(
                f"{msa_size} unique sequences found in {db_name} for sequence {sequence_index}"
            )
        elif merged_msa.sequences and db_name == "uniprot":
            uniprot_msa = merged_msa

    notebook_utils.show_msa_info(
        single_chain_msas=single_chain_msas, sequence_index=sequence_index
    )

    # Turn the raw data into model features.
    feature_dict = {}
    feature_dict.update(
        pipeline.make_sequence_features(
            sequence=sequence, description="query", num_res=len(sequence)
        )
    )
    feature_dict.update(pipeline.make_msa_features(msas=single_chain_msas))
    # We don't use templates in AlphaFold notebook, add only empty placeholder features.
    feature_dict.update(
        notebook_utils.empty_placeholder_template_features(
            num_templates=0, num_res=len(sequence)
        )
    )

    # Construct the all_seq features only for heteromers, not homomers.
    if (
        model_type_to_use == notebook_utils.ModelType.MULTIMER
        and len(set(sequences)) > 1
    ):
        valid_feats = msa_pairing.MSA_FEATURES + (
            "msa_uniprot_accession_identifiers",
            "msa_species_identifiers",
        )
        all_seq_features = {
            f"{k}_all_seq": v
            for k, v in pipeline.make_msa_features([uniprot_msa]).items()
            if k in valid_feats
        }
        feature_dict.update(all_seq_features)

    features_for_chain[protein.PDB_CHAIN_IDS[sequence_index - 1]] = feature_dict


# Do further feature post-processing depending on the model type.
if model_type_to_use == notebook_utils.ModelType.MONOMER:
    np_example = features_for_chain[protein.PDB_CHAIN_IDS[0]]

elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
    all_chain_features = {}
    for chain_id, chain_features in features_for_chain.items():
        all_chain_features[chain_id] = pipeline_multimer.convert_monomer_features(
            chain_features, chain_id
        )

    all_chain_features = pipeline_multimer.add_assembly_features(all_chain_features)

    np_example = feature_processing.pair_and_merge(
        all_chain_features=all_chain_features, is_prokaryote=is_prokaryote
    )

    # Pad MSA to avoid zero-sized extra_msa.
    np_example = pipeline_multimer.pad_msa(np_example, min_num_seq=512)
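
The "find the closest source" step in the cell above races one request per mirror and keeps whichever finishes first. The standalone sketch below shows the same pattern with simulated latencies instead of real downloads. The names `first_responding_source`, `fake_probe`, and `SIMULATED_LATENCY` are hypothetical, and unlike the cell above this variant skips probes that fail, which is a deliberate change rather than the notebook's behaviour.

```python
import time
from concurrent import futures
from typing import Callable, Sequence


def first_responding_source(
    sources: Sequence[str], probe: Callable[[str], str]
) -> str:
    """Runs probe(source) for all sources in parallel and returns the first success."""
    executor = futures.ThreadPoolExecutor(max_workers=len(sources))
    try:
        pending = [executor.submit(probe, source) for source in sources]
        for done in futures.as_completed(pending):
            if done.exception() is None:
                return done.result()
        raise RuntimeError("No source responded successfully.")
    finally:
        # Do not wait for the slower probes; they finish in the background.
        executor.shutdown(wait=False)


# Simulated response times in seconds (illustrative values only).
SIMULATED_LATENCY = {"": 0.20, "-europe": 0.02, "-asia": 0.30}


def fake_probe(source: str) -> str:
    """Stands in for a small test download from the given mirror."""
    time.sleep(SIMULATED_LATENCY[source])
    return source


print(repr(first_responding_source(["", "-europe", "-asia"], fake_probe)))
```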

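## Run AlphaFold

Once this cell has been executed, a zip archive named `prediction.zip` containing the prediction is saved on the VM, and you can download it to your computer from the sidebar. If you run into problems with the relaxation stage, you can disable it below by setting `run_relax = False`. Warning: the prediction may then contain distracting small stereochemical violations.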

In [None]:
run_relax = True

# --- Run the model ---
if model_type_to_use == notebook_utils.ModelType.MONOMER:
    model_names = config.MODEL_PRESETS["monomer"] + ("model_2_ptm",)
elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
    model_names = config.MODEL_PRESETS["multimer"]

output_dir = "prediction"
os.makedirs(output_dir, exist_ok=True)

plddts = {}
ranking_confidences = {}
pae_outputs = {}
unrelaxed_proteins = {}

with tqdm.notebook.tqdm(total=len(model_names) + 1, bar_format=TQDM_BAR_FORMAT) as pbar:
    for model_name in model_names:
        pbar.set_description(f"Running {model_name}")

        cfg = config.model_config(model_name)
        if model_type_to_use == notebook_utils.ModelType.MONOMER:
            cfg.data.eval.num_ensemble = 1
        elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
            cfg.model.num_ensemble_eval = 1
        params = data.get_model_haiku_params(model_name, "./alphafold/data")
        model_runner = model.RunModel(cfg, params)
        processed_feature_dict = model_runner.process_features(
            np_example, random_seed=0
        )
        prediction = model_runner.predict(
            processed_feature_dict, random_seed=random.randrange(sys.maxsize)
        )

        mean_plddt = prediction["plddt"].mean()

        if model_type_to_use == notebook_utils.ModelType.MONOMER:
            if "predicted_aligned_error" in prediction:
                pae_outputs[model_name] = (
                    prediction["predicted_aligned_error"],
                    prediction["max_predicted_aligned_error"],
                )
            else:
                # Monomer models are sorted by mean pLDDT. Do not put monomer pTM models here as they
                # should never get selected.
                ranking_confidences[model_name] = prediction["ranking_confidence"]
                plddts[model_name] = prediction["plddt"]
        elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
            # Multimer models are sorted by pTM+ipTM.
            ranking_confidences[model_name] = prediction["ranking_confidence"]
            plddts[model_name] = prediction["plddt"]
            pae_outputs[model_name] = (
                prediction["predicted_aligned_error"],
                prediction["max_predicted_aligned_error"],
            )

        # Set the b-factors to the per-residue plddt.
        final_atom_mask = prediction["structure_module"]["final_atom_mask"]
        b_factors = prediction["plddt"][:, None] * final_atom_mask
        unrelaxed_protein = protein.from_prediction(
            processed_feature_dict,
            prediction,
            b_factors=b_factors,
            remove_leading_feature_dimension=(
                model_type_to_use == notebook_utils.ModelType.MONOMER
            ),
        )
        unrelaxed_proteins[model_name] = unrelaxed_protein

        # Delete unused outputs to save memory.
        del model_runner
        del params
        del prediction
        pbar.update(n=1)

    # --- AMBER relax the best model ---

    # Find the best model according to its ranking confidence
    # (mean pLDDT for monomer models, pTM+ipTM for multimer models).
    best_model_name = max(
        ranking_confidences.keys(), key=lambda x: ranking_confidences[x]
    )

    if run_relax:
        pbar.set_description("AMBER relaxation")
        amber_relaxer = relax.AmberRelaxation(
            max_iterations=0,
            tolerance=2.39,
            stiffness=10.0,
            exclude_residues=[],
            max_outer_iterations=3,
        )
        relaxed_pdb, _, _ = amber_relaxer.process(
            prot=unrelaxed_proteins[best_model_name]
        )
    else:
        print("Warning: Running without the relaxation stage.")
        relaxed_pdb = protein.to_pdb(unrelaxed_proteins[best_model_name])
    pbar.update(n=1)  # Finished AMBER relax.

# Construct multiclass b-factors to indicate confidence bands
# 0=very low, 1=low, 2=confident, 3=very high
banded_b_factors = []
for plddt in plddts[best_model_name]:
    for idx, (min_val, max_val, _) in enumerate(PLDDT_BANDS):
        if plddt >= min_val and plddt <= max_val:
            banded_b_factors.append(idx)
            break
banded_b_factors = np.array(banded_b_factors)[:, None] * final_atom_mask
to_visualize_pdb = utils.overwrite_b_factors(relaxed_pdb, banded_b_factors)


# Write out the prediction
pred_output_path = os.path.join(output_dir, "selected_prediction.pdb")
with open(pred_output_path, "w") as f:
    f.write(relaxed_pdb)


# --- Visualise the prediction & confidence ---
show_sidechains = True


def plot_plddt_legend():
    """Plots the legend for pLDDT."""
    thresh = [
        "Very low (pLDDT < 50)",
        "Low (70 > pLDDT > 50)",
        "Confident (90 > pLDDT > 70)",
        "Very high (pLDDT > 90)",
    ]

    colors = [x[2] for x in PLDDT_BANDS]

    plt.figure(figsize=(2, 2))
    for c in colors:
        plt.bar(0, 0, color=c)
    plt.legend(thresh, frameon=False, loc="center", fontsize=20)
    plt.xticks([])
    plt.yticks([])
    ax = plt.gca()
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    plt.title("Model Confidence", fontsize=20, pad=20)
    return plt


# Show the structure coloured by chain if the multimer model has been used.
if model_type_to_use == notebook_utils.ModelType.MULTIMER:
    multichain_view = py3Dmol.view(width=800, height=600)
    multichain_view.addModelsAsFrames(to_visualize_pdb)
    multichain_style = {"cartoon": {"colorscheme": "chain"}}
    multichain_view.setStyle({"model": -1}, multichain_style)
    multichain_view.zoomTo()
    multichain_view.show()

# Color the structure by per-residue pLDDT
color_map = {i: bands[2] for i, bands in enumerate(PLDDT_BANDS)}
view = py3Dmol.view(width=800, height=600)
view.addModelsAsFrames(to_visualize_pdb)
style = {"cartoon": {"colorscheme": {"prop": "b", "map": color_map}}}
if show_sidechains:
    style["stick"] = {}
view.setStyle({"model": -1}, style)
view.zoomTo()

grid = GridspecLayout(1, 2)
out = Output()
with out:
    view.show()
grid[0, 0] = out

out = Output()
with out:
    plot_plddt_legend().show()
grid[0, 1] = out

display.display(grid)

# Display pLDDT and predicted aligned error (if output by the model).
if pae_outputs:
    num_plots = 2
else:
    num_plots = 1

plt.figure(figsize=[8 * num_plots, 6])
plt.subplot(1, num_plots, 1)
plt.plot(plddts[best_model_name])
plt.title("Predicted LDDT")
plt.xlabel("Residue")
plt.ylabel("pLDDT")

if num_plots == 2:
    plt.subplot(1, 2, 2)
    pae, max_pae = list(pae_outputs.values())[0]
    plt.imshow(pae, vmin=0.0, vmax=max_pae, cmap="Greens_r")
    plt.colorbar(fraction=0.046, pad=0.04)

    # Display lines at chain boundaries.
    best_unrelaxed_prot = unrelaxed_proteins[best_model_name]
    total_num_res = best_unrelaxed_prot.residue_index.shape[-1]
    chain_ids = best_unrelaxed_prot.chain_index
    for chain_boundary in np.nonzero(chain_ids[:-1] - chain_ids[1:]):
        if chain_boundary.size:
            plt.plot([0, total_num_res], [chain_boundary, chain_boundary], color="red")
            plt.plot([chain_boundary, chain_boundary], [0, total_num_res], color="red")

    plt.title("Predicted Aligned Error")
    plt.xlabel("Scored residue")
    plt.ylabel("Aligned residue")

# Save the predicted aligned error (if it exists).
pae_output_path = os.path.join(output_dir, "predicted_aligned_error.json")
if pae_outputs:
    # Save predicted aligned error in the same format as the AF EMBL DB.
    pae_data = notebook_utils.get_pae_json(pae=pae, max_pae=max_pae.item())
    with open(pae_output_path, "w") as f:
        f.write(pae_data)

!zip -q -r {output_dir}.zip {output_dir}
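
The cell above maps each per-residue pLDDT value to one of four confidence bands before colouring the structure. A compact way to express the same mapping is a binary search over the band boundaries, sketched below. The names `BAND_UPPER_BOUNDS`, `BAND_LABELS`, and `plddt_band` are assumptions for this example. Boundary values (50, 70, 90) fall into the lower band, which matches the first-match behaviour of the loop in the cell.

```python
import bisect
from typing import List

# Upper bounds of the first three bands; the last band extends to 100.
BAND_UPPER_BOUNDS = [50, 70, 90]
BAND_LABELS = ["very low", "low", "confident", "very high"]


def plddt_band(plddt: float) -> int:
    """Returns the band index (0 to 3) for a pLDDT value between 0 and 100."""
    if not 0 <= plddt <= 100:
        raise ValueError(f"pLDDT must be within [0, 100], got {plddt}.")
    # bisect_left places a boundary value in the band that ends at it.
    return bisect.bisect_left(BAND_UPPER_BOUNDS, plddt)


def band_all(plddts: List[float]) -> List[int]:
    """Maps every per-residue pLDDT value to its band index."""
    return [plddt_band(value) for value in plddts]


scores = [32.5, 50.0, 68.1, 70.0, 91.4]  # made-up example values
for score, band in zip(scores, band_all(scores)):
    print(f"{score:5.1f} -> band {band} ({BAND_LABELS[band]})")
```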

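### Interpreting the prediction

In general, predicted LDDT (pLDDT) is best used for intra-domain confidence, whereas predicted aligned error (PAE) is best used for determining inter-domain or inter-chain confidence.

For guidance on interpreting AlphaFold predictions, see the [AlphaFold methods paper](https://www.nature.com/articles/s41586-021-03819-2), the [AlphaFold predictions of the human proteome paper](https://www.nature.com/articles/s41586-021-03828-1), the [AlphaFold-Multimer paper](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1), and our [FAQ](https://alphafold.ebi.ac.uk/faq).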
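
One simple way to read inter-chain confidence from a PAE matrix is to average it separately over each pair of chains: low values within a chain and high values between chains suggest that each chain is placed confidently on its own but that their relative arrangement is uncertain. The sketch below assumes rows are aligned residues and columns are scored residues, as in the plot labels above. The function name `mean_pae_by_chain_pair` and the toy numbers are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def mean_pae_by_chain_pair(
    pae: Sequence[Sequence[float]], chain_index: Sequence[int]
) -> Dict[Tuple[int, int], float]:
    """Averages PAE over each (aligned chain, scored chain) block."""
    sums: Dict[Tuple[int, int], float] = {}
    counts: Dict[Tuple[int, int], int] = {}
    for i, row in enumerate(pae):
        for j, value in enumerate(row):
            key = (chain_index[i], chain_index[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy complex: two chains with two residues each (values in Angstrom).
toy_pae = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 9.0, 8.0],
    [7.0, 8.0, 0.0, 2.0],
    [8.0, 7.0, 2.0, 0.0],
]
toy_chain_index = [0, 0, 1, 1]

for pair, mean_value in sorted(mean_pae_by_chain_pair(toy_pae, toy_chain_index).items()):
    print(pair, mean_value)
```

PAE is not symmetric, so the (0, 1) and (1, 0) blocks are reported separately.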

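## FAQ and troubleshooting

*   How do I get a predicted protein structure for my protein?
    *   Connect the notebook to the Jupyter kernel "Python 3 (ipykernel)".
    *   Paste the amino acid sequence of your protein (without any headers) into the variable `sequence_1` in the "Making a prediction" section.
    *   Run all cells in the notebook, either one by one or via "Kernel" / "Restart Kernel and Run All Cells...".
    *   Once all cells have been executed, the predicted protein structure is saved to `prediction.zip` on the VM, ready for download. Note: this can take minutes to hours, see below.
*   How long will this take?
    *   Searching the genetic databases can take minutes to hours.
    *   Running AlphaFold and generating the prediction can take minutes to hours, depending on the length of your protein and on the type of GPU your VM has access to.
*   My notebook no longer seems to be doing anything. What should I do?
    *   Some steps may take minutes to hours to complete.
    *   If nothing happens or if you receive an error message, try restarting your notebook runtime via "Kernel" / "Restart Kernel and Run All Cells...".
    *   If this doesn't help, try resetting your VM in the Google Cloud console ("Compute Engine" / "VM Instances").
*   How does this compare to the open-source version of AlphaFold?
    *   This notebook version of AlphaFold searches only a selected portion of the BFD dataset and currently doesn't use templates, so its accuracy is reduced in comparison to the full version of AlphaFold described in the [AlphaFold paper](https://doi.org/10.1038/s41586-021-03819-2) and the [GitHub repository](https://github.com/deepmind/alphafold/) (the full version is available via the inference script).
*   I got a warning "Notebook requires high RAM". What do I do?
    *   In the "Compute Engine" / "VM Instances" console menu, you can reconfigure the host VM settings. See [Changing the machine type of a stopped VM instance](https://cloud.google.com/compute/docs/instances/changing-machine-type-of-stopped-instance) for instructions.
*   Does this tool install anything on my computer?
    *   No, everything happens in the VM instance in your Google Cloud project.
*   How should I share feedback and bug reports?
    *   Please share any feedback and bug reports as an [issue](https://github.com/GoogleCloudPlatform/vertex-ai-samples/issues) on GitHub.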


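## Related work

Take a look at these Colab notebooks provided by the community (please note that these notebooks may differ from our validated AlphaFold system and we cannot guarantee their accuracy):

*   The [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) by Sergey Ovchinnikov, Milot Mirdita and Martin Steinegger, which uses an API hosted at the Södinglab, based on the MMseqs2 server, to create the multiple sequence alignment ([Mirdita et al. 2019, Bioinformatics](https://academic.oup.com/bioinformatics/article/35/16/2856/5280135)).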

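# License and disclaimer

This is not an officially supported Google product.

This notebook and the other information provided are for theoretical modelling only, and caution should be exercised in their use. They are provided "as is" without any warranty of any kind, whether express or implied. The information is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.

Copyright 2021 DeepMind Technologies Limited.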

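## AlphaFold code license

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.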

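## Model parameters license

The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode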

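## Third-party software

Use of the third-party software, libraries or code referred to in the [Acknowledgements section](https://github.com/deepmind/alphafold/#acknowledgements) of the AlphaFold README may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms, and you should check that you can comply with any applicable restrictions or terms and conditions before use.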

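## Mirrored databases

The following databases have been mirrored by DeepMind, and are available with reference to the following:
- UniProt: v2021\_03 (unmodified), by The UniProt Consortium, available under a [Creative Commons Attribution-NoDerivatives 4.0 International License](http://creativecommons.org/licenses/by-nd/4.0/).
- UniRef90: v2021\_03 (unmodified), by The UniProt Consortium, available under a [Creative Commons Attribution-NoDerivatives 4.0 International License](http://creativecommons.org/licenses/by-nd/4.0/).
- MGnify: v2019\_05 (unmodified), by Mitchell AL et al., available free of all copyright restrictions and made fully and freely available for both non-commercial and commercial use under [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).
- BFD: (modified), by Steinegger M. and Söding J., modified by DeepMind, available under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/). See the Methods section of the [AlphaFold proteome paper](https://www.nature.com/articles/s41586-021-03828-1) for details.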