# Alibaba Cloud MaxCompute

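>[Alibaba Cloud MaxCompute](https://www.alibabacloud.com/product/maxcompute) (previously known as ODPS) is a general-purpose, fully managed, multi-tenant data processing platform for large-scale data warehousing. MaxCompute supports various data import solutions and distributed computing models, enabling users to query massive datasets efficiently, reduce production costs, and keep data secure.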

`MaxComputeLoader` lets you execute a MaxCompute SQL query and load the results as one document per row.

In [7]:
%pip install --upgrade --quiet  pyodps

Collecting pyodps
  Downloading pyodps-0.11.4.post0-cp39-cp39-macosx_10_9_universal2.whl (2.0 MB)
Installing collected packages: pyodps
Successfully installed pyodps-0.11.4.post0


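## Basic Usage
To instantiate the loader, you need a SQL query to execute, your MaxCompute endpoint and project name, and your access ID and secret access key. The access ID and secret access key can either be passed directly via the `access_id` and `secret_access_key` parameters or set as the environment variables `MAX_COMPUTE_ACCESS_ID` and `MAX_COMPUTE_SECRET_ACCESS_KEY`.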
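
The two ways of supplying credentials can be illustrated with a small standalone sketch. The helper `resolve_credentials` is hypothetical and is not part of `langchain_community` or `pyodps`; it also assumes that explicitly passed values take precedence over the environment variables, which this page does not state.

```python
import os

# Placeholder values; in practice, export these in your shell or load them
# from a secrets manager instead of hard-coding them.
os.environ["MAX_COMPUTE_ACCESS_ID"] = "<ACCESS ID>"
os.environ["MAX_COMPUTE_SECRET_ACCESS_KEY"] = "<SECRET ACCESS KEY>"


def resolve_credentials(access_id=None, secret_access_key=None):
    """Hypothetical helper: use explicit arguments if given, else env vars.

    The precedence shown here is an assumption made for illustration.
    """
    access_id = access_id or os.environ.get("MAX_COMPUTE_ACCESS_ID")
    secret_access_key = secret_access_key or os.environ.get(
        "MAX_COMPUTE_SECRET_ACCESS_KEY"
    )
    if not access_id or not secret_access_key:
        raise ValueError(
            "Provide access_id and secret_access_key, or set "
            "MAX_COMPUTE_ACCESS_ID and MAX_COMPUTE_SECRET_ACCESS_KEY."
        )
    return access_id, secret_access_key


# No arguments: both values come from the environment.
from_env = resolve_credentials()

# Explicit arguments: the environment is not consulted.
explicit = resolve_credentials("my-id", "my-secret")
```

With the environment variables set, the loader calls shown below should work with the `access_id` and `secret_access_key` arguments omitted.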

In [1]:
from langchain_community.document_loaders import MaxComputeLoader

In [2]:
base_query = """
SELECT *
FROM (
    SELECT 1 AS id, 'content1' AS content, 'meta_info1' AS meta_info
    UNION ALL
    SELECT 2 AS id, 'content2' AS content, 'meta_info2' AS meta_info
    UNION ALL
    SELECT 3 AS id, 'content3' AS content, 'meta_info3' AS meta_info
) mydata;
"""

In [None]:
endpoint = "<ENDPOINT>"
project = "<PROJECT>"
ACCESS_ID = "<ACCESS ID>"
SECRET_ACCESS_KEY = "<SECRET ACCESS KEY>"

In [13]:
loader = MaxComputeLoader.from_params(
    base_query,
    endpoint,
    project,
    access_id=ACCESS_ID,
    secret_access_key=SECRET_ACCESS_KEY,
)
data = loader.load()

In [17]:
print(data)

[Document(page_content='id: 1\ncontent: content1\nmeta_info: meta_info1', metadata={}), Document(page_content='id: 2\ncontent: content2\nmeta_info: meta_info2', metadata={}), Document(page_content='id: 3\ncontent: content3\nmeta_info: meta_info3', metadata={})]


In [20]:
print(data[0].page_content)

id: 1
content: content1
meta_info: meta_info1


In [21]:
print(data[0].metadata)

{}


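## Specifying Which Columns Are Content vs. Metadata
You can use the `page_content_columns` and `metadata_columns` parameters to configure which subset of columns is loaded as the Document's content and which as its metadata.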

In [22]:
loader = MaxComputeLoader.from_params(
    base_query,
    endpoint,
    project,
    page_content_columns=["content"],  # Specify Document page content
    metadata_columns=["id", "meta_info"],  # Specify Document metadata
    access_id=ACCESS_ID,
    secret_access_key=SECRET_ACCESS_KEY,
)
data = loader.load()

In [25]:
print(data[0].page_content)

content: content1


In [26]:
print(data[0].metadata)

{'id': 1, 'meta_info': 'meta_info1'}
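
The way these two parameters split a row can be reproduced without a MaxCompute connection. The function below is a minimal sketch written to match the outputs shown on this page; `row_to_document` is a hypothetical name, and the code should not be read as the loader's actual implementation.

```python
def row_to_document(row, page_content_columns=None, metadata_columns=None):
    """Hypothetical sketch: turn one result row (a dict) into content + metadata.

    - With no page_content_columns, every column goes into the page content.
    - With no metadata_columns, the metadata is empty.
    """
    content_cols = (
        page_content_columns if page_content_columns is not None else list(row)
    )
    meta_cols = metadata_columns or []

    # Each content column is rendered as "name: value", one per line.
    page_content = "\n".join(f"{col}: {row[col]}" for col in content_cols)
    # Metadata keeps the original Python values (e.g. id stays an int).
    metadata = {col: row[col] for col in meta_cols}
    return page_content, metadata


row = {"id": 1, "content": "content1", "meta_info": "meta_info1"}

# Default behaviour: all columns are content, and metadata is empty.
default_content, default_meta = row_to_document(row)

# Explicit split, mirroring the page_content_columns / metadata_columns example.
split_content, split_meta = row_to_document(
    row,
    page_content_columns=["content"],
    metadata_columns=["id", "meta_info"],
)
```

Columns that appear in neither list are dropped from the resulting Document in this sketch, so list every column you want to keep.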
