# Slack Data Ingestion Setup

This notebook demonstrates how to connect Slack to your preprocessing pipeline with the Unstructured Ingest CLI or Python library, batch-process the channel content, and store the structured output locally.

## Prerequisites

### Slack Setup Requirements

1. Create a Slack App
   - Follow Step 1 of Slack's documentation for creating an app
   - The app must have the `channels:history` OAuth scope [(see Step 2: Requesting scopes)](https://api.slack.com/quickstart#scopes)
   - Install and authorize the app for your workspace (Step 3)
   - Obtain the app's access token
   
2. Channel Configuration
   - Add the app to target channels
   - Obtain Channel IDs from each channel's details page (About tab)

3. Define Date Range
   Supported formats:
   - `YYYY-MM-DD`
   - `YYYY-MM-DDTHH:MM:SS`
   - `YYYY-MM-DDTHH:MM:SSZ` 
   - `YYYY-MM-DD+HH:MM:SS`
   - `YYYY-MM-DD-HH:MM:SS`
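
To check a date string before passing it to the CLI, or to turn it into the `datetime` object the Python library expects, a small parser can help. The sketch below is illustrative only: `parse_ingest_date` is a hypothetical helper, not part of Unstructured Ingest, and it covers just the first three formats because this notebook does not define how the `+HH:MM:SS` and `-HH:MM:SS` variants are interpreted. It assumes a trailing `Z` means UTC.

```python
from datetime import datetime, timezone

# Hypothetical helper, not part of unstructured-ingest.
# Covers only the three unambiguous formats from the list above.
_FORMATS = (
    ("%Y-%m-%dT%H:%M:%SZ", True),  # trailing "Z" is treated as UTC
    ("%Y-%m-%dT%H:%M:%S", False),
    ("%Y-%m-%d", False),
)


def parse_ingest_date(value: str) -> datetime:
    """Parse a date string in one of the supported formats into a datetime."""
    for fmt, is_utc in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next format
        # strptime matches "Z" as a literal character, so attach UTC explicitly.
        return parsed.replace(tzinfo=timezone.utc) if is_utc else parsed
    raise ValueError(f"Unsupported date format: {value!r}")


if __name__ == "__main__":
    start = parse_ingest_date("2024-10-22")
    end = parse_ingest_date("2024-11-13T00:00:00")
    print(start, end)  # 2024-10-22 00:00:00 2024-11-13 00:00:00
```

The resulting naive `datetime` values can be passed as `start_date` and `end_date` in the Python example below.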

### Installation

Install required dependencies:

```bash
uv add "unstructured-ingest[slack]"
```

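You might also need to install additional dependencies, depending on your needs. For example, this notebook also imports `python-dotenv` (`from dotenv import load_dotenv`) to load environment variables from a `.env` file.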

Set the following environment variable:

- `SLACK_BOT_USER_OAUTH_TOKEN` - The OAuth token for the Slack app, represented by `--token` (CLI) or `token` (Python).
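
An invalid token, or a channel the app has not been added to, will stop the Slack source from reading messages, so it can be worth checking access directly before running the pipeline. The sketch below assumes the `slack_sdk` package (Slack's Python SDK), which this notebook does not import elsewhere and which you may need to install, and uses its `WebClient.auth_test` and `conversations_history` methods. `check_slack_access` is a hypothetical helper, and it only contacts Slack when `SLACK_BOT_USER_OAUTH_TOKEN` is set.

```python
import os
from typing import Any, Dict, List


def check_slack_access(client: Any, channel_ids: List[str]) -> Dict[str, str]:
    """Map each channel ID to "ok" or to the Slack error code returned for it.

    `client` is assumed to behave like slack_sdk.WebClient.
    """
    # auth.test fails (slack_sdk raises) if the token itself is invalid.
    client.auth_test()
    results: Dict[str, str] = {}
    for channel_id in channel_ids:
        try:
            # Read a single message to confirm the app can see the channel.
            client.conversations_history(channel=channel_id, limit=1)
            results[channel_id] = "ok"
        except Exception as exc:  # slack_sdk raises SlackApiError; a broad catch avoids a hard import
            response = getattr(exc, "response", None)
            error = response.get("error") if response is not None else None
            results[channel_id] = error or str(exc)
    return results


if __name__ == "__main__" and os.getenv("SLACK_BOT_USER_OAUTH_TOKEN"):
    # Assumption: slack_sdk is installed in this environment.
    from slack_sdk import WebClient

    client = WebClient(token=os.environ["SLACK_BOT_USER_OAUTH_TOKEN"])
    print(check_slack_access(client, ["C057EJE6QDB"]))
```

A result such as `not_in_channel` for a channel ID usually means the app still needs to be added to that channel.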

To specify the start and end of the date and time range for the channels to be processed:

- For the CLI, use one of the supported formats listed under Define Date Range above.
- For Python, pass `datetime.datetime` objects.

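If you partition through the Unstructured API (`partition_by_api=True`), also set these environment variables, which are passed as `api_key` and `partition_endpoint` in `PartitionerConfig`:

- `UNSTRUCTURED_API_KEY` - Your Unstructured API key value.
- `UNSTRUCTURED_API_URL` - Your Unstructured API URL.

The example below partitions locally (`partition_by_api=False`), so it does not read these two variables. Its VoyageAI embedding step reads `VOYAGE_API_KEY` instead.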
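
Because the pipeline reads these values with `os.getenv`, a missing variable is passed through as `None` instead of failing up front. A fail-fast check, run after `load_dotenv()`, can catch that before the pipeline starts. The sketch below uses a hypothetical `missing_env_vars` helper, and the variable list is an assumption based on what this notebook's code cell reads.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumed list for this notebook: the Slack token plus the VoyageAI key read by
# the embedder. Add UNSTRUCTURED_API_KEY / UNSTRUCTURED_API_URL if you set
# partition_by_api=True.
REQUIRED_ENV_VARS = ["SLACK_BOT_USER_OAUTH_TOKEN", "VOYAGE_API_KEY"]


def missing_env_vars(
    names: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    source = os.environ if env is None else env
    return [name for name in names if not source.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_ENV_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```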

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. Any supported destination connector can be used; this example uses the local destination connector, which writes the structured output to `./files/slack`.

Based on: https://docs.unstructured.io/api-reference/ingest/source-connectors/slack

```python
from dotenv import load_dotenv

# Load SLACK_BOT_USER_OAUTH_TOKEN, VOYAGE_API_KEY, etc. from a local .env file.
load_dotenv()

import os
from datetime import datetime

from unstructured_ingest.v2.interfaces import ProcessorConfig
from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
from unstructured_ingest.v2.processes.connectors.local import LocalUploaderConfig
from unstructured_ingest.v2.processes.connectors.slack import (
    SlackAccessConfig,
    SlackConnectionConfig,
    SlackDownloaderConfig,
    SlackIndexerConfig,
)
from unstructured_ingest.v2.processes.embedder import EmbedderConfig
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig

Pipeline.from_configs(
    context=ProcessorConfig(),
    # Channels to read and the date range of messages to fetch.
    indexer_config=SlackIndexerConfig(
        channels=["C057EJE6QDB"],
        start_date=datetime(year=2024, month=10, day=22),
        end_date=datetime(year=2024, month=11, day=13),
    ),
    # Where downloaded Slack content is staged before partitioning.
    downloader_config=SlackDownloaderConfig(download_dir="./files"),
    source_connection_config=SlackConnectionConfig(
        access_config=SlackAccessConfig(token=os.getenv("SLACK_BOT_USER_OAUTH_TOKEN"))
    ),
    # Partition locally. The split_pdf_* options are Unstructured API settings
    # and are not expected to affect local partitioning.
    partitioner_config=PartitionerConfig(
        partition_by_api=False,
        additional_partition_args={
            "split_pdf_page": True,
            "split_pdf_allow_failed": True,
            "split_pdf_concurrency_level": 15,
        },
    ),
    # Chunking and embedding are optional.
    chunker_config=ChunkerConfig(chunking_strategy="by_title"),
    embedder_config=EmbedderConfig(
        embedding_provider="voyageai",
        embedding_api_key=os.getenv("VOYAGE_API_KEY"),
        embedding_model_name="voyage-multimodal-3",
    ),
    # Write the structured output to a local directory.
    uploader_config=LocalUploaderConfig(output_dir="./files/slack"),
).run()
```

The run ended with a `PipelineError`. The exported log is truncated, so the underlying cause is not shown:

```text
2024-11-14 22:21:53,386 MainProcess INFO     created index with configs: {"channels": ["C057EJE6QDB"], "start_date": "2024-10-22T00:00:00", "end_date": "2024-11-13T00:00:00"}, connection configs: {"access_config": "**********"}
2024-11-14 22:21:53,388 MainProcess INFO     Created download with configs: {"download_dir": "files"}, connection configs: {"access_config": "**********"}
2024-11-14 22:21:53,389 MainProcess INFO     created partition with configs: {"strategy": "auto", "ocr_languages": null, "encoding": null, "additional_partition_args": {"split_pdf_page": true, "split_pdf_allow_failed": true, "split_pdf_concurrency_level": 15}, "skip_infer_table_types": null, "fields_include": ["element_id", "text", "type", "metadata", "embeddings"], "flatten_metadata": false, "metadata_exclude": [], "element_exclude": [], "metadata_include": [], "partition_endpoint": "https://api.unstructuredapp.io/general/v0/general", "partition_by_api": false, "api_key": null, "hi_res_model_name": null}

PipelineError: Pipeline did not run successfully
```
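
Once a run succeeds, the local destination writes its results under `./files/slack`. For a quick look at what was produced, the sketch below counts elements by type. It assumes each output file is a JSON array of element objects carrying the `type` and `text` fields listed under `fields_include` in the partition log; that layout is an assumption, and `summarize_outputs` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Union


def summarize_outputs(output_dir: Union[str, Path]) -> Counter:
    """Count elements by "type" across every *.json file under output_dir."""
    counts: Counter = Counter()
    for path in sorted(Path(output_dir).rglob("*.json")):
        elements = json.loads(path.read_text(encoding="utf-8"))
        # Assumed layout: a list of element dicts, each with a "type" key.
        for element in elements:
            counts[element.get("type", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    out = Path("./files/slack")
    if out.exists():
        for element_type, n in summarize_outputs(out).most_common():
            print(f"{element_type}: {n}")
```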