# Import Text Data Using Public URLs
Labelbox lets you annotate and manage the training data behind your machine learning models. It supports images, videos, text, and conversational data from local files, cloud storage, and public URLs. This notebook shows how to use the Labelbox Python SDK to import a public domain text file directly from its public URL so that you can annotate it and use it in your model workflows.



---



## Step 1: Create an API Key

Labelbox uses API keys to authenticate API requests, so you need one before you can import data. To create an API key:


1.   Go to the [Workspace settings](https://app.labelbox.com/workspace-settings/api-keys) page of your Labelbox app and select **API keys**.
2.   Select **New API key**.
3.   In the **Create new API key** prompt, enter a descriptive name for your API key and then select **Create**.
4.   In the **API key created** prompt, use the **Copy** button to copy the key.
5.   Save the API key in a safe location that you can access when importing the data.




---



## Step 2: Set up the Environment

First, install the latest Labelbox Python SDK:

In [None]:
!pip install -q "labelbox[data]"

Then, import the API Client:

In [None]:
import labelbox as lb



Next, add the API key you just created:

In [None]:
# Replace with your API key
API_KEY = "eyJhbGciOiJ...sHUR_SxEXI7ddFqleCKrfdGnRqYBk"
client = lb.Client(api_key=API_KEY)
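
A key pasted into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name `LABELBOX_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of the Labelbox SDK.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention, not an SDK requirement.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key."
        )
    return key
```

With the variable set, the client could then be created with `client = lb.Client(api_key=load_api_key())`.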

## Step 3: Configure Data Import Settings

Each item you import into a dataset is a data row. For text data, Labelbox supports the following parameters for each data row:


*   `row_data` (required): The source of the text. It must be a `txt` file encoded as UTF-8.
*   `global_key`: An organization-wide unique identifier for the data row. See [Global keys](https://docs.labelbox.com/reference/data-row-global-keys).
*   `media_type`: The type of the data. For text files, use the string `"TEXT"`.
*   `metadata_fields`: Non-annotation information about the data row. See [Metadata](https://docs.labelbox.com/docs/datarow-metadata).
*   `attachments`: Supplementary content for the data row. See [Attachments](https://docs.labelbox.com/docs/label-data#attachments).


For this example, `row_data` is the public `https` URL of the text file. The other parameters are optional, but it's recommended to include a `global_key` so that each data row can be uniquely identified, and a `media_type` so that Labelbox can validate the data.
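
The two remaining optional parameters are not used in this notebook. As a rough sketch, an asset that fills them in might look like the following. The global key, the metadata field named `tag`, and the `RAW_TEXT` attachment type are assumptions here; check the Metadata and Attachments pages linked above for the values your workspace accepts.

```python
# Illustrative asset with the optional fields filled in (not imported in this notebook).
asset_with_extras = {
    "row_data": "https://www.gutenberg.org/cache/epub/37106/pg37106.txt",
    # Hypothetical key; it must be unique across your organization.
    "global_key": "little-women-text-with-extras",
    "media_type": "TEXT",
    # Assumes a metadata field named "tag" exists in your workspace.
    "metadata_fields": [{"name": "tag", "value": "public-domain"}],
    # Assumes the RAW_TEXT attachment type for a short text note.
    "attachments": [{"type": "RAW_TEXT", "value": "Source: Project Gutenberg"}],
}
```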

In [None]:
# Create a dataset with a descriptive name
dataset = client.create_dataset(name="Demo example - Public Text")

# Define the data row to import
assets = [
    {
        # Replace with the URL of any public UTF-8 txt file
        "row_data": "https://www.gutenberg.org/cache/epub/37106/pg37106.txt",
        # Replace with a unique, non-blank string
        "global_key": "little-women-text-project-gutenberg-license",
        "media_type": "TEXT",
    }
]
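
To import several text files at once, the `assets` list can hold one dictionary per URL. The helper below is a sketch of one way to build that list. The function name `build_text_assets` and the global key scheme (a prefix plus the file name) are illustrative assumptions, and the duplicate check only covers keys within this list, not keys already used elsewhere in your organization.

```python
from urllib.parse import urlparse


def build_text_assets(urls, key_prefix="demo-text"):
    """Build one data row dictionary per public text URL.

    The global key scheme (prefix + file name) is an illustrative assumption.
    """
    assets, seen_keys = [], set()
    for url in urls:
        parsed = urlparse(url)
        # Only accept https URLs that point to a txt file
        if parsed.scheme != "https" or not parsed.path.endswith(".txt"):
            raise ValueError(f"Expected a public https URL to a txt file: {url}")
        file_name = parsed.path.rsplit("/", 1)[-1]
        global_key = f"{key_prefix}-{file_name}"
        # Catch duplicate keys within this batch before uploading
        if global_key in seen_keys:
            raise ValueError(f"Duplicate global key: {global_key}")
        seen_keys.add(global_key)
        assets.append(
            {"row_data": url, "global_key": global_key, "media_type": "TEXT"}
        )
    return assets
```

For example, `assets = build_text_assets(["https://www.gutenberg.org/cache/epub/37106/pg37106.txt"])` produces a single data row dictionary with the global key `demo-text-pg37106.txt`.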

## Step 4: Run the Import Script
Run the following script to import the text file into the dataset:

In [None]:
# Start the import and wait for it to finish
task = dataset.create_data_rows(assets)
task.wait_till_done()

# For troubleshooting: prints None when no errors occurred
print(task.errors)

None
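
In an unattended script, a printed error is easy to miss. The sketch below turns the value of `task.errors` into an exception instead. The helper name `check_import_errors` is hypothetical, and because the structure of the reported errors is not described in this notebook, the helper simply passes them through as-is.

```python
def check_import_errors(errors):
    """Raise if the import reported errors; `errors` is the value of task.errors."""
    if not errors:
        # None (or an empty value) means nothing went wrong
        return
    # Normalize to a list so that the count is meaningful
    items = errors if isinstance(errors, list) else [errors]
    raise RuntimeError(f"Import failed with {len(items)} error(s): {items}")
```

After `task.wait_till_done()`, it could be called as `check_import_errors(task.errors)`.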


## Next Steps
If the script prints `None`, the text file has been imported into your Labelbox app. You can now use the dataset in labeling projects, where you define the label schemas and workflows that fit your needs. For more information, see [Projects](https://docs.labelbox.com/docs/what-is-a-project).