# Setting up the Knowledge Graph Datasets

In [1]:
from pyspark.sql import SparkSession

from aips import get_engine
from aips.spark.dataframe import from_csv
import aips.indexer
spark = SparkSession.builder.appName("AIPS").getOrCreate()
engine = get_engine()

## Index the Jobs Dataset into the Search Engine

In [9]:
aips.indexer.build_collection(engine, "jobs")

<engines.solr.SolrCollection.SolrCollection at 0x7f901f44a110>

## Index the StackExchange Datasets: health, cooking, scifi, travel, devops

In [4]:
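# Build one search-engine collection per dataset.
# The list gets its own name so the loop variable does not overwrite it.
datasets = ["jobs", "health", "cooking", "scifi", "travel", "devops"]
for dataset in datasets:
    aips.indexer.build_collection(engine, dataset)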

In [5]:
aips.indexer.build_stack_exchange_collection(engine)

<engines.solr.SolrCollection.SolrCollection at 0x7f901f73c640>

## Dual-Index the Datasets into Solr for the Semantic Knowledge Graph (SKG)

In [6]:
solr_engine = get_engine("solr")
# Repeat the indexing against Solr explicitly.
datasets = ["jobs", "health", "cooking", "scifi", "travel", "devops"]
for dataset in datasets:
    aips.indexer.build_collection(solr_engine, dataset)
aips.indexer.build_stack_exchange_collection(solr_engine)

<engines.solr.SolrCollection.SolrCollection at 0x7f9026fa9450>
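
The indexing cells above repeat the same loop for each engine. As a minimal sketch, that loop could be factored into one helper that skips duplicate dataset names and records failures instead of stopping at the first error. The helper name `build_collections` and its return shape are hypothetical and not part of `aips`. The only assumption about the builder is the `(engine, dataset)` call shape used in the cells above.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_collections(
    build: Callable[[object, str], object],
    engine: object,
    datasets: Iterable[str],
) -> Tuple[Dict[str, object], List[Tuple[str, Exception]]]:
    """Call build(engine, name) once per distinct dataset name.

    Hypothetical helper (not part of aips). Returns a pair:
    a dict of name -> built collection, and a list of (name, error)
    for every dataset whose build raised an exception.
    """
    built: Dict[str, object] = {}
    failed: List[Tuple[str, Exception]] = []
    # dict.fromkeys drops repeated names while preserving order.
    for name in dict.fromkeys(datasets):
        try:
            built[name] = build(engine, name)
        except Exception as error:
            # Record the failure and continue with the remaining datasets.
            failed.append((name, error))
    return built, failed


# Possible use in this notebook (assumes the cells above have been run):
# built, failed = build_collections(aips.indexer.build_collection, solr_engine, datasets)
```

Passing the builder as an argument keeps the helper independent of any particular engine, so the same call could serve both the default engine and the explicit Solr engine. Whether continuing after a failed build is desirable depends on how the later notebooks use the collections.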

## Success!

Now that you've indexed several large text datasets, the next notebook explores the rich graph of semantic relationships embedded within those documents. We will use Semantic Knowledge Graphs to traverse and rank arbitrary relationships in real time within the domains of our datasets.

Up next: [Working with Semantic Knowledge Graphs](3.semantic-knowledge-graph.ipynb)