# Build your first Question Answering Application with Vector Search Engine **Weaviate**

## Introduction 

Welcome to this workshop about Vector Databases and Weaviate!

This notebook is a hands-on tutorial to be used alongside an interactive Weaviate workshop. In this workshop, and with this notebook, you will learn what vector search is, perform your first semantic search on a prepared demo dataset, and build your own vector search application with Question Answering. We'll use a dataset of newspaper articles, but you are free to use your own dataset instead! Besides running each cell, you will complete some **exercises**, in which you need to take some action like choosing data sources, finishing a data schema and writing creative queries.

You can also [open the file as Colab notebook](https://colab.research.google.com/github/semi-technologies/weaviate-examples/blob/main/question-answering-application-with-weaviate-workshop/question-answering-application-with-weaviate-workshop.ipynb).

## Table of Contents
1. What is a Vector Database?
    1. Perform your first semantic search
2. Build your first vector search application
    1. Create a Weaviate cluster and connect to it
    2. Get data and analyze it
    3. Make a data schema
    4. Vectorize & upload data
    5. Query data
3. Bonus: automatic data classification

## 1. What is a Vector Database?

Traditional search engines perform a *keyword-based search*. Such search engines return results that contain an exact match or a close variation of a search query, like searching for papers about *“Covid-19”*. These *keyword-based* search engines work well in some cases, but fall short in use cases where *semantics* play a role: they do not take the actual meaning of the query and of the documents or data into account. Documents that might be of interest are not found because different wording is used. For example, a paper with the word *“coronavirus”* in it will not be found with the search term *“Covid-19”*.

*Vector-based search* (also called semantic search or neural search) tries to solve this problem. Vector search engines like [Weaviate](https://weaviate.io/) retrieve documents by meaning, not by exactly matching keywords. Using machine learning models like BERT or ResNet50, a vector database converts data objects (text, images, etc.) to embeddings. Embeddings are vector representations, which are stored in a high-dimensional vector space. The following image shows a simplified example of embedded concepts and images. Note that the word *“cat”* lies close to an image of a cat, and that *“dog”* lies closer to *“cat”* than to *“banana”*.

![Shows a 3 dimensional space with words and images placed according to their meaning](vectors-3d-multi.png "Vector database representation")
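To make "lies closer" concrete: closeness between embeddings is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors (hypothetical values, not the output of any real model) to show how *“dog”* can score closer to *“cat”* than to *“banana”*.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-D "embeddings", invented for illustration only;
# real models produce vectors with hundreds of dimensions.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

print(cosine_similarity(toy_vectors["dog"], toy_vectors["cat"]))     # roughly 0.99
print(cosine_similarity(toy_vectors["dog"], toy_vectors["banana"]))  # roughly 0.39
```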

**Similarity search**

If we were to search for *“kitten”* in our simple database, the machine learning model would calculate a vector for this word and return the closest data points. Although no data point matches the word *“kitten”* exactly, the vector search engine would return objects like the image of the cat, because it is *semantically* close to the query. This is visualized in the following figure:

![Shows a 3 dimensional space with a search query and words and images placed according to their meaning](vectors-3d-multi-query.png "Vector search representation")
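Conceptually, this search is a nearest-neighbour lookup: embed the query, score every stored vector, and return the best matches. The brute-force sketch below uses hypothetical toy vectors and an invented query vector for *“kitten”*. Weaviate itself uses an approximate nearest-neighbour index (HNSW) rather than scanning every object like this loop does.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (name, similarity) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical vectors; "cat image" stands in for the picture of a cat.
toy_vectors = {
    "cat image": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}
kitten_query = [0.85, 0.75, 0.15]  # invented query vector for "kitten"

for name, score in nearest(kitten_query, toy_vectors, k=2):
    print(f"{name}: {score:.3f}")
```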

**Using Weaviate to do a vector-based search**

Weaviate is an open-source vector search engine. With Weaviate you can perform a semantic search on various data types (text, images, etc.) as described above. To embed data and queries, you can use out-of-the-box models, use open models (e.g. from HuggingFace), or connect your own machine learning model. Weaviate is horizontally scalable, which lets you scale ML models and search through large datasets quickly.

**Try out a semantic search demo**

To get a glimpse of how semantic search works in practice, you can try out the following demo. This demo contains a small dataset of around 3500 news articles, which are embedded with a [BERT model](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english). You can access this dataset at https://demo.dataset.playground.semi.technology and query it using the [Weaviate Console](https://console.semi.technology/). Use [this link](https://link.semi.technology/37kMdjR) to open the GraphQL query module on this dataset directly.

With the following GraphQL query, you search the dataset for *Articles* about the topic *“kitten”*. The query *“kitten”* is vectorized using the BERT model, and *Articles* that are close to this vector are retrieved. The query returns the titles of the top articles along with the calculated *“certainty”* (ranging from 0.0 to 1.0).

```graphql
{
  Get{
    Article(
      nearText: {
        concepts: ["kitten"]
      }
    ){
      title
      _additional {
        certainty
      }
    }
  }
}
```

If you run this query, you’ll see that the results will all be around the topic of *“kitten”* or related (cats or pets).
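Recent Weaviate documentation relates *certainty* to the cosine distance between the query vector and an object's vector as `certainty = 1 - distance / 2`, where cosine distance is `1 - cosine similarity` and ranges from 0 to 2. A minimal sketch of that conversion, assuming the cosine metric (check the docs for the Weaviate version you run):

```python
def cosine_distance(similarity: float) -> float:
    """Cosine distance ranges from 0 (same direction) to 2 (opposite direction)."""
    return 1.0 - similarity


def certainty_from_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] onto a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


for similarity in (1.0, 0.0, -1.0):
    print(similarity, certainty_from_distance(cosine_distance(similarity)))
```

Under this assumption, a certainty threshold of 0.8 corresponds to a cosine similarity of 0.6.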

## 2. Build your first vector search application

### 2.1 Create a Weaviate cluster and connect to it

In order to build your own vector search application with Weaviate, you need a place to run a Weaviate instance. You can run Weaviate on your local machine or with your cloud provider, or you can use the Weaviate Cluster Service (WCS). If you want to run Weaviate yourself using Docker Compose or Kubernetes, follow the [installation steps in Weaviate's documentation](https://weaviate.io/developers/weaviate/current/getting-started/installation.html) to get a `docker-compose.yml` file and learn how to start it up. In this tutorial, we use WCS to start a Weaviate instance and interact with it using the Python client and the console.

**Exercise 1: Create a WCS account on https://console.semi.technology/.**

There are two ways to create a cluster once you have an account: directly in the Console, or from your code with the Weaviate Python Client. We use the latter option here. First we need to install the Weaviate Python Client in the environment; then we can use the `WCS` class to log in.

In [None]:
import sys
!{sys.executable} -m pip install weaviate-client==3.4.2

Now let's import the package and log in with your WCS account.

In [None]:
from getpass import getpass # hide password
import weaviate # to communicate to the Weaviate instance
from weaviate.wcs import WCS

To authenticate to WCS or to a Weaviate instance (if the Weaviate instance has authentication enabled), we need to create an authentication object. At the moment, two types of authentication credentials are supported: 
* Password credentials: `weaviate.auth.AuthClientPassword(username='WCS_ACCOUNT_EMAIL', password='WCS_ACCOUNT_PASSWORD')`
* Token credentials: `weaviate.auth.AuthClientCredentials(client_secret=YOUR_SECRET_TOKEN)`

For WCS we will use the Password credentials.  

In [None]:
my_credentials = weaviate.auth.AuthClientPassword(username=input("User name: "), password=getpass('Password: '))

The `my_credentials` object contains your credentials, so be careful not to make it public.
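To avoid typing your credentials, or accidentally saving them in the notebook, you could read them from environment variables and fall back to the prompts. The helper and the variable names `WCS_USER` and `WCS_PASSWORD` below are hypothetical choices, not something WCS or the client requires:

```python
import os
from getpass import getpass
from typing import Tuple


def read_wcs_credentials(user_var: str = "WCS_USER", password_var: str = "WCS_PASSWORD") -> Tuple[str, str]:
    """Return (username, password), preferring environment variables over interactive prompts."""
    username = os.environ.get(user_var) or input("User name: ")
    password = os.environ.get(password_var) or getpass("Password: ")
    return username, password
```

You would then pass the result on as before: `username, password = read_wcs_credentials()` followed by `my_credentials = weaviate.auth.AuthClientPassword(username=username, password=password)`.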

In [None]:
my_wcs = WCS(my_credentials)

Now that we are connected to WCS, we can create a cluster with `create`, delete one with `delete_cluster`, list clusters with `get_clusters`, get a cluster's configuration with `get_cluster_config`, and check the status of a cluster with the `is_ready` method. You can find additional documentation [here](https://weaviate-python-client.readthedocs.io/en/latest/weaviate.wcs.html).

*If you want to check the prototype and docstring of any methods in a notebook, run this command: `object.method?`. You can also use the `help()` function.*<br>
Ex: `WCS.is_ready?` or `my_wcs.is_ready?` or `help(WCS.is_ready)`.

When creating a Weaviate cluster, we also need to define which *vectorizer* (retriever) module to use. A list of available modules can be found [here](https://weaviate.io/developers/weaviate/current/modules/index.html). For this tutorial we use the [`text2vec-transformers` module](https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-transformers.html) to vectorize data and queries. The specific Transformers model is [sentence-transformers-paraphrase-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2), which is available on HuggingFace. 

This model is a vectorizer (also called retriever or encoder) model, which calculates vectors from data. We can extend the search pipeline with a *reader* model. Reader models are called after a *retriever* model has retrieved a list of candidate search results for the query. An example of a reader model is a Question Answering model, which tries to find the answer to a question posed in the query within the candidate data points returned by the retriever model. In this tutorial, we add such a Question Answering model to the pipeline: the pretrained [bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad) model, which is also available on HuggingFace. 

Alternatively, you can use the `text2vec-contextionary` module or one of the other `text2vec-transformers` models to vectorize text. Keep in mind that the free 'sandbox' Weaviate cluster that you get from WCS runs on CPU only, so Transformer models can be slow to import and vectorize data; the `text2vec-contextionary` module vectorizes text faster. If you want to experiment with `text2vec-transformers` models, we advise running Weaviate locally or in your own cloud with a GPU enabled. You can change the `modules` config parameter accordingly (available models can be found [here](https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/text2vec-transformers.html)). Some examples follow.

To add a spellcheck module alongside the Question Answering module and the Transformer retriever module:

```python
modules = [{
    "name": "text2vec-transformers", 
    "tag": "sentence-transformers-paraphrase-MiniLM-L6-v2"
    }, { 
    "name": "qna-transformers",
    "tag": "bert-large-uncased-whole-word-masking-finetuned-squad"
    }, { 
    "name": "text-spellcheck",
    "tag": "pyspellchecker-en"
    }]
```

or use the light-weight contextionary:

```python
modules = [{
    "name": "text2vec-contextionary",
    "tag": "en0.16.0-v1.0.0"
    }]
```

or use the contextionary in another language than English, for example Dutch:

```python
modules = [{
    "name": "text2vec-contextionary",
    "tag": "nl0.16.0-v1.0.2"
    }]
```   
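If you switch between these configurations often, a small helper can assemble the `modules` list for you. `build_modules` below is a hypothetical convenience function; the module names and tags are copied from the examples above.

```python
from typing import Dict, List

# Module names and tags taken from the examples above
RETRIEVER = {"name": "text2vec-transformers", "tag": "sentence-transformers-paraphrase-MiniLM-L6-v2"}
QNA = {"name": "qna-transformers", "tag": "bert-large-uncased-whole-word-masking-finetuned-squad"}
SPELLCHECK = {"name": "text-spellcheck", "tag": "pyspellchecker-en"}


def build_modules(with_qna: bool = True, with_spellcheck: bool = False) -> List[Dict[str, str]]:
    """Return the transformers retriever plus the optional reader and spellcheck modules."""
    modules = [dict(RETRIEVER)]  # copies, so callers cannot mutate the constants
    if with_qna:
        modules.append(dict(QNA))
    if with_spellcheck:
        modules.append(dict(SPELLCHECK))
    return modules
```

For example, `modules = build_modules(with_spellcheck=True)` reproduces the spellcheck configuration.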

In [None]:
with_auth = False # set to True if you don't want your instance to be publicly available
cluster_name = None # you can change this to some name you like, if kept to None a random name will be generated
modules = [{
    "name": "text2vec-transformers", 
    "tag": "sentence-transformers-paraphrase-MiniLM-L6-v2"
    }, { 
    "name": "qna-transformers",
    "tag": "bert-large-uncased-whole-word-masking-finetuned-squad"
    }] 

weaviate_url = my_wcs.create(cluster_name, with_auth=with_auth, modules=modules, wait_for_completion=True) 

weaviate_url

**Congratulations! You have your first Weaviate cluster running!**

You can check the cluster at the given URL: the meta information is at `<weaviate_url>/v1/meta`, its data schema at `<weaviate_url>/v1/schema` and the data objects at `<weaviate_url>/v1/objects`. You can also see this cluster on your homepage on [console.semi.technology](https://console.semi.technology) (when you are logged in). 
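These REST endpoints can also be read from Python with only the standard library. A minimal sketch, assuming the cluster was created with `with_auth=False` (the request sends no credentials); `rest_endpoints` and `fetch_json` are hypothetical helper names:

```python
import json
import urllib.request
from typing import Dict


def rest_endpoints(weaviate_url: str) -> Dict[str, str]:
    """Build the meta, schema and objects URLs for a Weaviate instance."""
    base = weaviate_url.rstrip("/")
    return {name: f"{base}/v1/{name}" for name in ("meta", "schema", "objects")}


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (no authentication is sent)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, `fetch_json(rest_endpoints(weaviate_url)["meta"])` returns the meta information as a dict.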

We need to connect to the cluster with the Python client in order to create a data schema, import data and query data:

In [None]:
client = weaviate.Client(weaviate_url)
client.is_ready()

### 2.2 Get data and analyze it

We set up a Weaviate instance, connected to it and have it ready for requests. Now it is time to take a step back and get some data and analyze it. 

As with any machine learning application, this step is the most important one. Here we have to decide what is relevant, what is important and which data structures and types to use.

In this example we are going to use news articles, which are pieces of unstructured textual data. For this we are going to need the `newspaper3k` package:

In [None]:
!{sys.executable} -m pip install newspaper3k

In the next code block we define a function that retrieves articles from a newspaper. It returns a list of `objects`, each consisting of the article's `title`, `summary`, `authors`, `word_count` and a UUID3 `id` generated from the article's URL.

In [None]:
import nltk # it is a dependency of newspaper3k
nltk.download('punkt')

In [None]:
import newspaper
import uuid
import json
from tqdm import tqdm

def get_articles_from_newspaper(
        news_url: str,
        max_articles: int = 100
    ) -> list:
    """
    Download newspaper articles and convert them to dicts ready for Weaviate.

    Parameters
    ----------
    news_url : str
        URL of the newspaper's home page, e.g. 'https://www.wired.com/'.
    max_articles : int, optional
        Maximum number of articles to return, by default 100.

    Returns
    -------
    list of dict
        One dict per article with the keys 'id', 'title', 'summary',
        'authors' and 'word_count'.
    """

    objects = []

    # Build the newspaper source; memoize_articles=False also returns articles seen in earlier runs
    news_builder = newspaper.build(news_url, memoize_articles=False)

    # Never try to return more articles than the source contains
    max_articles = min(max_articles, news_builder.size())
    pbar = tqdm(total=max_articles)
    pbar.set_description(f"{news_url}")
    i = 0
    while len(objects) < max_articles and i < news_builder.size():
        article = news_builder.articles[i]
        try:
            article.download()
            article.parse()
            article.nlp()  # computes article.summary (needs the nltk 'punkt' data)

            # keep only articles with a non-empty title, a non-empty summary and at least one author
            if article.title and article.summary and article.authors:

                # create a deterministic UUID for the article from its URL
                article_id = uuid.uuid3(uuid.NAMESPACE_DNS, article.url)

                # create the object
                objects.append({
                    'id': str(article_id),
                    'title': article.title,
                    'summary': article.summary,
                    'authors': article.authors,
                    'word_count': len(article.summary.split())
                })

                pbar.update(1)

        except Exception:
            # something went wrong while downloading or parsing the article; skip it
            pass
        i += 1
    pbar.close()
    return objects

**Exercise 2: Let's download articles from some newspapers.** First, we initialize a data list, then you can add newspapers to it, using the function we defined above, like:
`data += get_articles_from_newspaper('http://cnn.com')`

In [None]:
data = []
data += get_articles_from_newspaper('https://www.wired.com/')

# for example: 
# data += get_articles_from_newspaper('https://www.wired.com/')
# add more newspapers if you like! 
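Because each `id` is a UUID3 derived from the article's URL, the same URL always produces the same `id`. If you scrape several newspapers, or the same one twice, `data` may therefore contain duplicates, which you can drop before uploading. `deduplicate_articles` is a hypothetical helper, not part of the workshop code:

```python
import uuid
from typing import List


def deduplicate_articles(articles: List[dict]) -> List[dict]:
    """Keep the first occurrence of every article id, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        if article["id"] not in seen:
            seen.add(article["id"])
            unique.append(article)
    return unique


# uuid3 is deterministic: the same namespace and URL always give the same UUID
url = "https://www.wired.com/story/example/"  # hypothetical URL
print(uuid.uuid3(uuid.NAMESPACE_DNS, url) == uuid.uuid3(uuid.NAMESPACE_DNS, url))
```

Running `data = deduplicate_articles(data)` before the upload keeps one copy of each article.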

### 2.3 Make a Weaviate data schema

We have some data downloaded. Before we can upload it to Weaviate, we need to create a [data schema](https://weaviate.io/developers/weaviate/current/data-schema/index.html). Adding a Weaviate data schema is not a requirement; if you don’t add a data schema, an automatic schema will be generated from the data that you import. But [defining a data schema yourself](https://weaviate.io/developers/weaviate/current/data-schema/schema-configuration.html) allows you to name classes, properties and relations between data entities yourself, so you have full control over the naming and structure, and also over the configuration of modules and data vectorizers. 

A data schema consists of `classes`. Each `class` has a name and `properties`. A `property` defines a data value that you can add to objects of that `class`. 

Let's define a minimal schema for our newspaper example. There will be three classes: `"Article"`, `"Author"` and `"Category"`. `"Author"` has the properties `"name"` and `"wroteArticles"`, and `"Category"` has one property, `"name"`. The schema can be visualized as follows:

![Shows a visualization of the data schema with object classes and relations](news-schema.png "Data schema visualization")

In the class schema below we create a class named `Article` with the description `An Article class to store the article summary and its authors`. The description explains to the user what this class is about.

We also define 5 properties: 
* `title` - The title of the article, of data type `string` (case sensitive),
* `summary` - The summary of the article, of data type `text` (case insensitive), 
* `wordCount` - The number of words in the summary of the article, of data type `int`, 
* `hasAuthors` - The authors of the article, of data type `Author`. `Author` is NOT a primitive data type; this is how we make cross-references between data items. This way you can link your data objects to each other and create a relation graph. In this case, we need to define the class `Author` as well. The list of primitive data types can be found [here](https://www.semi.technology/developers/weaviate/current/data-schema/datatypes.html).
* `hasCategory` - The category of the article, of data type `Category`. This is another cross-reference, so we need to define the class `Category` as well.

**NOTE 1:** Property names should always be in camelCase format and start with a lowercase letter.<br>
**NOTE 2:** The property data type is always a list because it can accept more than one data type. 
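Both notes are easy to check locally before sending a schema to Weaviate. The hypothetical `check_properties` helper below applies a simple regular expression for lowerCamelCase names (possibly stricter than what Weaviate itself enforces) and verifies that every `dataType` is a list:

```python
import re
from typing import List

# lowerCamelCase: starts with a lowercase letter, followed by letters or digits only
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_properties(properties: List[dict]) -> List[str]:
    """Return human-readable problems found in a list of property definitions."""
    problems = []
    for prop in properties:
        name = prop.get("name", "")
        if not CAMEL_CASE.match(name):
            problems.append(f"{name!r} should be camelCase and start with a lowercase letter")
        if not isinstance(prop.get("dataType"), list):
            problems.append(f"{name!r}: dataType should be a list, e.g. ['string']")
    return problems


print(check_properties([{"name": "word_count", "dataType": "int"}]))
```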

**Exercise 3: `"Article"` is already defined, it is now up to you to define the other two classes with the properties: `"Author"` has properties `"name"` and `"wroteArticles"`, `"Category"` has one property `"name"`.**

In [None]:
schema = {
    "classes": [
        {
            "class": "Article", # name of the class
            "description": "An Article class to store the article summary and its authors", # a description of what this class represents
            "properties": [ # class properties
                {
                    "name": "title",
                    "dataType": ["string"],
                    "description": "The title of the article", 
                },
                {
                    "name": "summary",
                    "dataType": ["text"],
                    "description": "The summary of the article",
                },
                {
                    "name": "wordCount",
                    "dataType": ["int"],
                    "description": "The number of words in the article's summary",
                },
                {
                    "name": "hasAuthors",
                    "dataType": ["Author"],
                    "description": "The authors this article has",
                },
                {
                    "name": "hasCategory",
                    "dataType": ["Category"],
                    "description": "The category of this article",
                }
            ]
        }, {
            "class": "Author",
            "description": "An Author class to store the author information",
            "properties": [
                {
                    "name": "name",
                    "dataType": ["string"],
                    "description": "The name of the author", 
                },
                {
                    "name": "wroteArticles",
                    "dataType": ["Article"],
                    "description": "The articles of the author", 
                }
            ]
        }, {
            "class": "Category",
            "description": "Genre of an article",
            "properties": [
                {
                    "name": "name",
                    "dataType": ["string"],
                    "description": "The name of the category", 
                }
            ]
        }
    ]
}
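Since cross-reference data types such as `Author` and `Category` must be defined as classes too, a quick local check can catch a missing class before `client.schema.create()` does. This sketch relies on the convention that Weaviate class names start with an uppercase letter while primitive data types are lowercase; `undefined_references` is a hypothetical helper:

```python
from typing import List


def undefined_references(schema: dict) -> List[str]:
    """List 'Class.property -> Target' for cross-references whose target class is not defined."""
    defined = {cls["class"] for cls in schema["classes"]}
    missing = []
    for cls in schema["classes"]:
        for prop in cls.get("properties", []):
            for data_type in prop["dataType"]:
                # uppercase first letter = class reference; lowercase = primitive type
                if data_type[:1].isupper() and data_type not in defined:
                    missing.append(f"{cls['class']}.{prop['name']} -> {data_type}")
    return missing


incomplete = {"classes": [{"class": "Article", "properties": [
    {"name": "title", "dataType": ["string"]},
    {"name": "hasAuthors", "dataType": ["Author"]},
]}]}
print(undefined_references(incomplete))
```

With the full schema defined above, `undefined_references(schema)` should return an empty list.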

Now that we decided on the data structure, we can tell Weaviate this schema. This can be done by accessing the `schema` attribute of the client. 

The schema can be created by using the [`.create()` method](https://weaviate-python-client.readthedocs.io/en/latest/weaviate.schema.html#weaviate.schema.Schema.create), this option creates multiple classes at once (useful if you have the whole schema). Alternatively you can use the [`.create_class()` method](https://weaviate-python-client.readthedocs.io/en/latest/weaviate.schema.html#weaviate.schema.Schema.create_class), this option creates only one class per call.

We can also check whether a schema, or a particular class, is present with the `.contains()` method.

For more about schema methods, see [the Python client docs](https://weaviate-python-client.readthedocs.io/en/latest/weaviate.schema.html#module-weaviate.schema) or [the general schema docs of Weaviate](https://weaviate.io/developers/weaviate/current/restful-api-references/schema.html).

In [None]:
client.schema.create(schema)

Let's get the schema from Weaviate and look at what was created.

In [None]:
# helper function
def prettify(json_dict): 
    print(json.dumps(json_dict, indent=2))

In [None]:
prettify(client.schema.get())

You can always delete your entire schema with `client.schema.delete_all()`.

### 2.4 Vectorize & upload data

In the previous steps we have preprocessed our data and have created a Weaviate data schema. Now we're ready to add `Articles` and `Authors`.

Data can be imported into Weaviate in two different ways:

1. Adding objects one by one. This can be done using the `data_object` attribute of the client. This option requires one REST request per object, so it is slower than importing data in batches.
2. Adding objects in batches, with the `Batch` class. This is efficient if you have large amounts of data to upload. The `Batch` class supports three ways of loading data in batches: a) manually, where the user has full control over when and how batches are added and created; b) auto-creating batches when they are full; c) auto-creating batches using dynamic batching, i.e. the batch size is adjusted every time a batch is created, to avoid timeout errors.

We will use the second method to upload our data. 
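Conceptually, batching splits the objects into groups and sends one request per group instead of one per object. The client handles this for you; the hypothetical `chunked` generator below only illustrates the idea behind the manual mode (option 2a):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 120 objects with batch_size=50 -> 3 requests instead of 120
print([len(batch) for batch in chunked(list(range(120)), 50)])
```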

First, we define the functions `add_article()`, `add_author()` and `add_references()`. These functions define how the data that we saved in our `data` list will be stored in Weaviate, using the schema we defined in the previous step. 

**Exercise 4: Finish the `add_author()` and `add_references()` functions, using the examples in the same code block.** 

In [None]:
from weaviate.batch import Batch # for the typing purposes
from weaviate.util import generate_uuid5


def add_article(batch: Batch, article_data: dict) -> str:
    
    article_object = {
        'title': article_data['title'],
        'wordCount': article_data['word_count'],
        'summary': article_data['summary'].replace('\n', '') # remove newline character
    }
    article_id = article_data['id']
    
    # add article to the batch
    batch.add_data_object( 
        data_object=article_object,
        class_name='Article',
        uuid=article_id
    )
    
    return article_id

def add_author(batch: Batch, author_name: str) -> str:
    
    author_object = {'name': author_name}

    # generate an UUID for the Author
    author_id = generate_uuid5(author_name)
    
    # add author to the batch
    batch.add_data_object( 
        data_object=author_object,
        class_name='Author',
        uuid=author_id
    )    
    return author_id

def add_references(batch: Batch, article_id: str, author_id: str)-> None:
    # add references to the batch
    ## Author -> Article
    batch.add_reference(
        from_object_uuid=author_id,
        from_object_class_name='Author',
        from_property_name='wroteArticles',
        to_object_uuid=article_id
    )
    
    ## Article -> Author 
    batch.add_reference(
        from_object_uuid=article_id,
        from_object_class_name='Article',
        from_property_name='hasAuthors',
        to_object_uuid=author_id
    )

Next, we use the client's `batch` object as a context manager to upload the data to Weaviate, with auto-create batching: objects and references are created automatically whenever a batch is full, and with `dynamic=True` the batch size is adjusted automatically instead of staying fixed at `batch_size=50`. 

In [None]:
client.batch.configure(batch_size=50, dynamic=True, callback=None)

with client.batch as batch:

    for i in data:

        # add article to the batch
        article_id = add_article(batch, i)

        for author in i['authors']:

            # add author to the batch
            author_id = add_author(batch, author)

            # add cross references to the batch
            add_references(batch, article_id=article_id, author_id=author_id)
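Before querying, it can help to know how many objects the upload should have produced. The hypothetical `expected_counts` helper below works this out from `data`, using the same identities as the code above: articles are keyed by `id`, authors by name (because `add_author()` derives the author UUID from the name), and `add_references()` adds two references per article-author pair. You can compare the numbers with the `Aggregate` counts later on; they assume every object was created successfully.

```python
from typing import Dict, List


def expected_counts(data: List[dict]) -> Dict[str, int]:
    """Expected numbers of Article objects, Author objects and references after the upload."""
    article_ids = {article["id"] for article in data}
    author_names = {author for article in data for author in article["authors"]}
    author_links = sum(len(article["authors"]) for article in data)
    return {
        "Article": len(article_ids),
        "Author": len(author_names),
        # one Author -> Article and one Article -> Author reference per pair
        "references": 2 * author_links,
    }


sample = [
    {"id": "1", "authors": ["Ada", "Grace"]},
    {"id": "2", "authors": ["Grace"]},
]
print(expected_counts(sample))
```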

If the data upload is successful, you are now ready to query the data!

### 2.5 Query data

If you've successfully run the previous cells, the newspaper data is now present in your Weaviate cluster, and you can start querying it.

**Exercise 5: Query Weaviate**. You can do this 1) via the GUI on console.semi.technology or 2) via the Python client.

#### 2.5.1 Query data using the GUI

You can send GraphQL queries to the dataset via [console.semi.technology](https://console.semi.technology). Log in and connect to the correct cluster on the dashboard via the 'connect' button. Then, go to 'Query' in the left-side menu, which opens a GraphiQL interface. Write queries in the left window; the results are displayed on the right. You can open the query docs by clicking the button in the top right corner. For more information about the queries, you can also visit the docs on the website: https://weaviate.io/developers/weaviate/current/graphql-references/index.html. 

Now, let's try the following queries. Feel free to adjust them and try out more, these are just some examples!

**1. Get a list of article names**
```graphql
{
  Get {
    Article {
      title
    }
  }
}
```

**2. Get the number of articles present**
```graphql
{
  Aggregate {
    Article {
      meta {
        count
      }
    }
  }
}
```

**3. Get the average, minimum and maximum word count of all articles**
```graphql
{
  Aggregate {
    Article {
      wordCount {
        minimum
        mean
        maximum
      }
    }
  }
}
```

**4. Get article titles and cross-ref authors**
```graphql
{
  Get {
    Article {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}
```

**5. Get the title, id and vector of the first article**
```graphql
{
  Get {
    Article(limit: 1) {
      title
      _additional {
        id
        vector
      }
    }
  }
}
```

**6. Semantic search: get articles that are near "animal" and return the title, summary and semantic certainty**
```graphql
{
  Get {
    Article (
      nearText: {
        concepts: ["animal"]
      }
    ) {
      title
      summary
      _additional {
        certainty
      }
    }
  }
}
```

**7. Semantic search: get articles that are near "music" and return the title, summary and semantic certainty, if certainty is above 0.8**
```graphql
{
  Get {
    Article (
      nearText: {
        concepts: ["music"],
        certainty: 0.8
      }
    ) {
      title
      summary
      _additional {
        certainty
      }
    }
  }
}
```

**8. You can also combine semantic and scalar filters:**
```graphql
{
  Get {
    Article (
      where: {
        operands: [{
          path: ["hasAuthors"],
          valueInt: 2
          operator: GreaterThan
        }, {
          path: ["wordCount"],
          valueInt: 100
          operator: GreaterThan
        }], 
        operator: And
      }
      nearText: {
        concepts: ["animals"]
      }
    ) {
      title
      wordCount
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}
```

**9. Use the Question Answering module:**
```graphql
{
  Get {
    Article(
      ask: {
        question: "Which earplugs should I buy?",
        rerank: true,
        properties: ["summary"]
      }
      limit: 1
    ) {
      title
      summary
      _additional {
        answer {
          result
        }
      }
    }
  }
}
```

#### 2.5.2 Query data using Python

You can also use all GraphQL functions with the Weaviate Python client. For reference, [check the docs](https://weaviate.io/developers/weaviate/current/graphql-references/index.html). Some examples:


**1. Get a list of article names**

In [None]:
result = client.query.get("Article", ["title"]).do()
prettify(result)

**2. Get the number of articles present**

In [None]:
result = client.query.aggregate("Article") \
    .with_fields('meta { count }') \
    .do()
prettify(result)

**3. Get the average, minimum and maximum word count of all articles**


In [None]:
result = client.query.aggregate("Article") \
    .with_fields("wordCount {maximum mean minimum}") \
    .do()
prettify(result)
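To sanity-check the aggregation, you can compute the same statistics locally from the `data` list (where the field is called `word_count` rather than `wordCount`). The hypothetical `word_count_stats` helper mirrors the `minimum`, `mean` and `maximum` fields; the values should be close if every article in `data` was uploaded:

```python
import statistics
from typing import Dict, List


def word_count_stats(data: List[dict]) -> Dict[str, float]:
    """Minimum, mean and maximum of the 'word_count' field over all articles."""
    counts = [article["word_count"] for article in data]
    return {"minimum": min(counts), "mean": statistics.mean(counts), "maximum": max(counts)}


print(word_count_stats([{"word_count": 80}, {"word_count": 120}, {"word_count": 130}]))
```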

**4. Get article titles and cross-ref authors**

In [None]:
query = "{Get {Article {title hasAuthors {... on Author {name }}}}}"

result = client.query.raw(query)

prettify(result)

**5. Get the title, id and vector of the first article**

In [None]:
result = client.query.get("Article", ["title", "_additional {id vector} "]).with_limit(1).do()
prettify(result)

**6. Semantic search: get articles that are near "animal" and return the title, summary and semantic certainty**

In [None]:
nearText = {
  "concepts": ["animal"]
}

client.query.get("Article", ["title", "summary", "_additional {certainty} "]).with_near_text(nearText).do()
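The client returns the raw GraphQL response as nested dictionaries (`data` → `Get` → class name → list of objects). A small helper, hypothetical and not part of the client, can flatten it into `(title, certainty)` pairs and surface GraphQL errors, which are reported under an `errors` key:

```python
from typing import List, Tuple


def titles_with_certainty(result: dict, class_name: str = "Article") -> List[Tuple[str, float]]:
    """Extract (title, certainty) pairs from a Get query that requested _additional {certainty}."""
    if result.get("errors"):
        raise RuntimeError(result["errors"])
    objects = result["data"]["Get"][class_name] or []
    return [(obj["title"], obj["_additional"]["certainty"]) for obj in objects]
```

For example: `titles_with_certainty(client.query.get("Article", ["title", "_additional {certainty}"]).with_near_text(nearText).do())`.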

**7. Semantic search: get articles that are near "music" and return the title, summary and semantic certainty, if certainty is above 0.8**

In [None]:
nearText = {
  "concepts": ["music"],
  "certainty": 0.8
}

client.query.get("Article", ["title", "_additional {certainty} "]).with_near_text(nearText).do()

**8. You can also combine semantic and scalar filters:**

In [None]:
query = """{
  Get {
    Article (
      where: {
        operands: [{
          path: ["hasAuthors"],
          valueInt: 2
          operator: GreaterThan
        }, {
          path: ["wordCount"],
          valueInt: 100
          operator: GreaterThan
        }], 
        operator: And
      }
      nearText: {
        concepts: ["animals"]
      }
    ) {
      title
      wordCount
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}"""
result = client.query.raw(query)
prettify(result)
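Instead of a raw query string, the Python client also accepts a `where` filter as a dictionary via `.with_where()`. The hypothetical helpers below build the same filter as the query above; only the dictionary layout (`path`, `operator`, `valueInt`, `operands`) follows Weaviate's filter syntax:

```python
from typing import List


def greater_than(path: List[str], value: int) -> dict:
    """A single GreaterThan condition on an integer value."""
    return {"path": path, "operator": "GreaterThan", "valueInt": value}


def all_of(*operands: dict) -> dict:
    """Combine conditions so that all of them must hold."""
    return {"operator": "And", "operands": list(operands)}


where_filter = all_of(
    greater_than(["hasAuthors"], 2),
    greater_than(["wordCount"], 100),
)
```

You could then run `client.query.get("Article", ["title", "wordCount", "hasAuthors {... on Author {name}}"]).with_where(where_filter).with_near_text({"concepts": ["animals"]}).do()`.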

**9. Use the Question Answering module:**

In [None]:
ask = {
  "question": "Which earplugs should I buy?",
  "rerank": True,
  "properties": ["summary"]
}

client.query.get("Article", ["title", "_additional {answer {hasAnswer certainty property result startPosition endPosition} }"]).with_ask(ask).with_limit(1).do()
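The answer sits inside `_additional { answer { ... } }`, and `hasAnswer` tells you whether the module found one. A hypothetical helper to pull out the first answer from a response shaped like the one this query returns:

```python
from typing import Optional


def extract_answer(result: dict, class_name: str = "Article") -> Optional[str]:
    """Return the first answer found by the Question Answering module, or None."""
    for obj in result["data"]["Get"][class_name] or []:
        answer = obj["_additional"]["answer"]
        if answer.get("hasAnswer"):
            return answer["result"]
    return None
```

To use it, assign the query's output first, e.g. `result = client.query.get(...).with_ask(ask).with_limit(1).do()`, then call `extract_answer(result)`.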


Congratulations! Now you know how to start a Weaviate cluster and use the Python client to upload and query data! 

We've set up a very basic vector search application. For more features, check out the documentation. For example, you can read more about [Question Answering](https://weaviate.io/developers/weaviate/current/reader-generator-modules/qna-transformers.html), do [multi-modal search](https://weaviate.io/developers/weaviate/current/retriever-vectorizer-modules/multi2vec-clip.html), start automatic [classifications](https://weaviate.io/developers/weaviate/current/restful-api-references/classification.html), and much more! 

You can join the [Weaviate Slack channel](https://join.slack.com/t/weaviate/shared_invite/zt-goaoifjr-o8FuVz9b1HLzhlUfyfddhw) to interact with other Weaviate users and ask questions or give feedback :) 

## 3. Bonus: Automatic data classification

In the schema, we added the class "Category" and the property "hasCategory" in "Article", which points to this class. The articles are not yet assigned to a category: these references are still empty. You can check this with the following query: 
```graphql
{
  Get {
    Article {
      title 
      hasCategory {
        ... on Category {
          name
        }
      }
    }
  }
}
```

You will see that the name of the category will be `null` for all articles. 

We can use zero-shot classification to automatically assign articles to a category. For more information, check [the docs](https://weaviate.io/developers/weaviate/current/restful-api-references/classification.html#zero-shot-classification).

**Exercise 6: Zero-shot classification to assign categories to articles** 

Follow the steps below. 


First, define some categories. You can choose which categories you want to add. Keep in mind that with this simple form of classification, every article will be assigned to one of the categories (no article will be left unassigned). Categories can be, for example, "Business", "Sports", "Environment", etc. 

In [None]:
# define the categories, add as many as you want!
# categories = ["Business", "Arts", "Technology", "..."]

categories = ["Business", "Arts", "Animals", "Government", "Finance", "Religion", "Sport", "Lifestyle", "Crime", "General", "Environment", "Health", "Entertainment", "Education", "World", "Technology"]

In [None]:
# add the categories as data objects
for category in categories:
    client.batch.add_data_object({"name": category}, "Category", generate_uuid5(category))
client.batch.create_objects()

In [None]:
# perform the classification
client.classification.schedule()\
            .with_type("zeroshot")\
            .with_class_name("Article")\
            .with_classify_properties(["hasCategory"])\
            .with_based_on_properties(["summary"])\
            .with_wait_for_completion()\
            .do()

Once the classification is complete and successful, you can check if your articles now have a category assigned, using a GraphQL query.
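For a quick overview, you could run the category query from the start of this section through the Python client and count articles per category. `category_counts` is a hypothetical helper; articles that still have no category are counted under `None`. Note that `Get` returns a limited number of objects by default, so add `.with_limit(...)` if you have many articles.

```python
from collections import Counter


def category_counts(result: dict) -> Counter:
    """Count articles per assigned category name (None = not classified yet)."""
    counts: Counter = Counter()
    for article in result["data"]["Get"]["Article"] or []:
        categories = article.get("hasCategory") or []
        if not categories:
            counts[None] += 1
        for category in categories:
            counts[category["name"]] += 1
    return counts
```

For example: `category_counts(client.query.get("Article", ["title", "hasCategory {... on Category {name}}"]).with_limit(100).do())`.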