# Using ChatGPT in Jupyter Notebook

As a trainer, you might have prepared a set of questions to ask your students to make the session more interactive. Thinking up front about what to ask is a valuable skill that pays off during the training, because you don't need to invent questions on the fly.

You might want to use ChatGPT to generate answers to these questions. This can help you brainstorm, come up with more questions, and get a better understanding of the topic.

This notebook shows how to do this.

**Setup**

The first cell below sets up the required environment: it installs the required libraries and defines a function to generate answers. Fetch your OPENAI_API_KEY from the OpenAI dashboard and paste it into a new file called '.env' in the same directory as this notebook. The file should contain a single line like this:

```bash
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
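
Note that `os.getenv` only sees variables that are already in the notebook's process environment, so the '.env' file has to be loaded at some point. Here is a minimal sketch of that step, using only the standard library and assuming the file contains simple `KEY=VALUE` lines. `load_env_file` is a hypothetical helper name, not part of any library; the python-dotenv package's `load_dotenv()` covers the same ground more robustly.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip("'\"")
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


load_env_file()
# Report whether the key is visible without printing the secret itself.
print("OPENAI_API_KEY set:", os.getenv("OPENAI_API_KEY") is not None)
```

If the last line prints `False`, check that the '.env' file sits next to the notebook and that the variable name is spelled exactly `OPENAI_API_KEY`.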

In [None]:
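%pip install "openai<1" python-dotenv
import os

import openai
from dotenv import load_dotenv

# Read OPENAI_API_KEY from the .env file next to this notebook.
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Suffix appended to every question to steer the format and context of the answer.
start_sequence = " Keep it short with a bulleted list. The context for this question is Microsoft Azure technology."

def askGPT(question):
    """Send the question plus the suffix above to the model and print the answer."""
    # openai.Completion exists only in openai releases before 1.0, hence the version pin.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=question + start_sequence,
        temperature=0.7,  # some randomness, so answers vary between runs
        max_tokens=300,   # upper bound on the length of the answer
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    print(response.choices[0].text)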

**Here's a list of questions that I prepared for a particular topic (introduction to data engineering on Azure):**

- Share an example of structured/unstructured/semi-structured data. 
  [🔗](https://learn.microsoft.com/en-us/training/modules/choose-storage-approach-in-azure/2-classify-data)
- Is a Parquet file an example of structured, unstructured, or semi-structured data?
  [🔗](https://www.youtube.com/watch?v=auNAzC3AU18&t=152s)
- What's the difference between data integration/data transformation/data consolidation?
  [🔗](https://cyclr.com/blog/data-consolidation-through-integration)
- Why would I use a programming language like Python to transform data? 
  [🔗](https://www.simplilearn.com/why-python-is-essential-for-data-analysis-article#why_is_python_essential_for_data_analysis)
- Why would I consider Spark? 
  [🔗](https://spark.apache.org/)
- Why would I consider using a data lake?
  [🔗](https://learn.microsoft.com/en-us/azure/architecture/data-guide/scenarios/data-lake)
- What is the difference between a data lake and a data warehouse? 
  [🔗](https://learn.microsoft.com/en-us/azure/architecture/data-guide/scenarios/data-lake#when-to-use-a-data-lake)
- Why is it a bad idea to mix reporting/analytics and OLTP databases?
  [🔗](https://learn.microsoft.com/en-us/azure/architecture/data-guide/relational-data/online-transaction-processing#challenges)
  [🔗](https://learn.microsoft.com/en-us/azure/architecture/data-guide/relational-data/online-analytical-processing#challenges)
- What is the difference between Azure Synapse Analytics Pipelines and Azure Data Factory?
  [🔗](https://learn.microsoft.com/en-us/azure/synapse-analytics/data-integration/concepts-data-factory-differences)

Below, I have a cell ready to use for each question. You can copy/paste the question into the cell and run it to get an answer.

In [2]:
askGPT("Why is it a bad idea to mix reporting/analytics and OLTP databases?")



-Reporting and analytics databases are optimized for read-heavy workloads, while OLTP databases are optimized for write-heavy workloads.
Mixing them together can lead to performance issues and contention.
-OLTP databases are designed to handle many transactions quickly, while reporting and analytics databases are designed to store large volumes of data for longer periods of time.
-OLTP databases are usually optimized for short transactions and have more stringent security requirements, while reporting and analytics databases are optimized for longer transactions and have less stringent security requirements.
-Mixing the two databases can lead to data integrity issues, as well as increased complexity for the administrators.
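
If copying questions into cells one by one gets tedious, the whole list can be run in a loop. The sketch below is one way to do that, not part of the original notebook: `ask_all` and `fake_ask` are hypothetical names, and `fake_ask` is an offline stand-in so the example runs without an API key. Because `askGPT` prints its answer rather than returning it, using this with the real model assumes a variant of `askGPT` that returns `response.choices[0].text`.

```python
def ask_all(questions, ask):
    """Call `ask` once per question and collect the answers (hypothetical helper)."""
    answers = {}
    for question in questions:
        answers[question] = ask(question)
    return answers


def fake_ask(question):
    # Offline stand-in so the sketch runs without an API key or network access.
    return f"(answer to: {question})"


questions = [
    "Why would I consider Spark?",
    "Why would I consider using a data lake?",
]

# Print each question followed by its answer.
for question, answer in ask_all(questions, fake_ask).items():
    print(f"Q: {question}\n{answer}\n")
```

Passing the asking function in as a parameter keeps the loop independent of the API, which makes it easy to try out before spending any tokens.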
