This is the repository for the LinkedIn Learning course OpenAI API: Multimodal development with GPT-4o. The full course is available from LinkedIn Learning.
In this hands-on course, you'll use the OpenAI API to leverage the multimodal capabilities of GPT-4o and function calling to extract text from images, conform the data to JSON, and call functions to save the extracted data to a spreadsheet.
See the readme file in the main branch for updated instructions and information.
This repository holds example data and two Jupyter Notebooks:

- `data/` holds a collection of images of random receipts and one wild-card.
- `expenses.csv` is the target CSV. At init, the CSV only holds column headings.
- `gp4o-setup.py` demonstrates how to access gpt-4o for multimodal prompting.
- `modular-process.py` and the module files in `utils/` demonstrate a comprehensive process of ingesting and interpreting multiple receipts and sending the data to a CSV file.
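The core flow described above can be sketched as follows. This is a minimal illustration, not the course's actual module code: it assumes the OpenAI Python SDK (v1+), an `OPENAI_API_KEY` in the environment, and hypothetical column names (`vendor`, `date`, `total`) — the real headings live in `expenses.csv`.

```python
import base64
import csv


def image_to_data_url(path: str) -> str:
    """Encode a local receipt image as a base64 data URL for the API."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"


def extract_receipt(path: str) -> str:
    """Ask gpt-4o to read one receipt image and return a JSON string."""
    from openai import OpenAI  # assumes the openai package is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # conform output to JSON
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Extract vendor, date, and total from this "
                                "receipt as JSON.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": image_to_data_url(path)},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content


def append_to_csv(row: dict, csv_path: str = "expenses.csv") -> None:
    """Append one extracted receipt to the target CSV (field names assumed)."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["vendor", "date", "total"])
        writer.writerow(row)
```

The notebooks break this same pipeline into modules under `utils/` and loop it over every image in `data/`.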
The first time you run a block in a Jupyter Notebook, you'll be asked to pick an environment. Follow the instructions and pick the first available Python environment.
NOTE: The first code block may take a while to load because the environment has to load first.
It is recommended that you run these exercise files in GitHub Codespaces. This gives you a pre-configured Python environment for running the Jupyter Notebooks. To use the exercise files, follow these steps:
- In the root folder, rename the file `env-template` to `.env`.
- Go to https://platform.openai.com/api-keys.
- Generate a new key and copy the key to your clipboard.
- In `.env`, add the key without quotes or parentheses.
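For context, loading the key from `.env` amounts to parsing `KEY=VALUE` lines into environment variables. The minimal sketch below shows what that involves; the course environment may instead rely on a library such as python-dotenv (`load_dotenv()`), which handles more edge cases.

```python
import os


def load_env_file(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Blank lines and #-comments are skipped; existing environment
    variables are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Once the key is in the environment, the OpenAI SDK picks up `OPENAI_API_KEY` automatically, so it never needs to appear in the notebook code itself.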