Using LLMs for the Extraction and Normalization of Product Attribute Values

This repository contains code and data for experiments on attribute value extraction and normalization using large language models.

Requirements

We evaluate hosted LLMs, such as GPT-3.5 and GPT-4. Therefore, an OpenAI access tokens needs to be placed in a .env file at the root of the repository. To obtain this OpenAI access token, users must sign up for an OpenAI account.

Installation

The codebase requires python 3.9 To install dependencies we suggest to use a conda virtual environment:

conda create -n wdc-pave python=3.9
conda activate wdc-pave
pip install -r requirements.txt
pip install pieutils/

Dataset

The extraction and extraction&normalization WDC PAVE data can be found in the data/processed_datasets folder.

Prompts

We experiment with various prompt templates involving descriptions and example values, and adding demonstrations. The following figure shows the prompt structures for the two schema descriptions (black font for extraction, black + red font for extraction + normalization).

Execution

The prompts and the code to execute the prompts are defined in the folder prompts. You can run the prompts with the following scripts:

scripts/01_run_example_values_prompts.sh
scripts/02_run_prompts_with_training_data.sh
scripts/08_run_prompts_for_data_normalization.sh

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
analysis		analysis
data		data
pieutils		pieutils
preprocessing		preprocessing
prompts		prompts
resources		resources
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

analysis

analysis

data

data

pieutils

pieutils

preprocessing

preprocessing

prompts

prompts

resources

resources

scripts

scripts

src

src

.gitignore

.gitignore

README.md

README.md

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Using LLMs for the Extraction and Normalization of Product Attribute Values

Requirements

Installation

Dataset

Prompts

Execution

About

Releases

Packages

Languages

wbsg-uni-mannheim/wdc-pave

Folders and files

Latest commit

History

Repository files navigation

Using LLMs for the Extraction and Normalization of Product Attribute Values

Requirements

Installation

Dataset

Prompts

Execution

About

Resources

Stars

Watchers

Forks

Languages