This directory contains an example of data preprocessing using Beautiful Soup.
preprocess_openmp_faq.py
: Python script that reads the OpenMP FAQ from an HTML file, and writes the output to standard output. The output is in JSONL format, with each question-answer pair on a single line.
openmp_faq.html
: HTML file containing the OpenMP FAQ.
openmp_faq.jsonl
: JSONL file containing the OpenMP FAQ in a structured format.
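
To illustrate the "one question-answer pair per line" layout, here is a minimal sketch that uses only the Python standard library. It is not the code from preprocess_openmp_faq.py. The field names `question` and `answer`, the helper names `write_jsonl` and `read_jsonl`, and the sample pair are all assumptions made for illustration, so the actual file may use different keys.

```python
import io
import json


def write_jsonl(pairs, stream):
    """Write each (question, answer) pair as one JSON object per line."""
    for question, answer in pairs:
        # "question" and "answer" are assumed field names, not taken from the script.
        record = {"question": question, "answer": answer}
        # json.dumps escapes embedded newlines, so each record stays on one line.
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(stream):
    """Parse a JSONL stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical sample data, for illustration only.
sample_pairs = [
    ("What is OpenMP?", "An API for shared-memory\nparallel programming."),
]

buffer = io.StringIO()
write_jsonl(sample_pairs, buffer)
buffer.seek(0)
records = read_jsonl(buffer)
```

Because `read_jsonl` accepts any text stream, the same helper could be pointed at an open file handle for openmp_faq.jsonl to inspect the real records and their actual keys.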