diff --git a/guides/structured-processing/load-doc.md b/guides/structured-processing/load-doc.md
index fb334d5..cebd34b 100644
--- a/guides/structured-processing/load-doc.md
+++ b/guides/structured-processing/load-doc.md
@@ -44,7 +44,7 @@ For this example, we'll use a sample document containing city demographic data t
 
 1. **Access the Sample Document**
    Open this Google Docs document containing city demographic data:
-   [https://docs.google.com/document/d/1_5KfhWL9fN3VuhANIVKuJX6MVBH3vFVW3RiUNYE9jHQ/edit](https://docs.google.com/document/d/1_5KfhWL9fN3VuhANIVKuJX6MVBH3vFVW3RiUNYE9jHQ)
+   [https://raw.githubusercontent.com/trustgraph-ai/example-data/main/cities/most-populous-cities.pdf](https://raw.githubusercontent.com/trustgraph-ai/example-data/main/cities/most-populous-cities.pdf)
 
 2. **Save as PDF**
    - In Google Docs, click **File** → **Download** → **PDF Document (.pdf)**
@@ -121,6 +121,12 @@ e.g. `Object extraction`.
 
 ## Step 3: Launch Document Processing
 
+When loading the document on the workbench, it helps to store the data
+in a dedicated collection so you can query it later. Open the session
+state popover at the top right and set the collection to `cities`.
+
+![Set collection option](set-collection.png)
+
 On the Library page, select your document containing city information,
 click 'Submit' at the bottom of the screen.
diff --git a/guides/structured-processing/load-file.md b/guides/structured-processing/load-file.md
index dafba77..eb3ef74 100644
--- a/guides/structured-processing/load-file.md
+++ b/guides/structured-processing/load-file.md
@@ -49,9 +49,9 @@ Before starting this guide, ensure you have:
 
 ## Data files you will need:
 
-- [UK pies](https://drive.google.com/file/d/1u0DzP5bu15sSwnHldpZTVXUNoVo5DzFQ/view?usp=sharing)
-- [French pies](https://drive.google.com/file/d/1xHBYLkrbB1NmJeeXNRlQUuQCQyPThuN-/view?usp=drive_link)
-- [Pies Structured Descriptor Language](https://drive.google.com/file/d/1ALuMuwRy8m_hUk2Y_ftFLHK44TwhNUv3/view?usp=drive_link)
+- [UK pies](https://raw.githubusercontent.com/trustgraph-ai/example-data/main/pies/uk-pies-simplified.xml)
+- [French pies](https://raw.githubusercontent.com/trustgraph-ai/example-data/main/pies/fr-pies-simplified.xml)
+- [Pies Structured Descriptor Language](https://raw.githubusercontent.com/trustgraph-ai/example-data/main/pies/pies-sdl.json)
 
 ## Step 1: Define a Schema
 
@@ -379,6 +379,10 @@ and collection load.
 
 ## Notes
 
+- At the time of writing, the prompts work well for XML processing;
+  we plan to optimise them for smaller models and to improve coverage
+  of other data types. For TrustGraph 1.3, we recommend sticking with
+  XML data.
 - You may find that the prompts are sensitive to different LLMs, and that
   you may see hallucinations or insensitivity to different data features.
 - XPath expressions have some incompatibilities and edge cases with
diff --git a/guides/structured-processing/nlp-query.png b/guides/structured-processing/nlp-query.png
new file mode 100644
index 0000000..d66641d
Binary files /dev/null and b/guides/structured-processing/nlp-query.png differ
diff --git a/guides/structured-processing/query.md b/guides/structured-processing/query.md
index a902d0d..c11d086 100644
--- a/guides/structured-processing/query.md
+++ b/guides/structured-processing/query.md
@@ -32,11 +32,46 @@ Before starting this guide, ensure you have:
 - Python 3.10 or later with the TrustGraph CLI tools installed (`pip install trustgraph-cli`)
 - Sample documents or structured data files to process
 
+## Workbench
+
+The Structured Query page in the workbench UI lets you run the
+queries from this guide. Before running them, make sure that:
+
+- You have set the collection parameter correctly in the session state
+  popover, top-right.
+- You have selected a flow with object processing enabled, e.g. the
+  `obj-ex` flow you created if you are following this guide.
+
+![NLP query](nlp-query.png)
+
+![Structured query](structured-query.png)
+
 ## NLP query operation
 
 This operation takes a natural language query, and uses an LLM prompt
 to convert to a GraphQL query. This uses defined schema, so you need
-to have the pies schema loaded:
+to have the schemas loaded in the previous guide steps.
+
+This is a building block for more complete functionality, but viewing
+the converted GraphQL queries is a useful way to check that your
+application is performing well.
+
+```bash
+tg-invoke-nlp-query -f obj-ex -q 'Cities with more than 22.8m people'
+```
+
+If successful, the output looks something like this:
+
+```
+Generated GraphQL Query:
+----------------------------------------
+query { cities(where: {population: {gt: 22800000}}) { city country population } }
+----------------------------------------
+Detected Schemas: cities
+Confidence: 95.00%
+```
+
+Querying the pies data:
 
 ```
 tg-invoke-nlp-query -f obj-ex \
@@ -59,6 +94,31 @@ Confidence: 95.00%
 
 This operation takes a GraphQL query, and executes it on the object store.
 
+Cities example:
+
+```
+tg-invoke-objects-query -f obj-ex --collection cities -q '
+{
+  cities(where: {population: {gt: 22800000}}) { city country population }
+}
+'
+```
+
+```
++-----------+------------+------------+
+| city      | country    | population |
++-----------+------------+------------+
+| Shanghai  | China      | 30482140   |
+| São Paulo | Brazil     | 22990007   |
+| Delhi     | India      | 34665569   |
+| Tokyo     | Japan      | 37036204   |
+| Dhaka     | Bangladesh | 24652864   |
+| Cairo     | Egypt      | 23074225   |
++-----------+------------+------------+
+```
+
+Pies example:
+
 ```
 tg-invoke-objects-query -f obj-ex \
     --collection uk-pies \
@@ -86,6 +146,28 @@ You can use `--format` to request CSV or JSON output.
 
 This is an API which uses the above two operations in sequence.
 
+Cities example:
+
+```
+tg-invoke-structured-query -f obj-ex --collection cities \
+    -q 'Cities with more than 22.8m people'
+```
+
+```
++-----------+------------+------------+
+| city      | country    | population |
++-----------+------------+------------+
+| Shanghai  | China      | 30482140   |
+| São Paulo | Brazil     | 22990007   |
+| Delhi     | India      | 34665569   |
+| Tokyo     | Japan      | 37036204   |
+| Dhaka     | Bangladesh | 24652864   |
+| Cairo     | Egypt      | 23074225   |
++-----------+------------+------------+
+```
+
+Pies example:
+
 ```
 tg-invoke-structured-query -f obj-ex \
     --collection uk-pies \
@@ -107,6 +189,9 @@ You can use `--format` to request CSV or JSON output.
 
 ## With collections
 
+Using the same schema with different collections lets you group data
+into separate sets, e.g. UK and French pies, and query each one:
+
 ```
 tg-invoke-structured-query -f obj-ex \
     --collection fr-pies \
diff --git a/guides/structured-processing/set-collection.png b/guides/structured-processing/set-collection.png
new file mode 100644
index 0000000..87df3e9
Binary files /dev/null and b/guides/structured-processing/set-collection.png differ
diff --git a/guides/structured-processing/structured-query.png b/guides/structured-processing/structured-query.png
new file mode 100644
index 0000000..9a71617
Binary files /dev/null and b/guides/structured-processing/structured-query.png differ
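Inspecting converted queries is one way to check that an application is performing well. A quick mechanical check of the results can also catch a mistranslated filter. The sketch below is a minimal example under stated assumptions. `check_min_population` is a hypothetical helper, not part of the TrustGraph CLI. It parses the ASCII table layout shown in the cities examples (`| city | country | population |`) and confirms that every returned row exceeds the population threshold used in the query. It assumes the table is printed to stdout in exactly that three-column form. CSV or JSON output requested with `--format` would need a different parser.

```shell
# Hypothetical helper, not part of the TrustGraph CLI. It reads the ASCII
# table printed by tg-invoke-objects-query / tg-invoke-structured-query on
# stdin and checks that every row's population is above a threshold.
# Assumes the "| city | country | population |" layout shown in query.md.
check_min_population() {
    min="$1"
    awk -F'|' -v min="$min" '
        # Skip "+---+" border lines and lines with too few columns.
        /^[+]/ || NF < 5 { next }
        {
            pop = $4
            gsub(/ /, "", pop)              # strip padding around the number
            if (pop == "population") next   # skip the header row
            city = $2
            gsub(/^ +| +$/, "", city)       # trim the city name
            if (pop + 0 > min + 0) {
                rows++
            } else {
                printf "FAIL: %s (%s)\n", city, pop
                bad = 1
            }
        }
        END {
            if (!bad) printf "OK: %d rows above %d\n", rows, min
            exit (bad ? 1 : 0)
        }'
}

# Example usage (adapted from the guide; assumes the table goes to stdout):
# tg-invoke-objects-query -f obj-ex --collection cities \
#     -q '{ cities(where: {population: {gt: 22800000}}) { city country population } }' \
#     | check_min_population 22800000
```

Plain `awk` keeps the check free of extra dependencies. The non-zero exit status on failure lets you drop the check into a script or CI step.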