Skip to content
Dual-specification query synthesis with natural language and table sketch queries
Python TSQL JavaScript HTML SQLPL Dockerfile Other
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data
duoquest
enum
results
squid
task_db
web
.dockerignore
.gitignore
.gitmodules
Dockerfile
LICENSE
README.md
cli.py
config.ini.example
docker_cfg.ini
eval.py
experiments.py
main.py
nltk_download.py
profile_spider.py
requirements.txt
squid.py

README.md

duoquest

Dual-specification query synthesis with natural language and table sketch queries.

Quickstart

Dependencies:

TODO: Simplify this process with Docker compose - cannot do this until issue with GPUS is resolved.

  1. Set the variable correctly for the most recent version.
export DQ_VERSION=0.1
  1. Start the Docker network (dq-net).
docker network create dq-net
  1. Run the data container (dq-data).
docker run --rm -dit --name dq-data -v dq-vol:/home/data chrisjbaik/duoquest-data:$DQ_VERSION
  1. Run the Enumerator (dq-enum) using one of the following instructions.
docker run --rm --gpus all -dit --name dq-enum --network dq-net -v dq-vol:/workspace/data chrisjbaik/duoquest-enum:$DQ_VERSION

# Add `--toy` for fast-starting debugging mode with decreased performance
docker run --rm --gpus all -dit --name dq-enum --network dq-net -v dq-vol:/workspace/data chrisjbaik/duoquest-enum:$DQ_VERSION --toy
  1. Run the task database container (dq-task-db).
docker run --rm -dit --name dq-task-db --network dq-net chrisjbaik/duoquest-task-db:$DQ_VERSION
  1. Run the autocomplete container (dq-autocomplete).
docker run --rm -dit --name dq-autocomplete --network dq-net redis
  1. Run the web interface container (dq-web). Note the -p option, where the first port number indicates which port the web interface will run on the host machine. Also note the WORKERS_PER_CORE option, which determines how many workers will run for the web server (we set it to 0.1 because we have a 32-core server).
docker run --rm -dit -p 5000:80 -e WORKERS_PER_CORE="0.1" --name dq-web --network dq-net -v dq-vol:/home/data chrisjbaik/duoquest-web:$DQ_VERSION
  1. Run the main container (dq-main). The --timeout flag indicates how many seconds each task will run before giving up.
docker run --rm -dit --name dq-main --network dq-net -v dq-vol:/home/data chrisjbaik/duoquest-main:$DQ_VERSION --timeout=60

Simulation Experiments

Run

Follow the instructions in steps 1-7 under Quickstart above. Instead of starting the dq-main container with the default entrypoint, we run simulation experiments using the following command:

docker run --rm -dit --name dq-main --network dq-net -v dq-vol:/home/data --entrypoint="python" chrisjbaik/duoquest-main experiments.py spider dev default

The last 3 arguments indicate the dataset, subset of dataset, and type of evaluation (default, partial, minimal, nlq_only, tsq_only, chain), respectively.

If dq-main is already running, shut down and remove that container using docker stop and docker rm if needed to ensure this container can run.

If you want to view the experiment progress in real-time, use the following:

docker logs -f dq-main

The results will automatically be saved in a results folder within the shared volume dq-vol.

Result Summary/Analysis

The following container can be executed to generate a viewable result summary/analysis after running a simulation experiment:

docker run --rm -it --name dq-eval --network dq-net -v dq-vol:/home/data --entrypoint="python" chrisjbaik/duoquest-main eval.py spider dev default

Note that all arguments (including any additional arguments unmentioned above, like --timeout) must exactly match the arguments executed when running the dq-main container for running the experiment!

Task Database Schema

The task database has the following schema:

Tasks

Column Name Type Description
tid text task id
db text database name
nlq text natural language query
nlq_with_literals text raw NLQ including tag markup
tsq_proto blob table sketch query protobuf
literals_proto blob literals protobuf
status text waiting, running, done, or error
time integer timestamp for task submission time
error_msg text error message, if any

Databases

Column Name Type Description
name text database name
path text database path in file system
schema_proto blob schema protobuf

Results

Column Name Type Description
rid integer primary key/unique id for result
tid text foreign key to task id
query text candidate SQL query

Build Process for Docker Images (Development Only)

  1. Save the version number as a variable.
export DQ_VERSION=<version_number_here>
  1. Download Spider dataset and mas_smallest.sqlite into data/ folder.

  2. Build data container.

docker build -t chrisjbaik/duoquest-data:$DQ_VERSION data/
  1. Load/build Enumerator image.
cd enum/syntaxSQL
git submodule init        # only if submodule not initialized yet
git submodule update      # only if submodule not initialized yet
docker build -t chrisjbaik/duoquest-enum:$DQ_VERSION .
cd ../../
  1. Build task database image.
docker build -t chrisjbaik/duoquest-task-db:$DQ_VERSION -f task_db/Dockerfile .
  1. Build web interface image.
docker build -t chrisjbaik/duoquest-web:$DQ_VERSION -f web/Dockerfile .
  1. Build main image.
docker build -t chrisjbaik/duoquest-main:$DQ_VERSION .
You can’t perform that action at this time.