Condenser hanging -- debugging options? #22

Open
SimonGoring opened this issue Jun 9, 2021 · 1 comment

@SimonGoring

I've created a public gist with my config file and a link to a dump of the database I'm applying condenser against. The issue I'm running into is that condenser appears to hang (over 24 hours with no new output on screen in verbose mode), and I'm not sure how to debug the issue or how to tell whether anything is actually happening.

I'm running condenser as part of a broader workflow through a bash script:

#!/bin/bash
#
# A bash script that uses `condenser` to export a database subset to a
# `localhost` database, and then dump that database and compress the dump
# into a tar file.
#
# Simon Goring - May 12, 2021
#

# First we check to see if the condenser files actually exist.
if [[ ! -f db_connect.py ]]
then
    echo "Condenser does not exist in the current directory."
    pip install toposort
    pip install psycopg2-binary
    pip install mysql-connector-python
    git clone --depth=1 git@github.com:TonicAI/condenser.git .
    # History expansion (`!$`) is disabled in scripts; the clone target is `.`
    rm -rf ./.git
fi

# Clone the repo
#
# Remove the .git directory
#rm -rf !$/.git

export PGPASSWORD='DATABASE PASSWORD'
psql -h localhost -U postgres -c "CREATE DATABASE export;"
echo "SELECT 'DROP SCHEMA '||nspname||' CASCADE; CREATE SCHEMA '||nspname||';' FROM pg_catalog.pg_namespace WHERE NOT nspname ~ '.*_.*'" | \
    psql -h localhost -d export -U postgres -t | \
    psql -h localhost -d export -U postgres
python3 direct_subset.py -v
echo "SELECT 'DROP SCHEMA '||nspname||' CASCADE;' FROM pg_catalog.pg_namespace WHERE nspname =ANY('{"ap","da","doi","ecg","emb","gen","ti","ts","tmp"}')" | \
    psql -h localhost -d export -U postgres -t | \
    psql -h localhost -d export -U postgres
now=`date +"%Y-%m-%d"`
mkdir -p dumps
mkdir -p archives
# `-h` must be followed directly by the host; `-o` was removed in PostgreSQL 12
pg_dump -Fc -O -h localhost -U postgres -v -d export > ./dumps/$1_dump_${now}.sql
tar -cvf ./archives/$1_dump_${now}.tar -C ./dumps $1_dump_${now}.sql
# -----------------------------------
# |  Clean up files and databases   |
# -----------------------------------
psql -h localhost -U postgres -c "DROP DATABASE export;"
rm ./dumps/$1_dump_${now}.sql
rmdir ./dumps

That's more an FYI about how we're trying to use it though. The key element is that we're just calling condenser with python3 direct_subset.py -v and the config file is linked above in the gist.

The goal of this issue is to note that there seems to be a point at which condenser is hanging, and to figure out a way to debug it so I can fix it.

@theaeolianmachine
Contributor

Hi @SimonGoring, do you have any details on the last log statements? I think adding additional logging would certainly be useful for debugging and determining where it's actually hanging.
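One way to find out where a seemingly hung Python process is sitting, without changing its logging, is the standard library's `faulthandler` module. The sketch below is not part of condenser; `enable_stack_dump` is a hypothetical helper, and calling it near the top of `direct_subset.py` is an assumption about where it would fit. It only works on Unix-like systems, where `SIGUSR1` is available.

```python
import faulthandler
import signal
import sys


def enable_stack_dump(signum=signal.SIGUSR1, file=sys.stderr):
    """Dump every thread's Python traceback to `file` when `signum` arrives.

    Hypothetical helper: call it once at startup, then send the signal to the
    running process to see which line it is currently executing.
    """
    faulthandler.register(signum, file=file, all_threads=True)


if __name__ == "__main__":
    enable_stack_dump()
    # From another shell: kill -USR1 <pid of this process>
    signal.pause()  # stand-in for the long-running work
```

With this in place, `kill -USR1 <pid>` prints the current stack to stderr while the process keeps running. If the top frame is inside a database driver call, that points towards a slow or blocked query rather than a bug in the Python code.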

As far as I know, condenser doesn't do anything fancy with, say, threads or anything else that could deadlock, so my guess is that it's a query timeout or a connection timeout to the database. It probably wouldn't be too hard to hook into where queries are issued and print the last issued query in a debug mode; I'd be happy to take a look at a PR for that.
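A rough sketch of what such a hook could look like is below. It is not condenser's code: `LoggingCursor` is a hypothetical wrapper around any DB-API cursor, and the demo uses the standard library's `sqlite3` only so that it runs without a PostgreSQL server. Where a wrapper like this would be wired into condenser is left open.

```python
import logging
import sqlite3
import time

log = logging.getLogger("query_debug")


class LoggingCursor:
    """Wrap a DB-API cursor and log each statement before and after it runs."""

    def __init__(self, cursor):
        self._cursor = cursor
        self.last_query = None

    def execute(self, sql, params=None):
        # Record and log the statement first, so a hang still leaves a trace.
        self.last_query = sql
        log.debug("START %s", sql)
        started = time.monotonic()
        if params is None:
            result = self._cursor.execute(sql)
        else:
            result = self._cursor.execute(sql, params)
        log.debug("DONE in %.3fs", time.monotonic() - started)
        return result

    def __getattr__(self, name):
        # Delegate everything else (fetchone, fetchall, close, ...) unchanged.
        return getattr(self._cursor, name)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    conn = sqlite3.connect(":memory:")
    cur = LoggingCursor(conn.cursor())
    cur.execute("CREATE TABLE t (id INTEGER)")
    cur.execute("INSERT INTO t VALUES (?)", (1,))
    cur.execute("SELECT count(*) FROM t")
    print(cur.fetchone()[0])
```

If the process hangs, the last `START` line without a matching `DONE` identifies the statement that never returned.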
