dbases.xml updated suffix #98

Open
indexofire opened this issue Aug 29, 2020 · 7 comments

@indexofire

indexofire commented Aug 29, 2020

dbases.xml has changed the suffixes of the profile and sequence URLs: they are now csv and fasta instead of txt and tfa. The mlst-download_pub_mlst script should be updated.

indexofire changed the title from "dbases.xml updated filetype" to "dbases.xml updated suffix" on Aug 29, 2020
@indexofire
Author

indexofire commented Sep 1, 2020

Right now I use this revised, nasty version of the mlst-download_pub_mlst script to grab PubMLST data.

#!/bin/bash

set -e

OUTDIR=pubmlst
mkdir -p "$OUTDIR"
wget --no-clobber -P "$OUTDIR" http://pubmlst.org/data/dbases.xml

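# Print one shell command per URL; pipe this script's output to bash to run them.
for URL in $(grep '<url>' "$OUTDIR/dbases.xml"); do
  # Strip the surrounding <url> and </url> tags
  URL=${URL//<url>}
  URL=${URL//<\/url>}
  if [ "${URL:(-4)}" = "_csv" ]; then
    # Profile URL: the scheme name is the 2nd '_'-separated field
    PROFILE=$(echo "$URL" | awk -F'_' '{print $2}')
    # The scheme number is the 7th '/'-separated field; append it unless it is 1
    NUM=$(echo "$URL" | awk -F'/' '{if($7!=1)print "_"$7}')
    echo "# $PROFILE "
    PROFILEDIR="$OUTDIR/$PROFILE$NUM"
    echo "mkdir -p '$PROFILEDIR'"
    echo "(cd '$PROFILEDIR' && echo '$URL' && wget -q '$URL' -O '$PROFILE$NUM.txt')"
  elif [ "${URL:(-6)}" = "_fasta" ]; then
    # Allele URL: the locus name is the 7th '/'-separated field.
    # PROFILEDIR is reused from the most recent profile URL.
    ALLELE=$(echo "$URL" | awk -F'/' '{print $7}')
    echo "(cd '$PROFILEDIR' && echo '$URL' && wget -q '$URL' -O '$ALLELE.tfa')"
  fi
done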

# delete fungi schemes
echo rm -frv "$OUTDIR"/{afumigatus,blastocystis,calbicans,cglabrata,ckrusei}
echo rm -frv "$OUTDIR"/{ctropicalis,csinensis,kseptempunctata,sparasitica,tvaginalis}
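
The suffix checks in the script above could also be written as a case statement, which avoids the substring arithmetic. This is only a minimal sketch, assuming the two suffixes named in this issue (_csv and _fasta); the function name url_kind and the example URLs are hypothetical and not taken from dbases.xml.

```shell
#!/bin/bash
# Hypothetical helper: classify a URL from dbases.xml by its suffix.
url_kind() {
  case "$1" in
    *_csv)   echo "profile" ;;  # scheme profile table (formerly .txt)
    *_fasta) echo "allele" ;;   # allele sequences (formerly .tfa)
    *)       echo "other" ;;    # anything else is ignored
  esac
}

# Made-up URLs, for illustration only
url_kind "https://rest.pubmlst.org/db/pubmlst_demo_seqdef/schemes/1/profiles_csv"
url_kind "https://rest.pubmlst.org/db/pubmlst_demo_seqdef/loci/abcZ/alleles_fasta"
```

A case pattern also behaves sensibly when the URL is shorter than the suffix being tested.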

@tdcollingsworth

Thank you so much, @indexofire!

Any chance we'll see these updates reflected in the repo @tseemann?

Grateful for all of your hard work and dedication. I know we've all got a lot on our plates right now.

@tdcollingsworth

To make sure 'mlst-make_blast_db' functions correctly, one suggestion I would make here is to alter:

echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE')"

to

echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"

Cheers!

andersgs self-assigned this on Sep 4, 2020
andersgs added the bug label on Sep 4, 2020
@indexofire
Author

The nasty script still does not create the same scheme names in the pubmlst folder as the original one, because of the changes to dbases.xml. I hope that's OK for users.

@safrye

safrye commented Oct 14, 2020

Hi,
For me the script didn't work. The subfolders were not created and the files were not downloaded. I had to change the echo commands into eval commands. Any explanations for an old DOS user?

Here are the lines I changed:

    eval "mkdir -p '$PROFILEDIR'"
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' --output-document='$PROFILE$NUM.txt')"
  elif [ ${URL:(-6)} = "_fasta" ]; then
    ALLELE=$(echo $URL | awk -F'/' '{print $7}')
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' --output-document='$ALLELE.tfa')"
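
A likely explanation, going only by the script posted earlier in this thread: every command in it is wrapped in echo, so running the script prints the mkdir and wget commands instead of executing them. Output like that has to be piped into a shell or, as in the change above, run through eval. The sketch below shows the difference on its own; the directory names are made up for the demonstration.

```shell
#!/bin/bash
# Demo only: compare printing a command with executing it.
DEMO_DIR=$(mktemp -d)

# 1. echo only prints the command text; no directory is created.
echo "mkdir -p '$DEMO_DIR/printed'"

# 2. eval executes the same text in the current shell.
eval "mkdir -p '$DEMO_DIR/evaluated'"

# 3. Piping the printed text into a shell also executes it.
echo "mkdir -p '$DEMO_DIR/piped'" | bash

# Only "evaluated" and "piped" exist at this point.
ls "$DEMO_DIR"
```

Under that reading, piping the output of the unmodified script into bash should have the same effect as switching echo to eval, though nobody in this thread has confirmed it.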

@javiertognarelli

This change works for me! Thank you all for this fix!

@javiertognarelli

javiertognarelli commented Dec 16, 2020

Hi everyone. It looks like PubMLST has changed something again, so a new fix is needed: there are now two different URL formats for downloading schemes and sequences, so the script should look like this:

#!/bin/bash

set -e

OUTDIR=pubmlst
mkdir -p "$OUTDIR"
wget --no-clobber -P "$OUTDIR" http://pubmlst.org/data/dbases.xml

for URL in $(grep '<url>' "$OUTDIR/dbases.xml"); do
  # Strip the surrounding <url> and </url> tags
  URL=${URL//<url>}
  URL=${URL//<\/url>}
  # The host name is the 3rd '/'-separated field of the URL
  HOST=$(echo "$URL" | awk -F'/' '{print $3}')
  if [ "${URL:(-4)}" = "_csv" ]; then
    # Profile URL: the scheme name is the 2nd '_'-separated field
    PROFILE=$(echo "$URL" | awk -F'_' '{print $2}')
    # Scheme number: field 7 on rest.pubmlst.org, field 8 on other hosts
    if [ "$HOST" = "rest.pubmlst.org" ]; then
      NUM=$(echo "$URL" | awk -F'/' '{if($7!=1) print "_"$7}')
    else
      NUM=$(echo "$URL" | awk -F'/' '{if($8!=1) print "_"$8}')
    fi
    echo "# $PROFILE "
    PROFILEDIR="$OUTDIR/$PROFILE$NUM"
    eval "mkdir -p '$PROFILEDIR'"
    eval "(cd '$PROFILEDIR' && echo '$URL' && wget -q '$URL' -O '$PROFILE$NUM.txt')"
  elif [ "${URL:(-6)}" = "_fasta" ]; then
    # Allele URL: the locus name sits in the same host-dependent field.
    # PROFILEDIR is reused from the most recent profile URL.
    if [ "$HOST" = "rest.pubmlst.org" ]; then
      ALLELE=$(echo "$URL" | awk -F'/' '{print $7}')
    else
      ALLELE=$(echo "$URL" | awk -F'/' '{print $8}')
    fi
    eval "(cd '$PROFILEDIR' && echo '$URL' && wget -q '$URL' -O '$ALLELE.tfa')"
  fi
done

# delete fungi schemes
# (the two lines below only print the rm commands; drop the leading "echo" to run them)
echo rm -frv "$OUTDIR"/{afumigatus,blastocystis,calbicans,cglabrata,ckrusei}
echo rm -frv "$OUTDIR"/{ctropicalis,csinensis,kseptempunctata,sparasitica,tvaginalis}

I hope this saves some time and suffering.
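
The host-dependent field lookup in the script above can be isolated into one small function. This is a sketch under the same assumption the script makes (field 7 on rest.pubmlst.org, field 8 on any other host); the function name id_field, the host other.example.org, and both URLs are hypothetical and only illustrate the two path depths.

```shell
#!/bin/bash
# Hypothetical helper: print the '/'-separated field that holds the
# scheme number (profile URLs) or the locus name (allele URLs).
id_field() {
  host=$(echo "$1" | awk -F'/' '{print $3}')
  if [ "$host" = "rest.pubmlst.org" ]; then
    echo "$1" | awk -F'/' '{print $7}'
  else
    # Assumed: other hosts carry one extra path segment before the database name
    echo "$1" | awk -F'/' '{print $8}'
  fi
}

# Made-up URLs, for illustration only
id_field "https://rest.pubmlst.org/db/pubmlst_demo_seqdef/schemes/2/profiles_csv"
id_field "https://other.example.org/api/db/pubmlst_demo_seqdef/loci/abcZ/alleles_fasta"
```

With these inputs the first call prints 2 and the second prints abcZ. If PubMLST changes the path depth again, only this one function would need adjusting.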
