
Speed-up database downloads #1805

Open
@aibaars


Is your feature request related to a problem? Please describe.

I feel like it takes too long to download a CodeQL database from GitHub into VSCode.

Describe the solution you'd like

Use multi-threaded downloads to speed things up.

Describe alternatives you've considered
N/A

Additional context

For example, the QL database from github/codeql is only 160 MB, but it takes about 2 minutes to download. If I download 10 chunks of the file concurrently, the download takes less than 10 seconds. I wrote a small bash script to demonstrate.
A single 160 MB chunk:

time sh script.sh github/codeql ql 1
gh api -H Accept: application/zip -H Range: bytes=0-165712932 /repos/github/codeql/code-scanning/codeql/databases/ql

real	2m9.894s
user	0m0.439s
sys	0m1.426s

And a download with 10 chunks of about 16 MB each:

time sh script.sh github/codeql ql 10
gh api -H Accept: application/zip -H Range: bytes=0-16571293 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=16571294-33142587 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=33142588-49713881 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=49713882-66285175 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=66285176-82856469 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=82856470-99427763 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=99427764-115999057 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=115999058-132570351 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=132570352-149141645 /repos/github/codeql/code-scanning/codeql/databases/ql
gh api -H Accept: application/zip -H Range: bytes=149141646-165712932 /repos/github/codeql/code-scanning/codeql/databases/ql

real	0m9.752s
user	0m1.069s
sys	0m2.009s
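
To put the two timings side by side, here is a rough throughput calculation. It is only a sketch: `throughput_summary` is a name made up for illustration, the size is the range end printed in the single-chunk run, and the times are the `real` values above.

```shell
#!/bin/sh
# Hypothetical helper: compare the two runs using the figures quoted above.
throughput_summary() {
  size=165712932   # bytes, taken from the range end in the single-chunk run
  awk -v size="$size" -v t1=129.894 -v t10=9.752 'BEGIN {
    printf "1 chunk:   %.2f MB/s\n", size / t1 / 1e6
    printf "10 chunks: %.2f MB/s\n", size / t10 / 1e6
    printf "speed-up:  %.1fx\n", t1 / t10
  }'
}

throughput_summary
```

The speed-up comes out a little above 10x, which suggests the limit is per connection rather than the total available bandwidth.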

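The script:

#! /bin/bash

# Usage: script.sh <owner/repo> <language> <number-of-chunks>
nwo="$1"
lang="$2"
count="$3"

URL="/repos/${nwo}/code-scanning/codeql/databases/${lang}"
# Request the first two bytes only, and read the total size from the
# "Content-Range: bytes 0-1/<size>" response header.
SIZE=$(gh api  -H "Accept: application/zip" -H "Range: bytes=0-1" -i "${URL}"  | tr -d '\r' |  grep "Content-Range: bytes 0-1/" | cut -d / -f 2)
CHUNK_SIZE=$(expr "${SIZE}" / "${count}")

# Start the first (count - 1) chunks in the background.
# Range ends are inclusive, so each of these chunks is CHUNK_SIZE + 1 bytes.
start=0
parts=""
for i in $(seq $(expr "${count}" - 1))
do
  end=$(expr "${start}" + "${CHUNK_SIZE}")
  echo gh api  -H "Accept: application/zip" -H "Range: bytes=${start}-${end}" "${URL}"
  gh api  -H "Accept: application/zip" -H "Range: bytes=${start}-${end}" "${URL}" > "part-$i" &
  start=$(expr "${end}" + 1)
  parts="${parts}part-${i} "
done

# Download whatever is left in the foreground. The end (SIZE) is one past the
# last byte; a range end beyond the end of the file is clamped by the server.
if [ "${start}" -lt "${SIZE}" ] ; then
 echo gh api  -H "Accept: application/zip" -H "Range: bytes=${start}-${SIZE}" "${URL}"
 gh api  -H "Accept: application/zip" -H "Range: bytes=${start}-${SIZE}" "${URL}" > "part-${count}"
 parts="${parts}part-${count}"
fi
# Wait for the background downloads to finish.
wait

# Concatenate the parts in order, then clean up.
cat $parts > database.zip
rm -f $parts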
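
HTTP byte ranges are inclusive on both ends, so the script above requests chunks that are one byte longer than CHUNK_SIZE and a final range that ends at SIZE rather than SIZE - 1. That works for a demo, but a real implementation would probably want ranges that cover the file exactly. A minimal sketch of that arithmetic follows; `chunk_ranges` is a hypothetical helper written for this issue, not something `gh` provides, and it assumes a size and count of at least 1.

```shell
#!/bin/sh
# Hypothetical helper: print "start-end" inclusive byte ranges that cover
# a file of <size> bytes exactly, split into <count> chunks.
# Assumes size >= 1 and count >= 1. Variables are global, so call it in a
# subshell or a pipeline if that matters.
chunk_ranges() {
  size="$1"
  count="$2"
  # Never produce more chunks than there are bytes.
  if [ "$count" -gt "$size" ]; then
    count="$size"
  fi
  chunk=$(( size / count ))
  start=0
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "$i" -eq "$count" ]; then
      end=$(( size - 1 ))            # last chunk absorbs the remainder
    else
      end=$(( start + chunk - 1 ))   # inclusive end, exactly "chunk" bytes
    fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
    i=$(( i + 1 ))
  done
}

# Example: a 10-byte file split three ways prints 0-2, 3-5 and 6-9.
chunk_ranges 10 3
```

Each printed line could then be used as the value after `bytes=` in the `Range` header, for example by looping over `chunk_ranges "${SIZE}" "${count}"`.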
