Datasets

Nevio Vesic edited this page May 7, 2024 · 21 revisions

{Un}pack datasets are ClickHouse database dumps that are freely available for anyone to use.

We tried numerous approaches to making this massive amount of data publicly available while keeping the download size sane. The idea was that even with a slower network connection you should, given some time, be able to download the datasets and start using the tool-chain locally.

We have not tested exports or imports on smaller machines. The machine that currently processes the datasets (ingestion and export) has 256 GB of RAM and a 48-core AMD Threadripper processor. In the future we will address this and publish proper reports to the wiki. Below, under the examples, you can see how long it took us to dump, compress, and decompress the data.

Export and Import Datasets

Export

We'll keep it short here and give a high-level overview, in a few bullet points, of how the datasets are generated.

  1. The syncer service ingests the data into the database.
  2. The export command takes all available tables from the database and creates a {table}.clickhouse dump for each one.
  3. The compress command compresses the data with the 7z compression algorithm, producing {table}.clickhouse.7z files.
  4. make compress-ethereum compresses the raw Ethereum source code.
  5. make upload runs scripts/upload.sh, which uploads the ClickHouse and Ethereum datasets to https://r2.unpack.dev/.
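The steps above can be sketched as a small wrapper script. This is a hypothetical illustration of the publishing pipeline, not the actual script the project uses; it assumes the unpack CLI and make targets described above are on your PATH.

```shell
# Hypothetical sketch of the dataset publishing pipeline described above.
# Assumes the `unpack` CLI and the make targets exist in your environment.
publish_datasets() {
  unpack datasets clickhouse export      # write {table}.clickhouse dumps
  unpack datasets clickhouse compress    # produce {table}.clickhouse.7z files
  make compress-ethereum                 # archive the raw Ethereum source code
  make upload                            # runs scripts/upload.sh -> r2.unpack.dev
}

# Only run the pipeline when the tool-chain is actually installed.
if command -v unpack >/dev/null 2>&1; then
  publish_datasets
fi
```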

Navigate to Import to see how you can import datasets. Check out Compress to see how you can compress datasets.

Command

unpack datasets clickhouse export

Example

Below you can see how the one-liner exports the whole dataset into the .clickhouse format.

Exporting tables for database: default . Please be patient. This WILL take a while...

Exported table: contracts to /home/nevio/dev/unpack/inspector/datasets/contracts.clickhouse (size: 4.71 GB)
Exported table: metadata to /home/nevio/dev/unpack/inspector/datasets/metadata.clickhouse (size: 0.03 GB)
Exported table: tokens to /home/nevio/dev/unpack/inspector/datasets/tokens.clickhouse (size: 0.01 GB)
Exported table: ast to /home/nevio/dev/unpack/inspector/datasets/ast.clickhouse (size: 0.01 GB)
Exported table: cfg to /home/nevio/dev/unpack/inspector/datasets/cfg.clickhouse (size: 0.05 GB)
Exported table: constructors to /home/nevio/dev/unpack/inspector/datasets/constructors.clickhouse (size: 1.39 GB)
Exported table: standards to /home/nevio/dev/unpack/inspector/datasets/standards.clickhouse (size: 1.42 GB)
Exported table: variables to /home/nevio/dev/unpack/inspector/datasets/variables.clickhouse (size: 1.51 GB)
Exported table: functions to /home/nevio/dev/unpack/inspector/datasets/functions.clickhouse (size: 40.98 GB)
Exported table: events to /home/nevio/dev/unpack/inspector/datasets/events.clickhouse (size: 2.06 GB)

Successfully exported tables for database: default - Completed in 1m57.11145471s

Import

Navigate to Export to see how you can export datasets.

Command

unpack datasets clickhouse import

Example

Importing tables into the database. Please be patient. This WILL take a while...

Imported data into table: contracts
Imported data into table: metadata
Imported data into table: tokens
Imported data into table: ast
Imported data into table: cfg
Imported data into table: constructors
Imported data into table: standards
Imported data into table: variables
Imported data into table: functions
Imported data into table: events

Successfully imported tables into the database. Completed in 3m32.002066679s

Compress and Decompress Datasets

Compress

Command

unpack datasets clickhouse compress

Example

Compressing exported files. Please be patient. This WILL take a while...

Compressed file: /home/nevio/dev/unpack/inspector/datasets/contracts.clickhouse.7z (size: 0.19 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/metadata.clickhouse.7z (size: 0.01 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/tokens.clickhouse.7z (size: 0.00 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/ast.clickhouse.7z (size: 0.00 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/cfg.clickhouse.7z (size: 0.00 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/constructors.clickhouse.7z (size: 0.05 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/standards.clickhouse.7z (size: 0.01 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/variables.clickhouse.7z (size: 0.09 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/functions.clickhouse.7z (size: 2.23 GB)
Compressed file: /home/nevio/dev/unpack/inspector/datasets/events.clickhouse.7z (size: 0.08 GB)

Successfully compressed exported files. Completed in 18m34.316607237s
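Comparing the export and compress logs above, 7z shrinks the dumps by roughly a factor of twenty. A quick sanity check over the sizes as printed (note that values like 0.00 are rounded in the log, so the totals are only approximate):

```shell
# Sum the sizes printed in the export and compress logs above (GB, as logged).
raw="4.71 0.03 0.01 0.01 0.05 1.39 1.42 1.51 40.98 2.06"
zipped="0.19 0.01 0.00 0.00 0.00 0.05 0.01 0.09 2.23 0.08"
sum() { echo "$1" | tr ' ' '\n' | awk '{s+=$1} END {printf "%.2f", s}'; }
echo "raw total:        $(sum "$raw") GB"
echo "compressed total: $(sum "$zipped") GB"
awk -v a="$(sum "$raw")" -v b="$(sum "$zipped")" 'BEGIN {printf "ratio: %.1fx\n", a/b}'
```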

Decompress

Command

unpack datasets clickhouse decompress

Example

Decompressing exported database files. Please be patient. This WILL take a while...

Decompressed file: /home/nevio/dev/unpack/inspector/datasets/contracts.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/metadata.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/tokens.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/ast.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/cfg.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/constructors.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/standards.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/variables.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/functions.clickhouse
Decompressed file: /home/nevio/dev/unpack/inspector/datasets/events.clickhouse

Successfully decompressed exported files. Completed in 3m35.361370553s

Upload and Download

Upload

Command

unpack datasets clickhouse upload

Example

Uploading exported archive datasets to Cloudflare R2. Please be patient. This WILL take a while...

Uploading (contracts.clickhouse.7z): 190.985 MiB, 100%, ETA 0s
Uploading (metadata.clickhouse.7z): 11.128 MiB, 100%, ETA 0s
Uploading (tokens.clickhouse.7z): 3.243 MiB, 100%, ETA 0s
Uploading (ast.clickhouse.7z): 3.015 MiB, 100%, ETA 0s
Uploading (cfg.clickhouse.7z): 5.093 MiB, 100%, ETA 0s
Uploading (constructors.clickhouse.7z): 51.992 MiB, 100%, ETA 0s
Uploading (standards.clickhouse.7z): 10.909 MiB, 100%, ETA 0s
Uploading (variables.clickhouse.7z): 93.205 MiB, 100%, ETA 0s
Uploading (functions.clickhouse.7z): 2.225 GiB, 100%, ETA 0s
Uploading (events.clickhouse.7z): 76.833 MiB, 100%, ETA 0s

Successfully uploaded exported archive datasets to Cloudflare R2. Completed in 4m24.593401168s

Download

Command

unpack datasets download

Example

Downloading exported archive datasets from Cloudflare R2. Please be patient. This WILL take a while...

Destination path: /home/nevio/dev/unpack/inspector/datasets-test
Database Tables: [contracts metadata tokens ast cfg constructors standards variables functions events]
Blockchains: [ethereum]

[11/12] Downloading ethereum.7z... 100% [========================================] (20 MB/s)         

Successfully downloaded exported archive datasets from Cloudflare R2. Completed in 3m3.04008376s
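Putting the upload and download logs together, the downloader appears to fetch one 7z archive per database table plus one per blockchain. The sketch below lists the archive names inferred from those logs; the exact URL layout under https://r2.unpack.dev/ is an assumption, not confirmed by the source.

```shell
# Hypothetical listing of the archives the downloader fetches, inferred from
# the upload/download logs above. The URL layout is an assumption.
base="https://r2.unpack.dev"
tables="contracts metadata tokens ast cfg constructors standards variables functions events"
for t in $tables; do
  echo "$base/$t.clickhouse.7z"
done
echo "$base/ethereum.7z"   # raw Ethereum source archive
```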