This is an Intake plugin that supports a single hierarchical YAML catalog, so that datasets stay organized and do not turn into a data swamp.
Example of organizing the datasets by business domain entities:
metadata:
  hierarchical_catalog: true
entity:
  customer:
    customer_attributes:
      args:
        urlpath: s3://foo
      driver: parquet
  user:
    user_profile:
      args:
        urlpath: s3://foo
      driver: parquet
A dataset can then be accessed by its path in the hierarchy:
df = catalog.entity.customer.customer_attributes.read()
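To illustrate how the nesting in the YAML maps to the dotted access path, here is a minimal sketch in plain Python. It is not the plugin's implementation. The helper name `flatten_entries` is hypothetical, and so are two assumptions it relies on: that a data source entry is any mapping with a `driver` key, and that `user` sits under `entity` alongside `customer`.

```python
# Illustrative only: a plain dict mirroring the YAML example above.
catalog_spec = {
    "metadata": {"hierarchical_catalog": True},
    "entity": {
        "customer": {
            "customer_attributes": {
                "args": {"urlpath": "s3://foo"},
                "driver": "parquet",
            },
        },
        "user": {
            "user_profile": {
                "args": {"urlpath": "s3://foo"},
                "driver": "parquet",
            },
        },
    },
}


def flatten_entries(node, prefix=()):
    """Yield (dotted_path, entry) pairs for every data source entry.

    Assumption: a mapping with a "driver" key is a data source entry;
    any other mapping is an intermediate level of the hierarchy.
    """
    for key, value in node.items():
        if key == "metadata" and not prefix:
            continue  # top-level catalog metadata, not a dataset
        if not isinstance(value, dict):
            continue  # scalars cannot contain further entries
        path = prefix + (key,)
        if "driver" in value:
            yield ".".join(path), value
        else:
            yield from flatten_entries(value, path)


entries = dict(flatten_entries(catalog_spec))
for dotted_path, entry in entries.items():
    print(dotted_path, entry["driver"], entry["args"]["urlpath"])
```

Each dotted path produced this way corresponds to the attribute chain used after `catalog.` in the access example, such as `entity.customer.customer_attributes`.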