ArchDataPy

ArchDataPy is a lightweight Python package for accessing archaeological datasets stored in R package archives. It downloads registered CRAN source packages, extracts their .rda data files, and loads those files with pyreadr. It also includes a small dataset registry for loading selected datasets directly as pandas DataFrames.

Features

Download registered R package archives without needing R installed.
List available package sources, including archdata and folio.
Load .rda files into Python using pyreadr.
Load selected datasets directly from the dataset registry as pandas DataFrames.
Use custom sources by passing a CRAN archive URL or local .tar.gz package archive.

Installation

You can install ArchDataPy from PyPI:

pip install archdatapy

For local development, clone the repository and install it in editable mode:

git clone https://github.com/wccarleton/archdatapy.git
cd archdatapy
pip install -e .

Dependencies

This package requires the following Python libraries:

requests
pyreadr
pandas

These dependencies are automatically installed when you install the package.

Usage

1. List registered package sources

The package ships with a registry in package_registry.json. Registered package keys currently include archdata and folio.

from archdatapy import list_available_packages

print(list_available_packages())

2. Download a package and build a manifest

The get_archdata function accepts either:

a registry key for a known CRAN package, or
a direct package archive URL or local archive path.

It returns a manifest mapping dataset names to .rda file paths, along with package metadata.

from archdatapy import get_archdata

# Download the default registered package, archdata
manifest = get_archdata()
print(manifest.package_name)
print(manifest.source_url)
print(manifest.keys())

To download another registered package, pass its registry key:

from archdatapy import get_archdata

manifest = get_archdata(data_url="folio")
print(manifest.package_name)
print(manifest.keys())
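
To make the idea of a manifest concrete, here is a minimal sketch of how dataset names could be mapped to extracted .rda file paths using only the standard library. It is an illustration under stated assumptions, not ArchDataPy's own implementation: the build_manifest function and its arguments are hypothetical, and the real manifest also carries package metadata such as package_name and source_url.

```python
import tarfile
from pathlib import Path


def build_manifest(archive_path, extract_dir):
    """Map dataset names to extracted .rda paths from a .tar.gz archive.

    Hypothetical helper for illustration only; not part of ArchDataPy.
    """
    extract_dir = Path(extract_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            name = Path(member.name)
            # Keep only regular files ending in .rda; skip everything else.
            if not member.isfile() or name.suffix.lower() != ".rda":
                continue
            # Write the bytes ourselves so only the file name is used,
            # never a directory path taken from inside the archive.
            target = extract_dir / name.name
            target.write_bytes(archive.extractfile(member).read())
            manifest[name.stem] = target
    return manifest
```

Under these assumptions, a file stored as data/Acheulean.rda inside the archive would appear under the key Acheulean, which matches how dataset names are used as manifest keys in the next step.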

3. Load a specific .rda file from the manifest

Use load_archdata with a path from the returned manifest.

from archdatapy import load_archdata

dataset_name = 'Acheulean'  # Example key from the manifest
data = load_archdata(manifest[dataset_name])
print(data)

load_archdata reads the file with pyreadr. pyreadr.read_r() returns a dictionary-like object keyed by R object name, because a single .rda file can contain more than one R object.
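
Because the loaded result is dictionary-like, you usually need to pick an object out of it before working with the data. The sketch below shows one way to do that. The single_object helper is hypothetical (it is not part of ArchDataPy or pyreadr), and a plain OrderedDict with a placeholder string stands in for the real loaded result so the example runs without downloading anything.

```python
from collections import OrderedDict


def single_object(result, name=None):
    """Return one object from a dict-like mapping of R object names to values.

    Hypothetical helper: `result` stands in for a loaded .rda file.
    """
    if name is not None:
        return result[name]
    if len(result) != 1:
        # Refuse to guess when the file holds several (or zero) objects.
        raise ValueError(f"Expected exactly one object, found: {list(result)}")
    return next(iter(result.values()))


# Stand-in for a loaded file that contains a single R object.
loaded = OrderedDict([("Acheulean", "a pandas DataFrame would be here")])
print(list(loaded.keys()))
obj = single_object(loaded)
```

Passing name explicitly is the safer choice when a file may hold several objects.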

4. Load a selected dataset directly

The package also ships with a smaller dataset registry in datasets.json. These entries point directly to individual dataset files and can be loaded with get_dataset.

from archdatapy import get_dataset, list_available_datasets

print(list_available_datasets())
mask_site = get_dataset("MaskSite")
print(mask_site.head())

5. Use your own package source

If you want to use a different CRAN package archive, pass the archive URL or local .tar.gz path directly:

from archdatapy import get_archdata
manifest = get_archdata(data_url='https://cran.r-project.org/src/contrib/yourpackage_1.0.0.tar.gz')
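
Since the same data_url argument can hold a registry key, an archive URL, or a local archive path, it can help to see how those three cases might be told apart. The following is a rough sketch under assumptions: classify_source is a hypothetical function, and ArchDataPy's actual resolution logic may differ.

```python
from urllib.parse import urlparse


def classify_source(value, registry_keys):
    """Classify a source string as 'registry', 'url', or 'local'.

    Hypothetical helper; the checks run in order, so a registry key wins.
    """
    if value in registry_keys:
        return "registry"
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if value.endswith(".tar.gz"):
        return "local"
    raise ValueError(f"Unrecognized package source: {value!r}")


# The two registry keys named in this README.
keys = {"archdata", "folio"}
for source in ("folio", "downloads/yourpackage_1.0.0.tar.gz"):
    print(source, "->", classify_source(source, keys))
```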

Documentation

Full documentation is available on the GitHub Pages site: https://wccarleton.github.io/archdatapy

Contributing

Contributions are welcome. Please feel free to submit issues or pull requests to improve the package.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Roadmap

Completed and planned enhancements for ArchDataPy:

High Priority (Completed ✅)

Registry-based package sourcing system
Modern packaging with pyproject.toml (PEP 517/518/621)
Type hints for better IDE support
Automated CI/CD with GitHub Actions
.gitignore and MANIFEST.in for clean distribution

Medium Priority

Expand package registry with curated archaeology datasets
Add structured logging instead of print statements
Improve error messages with helpful recovery suggestions
Add CONTRIBUTING.md guide for registry contributions
Include metadata (DOI, citations) in registry entries

Lower Priority

Optional caching layer for load_archdata()
Docstring examples and doctests
Dependency version compatibility checking
GitHub issue/PR templates
Support for additional data formats beyond .rda

Contributing to the Registry

To add new package sources to the package registry:

Fork the repository
Edit archdatapy/package_registry.json to add your source
Submit a pull request with a description of the package and datasets

Registry entries should follow this structure:

{
  "package_name": {
    "url": "https://cran.r-project.org/src/contrib/package_1.0.0.tar.gz",
    "description": "Description of the package and datasets",
    "homepage": "https://CRAN.R-project.org/package=package",
    "license": "Package license"
  }
}
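
Before opening a pull request, it may be useful to check a new entry against the structure above. This is a minimal sketch, not a tool shipped with ArchDataPy: find_registry_problems is a hypothetical function, and treating all four fields as required, and the url as ending in .tar.gz, are assumptions based on the example entry.

```python
import json

# Assumption: every field in the example entry is required.
REQUIRED_FIELDS = ("url", "description", "homepage", "license")


def find_registry_problems(registry):
    """Return a list of human-readable problems found in registry entries."""
    problems = []
    for name, entry in registry.items():
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: missing or empty '{field}'")
        url = entry.get("url")
        # Assumption: sources are CRAN-style .tar.gz source archives.
        if isinstance(url, str) and url and not url.endswith(".tar.gz"):
            problems.append(f"{name}: url does not end in .tar.gz")
    return problems


example = json.loads("""
{
  "package_name": {
    "url": "https://cran.r-project.org/src/contrib/package_1.0.0.tar.gz",
    "description": "Description of the package and datasets",
    "homepage": "https://CRAN.R-project.org/package=package",
    "license": "Package license"
  }
}
""")
problems = find_registry_problems(example)
print(problems)  # an empty list means no problems were found
```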

Acknowledgments

The default registry includes datasets from the R archdata package, a collection of archaeological datasets maintained on CRAN. The archdata package provides the datasets used in Quantitative Methods in Archaeology Using R by David L. Carlson.
