
Add to_parquet and to_csv methods for Dataset and Station classes #556

Merged
vergauwenthomas merged 10 commits into dev from copilot/fix-553 on Sep 2, 2025

Contributor

Copilot AI commented Sep 2, 2025

This PR implements new data export methods for the MetObs-toolkit, allowing users to save processed observations with QC flags to parquet and CSV formats.

Added Methods

Dataset Class

  • to_parquet(target_file, **kwargs) - Export multi-station observations to parquet format
  • to_csv(target_file, **kwargs) - Export multi-station observations to CSV format

Station Class

  • to_parquet(target_file, **kwargs) - Export single station observations to parquet format
  • to_csv(target_file, **kwargs) - Export single station observations to CSV format

Implementation Details

All methods follow the established codebase patterns:

  • Use the @log_entry decorator for consistent logging
  • Export the DataFrame from the .df property (includes processed data with QC labels)
  • Accept **kwargs to pass additional options to the underlying pandas methods
  • Include comprehensive docstrings with parameters, returns, and "See Also" sections
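
The pattern listed above could look roughly like the following. This is only a minimal sketch and not the toolkit's actual code: `ExampleStation`, its constructor, and this `log_entry` implementation are hypothetical stand-ins. Only the `.df` property, the `@log_entry` name, and the `**kwargs` pass-through are taken from the description.

```python
import functools
import io
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def log_entry(func):
    """Hypothetical stand-in for the toolkit's decorator: log each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("Entering %s", func.__qualname__)
        return func(*args, **kwargs)

    return wrapper


class ExampleStation:
    """Hypothetical class that exposes its observations through ``.df``."""

    def __init__(self, records: pd.DataFrame):
        self._records = records

    @property
    def df(self) -> pd.DataFrame:
        return self._records

    @log_entry
    def to_csv(self, target_file, **kwargs) -> None:
        # Forward any extra options straight to pandas.DataFrame.to_csv.
        self.df.to_csv(target_file, **kwargs)

    @log_entry
    def to_parquet(self, target_file, **kwargs) -> None:
        # Requires a parquet engine such as pyarrow to be installed.
        self.df.to_parquet(target_file, **kwargs)


# Illustrative data only; the column names are assumptions.
station = ExampleStation(
    pd.DataFrame({"value": [12.5, 13.25], "label": ["ok", "ok"]})
)
buffer = io.StringIO()
station.to_csv(buffer, sep=";", float_format="%.2f")
csv_text = buffer.getvalue()
```

Because the keyword arguments are forwarded unchanged, any option accepted by the pandas writer (such as `sep` or `float_format` here) works without the wrapper method having to know about it.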

Usage Examples

# Export entire dataset with all stations
dataset.to_parquet('observations.parquet')
dataset.to_csv('observations.csv')

# Export individual station data
station = dataset.get_station('vlinder05')
station.to_parquet('vlinder05_data.parquet')
station.to_csv('vlinder05_data.csv')

# Use pandas options for customization
dataset.to_csv('data.csv', sep=';', float_format='%.2f')
station.to_parquet('data.parquet', compression='gzip')

Testing and Documentation

  • Added comprehensive tests in test_importing.py following the existing save_dataset_to_pkl test pattern
  • Updated API documentation for both Dataset and Station classes
  • Added new "Exporting data" section to the introduction notebook with practical examples
  • Applied black code formatting for style consistency
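
A round-trip check of the kind mentioned in the tests could be sketched as follows. This is not the test code from the PR: the frame below is a hypothetical stand-in for `dataset.df`, and its index names and columns are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the frame returned by ``dataset.df``.
index = pd.MultiIndex.from_product(
    [
        pd.to_datetime(["2022-09-01 00:00", "2022-09-01 01:00"]),
        ["temp"],
        ["vlinder05"],
    ],
    names=["datetime", "obstype", "name"],
)
original = pd.DataFrame(
    {"value": [18.5, 17.75], "label": ["ok", "ok"]}, index=index
)

with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "observations.csv"
    original.to_csv(target)

    # CSV keeps neither the index structure nor the datetime dtype,
    # so both are restored explicitly after reading.
    restored = pd.read_csv(target)
    restored["datetime"] = pd.to_datetime(restored["datetime"])
    restored = restored.set_index(["datetime", "obstype", "name"])

# Raises an AssertionError if the exported data differs from the original.
pd.testing.assert_frame_equal(original, restored)
```

Writing into a temporary directory keeps the test free of leftover files, which matches the later commit "use tempdir for parquet files in test".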

The implementation enables users to easily export their processed meteorological observations for further analysis or sharing while preserving all quality control information.

Fixes #553.



Copilot AI and others added 2 commits September 2, 2025 07:39
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
@vergauwenthomas vergauwenthomas added the RUN TESTS label (the main testing workflow is run if this label is added to a PR) Sep 2, 2025
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
@vergauwenthomas vergauwenthomas changed the base branch from master to dev September 2, 2025 07:47
Copilot AI changed the title [WIP] Save final dataset to parquet and csv Add to_parquet and to_csv methods for Dataset and Station classes Sep 2, 2025
@vergauwenthomas vergauwenthomas marked this pull request as ready for review September 2, 2025 08:42
Copilot AI review requested due to automatic review settings September 2, 2025 08:42
@vergauwenthomas vergauwenthomas merged commit 5c372f2 into dev Sep 2, 2025
32 checks passed
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds data export functionality to the MetObs-toolkit by implementing to_parquet and to_csv methods for both Dataset and Station classes. These methods enable users to export processed meteorological observations with quality control flags to standard data formats.

Key changes:

  • Added export methods that leverage pandas DataFrame serialization capabilities
  • Implemented comprehensive test coverage with round-trip validation
  • Added pyarrow dependency for parquet support
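
Since parquet export relies on the pyarrow dependency, a caller may want to check that the engine is importable before writing. The helper below is a hypothetical sketch that uses only the standard library; `parquet_engine_available` is not part of the toolkit.

```python
import importlib.util


def parquet_engine_available(module_name: str = "pyarrow") -> bool:
    """Return True if the given parquet engine module can be imported."""
    return importlib.util.find_spec(module_name) is not None


if parquet_engine_available():
    print("pyarrow found: DataFrame.to_parquet can use it as its engine")
else:
    print("pyarrow missing: install it before calling to_parquet")
```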

Reviewed Changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_importing.py | Added four comprehensive test methods validating round-trip export/import for both formats and classes |
| src/metobs_toolkit/station.py | Implemented to_parquet and to_csv methods for Station class with proper logging and documentation |
| src/metobs_toolkit/dataset.py | Implemented to_parquet and to_csv methods for Dataset class with proper logging and documentation |
| pyproject.toml | Added pyarrow dependency for parquet format support |
| docs/reference/station.rst | Updated API documentation to include new export methods |
| docs/reference/dataset.rst | Updated API documentation to include new export methods |

vergauwenthomas added a commit that referenced this pull request Sep 5, 2025
* Modeltimeseries unit conv (#545)

* functionality for unit conversion in creation of modeltimeseries

* import at top

* update docstring of Modeltimeseries

* rename obstype to modelobstype for modeltimeseries

* black edits

* rename obstype attr

* typo

* fixing bugs

* Update the example to illustrate how units are converted

* Update tests and fixed bugs

* code review fixes

* black edits

* Update src/metobs_toolkit/modeltimeseries.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* minor version bump

* Add to_parquet and to_csv methods for Dataset and Station classes (#556)

* Initial plan

* Implement to_parquet and to_csv methods for Dataset and Station classes

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* Update documentation for to_parquet and to_csv methods

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* Apply black formatting to tests and validate implementation

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* fix the parquet implementation

* update the introduction example

* new subsection in api docs

* update docstrings

* fix formatting the csv test

* black edits

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>

* Update README.md to include conda install and badge (#555)

* Update README.md to include conda install and badge

* flags before package name convention

* update the docs to include conda install description

---------

Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>

* Implement CF-compliant netCDF serialization for xarray Datasets with nested attributes (#558)

* Initial plan

* Implement core netCDF serialization functionality with CF compliance

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* Complete netCDF serialization implementation with documentation and examples

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* version in separate file to be accessible by other methods

* fix version in CI

* fix serializable cr datasets

* update docstrings

* add label conversion in the example

* fix bugs and xr tests

* fix version test

* fix the to_netcdf methods

* add to netcdf tests in the xr testing module

* code review style fixes

* black edits

* add to netcdf mode in the xarray topic

* to_netcdf in the introduction example

* sync version with pyproject

* bugfix

* fix merge issue with version

* add netcdf4

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>

* Parquet reader (#557)

* Adding tests and Parquet reader

* Adding parquet test data

* trigger tests

* fixed mismatch in tz + tests + docstring

* not intended for commit

* replace csvfilereader by the find_suitable_reader function

* update notebook

* bugfix

* black edits

* fix failing test

* fix tests

---------

Co-authored-by: nea-ppatel <patel_pratiman@nea.gov.sg>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>

* Qc on dataset error handling (#560)

* drop faulty files

* raise warnings if target station does not hold target obstype when qc on dataset level

* write test

* rename func

* black edits

* use tmp module in tests

* Fix NaTType error in frequency estimation for empty variable lists (#562)

* Initial plan

* Fix NaTType error in frequency estimation for empty timestamps

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* fix issue by selecting a minimum of 1 non-na values

* Delete tests/test_timestampmatcher.py

* test with nans

* black edits

* use warning instead of logging

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>

* Standardize warning formatting by converting operational warnings to logging (#565)

* Initial plan

* Replace warnings.warn with logger.warning for operational warnings

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* Update test to check for logging instead of warnings

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* fix warnings bug

* Check stations for obstype when GF is called on Dataset (#566)

* add filter_to_stations_with_target_obstype to GF methods on Dataset

* black edits

* black edits

* Implement human-readable __repr__ methods for all main classes (#568)

* Initial plan

* Implement human-readable __repr__ methods for all main classes

Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>

* improve the __repr__ returns

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>

* reduce xarray restriction to >=2022.3.0

* review

* black edits

* use tempdir for parquet files in test

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Pratiman <31694629+pratiman-91@users.noreply.github.com>
Co-authored-by: nea-ppatel <patel_pratiman@nea.gov.sg>

Labels

RUN TESTS: The main testing workflow is run if this label is added to a PR

Development

Successfully merging this pull request may close these issues.

Save final dataset to parquet and csv

3 participants