Add to_parquet and to_csv methods for Dataset and Station classes #556
Merged
vergauwenthomas merged 10 commits into dev on Sep 2, 2025
Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Save final dataset to parquet and csv" to "Add to_parquet and to_csv methods for Dataset and Station classes" on Sep 2, 2025
vergauwenthomas approved these changes on Sep 2, 2025
Pull Request Overview
This PR adds data export functionality to the MetObs-toolkit by implementing to_parquet and to_csv methods for both Dataset and Station classes. These methods enable users to export processed meteorological observations with quality control flags to standard data formats.
Key changes:
- Added export methods that leverage pandas DataFrame serialization capabilities
- Implemented comprehensive test coverage with round-trip validation
- Added pyarrow dependency for parquet support
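The round-trip validation mentioned above amounts to writing a frame out and reading it back unchanged. A minimal CSV sketch with plain pandas, assuming a long-format layout indexed by station name, timestamp and observation type (the index, column and label names here are illustrative, not necessarily the toolkit's):

```python
import io

import numpy as np
import pandas as pd


def example_observations() -> pd.DataFrame:
    """Hypothetical long-format observations with one QC label per record."""
    return pd.DataFrame(
        {
            "name": ["stationA", "stationA", "stationB"],
            "datetime": pd.to_datetime(
                ["2022-09-01 00:00", "2022-09-01 01:00", "2022-09-01 00:00"]
            ),
            "obstype": ["temp", "temp", "temp"],
            "value": [18.3, np.nan, 17.9],
            "label": ["ok", "gross_value", "ok"],
        }
    ).set_index(["name", "datetime", "obstype"])


def csv_roundtrip(df: pd.DataFrame) -> pd.DataFrame:
    """Write df to an in-memory CSV and read it back with the same index."""
    buffer = io.StringIO()
    df.to_csv(buffer)  # index levels are written as the leading columns
    buffer.seek(0)
    return pd.read_csv(
        buffer,
        index_col=list(df.index.names),  # rebuild the MultiIndex
        parse_dates=["datetime"],  # CSV stores timestamps as plain text
    )


original = example_observations()
restored = csv_roundtrip(original)

# Values, QC labels and index entries must survive; exact dtypes may differ
# (e.g. string dtypes), so only the content is compared strictly.
pd.testing.assert_frame_equal(
    original, restored, check_dtype=False, check_index_type=False
)
```

CSV carries no type information, so the reader has to be told which columns form the index and which to parse as dates; that is the main thing a CSV round-trip test guards against.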
Reviewed Changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_importing.py | Added four comprehensive test methods validating round-trip export/import for both formats and classes |
| src/metobs_toolkit/station.py | Implemented to_parquet and to_csv methods for Station class with proper logging and documentation |
| src/metobs_toolkit/dataset.py | Implemented to_parquet and to_csv methods for Dataset class with proper logging and documentation |
| pyproject.toml | Added pyarrow dependency for parquet format support |
| docs/reference/station.rst | Updated API documentation to include new export methods |
| docs/reference/dataset.rst | Updated API documentation to include new export methods |
vergauwenthomas added a commit that referenced this pull request on Sep 5, 2025
* Modeltimeseries unit conv (#545)
  * functionality for unit conversion in creation of modeltimeseries
  * import at top
  * update docstring of Modeltimeseries
  * rename obstype to modelobstype for modeltimeseries
  * black edits
  * rename obstype attr
  * typo
  * fixing bugs
  * Update the example to illustrate how units are converted
  * Update tests and fixed bugs
  * code review fixes
  * black edits
  * Update src/metobs_toolkit/modeltimeseries.py
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  * minor version bump
* Add to_parquet and to_csv methods for Dataset and Station classes (#556)
  * Initial plan
  * Implement to_parquet and to_csv methods for Dataset and Station classes
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * Update documentation for to_parquet and to_csv methods
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * Apply black formatting to tests and validate implementation
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * fix the parquet implementation
  * update the introduction example
  * new subsection in api docs
  * update docstrings
  * fix formatting the csv test
  * black edits
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
* Update README.md to include conda install and badge (#555)
  * Update README.md to include conda install and badge
  * flags before package name convention
  * update the docs to include conda install description
  Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
* Implement CF-compliant netCDF serialization for xarray Datasets with nested attributes (#558)
  * Initial plan
  * Implement core netCDF serialization functionality with CF compliance
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * Complete netCDF serialization implementation with documentation and examples
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * version in separate file to be accessible by other methods
  * fix version in CI
  * fix serializable cr datasets
  * update docstrings
  * add label conversion in the example
  * fix bugs and xr tests
  * fix version test
  * fix the to_netcdf methods
  * add to netcdf tests in the xr testing module
  * code review style fixes
  * black edits
  * add to netcdf mode in the xarray topic
  * to_netcdf in the introduction example
  * sync version with pyproject
  * bugfix
  * fix merge issue with version
  * add netcdf4
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
* Parquet reader (#557)
  * Adding tests and Parquet reader
  * Adding parquet test data
  * trigger tests
  * fixed mismatch in tz + tests + docstring
  * not intended for commit
  * replace csvfilereader by the find_suitable_reader function
  * update notebook
  * bugfix
  * black edits
  * fix failing test
  * fix tests
  Co-authored-by: nea-ppatel <patel_pratiman@nea.gov.sg>
  Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
* Qc on dataset error handling (#560)
  * drop faulty files
  * raise warnings if target station does not hold target obstype when qc on dataset level
  * write test
  * rename func
  * black edits
  * use tmp module in tests
* Fix NaTType error in frequency estimation for empty variable lists (#562)
  * Initial plan
  * Fix NaTType error in frequency estimation for empty timestamps
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * fix issue by selecting a minimum of 1 non-na values
  * Delete tests/test_timestampmatcher.py
  * test with nans
  * black edits
  * use warning instead of logging
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
* Standardize warning formatting by converting operational warnings to logging (#565)
  * Initial plan
  * Replace warnings.warn with logger.warning for operational warnings
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * Update test to check for logging instead of warnings
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
* fix warnings bug
* Check stations for obstype when GF is called on Dataset (#566)
  * add filter_to_stations_with_target_obstype to GF methods on Dataset
  * black edits
  * black edits
* Implement human-readable __repr__ methods for all main classes (#568)
  * Initial plan
  * Implement human-readable __repr__ methods for all main classes
    Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  * improve the __repr__ returns
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: vergauwenthomas <82087298+vergauwenthomas@users.noreply.github.com>
  Co-authored-by: Thomas Vergauwen <thomas.vergauwen@meteo.be>
* reduce xarray restriction to >=2022.3.0
* review
* black edits
* use tempdir for parquet files in test
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Pratiman <31694629+pratiman-91@users.noreply.github.com>
Co-authored-by: nea-ppatel <patel_pratiman@nea.gov.sg>
This PR implements new data export methods for the MetObs-toolkit, allowing users to save processed observations with QC flags to parquet and CSV formats.
Added Methods
Dataset Class
- `to_parquet(target_file, **kwargs)`: Export multi-station observations to parquet format
- `to_csv(target_file, **kwargs)`: Export multi-station observations to CSV format
Station Class
- `to_parquet(target_file, **kwargs)`: Export single-station observations to parquet format
- `to_csv(target_file, **kwargs)`: Export single-station observations to CSV format
Implementation Details
All methods follow the established codebase patterns:
- `@log_entry` decorator for consistent logging
- Data taken from the `.df` property (includes processed data with QC labels)
- `**kwargs` to pass additional options to the underlying pandas methods
Usage Examples
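A minimal sketch of the pattern listed under Implementation Details, assuming a plain pandas DataFrame behind `.df`. `MiniStation` and this `log_entry` are hypothetical stand-ins, not the toolkit's own classes or decorator; with MetObs-toolkit itself the calls take the form `dataset.to_csv(target_file, **kwargs)` or `station.to_parquet(target_file, **kwargs)`.

```python
import functools
import logging
import tempfile
from pathlib import Path

import pandas as pd

logger = logging.getLogger(__name__)


def log_entry(func):
    """Hypothetical stand-in for the toolkit's @log_entry: log each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("Calling %s", func.__qualname__)
        return func(*args, **kwargs)

    return wrapper


class MiniStation:
    """Hypothetical, stripped-down container exposing a .df property."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    @property
    def df(self) -> pd.DataFrame:
        # Assumed to hold the processed records together with their QC labels.
        return self._df

    @log_entry
    def to_csv(self, target_file, **kwargs) -> None:
        # Extra keyword arguments are forwarded unchanged to DataFrame.to_csv.
        self.df.to_csv(target_file, **kwargs)

    @log_entry
    def to_parquet(self, target_file, **kwargs) -> None:
        # Needs a parquet engine such as pyarrow installed.
        self.df.to_parquet(target_file, **kwargs)


df = pd.DataFrame(
    {"value": [18.34, 17.91], "label": ["ok", "ok"]},
    index=pd.to_datetime(["2022-09-01 00:00", "2022-09-01 01:00"]).rename("datetime"),
)
station = MiniStation(df)

with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "station.csv"
    station.to_csv(target, float_format="%.1f")  # pandas option passed through
    written = target.read_text()
```

Because `**kwargs` is passed through untouched, any pandas writer option (here `float_format`) is available without the wrapper having to know about it.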
Testing and Documentation
- Tests added to `test_importing.py` following the existing `save_dataset_to_pkl` test pattern
The implementation enables users to export their processed meteorological observations for further analysis or sharing while preserving all quality control information.
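Unlike CSV, parquet stores a schema, which is why the format needs the pyarrow dependency but no extra read options. A sketch of a parquet round-trip check that writes into a temporary directory, assuming pyarrow as the engine; the function name and the data are hypothetical, and the check returns "skipped" when pyarrow is not installed.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd


def parquet_roundtrip_check(workdir: Path) -> str:
    """Write observations to parquet in workdir, read them back and compare."""
    if importlib.util.find_spec("pyarrow") is None:
        return "skipped"  # assumed engine not available

    index = pd.MultiIndex.from_arrays(
        [
            ["stationA"] * 3,
            pd.date_range("2022-09-01", periods=3, freq="60min", tz="UTC"),
            ["temp"] * 3,
        ],
        names=["name", "datetime", "obstype"],
    )
    original = pd.DataFrame(
        {"value": [18.3, float("nan"), 17.9], "label": ["ok", "gross_value", "ok"]},
        index=index,
    )

    target = workdir / "observations.parquet"
    original.to_parquet(target, engine="pyarrow")
    restored = pd.read_parquet(target, engine="pyarrow")

    # The stored schema restores the MultiIndex, the UTC timezone and the
    # column dtypes, so a strict comparison is possible here.
    pd.testing.assert_frame_equal(original, restored)
    return "passed"


with tempfile.TemporaryDirectory() as tmpdir:
    outcome = parquet_roundtrip_check(Path(tmpdir))
```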
Fixes #553.