Skip to content

Feat CSV export#112

Merged
abnegate merged 11 commits into
mainfrom
feat-csv-export
Sep 24, 2025
Merged

Feat CSV export#112
abnegate merged 11 commits into
mainfrom
feat-csv-export

Conversation

@abnegate
Copy link
Copy Markdown
Member

@abnegate abnegate commented Sep 15, 2025

Summary by CodeRabbit

  • New Features

    • Added a CSV export destination for row data with configurable columns, optional headers, buffered writing, proper escaping, and final transfer to target storage.
    • Support for database:collection identifiers when exporting databases and tables.
  • Refactor

    • Standardized calls to global PHP functions for consistent resolution; behavior unchanged.
  • Style

    • Standardized constructor parameter formatting.
  • Tests

    • Added comprehensive CSV export tests covering basics, special characters, arrays, nulls, allowed attributes, and import compatibility.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 15, 2025

Walkthrough

Adds a new CSV migration destination that buffers Row resources into a local CSV file (configurable format, optional headers, column filtering, type serialization), transfers the file to the target device on shutdown and attempts cleanup; adjusts sources to call global PHP built-ins and accept database:collection root IDs; adds unit tests and minor formatting tweaks.

Changes

Cohort / File(s) Summary of Changes
New CSV destination
src/Migration/Destinations/CSV.php
Adds CSV destination class implementing buffered local CSV exports for Row resources: constructor with formatting options and filename sanitization, local temporary storage with directory creation, buffered writes (flush ~1MB) using fputcsv (configurable delimiter/enclosure/escape), optional headers, allowed-columns filtering, value serialization for null/booleans/arrays/objects, resource status updates, report(), and shutdown() to transfer the file and cleanup.
CSV destination tests
tests/Migration/Unit/General/CSVTest.php
Adds TestCSV test wrapper (testableImport, getLocalRoot, no-op shutdown) and multiple unit tests covering basic export, escaping of special characters, array/nested handling, null/primitives conversion, allowed-attribute filtering, and import/round-trip compatibility; includes test cleanup helper.
Source: Appwrite — ID parsing & globals
src/Migration/Sources/Appwrite.php
Parse colon-delimited rootResourceId (e.g., database:collection) to extract database/table IDs when present; switch internal calls to global PHP built-ins (\str_contains, \explode, \count) and use \count(...) for end-element/index logic; minor doc/annotation tweaks.
Source: CSV — globals & minor tweaks
src/Migration/Sources/CSV.php
Replace unqualified built-ins with global variants (\explode, \array_push, \count), adjust last-element access to use \count, small spacing and doc wording change.
Row constructor formatting
src/Migration/Resources/Database/Row.php
Normalize whitespace/formatting in the Row constructor signature only; no behavioral or signature change.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Exporter
  participant Source as Source (Appwrite/CSV)
  participant Dest as Destination/CSV
  participant Local as Local Temp Storage
  participant Device as Target Device

  User->>Exporter: start migration
  Exporter->>Source: fetch rows (batches)
  loop per batch
    Source-->>Exporter: rows[]
    Exporter->>Dest: import(rows, callback)
    alt first write
      Dest->>Local: create directory & write headers (if enabled)
    end
    Dest->>Local: append CSV rows (buffered, flush when ~1MB)
    Dest-->>Exporter: callback(resources marked SUCCESS)
  end
  User->>Exporter: finalize
  Exporter->>Dest: shutdown()
  Dest->>Local: flush remaining buffer -> CSV file
  Dest->>Device: transfer CSV file to target device (validate existence/access)
  Dest->>Local: attempt cleanup temp dir (report errors if any)
  Dest-->>Exporter: shutdown complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • ArnabChatterjee20k

Poem

I nibble bytes and stack them neat,
Commas, quotes — a tidy treat.
I buffer, flush, and guard each row,
Pack them up and off they go.
A rabbit export — swift and sweet. 🐇📄

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Feat CSV export" is concise and directly describes the primary change in the changeset—adding CSV export support (new CSV destination, tests, and related source updates). It is specific, short, and clearly related to the modifications shown in the diff summary.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat-csv-export

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/Migration/Sources/CSV.php (2)

455-467: Bug: Undefined variable used in Exception “previous” when transfer fails without throwing.

If transfer returns false without throwing, $e is undefined and will cause a new error. Initialize and pass a captured exception instead.

-        try {
+        $exception = null;
+        try {
             $success = $device->transfer(
                 $filePath,
                 $filePath,
                 new Device\Local('/'),
             );
-        } catch (\Exception $e) {
+        } catch (\Exception $e) {
             $success = false;
+            $exception = $e;
         }
 
         if (!$success) {
-            throw new \Exception('Failed to transfer CSV file from device to local storage.', previous: $e ?? null);
+            throw new \Exception('Failed to transfer CSV file from device to local storage.', previous: $exception);
         }

390-401: Open the downloaded CSV using its local absolute path (don't fopen the original/relative path).

downloadToLocal() transfers the file into new Device\Local('/') and sets $this->downloaded, but it does not update $this->filePath; withCsvStream() still calls fopen($this->filePath), which can open the wrong (relative/remote) path for non-local devices.

  • Fix: persist or return the local absolute path from downloadToLocal() and use that path in withCsvStream() (or at minimum compute a resolvedPath when the source device isn't local, e.g. prepend '/' or use the Local device's getPath/getRoot).
  • Location: src/Migration/Sources/CSV.php — downloadToLocal() and withCsvStream().
🧹 Nitpick comments (6)
src/Migration/Sources/CSV.php (2)

264-270: Minor: Keep global-builtins usage consistent.

Elsewhere you’ve adopted leading backslashes for built-ins. For consistency (and tiny perf win), do the same here.

-                                    Column::TYPE_INTEGER => is_numeric($item) ? (int)$item : null,
-                                    Column::TYPE_FLOAT => is_numeric($item) ? (float)$item : null,
-                                    Column::TYPE_BOOLEAN => filter_var($item, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE),
+                                    Column::TYPE_INTEGER => \is_numeric($item) ? (int)$item : null,
+                                    Column::TYPE_FLOAT => \is_numeric($item) ? (float)$item : null,
+                                    Column::TYPE_BOOLEAN => \filter_var($item, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE),

199-213: Header validation is good; consider warning on extra columns per row.

Rows with columns not present in the header are silently accepted (they’ll be dropped by array_combine). Optionally, log a warning for mismatched keys to aid CSV debugging.

src/Migration/Destinations/CSV.php (2)

70-149: Buffered writer looks solid; add small guard for empty import set.

If $resources contains no Row instances, nothing is written, and shutdown() throws “No data to export…”. If that’s intentional, ignore. Otherwise, consider downgrading to a warning.


288-308: Consider fputcsv for escaping instead of a bespoke CSV writer.

fputcsv() handles tricky edge cases (embedded quotes/newlines, locale nuances). Not blocking, but lowers maintenance.

I can provide a minimal patch converting toCSV() to use fputcsv() with an in-memory stream.

tests/Migration/Unit/General/CSVTest.php (2)

105-111: Pint: add a space after “function” in closures (single_space_around_construct).

Closures like function($resources) { violate Pint’s rule. Add a space and optional type to quiet static analysis.

-        $csvDestination->testableImport([$row1, $row2], function($resources) {
+        $csvDestination->testableImport([$row1, $row2], function (array $resources): void {
             // Callback - verify resources are marked as successful
             foreach ($resources as $resource) {
                 $this->assertEquals('success', $resource->getStatus());
             }
         });

Repeat for the other callbacks in this file.

Also applies to: 169-170, 214-215, 260-261, 306-307, 360-361


185-190: Nit: comment typo.

The parenthetical seems truncated. Consider rephrasing the comment describing the CSV parsing assertion.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c42935a and 4e3919d.

📒 Files selected for processing (6)
  • src/Migration/Destinations/Appwrite.php (0 hunks)
  • src/Migration/Destinations/CSV.php (1 hunks)
  • src/Migration/Resources/Database/Row.php (1 hunks)
  • src/Migration/Sources/Appwrite.php (5 hunks)
  • src/Migration/Sources/CSV.php (4 hunks)
  • tests/Migration/Unit/General/CSVTest.php (2 hunks)
💤 Files with no reviewable changes (1)
  • src/Migration/Destinations/Appwrite.php
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:06:02.331Z
Learnt from: abnegate
PR: utopia-php/migration#0
File: :0-0
Timestamp: 2025-07-30T12:06:02.331Z
Learning: In the utopia-php/migration codebase, the Utopia Database package does not have a Memory adapter. When testing classes that require a Database instance (like CSV), use PHPUnit's createMock() method to create proper mocks instead of trying to instantiate real database adapters.

Applied to files:

  • tests/Migration/Unit/General/CSVTest.php
🧬 Code graph analysis (5)
src/Migration/Sources/Appwrite.php (4)
src/Migration/Sources/Appwrite/Reader/API.php (2)
  • queryEqual (221-224)
  • queryLimit (239-242)
src/Migration/Sources/Appwrite/Reader/Database.php (2)
  • queryEqual (369-372)
  • queryLimit (411-414)
src/Migration/Source.php (1)
  • callback (43-46)
src/Migration/Resource.php (1)
  • Resource (5-236)
src/Migration/Resources/Database/Row.php (1)
src/Migration/Resources/Database/Table.php (1)
  • Table (8-110)
src/Migration/Sources/CSV.php (3)
src/Migration/Sources/Appwrite/Reader/Database.php (2)
  • Database (21-426)
  • queryLimit (411-414)
src/Migration/Resources/Database/Table.php (1)
  • Table (8-110)
src/Migration/Sources/Appwrite/Reader.php (1)
  • queryLimit (108-108)
src/Migration/Destinations/CSV.php (4)
src/Migration/Destination.php (1)
  • Destination (5-54)
src/Migration/Resources/Database/Row.php (4)
  • Row (8-99)
  • __construct (16-25)
  • getName (77-80)
  • getData (95-98)
src/Migration/Transfer.php (1)
  • Transfer (5-338)
src/Migration/Cache.php (1)
  • update (126-145)
tests/Migration/Unit/General/CSVTest.php (4)
src/Migration/Destinations/CSV.php (3)
  • CSV (16-309)
  • import (70-149)
  • shutdown (154-184)
src/Migration/Resources/Database/Row.php (1)
  • Row (8-99)
src/Migration/Resources/Database/Table.php (1)
  • Table (8-110)
src/Migration/Destinations/Local.php (1)
  • Local (16-157)
🪛 GitHub Actions: Linter
src/Migration/Resources/Database/Row.php

[error] 1-1: Pint lint failed: braces_position violation (PSR-12) in src/Migration/Resources/Database/Row.php. Run 'vendor/bin/pint' to fix.

tests/Migration/Unit/General/CSVTest.php

[error] 1-1: Pint lint failed: single_space_around_construct rule violation in tests/Migration/Unit/General/CSVTest.php. Run 'vendor/bin/pint' to fix.

🪛 PHPMD (2.15.0)
src/Migration/Destinations/CSV.php

60-60: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

tests/Migration/Unit/General/CSVTest.php

169-169: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


214-214: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


260-260: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


306-306: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


360-360: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

🪛 GitHub Actions: Tests
tests/Migration/Unit/General/CSVTest.php

[error] 116-116: CSV export basic test failed. Expected '/tmp/csv_export_68c796f5ef82b/test_db:test_table_id.csv' to exist.


[error] 176-176: CSV export with special characters test failed. Expected a non-false value; actual was false.


[error] 221-221: CSV export with arrays test failed. Expected a non-false value; actual was false.


[error] 267-267: CSV export with null values test failed. Expected a non-false value; actual was false.


[error] 313-313: CSV export with allowed attributes test failed. Expected a non-false value; actual was false.


[error] 365-365: CSV export/import compatibility test failed. CSV file '/tmp/csv_export_68c796f5f105f/test_db:test_table_id.csv' does not exist.

🔇 Additional comments (2)
src/Migration/Sources/Appwrite.php (2)

712-724: LGTM: Supports “database:table” rootResourceId for databases.

Extracting the database part before applying $id equality is correct and keeps pagination logic intact.

Please add an integration test (or manual check) for:

  • rootResourceType=database with db:table id
  • rootResourceType=table with table id
    to ensure only one page is fetched (due to queryLimit(1)) and loops break as intended.

777-789: LGTM: Table filter for “database:table” and direct table-id paths.

Querying by $id with limit(1) short-circuits the loop correctly when targeting a single table.

Comment thread src/Migration/Destinations/CSV.php
Comment thread src/Migration/Resources/Database/Row.php Outdated
Comment thread tests/Migration/Unit/General/CSVTest.php
Comment thread tests/Migration/Unit/General/CSVTest.php
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
tests/Migration/Unit/General/CSVTest.php (2)

28-33: Expose the actual output path to decouple tests from filename policy

Add a helper to return the exact output path computed by the destination, so tests don’t assume a specific sanitization (e.g., ':' → '_').

 class TestCSV extends DestinationCSV
 {
     public function testableImport(array $resources, callable $callback): void
     {
         $this->import($resources, $callback);
     }
     
     public function getLocalRoot(): string
     {
         return $this->local->getRoot();
     }
 
     // Override shutdown to avoid transfer for testing
     public function shutdown(): void
     {
         // Do nothing for testing - don't transfer files
     }
+
+    public function getOutputPath(): string
+    {
+        return $this->local->getRoot() . '/' . $this->outputFile . '.csv';
+    }
 }

116-117: Use the helper to build expected CSV paths (remove hard‑coded filename assumptions)

Replace manual joins with getOutputPath() so tests remain stable if sanitization rules change.

-        $expectedFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $expectedFile = $csvDestination->getOutputPath();
@@
-        $csvFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $csvFile = $csvDestination->getOutputPath();
@@
-        $csvFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $csvFile = $csvDestination->getOutputPath();
@@
-        $csvFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $csvFile = $csvDestination->getOutputPath();
@@
-        $csvFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $csvFile = $csvDestination->getOutputPath();
@@
-        $csvFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $csvFile = $csvDestination->getOutputPath();

Also applies to: 173-173, 218-218, 264-264, 310-310, 365-366

🧹 Nitpick comments (3)
src/Migration/Destinations/CSV.php (2)

86-88: Open file in binary append mode for cross‑platform safety

Use 'ab' to avoid any platform‑specific text mode transformations.

-                    $handle = \fopen($log, 'a');
+                    $handle = \fopen($log, 'ab');

185-190: Temp dir cleanup is best‑effort; improve reliability

deletePath('') may be a no‑op depending on the storage adapter. At minimum, delete the file explicitly and then attempt to remove the temp directory.

         } finally {
             // Clean up the temporary directory
-            if (!$this->local->deletePath('') || $this->local->exists($this->local->getRoot())) {
-                Console::error('Error cleaning up: ' . $this->local->getRoot());
-            }
+            // Best effort: remove the generated file, then the directory.
+            @\unlink($sourceFilePath);
+            if (\is_dir($this->local->getRoot())) {
+                @\rmdir($this->local->getRoot());
+            }
+            if ($this->local->exists($this->local->getRoot())) {
+                Console::error('Error cleaning up: ' . $this->local->getRoot());
+            }
         }

If the directory can have nested contents in future, consider a small recursive remover here.

tests/Migration/Unit/General/CSVTest.php (1)

170-170: Silence PHPMD “unused parameter” by dropping the unused callback arg

The callbacks don’t use $resources; omit the parameter.

-        $csvDestination->testableImport([$row], function ($resources) {});
+        $csvDestination->testableImport([$row], function (): void {});

Apply to the marked occurrences.

Also applies to: 215-215, 261-261, 307-307, 361-361

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e3919d and 08c680d.

📒 Files selected for processing (3)
  • src/Migration/Destinations/CSV.php (1 hunks)
  • src/Migration/Resources/Database/Row.php (1 hunks)
  • tests/Migration/Unit/General/CSVTest.php (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Migration/Resources/Database/Row.php
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:06:02.331Z
Learnt from: abnegate
PR: utopia-php/migration#0
File: :0-0
Timestamp: 2025-07-30T12:06:02.331Z
Learning: In the utopia-php/migration codebase, the Utopia Database package does not have a Memory adapter. When testing classes that require a Database instance (like CSV), use PHPUnit's createMock() method to create proper mocks instead of trying to instantiate real database adapters.

Applied to files:

  • tests/Migration/Unit/General/CSVTest.php
🧬 Code graph analysis (2)
tests/Migration/Unit/General/CSVTest.php (5)
src/Migration/Destinations/CSV.php (3)
  • CSV (16-291)
  • import (74-156)
  • shutdown (161-191)
src/Migration/Resources/Database/Row.php (1)
  • Row (8-98)
src/Migration/Resources/Database/Table.php (1)
  • Table (8-110)
src/Migration/Destinations/Local.php (1)
  • Local (16-157)
src/Migration/Resource.php (2)
  • setPermissions (199-204)
  • getStatus (163-166)
src/Migration/Destinations/CSV.php (6)
src/Migration/Destination.php (1)
  • Destination (5-54)
src/Migration/Resource.php (4)
  • Resource (5-236)
  • setStatus (168-174)
  • getId (127-130)
  • getPermissions (191-194)
src/Migration/Resources/Database/Row.php (4)
  • Row (8-98)
  • __construct (16-24)
  • getName (76-79)
  • getData (94-97)
src/Migration/Transfer.php (1)
  • Transfer (5-338)
src/Migration/Source.php (1)
  • callback (43-46)
src/Migration/Cache.php (1)
  • update (126-145)
🪛 PHPMD (2.15.0)
tests/Migration/Unit/General/CSVTest.php

170-170: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


215-215: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


261-261: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


307-307: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


361-361: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

src/Migration/Destinations/CSV.php

64-64: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

Comment on lines +16 to +24
class CSV extends Destination
{
protected Device $deviceForMigrations;
protected string $resourceId;
protected string $outputFile;
protected Local $local;

protected array $allowedColumns = [];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Prevent duplicate headers across multiple import() calls

import() can be invoked multiple times (batched). Using a function‑local $csvHeader will re‑emit headers on subsequent calls. Persist this state on the instance.

 class CSV extends Destination
 {
     protected Device $deviceForMigrations;
     protected string $resourceId;
     protected string $outputFile;
     protected Local $local;

     protected array $allowedColumns = [];
+    protected bool $headerWritten = false;
@@
-                // Write headers if this is the first row of the file
-                if (!isset($csvHeader) && $this->includeHeaders) {
+                // Write headers once per file
+                if (!$this->headerWritten && $this->includeHeaders) {
                     $headers = \array_keys($csvData);
                     $buffer['lines'][] = ['type' => 'csv', 'data' => $headers];
                     $buffer['size'] += \strlen(\implode($this->delimiter, $headers)) + 2; // Approximate size
-                    $csvHeader = true;
+                    $this->headerWritten = true;
                 }

Also applies to: 123-129

🤖 Prompt for AI Agents
In src/Migration/Destinations/CSV.php around lines 16 to 24 (and also apply same
change to lines 123-129), the CSV header state is currently a function-local
variable and causes duplicate header rows when import() is called multiple
times; add a boolean or array property on the CSV instance (e.g., protected
?array $writtenHeader = null or protected bool $headerWritten = false) to
persist header emission across calls, update import() to check that instance
property before writing headers and set it after writing, and remove the
function-local $csvHeader so headers are only emitted once per CSV instance.

Comment on lines +269 to +278
protected function convertArrayToCSV(array $value): string
{
if (empty($value)) {
return '';
}
if (isset($value['$id'])) {
return $value['$id'];
}
return \json_encode($value);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

json_encode() may return false → declared string return type will throw; harden JSON encoding

When arrays/objects contain invalid UTF‑8, json_encode returns false. convertArrayToCSV()/convertObjectToCSV() declare string return types → runtime TypeError.

     protected function convertArrayToCSV(array $value): string
     {
         if (empty($value)) {
             return '';
         }
         if (isset($value['$id'])) {
             return $value['$id'];
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
@@
     protected function convertObjectToCSV($value): string
     {
         if ($value instanceof Row) {
             return $value->getId();
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
+
+    /**
+     * Robust JSON encoding for CSV fields.
+     */
+    private function safeJsonEncode(mixed $value): string
+    {
+        $json = \json_encode(
+            $value,
+            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE
+        );
+        return $json === false ? '' : $json;
+    }

Please confirm the minimum supported PHP version includes JSON_INVALID_UTF8_SUBSTITUTE (PHP ≥7.2). If not, I can fall back to a different strategy.

Also applies to: 283-289


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
# Find composer.json files
composer_files=$(fd -t f -H composer.json || true)
if [ -z "$composer_files" ]; then
  echo "NO_COMPOSER_JSON"
  exit 0
fi
echo "FOUND_COMPOSER_JSON"
echo "$composer_files"
first=$(echo "$composer_files" | head -n1)
echo "---- BEGIN $first ----"
sed -n '1,240p' "$first" || true
echo "---- END $first ----"
echo
# Try jq extraction
if command -v jq >/dev/null 2>&1; then
  echo "require.php: $(jq -r '.require.php // empty' "$first")"
  echo "config.platform.php: $(jq -r '.config.platform.php // empty' "$first")"
  echo "require-dev php: $(jq -r '.["require-dev"].php // empty' "$first")"
else
  echo "jq not available; grepping for php requirement lines"
  rg -n --hidden --no-ignore '"php"\s*:\s*".+"' "$first" || true
  rg -n --hidden --no-ignore '"platform"\s*:\s*{\s*' "$first" || true
fi

# Search CI for php-version or setup-php
echo
echo "CI files mentioning php version (searching .github and common CI files):"
rg -n --hidden --no-ignore -S 'php-version|setup-php|matrix:.*php|php:' .github || true
rg -n --hidden --no-ignore -S 'php:' Dockerfile* || true
rg -n --hidden --no-ignore -S 'composer config platform.php' || true

# Search repo for explicit minimum php mentions
echo
echo "Search README or docs for 'PHP' version mentions:"
rg -n --hidden --no-ignore -S 'PHP [0-9]\.[0-9]' README* || true

Length of output: 1855


Harden JSON encoding — avoid json_encode returning false → TypeError

Confirmed: composer.json requires PHP >=8.1 (platform "php": "8.1"), so JSON_INVALID_UTF8_SUBSTITUTE is available. Replace json_encode(...) with a robust helper and add the helper below. Applies to src/Migration/Destinations/CSV.php — convertArrayToCSV (≈269–278) and convertObjectToCSV (≈283–289).

     protected function convertArrayToCSV(array $value): string
     {
         if (empty($value)) {
             return '';
         }
         if (isset($value['$id'])) {
             return $value['$id'];
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
@@
     protected function convertObjectToCSV($value): string
     {
         if ($value instanceof Row) {
             return $value->getId();
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
+
+    /**
+     * Robust JSON encoding for CSV fields.
+     */
+    private function safeJsonEncode(mixed $value): string
+    {
+        $json = \json_encode(
+            $value,
+            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE
+        );
+        return $json === false ? '' : $json;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
protected function convertArrayToCSV(array $value): string
{
if (empty($value)) {
return '';
}
if (isset($value['$id'])) {
return $value['$id'];
}
return \json_encode($value);
}
protected function convertArrayToCSV(array $value): string
{
if (empty($value)) {
return '';
}
if (isset($value['$id'])) {
return $value['$id'];
}
return $this->safeJsonEncode($value);
}
protected function convertObjectToCSV($value): string
{
if ($value instanceof Row) {
return $value->getId();
}
return $this->safeJsonEncode($value);
}
/**
* Robust JSON encoding for CSV fields.
*/
private function safeJsonEncode(mixed $value): string
{
$json = \json_encode(
$value,
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE
);
return $json === false ? '' : $json;
}
🤖 Prompt for AI Agents
In src/Migration/Destinations/CSV.php around lines 269–278 (and similarly for
convertObjectToCSV around 283–289), replace direct json_encode(...) calls with a
safeJsonEncode helper to avoid json_encode returning false and causing a
TypeError; implement safeJsonEncode(mixed $value): string below the class
methods that calls json_encode($value,
JSON_UNESCAPED_SLASHES|JSON_UNESCAPED_UNICODE|JSON_INVALID_UTF8_SUBSTITUTE)
inside a try/catch (or checks json_last_error()), and if encoding fails return
an empty string (or a safe fallback) instead of allowing false to propagate;
update convertArrayToCSV and convertObjectToCSV to call safeJsonEncode($value)
where json_encode was used.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
src/Migration/Destinations/CSV.php (3)

207-214: Filename sanitization removes ':' — align with tests/parity

Allow ':' on POSIX to preserve database:table traceability (Windows still disallows). If you prefer keeping strict portability, update tests to derive the expected filename from the destination instead.

-        // Replace problematic characters with underscores
-        $sanitized = \preg_replace('/[:\\/<>"|*?]/', '_', $filename);
+        // Replace problematic characters with underscores; keep ':' on POSIX
+        $sanitized = \preg_replace('/[\\/<>"|*?]/', '_', $filename);

Please confirm CI does not run on Windows runners; otherwise switch to an OS‑aware pattern.


16-24: Prevent duplicate headers across multiple import() calls

Persist header state on the instance so headers are written once per file.

 class CSV extends Destination
 {
     protected Device $deviceForMigrations;
     protected string $resourceId;
     protected string $outputFile;
     protected Local $local;
 
     protected array $allowedColumns = [];
+    protected bool $headerWritten = false;
@@
-                // Write headers if this is the first row of the file
-                if (!isset($csvHeader) && $this->includeHeaders) {
+                // Write headers once per file
+                if (!$this->headerWritten && $this->includeHeaders) {
                     $headers = \array_keys($csvData);
-                    $buffer['lines'][] = $headers;
+                    $buffer['lines'][] = $headers;
                     $buffer['size'] += \strlen(\implode($this->delimiter, $headers)) + 2; // Approximate size
-                    $csvHeader = true;
+                    $this->headerWritten = true;
                 }

Also applies to: 121-127


267-276: Harden JSON encoding to avoid json_encode(false) → TypeError

json_encode can return false (invalid UTF‑8). Use a robust helper; PHP ≥8.1 supports JSON_INVALID_UTF8_SUBSTITUTE.

     protected function convertArrayToCSV(array $value): string
     {
         if (empty($value)) {
             return '';
         }
         if (isset($value['$id'])) {
             return $value['$id'];
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
@@
     protected function convertObjectToCSV($value): string
     {
         if ($value instanceof Row) {
             return $value->getId();
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
+
+    /**
+     * Robust JSON encoding for CSV fields.
+     */
+    private function safeJsonEncode(mixed $value): string
+    {
+        $json = \json_encode(
+            $value,
+            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE
+        );
+        return $json === false ? '' : $json;
+    }

Also applies to: 281-287, 289-289

🧹 Nitpick comments (4)
src/Migration/Destinations/CSV.php (4)

1-2: Enable strict types

Add strict types for better type-safety across this new class.

 <?php
+
+declare(strict_types=1);

105-108: Avoid unset($handle) — leads to undefined variable notice in finally

Set to null after fclose; finally block safely checks is_resource($handle).

-                if (isset($handle)) {
-                    \fclose($handle);
-                    unset($handle);
-                }
+                if (isset($handle)) {
+                    \fclose($handle);
+                    $handle = null;
+                }

64-67: PHPMD: Unused parameter in report()

Either use the argument or suppress the warning until implemented.

-    public function report(array $resources = []): array
+    /** @SuppressWarnings(PHPMD.UnusedFormalParameter) */
+    public function report(array $resources = []): array
     {
         return [];
     }

69-74: Docblock lists \JsonException but method doesn’t throw it

Remove the misleading @throws annotation or switch to JSON_THROW_ON_ERROR everywhere (not recommended here).

     /**
      * @param array<Row> $resources
-     * @throws \JsonException
      * @throws \Exception
      */
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08c680d and 8931241.

📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (5)
src/Migration/Destination.php (1)
  • Destination (5-54)
src/Migration/Resource.php (4)
  • Resource (5-236)
  • setStatus (168-174)
  • getId (127-130)
  • getPermissions (191-194)
src/Migration/Resources/Database/Row.php (4)
  • Row (8-98)
  • __construct (16-24)
  • getName (76-79)
  • getData (94-97)
src/Migration/Transfer.php (1)
  • Transfer (5-338)
src/Migration/Cache.php (1)
  • update (126-145)
🪛 PHPMD (2.15.0)
src/Migration/Destinations/CSV.php

64-64: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/Migration/Destinations/CSV.php (2)

165-168: Avoid double slashes when directory is empty

Build the destination path conditionally.

-        $destPath = $this->deviceForFiles->getPath($this->directory . '/' . $filename);
+        $destPath = $this->deviceForFiles->getPath(
+            ($this->directory !== '' ? $this->directory . '/' : '') . $filename
+        );

68-71: Silence PHPMD “UnusedFormalParameter” in report()

Param isn’t used; make it explicit.

     public function report(array $resources = []): array
     {
+        unset($resources);
         return [];
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8931241 and 7a9c8be.

📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (7)
src/Migration/Destination.php (1)
  • Destination (5-54)
src/Migration/Resource.php (6)
  • Resource (5-236)
  • setStatus (168-174)
  • getId (127-130)
  • getPermissions (191-194)
  • getCreatedAt (209-212)
  • getUpdatedAt (217-220)
src/Migration/Resources/Database/Row.php (4)
  • Row (8-98)
  • __construct (16-24)
  • getName (76-79)
  • getData (94-97)
src/Migration/Transfer.php (1)
  • Transfer (5-338)
src/Migration/Destinations/Local.php (1)
  • Local (16-157)
src/Migration/Source.php (1)
  • callback (43-46)
src/Migration/Cache.php (1)
  • update (126-145)
🪛 PHPMD (2.15.0)
src/Migration/Destinations/CSV.php

68-68: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

🪛 GitHub Actions: Tests
src/Migration/Destinations/CSV.php

[error] 32-32: ArgumentCountError: Too few arguments to function Utopia\Migration\Destinations\CSV::__construct(), 2 passed in /app/tests/Migration/Unit/General/CSVTest.php on line 84 and at least 4 expected


[error] 32-32: ArgumentCountError: Too few arguments to function Utopia\Migration\Destinations\CSV::__construct(), 2 passed in /app/tests/Migration/Unit/General/CSVTest.php on line 157 and at least 4 expected


[error] 32-32: ArgumentCountError: Too few arguments to function Utopia\Migration\Destinations\CSV::__construct(), 2 passed in /app/tests/Migration/Unit/General/CSVTest.php on line 203 and at least 4 expected


[error] 32-32: ArgumentCountError: Too few arguments to function Utopia\Migration\Destinations\CSV::__construct(), 2 passed in /app/tests/Migration/Unit/General/CSVTest.php on line 248 and at least 4 expected


[error] 32-32: ArgumentCountError: Too few arguments to function Utopia\Migration\Destinations\CSV::__construct(), 2 passed in /app/tests/Migration/Unit/General/CSVTest.php on line 295 and at least 4 expected


[error] 32-32: TypeError: Utopia\Migration\Destinations\CSV::__construct(): Argument #3 ($directory) must be of type string, array given, called in /app/tests/Migration/Unit/General/CSVTest.php on line 295

🔇 Additional comments (5)
src/Migration/Destinations/CSV.php (5)

211-218: Allow ':' in filenames on POSIX or update tests (keep parity with resourceId)

Current sanitization replaces ':', breaking tests expecting names like db:collection.csv.

-        // Replace problematic characters with underscores
-        $sanitized = \preg_replace('/[:\\/<>"|*?]/', '_', $filename);
+        // Replace problematic characters with underscores (keep ':' for POSIX traceability)
+        $sanitized = \preg_replace('/[\\/<>"|*?]/', '_', $filename);

24-25: Persist header emission across multiple import() calls

Headers are re-emitted if import() is called in batches. Persist state on the instance.

Apply:

 protected array $allowedColumns = [];
+protected bool $headerWritten = false;

125-131: Write headers once per file (batch-safe)

Use the instance flag instead of a function-local variable.

-                // Write headers if this is the first row of the file
-                if (!isset($csvHeader) && $this->includeHeaders) {
+                // Write headers once per file
+                if (!$this->headerWritten && $this->includeHeaders) {
                     $headers = \array_keys($csvData);
-                    $buffer['lines'][] = $headers;
+                    $buffer['lines'][] = $headers;
                     $buffer['size'] += \strlen(\implode($this->delimiter, $headers)) + 2; // Approximate size
-                    $csvHeader = true;
+                    $this->headerWritten = true;
                 }

273-294: Harden JSON encoding to avoid TypeError on invalid UTF‑8

json_encode can return false; declared string return types would then throw. Use a robust helper.

     protected function convertArrayToCSV(array $value): string
     {
         if (empty($value)) {
             return '';
         }
         if (isset($value['$id'])) {
             return $value['$id'];
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }
@@
     protected function convertObjectToCSV($value): string
     {
         if ($value instanceof Row) {
             return $value->getId();
         }
-        return \json_encode($value);
+        return $this->safeJsonEncode($value);
     }

Add this helper inside the class:

/**
 * Robust JSON encoding for CSV fields.
 */
private function safeJsonEncode(mixed $value): string
{
    $json = \json_encode(
        $value,
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE
    );
    return $json === false ? '' : $json;
}

32-54: Constructor signature breaks tests — make CSV ctor BC-friendly (accept string|array, default filename to resourceId)

Adjust constructor to accept string|array for the 3rd parameter, default filename to resourceId, and handle empty directory to preserve BC and avoid ArgumentCountError/TypeError. Repo search for new CSV( returned only the class declaration at src/Migration/Destinations/CSV.php — no instantiations found; run tests and check external callers.

Apply:

-    public function __construct(
-        Device $deviceForFiles,
-        string $resourceId,
-        string $directory,
-        string $filename,
-        array $allowedColumns = [],
-        private readonly string $delimiter = ',',
-        private readonly string $enclosure = '"',
-        private readonly string $escape = '\\',
-        private readonly bool $includeHeaders = true,
-    ) {
-        $this->deviceForFiles = $deviceForFiles;
-        $this->resourceId = $resourceId;
-        $this->directory = $directory;
-        $this->outputFile = $this->sanitizeFilename($filename);
+    public function __construct(
+        Device $deviceForFiles,
+        string $resourceId,
+        string|array $directoryOrAllowedColumns = '',
+        ?string $filename = null,
+        array $allowedColumns = [],
+        private readonly string $delimiter = ',',
+        private readonly string $enclosure = '"',
+        private readonly string $escape = '\\',
+        private readonly bool $includeHeaders = true,
+    ) {
+        $this->deviceForFiles = $deviceForFiles;
+        $this->resourceId = $resourceId;
+        if (\is_array($directoryOrAllowedColumns)) {
+            // Signature: (Device, resourceId, allowedColumns[, filename])
+            $this->directory = '';
+            $effectiveFilename = $filename ?? $resourceId;
+            $allowedColumns = $directoryOrAllowedColumns;
+        } else {
+            // Signature: (Device, resourceId, directory[, filename, allowedColumns])
+            $this->directory = $directoryOrAllowedColumns;
+            $effectiveFilename = $filename ?? $resourceId;
+        }
+        $this->outputFile = $this->sanitizeFilename($effectiveFilename);
         $this->local = new Local(\sys_get_temp_dir() . '/csv_export_' . uniqid());
         $this->local->setTransferChunkSize(Transfer::STORAGE_MAX_CHUNK_SIZE);
         $this->createDirectory($this->local->getRoot());
 
         foreach ($allowedColumns as $attribute) {
             $this->allowedColumns[$attribute] = true;
         }
     }

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/Migration/Unit/General/CSVTest.php (3)

104-111: Prefer constant over magic string for status.

Use the library’s STATUS_SUCCESS to avoid brittle string coupling.

-            foreach ($resources as $resource) {
-                $this->assertEquals('success', $resource->getStatus());
-            }
+            foreach ($resources as $resource) {
+                $this->assertEquals(\Utopia\Migration\Resource::STATUS_SUCCESS, $resource->getStatus());
+            }

172-172: Silence PHPMD UnusedFormalParameter in test callbacks.

Explicitly discard the parameter.

-        $csvDestination->testableImport([$row], function ($resources) {});
+        $csvDestination->testableImport([$row], function ($resources) { unset($resources); });

(Apply at lines 172, 218, 265, 312, and 368.)

Also applies to: 218-218, 265-265, 312-312, 368-368


79-82: Remove redundant mkdir (Local device creates paths).

Keep the test lean; the Local device constructor ensures directory creation.

         $tempDir = sys_get_temp_dir() . '/csv_test_' . uniqid();
-        mkdir($tempDir, 0755, true);
         $exportDevice = new Local($tempDir);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a9c8be and c46083a.

📒 Files selected for processing (1)
  • tests/Migration/Unit/General/CSVTest.php (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:06:02.331Z
Learnt from: abnegate
PR: utopia-php/migration#0
File: :0-0
Timestamp: 2025-07-30T12:06:02.331Z
Learning: In the utopia-php/migration codebase, the Utopia Database package does not have a Memory adapter. When testing classes that require a Database instance (like CSV), use PHPUnit's createMock() method to create proper mocks instead of trying to instantiate real database adapters.

Applied to files:

  • tests/Migration/Unit/General/CSVTest.php
🧬 Code graph analysis (1)
tests/Migration/Unit/General/CSVTest.php (5)
src/Migration/Destinations/CSV.php (3)
  • CSV (16-295)
  • import (78-158)
  • shutdown (163-193)
src/Migration/Resources/Database/Row.php (1)
  • Row (8-98)
src/Migration/Resources/Database/Table.php (1)
  • Table (8-110)
src/Migration/Destinations/Local.php (1)
  • Local (16-157)
src/Migration/Resource.php (2)
  • setPermissions (199-204)
  • getStatus (163-166)
🪛 PHPMD (2.15.0)
tests/Migration/Unit/General/CSVTest.php

172-172: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


218-218: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


265-265: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


312-312: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)


368-368: Avoid unused parameters such as '$resources'. (undefined)

(UnusedFormalParameter)

🔇 Additional comments (2)
tests/Migration/Unit/General/CSVTest.php (2)

16-33: Expose the canonical output path via a helper on the test wrapper.

Add a getter so tests don’t depend on filename sanitization rules.

 class TestCSV extends DestinationCSV
 {
     public function testableImport(array $resources, callable $callback): void
     {
         $this->import($resources, $callback);
     }
     
     public function getLocalRoot(): string
     {
         return $this->local->getRoot();
     }
+
+    public function getOutputPath(): string
+    {
+        return $this->local->getRoot() . '/' . $this->outputFile . '.csv';
+    }
     
     // Override shutdown to avoid transfer for testing
     public function shutdown(): void
     {
         // Do nothing for testing - don't transfer files
     }
 }

114-116: Use the helper for CSV path to decouple tests from filename policy.

Switch hard-coded path building to the new getter.

-        $expectedFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $expectedFile = $csvDestination->getOutputPath();
-        $csvFile = $csvDestination->getLocalRoot() . '/test_db_test_table_id.csv';
+        $csvFile = $csvDestination->getOutputPath();

(Apply the second hunk at lines 175, 221, 268, 315, and 372.)

Also applies to: 175-175, 221-221, 268-268, 315-315, 372-372

Comment on lines +122 to +124
$header = fgetcsv($handle, 0, ',', '"', '"');
$row1Data = fgetcsv($handle, 0, ',', '"', '"');
$row2Data = fgetcsv($handle, 0, ',', '"', '"');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix CSV parsing: wrong escape character passed to fgetcsv.

Writer uses backslash as escape; tests pass the quote character, which can mis-parse quoted fields.

-        $header = fgetcsv($handle, 0, ',', '"', '"');
-        $row1Data = fgetcsv($handle, 0, ',', '"', '"');
-        $row2Data = fgetcsv($handle, 0, ',', '"', '"');
+        $header = fgetcsv($handle, 0, ',', '"', '\\');
+        $row1Data = fgetcsv($handle, 0, ',', '"', '\\');
+        $row2Data = fgetcsv($handle, 0, ',', '"', '\\');
-        $header = fgetcsv($handle, 0, ',', '"', '"');
-        $rowData = fgetcsv($handle, 0, ',', '"', '"');
+        $header = fgetcsv($handle, 0, ',', '"', '\\');
+        $rowData = fgetcsv($handle, 0, ',', '"', '\\');

(Apply to lines 181-182, 227-228, 274-275, 321-322, and 378-379 similarly.)

Also applies to: 181-182, 227-228, 274-275, 321-322, 378-379

🤖 Prompt for AI Agents
In tests/Migration/Unit/General/CSVTest.php around lines 122-124 (and similarly
update 181-182, 227-228, 274-275, 321-322, 378-379), the fgetcsv calls pass the
quote character as the escape parameter causing mis-parsing of quoted fields;
change the escape parameter to the backslash character (i.e. use '\' as the
escape argument) so fgetcsv uses backslash escaping consistent with the writer
and re-run tests.

@abnegate abnegate merged commit 42ff497 into main Sep 24, 2025
4 checks passed
@abnegate abnegate deleted the feat-csv-export branch September 24, 2025 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant