
Conversation

@nyamsprod (Member) commented Feb 1, 2017

Introduction

This PR introduces the League\Csv v9 public API. The goal of this version is to make the package more SOLID.

Current API design issues:

The Reader class

When extracting data from a CSV document you are required to issue the same queries multiple times because the query/filtering methods are coupled directly to the Reader class. In doing so we create:

  • verbose code for no reason;
  • slower scripts, since the same filtering must be applied again and again.

The Writer class

  • Writer::insertOne does nothing on insertion error. Currently the errors are silently ignored.
  • Writer::insertOne accepts both an array and a string. If a string is passed it is converted into an array using str_getcsv. This behaviour is problematic at best, as it may lead to incorrect insertions.
  • The Writer class does many things it should not be able to do regarding reading a CSV document.
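A hedged sketch of the v8 problems listed above (the file path and data are illustrative): a string passed to insertOne is silently split with str_getcsv, so a stray delimiter shifts the resulting columns, and a failed insertion raises no error.

<?php

use League\Csv\Writer;

// v8-era behaviour (illustrative)
$csv = Writer::createFromPath('/tmp/demo.csv', 'w');

// The string is split with str_getcsv() behind the scenes:
// the comma inside "Doe, John" silently shifts every following column.
$csv->insertOne('Doe, John,john@example.com');

// If the insertion fails (e.g. the target is not writable),
// v8 reports nothing: no exception, no error code.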

Improve Stream filtering

In the v8 line we introduced support for PHP streams. In v9, we can take full advantage of that to:

  • improve stream support on the Writer class
  • simplify stream filtering API

Proposal

This PR:

  • Introduces two new classes by decoupling the Reader class into:

    • a Statement class to ease selecting records from the CSV data. This class is immutable.
    • a RecordSet class to manipulate the resulting records.
  • Improves the Reader class, which can now properly handle the CSV header and automatically strip the BOM sequence. Thanks to the decoupling, the extracting and filtering methods are also improved and simplified.

  • Improves the Writer class by:

    • improving error handling on record insertion
    • restricting the values accepted by insertOne
    • improving the stream filter API (now fully supported and on par with the Reader class)

the AbstractCsv public API

public static AbstractCsv::createFromFileObject(SplFileObject $file): self;
public static AbstractCsv::createFromStream($stream): self;
public static AbstractCsv::createFromString(string $str): self;
public static AbstractCsv::createFromPath(string $path, string $open_mode = 'r+'): self;
public AbstractCsv::__toString(): string;
public AbstractCsv::output(string $filename = null): int;
public AbstractCsv::getDelimiter(): string;
public AbstractCsv::getEnclosure(): string;
public AbstractCsv::getEscape(): string;
public AbstractCsv::getOutputBOM(): string;
public AbstractCsv::getInputBOM(): string;
public AbstractCsv::hasStreamFilter(string $filter_name): bool;
public AbstractCsv::isStream(): bool;
public AbstractCsv::setDelimiter(string $delimiter): self;
public AbstractCsv::setEnclosure(string $enclosure): self;
public AbstractCsv::setEscape(string $escape): self;
public AbstractCsv::setOutputBOM(string $str): self;
public AbstractCsv::addStreamFilter(string $filter_name): self;
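The named constructors and fluent setters above can be combined as follows (the file path and stream filter name are illustrative):

<?php

use League\Csv\Reader;

$csv = Reader::createFromPath('/path/to/data.csv', 'r');
$csv->setDelimiter(';')
    ->setEnclosure('"')
    ->addStreamFilter('convert.iconv.ISO-8859-1/UTF-8');

// hasStreamFilter() reports whether the filter was attached,
// isStream() whether the underlying document supports stream filters at all.
$csv->hasStreamFilter('convert.iconv.ISO-8859-1/UTF-8');
$csv->isStream();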

the Writer public API extends the AbstractCsv public API

public Writer::getNewline(): string
public Writer::getFlushThreshold(): int|null
public Writer::setNewline(string $newline): self
public Writer::setFlushThreshold($threshold): self
public Writer::addFormatter(callable $formatter): self
public Writer::addValidator(callable $validator, string $name): self
public Writer::insertAll(iterable $records_set): int
public Writer::insertOne(array $record): int
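Putting the Writer API together: records can be formatted and validated before insertion, and a failed or rejected insertion now throws instead of failing silently (the formatter, validator and file path below are illustrative; the exact exception class is not shown here, so the generic \Exception is caught):

<?php

use League\Csv\Writer;

$csv = Writer::createFromPath('/tmp/users.csv', 'w');
$csv->setNewline("\r\n")
    ->setFlushThreshold(500); // flush the internal buffer every 500 records

// Trim every field before it is written.
$csv->addFormatter(function (array $record) {
    return array_map('trim', $record);
});

// Reject any record that does not contain exactly 3 fields.
$csv->addValidator(function (array $record) {
    return 3 === count($record);
}, 'record_size');

try {
    $bytes = $csv->insertOne(['john ', 'doe', 'john.doe@example.com']);
} catch (\Exception $e) {
    // in v9 insertion errors surface as exceptions
    echo $e->getMessage();
}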

the Reader public API extends the AbstractCsv public API

Implements the IteratorAggregate interface

public Reader::getHeader(): array
public Reader::getHeaderOffset(): int|null
public Reader::fetchDelimitersOccurrence(array $delimiters, int $nb_rows = 1): array
public Reader::setHeaderOffset(int|null $offset): self
public Reader::select(Statement $stmt = null): RecordSet

The Statement public API

public Statement::columns(array $columns): self
public Statement::where(callable $callable): self
public Statement::orderBy(callable $callable): self
public Statement::offset(int $offset): self
public Statement::limit(int $limit): self
public Statement::process(Reader $csv): RecordSet
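Since Statement is immutable, every method returns a new instance, and the same statement can be reused against several documents through process() (the callbacks and paths are illustrative):

<?php

use League\Csv\Reader;
use League\Csv\Statement;

$stmt = (new Statement())
    ->where(function (array $record) {
        return '' !== $record[0]; // keep records with a non-empty first field
    })
    ->orderBy(function (array $a, array $b) {
        return strcmp($a[0], $b[0]); // sort on the first field
    })
    ->offset(5)
    ->limit(25);

// the same statement processes any Reader without being mutated
$recordsA = $stmt->process(Reader::createFromPath('/path/to/a.csv'));
$recordsB = $stmt->process(Reader::createFromPath('/path/to/b.csv'));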

The RecordSet public API

Implements the following interfaces: Countable, IteratorAggregate, JsonSerializable

public RecordSet::getColumnNames(): array
public RecordSet::getColumnName(int $offset): string
public RecordSet::toHTML(string $class_attr = 'table-csv-data', string $offset_attr= 'data-record-offset'): string
public RecordSet::toXML(string $root_name = 'csv', string $row_name = 'row', string $cell_name = 'cell', string $column_attr = 'name', string $offset_attr = 'offset'): DOMDocument
public RecordSet::fetchAll(): array
public RecordSet::fetchOne(int $offset = 0): array
public RecordSet::fetchColumn(string|int $column_index = 0): Generator
public RecordSet::fetchPairs(string|int $offset_index = 0, string|int $value_index = 1): Generator
public RecordSet::setConversionInputEncoding(string $input_encoding): self
public RecordSet::preserveOffset(bool $status): self
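The extraction and conversion methods above chain naturally (the column names and the input encoding are illustrative):

<?php

use League\Csv\Reader;
use League\Csv\Statement;

$csv = Reader::createFromPath('/path/to/users.csv');
$csv->setHeaderOffset(0);
$records = $csv->select(new Statement());

// lazily iterate id => email pairs
foreach ($records->fetchPairs('id', 'email') as $id => $email) {
    // ...
}

// keep the record offsets and render an HTML table
$html = $records->preserveOffset(true)->toHTML('csv-table');

// transcode the input before converting to XML
$xml = $records->setConversionInputEncoding('iso-8859-1')->toXML();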

before:

<?php

use League\Csv\Reader;

$csv = Reader::createFromPath('/path/to/your/iso8859-1.csv');
$csv->setInputEncoding('iso-8859-1');

$filter = function (array $row) {
    return filter_var($row[2], FILTER_VALIDATE_EMAIL);
};

$csv->addFilter($filter)
    ->setOffset(10)
    ->setLimit(3);
foreach ($csv->fetchAssoc() as $row) {
    $row; //no conversion - row keys contain the header values from the first row
}

$csv->addFilter($filter)
    ->setOffset(10)
    ->setLimit(3);
$res = $csv->fetchAll();  //no conversion - row keys are numeric

$csv->addFilter($filter)
    ->setOffset(10)
    ->setLimit(3);
$col2 = $csv->fetchColumn(2);   //no conversion

$csv->addFilter($filter)
    ->setOffset(10)
    ->setLimit(3);

$json = json_encode($csv, JSON_PRETTY_PRINT);  //converted into UTF-8

after:

<?php

use League\Csv\Statement;
use League\Csv\Reader;

$csv = Reader::createFromPath('/path/to/your/iso8859-1.csv');
$csv->setHeaderOffset(0); //specify the header to be used

$filter = function (array $row) {
    return filter_var($row['email'], FILTER_VALIDATE_EMAIL);
};

$stmt = (new Statement())
    ->where($filter)
    ->offset(10)
    ->limit(3)
;

$records = $csv->select($stmt);
$res = $records->fetchAll();  //no conversion - row keys contain the header values from the first row
$col2 = $records->fetchColumn('email');   //no conversion
$col2bis = $records->fetchColumn(2);      //no conversion; $col2bis contains the same value as $col2

$records->setConversionInputEncoding('iso-8859-1');
$json = json_encode($records, JSON_PRETTY_PRINT);  //converted into UTF-8 - row keys contain the header values from the first row

Backward Incompatible Changes

  • The Writer class no longer implements the following interfaces: JsonSerializable, IteratorAggregate
  • Removed methods from AbstractCsv:
    • newWriter, newReader.
    • All query filtering methods (moved to the Statement class).
    • fetchDelimitersOccurrence (moved to the Reader class).
    • setNewline, getNewline (moved to the Writer class).
    • setInputEncoding (renamed setConversionInputEncoding and moved to the RecordSet class).
  • All extracting methods are removed from the Reader class (moved to the RecordSet class).
  • The optional callable parameter of the extracting methods is removed.
  • Stream filters, once added, can no longer be manipulated individually.
  • Writer::insert* methods now throw an exception on error and return the number of bytes added.

Targeted release version

version 9.0.0

Open issues

Since this is a major release, some BC breaks are introduced. Some of them could be softened to preserve part of the current behaviour:

  • All extracting methods have been removed from the Reader class with the introduction of the new Reader::select method. Question: should we keep them as deprecated aliases in v9 and remove them completely in the next major version, or drop them now?

- The Statement and the RecordSet classes are introduced
to improve Reader usage and behaviour

- Stream filtering is now supported for every object
except those created from a SplFileObject.

- Stream filtering is simplified and improved for the Writer class
- AbstractCsv::setFlushThreshold is added to improve writing speed
- StreamIterator::fflush is added to improve writing speed
- StreamIterator::__clone is added to forbid cloning the object
- StreamTrait improved
- Statement is made immutable
- Removed the newReader and newWriter methods
- The newline and flush threshold settings are only available on the Writer class
- fetchDelimitersOccurrence is only available on the Reader class
@nyamsprod nyamsprod changed the title Decouple Statement and RecordSet from Reader Implementing 9.0 Public API Feb 3, 2017
@nyamsprod nyamsprod force-pushed the feature/decouple-reader branch from 3391dc5 to bbfadce Compare February 6, 2017 12:22
- Replaced InvalidRowException with a more generic InsertionException
- insertOne and insertAll return int

Stream filters can no longer be removed or cleared by hand;
this is done automatically when the object is destructed.

- Removed StreamTrait
- Bugfix: the header sent by AbstractCsv::output is now HTTP/2 compliant
@nyamsprod nyamsprod force-pushed the feature/decouple-reader branch from 386e466 to 20b5e8d Compare February 8, 2017 15:45
- Adding RecordFormatterInterface
- Adding RecordValidatorInterface
- Adding CallableValidatorAdapter to reduce BC breakage
- Adding CallableFormatterAdapter to reduce BC breakage
- Removing the Config subnamespace
@nyamsprod nyamsprod force-pushed the feature/decouple-reader branch from 20b5e8d to 9987799 Compare February 9, 2017 09:10
@nyamsprod nyamsprod force-pushed the feature/decouple-reader branch from 3249cfd to 44e2097 Compare February 12, 2017 21:55
- Remove RecordFormatterInterface
- Remove RecordValidatorInterface

restore callable usage instead
Adding a fourth parameter to indicate the header attribute name
added to each record field when the header has been given to the CSV Document
Removed the header method and replaced it with the better columns method
Adding a method preserveOffset to format output so that they
contain or not the CSV document record offset if needed. By default
the CSV document offset is not given.
@nyamsprod nyamsprod force-pushed the feature/decouple-reader branch 2 times, most recently from 20d0eda to e3a5819 Compare February 20, 2017 21:35
@nyamsprod nyamsprod force-pushed the feature/decouple-reader branch from e3a5819 to e78eb9f Compare February 20, 2017 21:41
@nyamsprod nyamsprod merged commit 87bc1aa into master Feb 27, 2017
@nyamsprod nyamsprod deleted the feature/decouple-reader branch February 27, 2017 08:37
