Decoupling Query from Reader #178
|
Thanks! Using setInputEncoding that way looks much more comfortable, and the immutable query object is neat. One potential issue I see with the cell-oriented transcoding method (which isn't introduced by this PR but wasn't used by everything before) is that it splits on ASCII bytes first, and not all encodings are ASCII-compatible. If you have a UTF-16 encoded CSV document with 㬬 (U+3B2C) in it, the CSV splitter sees two bytes, 0x3B and 0x2C, which are ';' and ',' in ASCII. So the stream filters should be used whenever possible. Other API ideas:
So you could go from:
$query = (new Query())
    ->addFilter($filter)
    ->setOffset(10)
    ->setLimit(3);
$records = $csv->getRecords($query);
To:
$records = $csv->query()
    ->addFilter($filter)
    ->setOffset(10)
    ->setLimit(3)
    ->get(); |
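To make the UTF-16 concern above concrete, here is a minimal sketch in plain PHP (no League\Csv calls). Building the bytes with pack() is an assumption of this example, not something the library does; it only shows how a byte-oriented split on ';' cuts the character in half.

```php
<?php
// For a code point in the Basic Multilingual Plane, UTF-16BE is simply the
// 16-bit code unit in big-endian order, so pack('n', ...) yields its bytes.
$utf16Cell = pack('n', 0x3B2C);

// The two bytes are 0x3B and 0x2C, which read as ";" and "," in ASCII.
$hex = bin2hex($utf16Cell);

// A splitter that works on raw bytes sees a delimiter inside the character.
$cells = explode(';', $utf16Cell);
```

Converting the stream to UTF-8 before splitting avoids this, because the character is then encoded with bytes outside the ASCII range.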
The latest commit to the PR does just that 👍. It uses a stream filter whenever possible, or else falls back to cell-by-cell conversion.
As you mentioned it would be partial compatibility and too much magic IMHO
The goal is to have the Query totally decoupled from the Csv, so that if I have to process 100 files at once I would do something like this:
use League\Csv\Query;
use League\Csv\Reader;
use League\Csv\Records;
$query = (new Query())
    ->setOffset(3)
    ->setLimit(12)
    ->addFilter($callable)
;
foreach ($csv_list as $csv) {
    $data = $csv->getRecords($query)->fetchAssoc();
    // ....
}
Again, this goes against the decoupling. And looking at your chaining, one could easily lose track of what is returned when. What the new API provides is an alternative like this:
use League\Csv\Query;
use League\Csv\Reader;
use League\Csv\Records;
$query = (new Query())
    ->setOffset(3)
    ->setLimit(12)
    ->addFilter($callable)
;
// createFromPath is a static factory, so it is called without "new"
$csv = Reader::createFromPath('/path/to/file.csv')
    ->setDelimiter(";")
    ->setInputEncoding("ISO-8859-1")
;
$records = new Records($csv, $query);
And yes |
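The decoupling argument can be illustrated with a small stand-in. The SketchQuery class below is hypothetical and is not the League\Csv implementation; it only assumes the three setters discussed in this thread and applies them to plain arrays through SPL iterators. Each setter returns a clone, so one query can be reused across many record sets.

```php
<?php
// Hypothetical stand-in for the proposed Query class (not League\Csv code).
final class SketchQuery
{
    private $filters = [];
    private $offset = 0;
    private $limit = -1; // -1 means "no limit" for LimitIterator

    public function addFilter(callable $filter): self
    {
        $clone = clone $this;
        $clone->filters[] = $filter;
        return $clone;
    }

    public function setOffset(int $offset): self
    {
        $clone = clone $this;
        $clone->offset = $offset;
        return $clone;
    }

    public function setLimit(int $limit): self
    {
        $clone = clone $this;
        $clone->limit = $limit;
        return $clone;
    }

    // Filters first, then applies offset and limit to the filtered records.
    public function process(array $records): array
    {
        $iterator = new ArrayIterator($records);
        foreach ($this->filters as $filter) {
            $iterator = new CallbackFilterIterator($iterator, $filter);
        }
        $limited = new LimitIterator($iterator, $this->offset, $this->limit);
        return iterator_to_array($limited, false);
    }
}

$base = (new SketchQuery())->setOffset(1)->setLimit(2);
$nonEmpty = $base->addFilter(function (array $row): bool {
    return $row[0] !== '';
});

$fileA = [['a1'], [''], ['a3'], ['a4']];
$fileB = [['b1'], ['b2'], ['b3']];

$resultA = $nonEmpty->process($fileA);  // same query, first record set
$resultB = $nonEmpty->process($fileB);  // same query, second record set
$unfiltered = $base->process($fileA);   // $base was not changed by addFilter()
```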
|
@nyamsprod this looks like a nice improvement! One thing that feels a little out of place, is the |
|
@frankdejonge the other option would maybe be 'Filter', I don't know... naming things is hard :) |
|
@nyamsprod it seems more like a specification of some sort... right? |
|
And yeah, naming things is really hard. |
|
@frankdejonge @dequis @cordoval how about renaming the classes as follows
Which would give the following result:
<?php
use League\Csv\Reader;
use League\Csv\Statement;
$csv = Reader::createFromPath('/path/to/your/iso8859-1.csv');
$csv->setInputEncoding('iso-8859-1');
$csv->setHeader(0); //specify the record to be used as header - can be an array too
$stmt = (new Statement())
->addFilter($filter)
->setOffset(10)
->setLimit(3);
$records = $csv->select($stmt);
$records = $stmt->process($csv); //or to enable the use of the same statement on different CSV objects
foreach ($records as $row) {
$row; //converted into UTF-8
}
$res = $records->fetchAll(); //converted into UTF-8 with the header fields combined
$col2 = $records->fetchColumn(2); //converted into UTF-8
$xml = $records->toXML(); //converted into UTF-8 |
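As a rough illustration of what "the header fields combined" could mean in the example above, the sketch below keys every record by the header record. combineWithHeader is a hypothetical helper, not a League\Csv method, and it assumes every record has as many fields as the header.

```php
<?php
// Hypothetical helper: use the record at $headerOffset as keys for the others.
function combineWithHeader(array $records, int $headerOffset): array
{
    $header = $records[$headerOffset];
    $rows = [];
    foreach ($records as $offset => $record) {
        if ($offset === $headerOffset) {
            continue; // the header record itself is not part of the result
        }
        // array_combine requires both arrays to have the same length
        $rows[] = array_combine($header, $record);
    }
    return $rows;
}

$records = [['name', 'age'], ['Ada', '36'], ['Linus', '54']];
$rows = combineWithHeader($records, 0);
```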
By decoupling the extract features from the reader we can now:
- Apply the same query to multiple CSV files without redefining it each time
- Apply auto-conversion on any CSV without stream filter usage (see issue #177)
- Auto remove the BOM sequence if present
- Set the CSV header more easily
- Select records on a League\Csv\Writer object too
This is a major BC break so it is scheduled for the next major version:
- Dropping support for PHP 5.5
- Updating PHPUnit to PHPUnit 5.5
- Require iconv extension
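For the BOM point in the list above, a minimal sketch of the idea follows. stripUtf8Bom is a hypothetical helper that only handles the UTF-8 BOM; how the library detects and removes other BOM sequences is not shown in this thread.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte order mark if there is one.
function stripUtf8Bom(string $content): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($content, $bom, 3) === 0) {
        return (string) substr($content, 3);
    }
    return $content;
}

$withBom = "\xEF\xBB\xBFname;age\n";
$clean = stripUtf8Bom($withBom);
$untouched = stripUtf8Bom("name;age\n");
```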
|
@dequis @frankdejonge I've reconsidered your idea of using |
- Adding getter in the Statement object
- RecordSet constructor expects a Reader and a Statement object
- To have a consistent select/recordset behavior the stream filter
mode can no longer be changed:
- STREAM_FILTER_READ for the Reader class
- STREAM_FILTER_WRITE for the Writer class
- The stream filter API is simplified
- Conversion and filtering of the Reader::getIterator is no longer done in
the Statement class but in the RecordSet class.
fdf226a to ef7eef7
|
This PR is closed in favor of PR #210 |
Introduction
Using this library to extract data from multiple CSV files at once is complicated by the fact that the query features are coupled to the Reader class.
Proposal
Describe the new feature
This proposal introduces:
- Query class to extract some records from the CSV data. This is an immutable value object.
- Records class to manipulate the resulting records.
before:
after:
- Writer object directly.
- setInputEncoding will now have an effect when extracting records from the CSV
- setInputEncoding will use a stream filter if possible or will fall back to using mb_* to convert the found records on the fly.
To ease creating a Records class, a new method, AbstractCsv::getRecords(Query $query = null), is added to instantiate a Records class from a Csv object.
Backward Incompatible Changes
- Reader class are now attached to the Query class.
- Reader class are now attached to the Records class.
- AbstractCsv class are now attached to the Records class.
- each; fetch; fetchPairsWithoutDuplicates;
- callable parameter from the extract methods is removed
Targeted release version
version 9.X
Open issues
- composer.json the iconv extension, to normalize how data conversion is done with/without stream filter ?
- AbstractCsv::getRecords is a proposed name, but it could be renamed AbstractCsv::query ?
- __call to partially keep the current stable API. We could still access the extract methods and it would help migration from 8.x version to 9.x version ?
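The setInputEncoding behavior described in the proposal (stream filter when possible, otherwise converting the found records on the fly) might look roughly like the sketch below. readRowsAsUtf8 is a hypothetical helper written with core PHP stream functions, not the League\Csv implementation, and its naive line and cell splitting ignores quoting. The fallback branch splits on raw bytes first, so it is only safe for ASCII-compatible input encodings.

```php
<?php
// Hypothetical helper: read a delimited file and return its cells as UTF-8.
function readRowsAsUtf8(string $path, string $inputEncoding, string $delimiter = ';'): array
{
    $handle = fopen($path, 'r');

    // Prefer a stream filter: bytes are converted before any splitting happens.
    $filtered = in_array('convert.iconv.*', stream_get_filters(), true);
    if ($filtered) {
        stream_filter_append(
            $handle,
            'convert.iconv.' . $inputEncoding . '/UTF-8',
            STREAM_FILTER_READ
        );
    }
    $content = stream_get_contents($handle);
    fclose($handle);

    $rows = [];
    foreach (explode("\n", rtrim($content, "\r\n")) as $line) {
        $cells = explode($delimiter, $line);
        if (!$filtered) {
            // Fallback: convert each cell after splitting (needs mbstring).
            $cells = array_map(function ($cell) use ($inputEncoding) {
                return mb_convert_encoding($cell, 'UTF-8', $inputEncoding);
            }, $cells);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Usage: an ISO-8859-1 file whose cells contain accented characters.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "caf\xE9;na\xEFve\n");
$rows = readRowsAsUtf8($path, 'ISO-8859-1');
unlink($path);
```

Requiring iconv, as raised in the first open issue, would mean the stream filter branch is always available.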