adding delimiter_detect function
Remove Reader::fetchDelimitersOccurrence and replace it with the
delimiter_detect function to make the library more SOLID.
nyamsprod committed Jun 22, 2017
1 parent 48cbe36 commit ba0e83c
Showing 11 changed files with 208 additions and 162 deletions.
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ All Notable changes to `Csv` will be documented in this file
- `League\Csv\Reader::getRecords` to access all CSV records
- `League\Csv\Statement` provides a constraint builder to select CSV records.
- `League\Csv\ResultSet` represents the result set of the selected CSV records.
- `League\Csv\delimiter_detect` function to detect the CSV delimiter character
- Improved CSV document header selection.
- `League\Csv\Reader::getHeader`
- `League\Csv\Reader::getHeaderOffset`
@@ -24,11 +25,12 @@ All Notable changes to `Csv` will be documented in this file
- `League\Csv\Exception\InsertionException`
- `League\Csv\Exception\InvalidArgumentException`
- `League\Csv\Exception\LogicException`
- `League\Csv\Exception\OutOfRangeException`
- `League\Csv\Exception\RuntimeException`
- Improved CSV document output
- `League\Csv\AbstractCsv::chunk` method to output the CSV document in chunks
- `League\Csv\bom_match` function to detect BOM sequence in a given string
- `League\Csv\BOM` interface to decouple BOM sequence from CSV documents
- `League\Csv\ByteSequence` interface to decouple BOM sequence from CSV documents
- Improved CSV records column count consistency on insertion
- `League\Csv\ColumnConsistency`
- Improved CSV document flush mechanism on insertion
@@ -71,7 +73,6 @@ All Notable changes to `Csv` will be documented in this file
- `League\Csv\AbstractCsv::newWriter`
- `League\Csv\Reader::getNewline`
- `League\Csv\Reader::setNewline`
- `League\Csv\Writer::fetchDelimitersOccurrence`
- The Exception mechanism is improved thus the following class is removed:
- `League\Csv\Exception\InvalidRowException`;
- The CSV records filtering methods are removed in favor of the `League\Csv\Statement` class:
@@ -100,6 +101,7 @@ All Notable changes to `Csv` will be documented in this file
- `League\Csv\Plugin\ForbiddenNullValuesValidator`
- `League\Csv\Plugin\ColumnConsistencyValidator` *replaced by `League\Csv\ColumnConsistency`*
- `League\Csv\Writer` no longer implements the `IteratorAggregate` interface
- `League\Csv\AbstractCsv::fetchDelimitersOccurrence` is removed *replaced by `League\Csv\delimiter_detect` function*

## 8.2.1 - 2017-02-22

41 changes: 41 additions & 0 deletions docs/9.0/connections/controls.md
@@ -102,3 +102,44 @@ echo $csv->getDelimiter(); //display '|'

<p class="message-warning">The escape character is only inherited starting with <code>PHP 7.0.10</code>.</p>

## Detecting the delimiter character

~~~php
<?php

function League\Csv\delimiter_detect(Reader $csv, array $delimiters, $limit = 1): array
~~~

The `delimiter_detect` function helps detect the delimiter character most likely used by a CSV document. For each submitted delimiter, it reports how many fields were found in the given CSV object.

The function takes three (3) arguments:

* a [Reader](/9.0/reader/) object;
* an array containing the delimiters to check;
* an integer which represents the number of CSV records to scan (defaults to `1`);

and returns an associative array whose keys are the submitted delimiter characters and whose values represent the number of fields found with each delimiter.

~~~php
<?php

use function League\Csv\delimiter_detect;
use League\Csv\Reader;

$reader = Reader::createFromPath('/path/to/file.csv');
$reader->setEnclosure('"');
$reader->setEscape('\\');

$result = delimiter_detect($reader, [' ', '|'], 10);
// $result can be the following
// [
// '|' => 20,
// ' ' => 0,
// ]
// This seems to be a consistent CSV:
// - 20 fields were counted with the "|" delimiter in the first 10 records;
// - in contrast, no field was detected with the " " delimiter.
~~~
<p class="message-info">To compute the delimiter stats on the full CSV document, set <code>$limit</code> to <code>-1</code>.</p>
<p class="message-notice"><strong>Notice:</strong> This function only returns hints. Only the CSV provider can confirm the real CSV delimiter character.</p>
<p class="message-warning"><strong>Warning:</strong> This function only tests the delimiters you gave it.</p>
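
Because the function only returns hints, the calling code still has to decide which candidate to trust. The following is a minimal sketch, assuming the returned array maps each submitted delimiter to its field count as described above. `pick_delimiter` is a hypothetical helper written for illustration; it is not part of `League\Csv`.

```php
<?php

/**
 * Hypothetical helper (not part of League\Csv).
 *
 * Returns the delimiter with the highest field count, or null
 * when none of the submitted delimiters produced any field.
 *
 * @param int[] $stats delimiter => field count, shaped like the delimiter_detect result
 */
function pick_delimiter(array $stats): ?string
{
    // Drop the delimiters for which no field was detected (count of 0).
    $stats = array_filter($stats);
    if ([] === $stats) {
        return null;
    }

    // Sort by field count, highest first, keeping the delimiters as keys.
    arsort($stats, SORT_NUMERIC);
    reset($stats);

    // Ties are not resolved in any particular way by this sketch.
    return (string) key($stats);
}

// Sample data shaped like the result shown in the example above.
$stats = ['|' => 20, ' ' => 0];
echo pick_delimiter($stats); // prints "|"
```

A `null` return value signals that none of the tested delimiters matched, in which case a different list of candidates can be submitted.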
40 changes: 1 addition & 39 deletions docs/9.0/reader/index.md
@@ -13,7 +13,6 @@ class Reader extends AbstractCsv implements Countable, IteratorAggregate
public function count(): int
public function fetchAll(): array
public function fetchColumn(string|int $columnIndex = 0): Generator
public function fetchDelimitersOccurrence(array $delimiters, int $nb_records = 1): array
public function fetchOne(int $offset = 0): array
public function fetchPairs(string|int $offsetIndex = 0, string|int $valueIndex = 1): Generator
public function getHeader(): array
@@ -329,41 +328,4 @@ $stmt = (new Statement())

$records = $stmt->process($reader);
//$records is a League\Csv\ResultSet object
~~~

## Detecting the delimiter character

This method allows you to find the occurrences of some delimiters in a given CSV object.

~~~php
<?php

public Reader::fetchDelimitersOccurrence(array $delimiters, int $nb_records = 1): array
~~~

The method takes two arguments:

* an array containing the delimiters to check;
* an integer which represents the number of CSV records to scan (default to `1`);

~~~php
<?php

use League\Csv\Reader;

$reader = Reader::createFromPath('/path/to/file.csv');
$reader->setEnclosure('"');
$reader->setEscape('\\');

$delimiters_list = $reader->fetchDelimitersOccurrence([' ', '|'], 10);
// $delimiters_list can be the following
// [
// '|' => 20,
// ' ' => 0,
// ]
// This seems to be a consistent CSV with:
// - the delimiter "|" appearing 20 times in the 10 first records
// - the delimiter " " never appearing
~~~

<p class="message-warning"><strong>Warning:</strong> This method only tests the delimiters you gave it.</p>
~~~
25 changes: 25 additions & 0 deletions docs/upgrading/9.0.md
@@ -327,7 +327,32 @@ $stmt = (new Statement())
$records = $stmt->process($reader, ['firstname', 'lastname', 'email']);
~~~

### Reader::fetchDelimitersOccurrence is removed

The `Reader::fetchDelimitersOccurrence` method is removed. Use the `League\Csv\delimiter_detect` function with a `Reader` object instead.

Before:

~~~php
<?php

use League\Csv\Reader;

$reader = Reader::createFromPath('/path/to/file.csv');
$stats = $reader->fetchDelimitersOccurrence([',', ';', "\t"], 10);
~~~

After:

~~~php
<?php

use League\Csv\Reader;
use function League\Csv\delimiter_detect;

$reader = Reader::createFromPath('/path/to/file.csv');
$stats = delimiter_detect($reader, [',', ';', "\t"], 10);
~~~

## Miscellaneous

58 changes: 2 additions & 56 deletions src/Reader.php
@@ -20,7 +20,6 @@
use Iterator;
use IteratorAggregate;
use League\Csv\Exception\RuntimeException;
use LimitIterator;
use SplFileObject;

/**
@@ -69,59 +68,6 @@ class Reader extends AbstractCsv implements Countable, IteratorAggregate
*/
protected $nb_records = -1;

/**
* Detect Delimiters occurrences in the CSV
*
* Returns an associative array where each key represents
* a valid delimiter and each value the number of occurrences
*
* @param string[] $delimiters the delimiters to consider
* @param int $nb_records Detection is made using $nb_records of the CSV
*
* @return array
*/
public function fetchDelimitersOccurrence(array $delimiters, int $nb_records = 1): array
{
$filter = function ($value): bool {
return 1 == strlen($value);
};

$nb_records = $this->filterMinRange($nb_records, 1, __METHOD__.'() expects the number of records to consider to be a valid positive integer, %s given');
$delimiters = array_unique(array_filter($delimiters, $filter));
$reducer = function (array $res, string $delimiter) use ($nb_records): array {
$res[$delimiter] = $this->getCellCount($delimiter, $nb_records);

return $res;
};

$res = array_reduce($delimiters, $reducer, []);
arsort($res, SORT_NUMERIC);

return $res;
}

/**
* Returns the cell count for a specified delimiter
* and a specified number of records
*
* @param string $delimiter CSV delimiter
* @param int $nb_records CSV records to consider
*
* @return int
*/
protected function getCellCount(string $delimiter, int $nb_records): int
{
$filter = function ($record): bool {
return is_array($record) && count($record) > 1;
};

$this->document->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);
$this->document->setCsvControl($delimiter, $this->enclosure, $this->escape);
$iterator = new CallbackFilterIterator(new LimitIterator($this->document, 0, $nb_records), $filter);

return count(iterator_to_array($iterator, false), COUNT_RECURSIVE);
}

/**
* Returns the record padding value
*
@@ -260,7 +206,7 @@ public function getIterator(): Iterator
* filled with null values while extra record fields are stripped from
* the returned object.
*
* @param array $header
* @param string[] $header an optional header to use instead of the CSV document header
*
* @return Iterator
*/
@@ -310,7 +256,7 @@ protected function computeHeader(array $header)
* Add the CSV header if present and valid
*
* @param Iterator $iterator
* @param array $header
* @param string[] $header
*
* @return Iterator
*/
4 changes: 2 additions & 2 deletions src/Statement.php
@@ -132,8 +132,8 @@ public function limit(int $limit): self
/**
* Returns the inner CSV Document Iterator object
*
* @param Reader $csv
* @param array $header
* @param Reader $csv
* @param string[] $header an optional header to use instead of the CSV document header
*
* @return ResultSet
*/