Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BOM to outputed file #65

Closed
RomeroMsk opened this issue Jan 15, 2015 · 34 comments
Closed

Add BOM to outputed file #65

RomeroMsk opened this issue Jan 15, 2015 · 34 comments

Comments

@RomeroMsk
Copy link

Hello.
Please add a config option to write BOM (\xEF\xBB\xBF) in the begining of outputed file.
It solves the problem of opening UTF-8 encoded csv-file in MS Excel. Refer to http://stackoverflow.com/questions/4348802/how-can-i-output-a-utf-8-csv-in-php-that-excel-will-read-properly

@nyamsprod
Copy link
Member

Basically what you suggesting is adding the following methods :

League\Csv\AbstractCsv::setOutputBOM($use_bom_on_output); //$use_bom_on_output is a boolean
League\Csv\AbstractCsv::hasOutputBOM(); // returns true or false

Because we don't want any BC break by default:

League\Csv\AbstractCsv::hasOutputBOM(); returns false

To be clear this will only have an effect on the League\Csv\AbstractCsv::output method.
If you have a better names they are welcomed

@RomeroMsk
Copy link
Author

Ok, you can make it as you wish :)

@nyamsprod
Copy link
Member

Since it will work on output.. it will be available on both classes Reader and Writer

@RomeroMsk
Copy link
Author

These names are ok for me.

nyamsprod added a commit that referenced this issue Jan 15, 2015
@nyamsprod
Copy link
Member

The dev-master branch now supports BOM on output you can test it and if everything is good for you it will be release hopefully next week as version 6.3

nyamsprod added a commit that referenced this issue Jan 15, 2015
@RomeroMsk
Copy link
Author

Checked, it works. Thanks a lot!

@Heart1010
Copy link
Contributor

@RomeroMsk Do

$bom = chr(239).chr(187).chr(191);

work for you (which os? It should work on MS Excel Win but not an Mac?)? I've read this as "best solution" for Excel http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files/1648671#1648671 which uses

/**
   * Export an array as downladable Excel CSV
   * @param array   $header
   * @param array   $data
   * @param string  $filename
   */
  function toCSV($header, $data, $filename) {
    $sep  = "\t";
    $eol  = "\n";
    $csv  =  count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach($data as $line) {
      $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }
    $encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. strlen($encoded_csv));
    echo chr(255) . chr(254) . $encoded_csv;
    exit;
  }
PHP can input and output Unicode, but a little different from what Microsoft means: 
when Microsoft says "Unicode", it unexplicitly means little-endian UTF-16 with 
BOM(FF FE = chr(255).chr(254)), whereas PHP's "UTF-16" means big-endian with 
BOM. For this reason, PHP does not seem to be able to output Unicode CSV file for 
Microsoft Excel. Solving this problem is quite simple: just put BOM infront of 
UTF-16LE string.

Example:

$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');

Another problem is that excel opens the csv in one line...

UTF-16LE solution for CSV for Excel by Eugene Murai works well:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');

However, then Excel on Mac OS X doesn't identify columns properly and its puts each 
whole row in its own cell. In order to fix that, use TAB "t" character as CSV delimiter 
rather than comma or colon.

Which encoding opens CSV files correctly with Excel on both Mac and Windows? see http://stackoverflow.com/questions/6588068/which-encoding-opens-csv-files-correctly-with-excel-on-both-mac-and-windows

=> So using UTF-16LE with BOM and tab delimiter should "the best way" to store the csv file (for MS Excel).

@RomeroMsk
Copy link
Author

It works for me on Windows and doesn't work on Mac. But we decide to leave this problem as is.
I think, you can try to convert encoding of content to UTF-16LE, add BOM and change delimeter - now Writer allows all of this, I belive.

@nyamsprod
Copy link
Member

After more researchs on the BOM issue I realize that to really resolve this issue you need to know the BOM sequence specific to the CSV encoding charset. I think we should change the signature of the setBOMOnOutput method. This method should accept a BOM sequence or a Class constant which refers to the available BOM sequence.
The developer who wants to handle BOM would require to set the proper encoding AND the proper BOM sequence attached to it. This is more verbose but less error prone (Of note I'm using the stream capability as League/Csv default encoding charset is UTF8.)

Here's a example code:

use League\Csv\Reader;

$csv = new Reader::createFromPath('/path/to/my/file.csv');
$csv->setBOM(Reader::BOM_UTF16LE);
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv');

Thoughts ?
You can look here for more information

@RomeroMsk
Copy link
Author

Looks good for me. Did you try to open UTF-16 encoded file with right BOM sequence in MS Excel on Mac and Win?

@nyamsprod
Copy link
Member

No I did not, first I don't have a Mac ... and I don't use MS Excel at all 🎱 That being said, implementing this way should do it. The only tradeoffs are:

  • You need to know which client is going to open you CSV prior to downloading
  • There's the extra work of providing the stream to convert on the fly your UTF-8 encoded CSV into UTF-16 (if your document is do be opened in MS Excel Mac if I follow correctly)

@RomeroMsk
Copy link
Author

I can ask my collegue to check the file on Mac if you generate it with UTF-16LE + BOM.

nyamsprod added a commit that referenced this issue Jan 16, 2015
@nyamsprod
Copy link
Member

The changes are done. Now the new methods are :

use League\Csv\Reader;

$csv = new Reader::createFromPath('/path/to/my/file.csv');
$bom_sequence = $csv->getBOMOnOutput(); // returns '';
$csv->setBOMOnOutput(Reader::BOM_UTF16LE);
$bom_sequence = $csv->getBOMOnOutput(); // returns "\xFF\xFE"
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv');

don't forget to implement the stream filter converter otherwise the output will be in UTF-8 which defeat the BOM sequence addition. you can use the FilterTranscode class from the example for instance to do so.

@RomeroMsk
Copy link
Author

Sorry, out of my working place now. Can check changes only after weekend.

@nyamsprod
Copy link
Member

no problem I can wait 👍

@RomeroMsk
Copy link
Author

Hello. I can't make Writer to use filter. Can you help me?
I've got FilterTranscode example class and changed only namespace. Then I tried to use it like this:

    stream_filter_register(FilterTranscode::FILTER_NAME . '*', '\common\components\FilterTranscode');
    $writer = Writer::createFromFileObject(new \SplTempFileObject);
    $writer->appendStreamFilter('convert.transcode.UTF-8:UTF-16LE');
    $writer->setBOMOnOutput(Writer::BOM_UTF16_LE);
    $writer->insertOne($content);
    $writer->output($fileName);

and got The stream filter API can not be used LogicException. What am I doing wrong?

@nyamsprod
Copy link
Member

Streams do not work with createFromFileObject because SplFileObject poorly supports PHP stream
you can make it work using:

$writer = Writer::createFromPath('/my/path/to/my/file.csv');
//or
$writer = Writer::createFromPath('php://output'); 

@RomeroMsk
Copy link
Author

Now I'm getting Cannot rewind file php://filter/write=convert.transcode.UTF-8:UTF-16LE/resource=php://output

@nyamsprod
Copy link
Member

Here's what I did to test on my computer the file is located in the examples directory.

error_reporting(-1);
ini_set('display_errors', '1');

use League\Csv\Reader;
use League\Csv\Writer;
use lib\FilterTranscode;

require '../vendor/autoload.php';

stream_filter_register(FilterTranscode::FILTER_NAME."*", "\lib\FilterTranscode");

$csv = Reader::createFromPath(__DIR__.'/data/prenoms.csv');
$csv->setBOMOnOutput(Reader::BOM_UTF16_LE);
$csv->appendStreamFilter(FilterTranscode::FILTER_NAME."UTF-8:UTF-16LE");
$csv->output('test.csv');

Hope it helps

@RomeroMsk
Copy link
Author

Ok, I've got a result with $writer = Writer::createFromPath('/my/path/to/my/file.csv').
MS Excel on Windows didn't recognize a separators in CSV (just loaded entire lines), but with right encoded symbols.
Waiting for collegue to test it on Mac...

@nyamsprod
Copy link
Member

The delimiter can be adjust using setDelimiter method

@RomeroMsk
Copy link
Author

Yes, I know. But after manipulations with encoding MS Excel is not recognizing delimiters correctly.
With UTF-8 everything is ok, but with UTF-16LE - not.
The same situation is on Mac.

@nyamsprod
Copy link
Member

it's because the tabulation is still in UTF-8 👎 you should use the UTF-16 tabulation character instead

@RomeroMsk
Copy link
Author

But I'm using ; as delimiter, beacuse MS Excel is awaiting it in CSV files.

@Heart1010
Copy link
Contributor

...see my comment above. MS Excel "wants" tab as delimiter.

@RomeroMsk
Copy link
Author

Ok, with setDelimiter("\t") it works.

@nyamsprod
Copy link
Member

Great!!

I've added a method to detect the BOM in the Input CSV so we will end up having:

  • getBOMOnInput (detect and return the BOM sequence used on the input CSV if any)
  • setBOMOnOutput (set the BOM sequence to prepend on output methods (ie: output and __toString)
  • getBOMOnOutput (get the BOM sequence that will be prepended on output methods)

Removing the Input BOM is already possible with the existing extract methods.

When this is stable and bug free I think we will have a complete new and nice feature.

Thoughts ?

@RomeroMsk
Copy link
Author

For me sounds good, thanks for your help!

@Heart1010
Copy link
Contributor

Yes 👍

...and make a section for that infos -also regarding MS Excel- in your documentation page would be helpful imho :-)!

@nyamsprod
Copy link
Member

I think @RomeroMsk could write that section, since like I said I'm no MS Excel expert :) . Let me add everything in the source and we will update the documentation when all is done and stable

@RomeroMsk
Copy link
Author

I'm not an expert too :) Also my English is not very good.

@nyamsprod
Copy link
Member

http://csv.thephpleague.com/bom/ <- the documentation you can improve it directly via pull request if needed.

@nyamsprod
Copy link
Member

the stable release version 6.3 was released with the stable feature. Of note the methods names have been changed.

@RomeroMsk
Copy link
Author

Checked 6.3, everything is fine, thanks! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants