Add BOM to outputed file #65
Comments
Basically, what you are suggesting is adding the following methods: League\Csv\AbstractCsv::setOutputBOM($use_bom_on_output); // $use_bom_on_output is a boolean
League\Csv\AbstractCsv::hasOutputBOM(); // returns true or false
Because we don't want any BC break, by default League\Csv\AbstractCsv::hasOutputBOM(); returns false. To be clear, this will only have an effect on the |
Ok, you can make it as you wish :) |
Since it will work on output.. it will be available on both classes |
These names are ok for me. |
The |
Checked, it works. Thanks a lot! |
@RomeroMsk Does $bom = chr(239).chr(187).chr(191); work for you (on which OS? It should work in MS Excel on Windows, but not on Mac)? I've read that this is the "best solution" for Excel: http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files/1648671#1648671 which uses /**
* Export an array as a downloadable Excel CSV
* @param array $header
* @param array $data
* @param string $filename
*/
function toCSV($header, $data, $filename) {
    $sep = "\t";
    $eol = "\n";
    $csv = count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach($data as $line) {
        $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }
    $encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. strlen($encoded_csv));
    echo chr(255) . chr(254) . $encoded_csv;
    exit;
}
Another problem is that Excel opens the CSV as a single line...
Which encoding opens CSV files correctly with Excel on both Mac and Windows? See http://stackoverflow.com/questions/6588068/which-encoding-opens-csv-files-correctly-with-excel-on-both-mac-and-windows => So using UTF-16LE with a BOM and a tab delimiter should be "the best way" to store the CSV file (for MS Excel). |
It works for me on Windows and doesn't work on Mac. But we decided to leave this problem as is. |
After more research on the BOM issue, I realized that to really resolve it you need to know the BOM sequence specific to the CSV's encoding charset. I think we should change the signature of the setBOMOnOutput method: it should accept a BOM sequence, or a class constant that refers to one of the available BOM sequences. Here's some example code: use League\Csv\Reader;
$csv = Reader::createFromPath('/path/to/my/file.csv');
$csv->setBOM(Reader::BOM_UTF16LE);
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv'); Thoughts? |
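To make the "BOM sequence per encoding" point concrete, here is a minimal plain-PHP sketch of the lookup. The helper name bomFor is hypothetical and is not part of League\Csv; the byte sequences themselves are the standard Unicode BOMs.

```php
<?php
// Hypothetical helper (not a League\Csv method): maps an encoding name
// to the byte order mark that identifies it.
function bomFor(string $encoding): string
{
    $boms = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
    ];
    $key = strtoupper($encoding);
    if (!isset($boms[$key])) {
        throw new InvalidArgumentException("No BOM known for encoding: $encoding");
    }
    return $boms[$key];
}

// Show the UTF-16LE BOM as hex.
echo bin2hex(bomFor('UTF-16LE')), PHP_EOL;
```

The BOM only tells the reader how to decode the bytes that follow; the CSV content still has to be written in the matching encoding.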
Looks good to me. Did you try opening a UTF-16 encoded file with the right BOM sequence in MS Excel on Mac and Windows? |
No, I did not. First, I don't have a Mac ... and I don't use MS Excel at all 🎱 That being said, implementing it this way should do it. The only tradeoffs are:
|
I can ask my colleague to check the file on Mac if you generate it with UTF-16LE + BOM. |
The changes are done. The new methods are now: use League\Csv\Reader;
$csv = Reader::createFromPath('/path/to/my/file.csv');
$bom_sequence = $csv->getBOMOnOutput(); // returns '';
$csv->setBOMOnOutput(Reader::BOM_UTF16LE);
$bom_sequence = $csv->getBOMOnOutput(); // returns "\xFF\xFE"
$csv->appendStreamFilter('convert.utf16encode');
$csv->output('file.csv'); Don't forget to implement the stream filter converter, otherwise the output will be in UTF-8, which defeats the purpose of adding the BOM sequence. You can use the FilterTranscode class from the examples to do so, for instance. |
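As an illustration of why the transcoding filter matters, here is a minimal sketch that does not use League\Csv or the FilterTranscode class at all. It swaps in PHP's built-in convert.iconv.* stream filter (which assumes the iconv extension is available) and writes to php://memory. The function name csvToUtf16Le is hypothetical.

```php
<?php
// Sketch only: plain PHP streams plus the built-in "convert.iconv.*" filter,
// standing in for the FilterTranscode class mentioned in the thread.
function csvToUtf16Le(array $rows): string
{
    $stream = fopen('php://memory', 'w+b');

    // Write the UTF-16LE BOM as raw bytes before any filter is attached.
    fwrite($stream, "\xFF\xFE");
    fflush($stream);

    // From here on, everything written is converted from UTF-8 to UTF-16LE.
    $filter = stream_filter_append($stream, 'convert.iconv.UTF-8/UTF-16LE', STREAM_FILTER_WRITE);

    foreach ($rows as $row) {
        // Tab delimiter; it is transcoded together with the data.
        fputcsv($stream, $row, "\t", '"', '\\');
    }

    // Removing the filter flushes anything it still holds.
    stream_filter_remove($filter);

    rewind($stream);
    $bytes = stream_get_contents($stream);
    fclose($stream);

    return $bytes;
}
```

Because the delimiter passes through the same filter as the data, it ends up in UTF-16LE too, so the BOM and the content agree.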
Sorry, out of my working place now. Can check changes only after weekend. |
no problem I can wait 👍 |
Hello. I can't get Writer to use the filter. Can you help me? stream_filter_register(FilterTranscode::FILTER_NAME . '*', '\common\components\FilterTranscode');
$writer = Writer::createFromFileObject(new \SplTempFileObject);
$writer->appendStreamFilter('convert.transcode.UTF-8:UTF-16LE');
$writer->setBOMOnOutput(Writer::BOM_UTF16_LE);
$writer->insertOne($content);
$writer->output($fileName); and got |
Stream filters do not work with createFromFileObject because SplFileObject poorly supports PHP streams. Use createFromPath instead: $writer = Writer::createFromPath('/my/path/to/my/file.csv');
//or
$writer = Writer::createFromPath('php://output'); |
Now I'm getting |
Here's what I did to test on my computer; the file is located in the examples directory. error_reporting(-1);
ini_set('display_errors', '1');
use League\Csv\Reader;
use League\Csv\Writer;
use lib\FilterTranscode;
require '../vendor/autoload.php';
stream_filter_register(FilterTranscode::FILTER_NAME."*", "\lib\FilterTranscode");
$csv = Reader::createFromPath(__DIR__.'/data/prenoms.csv');
$csv->setBOMOnOutput(Reader::BOM_UTF16_LE);
$csv->appendStreamFilter(FilterTranscode::FILTER_NAME."UTF-8:UTF-16LE");
$csv->output('test.csv'); Hope it helps |
Ok, I've got a result with |
The delimiter can be adjusted using the setDelimiter method |
Yes, I know. But after the encoding manipulations, MS Excel does not recognize the delimiters correctly. |
It's because the tabulation is still in UTF-8 👎 You should use the UTF-16 tabulation character instead |
But I'm using |
...see my comment above. MS Excel "wants" tab as delimiter. |
Ok, with |
Great!! I've added a method to detect the BOM in the input CSV, so we will end up having:
Removing the input BOM is already possible with the existing extract methods. When this is stable and bug-free, I think we will have a complete, nice new feature. Thoughts? |
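For illustration, BOM detection on input can be sketched in plain PHP as below. The function names detectBom and stripBom are hypothetical and are not the methods added to the library; this only shows the general idea of matching the leading bytes.

```php
<?php
// Hypothetical helper: returns the BOM found at the start of $bytes,
// or an empty string when none is present.
function detectBom(string $bytes): string
{
    // Longer sequences first: the UTF-32LE BOM starts with the UTF-16LE one.
    $candidates = [
        "\x00\x00\xFE\xFF", // UTF-32BE
        "\xFF\xFE\x00\x00", // UTF-32LE
        "\xEF\xBB\xBF",     // UTF-8
        "\xFE\xFF",         // UTF-16BE
        "\xFF\xFE",         // UTF-16LE
    ];
    foreach ($candidates as $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $bom;
        }
    }
    return '';
}

// Hypothetical helper: drops the leading BOM, if any.
function stripBom(string $bytes): string
{
    return substr($bytes, strlen(detectBom($bytes)));
}
```

One caveat of this ordering: UTF-16LE content that starts with a NUL character right after its BOM would be read as UTF-32LE, so a real implementation may want the caller to state the expected encoding.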
Sounds good to me, thanks for your help! |
Yes 👍 ...and adding a section with that info, also regarding MS Excel, to your documentation page would be helpful imho :-)! |
I think @RomeroMsk could write that section since, like I said, I'm no MS Excel expert :) . Let me add everything to the source, and we will update the documentation when all is done and stable |
I'm not an expert either :) Also, my English is not very good. |
http://csv.thephpleague.com/bom/ <- the documentation; you can improve it directly via a pull request if needed. |
Stable version 6.3 has been released with this feature. Of note, the method names have been changed. |
Checked 6.3, everything is fine, thanks! 👍 |
Hello.
Please add a config option to write the BOM (\xEF\xBB\xBF) at the beginning of the output file. It solves the problem of opening a UTF-8 encoded CSV file in MS Excel. Refer to http://stackoverflow.com/questions/4348802/how-can-i-output-a-utf-8-csv-in-php-that-excel-will-read-properly
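For context, the request amounts to something like the following plain-PHP sketch, which writes the UTF-8 BOM ahead of the CSV rows. It does not use League\Csv; the function name csvWithUtf8Bom is hypothetical.

```php
<?php
// Sketch only: builds a UTF-8 CSV string that starts with the UTF-8 BOM.
function csvWithUtf8Bom(array $rows): string
{
    $stream = fopen('php://memory', 'w+b');

    // The three BOM bytes must come before any CSV data.
    fwrite($stream, "\xEF\xBB\xBF");

    foreach ($rows as $row) {
        fputcsv($stream, $row, ',', '"', '\\');
    }

    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);

    return $csv;
}
```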