Problem converting Cyrillic .doc or .odt file to .txt #73
Comments
Could you try again with the latest version from the master branch? I have updated the manual page to document the import and export filter options. You can now use -i FilterOptions=,,76 and -e FilterOptions=,,76 to enforce a specific encoding during the import and export phases. See the following link for more information: http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options Assuming that your ODT or DOC is in Unicode, you can use:
to convert to a UTF-8 (76) text file. If, however, you would prefer another encoding for text files (because your editor or your system expects it), then use one of the following:
More options are listed at the URL referenced above. Too many options, I know ;-)
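As a side illustration of why the export encoding has to match what the editor or system expects, here is a minimal Python sketch. It does not call unoconv or LibreOffice. The sample string and the choice of KOI8-R and CP1251 as alternative encodings are assumptions made for illustration; they are not taken from this thread.

```python
# Illustration only: shows how one Cyrillic string maps to different bytes
# under different encodings. Nothing here invokes unoconv or LibreOffice.

sample = "Привет"  # hypothetical sample text, not the reporter's input

# Encodings chosen for illustration (assumption): UTF-8 plus two legacy
# Cyrillic code pages.
encodings = ["utf-8", "koi8-r", "cp1251"]

# Encode the same text with each codec.
encoded = {name: sample.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:8} {len(data):2d} bytes  {data.hex()}")

# Reading bytes with the wrong codec does not fail here; it silently
# produces different (garbled) text.
garbled = encoded["cp1251"].decode("koi8-r")
print(garbled == sample)
```

The bytes differ for every encoding, so a file written in one encoding and opened as another shows garbled text even though the conversion itself succeeded.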
@Yuseinov Any feedback on this?
Sorry for the late response. The problem is fixed. Thanks.
Great to hear! For me, UTF-8 should be the default for export whenever possible. The import encoding is of course very specific to the original document. I would expect the LibreOffice import filters to auto-detect it when that is possible (this depends on the source format). Let us know if there is anything we can improve, especially for encodings that are foreign to me.
I have a new problem. I don't know whether this is an issue for
If I start I removed
Please open a new issue for this. It helps to get attention from people with a similar issue (and from me too ;-))
I'm trying to convert cyrillic.rtf to unicode.rtf with the command:
remote:
local:
OS: Linux Ubuntu 10.04
Unoconv version: 0.3-6
When converting a file with Cyrillic text to txt format, the output has a broken encoding.
Input text:
Output text:
When converting from .odt or .doc to .pdf, everything is OK, but when converting to .txt, the output contains question marks instead of the text.
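One plausible way to end up with question marks is an export to a character set that cannot represent Cyrillic letters. The Python sketch below reproduces that symptom in isolation. It is not a confirmed diagnosis of unoconv's behaviour; the sample string and the `export_as` helper are hypothetical and exist only for this illustration.

```python
# Illustration only: reproduces the "question marks" symptom with plain
# Python string encoding. It does not call unoconv or LibreOffice.

text = "Привет"  # hypothetical sample; the issue's real input is not shown


def export_as(text: str, encoding: str) -> bytes:
    """Encode text, substituting '?' for characters the target
    encoding cannot represent (hypothetical helper)."""
    return text.encode(encoding, errors="replace")


# ASCII has no Cyrillic letters, so every character becomes '?'.
ascii_bytes = export_as(text, "ascii")

# UTF-8 can represent every character, so the text survives a round trip.
utf8_bytes = export_as(text, "utf-8")

print(ascii_bytes)
print(utf8_bytes.decode("utf-8"))
```

If the text export falls back to a character set without Cyrillic letters, each letter is replaced, which would match output made up entirely of question marks. PDF output embeds the glyphs, which may be why it is unaffected.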