Support for diacritics? Is UTF-8 being used consistently? #23

Closed
johnhuck opened this issue Mar 18, 2015 · 34 comments

@johnhuck

I was entering cataloguing information for an item in Dataverse, and it seems that diacritics are not being captured properly. I wonder if it is related to this issue at all:

IQSS/dataverse#834

For an example, look at the title for study doi:10.7939/DVN/10269, if you can access it through an admin interface (It hasn't been released yet). I don't know if this is a known issue for us already. Thank you!

@piyapongch
Member

I will try it. Can you give me the item URL? I have looked at the link. It is in version 4.0 beta. We are using 3.6.2.

@johnhuck
Author

Some further observations on this issue: I believe that Dataverse may be interpreting a two-byte UTF-8 character as two separate single-byte characters. (I have seen reference to this issue in 3.6.x elsewhere on GitHub threads.) An interesting wrinkle is that when you are entering cataloguing for the study, I think certain actions (such as adding another instance of a repeating field) cause DV to save automatically, and it then re-interprets the two characters it previously put in place of the single UTF-8 character all over again, adding further characters. So that over time é becomes © and then �Ã�©, then �Â�Ã�©, then Ã�Â�Ã�©, etc.
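
The behaviour described above can be reproduced in a few lines of Python. This is only a sketch under an assumption the thread does not confirm: that somewhere in the stack UTF-8 bytes are decoded with a single-byte encoding such as Latin-1 and then saved again as UTF-8. The function names are hypothetical and are not part of Dataverse.

```python
def misread_once(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_once(text: str) -> str:
    """Reverse one round of misreading: back to bytes, then decode as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


def simulate_saves(text: str, saves: int) -> list[str]:
    """Return the text after 0, 1, ..., `saves` rounds of misreading."""
    stages = [text]
    for _ in range(saves):
        stages.append(misread_once(stages[-1]))
    return stages


stages = simulate_saves("é", 3)
for stage in stages:
    # repr() makes invisible control characters visible.
    print(len(stage), repr(stage))
```

In this model every non-ASCII character doubles on each save (1, 2, 4, then 8 characters). The exact characters seen on screen depend on which single-byte encoding is actually involved, so the output may not match the sequence reported above character for character.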

@piyapongch
Member

I am still working on the DOI problem. If you have found a discussion on the
Dataverse User Group, please forward it to me. I think it should be supported
out of the box. I have looked at the HTML and found that it is encoded with
UTF-8. It might be something that they did or did not do on the server. I will
investigate the problem after I am done with the DOI problems.

Thanks,

Piyapong.

@johnhuck
Author

Thanks, Piyapong. I'll see if I can find some specific discussion threads for you (the one reference I saw was a passing comment someone made).

@pcharoen

One of my sources is the Dataverse User Group,
https://groups.google.com/forum/#!forum/dataverse-community, in case you
did not know about it.

Piyapong.

@piyapongch
Member

I tried adding the character encoding to the startup script, but it did not seem to work. I will try to find another solution.

@piyapongch
Member

I have fixed the problem and tested it. It seems to be working. Here is a sample study, https://hibernian.library.ualberta.ca/dvn/faces/study/StudyPage.xhtml?globalId=doi:10.5072/FK2/10076&versionNumber=2. I have entered some French, Chinese, Japanese and Thai. Dataverse saved the text and returned the same characters.

Please try again and let me know.
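
The check described above (enter text in several scripts, save it, and compare what comes back) can be sketched offline in Python. This is only a simulation: the sample strings and function names are made up for illustration, and the encode/decode pair stands in for the real save and reload through the Dataverse web form, which this sketch does not exercise.

```python
# Hypothetical sample strings, one per script mentioned in the comment above.
SAMPLES = {
    "French": "Université de l'Alberta, données",
    "Chinese": "数据",
    "Japanese": "データ",
    "Thai": "ข้อมูล",
}


def round_trip(text: str, write_codec: str = "utf-8", read_codec: str = "utf-8") -> str:
    """Simulate saving text with one codec and reading it back with another."""
    return text.encode(write_codec).decode(read_codec, errors="replace")


def check(samples: dict[str, str], write_codec: str = "utf-8",
          read_codec: str = "utf-8") -> dict[str, bool]:
    """Report, per sample, whether the text survives the round trip unchanged."""
    return {
        name: round_trip(text, write_codec, read_codec) == text
        for name, text in samples.items()
    }


# Consistent UTF-8 on both sides: every sample should survive.
print(check(SAMPLES))
# Mismatched codecs (the suspected bug): every sample should be damaged.
print(check(SAMPLES, read_codec="latin-1"))
```

Testing more than one script is useful because characters outside Latin-1, such as Thai or Chinese, use three-byte UTF-8 sequences and can fail even when accented Latin letters appear to work.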

@johnhuck
Author

johnhuck commented Apr 1, 2015

Hi Piyapong, unfortunately, the problem persists in the dataverse study I showed you the other day. I just edited the record via my laptop (mac) and also via my desktop (pc), and got the same result both times. I also observed the extra character being added on "save". I didn't look at your example, but I notice it's on a different server. Could that be why?

@piyapongch
Member

I have fixed it on the development server,
https://hibernian.library.ualberta.ca. Please try again.

If everything is working as expected, I will deploy the application on
the production server.

Thanks,

Piyapong.

@piyapongch
Member

You might not be able to reach it from your laptop over a Wi-Fi connection. You
should be able to do it from your workstation. If you cannot, please let
me know. I will need to submit a ticket to the sysadmin to open the firewall to
the development server for you.

Piyapong.

@johnhuck
Author

johnhuck commented Apr 1, 2015

I was able to log on to the hibernian dev server. I created a test dataverse and a test study with simple diacritics and it looks fine, so I can confirm the behaviour that you have observed. Looks like it's fixed. Thanks! Was it simple to solve? (In case someone else runs into the same problem.)

@piyapongch
Member

I will package it and deploy it on the production server. I will let you know when
it is deployed.

Piyapong.

@chumphre

chumphre commented Apr 1, 2015

Hi, Piyapong ... I'm told by the development server that this is a
restricted page so I can't see the improvements to character sets that
you've accomplished. Thanks, Chuck

Charles (Chuck) Humphrey
Research Data Management Services Coordinator
University of Alberta Libraries
Phone: 780-492-9216
ORCID: http://orcid.org/0000-0003-4623-020X
http://orcid.org/0000-0003-4623-020X?lang=en

@piyapongch
Member

Chuck:

I would like to see the problem. It might be the firewall or the browser certificate. Can you show me when you have a chance? If it is a firewall problem, I will ask the sysadmin to open it for you.

Thanks,

@johnhuck
Author

johnhuck commented Apr 2, 2015

Piyapong, I also got a restricted message when I clicked your link (using my staff station), https://hibernian.library.ualberta.ca/dvn/faces/study/StudyPage.xhtml?globalId=doi:10.5072/FK2/10076&versionNumber=2. However, I was somehow able to navigate to a login screen and log in (ignoring a certificate warning from my browser). Once I was in the dev instance of Dataverse, I was able to identify your test study by its DOI and navigate to it, but the link wouldn't take me directly to it. I didn't mention all of this because it seemed tangential to the original task of checking the diacritics.

@piyapongch
Member

I think I was logged in when I tested the link.

Chuck:

You might want to point your browser to http://hibernian.library.ualberta.ca
and log in using your CCID. Then you can try to create a study, enter
special characters and save it. Then open the study again to check that the
special characters display properly. You should also be able to click on
the link to see my sample.

Thanks,

Piyapong.

@piyapongch
Member

Chuck:

I have released the study. Can you try this URL?
https://hibernian.library.ualberta.ca/dvn/faces/study/StudyPage.xhtml?studyId=8699

Thanks,

Piyapong.

@johnhuck
Author

johnhuck commented Apr 2, 2015

I still get an unsafe certificate warning, but if I click through, the link works now.

@piyapongch
Member

The development server does not have a trusted certificate installed. You
will always get the warning. You just need to click through it.

In the sample, I know that Thai is working properly. I think it should be
the same for the others.

Piyapong.

@chumphre

chumphre commented Apr 2, 2015

Excellent! Thanks, Chuck

@piyapongch
Member

I have asked Henry to deploy the application on the production server. I will let you know when it is live.

@johnhuck
Author

The original encoding problem persists on the production instance.

@piyapongch
Member

I have checked the versions on the development server and the production server. The
production server is still running the old version. I have asked Henry to deploy the
latest version from the development server, which is working properly. I will let
you know when the application is deployed. Then you can try again.

Thanks,

Piyapong.

@piyapongch
Member

I think Henry is away today. I will check with him again tomorrow.

Thanks,

Piyapong.

@henryzhang87

what would you like me to do?

regards
Henry

@pcharoen

Henry:

Can you deploy the Dataverse WAR file from Hibernian to the production server?
You do not need to restart the server; just drop the WAR file in the autodeploy
directory. I think the application will be unavailable briefly.

Thanks,

Piyapong.
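
The "drop the WAR file in the autodeploy directory" step amounts to a file copy. The Python sketch below illustrates it under stated assumptions: the function name, file name, and directory layout are hypothetical, the real paths on Hibernian and the production server are not given in this thread, and the demo runs entirely inside a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def drop_war(war_file: Path, autodeploy_dir: Path) -> Path:
    """Copy a WAR file into an autodeploy directory and return the new path."""
    if war_file.suffix != ".war":
        raise ValueError(f"not a WAR file: {war_file}")
    if not autodeploy_dir.is_dir():
        raise FileNotFoundError(f"no such directory: {autodeploy_dir}")
    target = autodeploy_dir / war_file.name
    # copy2 also preserves the file's modification time.
    shutil.copy2(war_file, target)
    return target


# Demo with throwaway directories; nothing here touches a real server.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "build"
    autodeploy_dir = Path(tmp) / "autodeploy"  # hypothetical location
    build_dir.mkdir()
    autodeploy_dir.mkdir()
    war = build_dir / "example.war"  # hypothetical file name
    war.write_bytes(b"placeholder contents")
    print(drop_war(war, autodeploy_dir).name)
```

Whether the application server picks up the new file without a restart depends on its autodeploy settings, which this sketch does not check.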

@chumphre

Hi, Piyapong ... Sorry, I'm on the wrong side of the firewall to check this. I'll be back in the office Monday.

Thanks, Chuck

@chumphre

I see that this is an older message that was stacked on the latest exchanges about special character sets. It popped into sight and I responded. But now that I think about it, I already checked this earlier in the month.

@henryzhang87

Deployed. Please verify.

Regards

Henry

@piyapongch
Member

Thank you, Henry. It is now the latest version.

John:

Can you try again? Thanks,

Piyapong.

@johnhuck
Author

Success! Diacritics are saving properly, so it looks like it is fixed. Thanks, Piyapong and Henry.

@piyapongch
Member

Henry:

You can close the ticket. Thanks,

Piyapong.

@henryzhang87

fixed
