Default-on SCAYT spell checker sends all text to third party without warning #434

Closed
kshade opened this issue Nov 9, 2018 · 25 comments

@kshade

kshade commented Nov 9, 2018

The activated-by-default "cloud" spell checker service sends all data you enter to a company in the Ukraine, hosted on AWS. There is no warning about this on the plug-in page, the settings page or in the editor itself, even if the page edited has a non-public ACL.

Since this is a DokuWiki plugin, which lists "Corporate Knowledge Base" and "CMS - Intranet" as use cases on their home page, that should not be the case. Also, there's that whole GDPR thing.

Suggestions:

  • Disable the spell checker by default
  • Display a warning to both administrators and users when it's active, before the first wiki page is loaded (and, with that, sent)
  • Update the plug-in description page at https://www.dokuwiki.org/plugin:ckgedit
  • Remove the spell checker altogether
@turnermm
Owner

turnermm commented Nov 9, 2018

This is interesting and disturbing if true. Could you give me some references so that I can raise the issue with the creators of the CKEditor and give my own users a source to go to for information? I did a quick google search but didn't turn up anything. Thanks.

@kshade
Author

kshade commented Nov 9, 2018

You mean steps to reproduce? Install DokuWiki and the CKEdit plugin, open any page in the CKEditor, type some words and watch your browser's developer console. You will see a request for every word, with these URL parameters:

customerid=[some long string]
format=json
cmd=check_spelling
slang=[language code]
user_wordlist=[list of words that you added to the dictionary]
text=[the word you entered, if you first open a page the entire text will be sent, after that it's cached in your browser's local storage]
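To make the request shape above concrete, here is a minimal sketch that assembles the same parameters into a query string. Only the parameter names come from the list above; the endpoint URL, the comma separator for user_wordlist and the sample values are assumptions made for illustration.

```javascript
// Hypothetical endpoint: the thread does not name the real service URL.
const ENDPOINT = "https://spellcheck.example.invalid/check";

// Build a request URL carrying the parameters listed above.
function buildSpellCheckUrl({ customerId, lang, userWords, text }) {
  const params = new URLSearchParams({
    customerid: customerId,
    format: "json",
    cmd: "check_spelling",
    slang: lang,
    // Assumption: the real separator for the word list is not documented here.
    user_wordlist: userWords.join(","),
    text,
  });
  return `${ENDPOINT}?${params.toString()}`;
}

// Sample values are placeholders, not real keys or page content.
const url = buildSpellCheckUrl({
  customerId: "PLACEHOLDER-KEY",
  lang: "en_US",
  userWords: ["DokuWiki"],
  text: "internal server name",
});
console.log(url);
```

Whatever is typed into the editor ends up in the text parameter, which is why it shows up in the browser's developer console.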

I've also created a thread on /r/sysadmin: https://www.reddit.com/r/sysadmin/comments/9vkuzb/dokuwiki_with_ckgedit_plugin_and_defaultenabled/

Don't blame you for not catching it, tbh, it's kinda ridiculous.

@turnermm
Owner

turnermm commented Nov 10, 2018

I'm not convinced by your argument. I've taken a look at the outgoing data. First of all there is no validity to your idea that passwords are being sent to their server. The spell-checker doesn't come into play until after you have logged in and have opened a page in the CKEditor. And the customerid field bears no relation to a Dokuwiki password. It is in fact the id hard-coded into the version of CKEditor currently on your server. Secondly, it is not sending back your entire page. When you open up a document, it sends back limited chunks and isolated single words, often not in sequence. And when you are typing, it sends single words, which is what you would expect of a spell-checker which is designed to check while you type. There is also the fact that CKEditor with the SCAYT spell-checker has hundreds of millions of users, including many big tech companies, and it's been around for 10 years, so you would think that somewhere along the way this big gaping hole would have been identified and corrected. I'm not discounting the possibility of a security flaw, but I would like to see some further evidence.

@kshade
Author

kshade commented Nov 10, 2018

I've taken a look at the outgoing data. First of all there is no validity to your idea that passwords are being sent to their server. The spell-checker doesn't come into play until after you have logged in and have opened a page in the CKEditor. And the customerid field bears no relation to a Dokuwiki password.

I think there's a misunderstanding: I did not mean that it sends your DokuWiki password, but passwords (and other sensitive information) that might get stored in a non-public wiki instance. I just mentioned passwords in the Reddit thread (but not here) because I initially thought that the checker might not send words with numbers in them, but it does.

This is not a (typical) security issue, but it is unexpected behaviour from a piece of software meant for corporate intranets.

Secondly, it is not sending back your entire page. When you open up a document, it sends back limited chunks and isolated single words, often not in sequence.

Depends on what's cached in local browser storage.

And when you are typing, it sends single words, which is what you would expect of a spell-checker which is designed to check while you type.

I don't expect spell checkers to send any data anywhere, which is the actual problem. The editor, as it is configured by default, will send page data to a third party unexpectedly. Everyone can decide for themselves if they are okay with that, of course, but to make that decision they need to be informed.

@turnermm
Owner

You began by saying that Scayt sends back "All" data to their servers, and now that I've examined what is in fact being sent, non-sequential words and small bits of text, you have altered that statement to:

Depends on what's cached in local browser storage.

You might want to show some evidence that this is even a meaningful statement.
I have no hesitancy in warning about Scayt if there is in fact something to be warned about. So far, I don't see the basis for that. But to satisfy any doubts I am going to post an issue to CKEditor.

@kshade
Author

kshade commented Nov 11, 2018

You began by saying that Scayt sends back "All" data to their servers

All data loaded into the editor. I thought that was clear, sorry if it wasn't.

you have altered that statement to:

Depends on what's cached in local browser storage.

The spell checker stores some of the results in the browser's local storage, so not all page data gets submitted every time you open the editor, only the first time. If it is opened for the first time, you can see a bunch of XHR requests that, in sequential order (at least during my test) included the entire page's contents in blocks of about 100 characters, the page's address (hinting at which company the information belongs to) and its title.

This means that someone on the other side can easily reconstruct the page contents, including personal data, confidential data and so on. The user is not informed of this beforehand. Do you see where I'm coming from with this?

I have no hesitancy in warning about Scayt if there is in fact something to be warned about. So far, I don't see the basis for that.

User consent is becoming more and more important. Scayt sends your data to a third-party service unexpectedly, quietly and by default. I think that warrants at least a warning, especially since DokuWiki isn't just for open-to-the-public wikis.
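The reconstruction concern described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual SCAYT client logic: the fixed 100-character block size mirrors what was observed in the XHR requests, and both function names are made up for this example.

```javascript
// Split text into fixed-size blocks, roughly as observed in the requests.
function toBlocks(text, size = 100) {
  const blocks = [];
  for (let i = 0; i < text.length; i += size) {
    blocks.push(text.slice(i, i + size));
  }
  return blocks;
}

// A receiver that sees the blocks in order only has to concatenate them.
function reassemble(blocksInArrivalOrder) {
  return blocksInArrivalOrder.join("");
}

// Placeholder page content, padded to 250 characters.
const page = "Internal runbook: ".padEnd(250, "x");
const blocks = toBlocks(page);
console.log(blocks.length, reassemble(blocks) === page);
```

If the blocks arrive in order, splitting the text does nothing to hide it from whoever receives them.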

@turnermm
Owner

turnermm commented Nov 11, 2018

I have opened a ticket with CKEditor and will wait to see what their response is. Ckgedit and its predecessors have been in use for almost 10 years, so I am quite willing to wait for an answer before I start making public accusations about Scayt that may have no solid foundation.

@kshade
Author

kshade commented Nov 11, 2018

Accusations of what, exactly? The spell checker service runs as intended and is fine for a public-facing site; that's not the problem. The problem is that it's being used, without informing the user, in something that's partly intended for corporate intranets, and that it sends all data in the wiki to a company that hosts on AWS and is located in the Ukraine.

Disclosing that information to the end user is not an accusation.

@turnermm
Owner

A warning is a tacit accusation. Anyway, I have contacted CKEditor directly and also posted an issue to stackoverflow:
https://stackoverflow.com/questions/53254237/scayt-security-in-ckeditor

@jswiderski

Hi,

I work at CKSource (the creator of CKEditor) and will give you some basic insight into what SCAYT is and why there is nothing to be afraid of.

First of all this is a product developed by https://webspellchecker.net/ which is incorporated into CKEditor as a third-party plugin.

Second, as you can read in the plugin's description at https://ckeditor.com/cke4/addon/scayt

SCAYT is "installation-less", using the web services of WebSpellChecker.net.
which clearly indicates that it does some online checking. The "cloud" is also mentioned on the product page https://www.webspellchecker.net/scayt.html so in my opinion no one can say that SCAYT does something quietly when this is in fact its basic (and well-known) functionality, of which SCAYT speaks loudly (I will get to the loud speaking in a minute).

Third, SCAYT is disabled by default in CKEditor. It can be turned on automatically at editor start with the configuration setting https://ckeditor.com/docs/ckeditor4/latest/api/CKEDITOR_config.html#cfg-scayt_autoStartup, but this is strictly a decision of the developer who integrated CKEditor into DokuWiki. I don't know whether DokuWiki users have an option to additionally configure CKEditor, but if they don't, this is a potential feature request. I believe, however, that there should be a ckeditor/config.js file available where you can simply change config.scayt_autoStartup = true; to config.scayt_autoStartup = false;.
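As a minimal sketch of that config change: the helper name below is hypothetical, and the guard only exists so the snippet also runs outside a page that has loaded CKEditor. In a real ckeditor/config.js you would more likely change the one line inside the existing CKEDITOR.editorConfig function than replace the function, and whether the ckgedit plugin keeps its settings in exactly that place is an assumption.

```javascript
// Hypothetical helper holding the setting discussed above.
function disableScaytAutoStart(config) {
  // Do not start SCAYT automatically when the editor loads.
  config.scayt_autoStartup = false;
  return config;
}

// CKEditor calls CKEDITOR.editorConfig with the config object to fill in.
// Guarded so the snippet does not fail where CKEDITOR is not defined.
if (typeof CKEDITOR !== "undefined") {
  CKEDITOR.editorConfig = disableScaytAutoStart;
}
```

With this setting SCAYT stays available in the editor, but nothing is sent for checking until a user turns it on.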

Fourth, with SCAYT you have the option to use Cloud Services. However, if your business model doesn't allow that or you simply feel too paranoid, you can also run SCAYT on your own server, and WebSpellChecker describes this extensively in its documentation. Please see the links below:
https://docs.webspellchecker.net/
https://docs.webspellchecker.net/display/WebSpellCheckerCloud
https://docs.webspellchecker.net/display/WebSpellCheckerServer520

Now if you compare the pricing of both, I can understand why services were used instead of server:
https://www.webspellchecker.net/webspellchecker-licensed-version.html
https://www.webspellchecker.net/webspellchecker-hosted-services.html
@kshade if your company has the budget to spend and if DokuWiki allows it (and it should, provided that you can configure your editor instance freely), I'm sure you will be able to switch from Cloud Services to your private server so that you have no doubts about your data privacy.

@kshade I have also contacted WebSpellChecker team so that they could also provide some insights.

@jshaptala

Hi @kshade, @turnermm, @jswiderski,

My name is Julia, I'm in charge of business and development processes at WebSpellChecker.

Let me jump in here and provide you with more information about our products and specifically the SpellCheckAsYouType (SCAYT) and WebSpellChecker Dialog (WSC) plugins for CKEditor. We have a long-term cooperation with CKSource (the Creator of CKEditor), in which we built two plugins SCAYT and WSC for spelling and grammar checking.

By default, both plugins are available for free and are provided with banner ads that say this is a product developed by WebSpellChecker and offer upgrade options. These plugins come with a built-in activation key, which allows you to start using the service right away. It means that we are not able to collect any information about who exactly installed our products, let alone who the end users are, except for general information about the amount of usage generated under the free services. The only way we can interact with users of the free version is to show a banner.

Moreover, with respect to GDPR, we implemented a number of technical and organizational safeguards and updated our Privacy Policy and Terms of Service, where we explain how we process the data that is sent to our servers. I'd like to draw your attention to the fact that we do not store or collect any data that is sent for spelling or grammar checking.

Our Cloud Services are hosted and maintained on AWS (specifically in US East (N. Virginia)). This information is outlined on our subprocessors list.

You know, I just could not miss your multiple mentions of our location in Ukraine. Yes, we are a small development company in Ukraine. We have a wonderful professional team. We are working hard (often overtime) in order to keep improving our products and services and provide support for our customers. However, I wonder: if our company were based in the EU or the USA, would you have a different attitude to the service?

As to the technical side of SCAYT and WSC in CKEditor:
@turnermm and @jswiderski thank you a lot for such comprehensive and detailed explanations. @jswiderski is right, SCAYT is not enabled by default, it can be changed intentionally by a developer responsible for the integration using scayt_autoStartup to start it automatically on the page load.

Our products are designed and provided to companies...developers who are looking for 3rd party proofreading tools that can be integrated with their systems. We provide a wide range of options for how our products can be managed and customized to fit the needs of a customer. Using GDPR terms we are playing the role of the processor, and not a controller.

I totally agree that any third-party products or services (including ours) that are used within an integration and involve data processing must be verified and described in the subprocessors list. So, whenever SCAYT or our other products are integrated, the person responsible for the integration must inform their end users that their text will be processed on remote servers.

As I mentioned earlier, we do not store the text that is sent for proofreading to our servers. We process it and send it back with the additional information about the mistakes and their possible corrections. Moreover, you can easily see what information is sent to us:

– customerid=[some long string] – it is a default activation key for all users who use the free version of SCAYT/WSC in CKEditor. There is no magic.
– format=json – no need to explain.
– cmd=check_spelling – type of command. The paid version also can have a grammar check and other types of checking.
– slang=[language code] – language used for spell or grammar checking.
– user_wordlist=[list of words that you added to the dictionary] – user dictionary (for the free version it is stored in the browser local storage only).
– text=[the word you entered, if you first open a page the entire text will be sent, after that it's cached in your browser's local storage] – only the text that is entered or will be entered in the instance of CKEditor where SCAYT is enabled. Moreover, the words are sent in small portions (max: 10 words), without any information identifying the end user who is using the proofreading. The only identifier for us when we receive a request is the activation key.

I strongly encourage you to take a look at our Privacy Policy, Cloud Service Architecture Diagram and Data Flow Diagrams to learn more how we process customer data.

Moreover, we are always open to answering questions from our customers or end users (at support@webspellchecker.net). You just need to ask us directly without posting or referring to unreliable or unconfirmed information.

@kshade
Author

kshade commented Nov 13, 2018

@jswiderski
@wsc-julia-shaptala

Hello. Thank you for replying. I'll be quoting you both in my reply, hope that's okay. Emboldened sections not for yelling, but to show what I find important.

Third, SCAYT is disabled by default in CKEditor.

This may be true in general, but this plug-in enables it by default.

Second, as you can read in the plugin's description at https://ckeditor.com/cke4/addon/scayt

I can now, that's true, that's why I'm filing this bug against the plug-in for DokuWiki, where there is no mention of the spell checker at all (https://www.dokuwiki.org/plugin:ckgedit). In fact, I found out about your company after I saw your domain in my browser console, which probably isn't optimal for building trust.

By default, both plugins are available for free and are provided with banner ads that say this is a product developed by WebSpellChecker and offer upgrade options.

Haven't seen any ads on our Intranet page. Don't think we're blocking them.

It means that we are not able to collect any information about who exactly installed our products

I think you could, in theory, by looking at the HTTP headers. Pretty sure the site's URL is in there, including our corporate domain.

Moreover, with respect to GDPR, we implemented a number of technical and organizational safeguards and updated our Privacy Policy and Terms of Service, where we explain how we process the data that is sent to our servers. I'd like to draw your attention to the fact that we do not store or collect any data that is sent for spelling or grammar checking.

And I appreciate that, I don't think your company is malicious in any way, but, since you brought up the GDPR, you also understand that I, as an administrator, need to keep track of where our data is sent, correct? I'm sure nothing will come of this incident, but it is a security incident, just like an employee uploading confidential data to one of those online document converters.

I'm not even going to go into trusting you or not, because, to make that decision, people have to know that there is a decision to make in the first place.

That's why I filed this bug: It's unexpected behaviour to use a third-party service for spell checking on a product that's partly meant for internal use. There should be a notification about what's happening (data gets sent to you, a third party) and explicit user consent (opt-in) before it happens, especially since there is almost certainly confidential data (internal documentation) involved.

You know, I just could not miss your multiple mentions of our location in Ukraine. Yes, we are a small development company in Ukraine. We have a wonderful professional team. We are working hard (often overtime) in order to keep improving our products and services and provide support for our customers. However, I wonder: if our company were based in the EU or the USA, would you have a different attitude to the service?

I'm sure your team is great, but that doesn't mean I want to, or can, trust you with non-public data. It's part of my job to minimize attack surfaces, and your service, by its nature, seems like a nice, low-key target for unsavoury actors. About the Ukraine: I was actually glad that you weren't US-based but from an EU-adjacent country, but then I saw that you use US Amazon instances. I did mention the Ukraine in my Reddit post because I know that that sub is frequented by many Americans who might have a bigger problem with your location (and neighbours).

@turnermm
Owner

I totally agree that any third-party products or services (including ours) that are used within an integration and involve data processing must be verified and described in the subprocessors list. So, whenever SCAYT or our other products are integrated, the person responsible for the integration must inform their end users that their text will be processed on remote servers.

I will take this advice. The only issue is how to implement a change without having my inbox inundated with complaints. Thanks.

@jswiderski

I think you could, in theory, by looking at the HTTP headers. Pretty sure the site's URL is in there, including our corporate domain.

As I have explained before, you can always switch to server solution and no external cloud communication will occur. It's all up to you.

Btw. This is a pretty rude thing to say, especially since you have no proof of it. Anyone can say "this is a potential security threat" but not everyone can prove it, so I recommend you have some proof before making such strong arguments next time; otherwise this discussion is pointless. There are 3 people trying to show you that everything is fine, but you still know better. If that is the case, explain how and show us some real evidence, but don't witch hunt.

@kshade
Author

kshade commented Nov 14, 2018

@turnermm

I will take this advice.

Thank you, that's basically all I wanted.

@jswiderski

It's a bit disheartening that all you reply to is a bit of speculation, not my actual points, which I've clearly marked. To reiterate, I don't believe that there is a technical problem with your product, nor do I think that you don't adhere to your data protection guidelines, the issue is with the lack of user consent/information in this DokuWiki plug-in.

As for why I speculated about the HTTP headers, please see the attached image. I'm not 100% sure if that's what you receive on the other end, and I do believe you when you say that you don't store this information or link it with customer IDs. You can, apparently, stop clients from sending the referer header by setting a Referrer-Policy header, don't know what to do about the origin one: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy

hdrs
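As a rough illustration of the Referrer-Policy point above, the sketch below models which Referer value a browser would attach to a cross-origin HTTPS-to-HTTPS request under a few policy values. It is a simplified model written for this thread, not browser code: the function name and the sample wiki URL are made up, only a subset of policies is covered, and the Origin header is not modelled at all.

```javascript
// Simplified model: Referer sent on a cross-origin HTTPS -> HTTPS request.
function refererFor(pageUrl, policy) {
  const url = new URL(pageUrl);
  switch (policy) {
    case "no-referrer":
      return null; // the header is omitted entirely
    case "origin":
    case "strict-origin-when-cross-origin":
      return url.origin + "/"; // scheme, host and port only
    case "no-referrer-when-downgrade":
    case "unsafe-url":
      // Full URL, minus fragment and credentials.
      url.hash = "";
      url.username = "";
      url.password = "";
      return url.href;
    default:
      throw new Error(`policy not covered by this sketch: ${policy}`);
  }
}

// Hypothetical intranet page address.
const page = "https://wiki.corp.example/doku.php?id=secret_project";
for (const policy of ["no-referrer", "origin", "unsafe-url"]) {
  console.log(policy, "->", refererFor(page, policy));
}
```

Under the full-URL policies the page address, including the wiki's domain and page id, reaches the third party; the stricter policies reduce that to the origin or to nothing.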

@turnermm
Owner

turnermm commented Nov 15, 2018

I have made some changes which leave it to the user to activate Scayt or not. This option was available before but is now the default. The administrator receives a message on installation advising of this and referencing the plugin page, where this matter is further explained and which in turn references this github issue. The admin can, as before, remove Scayt altogether.

I am now closing this issue. Thank you for your input.

@jshaptala

Before this one is forgotten, I'd like to add a few comments:

By default, both plugins are available for free and are provided with banner ads that say this is a product developed by WebSpellChecker and offer upgrade options.

Haven't seen any ads on our Intranet page. Don't think we're blocking them.

That's interesting. @turnermm I guess we need to discuss the licensing question. If you are using the free version, the banners must be present. Otherwise, this can be considered a violation of the Terms of Service.

It means that we are not able to collect any information about who exactly installed our products

I think you could, in theory, by looking at the HTTP headers. Pretty sure the site's URL is in there, including our corporate domain.

Yes, you are right. The information that is collected within your use of our services is described in our Privacy Policy.

Moreover, with respect to GDPR, we implemented a number of technical and organizational safeguards and updated our Privacy Policy and Terms of Service, where we explain how we process the data that is sent to our servers. I'd like to draw your attention to the fact that we do not store or collect any data that is sent for spelling or grammar checking.

And I appreciate that, I don't think your company is malicious in any way, but, since you brought up the GDPR, you also understand that I, as an administrator, need to keep track of where our data is sent, correct? I'm sure nothing will come of this incident, but it is a security incident, just like an employee uploading confidential data to one of those online document converters.

I totally agree with your comments here and understand why you brought up this question.

You know, I just could not miss your multiple mentions of our location in Ukraine. Yes, we are a small development company in Ukraine. We have a wonderful professional team. We are working hard (often overtime) in order to keep improving our products and services and provide support for our customers. However, I wonder: if our company were based in the EU or the USA, would you have a different attitude to the service?

I'm sure your team is great, but that doesn't mean I want to, or can, trust you with non-public data. It's part of my job to minimize attack surfaces, and your service, by its nature, seems like a nice, low-key target for unsavoury actors. About the Ukraine: I was actually glad that you weren't US-based but from an EU-adjacent country, but then I saw that you use US Amazon instances. I did mention the Ukraine in my Reddit post because I know that that sub is frequented by many Americans who might have a bigger problem with your location (and neighbours).

We do not hide any information about where our Cloud services are, where our company is registered, or where our team is located. It is outlined everywhere on our website. At the moment, hosting and maintaining on AWS in the US is the most cost-effective option, considering that the major part of usage is the free services. Using a CDN and geographically distributing traffic is the next step for us, but certainly not for the free services.

@turnermm
Owner

Before this one is forgotten, I'd like to add a few comments:

By default, both plugins are available for free and are provided with banner ads that say this is a product developed by WebSpellChecker and offer upgrade options.

Haven't seen any ads on our Intranet page. Don't think we're blocking them.

That's interesting. @turnermm I guess we need to discuss the licensing question. If you are using the free version, the banners must be present. Otherwise, this can be considered a violation of the Terms of Service.

That was not my statement but @kshade's. I don't know where @kshade got this idea. I suspect he never uses the CKEditor plugin for DokuWiki. He appears to be the tech person at his firm. His first mention of his objection was on the Reddit sysadmin list. So I think his primary experience with it was as a sysadmin and not as a user.

I started out defending you, now suddenly I find myself on the defensive. If you are suspicious of my use of your spellchecker, all you have to do is to install a copy of Dokuwiki and then go to the extension manager and install a copy of ckgedit.

@kshade
Author

kshade commented Nov 20, 2018

I'm one of the technical staff, but I do use the editor myself. None of the users, including myself, have seen ads. I do have uBlock Origin installed, but it's disabled on intranet sites. Checked if there were any errors on the console, couldn't see any.

We are running DokuWiki 2018-04-22a with your plugin and use the default template (sidebar enabled). The site is only accessible via HTTPS. Other plugins that might be relevant are: Changes, Dw2Pdf, imgpaste, Indexmenu, Info, Move, Pagelist, Revert Manager, safefnrecode, styling, tag, wrap.

Can one of you tell me where I should have seen the ads?

@turnermm
Owner

turnermm commented Nov 20, 2018

When a word is marked as mis-spelled and then you right-click on it, a context menu pops up with possible correct spellings. It's then that you see the ads. At least that's been my experience:

scayt_ads_2

scayt_ads

I don't see anything in those plugins that would suppress the ads.

@kshade
Author

kshade commented Nov 20, 2018

@wsc-julia-shaptala I do see that when I turn SCAYT back on, I guess it just never registered as a (banner) ad with anyone because it's so non-intrusive. Plus, I think most of us don't use the menu, but just correct the mistake manually.

Guess that clears that one up as well.

@jshaptala

Dear @turnermm, I didn't intend to offend you when I brought up the licensing question. I really appreciate all your comments and your efforts to explain how our service works. If I did offend you, I'm sorry. In any case, I'm glad that this issue is resolved; it was a good lesson for us.

Dear @kshade, I'd like to thank you as well for bringing up this question to all of us. It showed us how our services can be treated, and pointed to the issues that we need to resolve and make sure that our customers and users won't be misled.

@Grief

Grief commented Feb 6, 2022

@turnermm Is there any chance that the browser's spell checker could be used instead? It looks like there is an option for it in CKEditor's config, but I wasn't able to get it to work, though... https://ckeditor.com/docs/ckeditor4/latest/api/CKEDITOR_config.html#cfg-disableNativeSpellChecker

@Grief

Grief commented Feb 6, 2022

image
For those who are interested, that's easy to enable. Just add config.disableNativeSpellChecker = false; before the ending curly brace in dokuwiki/lib/plugins/ckgedit/ckeditor/config.js.unc, then run uglifyjs config.js.unc > config.js. Use Ctrl+right-click to reach the browser's native context menu with the spell-checking settings and correction suggestions. It would be great if there were an option for that in administration/config, though.

@turnermm
Owner

turnermm commented Feb 6, 2022

Good idea. Now if you select disabled for the scayt spell checker, the native spellchecker is automatically enabled. And thanks for sourcing out the configuration option to enable the native spell check.
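The behaviour described in the last two comments can be sketched as a single switch. This is an illustration only, not the ckgedit implementation: the helper name and the useScayt flag are hypothetical, and removing the plugin via config.removePlugins is CKEditor's general mechanism, which may differ from how the plugin itself disables SCAYT.

```javascript
// Hypothetical helper: pick either SCAYT or the browser's own spell checker.
function chooseSpellChecker(config, useScayt) {
  // Never start the cloud checker automatically; the user has to opt in.
  config.scayt_autoStartup = false;
  // The native checker is only suppressed while SCAYT is the chosen checker.
  config.disableNativeSpellChecker = useScayt;
  if (!useScayt) {
    // Add "scayt" to the comma-separated removePlugins list, keeping
    // whatever was already listed there.
    const removed = new Set(
      String(config.removePlugins || "").split(",").filter(Boolean)
    );
    removed.add("scayt");
    config.removePlugins = [...removed].join(",");
  }
  return config;
}

// Example: prefer the browser's spell checker.
console.log(chooseSpellChecker({}, false));
```

With the native checker active, suggestions come from the browser's context menu (Ctrl+right-click inside CKEditor, as noted above) and no text leaves the machine for spell checking.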
