Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more web storage options #3

Open
wants to merge 15 commits into
base: trunk
Choose a base branch
from
Open

Conversation

layoutd
Copy link
Collaborator

@layoutd layoutd commented Jan 25, 2024

This PR enables the user to specify which type of storage to use via the web_storage configuration parameter:

  • web_storage: 'cookies' (default) - multiple cookies as now
    • Proven implementation.
    • Some users don't like having so many cookies (extra requests, etc).
    • Some cookies trigger a false positive for Web Application Firewalls (like ModSecurity) using the Comodo WAF ruleset.
  • web_storage: 'singleCookie' - one JSON-encoded sbjs_current cookie that contains all data stored in the cookies
    • Workaround for Web Application Firewall 403 errors which generally have sbjs_current whitelisted.
    • Includes a method for handling session length (able to count user sessions)
    • Reduces the number of cookies passed back and forth to the server.
    • Limited to 4kb in data, which could be a problem if there are many 4-byte values (emoji, Sumerian, Hieroglyphics, etc.)
  • web_storage: 'sessionStorage' - keep all data in the browser's session storage
    • No cookies
    • All data expires once the session ends (no counting user sessions)
    • Data isn't synced across multiple browser windows (opening a link in a new tab loses the data)
  • web_storage: 'localStorage' - keep all the data in the browser's local storage
    • No cookies
    • Data will persist indefinitely
    • Includes a method for handling session length (able to count user sessions)
    • Data will persist indefinitely
Notes
  • This doesn't currently handle moving from one storage type to another. If a user visits the site with one storage method configured, then the configuration changes during their session or while navigating, the original data will be lost (at best) or corrupted.
  • Please reread the sub-bullet points above as they illustrate pros and cons of each method!
  • The localStorage uses the maximum of the session_length (minutes) and lifetime (months converted to minutes) to create an expiry date for the values. Technically they'll live "forever" in the browser's local storage, but whenever Sourcebuster tries to access them, if the expiry timestamp has passed, the values will be discarded. This workaround was added to prevent 0-lifetime configuration (all cookies expire with the session) from creating 0-lifetime local storage, which effectively is erased and set every time a page is loaded.

Testing

  1. Clone the repo into a web-accessible directory.
  2. Navigate to /public/index.html in a browser.
  3. At the bottom of the index.html file, find the Sourcebuster JavaScript configuration. Initially, it should have web_storage set as cookies.
  4. You can add a session_length parameter with value of 1 minute if you would like to test transitions from session to session (default is 30 minutes).
  5. Perform some testing with the cookies value for web_storage.
    • Ensure that Referral (coming from woo.com), Organic (coming from google.com), TypeIn/Direct, UTM (with ?utm_source=[VALUE]…) all capture source values correctly.
    • Change source method and see that the "first" source values and "current" source values are correct (UTM parameters will always override "first").
    • Check for incrementing pages viewed in the session.
    • Wait for session timeout then refresh, and see pages viewed return to 1 and user sessions increase.
    • Confirm multiple cookies in the Application tab of the dev panel.
  6. Close the browser (or erase all sbjs_ cookies) and change the web_storage config value to singleCookie (camel cased).
  7. Run the above tests again and verify that the behavior is consistent with the previous tests.
  8. Also, check in the Application tab in the dev panel to confirm that only one cookie is being created.
  9. Repeat the tests with both localStorage and sessionStorage options.
    • Be sure to close the browser (and/or erase previous config values in storage or cookies) as the current implementation doesn't erase any old data when changing the web_storage value.
    • Confirm that no cookies are created when those these methods are enabled.

@layoutd layoutd self-assigned this Jan 25, 2024
@layoutd layoutd added this to the 1.2 milestone Jan 25, 2024
@layoutd layoutd requested a review from a team January 30, 2024 14:48
Copy link
Member

@ecgan ecgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some simple testing with the web_storage: 'cookies', seems to work as expected.

I have reviewed 15 out of 18 files. I left some comments below. I have also edited the PR description to fix a code mistake.

In the following line:

// Switch delimiter and renew cookies

We have a code comment that mentions "cookies", I think it may not be accurate anymore since the code is about web_storage, so we may need to change the code comment? 🤔

I'll continue review the three remaining helpers files and do some testings later, if nobody else beat me to it.

Comment on lines +208 to +220
#### web_storage
``` javascript
web_storage: 'cookies'
```
See [PR 3](https://github.com/woocommerce/sourcebuster-js/pull/3) for more details.

Basically, configure where the Sourcebuster data will be stored and manipulated:
- `cookies` (default) — in multiple cookies
- `singleCookie` — in a single, JSON-encoded cookie
- `localStorage` — in browser `local storage`
- `sessionStorage` — in browser `session storage`

Any other value (or no value) will default to `cookies`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paragraph spacing, to be consistent with other spacing in the document:

Suggested change
#### web_storage
``` javascript
web_storage: 'cookies'
```
See [PR 3](https://github.com/woocommerce/sourcebuster-js/pull/3) for more details.
Basically, configure where the Sourcebuster data will be stored and manipulated:
- `cookies` (default) — in multiple cookies
- `singleCookie` — in a single, JSON-encoded cookie
- `localStorage` — in browser `local storage`
- `sessionStorage` — in browser `session storage`
Any other value (or no value) will default to `cookies`.
#### web_storage
```javascript
web_storage: 'cookies'
```javascript
See https://github.com/woocommerce/sourcebuster-js/pull/3 for more details.
Basically, configure where the Sourcebuster data will be stored and manipulated:
- `cookies` (default) — in multiple cookies
- `singleCookie` — in a single, JSON-encoded cookie
- `localStorage` — in browser `local storage`
- `sessionStorage` — in browser `session storage`
Any other value (or no value) will default to `cookies`.

@@ -12,7 +12,8 @@ var data = {
first_extra: 'sbjs_first_add',
session: 'sbjs_session',
udata: 'sbjs_udata',
promocode: 'sbjs_promo'
promocode: 'sbjs_promo',
single: 'sbjs_current',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an explanation of the meaning of this single here? In a documentation, or maybe in this PR description? 🤔

promo: 'code'
promo: 'code',

single_expire: 'sxp'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an explanation of the meaning of this single_expire here? In a documentation, or maybe in the PR description? 🤔

validateType: function( storage_type ) {
// Default to valid_values[0] if storage_type is not in valid_values
var valid_values = ['cookies', 'singleCookie', 'localStorage', 'sessionStorage'];
return valid_values.indexOf( storage_type ) > -1 ? storage_type : valid_values[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use array includes here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/includes. Also, I think it is better to be explicit and use 'cookies' instead of valid_values[0].

Suggested change
return valid_values.indexOf( storage_type ) > -1 ? storage_type : valid_values[0];
return valid_values.includes( storage_type ) ? storage_type : 'cookies';

Comment on lines +26 to +28
case 'cookies':
default:
storage_module = cookies;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying break for default statement. Some mention it is a good practice. See Break on default case in switch.

Suggested change
case 'cookies':
default:
storage_module = cookies;
case 'cookies':
default:
storage_module = cookies;
break;

Copy link
Member

@ecgan ecgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed codes and tested some scenarios. cookies, singleCookie and localStorage seem to work as expected. 👍

📝 💡 When I was testing sessionStorage, I forgot that it was tied to the browser tab session, and the session_length value is actually meaningless. I think we should mention it somewhere, or even better, make it work like localStorage, so that it will expire when it exceeds session_length or when the browser tab is close.

🚧 In my testing, I have session_length: 1. When I set web_storage: 'singleCookie', I can't seem to get session or "visits" to go greater than 1; it is always 1, and the "Pages seen" is always incrementing. I think this is a bug?

I am leaving 12 comments below. Important ones are marked with ❓ . I think there are a few code mistakes that need to be fixed.

Requesting changes due to the above 🚧 and ❓ .

Comment on lines +8 to +11
set: function(name, value, minutes, domain, excl_subdomains) {
// Don't break the build
domain = '';
excl_subdomains = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since domain and excl_subdomains are the last two parameters, we can just remove them and things should still work.

Suggested change
set: function(name, value, minutes, domain, excl_subdomains) {
// Don't break the build
domain = '';
excl_subdomains = false;
set: function(name, value, minutes) {

Comment on lines +41 to +45
destroy: function(name, domain, excl_subdomains) {
// Don't break the build
domain = '';
excl_subdomains = false;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since domain and excl_subdomains are the last two parameters, we can just remove them and things should still work.

Suggested change
destroy: function(name, domain, excl_subdomains) {
// Don't break the build
domain = '';
excl_subdomains = false;
destroy: function(name) {

// as well as the time when it's supposed to expire
var item = {
value: this.encodeData( value ),
expiry: (new Date()).getTime() + ( Math.max( minutes, local_storage_session_length ) * 60 * 1000 ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Is the Math.max expected? 🤔 I'm thinking it might give unexpected / undesired behavior.

Consider this scenario:

  1. We set the session length by calling setSessionLength( 15 ), i.e. 15 minutes "default" session length.
  2. We call set('abc', '123', 5), i.e. we intentionally want this item to have 5 minutes expiry.
  3. Because of the Math.max, 15 minutes expiry will be set, instead of the 5 minutes expiry that we intentionally want.

I think it may be more sensible to make session length as default parameter, when the minutes parameter is not supplied by the caller.

I tried to make session length as default parameter like this:

set: function(name, value, minutes = local_storage_session_length ) {

But there is a build error that says:

src/js/helpers/local_storage.js: line 8, col 38, 'default parameters' is only available in ES6 (use 'esversion: 6').

Alternatively, we can try this:

Suggested change
expiry: (new Date()).getTime() + ( Math.max( minutes, local_storage_session_length ) * 60 * 1000 ),
expiry: (new Date()).getTime() + ( ( minutes || local_storage_session_length ) * 60 * 1000 ),

If minutes is falsy (including 0), it will use local_storage_session_length.

This means that with the scenario I mentioned above, when we call set('abc', '123', 5), the expiry will be 5 minutes instead of 15 minutes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If minutes is falsy (including 0), it will use local_storage_session_length.

We could also do an explicit minutes === undefined check, as sometimes 0 has a convention to denote no limit.

// `item` is an object which contains the original value
// as well as the time when it's supposed to expire
var item = {
value: this.encodeData( value ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ Because we are using a JavaScript object and we are setting value into item.value, I'm thinking we don't really need this.encodeData (and the corresponding this.decodeData when reading the value). I think we can just do this:

Suggested change
value: this.encodeData( value ),
value: value,

(Note: We can't use object short notation because "'object short notation' is available in ES6".)

In cookies.js file, encodeData and decodeData is needed to prevent character collision in the larger document.cookie string.

domain = '';
excl_subdomains = false;

localStorage.removeItem(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ This should be sessionStorage?

Suggested change
localStorage.removeItem(name);
sessionStorage.removeItem(name);

Comment on lines +17 to +21
destroy: function(name, domain, excl_subdomains) {
// Don't break the build
domain = '';
excl_subdomains = false;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the unneeded parameters:

Suggested change
destroy: function(name, domain, excl_subdomains) {
// Don't break the build
domain = '';
excl_subdomains = false;
destroy: function(name) {

if( name === data.containers.session ) {
var session = single_cookie[default_cookies.unsbjs(name)];
if( session ) {
var sts = session.match(/sts=(\d+)/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ I think this should be sxp, as per data.aliases.single_expire?

Suggested change
var sts = session.match(/sts=(\d+)/);
var sxp = session.match(/sxp=(\d+)/);


// Set an expiration for the values
if( name === data.containers.session ) {
value += data.delimiter + data.aliases.single_expire + '=' + (Math.floor(Date.now() / 1000) + minutes * 60);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In local storage and in cookies, we are using time in milliseconds, but for single_cookie here, we are using minutes? I'm thinking it may be better to make them consistent? 🤔

}
}
}
return single_cookie[default_cookies.unsbjs(name)] || default_cookies.get(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we call default_cookies.get(name)? I'm just thinking / concerned that this might cause issues because we are getting data from two different data structures. 🤔

default_cookies.set(data.containers.single, JSON.stringify(single_cookie), lifetime, domain, isolate);
},

deleteOld: function( domain, isolate ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an explanation or documentation on this deleteOld function? What's the purpose etc? I don't see it being used in the code, and it is not being mentioned in the PR description.

Comment on lines +24 to +25
// if the item doesn't exist, return null
if (!item_raw) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the code would do "if the item is falsy, return null", and localStorage.getItem returns null for non-existing items.

Is that the case that we consider falsy values (", 0, false, ...) "non-existing"?

If so, I'd rather rephrase the code comment to explain from the "why?" perspective, not the "what?" perspective. ("What?" is answered in the code.)

Suggested change
// if the item doesn't exist, return null
if (!item_raw) {
// We consider items set to a falsy value as nonexistent and return `null` consistently.
if (!item_raw) {

But, I don't know if that's the actual intention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if we want to just forward the behavior, and return null for non existing, I'd rather change the code:

Suggested change
// if the item doesn't exist, return null
if (!item_raw) {
// if the item doesn't exist, return null
if (!item_raw === null) {

Currently, we do set the item to be stringified JSON object, so it will always be nonfalsy if existing, but I'd rather make the code resilient for future changes. Plus, somebody may use this method to ask for an object set by a different piece of software.

// as well as the time when it's supposed to expire
var item = {
value: this.encodeData( value ),
expiry: (new Date()).getTime() + ( Math.max( minutes, local_storage_session_length ) * 60 * 1000 ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If minutes is falsy (including 0), it will use local_storage_session_length.

We could also do an explicit minutes === undefined check, as sometimes 0 has a convention to denote no limit.

Comment on lines +24 to +25
// if the item doesn't exist, return null
if (!item_raw) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if we want to just forward the behavior, and return null for non existing, I'd rather change the code:

Suggested change
// if the item doesn't exist, return null
if (!item_raw) {
// if the item doesn't exist, return null
if (!item_raw === null) {

Currently, we do set the item to be stringified JSON object, so it will always be nonfalsy if existing, but I'd rather make the code resilient for future changes. Plus, somebody may use this method to ask for an object set by a different piece of software.

Comment on lines +20 to +22
// Select web storage method
storage_init.set(p.web_storage, p.session_length);
web_storage = storage_init.get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


I don't get the rationale behind this architecture, and it looks bug-prone and confusing to me.

  1. In the PR description, you mentioned, that changing webstorage in runtime is currently not supported, and may result in corrupted data.
    And there is nothing in this code that would prevent such corruption, or discourage it anyhow.
    sbjs.init() may be called multiple times (and in WC Core we do so), that technically can happen with different prefs every time.
  2. We set it to immediately get it at the line bellow and get it again inside the migrations.go another line bellow.

Do we even need that storage_init module? Can sbjs.init just have web_storage variable and pass it to migrations.go.

I could imagine it would be useful, if it would not be ..._init but a storage singleton, that would prevent from changing it in runtime.

cookies = require('./cookies');

module.exports = {
validateType: function( storage_type ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not used anywhere outside, so I'd rather make it private.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants