Advanced Srcset and Style Sheet Containing Media Queries Preservation #359
Codecov Report
@@ Coverage Diff @@
## develop #359 +/- ##
========================================
Coverage 87.78% 87.78%
========================================
Files 59 59
Lines 7107 7107
Branches 1256 1256
========================================
Hits 6239 6239
Misses 582 582
Partials 286 286
71f4cb9 to 36493a7
pywb/static/wombat.js
Outdated
}
// depending on if we have promises, defer things until
// next time the Promise.resolve or setTimeout Qs are cleared
if (typeof $wbwindow.Promise !== 'undefined') {
Is it safe to assume we have Promises if we have workers? If so, maybe should just do it one or the other, as there's already enough conditionals here. :) I think we use Promises already in wombat..
pywb/static/wombat.js
Outdated
var WBPreserWorker;
var wbSheetMediaQChecker;
var wbUsePresWorker = $wbwindow.Worker != null &&
  (wbinfo.is_live || wbinfo.top_url.indexOf('/record/') !== -1);
I think we shouldn't need the `/record/` check, `is_live` should be sufficient..
Right you are. `is_live` is sufficient, changing.
pywb/static/wombat.js
Outdated
for (var i = 0; i < values.length; i++) {
  values[i] = rewrite_url(values[i].trim());
}

if (WBPreserWorker) {
Should this be checking `wbUsePresWorker` also?
pywb/static/wombat.js
Outdated
@@ -3630,6 +3804,9 @@ var _WBWombat = function($wbwindow, wbinfo) {
}

send_top_message(message);
if (WBPreserWorker) {
Check `wbUsePresWorker` for consistency?
`notify_top()` actually runs twice (for readyState `interactive` and for readyState `complete`, to allow the banner to update as soon as possible). Probably should send to worker only when readyState is `complete`?
pywb/static/wombat.js
Outdated
encodeURIComponent(prefix) + '&mod=' +
encodeURIComponent(mod);
this.worker = new Worker(workerURL);
if (typeof $wbwindow.URL === 'undefined') {
Maybe this should just be the default, to avoid the multiple init paths? (I wonder which browsers support workers and not window.URL..) Just trying to remove variation :)
It would seem that Safari is the only browser that might give us trouble (`searchParams` available in v10.1).
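For reference, the two init paths discussed in this thread can be sketched roughly as below. This is only an illustration: the helper names `buildWorkerURL` and `readInitFromHref` are hypothetical, and the `?prefix=` part of the query string is assumed from the PR description (only `&mod=` is visible in the diff excerpt).

```javascript
// Hypothetical helpers for illustration; not the code from this PR.

// Page side: append the init values to the worker script URL.
function buildWorkerURL(workerPath, prefix, mod) {
  return workerPath + '?prefix=' + encodeURIComponent(prefix) +
    '&mod=' + encodeURIComponent(mod);
}

// Worker side: read the init values back from the worker's own URL.
// Returns null when no URL constructor (or no searchParams) is available,
// in which case the worker would wait for an explicit init message.
function readInitFromHref(href, URLCtor) {
  if (typeof URLCtor === 'undefined') return null;
  var params = new URLCtor(href).searchParams;
  if (!params) return null;
  return { prefix: params.get('prefix'), mod: params.get('mod') };
}
```

Making the `init` message the default, as suggested above, would remove the second function entirely at the cost of one extra message per worker.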
This is all really awesome, just added a few minor comments. One other thing: we also want this to work in proxy mode. That might be more tricky since we're not intercepting element addition in proxy mode? (Hopefully we can avoid MutationObservers and just check periodically?) Maybe that should be a separate task/PR, what do you think?
6ecf31e to 496fa7d
…preservation of srcset values to fix Rhizome-Conifer/conifer#64.
wombat.js:
- Finalized PreserveWorker that preserves srcset values and Media Query values
- Deferred extraction and preservation of the values to be preserved so that the UI thread is not clobbered
- Hooked into places where wombat rewrites the values we are interested in
wombatPreservationWorker.js:
- Updated handling of srcset extraction now that we are sending wombat srcset rewrites
- Added check to see if we have seen a URL to be fetched
- Added light polyfill of Promise and fetch if they are not defined in wombatPreservationWorker.js, for Safari
wombat.spec.js:
- Updated to include values necessary to work with PWorker changes.
496fa7d to d983a8a
Probably best to add proxy support in another task, because we would have to check periodically and would therefore send repeat values to the worker, depending on how long the browser stays on the same page. A more involved deduplication strategy is needed to account for this.
Description
This PR gives pywb the ability to preserve the full set of values contained in the `srcset` attribute and the images referenced in CSS media query rules. It does so through changes to `wombat.js` and the addition of a new static resource, `wombatPreservationWorker.js`.
The preservation scheme is split into two parts:
- `wombat`: initiates `wombatPreservationWorker` and sends it `srcset` and media query values for preservation
- `wombatPreservationWorker`: the web worker responsible for fetching the resources
wombat
To support this new preservation scheme, the changes to wombat are as follows.
Three new internal global variables
- `WBPreserWorker`: a reference to the preservation worker abstraction used by wombat itself
- `wbSheetMediaQChecker`: a reference to the `load` event callback for checking the stylesheet introduced by `link[rel='stylesheet']`
- `wbUsePresWorker`: a flag indicating whether the preservation worker should be used
The first two global variables (`WBPreserWorker` and `wbSheetMediaQChecker`) remain undefined unless the third (`wbUsePresWorker`) is true.
`wbUsePresWorker` is set to true when `window.Worker` is non-null and either wombat is operating in live mode or `wbinfo.top_url` contains `record`.
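As a rough illustration of the flag logic described above, the condition can be isolated into a small function. The name `shouldUsePreservationWorker` is hypothetical, and the `/record/` substring is taken from the diff excerpt in the review thread; this is a sketch, not the code in `wombat.js`.

```javascript
// Hypothetical helper mirroring the wbUsePresWorker condition.
// win stands in for $wbwindow; wbinfo is wombat's info object.
function shouldUsePreservationWorker(win, wbinfo) {
  // Workers must exist at all before preservation can be offloaded.
  var hasWorker = win.Worker != null;
  // Preservation only makes sense while content is being captured.
  var isCapturing = Boolean(wbinfo.is_live) ||
    String(wbinfo.top_url || '').indexOf('/record/') !== -1;
  return hasWorker && isCapturing;
}
```

With this shape, the other two globals would only be assigned when the function returns true, which matches the "remain undefined" behaviour described above.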
New init function initPreserveWorker and PWorker
The new init function, `initPreserveWorker`, contains wombat's interface to the new `wombatPreservationWorker`, which was designed to be cross-frame safe.
The init function only introduces the `PWorker` interface when the `wbUsePresWorker` flag is true.
`initPreserveWorker` operates similarly to the existing web worker override, except that it must run first in order to use the non-overridden reference to `window.Worker`, which is used by `PWorker` (preservation worker).
`PWorker` is an ES5 class that provides cross-frame preservation of `srcset` values and media query CSS rules.
Creating a `PWorker` takes two arguments: `prefix` (`wb_abs_prefix`) and `mod` (`wbinfo.mod`).
The backing worker is created only when wombat is operating in the `__WB_replay_top` browser context; in any other context it is not re-created.
The URL of the backing worker includes the query parameters `prefix` and `mod`, which `wombatPreservationWorker` uses to initialize itself when `window.URL` is available. Otherwise an `init` message containing those values must be sent to the worker immediately.
The `PWorker` class exposes six functions:
- `deferredSheetExtraction`: checks the `sheet` property of a `style` or `link[rel='stylesheet']` element for media query rules. The check avoids blocking the UI thread by using `Promise.resolve` (when available) or `setTimeout(cb, 1)`. It is used by the `checkStyle` (the value of `wbSheetMediaQChecker`), `rewrite_elem` and `override_html_assign` functions. Its usage in `rewrite_elem` is tied to `style` elements (text contents changed and `sheet != null`) and `link[rel='stylesheet']` elements (via `link.addEventListener('load', wbSheetMediaQChecker)`).
- `terminate`: terminates the backing web worker, but only when in the `__WB_replay_top` browser context.
- `postMessage`: sends a message to the backing web worker. If wombat is in the `__WB_replay_top` browser context, the message is sent directly to the worker; otherwise it is forwarded to `__WB_replay_top`, which then sends it to the backing worker.
- `preserveSrcset`: used when sending `srcset` values from `rewrite_srcset`
- `preserveMedia`: used when sending media query values from `deferredSheetExtraction`
- `extractFromLocalDoc`: checks `document.styleSheets` and `document.querySelectorAll('*[srcset]')` when `notify_top` is called from `init_top_frame_notify`
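The deferral strategy described for `deferredSheetExtraction` can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the helper names `defer` and `collectMediaRules`, the `send` callback, and the `{ type: 'values', media: [...] }` message shape are all hypothetical.

```javascript
// Hypothetical sketch; names and message shape are assumptions.
var MEDIA_RULE = 4; // numeric value of CSSRule.MEDIA_RULE

// Run cb off the current call stack: via a resolved Promise when
// Promise exists on the window, otherwise via a 1 ms timer.
function defer(cb, win) {
  if (typeof win.Promise !== 'undefined') {
    win.Promise.resolve().then(cb);
  } else {
    win.setTimeout(cb, 1);
  }
}

// Collect the text of every @media rule in a sheet-like object.
// Reading cssRules can throw (for example on cross-origin sheets),
// so failures are treated as "nothing to preserve".
function collectMediaRules(sheet) {
  var found = [];
  var rules;
  try {
    rules = sheet.cssRules;
  } catch (e) {
    return found;
  }
  if (!rules) return found;
  for (var i = 0; i < rules.length; i++) {
    if (rules[i].type === MEDIA_RULE) found.push(rules[i].cssText);
  }
  return found;
}

// Defer the scan, then hand any media rules to the sender.
function deferredSheetExtraction(sheet, send, win) {
  defer(function () {
    var media = collectMediaRules(sheet);
    if (media.length > 0) send({ type: 'values', media: media });
  }, win);
}
```

In a page this might be called as `deferredSheetExtraction(styleElem.sheet, function (msg) { worker.postMessage(msg); }, window)`, where `worker` stands for whatever object forwards messages to the backing worker.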
wombatPreservationWorker
The new file `wombatPreservationWorker.js` contains the worker code used for preserving `srcset` and media query values.
Because wombat is expected to operate in a wide range of browsers, this file contains minimal polyfills of `Promise` and `fetch` for use when these symbols are not defined.
This worker expects to receive two messages from wombat:
- `init`: sent when the WHATWG URL parser is not available, so that the `Preserver` class can be instantiated
- `values`: contains the values to be preserved
The `Preserver` class (ES5) is responsible for the extraction and preservation of the desired values.
The `Preserver` tracks seen URLs, up to 2500 before resetting, in order to reduce the load on the server(s) the requests are made to.
If the `Preserver` has already seen a URL, no fetch is made; otherwise the URL is added to the seen cache.
The `Preserver` class exposes 9 functions:
- `fixupURL`: ensures the URLs to be fetched are absolute
- `safeFetch`: initiates a URL fetch, or queues the URL if we are waiting for a batch of fetches to complete
- `urlExtractor`: extracts the URL from sheet text (media query) in a similar manner to `style_replacer`
- `fetchDone`: when a batch of fetches is done, ensures the URL queue is drained and, if the `Preserver` has seen 2500 URLs, clears the seen count for GC purposes
- `fetchAll`: ensures that all fetches complete and that `fetchDone` is then called, and indicates that URLs are to be queued until the fetches complete. If the `Preserver` is already queuing URLs or no fetches are to be made, this is a no-op.
- `drainQ`: drains the queue by calling `safeFetch` for each queued URL and then calls `fetchAll`
- `extractMedia`: for each style sheet that contained a media query, extracts URLs from its text and initiates a fetch for each URL found
- `extractSrcset`: for each `srcset` value, if the value is from `PWorker.preserveSrcset`, only the URL is fetched; if it is from `PWorker.extractFromLocalDoc`, the full `srcset` attribute value is split first and then each URL it contains is fetched
- `preserveMediaSrcset`: used when the `values` message is received; calls `extractMedia`, `extractSrcset` and `fetchAll`.
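The seen-URL tracking and `srcset` splitting described above can be sketched like this. It is an illustration only: `SeenTracker`, `fetchOnce`, `batchDone` and `srcsetUrls` are hypothetical names, the fetcher is injected so the sketch runs without a network, and the comma split is a simplification that would mishandle URLs containing commas.

```javascript
// Hypothetical sketch; not the Preserver class from this PR.
var MAX_SEEN = 2500; // reset threshold stated in the description

function SeenTracker(fetcher) {
  this.seen = Object.create(null); // no inherited keys to collide with URLs
  this.seenCount = 0;
  this.fetcher = fetcher;
}

// Fetch a URL only the first time it is encountered.
// Returns true when a fetch was initiated.
SeenTracker.prototype.fetchOnce = function (url) {
  if (this.seen[url]) return false;
  this.seen[url] = true;
  this.seenCount += 1;
  this.fetcher(url);
  return true;
};

// Called after a batch completes: drop the cache once it has
// reached the threshold so it does not grow without bound.
SeenTracker.prototype.batchDone = function () {
  if (this.seenCount >= MAX_SEEN) {
    this.seen = Object.create(null);
    this.seenCount = 0;
  }
};

// Split a full srcset attribute value into its URLs, dropping the
// width/density descriptors. Naive: assumes URLs contain no commas.
function srcsetUrls(srcset) {
  return srcset.split(',').map(function (candidate) {
    return candidate.trim().split(/\s+/)[0];
  }).filter(function (url) {
    return url.length > 0;
  });
}
```

A worker built this way might pass `function (url) { fetch(url); }` as the fetcher and call `fetchOnce` for every URL returned by `srcsetUrls`.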
Motivation and Context
Ref Rhizome-Conifer/conifer#64
Types of changes