5 Commits

Author SHA1 Message Date
Dan Stillman
c27ff75e80 Use Reader Mode for HTML snapshots if possible
WebPageDump hasn't been updated in years; it can produce
multi-megabyte snapshots of pages with lots of JS and ads, and
snapshots of JS-heavy pages often render incorrectly. ScrapBook
X, a fork of ScrapBook (which WPD was based on), is actively maintained
and works much better, but like ScrapBook it's not modular and it can
produce massive snapshots (e.g., 12MB for a single page with lots of
ads).

Rather than produce ugly, broken-looking pages, we can run eligible
pages through Firefox's Reader Mode (a modified Readability under the
hood) and save those. While this won't always be perfect, most of the
time it will save what people actually care about (the text of the page),
and it avoids filling up people's storage directories and storage
accounts with junk. We should probably reduce the number of translators
that save snapshots in general, but this lessens their impact.

The current implementation will need to be updated for Standalone,
either by including the Reader Mode files in Standalone or by switching
to other JS libraries. It strips the standard Reader Mode controls before saving, but
it might be nice to provide some format options when viewing (or just
run the page through Reader Mode again). We can also customize the
default styling.
2016-05-26 00:35:33 -04:00
Dan Stillman
47b934f67e Fix direct saving of PDFs via connector 2016-05-25 17:34:26 -04:00
Dan Stillman
7b5b2dc89e Close browser window in server connector tests 2016-05-23 01:19:44 -04:00
Dan Stillman
0be2796500 Fix webpage/snapshot saving from connector 2016-05-20 15:51:54 -04:00
Dan Stillman
eb400587e8 Fix various bugs saving from connector and add test 2016-05-13 15:00:54 -04:00