Gwtar: a static efficient single-file HTML format · Gwern.net

compression, Internet archiving

Gwtar is a new polyglot HTML archival format which provides a single, self-contained HTML file that can still be efficiently lazy-loaded by a web browser. This is done by a header’s JavaScript making HTTP range requests. It is used on Gwern.net to serve large HTML archives.

2026-01-20–2026-01-27 · finished · certainty: certain · importance: 4

Archiving HTML files faces a trilemma: it is easy to create an archival format which is any two of static (self-contained, ie. all assets included, with no special software or server support), a single file (when stored on disk), and efficient (lazy-loads assets only as necessary to display to a user), but no known format achieves all 3 at once.

We introduce a new format, Gwtar (pronounced “guitar”; .gwtar.html extension), which achieves all 3 properties simultaneously. A Gwtar is a classic fully-inlined HTML file which is then processed into a self-extracting concatenated file: an HTML + JavaScript header followed by a tarball of the original HTML and its assets. The header’s JS stops the web browser from loading the rest of the file, loads just the original HTML, and then hooks its requests, turning them into range requests into the tarball part of the file. Thus, a regular web browser loads what seems to be a normal HTML file, and each asset is downloaded only when it is needed.

In this way, a static HTML page can inline anything, such as gigabyte-size media files, but those will not be downloaded until necessary, even while the server sees just a single large HTML file which it serves as normal.
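To make the concatenated layout concrete, here is a minimal Python sketch under stated assumptions: the archive part is an uncompressed ustar tarball appended directly after a fixed header, and the loader knows each member's absolute byte offsets. An uncompressed tarball stores each member's data contiguously (padded to 512-byte blocks), so one asset maps to one byte range of the whole file. The names `HEADER`, `build_gwtar`, `range_header`, and `serve_range` are hypothetical, the placeholder header is not Gwtar's actual JavaScript, and `serve_range` only mimics in Python the single-range response a static server would give the browser.

```python
import io
import tarfile

# Hypothetical placeholder for Gwtar's HTML + JavaScript header. The real header
# stops the browser from parsing further and rewrites requests; that is not shown.
HEADER = "<!DOCTYPE html><html><head><script>/* loader */</script></head></html>\n".encode()


def build_gwtar(assets: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Concatenate HEADER with an uncompressed ustar tarball of `assets`.

    Returns the whole file and an index mapping each member name to its
    inclusive (first_byte, last_byte) position within the concatenated file.
    """
    buf = io.BytesIO()
    # Writing to an external fileobj: closing the TarFile leaves `buf` open.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in assets.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = buf.getvalue()

    # Re-read the tarball to find where each member's data starts.
    # `offset_data` is a CPython TarInfo attribute: the data offset within the tar.
    index = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.size == 0:
                continue  # an empty member has no valid "bytes=first-last" range
            first = len(HEADER) + member.offset_data  # shift past the header
            index[member.name] = (first, first + member.size - 1)
    return HEADER + tar_bytes, index


def range_header(index: dict[str, tuple[int, int]], name: str) -> str:
    """Range header value a loader could send to fetch one asset."""
    first, last = index[name]
    return f"bytes={first}-{last}"


def serve_range(whole_file: bytes, header_value: str) -> bytes:
    """Toy stand-in for a static server answering one closed byte range (HTTP 206)."""
    unit, _, spec = header_value.partition("=")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    first, last = (int(part) for part in spec.split("-"))
    return whole_file[first:last + 1]


# Usage: build a two-file archive, then fetch only the large asset by range.
assets = {
    "index.html": b"<html><body><img src='big.png'></body></html>",
    "big.png": b"\x89PNG" + bytes(5000),
}
gwtar_file, offsets = build_gwtar(assets)
request = range_header(offsets, "big.png")
assert serve_range(gwtar_file, request) == assets["big.png"]
print(request)
```

Keeping the tarball uncompressed is what makes direct addressing work in this sketch: any asset, however large, is fetched with one ordinary range request, and nothing else in the file is transferred. How the real header learns these offsets is not covered in this section; here the index is simply returned to the caller.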
And because a Gwtar is self-contained, it is forwards-compatible: no future user or host of a Gwtar file needs to treat it specially, as all the functionality it requires is old, standardized web browser and server functionality. Gwtar allows us to easily and reliably archive even the largest HTML pages, while keeping them user-friendly to read.

Example pages: “The Secret of Psalm 46” (vs. the original SingleFile archive; warning: 286MB download).

Background

Linkrot is one of the biggest challenges for long-term websites. Gwern.net makes heavy use of web page archiving to solve this; and due to quality problems and long-term reliability concerns, simply linking to the Internet Archive is not enough, so I try to create & host my own web page archives of everything I link.

There are 3 major properties we would like of an HTML archive format, beyond the basics of actually capturing a page in the first place: it should not depend in any way on the original web page, because then it is not an archive and wil