Web Bundles, more formally known as Bundled HTTP Exchanges, are part of the Web Packaging proposal.

This is an interesting change to the web platform, based around the idea of packaging websites into a self-contained binary file. Much has been said already about the specifications and potential use cases, so this post focuses on my early experience hacking on the feature.

Bundling Hugo Sites

Hugo is one of the most popular static site generators. I use it to compile various markdown posts into the HTML you are reading right now.

Web bundling feels like quite a natural fit for static site toolchains. Rather than ending up with a directory structure, the hugo binary might optionally produce a single site.wbn file. Web servers could be taught to serve content from within this file, and the file itself could be exchanged over the wire or saved for offline use.

Using the glue of the Unix file system and bash, it’s quite easy to get a bundling toolchain up and running. Starting with a Hugo site and gen-bundle, just run the command below to get a web bundle containing every resource Hugo writes to the public directory.

$ hugo -s example && gen-bundle -dir example/public -baseURL https://example.com/ -o example.wbn -primaryURL https://example.com/

It’s worth noting that both hugo and the reference web bundle implementation in gen-bundle are written in Go, which makes extending Hugo with an “output to web bundle” option a natural next step. For now, Unix glue works quite well.
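If you want to reproduce this, gen-bundle installs with the standard Go toolchain. The module path below is taken from the layout of the WICG/webpackage repository at the time of writing, so treat it as a snapshot rather than gospel (hugo itself ships through most package managers):

$ go get -u github.com/WICG/webpackage/go/bundle/cmd/...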

Size Comparison

  • output.wbn -> 382 KB
  • output.zip -> 307 KB

The numbers above come from bundling Hugo’s starter template into output.wbn. With a naive Nginx setup, Chrome will download the web bundle as a file, which you can then drag into the browser window for offline viewing. Perhaps if I play around with the headers I can get Chrome to render it directly.
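For reference, here is a minimal sketch of that header experiment. The content type and nosniff header are assumptions lifted from the Web Bundles explainer rather than something I have verified against Chrome, which at the time of writing hides bundle rendering behind an experimental flag:

# Hypothetical Nginx location block for serving .wbn files directly.
# Both headers are assumptions from the Web Bundles explainer, not
# verified Chrome behaviour.
location ~ \.wbn$ {
    default_type application/webbundle;
    add_header X-Content-Type-Options nosniff;
}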

Scraping Sites into Web Bundles

I also see merit in using web bundles for archival and offline viewing of content on the internet. Given a sitemap or another crawling strategy, it is remarkably easy to record the contents of a site into a web bundle and then distribute it.
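As a sketch of that idea, the same gen-bundle tool from earlier can bundle a mirrored copy of a site. Here wget does the crawling; example.com and the archive directory are placeholders, and the gen-bundle flags mirror the earlier command:

$ wget --mirror --page-requisites --directory-prefix=archive https://example.com/
$ gen-bundle -dir archive/example.com -baseURL https://example.com/ -primaryURL https://example.com/ -o archive.wbn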

Of course, web scraping is not a new concept, and search engines are built on indexing this kind of content. What stands out to me here is the browser-level support for rendering this content.

Parting Thoughts

Props to the Chrome engineers working on this exciting new change to the web platform.

Questions

  1. How do we reconcile caching of downloaded subresources? For a 200 MB bundled website, how can a publisher push updates to a subset of subresources without forcing a refetch of the entire bundle? Are cache digests a viable option here?