Modern HTML to PDF Conversion

After many years of working in traditional design programs like Illustrator, I finally made the jump to designing and writing my resume using web technologies. As interesting as it may be to me, a zip file containing HTML, CSS and images is not a deliverable format. PDF is king, and thus we have to look at converting our webpages to PDF documents.

Requirements

My requirements for the conversion tool are as follows:

  1. A command line interface. I need to be able to automate the conversion using something like a Makefile.
  2. Supports modern CSS standards without vendor-specific prefixes or workarounds. I want my CSS to be lean and clean to keep iteration fast and easy. Features like CSS variables and the calc() function are great ways to enforce style consistency and are a must for me. I also want to get as close to the WYSIWYG model as I can: the resulting PDF should look the same as the source webpage in my browser.
  3. Uses 72 PPI. This requirement is a bit obscure, but it comes from the fact that I was recreating an existing design made in Illustrator at 72 PPI. I wanted to keep values consistent, so it was important that the Illustrator pt unit and the CSS pt unit matched. This will likely not be a high priority if you are creating something from scratch. However, a conversion tool that lets you configure the scaling may come in handy if you realize you want a slight adjustment after the fact.
  4. Free and open-source.
  5. Hyperlinks are maintained.
  6. Fonts are embedded.
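
To make requirement 3 concrete: a point is 1/72 of an inch, so how many pixels a pt value covers depends on the PPI the renderer assumes. The sketch below is only illustrative arithmetic; pt_to_px is a hypothetical helper name, not part of any tool discussed here.

```shell
#!/bin/sh
# pt_to_px is a hypothetical helper (not from any tool in this post).
# A point is 1/72 inch, so pixels = points * PPI / 72.
# $1: length in points, $2: pixels per inch
pt_to_px() {
    awk -v pt="$1" -v ppi="$2" 'BEGIN { printf "%g\n", pt * ppi / 72 }'
}

pt_to_px 12 72   # prints 12: at 72 PPI, pt and px coincide (the Illustrator setup)
pt_to_px 12 96   # prints 16: at 96 PPI, the same 12pt spans 16 px
```

The ratio between the two is 72/96 = 0.75, which is the kind of scale factor a configurable zoom would need to cover.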

Below are the options I considered, what I learned about them, and their pros and cons. If you don’t want to wade through the options, I ended up going with headless Chromium as it suited my needs the best. This is also in no way a comprehensive list; paid options such as Prince and PDFCrowd’s HTML to PDF API were not considered.

wkhtmltopdf

+ Command line interface
+ Free and open-source
~ Zoom (and therefore PPI) supported with patched Qt
~ Print options (page size, margins, colors, etc.) set via CLI flags
- Poor support for modern CSS

One of the more popular tools, wkhtmltopdf wraps Qt and Qt’s WebKit functionality to render webpages. The full history of the tool can be read here, but TL;DR: at the time of writing, the most up-to-date stable release is v0.12.6, which is based on a patched version of Qt 4. This release uses a very old WebKit version, and thus any somewhat-modern features (such as the flexbox and grid layouts, calc(), etc.) are unsupported. While a myriad of workarounds can be found in the 1000+ GitHub issues for the project, workarounds were something I didn’t want to deal with (requirement #2).

There are two other options in the wkhtmltopdf realm: a distro-provided version and a v0.13 release.

With respect to the former, most package managers do not include the patched Qt 4 along with wkhtmltopdf, but instead just include some (unpatched) version of Qt as a dependency. In Arch’s case, Qt 5 components are used for the wkhtmltopdf package; if you want the patched Qt 4 to be included, you need to use the wkhtmltopdf-static package. The unpatched Qt 5 brings support for more modern CSS features, but does not allow for certain options, like the --disable-smart-shrinking flag that must be used for the --zoom <float> flag to take effect. That alone ruled it out for me, since without zoom I could not get from the default 96 PPI to the 72 PPI I needed, but there were also still some unsupported features. For me, it was the list-style-type property that didn’t seem to want to work.
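
For reference, the flag combination described above would look roughly like this on a build with the patched Qt (for example wkhtmltopdf-static). This is only a sketch: wk_to_pdf is a hypothetical wrapper, the file names are placeholders, and the 0.75 zoom is an assumption (72/96), not a value I can vouch for.

```shell
#!/bin/sh
# Assumption: 0.75 (72/96) as the zoom factor; adjust to taste.
ZOOM="${ZOOM:-0.75}"

# wk_to_pdf is a hypothetical wrapper. $1: input HTML, $2: output PDF.
# With DRY_RUN set it only prints the command instead of running it.
wk_to_pdf() {
    set -- wkhtmltopdf --disable-smart-shrinking --zoom "$ZOOM" "$1" "$2"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

DRY_RUN=1   # remove this line to actually run the conversion
wk_to_pdf resume.html resume.pdf
```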

The latter, the v0.13 release of wkhtmltopdf, uses a patched Qt 5. Hypothetically, this should be the best of both worlds (improved CSS support with all options available), but after 5 years of development, this release is still in alpha. As I already knew Qt 5 couldn’t support everything I wanted anyways (at time of writing), I didn’t look too deep into this, though it could be a viable option in the future.

WeasyPrint

+ Command line interface and Python library
+ Free and open-source
+ Print options set via CSS @page rules
- Spotty support for modern CSS
- Limited options from command line, zoom only set via Python library

WeasyPrint is another interesting project that recently got a renewed development effort now that a new group has taken over. It is leaps and bounds ahead of traditional wkhtmltopdf with respect to supporting modern CSS, though it unfortunately still has quite a ways to go. A number of notable omissions, like calc() support, and some outstanding bugs, like CSS variables not working in multi-value properties, kept me from making this my go-to, but hopefully the frequent repo activity will eventually sort this out. A list of all supported features can be seen here.

Headless Chromium

+ Excellent modern CSS support
+ Command line interface, further control via DevTools protocol
+ Free and open-source
+ Print options set via CSS @page rules or DevTools protocol
- Limited options from command line, zoom only set via DevTools protocol

Headless Chromium is definitely an option I should have looked into sooner. Instead of trying to find a tool that matches the CSS support of current browsers, we can just use the browsers themselves. Puppeteer, a headless Chrome/Chromium automation tool, is often suggested as a wkhtmltopdf alternative, but I was hesitant to spring for a full Node.js solution if something simpler was available. Thankfully, after poking around the docs, I learned that a handful of utility command line flags were also implemented with the release of headless Chrome, including a --print-to-pdf flag. I tried out the same syntax on Chromium (not Chrome), and sure enough, it works.
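
A minimal sketch of that invocation, assuming the binary is on the PATH as chromium (some distros name it differently) and using placeholder file names. Only --headless and --print-to-pdf come from the docs mentioned above; the html_to_pdf wrapper and the CHROMIUM_BIN variable are hypothetical.

```shell
#!/bin/sh
# Assumption: the binary name; override CHROMIUM_BIN if yours differs.
CHROMIUM_BIN="${CHROMIUM_BIN:-chromium}"

# html_to_pdf is a hypothetical wrapper. $1: input HTML, $2: output PDF.
# With DRY_RUN set it only prints the command instead of running it.
html_to_pdf() {
    set -- "$CHROMIUM_BIN" --headless "--print-to-pdf=$2" "$1"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

DRY_RUN=1   # remove this line to actually run Chromium
html_to_pdf resume.html resume.pdf
```

If a relative input path gives you trouble, passing a full file:// URL for the page is worth trying.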

It’s worth noting that we do lose out on the options available through the DevTools protocol (and thus the options available through any automation tool that interfaces with the DevTools protocol such as Puppeteer) when using the command line, but most of the parameters changed by these options can be configured in CSS anyways. For example, a letter-sized output with zero margin can be achieved with the following CSS rule:

@page {
    size: Letter;
    margin: 0;
}

The page size property works just fine despite the docs saying that the preferCSSPageSize option defaults to false. An option we can’t tweak from the command line is the scale, but thankfully for me, the default scale is equivalent to 72 PPI.

This is the solution I ended up going with. One hint I would keep in mind is to use the --virtual-time-budget flag to allow your page to load any external fonts or images before the conversion takes place; a value of 10000 seems to work fine (--virtual-time-budget=10000). As you may have noticed, I didn’t look much into the hyperlink preservation and font embedding of the other options (mostly because I eliminated them early anyways), but I can confirm that both work well here.
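
Putting the time budget together with the print flag, a script along these lines could be called from a Makefile rule. Treat it as a sketch under assumptions: render_pdf and CHROMIUM_BIN are hypothetical names, the file names are placeholders, and the number check is just a convenience.

```shell
#!/bin/sh
# Assumption: the binary name; override CHROMIUM_BIN if yours differs.
CHROMIUM_BIN="${CHROMIUM_BIN:-chromium}"

# render_pdf is a hypothetical wrapper.
# $1: input HTML, $2: output PDF, $3: optional time budget (default 10000).
render_pdf() {
    budget="${3:-10000}"
    # Reject anything that is not a plain whole number.
    case "$budget" in
        ''|*[!0-9]*) echo "budget must be a whole number" >&2; return 2 ;;
    esac
    set -- "$CHROMIUM_BIN" --headless "--virtual-time-budget=$budget" \
        "--print-to-pdf=$2" "$1"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"    # dry run: show the command only
    else
        "$@"
    fi
}

DRY_RUN=1   # remove this line to actually run Chromium
render_pdf resume.html resume.pdf
```

Raising the third argument is the first thing to try if web fonts or images are missing from the output.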


© 2024. All rights reserved.