By Markus Begiebing

The brief for one of our recent projects requested the dynamic generation of a PDF containing the entire product catalog for one of our customers. This catalog should then be re-generated in regular intervals to reflect changing product details, pricing, etc.

With PDF having been a proprietary format for most of its life, there is no open standard that allows for straightforward, code-based creation of PDFs.

In the past, we relied on tried and tested, yet quite basic, tools for such tasks, such as FPDF or its bigger brother TCPDF. More recently, however, the designs for these PDFs have become more detailed and complex. With the old-fashioned approach, implementation would be very time-consuming, and layout changes would take considerably longer.

Dynamically generated PDF

We therefore opted for a new approach to the dynamic generation of PDF documents, using a more modern group of parsers that do most of the heavy lifting for you. Now, instead of having to position dozens of elements manually in a PDF, we could build a far less time-consuming HTML page and pass the output through a third-party library to end up with a very usable PDF.

We looked at several of these libraries, such as mPDF and DomPDF, but ultimately chose WKHtmlToPDF, which produced very impressive results. WKHtmlToPDF uses the WebKit rendering engine (also used in Apple's Safari browser) to render its output.
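To illustrate the HTML-to-PDF step, here is a minimal sketch in Python that shells out to the wkhtmltopdf command-line tool. It is not our production code: the function names and file names are hypothetical, the A4 page size is an assumption, and it presumes the `wkhtmltopdf` binary is installed and on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(html_path: Path, pdf_path: Path,
                  binary: str = "wkhtmltopdf") -> List[str]:
    """Build the argument list for one wkhtmltopdf run.

    wkhtmltopdf takes its options first, then the input, then the output.
    The page size chosen here is an assumption for illustration.
    """
    return [
        binary,
        "--quiet",              # suppress progress output
        "--page-size", "A4",    # assumed paper size
        str(html_path),
        str(pdf_path),
    ]


def render_page(html_path: Path, pdf_path: Path) -> None:
    """Render one HTML file to one PDF file.

    Raises subprocess.CalledProcessError if wkhtmltopdf exits with an error.
    """
    subprocess.run(build_command(html_path, pdf_path), check=True)
```

Calling `render_page(Path("page-001.html"), Path("page-001.pdf"))` would then produce a single PDF from a single HTML page. Keeping the command construction separate from the call makes it easy to inspect or log exactly what is being run.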

We were now able to rapidly produce PDFs with complex layouts and a large amount of detail, including internal and external links, custom fonts and other HTML-related goodness.

The next hurdle we encountered was memory. Since we were tasked with producing entire catalogs of products (roughly 100 pages) and not just a handful of pages, the process quickly consumed a lot of memory.

To avoid this situation, we opted to split the entire process into two parts.
The first part was the generation and storage of each page of the PDF separately. This also allowed us to decide, for each page, whether it really needed to be regenerated or whether an identical PDF version already existed.
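One way such a per-page check could work is to store a hash of the HTML next to each generated PDF and compare it on the next run. The sketch below assumes exactly that; the function names, the `.sha256` sidecar file and the choice of SHA-256 are all illustrative assumptions rather than a description of our actual implementation.

```python
import hashlib
from pathlib import Path


def content_hash(html: str) -> str:
    """Return a SHA-256 hex digest of the page's HTML source."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def needs_regeneration(html: str, pdf_path: Path) -> bool:
    """Return True if the stored PDF is missing or was built from different HTML."""
    hash_path = pdf_path.with_suffix(".sha256")  # hypothetical sidecar file
    if not pdf_path.exists() or not hash_path.exists():
        return True
    return hash_path.read_text().strip() != content_hash(html)


def record_generation(html: str, pdf_path: Path) -> None:
    """Store the hash of the HTML that the PDF at pdf_path was built from."""
    pdf_path.with_suffix(".sha256").write_text(content_hash(html))
```

A page would be rendered only when `needs_regeneration` returns True, followed by a call to `record_generation`. Hashing the HTML rather than the resulting PDF avoids rendering a page just to find out that nothing has changed.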

Then, a second process would pick up all the pages and assemble the final catalog, one page at a time, adding a page number to each page as it went. This approach kept memory consumption very low and used resources efficiently, since dynamic PDFs were only regenerated when their respective details had changed.

We conclude that the new tools are a major step forward, making the process much easier for the developer. Until an open standard for the creation of PDF documents appears on the horizon, these tools enable us to create PDFs the likes of which would have caused us to pull our hair out just a few years ago.
