Are PDF Documents a Thing of the Past?

Many articles have predicted the death of the PDF format, invented in 1993. Some of them are 10 years old: you can find them by googling “death of PDF”. With the advent of fluid or liquid layout design on almost every website, widespread Internet browsing on small devices (cell phones), and notebooks combining text, illustrations, and source code, one would think that PDF is doomed. For instance, these days, any modern webpage automatically resizes and rearranges its elements to fit your device. PDF documents, by contrast, are not easy to read or navigate on a cell phone, even though zooming in or out is easy. It gets much better if you switch from Portrait to Landscape mode by rotating your cell phone 90 degrees. Yet the fundamental problem remains: PDF is a static, or “fixed layout”, format.

To make things worse, many PDF documents written 10 years ago look ugly and lack any navigation structure. They still show up en masse when searching for technical documents on Google, which gives PDF a bad reputation. On top of that, Adobe recently stopped supporting Flash. Also, it is almost impossible to publish a high-quality PDF tech document on Kindle or similar devices: the rendering is terrible, as these devices are designed for linear reading of mostly standard text. Whether your document is native PDF or PDF converted to AZW or Mobi, the result is bad.

Who Still Produces or Reads PDFs These Days?

Ironically, tech people are probably the largest consumers and producers of PDF documents. Scientific books, especially self-published ones, are usually in PDF format before being printed. The reason is precisely what most people consider to be PDF’s weak point: its fixed-format layout. For the same reason, PDF is also used for legal purposes.

When you have many large tables or figures, complex mathematical formulas, or a non-linear document, PDF is still the best format. By “non-linear”, I mean a document that you read in “random” order. Authors writing in LaTeX have essentially one option for output: PDF. While PostScript was once the favorite format, my standard LaTeX editor (from MiKTeX) now outputs only PDF. Of course, you can then convert from PDF to any other format, but at the cost of a loss in quality.

So, I believe that PDF is here to stay. At least, the feature that was supposed to lead to its death is actually the one that is here to stay. I think the future of PDF is comparable not to that of fax machines, but to that of Word, GIF images, or fake news: something that will survive the test of time.

Modern PDF

A very popular book entitled “Probabilistic Machine Learning”, published in March 2022 (see here), is entirely designed in PDF. The format has nothing to do with old-style PDF: it is designed to be read just like browsing the Web. My new book, also published in 2022 (see here) and very similar in format, goes one step further. It incorporates state-of-the-art navigation capabilities found nowhere else. These features are easy to produce in PDF. I summarize them in the example below; see the PDF extract in Figure 1 for an illustration. These documents are easy to read and navigate with Chrome, Safari, Edge, and other Internet browsers. You don’t need Adobe anymore. Both books were originally written in LaTeX.

Figure 1: Modern PDF with numerous navigation features

In Figure 1, text in red marks internal clickable links. Each one points to a bibliography entry, glossary term, section, formula, exercise, table, figure, or theorem. Text in orange marks indexed keywords. The bibliography, index, and glossary sections have backlinks to where the references appear in the document. The table of contents consists of clickable links pointing to the target sections.
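As a rough illustration, here is a minimal LaTeX sketch of how navigation features like these can be obtained. It assumes the commonly used hyperref, makeidx, and xcolor packages; the actual setup behind the books mentioned above is not shown in this article and may differ. The chapter and section titles, the labels, and the bibliography entry are hypothetical placeholders, and the glossary is left out for brevity.

```latex
\documentclass[10pt]{book}
\usepackage{xcolor}            % color names such as red, orange, blue
\usepackage{makeidx}           % index support
\makeindex
% Assumed setup: hyperref turns cross-references, citations, and
% table-of-contents entries into clickable links.
% backref=page adds backlinks from each bibliography entry to the
% pages where it is cited.
\usepackage[colorlinks=true,
            linkcolor=red,     % internal links (sections, formulas, ...)
            citecolor=red,     % links to bibliography entries
            urlcolor=blue,     % external links
            backref=page]{hyperref}

\begin{document}
\tableofcontents               % entries are clickable

\chapter{Example Chapter}\label{ch:example}
See Section~\ref{sec:target} and reference~\cite{placeholder}.

\section{Target Section}\label{sec:target}
Here is an \textcolor{orange}{indexed keyword}\index{indexed keyword}.

\begin{thebibliography}{9}
% Placeholder entry, for illustration only.
\bibitem{placeholder} Placeholder author, \emph{Placeholder title}.

\end{thebibliography}

\printindex                    % page numbers link back into the text
\end{document}
```

Compiling this takes a few passes (for instance pdflatex, then makeindex, then pdflatex again) so that the table of contents, index, and backlinks are resolved.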

Text in blue marks external clickable links: for instance, an online reference, or one of the items in my GitHub repository, such as source code, a video, or an Excel spreadsheet. Finally, it is easy to offer two versions of the PDF document: one with a 10-point font size (my favorite), and one with 12 points (larger and easier to read, especially for people with vision problems). In LaTeX, this is accomplished with one minor change to the source file.
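To give an idea of what that one minor change can look like, here is a small sketch, assuming the standard book class and the hyperref package. The URL is a placeholder, not the actual repository address.

```latex
% Change 10pt to 12pt on the next line to build the larger-font version.
\documentclass[10pt]{book}
\usepackage[colorlinks=true, urlcolor=blue]{hyperref}

\begin{document}
% External clickable link, shown in blue (placeholder URL).
The source code is available on
\href{https://example.com/repository}{GitHub}.
\end{document}
```

The font size is a class option, so the rest of the source file stays untouched, although page breaks and page numbers will differ between the two versions.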

Conclusion

PDF, or at least its supposed weakness (the fixed-layout format), is here to stay. There are no good alternatives for rendering complex technical documents. It may become a format no longer used by the public at large, but it will certainly remain in use among tech people, especially those generating or using mathematical content. While the PDF documents found on Google Search are typically old and ugly, the recent ones, like recent PhD theses or new books, are artfully designed, colorful, and easy to navigate. They represent the future of high-quality, professional tech publishing.

HTML may catch up one day: it is possible to produce the same rendering in HTML, but to this day, it is tricky. If you know of math-dense webpages featuring the same quality and capabilities (index, backlinks, and so on), please let me know! In the end, HTML has more potential: you can more easily add videos and sound to your document. But I have yet to see a serious math website with the layout quality of large, math-heavy PDF eBooks. Organized, high-level HTML pages incorporating many small PDF documents could be the way to go.

About the Author

Vincent Granville is a machine learning scientist, author and publisher. He was the co-founder of Data Science Central (acquired by TechTarget) and most recently, founder of MLtechniques.com.