Documentation Debt

It was a good question, and a lot of people jumped in to answer it. I’ve given a talk on avoiding product design debt — That’s what #7fights is. But I haven’t given one explicitly about avoiding debt in documentation.

To understand the problem of avoiding debt, you have to understand the economy you’re working in. Many of the companies I’ve worked with over the years identify documentation as something they need to do to reduce support costs and manage the problem of training internal people. It’s not exactly glamorous. Some companies do value and invest in docs, but it’s a minority. That means that documentation is constantly underfunded, and depending on how much discipline people have about retiring old software products, it means that writers are supporting an increasingly large corpus of text. I’ve worked on projects that were literally thousands of pages long.

The debt comes down to three parts:

  • Format
  • Legacy
  • Unknown unknowns

Format

Format is whatever your tool stack is, and how the documentation is packaged and presented. If, many years ago, someone decided you were a FrameMaker shop, well, you’re probably still a FrameMaker shop, albeit one that can now output “FrameML” or DITAFrame. But the soul of the documentation, the way it is stored is in a proprietary file format that only the priestly class of technical writers can access with relatively expensive tools.

The only way to move something out of a format like that is to strip it to plain text and then re-code all the meta-text like headings and pictures. We call it a conversion project, but it’s just as much fun as changing any other codebase on the fly while trying to still deliver things. Theoretically, there are conversion tools and companies that make that less painful, but I have never seen one work properly, and I’ve been through at least 7 conversion projects. Wailing, gnashing of teeth, and a permanent repetitive stress injury were the usual style of results.

Some parts of the industry, notably Read the Docs, Corilla, Gnome/Mallard, and other open-source-y people have been trying to treat this problem by using extremely flat files, in Sphinx or Markdown. These are essentially text files with very minimal markup that can then be parsed by a variety of different engines. It’s a good theory, but it moves the locus of control from the software knowing how to wrap text and manage hyperlinks to the writer needing to know how to do so using a constrained palette of commands. Thus, the markdown technical writer needs to know their product, and the basics of technical communication, and ALSO a syntax, instead of an interface. Not really ONE syntax, but several, with nuances and subtleties between them. The writer must be a coder.

I hang out with a lot of writer/coders, but I also keep in touch with the rest of the industry, and not everyone wants to do or be that person. They want to put more of their career development into writing and understanding their product, less into the means of production.

The debt here is the very nature of everything that is around your text. The storage, the format, the numbering, the images (oh, IMAGES, the worst kind of debt), they all contribute to what you will have to pay down.

Legacy

Aside from the problem of legacy formats, you have the problem of legacy information. In 5 years, almost all of what I am writing now will be wrong or outdated. That’s just the nature of things. But if we continue to support the software, then we need to continue to support the documentation that makes the software work. This means that in an older company, you may be maintaining documentation for products that are fifteen years old. And it is maintenance. You have to update them to have the new format, new branding, whatever you change for the new documents has to be projected back to the old documents. If you move from FrameMaker to MadCap Flare, you have a conversion problem again. And it’s not like you can just ignore it and republish old PDFs – there WILL be a reason you have to get into that document to fix something.

I really recommend making your company and product name a variable that you can easily and universally replace. Just trust me on that. It will happen.

So the older your company is, the more documentation debt you have, just like with a codebase that is several layers of abstraction balanced on an OS/390 system kept in the basement.

Unknown Unknowns

The worst kind of debt is the kind that you can’t accurately predict, but when I thought about just leaving my list two items long, I realized that this third type is the kind that has gotten me in the most trouble over my career. I can predict how long it will take to convert x-many pages from one format to another, and what percentage of my time we will have to spend updating dead documents with new logos. What I can’t predict are the documentation debts that come out of left field. The merger that brings you a new set of docs and tooling and writers to integrate. The drop-everything crisis response to a serious security breach. The training that no one realized needed to be written. Those things are almost impossible to reject, and yet they can scramble a tidy documentation roadmap all to hell.

Mitigating Documentation debt

Here are my suggestions for reducing your documentation debt as much as possible. None of them will completely rescue you from the problem, but they are all things I’ve either implemented or tried to talk people into for debt reduction.

Portability

Whatever tool you pick, make sure that it exports your intellectual property in a usable format — you don’t want PDFs, you want text, or XML, or in a pinch, HTML. Have you ever had that frustrating moment when someone sends you a Photoshop file and not a PNG and you can’t open it because you don’t have freakin’ expensive Photoshop?

Yeah, don’t end up feeling like that about your documentation. You must always be able to extract something that could be reduced to text by means of a simple script. Your software can store it in a weird relational database for all you care, as long as there is an exit scenario.

Referencing

Rigidly use universal resources for common items. Your logo? There should only be one file location for that, and the software should retrieve it every time it builds the document. That way you don’t have to change it 4 places per document (my life this week), but rather change the file in one place, keep the name the same, and all your branding is refreshed, poof!

The same goes for templates, fonts, product names, and anything else that is going to be true across multiple documents or a large document.

If you’ve chosen software that is not sophisticated enough to handle universal variables and referenced images, do what you can to imitate them. Code it yourself or get some help, but never have local images of your logo scattered throughout your files. That’s just like burning money.

Lifespans

Retire documents. The same way we say, “Hey, IE 7 still technically runs and we know some of you are locked in by third-party software, but you’re out of support”, we need to give our products and our documents a lifespan. “Hey, this document is 8 years old. We are no longer updating it, but you can download the PDF and good luck to you, traveller from the past!” Probably you shouldn’t be that snarky, though. No one using super old software is happy about it, they just have constraints that we don’t in shiny new-MacbookPro-land.*

Images

Images are the most heinous form of debt. Schematics and conceptual drawings are pretty much fine, but screenshots are a personal nightmare. They’re wrong almost instantly, expensive to translate, prone to rebranding, frequently unreadable, and everyone likes to put them in because they find it reassuring to have pictures.

My suggestion is to use as few screenshots as humanly possible, and to give them names that correspond with the page name, NOT the step they appear in. If you’re lucky you’ll be able to re-use them, but numbering with any kind of sequencing is an act of madness, because there will always be insertions and deletions.

But mostly, don’t use screenshots.

If you have to, I’m excited about this idea for automating capture with Python:

Automation

Automate as much as you can. Automate production, using the same build tools as the coders. Automate screen captures if you can. If there’s anything that can be scripted or regularized, do it. This applies to the writing itself. If you can create a template that helps people give you more usable input, it’s worth a couple days of your effort. For example, you could create templates for bug reports, for user-visible changes for release notes, for change requests, for steps.

Never accept a set of typed commands as instructions if you can distribute a script instead. Fingers are fallible.

If you’re looking for inspiration on how to automate documentation and get to the part where you’re only writing the interesting juicy bits, look to API documentation, especially OpenAPI and Readme.io. They are automating away all the tedious parts of the API reference doc creation and giving us more time to write about WHY someone would want to use this call over that one. THat is the ideal of automation – not doing less work, but doing less useless work.


Very few “professional-grade” writing toolsuites work on Macs. Your writer is not any happier about this than you are, but it’s the way it works. That said, if anyone wants to torture me by getting me a Macbook Air that won’t run my software, feel free.


I hope these tips for reducing your documentation debt help you make some good arguments in your organization. I’ll keep thinking on the topic and maybe someday it will be a talk.