I wrote a tweet thread and I want to capture it in a more permanent form and expand a bit.
I am a technology person, writing about the technology industry. I won’t pretend to have insight into the people who actually, truly have to show up to work in person and serve the general public. I will say that I am pretty sure everything that is bad for us is much much worse for them.
In the technology space, there is a real ethical discussion to be had about whether we should be having in-person events (such as conferences, customer advisory board sessions, and so on), how we should change in-person events to be more inclusive, and why some people can’t be trusted to judge whether they should attend a given event. This post is not that discussion. This is a discussion about the practical implications of in-person events, now that I’ve been to a handful in the post-vaccination era of 2022.
We have collectively decided that there is a degree of sick that is antisocial, and we won’t tolerate it in public. It is a different degree of sick than we tolerated before. Before the pandemic, I worked through a lot of colds. Now if I had those same symptoms, I would call the conference organizer and explain that I was not leaving my hotel room, not even to give a keynote talk.
Today, when someone is sick, they’re often sicker, or sick for longer than we are used to. I think all of us can think of a time when someone took a day or two off because they had the flu, and then they maybe worked from home the rest of the week. But even vaccinated people, even tough-it-out people, are getting knocked out for a week or more, and they are out. Only smart enough to watch episodes of “How It’s Made” and take naps. And, as ethical teammates, we should encourage them to actually take that time to sleep and recover, because it means they have a better chance of recovering well.
So we have a highly-contagious illness with very rapid onset and variable severity. We don’t have a good idea about the long-term effects, but it’s not looking like “mild” infection is without risk. Even conferences retaining masking are still petri dishes of yikes, because we all have to eat and very few conferences are doing at-the-door, every-morning testing. And even with that, we know the rapid tests can miss a lot of early cases. So we don’t know when we’ll be exposed, or when any of us will get sick.
What that means
We have to change how we think about staffing and add in a lot of expensive redundancy. I thought about this originally in the context of in-person events, but it’s honestly true for every part of work and life. We have to change how we do planning and team coordination—not because we’re remote, but because we’re understaffed.
Examples from real life:
- Planned two technical resources for a medium-sized conference; one person ended up sick the day before. The other ended up working the booth solo and was exhausted.
- Pulled a booth staffer out of a rest period to cover the absence of the scheduled staffer.
- Point of contact for an external project went out on extended sick time and didn’t realize how long it would be. Several deadlines were missed because the rest of the team didn’t know about them.
- 80+ person meeting was canceled because the key stakeholder was out.
- On-call team rotations not working as expected because of absences.
None of those are catastrophic on their own. We do all pitch in to help when someone is sick and goes out unexpectedly for reasons. The problem is that we do not have enough slack capacity to “pitch in” for weeks on end, for multiple absences. We are all exhausted and running close to the limits of capacity and burnout. Every time we have to perform heroics, it burns a different stamina bar than our regular work, which we had budgeted for.
And even worse, to start improving the problem right now, we have to lean in really hard on documentation and hiring, both of which feel like overhead at a time when we are already stretched thin. There is an inevitable lag between when you realize you’re understaffed and when you can get someone in place to help with the problem. Documenting our process and decisions takes time that we feel like we may not have.
What do we do?
I think there are a few things that we can do to structure our organizations to be more resilient. They all take work, money, and time, which are hard to get. But fighting to do this work up front is going to save a lot in burnout replacement and lost opportunities.
Staff more: You won’t be able to do this perfectly; sometimes you’ll end up having paid for too many people. But each of those people will now have more unallocated time to work on the next parts of the plan. Or it will turn out to be the right number of people after all.
Define a succession plan: If someone goes out, who do their duties flow down to? Who is their deputy? In military environments, this is well-defined because there is an expectation people would have to take on new duties at a moment’s notice. It’s less clear in most of the rest of industry, especially the startup world. Sometimes there’s only one person at a company who does a task. Sometimes a leader reaches across team boundaries so it seems impossible to have a person who knows everything they do. But it is possible to have a deputy, or to have official delegates for parts of a job. We just have to make the effort.
Define a Service Level Objective and a Service Level Agreement: These are terms that come out of the Site Reliability world, but they are useful for all sorts of organizations. What do you plan to achieve? That is your objective. What is the amount that you can promise to achieve? That is your agreement, a contract with the parts of the organization that depend on you. The gap in between works like an error budget, where you can have outages, or agree to drop nice-to-haves. (Strictly speaking, SRE teams measure the error budget against the objective itself, but the idea of planned slack is the same.) Once you can see what is essential and what is optimal, you can decide on how much optimal you’re willing to pay for, in money, people, and resources.
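To make the objective, agreement, and gap concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ServiceLevels` class, the booth-hours unit, and the numbers are invented, and the "slack" here follows this post's framing of the gap between plan and promise rather than the stricter SRE definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevels:
    """Hypothetical record of a team's plan and promise, in the same unit."""

    objective: float  # what the team plans to achieve
    agreement: float  # what the team promises to the people who depend on it

    def slack(self) -> float:
        """Gap between plan and promise: room for absences or dropped nice-to-haves."""
        return self.objective - self.agreement

    def status(self, delivered: float) -> str:
        """Classify what was actually delivered against the two levels."""
        if delivered >= self.objective:
            return "met objective"
        if delivered >= self.agreement:
            return "within slack"
        return "agreement missed"


# Invented example: staffed booth hours over a two-day conference.
booth = ServiceLevels(objective=16, agreement=12)
print(booth.slack())      # hours the team can lose before breaking its promise
print(booth.status(14))   # someone was out for two hours
print(booth.status(10))   # someone was out for most of a day
```

The useful part is less the code than the habit: writing both numbers down makes it visible how much absence a plan can absorb before someone has to perform heroics.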
Create and enforce a project template: You knew I’d get to documentation eventually. Create a template that captures the essentials of a project. This can fit neatly into most project-tracking software, so it doesn’t need to be a huge shift. Frequently it will include things like:
- Project owner and deputy
- Other stakeholders
- Key dates
- Project scope and description
- Resources needed
This doesn’t seem very groundbreaking to people who do project management. But I think we’re all doing project management, it’s just that some of us (me included) are kinda running it out of two pinned Slack posts, an open tab somewhere, and a handwritten list. And none of those resources does anyone any good if I am out for days or weeks.
The key is that the template results have to live in a public place, and everyone needs to be empowered to ask for them. So if your manager says, “Hey, can you do this sponsored talk we have already paid for?” you should be able to go see what the scope is, who else cares about getting this done, when it’s due, and all the other things that will make it easier to execute correctly.
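If your tracking tool doesn’t already have a form for this, the template can be as small as one structured record. The sketch below is one possible shape, not a standard: the `ProjectRecord` class, its field names, the `missing_fields` helper, and the sample values are all made up for illustration, mirroring the list above.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class ProjectRecord:
    """Hypothetical project template mirroring the list above."""

    owner: str = ""
    deputy: str = ""
    stakeholders: list[str] = field(default_factory=list)
    key_dates: dict[str, date] = field(default_factory=dict)
    scope: str = ""
    resources: list[str] = field(default_factory=list)


def missing_fields(record: ProjectRecord) -> list[str]:
    """Return the names of template fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Placeholder values, invented for the example.
talk = ProjectRecord(
    owner="Avery",
    stakeholders=["Events team"],
    key_dates={"talk": date(2022, 9, 14)},
    scope="Sponsored talk that has already been paid for",
)
print(missing_fields(talk))  # lists the fields nobody has filled in yet
```

A check like `missing_fields` is one way to enforce the template: a project with no deputy on record is exactly the one that stalls when its owner is out for a week.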
Prioritize business continuity: It’s very easy for us to have a quick chat with someone in DMs if we just have a question or two. The problem is that a DM is not accessible to your deputies. If it’s about work getting done, please keep it in the named channels so that other people can see what’s happening. This is advice relevant to the distributed team using chat clients, but it’s a way to look at a bigger issue.
If we think of the organization we work for as made up of our personal relationships, it mostly works. But personal relationships, while hugely valuable, are not transitive. Just because someone has the same role, it doesn’t mean that they know all the same things as the other people in that role. If you think about what the business needs to know, you can see how communication needs to happen in public spaces, so that the organization as a whole doesn’t have crucial messages that it can’t access.
Give grace: The world is chaotic, and that chaos does not seem likely to diminish. We can overwork ourselves and keep waiting for the world we used to have to come back, or we can change the way we work for the world we have now, in all its chaos and opportunity. But the thing we absolutely have control over is how kind we are to the people around us, how much we can work to accept and honor each other.