2020-04-05
Over the last 12 months, I have come to realize that maintaining a status page is a hard problem.
To be fair, it shouldn't be. However, things get complicated once you have an internal status page as well as an external one. To make it even trickier, the two pages face different audiences and cover different product lines.
The common problems I have seen include: mistakenly updating the external-facing page with internal-facing messages, failing to update the external page when it should have been updated, and sometimes not even realizing there is an external-facing status page.
At the root, all of them stem from communication breakdowns. However, this generic diagnosis won't solve any of the problems. Communication gaps, I believe, are very common in any large organization, and we don't necessarily have to eliminate them to solve the status page issue.
Automation is the way to go. Think about when you need to update a status page: it is always during an incident, which is a stressful situation. People are rushing to find the root cause, brainstorming possible solutions, and doing whatever they can to stop the bleeding. Updating the status page, while important, can easily be forgotten when everyone's hands are already full.
Take our company as an example. We have a process where, during an incident, a scribe is assigned the responsibility of updating the status pages (among a bunch of other things). The problem is that this person is often busy just catching up with the situation. Plus, they might not know the difference between the internal status page and the external one.
Ideally, you want a tool with which you can easily declare incidents and scope their impact on your systems. From there, updating the status pages should be automatic. The same goes for resolving incidents: manual intervention needs to be kept to a minimum.
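To make the idea a bit more concrete, here is a minimal sketch in Python of the routing such a tool might do. It assumes a hypothetical mapping from system components to the status pages that cover them, and an incident record that carries a separate internal note and public note. None of these names come from a real status page product, and actually publishing to a page is left out. The point is that once the incident is scoped, the tool (not a busy scribe) decides which pages get an update and which message each audience sees.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


# Hypothetical mapping: which status pages cover which component.
# Unknown components default to the internal page only, so nothing leaks out.
COMPONENT_PAGES = {
    "billing-api": {Audience.INTERNAL, Audience.EXTERNAL},
    "ci-runners": {Audience.INTERNAL},
    "public-website": {Audience.EXTERNAL},
}


@dataclass
class Incident:
    title: str              # assumed to be safe to show customers
    components: set[str]    # the scoped impact, chosen when declaring
    internal_note: str      # details for engineers; never published externally
    public_note: str        # customer-facing wording
    resolved: bool = False


@dataclass
class StatusUpdate:
    audience: Audience
    message: str


def plan_updates(incident: Incident) -> list[StatusUpdate]:
    """Work out which pages to update and with which message."""
    audiences: set[Audience] = set()
    for component in incident.components:
        audiences |= COMPONENT_PAGES.get(component, {Audience.INTERNAL})

    status = "Resolved" if incident.resolved else "Investigating"
    updates = []
    # Sort for a stable order ("external" sorts before "internal").
    for audience in sorted(audiences, key=lambda a: a.value):
        # Pick the note by audience so internal details stay internal.
        if audience is Audience.INTERNAL:
            note = incident.internal_note
        else:
            note = incident.public_note
        updates.append(StatusUpdate(audience, f"[{status}] {incident.title}: {note}"))
    return updates
```

Declaring an incident on `billing-api` would produce one update for each page, with the internal note only on the internal page. Resolving it is just setting `resolved = True` and calling `plan_updates` again, which posts `[Resolved]` to exactly the same pages. An incident scoped to `ci-runners` never touches the external page at all.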