My dad occasionally says “More plumbers, less firefighters”. He’s probably biased because he has a plumbing business (Trident Plumbers in Nairobi).
Wait, aren’t firefighters cool, though? So many Hollywood movies have thrilling firefighting scenes. Why would he say that?
Growing up, I had the opportunity to accompany my Dad to construction sites. Witnessing fire sprinklers being tested was quite a sight. They’d open the valves, check water pressure, and light a small fire under the sprinklers. After a minute or so, boom! Water sprinkles down in a fountain. I got a huge kick out of it.
Fire sprinklers are boring. Hollywood can’t make a movie out of them. They’ve probably saved a lot of lives though.
In my career journey I’ve noticed many orgs over-value firefighters (folks that patch bugs and fight incidents), and under-value plumbers (folks who build good infra and architecture such that those bugs and incidents never arise in the first place).
Human nature is that we don’t appreciate things that didn’t happen. Newspapers don’t have headlines “No one died from smallpox today”.
Related: a tweetstorm by @shreyas about the concept of the “Preventable Paradox”, and a Superman story.
What makes good plumbing?
Controlled flow - Things flow continuously from one place to another. The flow is monitored and its rate can be controlled, e.g. water flows through pipes at a certain pressure, gate valves control the flow, and pressure gauges monitor the pressure.
Feedback loop - A loop runs continuously and gives feedback on whether to increase or decrease the flow. In the fire sprinkler scenario, when it gets too hot the bulb bursts and lets the water out, which extinguishes the fire.
After reading the book Systems Thinking, this principle has solidified for me. Every stable system has the elements of good plumbing. Coding Horror has a good article on the pit of success. Good plumbing makes it so that users fall into the pit of success, instead of the pit of despair.
Refrigeration
Controlled flow - the fan and compressor limit how much cool air is released.
Feedback loop - Thermostat monitors temperature and gives start/stop signals to compressor.
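The thermostat loop can be sketched in a few lines of code. This is a toy simulation, not how any real fridge is programmed: the names (Fridge, thermostat_step, physics_step), the 4 °C target, the 1 °C band and the cooling/warming rates are all made-up assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fridge:
    temp_c: float
    compressor_on: bool = False


def thermostat_step(fridge, target_c=4.0, band_c=1.0):
    """Feedback loop: turn the compressor on/off based on measured temperature."""
    if fridge.temp_c > target_c + band_c:
        fridge.compressor_on = True       # too warm: start cooling
    elif fridge.temp_c < target_c - band_c:
        fridge.compressor_on = False      # cold enough: stop cooling
    # Inside the band we keep the current state, so the compressor
    # does not flap on and off around the target.


def physics_step(fridge, cool_rate=0.5, warm_rate=0.2):
    """Controlled flow: the compressor cools, the room slowly warms (assumed rates)."""
    fridge.temp_c += -cool_rate if fridge.compressor_on else warm_rate


def simulate(start_c, steps):
    fridge = Fridge(temp_c=start_c)
    history = []
    for _ in range(steps):
        thermostat_step(fridge)
        physics_step(fridge)
        history.append(round(fridge.temp_c, 2))
    return history
```

Starting from a warm fridge, simulate(10.0, 100) cools down and then hovers around the target without anyone having to intervene, which is the whole point of the loop.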
Cruise Control / L2 self-driving
Controlled flow - The amount of fuel fed to the engine is controlled by the throttle, which controls the speed.
Feedback loop - Radars monitor the distance to the car ahead and signal the throttle to give more or less.
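As a rough sketch, the radar-to-throttle feedback could look like a simple proportional controller. Real adaptive cruise control is far more involved; the function name, the 40 m target gap, the gain and the clamp limit below are all assumptions picked for illustration.

```python
def throttle_adjustment(gap_m, target_gap_m=40.0, gain=0.02, limit=1.0):
    """Return a throttle change in [-limit, limit] from the measured gap.

    Positive means "more throttle" (the gap is larger than we want),
    negative means "less throttle" (we are too close).
    """
    error = gap_m - target_gap_m
    # Clamp the signal so one bad radar reading cannot floor the throttle.
    return max(-limit, min(limit, gain * error))
```

Called on every radar reading, this nudges the throttle up when the road ahead opens and down when the gap shrinks.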
Kubernetes
It’s built on the fundamental principle of control loops. Almost every resource has a spec, status and a control loop that monitors it.
Controlled flow - The notion of PodDisruptionBudget controls how many pods can be unavailable at a time.
Feedback loop - The status is continuously monitored, pods are restarted if they go down.
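To make the spec/status/control-loop idea concrete, here is a toy reconcile loop. It does not use the Kubernetes API or client library; Deployment, reconcile and can_evict are hypothetical names, and the budget check is only loosely modelled on the minAvailable idea behind PodDisruptionBudget.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    desired_replicas: int                       # spec: what we want
    running: set = field(default_factory=set)   # status: what we have
    next_id: int = 0


def reconcile(dep):
    """One pass of the control loop: move status toward spec."""
    while len(dep.running) < dep.desired_replicas:
        dep.running.add(f"pod-{dep.next_id}")   # replace missing pods
        dep.next_id += 1
    while len(dep.running) > dep.desired_replicas:
        dep.running.pop()                       # scale down extras


def can_evict(dep, min_available):
    """Controlled flow: allow a voluntary eviction only if enough pods remain."""
    return len(dep.running) - 1 >= min_available


dep = Deployment(desired_replicas=3)
reconcile(dep)                 # starts 3 pods
dep.running.discard("pod-1")   # a pod crashes
reconcile(dep)                 # the loop notices and starts a replacement
```

Nobody gets paged for the crashed pod here. The loop just compares spec and status again and closes the gap.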
Continuous Integration / Continuous Deploy (CI/CD)
Controlled flow - Only one deploy to a service at a time. Build -> test -> stage -> deploy is a continuous flow.
Feedback loop - If tests and health checks pass, deploys move to next stage, otherwise they abort. Tests that run per PR are the checks that hold the line. An autoscaling nodepool scales up and down to handle the load.
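The build -> test -> stage -> deploy flow with abort-on-failure can be sketched as a list of gated stages. This is an illustrative toy, not Mixpanel's actual deploy system; run_pipeline and the stage checks are hypothetical stand-ins for real build, test and health-check steps.

```python
def run_pipeline(stages):
    """Run stages in order; abort at the first stage whose check fails.

    `stages` is a list of (name, check) pairs, where `check` is a
    zero-argument callable returning True on success.
    Returns (completed_stage_names, failed_stage_name_or_None).
    """
    completed = []
    for name, check in stages:
        if not check():
            return completed, name   # feedback says stop: nothing later runs
        completed.append(name)
    return completed, None


# Hypothetical checks: the test stage fails, so stage and deploy never run.
stages = [
    ("build", lambda: True),
    ("test", lambda: False),
    ("stage", lambda: True),
    ("deploy", lambda: True),
]
completed, failed = run_pipeline(stages)
```

With the failing test stage above, completed is ["build"] and failed is "test", so the bad change never reaches staging or production.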
At Mixpanel, I’ve had the opportunity to sit in on multiple incident postmortems (many caused by me). I really loved their blameless postmortem culture. “What happened? What was the root cause? What can we do to ensure that it never happens again?” Engineering leaders saw value in investing in good infra. It was okay to go a bit slow on features but have better quality checks. Good plumbing is valued there.
Over time, we added more automation to the deploy system, more checks, more tests, more observability, more parallelism, better caching, deploy previews per PR, faster runs.
Mixpanel deploys >1000/month. Having a solid CI/CD system can do wonderful things for productivity. I’m probably biased because I built parts of the infra. Mixpanel has legit really good infra for working with monorepos. I may do another post on elements of good CI/CD and monorepos with a polyservice setup.
Alex, Andy, Ted, Austin, Rohith, Eddie, Chinmay, Josh - If you’re reading this, thanks for helping me become a better digital plumber.
At Stripe, I was surprised how much friction there was to get things done. I assumed that this was only my perception, but after reading every response in their annual survey, I saw that developer productivity was a commonly cited pain point. In Stripe’s case, they have a shit ton of money and can hire 1000s of engineers to compensate for efficiency losses.
I’m currently at Recurrency. I’m grateful to the folks there for trusting me with frontend ownership and with setting up good plumbing. We can deploy in <1 min, and CI runs in <2 min (including build, unit tests, preview links, e2e, and visual regression tests). I may do a more concrete post on the frontend setup there.
If you like plumbing problems, recurrency.ai is hiring. ERPs are the core of the world’s supply chain. Current ERPs are slow, clunky and ugly. It doesn’t have to be that way.
In summary:
If plumbers do a good job, firefighters don’t need to be called. Good plumbing is all about setting up stable autonomous systems.