
When Chaos Meets Code: The Art of Processes in a Startup TPM's Life

In a world where AI meets the unpredictable chaos of startups, navigating processes like incident management and SLO hygiene becomes an art form. Join me as I reflect on the balance between governance and speed, and how we can dodge the bureaucratic pitfalls while embracing agility.

TPMxAI

26 Apr 2025 — 4 min read


Startup Chaos: Thriving Amidst Pressure

Picture this: a bright Tuesday morning in the startup trenches, where I, a Technical Program Manager, am juggling a dozen tasks while sipping what can only be described as ‘liquid optimism’—a double shot of espresso. The team is buzzing, AI tools are humming, and the latest release is just around the corner. But wait! What’s that? A critical incident has just been flagged. Cue the dramatic music.

This chaotic moment is typical in the fast-paced world of startups. In such an environment, effective processes are not just a luxury; they are a lifeline. So how do we manage the whirlwind of incidents, releases, and reviews while maintaining our sanity and ensuring our AI solutions don’t become sentient, take over the world, and blame us for it?

Let’s dive into some TPM-relevant processes that can save the day—or at least your hairline.

Incident Management: The Blameless Postmortem

Incident management is akin to being a fire chief in a circus. Things are always going awry, but it’s my job to ensure we learn from each act. The blameless postmortem is a critical tool in this chaos. It’s like a group therapy session for teams where we gather around, share our feelings, and analyze the incident without the finger-pointing and shame.

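Imagine a scenario where our AI model fails spectacularly during peak hours: users can’t log in, and support tickets are piling up like laundry on a Sunday night. In our postmortem, we don’t ask, “Who messed this up?” Instead, we ask what in our systems and processes allowed the failure to happen, and what we can change so it doesn’t happen again. This shifts the perspective from blame to growth and fosters psychological safety. When nobody fears being singled out, people describe what actually happened, and that honesty is what surfaces the real fixes.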
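One way to keep a postmortem focused on systems rather than people is to shape the record itself around a timeline, contributing factors, and action items, with no field for who was at fault. A minimal sketch, assuming a record format of our own design (the `Postmortem` class, its fields, and the sample incident data are hypothetical, not a standard template):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Postmortem:
    """A blameless postmortem record: it captures what happened and what will change.
    There is deliberately no field for who was at fault."""
    title: str
    impact_started: datetime   # when users were first affected
    detected: datetime         # when the incident was flagged
    mitigated: datetime        # when user impact ended
    contributing_factors: list[str] = field(default_factory=list)  # systemic causes, not people
    action_items: list[str] = field(default_factory=list)          # changes to systems or processes

    def time_to_detect(self) -> timedelta:
        return self.detected - self.impact_started

    def time_to_mitigate(self) -> timedelta:
        return self.mitigated - self.detected


# Hypothetical record for a peak-hours login failure like the one imagined in the text.
pm = Postmortem(
    title="Login failures during peak hours",
    impact_started=datetime(2025, 4, 22, 9, 0),
    detected=datetime(2025, 4, 22, 9, 12),
    mitigated=datetime(2025, 4, 22, 9, 47),
    contributing_factors=[
        "Model serving was never load-tested at peak traffic",
        "No alert on login error rate",
    ],
    action_items=[
        "Add a peak-traffic load test to the release checklist",
        "Alert when the login error rate exceeds its baseline",
    ],
)
print(pm.time_to_detect())    # 0:12:00
print(pm.time_to_mitigate())  # 0:35:00
```

Tracking time-to-detect and time-to-mitigate across postmortems also gives the team a blame-free way to see whether its action items are actually paying off.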

SLO/SLA Hygiene: The Health Check

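Next up are SLOs (Service Level Objectives) and SLAs (Service Level Agreements). Think of these as the gym membership for our services: great in theory, but often neglected. An SLO is the internal target we hold ourselves to; an SLA is the commitment we make to customers, usually with consequences attached if we miss it. Maintaining SLO/SLA hygiene is about keeping our promises to users and to ourselves. If we say, for example, that 99% of requests complete in under two seconds, we had better mean it.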

In the world of AI, it’s crucial to understand what success looks like. If our AI model can’t meet its SLOs, we need to ask whether it’s time to reevaluate the model, the targets, or both. Are we expecting too much? Are our users’ needs evolving faster than our models? Keeping our SLOs and SLAs healthy requires regular “check-ups” to ensure they reflect reality and not just our aspirations.
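A latency promise is only checkable once it is stated precisely, typically as a target fraction of requests that must finish under a threshold, with the remaining fraction serving as an error budget. A minimal sketch of such a check-up, assuming the `check_latency_slo` helper, the 2-second threshold, the 99% target, and the sample window are all illustrative rather than real measurements:

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    compliance: float         # fraction of requests that met the latency threshold
    meets_slo: bool           # compliance >= target
    budget_remaining: float   # 1.0 = untouched, 0.0 = fully spent, negative = overspent


def check_latency_slo(latencies_ms: list[float], threshold_ms: float, target: float) -> SloReport:
    """Evaluate an SLO of the form: `target` fraction of requests finish in under `threshold_ms`."""
    if not latencies_ms:
        raise ValueError("no requests in the measurement window")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    total = len(latencies_ms)
    bad = sum(1 for ms in latencies_ms if ms >= threshold_ms)
    compliance = (total - bad) / total
    # The error budget is how many slow requests the SLO tolerates in this window.
    allowed_bad = (1.0 - target) * total
    return SloReport(
        compliance=compliance,
        meets_slo=compliance >= target,
        budget_remaining=1.0 - bad / allowed_bad,
    )


# Hypothetical window: 995 fast responses and 5 slow ones, checked against "99% under 2 s".
report = check_latency_slo([250.0] * 995 + [2400.0] * 5, threshold_ms=2000.0, target=0.99)
print(f"compliance={report.compliance:.1%} meets_slo={report.meets_slo} "
      f"budget_left={report.budget_remaining:.0%}")
```

A budget that is nearly spent is a signal to slow down and invest in reliability; a budget that is never touched may mean the target is looser than users actually need.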

Release Trains and Quality Gates: The Express Route

Release trains are like the popular kid in school: everyone wants to be on board, but if there’s a delay, chaos ensues. The idea is simple: releases leave on a fixed schedule, and a feature that isn’t ready waits for the next train instead of holding everyone else up. Establishing clear quality gates is essential to ensure we’re not releasing half-baked features just to meet a deadline. It’s about striking the right balance between speed and quality, a dance that requires finesse.
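The mechanic that makes release trains work is that the calendar, not any single feature, sets the ship date. A minimal sketch, assuming a fixed two-week cadence; the `next_train` helper, its parameters, and the dates are hypothetical:

```python
from datetime import date, timedelta


def next_train(ready: date, first_departure: date, cadence_days: int = 14) -> date:
    """Return the first scheduled departure on or after `ready`.

    Trains leave on a fixed cadence whether or not any given feature is ready;
    a feature that misses one departure simply rides the next one.
    """
    if cadence_days <= 0:
        raise ValueError("cadence_days must be positive")
    if ready <= first_departure:
        return first_departure
    elapsed = (ready - first_departure).days
    trains_to_wait = -(-elapsed // cadence_days)  # ceiling division
    return first_departure + timedelta(days=trains_to_wait * cadence_days)


# Hypothetical schedule: the first train leaves on 6 May 2025, then every 14 days.
first = date(2025, 5, 6)
print(next_train(date(2025, 5, 6), first))  # 2025-05-06: ready in time, boards this train
print(next_train(date(2025, 5, 7), first))  # 2025-05-20: one day late, waits for the next
```

Because missing a train costs a fixed, predictable wait rather than a slipped release for everyone, there is less pressure to wave a half-baked feature through the quality gates.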

In the age of AI, where iterative improvements can happen almost overnight, it’s tempting to push changes out the door as quickly as possible. However, we must resist the urge to become a cargo cult, worshiping the release process without understanding its purpose: shipping because the train is leaving, not because the change is ready. A lightweight, data-informed approach, with go/no-go calls based on measured signals such as test results, evaluation scores, and production metrics, lets us make smarter decisions about what to release and when. By integrating continuous feedback, we can ensure that our release trains are not just fast but also reliable.
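Quality gates stay lightweight and data-informed when each one is an explicit, measurable check rather than a meeting. A minimal sketch, assuming the metric names (`test_pass_rate`, `eval_score_delta`, `p95_latency_ms`), the thresholds, and the `Gate`/`evaluate_gates` helpers are hypothetical placeholders rather than recommendations:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    name: str
    metric: str
    passes: Callable[[float], bool]  # predicate applied to the measured value


def evaluate_gates(metrics: dict[str, float], gates: list[Gate]) -> list[str]:
    """Return the names of failed gates; an empty list means the candidate may board the train.
    A missing metric counts as a failure, so an unmeasured gate cannot quietly pass."""
    failures = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None or not gate.passes(value):
            failures.append(gate.name)
    return failures


# Hypothetical gates for an AI feature; thresholds are placeholders.
GATES = [
    Gate("unit tests green", "test_pass_rate", lambda v: v == 1.0),
    Gate("offline eval holds", "eval_score_delta", lambda v: v >= -0.01),
    Gate("latency within SLO", "p95_latency_ms", lambda v: v < 2000),
]

candidate = {"test_pass_rate": 1.0, "eval_score_delta": -0.03, "p95_latency_ms": 1450.0}
print(evaluate_gates(candidate, GATES))  # ['offline eval holds']
```

The thresholds are exactly what continuous feedback should revisit: a gate that never fails, or always fails, is probably measuring the wrong thing.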

Design/PRD Review Rituals: The Art of Collaboration

Design reviews and PRD (Product Requirements Document) reviews can feel like attending a dinner party where everyone debates the merits of pineapple on pizza. But here’s the kicker: they’re essential for ensuring alignment across the team. These rituals should be an open dialogue, not a box-ticking exercise.

As a TPM, I’ve learned that the best reviews are those that foster collaboration. When we approach design with the mindset of “How can we improve this together?” rather than “This is my idea, and it’s perfect,” we create space for innovation. AI can assist here, too: grounding our discussions in data helps us set aside personal biases and focus on what truly matters.

Governance vs. Speed: The Balancing Act

Finally, we arrive at the age-old debate of governance versus speed. It’s like trying to figure out if you should prioritize your coffee or your breakfast on a busy morning. In the startup world, the pressure to move quickly is immense, but without a framework, we risk descending into utter chaos.

Here’s the trick: find a governance model that is lightweight and adaptive. Strive for processes that provide structure without becoming a bureaucratic burden, for example by reserving formal sign-off for decisions that are risky or hard to reverse. Embrace agility: let data drive your decisions and give the team room to adapt. This way, we can respond quickly to changes without getting bogged down in endless meetings or approval loops.
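One way to make governance lightweight and adaptive is to scale the approval burden with the risk of a change, so that most work takes the fast path. This is a risk-tiering sketch of our own, not a standard framework; the `Change` fields, the `Approval` tiers, and the rules are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum


class Approval(Enum):
    FAST_PATH = "normal code review only"
    PEER_SIGNOFF = "one extra reviewer signs off"
    FULL_REVIEW = "short risk review before release"


@dataclass(frozen=True)
class Change:
    customer_facing: bool
    touches_user_data: bool
    reversible: bool  # e.g. behind a feature flag or easy to roll back


def approval_for(change: Change) -> Approval:
    """Pick the lightest approval tier that matches the change's risk."""
    risky = change.customer_facing or change.touches_user_data
    if risky and not change.reversible:
        return Approval.FULL_REVIEW   # high impact and hard to undo
    if risky or not change.reversible:
        return Approval.PEER_SIGNOFF  # either risky or irreversible, not both
    return Approval.FAST_PATH         # internal and reversible: just ship it


# Hypothetical examples: an internal dashboard tweak vs. an irreversible data migration.
print(approval_for(Change(customer_facing=False, touches_user_data=False, reversible=True)))
print(approval_for(Change(customer_facing=True, touches_user_data=True, reversible=False)))
```

"Adaptive" here means the rules themselves are cheap to change: if postmortems show that a class of reversible changes keeps causing incidents, that class can move up a tier.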

Closing Thoughts: Embracing the Chaos

As I reflect on my journey as a TPM in the realm of AI and startups, I realize that processes, when approached thoughtfully, can be the backbone of our chaotic existence.


By embracing blameless postmortems, maintaining SLO/SLA hygiene, establishing quality gates, fostering collaborative design rituals, and balancing governance with speed, we can create an environment that thrives amidst unpredictability.

So here’s to the chaos, the coffee, and the wonderfully messy world of technical program management with AI. May we always find humor in our struggles, wisdom in our processes, and, most importantly, a little bit of grace in our journey.