
Building Dexter AI for Focus on Five: A Product Manager's Perspective


Key Takeaways

  • AI product management requires shifting from deterministic specs to probabilistic thinking — outputs vary and "done" looks different
  • AI metrics blend quality, engagement, safety, and economics in ways traditional features don't require
  • The skills that still matter are user empathy, clear communication, and constant prioritization — what changes is how you apply them
  • Product managers don't need to become ML engineers, but fluency in prompts, evaluation, and API economics is core product work

Product managers tend to approach AI features the same way we approach everything else: detailed specs, clear acceptance criteria, predictable outputs. That playbook works for traditional features. But AI features don't behave like the rest of the product, and forcing the usual approach onto them creates friction.

After years of building traditional web and mobile experiences for Fortune 500 companies and startups, I recently shipped Dexter AI, the conversational assistant inside the Focus on Five app. I've experienced firsthand how the shift from traditional Customer Experience (CX) product management to AI product management requires rethinking how you work with stakeholders, collaborate with engineering teams, and define expected outcomes.

The Feature in Question

Focus on Five is a productivity app. Instead of endless to-do lists, users choose up to five meaningful priorities each week. The app guides them through a weekly rhythm of planning, focusing, and reflecting.

Dexter AI is the app's conversational assistant: a chatbot that helps users break a big goal down into an actionable weekly priority, or flags when a goal is really just a simple task, not meaningful enough to be a priority. Rather than filling out forms or selecting from pre-built templates, users have a natural conversation with Dexter, who asks clarifying questions, offers suggestions, and helps translate vague aspirations into concrete commitments.

Under the hood: Dexter is powered by a Large Language Model (LLM), specifically OpenAI's GPT-4.1-mini via their Responses API. After extensive research, we chose this model for its balance of response quality and performance against token cost. We manage the system prompt, pass user personalization data to shape each interaction, and receive hybrid text/JSON responses. The JSON enables quick replies: selectable options that let users respond with a tap instead of typing, creating a more fluid conversational experience.
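To make the hybrid text/JSON pattern concrete, here is a minimal sketch of how a client might split such a response into display text and tappable quick replies. The format (a JSON object on the final line with a `quick_replies` key) is an illustrative assumption, not Dexter's actual response contract:

```python
import json

def parse_assistant_reply(raw: str) -> dict:
    """Split a hybrid model response into display text and quick replies.

    Assumes the model is instructed to append a JSON object on its final
    line, e.g.:
        Great goal! What does success look like?
        {"quick_replies": ["Ship v1", "Write the outline"]}
    This format is hypothetical, for illustration only.
    """
    text, _, tail = raw.rpartition("\n")
    try:
        payload = json.loads(tail)
        return {"text": text.strip(),
                "quick_replies": payload.get("quick_replies", [])}
    except json.JSONDecodeError:
        # No structured tail: treat the whole response as free text.
        return {"text": raw.strip(), "quick_replies": []}
```

The fallback branch matters: if the model occasionally omits the JSON tail, the UI degrades gracefully to plain text instead of erroring.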

Building the core app was traditional CX product work. Building Dexter AI for the Focus on Five app was something else entirely.

How Traditional Features Work

In conventional CX product management, the workflow is well-established. You identify a user problem, design a solution, write specifications, and hand off to engineering. The team builds exactly what was specified. You test against the acceptance criteria. If the button does what the spec says it should do, the feature is done.

This model works because traditional features are deterministic. A button either navigates to the correct screen or it doesn't. A form either validates input correctly or it doesn't.

Deterministic: In AI and software, deterministic means the system produces the exact same output every time, given the same input.

Dependencies are predictable. You need design assets, API endpoints, and perhaps some backend logic. Timelines can be reasonably estimated. Teams use velocity, story points, and past experience to forecast delivery—imperfect, but grounded in comparable work. Stakeholder expectations are manageable because you can show them mockups and prototypes that represent the final experience with high fidelity.

This is the world most product managers are trained in—until AI features become part of the roadmap.

The Shifts

Building Dexter AI required a fundamental change in how I approached product work. Not because the technology was unfamiliar, but because the nature of the output was different. Here's what actually changed.

From Specifications to Behavior Ranges

Traditional features have specifications. AI features have desired behaviors and acceptable ranges. When I write a spec for a standard feature, I'm describing exactly what should happen. When I define requirements for an AI feature, I'm describing what should generally happen, what should never happen, and what the acceptable variance looks like in between.

For Dexter, this meant shifting from "the assistant responds with X" to "the assistant responds in a way that is encouraging, asks clarifying questions, stays under 100 words, and never provides medical advice." The output isn't deterministic—it's probabilistic. Two users asking similar questions might get different responses, and that's not a bug. It's the nature of the system.

Probabilistic: A probabilistic system generates responses based on patterns and likelihood rather than fixed rules.

This changes how you write user stories, how you define acceptance criteria, and how you think about quality assurance. You're not testing for exact matches. You're testing for behavior within bounds.

Example JIRA story for Dexter: "As a user, when I describe a big goal, Dexter helps me break it into a weekly priority." Acceptance criteria: Dexter asks 1-2 clarifying questions; response stays under 100 words; suggests an actionable priority, not a vague aspiration; maintains encouraging tone; does not provide medical, legal, or financial advice.
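Acceptance criteria like these can be partially automated. Here is a hedged sketch of what a per-response check might look like; the word limit, the "?" heuristic for clarifying questions, and the banned-topic keyword list are all illustrative placeholders, not Dexter's production rules:

```python
def check_response(response: str) -> dict:
    """Evaluate one response against behavioral acceptance criteria.

    Thresholds and keyword lists below are illustrative placeholders.
    """
    # Crude cues for medical/legal/financial advice (hypothetical list).
    banned_topics = ("diagnos", "prescri", "lawsuit", "invest in")
    words = response.split()
    return {
        "under_word_limit": len(words) <= 100,
        "asks_question": "?" in response,  # proxy for a clarifying question
        "no_banned_topics": not any(t in response.lower() for t in banned_topics),
    }

def passes(response: str) -> bool:
    """A response passes only if every behavioral check holds."""
    return all(check_response(response).values())
```

In practice, keyword checks catch only the obvious cases; teams often pair them with model-graded evaluations for tone and helpfulness. But even crude checks make "behavior within bounds" testable in CI.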

From Linear Sprints to Iterative Evaluation

Agile methodology still applies to AI development, but the nature of the work shifts. Traditional sprints might include stories like "build the settings screen" or "implement push notification logic." AI sprints include more spikes and research stories—exploratory work to test whether an approach even works before committing to building it.

With Dexter, we ran multiple prompt engineering cycles that looked nothing like traditional development. We'd adjust system instructions, test against a variety of user inputs, evaluate the responses, and iterate. This tuning work doesn't fit neatly into "development" or "QA" categories. It's ongoing product work that continues even after launch.

This changes how you demonstrate progress to stakeholders. For traditional features, what stakeholders see is what users will get. AI features break this model—a demo might show Dexter handling a conversation beautifully, but that same conversation phrased slightly differently might produce a less impressive result. Stakeholders are seeing a sample from a distribution, not a deterministic preview.

Instead of polished demos, I found it more effective to share evaluation results—how Dexter performed across a range of test cases, where it excelled, and where it struggled. The definition of done changes accordingly: "feature performs acceptably across our test case library." You're shipping behavior, not just code.
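"Performs acceptably across our test case library" can itself be expressed as a tiny evaluation harness. This sketch assumes three callables you would supply yourself (a test case list, a `generate` function that calls the model, and a `check` function like the per-response checks above); the 90% default threshold is an illustrative bar, not a universal standard:

```python
def run_eval(test_cases, generate, check, threshold=0.9):
    """Run every test case through the model and report the pass rate.

    test_cases: list of {"input": ...} dicts
    generate:   callable mapping an input string to a model response
    check:      callable mapping a response to True/False
    threshold:  illustrative bar for "acceptable across the library"
    """
    results = [check(generate(case["input"])) for case in test_cases]
    pass_rate = sum(results) / len(results)
    return {"pass_rate": pass_rate, "shippable": pass_rate >= threshold}
```

Sharing a report like this with stakeholders ("92% of 150 test conversations pass, here are the failures") communicates a distribution honestly, where a single polished demo cannot.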

From Predictable Dependencies to Layered Uncertainty

Traditional features have dependencies you can map clearly: design needs to deliver assets, backend needs to expose an API, and legal needs to approve copy. AI features add layers of uncertainty that compound on each other.

For a conversational AI like Dexter, the dependencies include the quality of your prompt engineering, the performance of the underlying language model, and the guardrails you've implemented to prevent problematic outputs. Each layer introduces variability. The language model might be updated by the provider. Guardrails might catch legitimate responses or miss problematic ones.

Knowing When You Need RAG—And When You Don't

A common assumption is that building a capable AI chatbot requires retrieval-augmented generation (RAG) and vector databases. For some products, that's true. For Dexter, it wasn't.

RAG (Retrieval-Augmented Generation): A technique where an AI model pulls relevant information from an external data source — like a database or document library — before generating its response, so the answer is grounded in specific, up-to-date knowledge rather than just what the model learned during training.

RAG shines when your AI needs to reference large, proprietary, or frequently changing knowledge bases—product catalogs, customer documentation, internal wikis, or personalized user history at scale. If your chatbot needs to answer "what's our return policy for items purchased in the last 30 days?", you're likely in RAG territory.

Dexter's job is different. It's not retrieving information from a knowledge base—it's facilitating a structured conversation. It asks questions, synthesizes user input, and applies consistent logic to help users articulate their priorities. The intelligence is in the conversation design and prompt engineering, not in information retrieval. GPT-4.1-mini, combined with well-crafted system prompts and user context passed at runtime, was sufficient for these rule-based interactions.

For PMs evaluating this decision: start with what your AI actually needs to know and where that knowledge lives. If the answers exist within the model's training data or can be provided through prompt context, you may not need the added complexity and cost of RAG infrastructure. If you need the AI to access dynamic, proprietary, or voluminous data, invest in retrieval architecture early.

Fine-tuning — training a base model on your own data to specialize its behavior — is expensive, time-consuming, and was unnecessary for Dexter. It requires curating training datasets, running training cycles, and maintaining a custom model over time. Well-crafted system prompts and user context passed at runtime gave us the conversational control we needed without that investment.

From Fixed Costs to Variable Economics

Here's a shift that catches many product teams off guard: AI features introduce variable costs that scale with usage. Traditional features have development costs and relatively fixed infrastructure expenses. AI features that rely on external providers incur costs per API call, per token, per interaction.

For Dexter, every conversation has a cost. Each time a user sends a message, the model runs inference — the process of generating a response — and that inference is billed by token usage. This fundamentally changes how you think about monetization, pricing tiers, and feature gating. Do you absorb API costs and build them into your margins? Do you limit AI interactions for free users? Do you pass costs through to customers on a usage basis?
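A back-of-envelope cost model makes these tradeoffs concrete. The sketch below takes prices as inputs because real per-token rates vary by model and change over time; none of the numbers here reflect actual OpenAI pricing:

```python
def monthly_ai_cost(users, convs_per_user, msgs_per_conv,
                    in_tokens_per_msg, out_tokens_per_msg,
                    price_in_per_m, price_out_per_m):
    """Estimate monthly API spend for a chat feature.

    Prices are per million tokens and must be supplied by the caller;
    all figures in examples are hypothetical.
    """
    msgs = users * convs_per_user * msgs_per_conv
    cost_in = msgs * in_tokens_per_msg * price_in_per_m / 1_000_000
    cost_out = msgs * out_tokens_per_msg * price_out_per_m / 1_000_000
    return cost_in + cost_out
```

Plugging in hypothetical figures (10,000 users, 4 conversations a month, 8 messages each, 600 input and 150 output tokens per message) turns an abstract "variable cost" into a dollar figure you can weigh against pricing tiers and free-user limits.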

These aren't just finance decisions—they require tight alignment between CX and AI strategy. If you limit interactions to control costs, you affect the product experience. If you absorb costs without limits, you need confidence in your unit economics. Product managers need to be active participants in these conversations.

Forecasting is also harder. Traditional infrastructure scales somewhat predictably. AI API costs depend on user adoption, conversation length, and interaction frequency—all of which are difficult to model before launch and can shift quickly after.

Measuring What Matters

Traditional features have straightforward success metrics: load time, error rates, tap/click success rates. Most are unambiguous: the screen loads or it doesn't, the button works or it fails. AI features require a different measurement framework, one that captures whether a probabilistic system achieved the desired outcome and how the experience felt along the way.

Industry pattern: Leading AI chatbots — ChatGPT, Claude, Gemini — all use a simple thumbs up/thumbs down mechanism on individual responses. It's the most common way to capture user satisfaction for AI interactions because it's low-friction and immediate. The feedback is inherently subjective — it reflects how the user felt about the response, not whether the response was technically correct — but that subjectivity is precisely what makes it valuable. Satisfaction is a human judgment, and simple binary feedback captures it at scale.

For Dexter, the most important metric is priority completion rate: did the user successfully complete their weekly priorities? This is the north star. Closely related is conversation completion rate—tracking how many users finish the flow (add a priority) versus abandoning mid-conversation. Drop-off patterns reveal where Dexter loses people, which informs prompt tuning and conversation design.

User satisfaction matters, but measuring it requires intentional design. Post-chat ratings or thumbs up/down UI elements capture sentiment that raw completion data misses.

Guardrail trigger rate is an often-overlooked metric. How frequently do safety boundaries activate? Monitoring this ensures the AI behaves responsibly and helps identify edge cases where the system prompt needs refinement.

On the performance side, response latency affects user experience directly—slow responses break conversational flow. Fallback and failure rates track when Dexter can't understand or respond appropriately, signaling areas for improvement.
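The metrics above can all be derived from a shared conversation event log. This sketch assumes a simple event schema (`started`, `priority_added`, `guardrail_triggered`, `fallback`), which is an illustrative assumption rather than Dexter's actual instrumentation:

```python
def chat_metrics(events):
    """Aggregate conversation events into completion, guardrail, and
    fallback metrics.

    Each event is a dict like {"conv_id": ..., "type": ...}; the event
    types used here are assumed for illustration.
    """
    started = {e["conv_id"] for e in events if e["type"] == "started"}
    completed = {e["conv_id"] for e in events if e["type"] == "priority_added"}
    guardrails = sum(1 for e in events if e["type"] == "guardrail_triggered")
    fallbacks = sum(1 for e in events if e["type"] == "fallback")
    n = len(started) or 1  # avoid division by zero on an empty log
    return {
        "completion_rate": len(completed & started) / n,
        "guardrail_triggers_per_conv": guardrails / n,
        "fallbacks_per_conv": fallbacks / n,
    }
```

Keeping all AI metrics on one event stream makes the blend of quality, safety, and engagement reviewable in a single dashboard instead of scattered across tools.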

The key insight: AI metrics blend quality, engagement, safety, and economics in ways traditional features don't require.

What This Means for Product Managers

Product managers moving into AI work don't need to become machine learning engineers, but they do need to develop fluency in how these systems behave. Understanding concepts like prompt engineering, evaluation metrics, retrieval architecture decisions, and API economics isn't optional—it's core product work.

My background in AI product design and responsible AI through MIT's certification program provided a foundation for thinking about these problems systematically. Shipping Dexter reinforced these principles in practice.

The skills that still matter are the ones that always mattered: deep user empathy, clear communication, cross-functional leadership, and constant prioritization. What changes is how you apply those skills when the output you're managing is probabilistic rather than deterministic.

Where This Is Heading

AI features are becoming standard components of modern products. The distinction between "AI product managers" and "regular product managers" will likely fade as these skills become expected baseline competencies.

For now, though, product managers who understand both worlds—who can ship a polished traditional experience and navigate the uncertainty of AI development—have a meaningful advantage. The goal isn't to become an AI expert. It's to become a product manager who knows what questions to ask, what tradeoffs to navigate, and what "done" really means when the feature you're shipping thinks for itself.


Aldo Raicich
Aldo Raicich, MBA Principal Product Consultant Product Strategy · Planning · Development · Growth

Aldo is a product leader with 10+ years of experience helping Fortune 500 companies, small businesses, and startups build and grow web, mobile, eCommerce, and AI-integrated digital products. He is the founder of Copotential, a San Francisco-based product consultancy.
