The Risk of Automating AI Too Early in Healthcare

AI integration is often framed as a binary decision: either you have “done AI” or you have not. In practice, the harder and more interesting work is deciding when to formalize it, and what you need to learn before you do.

This post outlines an approach we took to use AI early, safely, and productively… without committing to assumptions we had not yet earned. It reflects a broader philosophy I bring to product work: optimize for learning first, then systematize once the value is clear.

Why we didn’t automate clinical workflows first

AI came up in nearly every product conversation this year. New models, new capabilities, and growing pressure to integrate it everywhere. The question was never whether we would use AI. It was how quickly we should formalize it.

What we kept coming back to was this: we already had high-quality data and a small, well-defined ecosystem. The bigger risk was not underusing AI. It was over-engineering it too early.

Embedding AI directly into clinician and coach workflows would have forced us to define prompts, UX patterns, and system behavior before we understood how people would actually use the tool. That felt premature.

Instead, we started with something much simpler.

Unifying medical records, RPM data, and coaching notes

For each patient, we created a single, approved data file that brought together the information our care team already relies on day to day. This included medical records, device readings, biometrics, coaching transcripts, coach notes, and longitudinal goals.

That file could be imported into a HIPAA-compliant LLM. Coaches and providers could drop it in and ask questions in natural language while preparing for or during conversations, without relying on rigid workflows or pre-defined prompts baked into the product.

This decision did a few important things at once:

  • It avoided hard-coding AI behavior into the UI
  • It respected existing clinical and coaching workflows
  • It gave users freedom to explore, rather than forcing a “right” way to ask questions

Coaches used it to answer the kinds of questions that actually help them do their jobs better:

  • When was the last time we talked about nutrition?
  • What have been their biggest motivational barriers over the past three months?
  • What personal details matter for continuity, like family or work context?

Providers used it to generate concise summaries that combined RPM trends with qualitative coaching insights when communicating with a patient’s physician.

That flexibility was intentional.

Using clinician prompts as product research

By starting with a single data view per patient, we gave the care team freedom to use AI in ways that supported how they actually think and work. At the same time, we could observe how it was being used.

  • Which questions came up repeatedly?
  • Where were people trying to synthesize information rather than retrieve it?
  • What context was missing when answers felt weak?

Those prompts became a form of research. Instead of guessing what to build or prematurely defining workflows, we were learning directly from real behavior. Over time, this gave us a much stronger foundation for deciding what should eventually be productized inside the care management platform.

Just as importantly, this approach kept us resilient as models continued to change. We designed around access to high-quality data, not around a specific model, vendor, or prompt structure. The value was not automation for its own sake. It was helping people show up with better context.

Why we avoided deep AI integration into our care management platform

It would have been tempting to jump straight to a deep integration: wiring an LLM directly into the product, defining prompts up front, and embedding assumptions into the workflow.

Doing that early would have meant committing to decisions before we understood how the tool would actually be used.

By keeping the setup lightweight, we avoided locking ourselves in too early. Coaches did not have to wait on engineering cycles to experiment. Product could learn from real usage and move toward deeper integration deliberately, not by default.

This also reduced long-term risk. In a space where models evolve quickly, tightly coupled integrations tend to age poorly. A prompt that works today can quietly degrade tomorrow. Treating AI as a flexible layer on top of existing data kept us adaptable while still operating within clear compliance boundaries.

This was not about avoiding ambition. It was about sequencing.

We put one foot in and one foot out: enough structure to be useful and safe, without over-committing before we had evidence.

What changed for clinicians and patients

Patients benefit from this approach even if they never see the AI directly. Coaches show up more prepared. Conversations feel more continuous. Important details are less likely to be missed.

Over time, this creates a responsible path to deeper AI-powered experiences, grounded in real usage instead of assumptions.

I am not anti-AI. I am anti committing too early.

In an AI-heavy world, the fastest way to lose flexibility is to overbuild before you understand where the value actually is. Choosing leverage over complexity gave us room to learn now, while preserving the option to go deeper later.

Practical implementation notes for product teams

What mattered most was the data, not the model.

What we mean by “the data view”

The patient file was designed to be compact and human-readable. If a coach could skim it and quickly find medications, recent encounters, labs, device readings, notes, and goals, a model usually could too.

No special instructions required.
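
For illustration only, a flattened patient file might read something like the sketch below. Every section name and value here is invented, not our actual schema; the point is the shape, short headed sections in light markdown that a person can skim:

```
# Patient summary: J.D. (hypothetical example)

## Medications
- Metformin 500 mg, twice daily

## Recent device readings
- Blood glucose, 7-day average: 132 mg/dL
- Weight: trending stable over 30 days

## Recent encounters
- 2024-03-02 coaching call: meal planning; main barrier is a rotating shift schedule

## Goals
- Walk 30 minutes, 5 days per week
```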

Principles we followed

  • Make data the foundation. When outputs looked off, we resisted the urge to write a smarter prompt. We improved the data instead. Better context consistently produced better answers across many different questions.
  • Prefer plain text over heavy structure. Most of our patient data lives in FHIR, exported as JSON. For patients with very large medical records, that structure was token-heavy and often required explaining the schema inside the prompt (which we were trying to avoid). We transformed and flattened the data into plain text or light markdown, so the model used fewer tokens and produced more accurate, reliable responses. A rough sketch of that transform follows this list.
  • Design for many prompts, not one perfect prompt. We did not aim for a single “right” question. Coaches asked different questions at different moments. As the data improved, the same prompts produced better results, which was the signal we cared about.
  • Keep experimentation visible. Prompt usage was not hidden. We reviewed patterns, talked openly about what worked, and adjusted the data accordingly. Learning was shared and not siloed.
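
As a rough sketch of the flattening step mentioned above, the snippet below walks a FHIR R4-style Bundle export and emits light markdown. The field paths, the patient_bundle.json filename, and the three resource types handled are illustrative assumptions; a real export varies by vendor and covers many more resource types.

```python
# Hypothetical sketch: flatten a FHIR Bundle (JSON export) into compact,
# skimmable markdown. Field paths are assumptions, not a vendor spec.
import json


def flatten_bundle(path: str) -> str:
    with open(path) as f:
        bundle = json.load(f)

    meds, readings, notes = [], [], []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        rtype = resource.get("resourceType")
        if rtype == "MedicationRequest":
            # Medication name, when exported as a plain-text codeable concept
            meds.append(resource.get("medicationCodeableConcept", {}).get("text", "unknown"))
        elif rtype == "Observation":
            # Labs and device readings: "name: value unit"
            code = resource.get("code", {}).get("text", "observation")
            qty = resource.get("valueQuantity", {})
            readings.append(f"{code}: {qty.get('value', '?')} {qty.get('unit', '')}".strip())
        elif rtype == "DocumentReference":
            # Note descriptions, when present
            if resource.get("description"):
                notes.append(resource["description"])

    # Emit short headed sections a coach (or a model) can skim
    sections = [("Medications", meds), ("Recent readings", readings), ("Notes", notes)]
    lines = []
    for title, items in sections:
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(flatten_bundle("patient_bundle.json"))
```

The output stays readable to a human first; the model benefits as a side effect, which is exactly the trade we were optimizing for.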