AI has been moving fast, but inside real health coaching and clinical workflows, we keep asking ourselves: What is it actually good for?
For the past couple of years, we’ve been putting large language models through real-world use cases across our coaching and clinical teams. We’ve used them to summarize calls, draft reports, structure data, and support preparation for clinical visits. And we’ve seen the same pattern every time: AI can be brilliant for certain tasks and completely unreliable for others.
There’s no one-size-fits-all solution. The impact depends on the workflow, the data structure, and the level of human judgment the task requires.
And as we evaluated AI across different parts of the workflow, SOAP emerged as the clearest lens for understanding what should be automated and what should stay human.
At Kannact and Starlight, our health coaches take notes using the classic clinical note structure: SOAP, which we’ve adapted to align with behavior change principles like motivational interviewing and the transtheoretical model of behavior change.
SOAP stands for:
- Subjective: What the patient says, their feelings, barriers, and wins.
- Objective: Measurable data like blood pressure, glucose, weight, and activity.
- Assessment: The coach’s interpretation: how the person is progressing, where they’re struggling, and what stage of change they’re in.
- Plan: What happens next: new goals, next steps, and follow-up actions.
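To make the structure concrete, here is a minimal sketch of a SOAP note modeled as a data structure. The field names and sample values are illustrative, not our actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SoapNote:
    subjective: str   # what the patient said: feelings, barriers, wins
    objective: dict   # measurable data, e.g. blood pressure, glucose, weight
    assessment: str   # coach's interpretation, including stage of change
    plan: list = field(default_factory=list)  # new goals and follow-up actions

note = SoapNote(
    subjective="Reports less stress this week; got out for three walks.",
    objective={"bp": "128/82", "weight_lb": 182},
    assessment="Progressing well; action stage of change.",
    plan=["Increase walks to 4x/week", "Follow up in two weeks"],
)
```

Keeping the four sections as separate fields also makes it easy to route each one differently, which matters later: some sections can be drafted by a model, others must stay human.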
Coaches complete this note after every call and send quarterly progress reports with key updates to the patient’s broader care team.
That process might sound simple, but it’s surprisingly nuanced. Some steps are manual on purpose because they create moments of reflection and help coaches and clinicians think, process, and support people more effectively. Those pieces need to stay human. And understanding that distinction has helped us see exactly where AI adds value and where it does not.
Where LLMs Shine
1. Summarizing the Subjective Section
The subjective section is often the most time-consuming part of documentation. It captures what was said directly in the conversation… the wins, the barriers, and the patient’s own words.
LLMs handle this incredibly well. They are good at distilling a single call transcript into a clear, concise summary. They don’t need long-term memory, and it’s fine if the wording isn’t identical every time. Coaches simply review, edit lightly, and approve.
This saves time, reduces fatigue, and lets the coach focus on interpreting what the conversation meant rather than transcribing it.
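A summarization setup like this can be quite simple. The sketch below shows one way to frame the task so the model stays within the Subjective lane; the function name and prompt wording are assumptions for illustration, not our production prompt:

```python
# Hypothetical prompt builder for drafting the Subjective section
# from a single call transcript.

def build_subjective_prompt(transcript: str) -> str:
    instructions = (
        "Summarize only what the patient said in this coaching call: "
        "their feelings, barriers, and wins, using their own words where possible. "
        "Do not interpret motivation or assign a stage of change; "
        "that judgment belongs to the coach. Keep it to 3-5 sentences."
    )
    return f"{instructions}\n\nTranscript:\n{transcript}"

prompt = build_subjective_prompt(
    "Coach: How was your week? Patient: Honestly, better. I walked most mornings..."
)
```

The explicit "do not interpret" instruction is the key design choice: it keeps the draft in the Subjective section and leaves the Assessment to the human reviewer.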
2. Drafting Quarterly Progress Reports
Every quarter, we send a Health Summary Report to the patient and their primary care team. These reports synthesize structured data (device readings, trends, HEDIS measures, coaching history) with highlights from recent conversations.
LLMs perform well here because the inputs are structured and predictable. The model weaves trend analysis, data insights, and transcript highlights into a single narrative. The reports still receive human review, but the baseline clarity and consistency are often stronger than a manually written version.
The time savings are meaningful, and it standardizes the way progress is communicated across hundreds of participants.
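Because the inputs are structured, most of the work is assembling them into a consistent context before the model ever writes a sentence. A rough sketch, with illustrative field names rather than our real pipeline:

```python
# Hypothetical assembly of structured inputs for a quarterly report draft:
# device readings, open HEDIS measures, and conversation highlights.

def build_report_context(readings: list[dict], hedis_gaps: list[str],
                         call_highlights: list[str]) -> str:
    trend = ("improving" if readings[-1]["systolic"] < readings[0]["systolic"]
             else "flat or worsening")
    lines = [
        f"Blood pressure trend this quarter: {trend} "
        f"({readings[0]['systolic']} -> {readings[-1]['systolic']} systolic).",
        "Open HEDIS measures: " + (", ".join(hedis_gaps) or "none"),
        "Conversation highlights:",
        *[f"- {h}" for h in call_highlights],
    ]
    return "\n".join(lines)

context = build_report_context(
    readings=[{"systolic": 134}, {"systolic": 126}],
    hedis_gaps=["A1c test overdue"],
    call_highlights=["Started walking at lunch", "Reduced soda to one per day"],
)
```

Computing the trend in code rather than asking the model to infer it is deliberate: the narrative stays flexible, but the numbers it cites are deterministic.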
Where LLMs Fall Short
1. Summarizing Complete Medical Records
One of the first things we tried was summarizing complete medical records. These charts often span thousands of pages. LLMs struggled. The results were inconsistent, incomplete, or confidently inaccurate.
We realized the limitation wasn’t the model. It was the data. LLMs are not built to process massive, fragmented medical charts in raw form.
The solution is to restructure the data first: normalize fields, remove noise, and build a cleaner, consolidated patient view. With that foundation, AI becomes far more reliable. Without it, the model simply can’t support the precision required for an evaluation and management (E&M) chart review.
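The "restructure first" step can be as unglamorous as mapping aliased field names onto a canonical schema and dropping empty values before anything reaches a model. A minimal sketch, with made-up aliases rather than a full normalization pipeline:

```python
# Illustrative normalization: map vendor-specific field names to one
# canonical name and drop noise before consolidating fragments.

FIELD_ALIASES = {"sbp": "systolic", "bp_sys": "systolic", "systolic_bp": "systolic"}

def normalize_record(raw: dict) -> dict:
    clean = {}
    for key, value in raw.items():
        if value in (None, "", "N/A"):  # drop empty / placeholder fields
            continue
        clean[FIELD_ALIASES.get(key, key)] = value
    return clean

# Consolidate fragments into a single patient view; later fragments win.
merged = {}
for fragment in [{"sbp": 130, "note": ""}, {"bp_sys": 128, "glucose": 104}]:
    merged.update(normalize_record(fragment))
```

Only after this kind of consolidation does summarization become tractable: the model sees one coherent record instead of thousands of fragmented pages.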
2. Assessing Patients (the “A” in SOAP)
This one is non-negotiable. The assessment section has to stay human. This is where the coach interprets tone, hesitation, motivation, and stage of change. It is reflective work, and it builds the coach’s long-term understanding of the patient.
AI can surface data or remind coaches of patterns, but it cannot reliably interpret emotional nuance or behavioral context. And it should not decide what stage of change a person is in. That judgment comes from experience, listening, and empathy.
3. Building the Plan
At first, we considered having AI draft the Plan section. But goal-setting is part of the coaching conversation itself. It happens live, collaboratively, and intentionally.
Coaches use a guided structure to help patients turn vague intentions into SMART goals. That moment of translation is central to engagement and accountability. If AI wrote it after the call, it would turn a collaborative moment into a passive one. So we keep this section fully human.
Caveats and Lessons Learned
Transcription accuracy is the weakest link
Transcripts are the foundation of every AI layer that sits above them. If the transcript is wrong, the summary will be wrong. Speech-to-text still misses key words, medical terms, and emotional cues. Deepgram’s Nova-3 Medical model has come closest so far, but accuracy still needs to improve.
Until transcription becomes consistently reliable, every downstream use case requires caution.
SOAP isn’t just documentation. It’s reflection.
In workshops with coaches, we kept returning to example notes like: “She wasn’t really talkative, seemed stressed about work.”
That line doesn’t reflect what was said. It reflects what was observed. It captures energy, tone, and context the coach wants to remember later.
LLMs rarely capture these cues without an unrealistic amount of custom training. And even if they could, it would be brittle, expensive, and hard to maintain as coaching styles and models evolve.
This is why some parts of the workflow must stay human.
Don’t lose non-health context.
Small personal details, like family updates, job stress, or travel plans, often shape the next visit. AI tends to ignore or deprioritize these unless specifically trained to retain them. Without intentional prompts or structure, these meaningful details disappear from the narrative.
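One lightweight mitigation is to make context retention an explicit part of the prompt rather than hoping the model keeps it. A sketch of what that instruction might look like; the wording and section title are assumptions, not our production prompt:

```python
# Hypothetical instruction appended to a summarization prompt so
# non-health details survive into the note.

RETAIN_CONTEXT_INSTRUCTION = (
    "End with a section titled 'Personal context' listing any non-health "
    "details the patient mentioned (family updates, job stress, travel "
    "plans) so the coach can reference them on the next call."
)

def with_context_retention(base_prompt: str) -> str:
    return base_prompt + "\n\n" + RETAIN_CONTEXT_INSTRUCTION

full_prompt = with_context_retention("Summarize this coaching call transcript.")
```

Giving those details a dedicated, named section makes them easy for the coach to scan before the next visit, and makes it obvious when the model has dropped them.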
Use AI where judgment is not required, and streamline the rest
For sections that do require human thinking, there are still ways to reduce the burden. Dictation is a major one. The ability to speak an assessment out loud and get an accurate, clean note makes the reflective work faster without replacing it.
The goal is to keep the thinking human and make the manual parts easier.
Closing Thoughts
AI is a tool. It works beautifully when it supports the parts of the workflow that are mechanical, repetitive, or data heavy. And it breaks quickly when the task requires interpretation, judgment, or emotional intelligence.
The sweet spot is using AI to reduce documentation load so coaches and clinicians can focus on reflection, connection, and decision-making. AI should support reasoning, not replace it.
Used in the right places, it makes documentation faster, clearer, and more consistent.
And when it comes to assessment, interpretation, and planning… that’s where humans shine.