February 12, 2024

The Clinical AI Trust Deficit: Why Adoption Lags Development

By Claire Johansson, General Partner

Every conversation we have with health system CIOs and clinical informatics leaders in 2024 includes some version of the same concern, regardless of the technology category: the models may be good, but can we rely on them when something goes wrong? Trust, auditability, and clear accountability chains are the actual product in clinical AI — not accuracy metrics, not benchmark performance, not research publication counts.

This is not a new observation. The clinical AI field has discussed the gap between academic performance and clinical deployment for years. What is changing is the stakes. As AI applications move from decision support in low-acuity settings — scheduling, documentation, administrative triage — into diagnostic and therapeutic workflow support, the accountability question becomes more acute. An AI that helps a physician document a visit is one thing. An AI that surfaces a diagnostic flag in a patient chart is another. The second application demands a more rigorous answer to the question of what happens when the system is wrong.

The companies addressing this most effectively are doing so through three mechanisms. First, they are building explicit human-in-the-loop workflows where the AI output is a recommendation that requires physician confirmation, not an action that happens automatically. This preserves clinical accountability in a form that physicians, health systems, and their liability carriers can accept. Second, they are building audit trails that allow a health system to reconstruct exactly what information the AI had access to when it made a recommendation and exactly what recommendation it made. Third, they are building correction feedback loops: systems where physician overrides of AI recommendations are captured and used both to improve model performance and to demonstrate to regulators that the company takes clinical error seriously.
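To make the three mechanisms concrete, here is a minimal sketch of how they might fit together in code. It is not drawn from any specific vendor's product. Every name in it (Recommendation, AuditLog, propose, review, override_feedback) and every sample value is hypothetical, and the in-memory list stands in for the durable, tamper-evident storage a real deployment would need.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Recommendation:
    """One AI output, held as pending until a physician acts on it."""
    rec_id: str
    inputs: dict                      # snapshot of what the model had access to
    output: str                       # what the model recommended
    status: str = "pending"           # pending -> confirmed | overridden
    reviewer: Optional[str] = None
    override_reason: Optional[str] = None


@dataclass
class AuditLog:
    """Append-only event list (hypothetical; in-memory for illustration only)."""
    events: list = field(default_factory=list)

    def record(self, rec_id, event, detail):
        self.events.append({
            "rec_id": rec_id,
            "event": event,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def history(self, rec_id):
        # Reconstruct everything that happened to one recommendation, in order.
        return [e for e in self.events if e["rec_id"] == rec_id]


def propose(log, rec_id, inputs, output):
    """Mechanism 2: log the inputs and the output at the moment of recommendation."""
    rec = Recommendation(rec_id, dict(inputs), output)
    log.record(rec_id, "proposed", {"inputs": dict(inputs), "output": output})
    return rec


def review(log, rec, reviewer, accept, reason=None):
    """Mechanism 1: nothing takes effect until a named physician confirms or overrides."""
    if rec.status != "pending":
        raise ValueError("recommendation already reviewed")
    rec.status = "confirmed" if accept else "overridden"
    rec.reviewer = reviewer
    rec.override_reason = None if accept else reason
    log.record(rec.rec_id, rec.status,
               {"reviewer": reviewer, "reason": rec.override_reason})
    return rec


def override_feedback(log):
    """Mechanism 3: collect overrides for model improvement and regulator reporting."""
    return [e for e in log.events if e["event"] == "overridden"]


# Illustrative usage with made-up identifiers and values.
log = AuditLog()
rec = propose(log, "rec-001", {"source": "chart summary"}, "diagnostic flag")
review(log, rec, reviewer="physician-a", accept=False, reason="not clinically relevant")
```

The design choice worth noting is that the recommendation object carries no authority of its own: its status only changes through review, and every state change writes to the log, so the override record and the audit trail are the same data.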

The trust deficit is real, but it is not permanent. It is a function of the maturity of the deployment evidence base. As more health systems accumulate multi-year deployment experience with specific clinical AI products, experience that covers accurate outputs, error events, and how those errors were handled, the trust calculus shifts. The companies that are building the safety infrastructure now, even at the cost of slower initial deployment, are building the evidence base that will make them the trusted vendors in five years.