When AI Learns to Say “I’m Not Sure” About Lung Cancer Slides

Drug delivery is the subway system of oncology: the medicine has to get to the right stop, at the right time, without accidentally express-training itself into a neighborhood where it has no business being. Diagnosis has a similar transit problem. Before treatment can board the platform, doctors need to know exactly what kind of cancer they are dealing with. Wrong stop, wrong train, very bad commute.

That is why a new study in Nature Biomedical Engineering feels less like “AI reads slides” and more like “the pathology robot finally got a caution light.” Zhang and colleagues introduced TRUECAM, a framework built to make AI more trustworthy when classifying non-small cell lung cancer, or NSCLC, from digitized pathology slides [1].

NSCLC is the big category of lung cancer, making up roughly 85% of lung cancer cases. Two major subtypes, lung adenocarcinoma and lung squamous cell carcinoma, can look like close cousins under the microscope but often send treatment decisions down different corridors. Pathologists already do this work with remarkable skill. The question is whether AI can help without becoming that overconfident spaceship computer that says, “Everything is fine,” while sparks fly behind the control panel.

The Slide Is Not One Picture. It Is a Planet.

A whole-slide image is not a neat little JPEG. It is a gigapixel tissue galaxy, chopped into thousands of tiles. Some tiles show tumor. Some show normal tissue. Some show smudgy, ambiguous chaos, because biology apparently hired a fog machine.

Traditional AI models often try to make a single call from this sprawling landscape. TRUECAM wraps around these models and asks three sensible questions before letting the machine speak too loudly.

First: is this slide even the kind of thing the model was trained to understand? If not, TRUECAM flags it as out-of-scope.

Second: are some regions too ambiguous to help? If yes, it filters them out, like deleting blurry security-camera footage where everyone looks like a suspect in a trench coat.

Third: how confident should we be in the answer? That is where conformal prediction comes in. It is a statistical method that returns a set of plausible labels instead of a single forced guess, so the system can abstain when the model should stop cosplaying as an oracle.
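To make that third question less abstract, here is a minimal sketch of generic split conformal prediction for a two-class subtyping problem. It is not TRUECAM's actual code: the function names (`calibrate_threshold`, `prediction_set`, `decide`), the "one label or defer" rule, and the toy calibration numbers are all assumptions made up for illustration. The idea is to measure how wrong the model tends to be on held-out calibration slides, then keep only the labels that clear that bar.

```python
import numpy as np

LABELS = ["LUAD", "LUSC"]  # adenocarcinoma, squamous cell carcinoma


def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Pick a score threshold from calibration data for miscoverage alpha."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank used by split conformal prediction.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return 1.0  # too few calibration cases: keep every label
    return float(np.sort(scores)[k - 1])


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score is within the threshold."""
    return [LABELS[i] for i in range(len(LABELS)) if 1.0 - probs[i] <= threshold]


def decide(probs, threshold):
    """Return one label, or defer if the set is empty or has several labels."""
    labels = prediction_set(probs, threshold)
    return labels[0] if len(labels) == 1 else "refer to pathologist"


# Toy calibration set (invented numbers): 19 slides, with the probability
# the hypothetical model assigned to each slide's true subtype.
cal_labels = np.array([0, 1] * 9 + [0])
p_true = np.array([0.95] * 15 + [0.70] * 4)
rows = np.arange(len(cal_labels))
cal_probs = np.empty((len(cal_labels), 2))
cal_probs[rows, cal_labels] = p_true
cal_probs[rows, 1 - cal_labels] = 1.0 - p_true

threshold = calibrate_threshold(cal_probs, cal_labels, alpha=0.10)
print(decide([0.97, 0.03], threshold))  # LUAD
print(decide([0.55, 0.45], threshold))  # refer to pathologist
```

The confident slide gets a single label, while the near coin-flip slide produces no label that clears the threshold, so it goes to a human. Lowering `alpha` asks for higher coverage, which in general means more cases get deferred.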

“Computer, Please Admit When You’re Guessing”

This is the part that feels oddly futuristic and deeply practical. TRUECAM does not just chase higher accuracy. It gives the AI a formal way to say, “I need a human.”

In the study, the team tested TRUECAM across more than 20,000 whole-slide images, including NSCLC datasets from TCGA, CPTAC, and a private Hong Kong cohort. They tried it with an older task-specific model, Inception-v3, and newer pathology foundation models including UNI, CONCH, Prov-GigaPath, and TITAN.

The results were striking. With a target coverage of 95%, TRUECAM reduced the Inception-v3 NSCLC subtyping error rate by 72%. With 99% target coverage, the reduction reached 93.8% [1]. When the system was uncertain, it could abstain and send the case to a pathologist instead of winging it. Honestly, if more software had this level of self-awareness, printers would be less hated.
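For readers wondering what "target coverage" promises, this is the standard conformal prediction guarantee, stated generically and not quoted from the paper. The symbols are assumptions for illustration: X is a new slide, Y its true subtype, C(X) the set of labels the system returns, and alpha the tolerated miss rate.

```latex
\Pr\bigl( Y \in C(X) \bigr) \;\ge\; 1 - \alpha ,
\qquad \text{e.g. } \alpha = 0.05 \text{ for a 95\% target, } \alpha = 0.01 \text{ for 99\%.}
```

The guarantee holds on average across cases, not slide by slide, and it assumes the new slides are exchangeable with the calibration slides (roughly, drawn from the same kind of data). That assumption is one reason flagging out-of-scope inputs matters.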

TRUECAM also spotted out-of-domain inputs, such as non-cancerous lung tissue being fed into a cancer-subtyping model. That matters because real hospitals are messy. Scanners differ. Stains differ. Patient populations differ. A model trained in one data universe may wobble in another. In sci-fi terms, the ship’s sensors need to know when they have left mapped space.

Why This Matters Beyond One Lung Cancer Task

AI in lung cancer has been racing ahead across screening, prognosis, treatment response, and pathology. Reviews keep pointing to the same translation bottlenecks: models can perform beautifully in controlled datasets, then stumble when faced with new hospitals, new devices, or underrepresented patient groups [2]. The future is not just bigger models. It is models that know their perimeter.

That is also why pathology foundation models are so exciting. CONCH showed how pairing histology images with biomedical language can make pathology AI more flexible [3]. Prov-GigaPath pushed whole-slide modeling at enormous scale [4]. CHIEF showed broad cancer diagnosis and prognosis potential across many datasets [5]. These systems are the gleaming starships. TRUECAM is the flight safety protocol that asks whether the airlock is actually closed.

If reproduced and tested prospectively in real clinical workflows, this kind of uncertainty-aware AI could help pathologists focus on the hardest cases, reduce silent errors, and make digital pathology less of a black box with a lab coat. It will not replace the human expert. It may become the unusually honest assistant that says, “I’ve got this one,” or “Please call the grown-up.”

That is a small sentence. In cancer diagnosis, it can be a very big deal.

References

[1] Zhang X, Wang T, Yan C, et al. Implementing trust in non-small cell lung cancer diagnosis with a conformalized uncertainty-aware AI framework. Nature Biomedical Engineering. 2026. https://doi.org/10.1038/s41551-026-01694-8. PMCID: PMC11975025.
[2] Zhu E, Muneer A, Zhang J, et al. Progress and challenges of artificial intelligence in lung cancer clinical translation. npj Precision Oncology. 2025;9:210. https://doi.org/10.1038/s41698-025-00986-7.
[3] Lu MY, Chen B, Williamson DFK, et al. A visual-language foundation model for computational pathology. Nature Medicine. 2024;30:863-874. https://doi.org/10.1038/s41591-024-02856-4. PMCID: PMC11384335.
[4] Xu H, Usuyama N, Bagga J, et al. A whole-slide foundation model for digital pathology from real-world data. Nature. 2024;630:181-188. https://doi.org/10.1038/s41586-024-07441-w.
[5] Wang X, Zhao J, Marostica E, et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634:970-978. https://doi.org/10.1038/s41586-024-07894-z.
