When You Don't Have the Recipe Card: How AI Learned to Spot Cancer Mutations Flying Solo
Making a soufflé without a recipe is a bold move. Making one without ever having tasted a soufflé? That's basically what cancer genomics has been trying to do every time it analyzes a tumor sample without its matching "normal" DNA reference. You're staring at thousands of genetic changes, trying to figure out which ones are the cancer's fault and which ones were there all along - like trying to identify which ingredients the previous chef snuck into a soup when you never saw the original pantry.
A team led by Kiran Krishnamachari and Anders Jacobsen Skanderup just handed the field a really, really good set of taste buds. Their new tool, VarNet-T, uses deep learning to identify cancer-specific mutations from tumor samples alone - no matched normal required - and it's outperforming existing methods by 20-33% (Krishnamachari et al., 2026).
Let that marinate for a second.
The "Where's My Other Sample?" Problem
Here's the setup. Standard somatic variant calling works like a spot-the-difference puzzle: you sequence the tumor, sequence a healthy tissue sample from the same patient, and whatever shows up only in the tumor is probably a cancer mutation. Simple. Elegant. Also frequently impossible.
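The spot-the-difference idea fits in a few lines of Python. This is only a toy sketch: the variant tuples and coordinates are made up, and real callers compare read-level evidence statistically rather than subtracting two finished call sets.

```python
# Variants are represented as (chromosome, position, ref, alt) tuples.
# All values here are invented for illustration.
tumor_calls = {
    ("chr1", 1_000_100, "C", "T"),
    ("chr7", 55_000_200, "G", "A"),
    ("chr17", 7_600_300, "G", "T"),
}
normal_calls = {
    ("chr1", 1_000_100, "C", "T"),  # also in healthy tissue, so inherited
}


def somatic_candidates(tumor, normal):
    """Return variants seen in the tumor but not in the matched normal."""
    return tumor - normal


candidates = somatic_candidates(tumor_calls, normal_calls)
for variant in sorted(candidates):
    print(variant)
```

Take away `normal_calls` and the subtraction has nothing to subtract, which is the whole tumor-only problem in miniature.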
Matched normal samples are often unavailable. Maybe the patient's archived tissue has degraded. Maybe the biopsy was collected years ago when nobody thought to bank a normal sample. Maybe you're working with a massive biobank of tumor samples that never had normals collected in the first place. The Broad Institute's own GATK documentation essentially says tumor-only calling is "not recommended and should be avoided if possible" (GATK Mutect2 Documentation). Encouraging stuff.
Without that reference sample, you're stuck trying to separate somatic mutations from germline variants (the ones you inherited from your parents, who are blameless in this particular situation) and sequencing artifacts (the ones that are nobody's fault except entropy). Existing approaches filter against databases of known germline variants, but rare germline variants slip through like uninvited guests at a potluck - you can't screen for what you've never catalogued.
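To see why database filtering leaks, here is a minimal sketch of the idea. The lookup table, the allele-frequency cutoff, and every variant in it are hypothetical; real pipelines query large population resources and apply more filters than this.

```python
# Hypothetical population database: variant -> population allele frequency.
POPULATION_AF = {
    ("chr1", 1_000_100, "C", "T"): 0.12,   # common inherited variant
    ("chr2", 2_000_200, "A", "G"): 0.003,  # uncommon but catalogued
}

MAX_POPULATION_AF = 0.001  # assumed cutoff, chosen only for illustration


def passes_germline_filter(variant, database=POPULATION_AF, max_af=MAX_POPULATION_AF):
    """Keep a variant unless the database marks it as a known germline variant."""
    # A variant missing from the database gets frequency 0.0 and is kept.
    return database.get(variant, 0.0) < max_af


tumor_only_calls = [
    ("chr1", 1_000_100, "C", "T"),  # common germline: filtered out
    ("chr2", 2_000_200, "A", "G"),  # catalogued germline: filtered out
    ("chr3", 3_000_300, "G", "A"),  # true somatic mutation: kept
    ("chr4", 4_000_400, "T", "C"),  # rare germline, never catalogued: kept by mistake
]

kept = [v for v in tumor_only_calls if passes_germline_filter(v)]
print(kept)
```

The filter cannot tell the last two variants apart: both are simply absent from the database, so both pass.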
Enter the Neural Network With an Attitude
VarNet-T builds on the team's earlier VarNet framework, which was originally trained on 4.6 million high-confidence somatic variants from 356 tumor whole genomes (Krishnamachari et al., 2022). The "T" stands for tumor-only, and it's earned the letter.
The system uses a weakly supervised deep learning approach - meaning it learns from large datasets of labeled variants without requiring perfect annotations for every single training example. Think of it as learning to cook by watching thousands of cooking shows rather than following one precise recipe. The model takes aligned tumor sequencing reads, converts them into image-like representations, and classifies variants as somatic or not.
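What does "image-like" mean for sequencing reads? The sketch below shows one plausible way to do it: stack the reads that overlap a candidate site as rows, use genomic positions as columns, and one-hot encode each base. This is an assumption made for illustration, not VarNet-T's actual encoding; the function name, the window size, and the reads are invented, and a real encoder would likely add channels for things like base quality and strand.

```python
BASES = "ACGT"


def encode_pileup(reads, center, flank=2):
    """Encode reads around a candidate site as a rows x columns x 4 one-hot grid.

    reads:  list of (start_position, sequence) pairs; gap-free alignments assumed
    center: reference position of the candidate variant
    flank:  number of positions to include on each side of the site
    """
    window_start = center - flank
    width = 2 * flank + 1
    grid = []
    for start, seq in reads:
        # One row per read; positions the read does not cover stay all-zero.
        row = [[0, 0, 0, 0] for _ in range(width)]
        for col in range(width):
            offset = window_start + col - start
            if 0 <= offset < len(seq) and seq[offset] in BASES:
                row[col][BASES.index(seq[offset])] = 1
        grid.append(row)
    return grid


reads = [
    (98, "ACGTA"),   # covers positions 98-102
    (99, "CGAAC"),   # covers 99-103, carries an A at position 101
    (100, "GTACC"),  # covers 100-104
]
image = encode_pileup(reads, center=100, flank=2)
print(len(image), "rows x", len(image[0]), "columns x", len(image[0][0]), "channels")
```

A grid like this is the kind of input a convolutional network can scan for patterns, the same way it would scan pixels.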
Benchmarked against public datasets, VarNet-T showed a 20-33% performance improvement over existing tumor-only methods. But the real showstopper is what it does for tumor mutation burden.
TMB: The Number That Could Save Your Life (If We Count It Right)
Tumor mutation burden - TMB - is essentially a count of how many mutations a tumor carries, normalized to the amount of DNA sequenced. It matters because in 2020, the FDA approved pembrolizumab (Keytruda) for previously treated, unresectable or metastatic solid tumors with TMB of 10 or more mutations per megabase, making it one of the first tumor-agnostic, biomarker-driven cancer approvals (Marabelle et al., 2020). The logic: heavily mutated tumors display more abnormal protein fragments on their surface, which makes them more visible to the immune system - and more responsive to immunotherapy that takes the brakes off immune cells.
The catch? TMB is only as good as your mutation calls. And tumor-only sequencing has been shown to inflate TMB estimates, particularly in underrepresented populations whose germline variants are poorly catalogued in reference databases (Parikh et al., 2022). You end up counting inherited variants as somatic mutations. Patients get classified as TMB-high when they're not, and some get immunotherapy they won't benefit from. Meanwhile, noisy calls can push others below the threshold, and they miss out entirely.
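The arithmetic behind that inflation is simple enough to sketch. TMB is the number of somatic mutations divided by the megabases of sequence covered, compared against the cutoff of 10 mutations per megabase. The panel size and mutation counts below are hypothetical numbers picked to make the effect visible.

```python
TMB_HIGH_THRESHOLD = 10.0  # mutations per megabase


def tmb(mutation_count, covered_megabases):
    """Tumor mutation burden in mutations per megabase."""
    return mutation_count / covered_megabases


def is_tmb_high(mutation_count, covered_megabases, threshold=TMB_HIGH_THRESHOLD):
    """True if the burden meets or exceeds the TMB-high cutoff."""
    return tmb(mutation_count, covered_megabases) >= threshold


# Hypothetical tumor-only panel result.
covered_mb = 1.5        # megabases covered by the assay (assumed)
true_somatic = 12       # real somatic mutations
leaked_germline = 6     # inherited variants miscounted as somatic

true_tmb = tmb(true_somatic, covered_mb)
observed_tmb = tmb(true_somatic + leaked_germline, covered_mb)

print(f"true TMB:     {true_tmb:.1f} -> TMB-high? {is_tmb_high(true_somatic, covered_mb)}")
print(f"observed TMB: {observed_tmb:.1f} -> TMB-high? "
      f"{is_tmb_high(true_somatic + leaked_germline, covered_mb)}")
```

In this made-up case, six leaked germline variants move the patient from 8.0 to 12.0 mutations per megabase and flip the classification. On a small panel, a handful of miscalls is all it takes.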
The team tackled this head-on, testing VarNet-T's TMB estimation accuracy across 1,000 tumor samples spanning 10 solid cancer types. The result: more than three times higher accuracy in classifying TMB-high status compared to existing methods.
Three times.
Why This Actually Matters (Beyond the Benchmarks)
This isn't just a prettier number on a leaderboard. Accurate TMB estimation from tumor-only samples means thousands of archival tumor samples in biobanks worldwide could become useful for research. It means clinical labs that can't obtain matched normals - which is more common than genomics Twitter would have you believe - can still provide reliable mutation profiling. It means the gap between well-resourced academic medical centers and community oncology practices gets a little smaller.
The challenges aren't gone. TMB itself remains an imperfect biomarker - high TMB doesn't guarantee immunotherapy response, and some TMB-low patients still benefit (Chan et al., 2019). But when your measuring stick is three times more accurate, the measurements start meaning something.
VarNet-T is the soufflé that rose without a recipe. And cancer genomics just got a better kitchen.
References:
- Krishnamachari K, Nguyen HAB, Kadioglu S, Tze JO, Skanderup AJ. Improved tumor-only variant calling and mutation burden estimation with VarNet-T. Nature Communications. 2026. DOI: 10.1038/s41467-026-71705-4. PMID: 41957035
- Krishnamachari K, Lu T, Ng AHQ, et al. Accurate somatic variant detection using weakly supervised deep learning. Nature Communications. 2022;13:4248. DOI: 10.1038/s41467-022-31765-8. PMID: 35869060
- Marabelle A, Fakih M, Lopez J, et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab. Annals of Oncology. 2020;31(6):745-753. DOI: 10.1016/j.annonc.2020.02.014
- Parikh K, Huber R, Engstrom LD, et al. Fast, accurate, and racially unbiased pan-cancer tumor-only variant calling with tabular machine learning. npj Precision Oncology. 2022;6:69. DOI: 10.1038/s41698-022-00340-1
- Chan TA, Yarchoan M, Jaffee E, et al. Development of tumor mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Annals of Oncology. 2019;30(1):44-56. DOI: 10.1093/annonc/mdy495. PMID: 30395155