Philip Graemer

PhD student · Deep learning for biomedical microscopy
University of Strathclyde & University of Glasgow · Supervised by Dr Giuseppe Di Caprio

About

I study how modern vision architectures learn, transfer, and fail when moved from natural images into biomedical microscopy, a domain where the standard assumptions of computer vision break down in informative ways.

My current work centres on a controlled benchmark comparing CNNs and Vision Transformers for single-cell classification on the LIVECell dataset. There I found that previously reported CNN superiority is an artefact of unmatched pretraining, and that standard optimisation recipes such as layer-wise learning rate decay can unexpectedly hurt performance under domain shift.

I'm now building on these findings through domain-adaptive self-supervised learning (PhaseDINO), knowledge distillation, and cross-modality robustness testing.

More broadly, I'm interested in the questions that sit beneath the benchmarks: what makes one representation transfer better than another? Why do some training recipes fail outside their native domain? How do inductive biases interact with pretraining regime?

My path to these questions led through analytical philosophy, psychology, and neuroscience before arriving at machine learning: I've always been drawn to how systems form knowledge and why they sometimes get it wrong.

Selected results

From a controlled comparison of eleven architectures on LIVECell 8-class single-cell classification, with matched pretraining conditions across all models:

| Architecture | Macro F1 | Accuracy | Note |
|---|---|---|---|
| EfficientNet-B5 | 92.25% | 93.53% | Best single model |
| EVA-02 (ViT-B/14) | 90.51% | 92.26% | Best teacher for distillation |
| EVA-02 → EN-B0 | 91.01% | 92.67% | Cross-architecture distillation |
| EN-B5 → EN-B0 | 91.01% | 92.18% | Same-family distillation |
| 3×EVA-02 council → EN-B0 | 92.05% | 93.42% | ViT council distillation; best distilled student |
| 2×EVA-02 + EN-B5 council → EN-B0 | 91.41% | 92.86% | Mixed council distillation |
| ViT-B/16 | 89.93% | 91.94% | |
| ViT-S/8 | 90.61% | 92.20% | Matches EVA-02 at ¼ the parameters and beats the larger ViT-B/16, likely due to the finer tokenisation of smaller patches |
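For reference, macro F1 (the headline metric above) averages per-class F1 scores, so each of the eight classes counts equally regardless of its frequency. A minimal pure-Python sketch of the metric (illustrative only, not the benchmark's actual evaluation code):

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (one-vs-rest)."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes
```

Because every class contributes equally, a model that ignores a rare class is penalised more heavily by macro F1 than by plain accuracy, which is why the two columns in the table can diverge.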

Layer-wise learning rate decay (LLRD), widely used when fine-tuning pretrained transformers, consistently hurts performance under the ImageNet-to-LIVECell domain shift, a finding with implications for how standard NLP-derived training recipes are adopted in biomedical vision.
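As a reminder of what LLRD does: the learning rate is damped geometrically from the head back towards the earliest layers, on the assumption that pretrained low-level features should move least. A minimal sketch of the schedule (the decay factor is an illustrative value, not the benchmark's setting):

```python
def llrd_learning_rates(num_layers, base_lr, decay=0.75):
    """Per-layer learning rates under layer-wise LR decay.

    Layer 0 is the earliest (input-side) layer; the final layer keeps
    base_lr, and each step towards the input multiplies by `decay`.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]
```

In practice these rates become per-parameter-group learning rates in the optimiser. The finding above suggests this geometric damping can be counterproductive when the pretraining and target domains are far apart, since early layers may need to adapt substantially.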

Publications

Pretraining Inverts the CNN–ViT Ranking for Single-Cell Classification: A Controlled Benchmark on LIVECell
Philip Graemer, Giuseppe Di Caprio
Manuscript in preparation · targeting arXiv 2026

Code

livecell-classification-benchmark

Controlled benchmark comparing CNN and Vision Transformer architectures for single-cell classification on LIVECell. Eleven architectures, pretrained and scratch conditions, multi-seed reproducibility.
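The multi-seed protocol amounts to: fix a seed, train, record the metric, repeat, and report the mean and spread. A toy harness illustrating the idea (function names are my own, not the repo's API):

```python
import random
import statistics

def run_with_seeds(train_and_eval, seeds):
    """Run `train_and_eval(seed)` (any callable returning a scalar metric)
    under several seeds; report (mean, sample standard deviation)."""
    scores = []
    for seed in seeds:
        random.seed(seed)  # a real run would also seed torch, numpy, and cudnn
        scores.append(train_and_eval(seed))
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the spread across seeds matters here because single-seed differences between architectures can be smaller than run-to-run noise.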

PyTorch · timm · Mixed precision · A100

livecell-distillation

Cross-architecture knowledge distillation. Temperature-scaled soft labels, combined hard/soft loss, council distillation. Didactic codebase with extensive inline documentation.
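The loss described here combines a temperature-scaled soft-target term with the usual hard-label cross-entropy; for council distillation, the teachers' softened distributions are averaged before the soft term is computed. A dependency-free sketch of the maths (the T and alpha values are illustrative, not the repo's defaults):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def council_probs(teacher_logits_list, T):
    """Average the teachers' temperature-softened distributions."""
    dists = [softmax(z, T) for z in teacher_logits_list]
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]

def distillation_loss(student_logits, teacher_probs, target, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher || student_T) + (1 - alpha) * CE(student, target)."""
    student_soft = softmax(student_logits, T)
    soft = T * T * sum(p * (math.log(p) - math.log(q))
                       for p, q in zip(teacher_probs, student_soft) if p > 0)
    hard = -math.log(softmax(student_logits)[target])
    return alpha * soft + (1 - alpha) * hard
```

The T² factor is the standard correction that keeps the soft-term gradients on the same scale as the hard term as the temperature changes.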

PyTorch · timm · Knowledge distillation

segmentation-crop-checker

Validation tool for single-cell crops extracted from segmentation masks. Flags multi-cell crops, boundary truncation, size outliers, and mask coverage issues.
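The kinds of checks such a tool performs can be illustrated on a binary mask: count the connected foreground components (more than one suggests a multi-cell crop), test whether foreground touches the crop border (boundary truncation), and compare mask coverage against sanity bounds. A self-contained sketch with made-up thresholds (the real tool's rules and names will differ):

```python
def count_components(mask):
    """Number of 4-connected foreground components in a 0/1 mask (list of rows)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1
                stack = [(i, j)]
                seen[i][j] = True
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

def check_crop(mask, min_coverage=0.05, max_coverage=0.95):
    """Return a list of QC flags for one crop's mask (thresholds are illustrative)."""
    h, w = len(mask), len(mask[0])
    flags = []
    if count_components(mask) > 1:
        flags.append("multi_cell")
    if any(mask[0]) or any(mask[-1]) or any(row[0] or row[-1] for row in mask):
        flags.append("boundary_truncation")
    coverage = sum(map(sum, mask)) / (h * w)
    if not (min_coverage <= coverage <= max_coverage):
        flags.append("coverage_outlier")
    return flags
```

Filtering on flags like these before training keeps label noise from segmentation errors out of the classification dataset.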

Python · Quality control

Contact