Xiaoman Zhang (张小嫚)

Hi, I am a PhD student at Shanghai Jiao Tong University, advised by Prof. Weidi Xie and Prof. Ya Zhang. I received my bachelor's degree from the University of Science and Technology of China (USTC) in June 2019.

My recent research focuses on Artificial Intelligence for Healthcare (AI4Health), with the ultimate goal of developing a generalist medical foundation model.

Email  /  Google Scholar  /  Github  /  CV  /  Twitter

profile photo

Research

Representative papers are highlighted.

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.
In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset built on CT-RATE. It includes organ-level segmentation for 197 categories, 665K multi-granularity grounded reports, and 1.3M grounded VQA pairs.
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.
In this paper, we consider the problem of visual representation learning for computational pathology by exploiting large-scale image-text pairs gathered from public resources, along with domain-specific knowledge in pathology. We curate a pathology knowledge tree consisting of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis across 32 human tissues, and develop a knowledge-enhanced visual-language pretraining approach.
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu*, Chaoyi Wu*, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.
In this paper, we construct a new multilingual medical corpus, termed MMedC, containing approximately 25.5B tokens across 6 main languages. We also propose a new multilingual medical multiple-choice question-answering benchmark with rationales, termed MMedBench. Our final model, MMedLM 2, with only 7B parameters, outperforms all other open-source models and even rivals GPT-4 on MMedBench.
One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts
Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.
In this paper, we build a large-scale segmentation dataset, SAT-DS, by collecting over 11K 3D medical image scans from 31 segmentation datasets, and a model that can Segment Anything in medical scenarios driven by Text prompts, termed SAT. The dataset and model will continue to be updated; stay tuned!
Large-scale Long-tailed Disease Diagnosis on Radiology Images
Qiaoyu Zheng*, Weike Zhao*, Chaoyi Wu*, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2023.
In this paper, we collect a large-scale multi-modal, multi-scan, long-tailed, multi-label diagnosis (classification) dataset. We further propose a vision encoder together with a fusion module, enabling an arbitrary number of scans per case. In evaluation, our method achieves better results on our benchmark and can also serve as a pre-trained model for external datasets.
Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis
Chaoyi Wu*, Jiayu Lei*, Qiaoyu Zheng*, Weike Zhao*, Weixiong Lin*, Xiaoman Zhang*, Xiao Zhou*, Ziheng Zhao*, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2023.
We present recent efforts on assessing GPT-4V for multimodal medical diagnosis through case studies covering 17 human body systems across 8 clinical imaging modalities, e.g., radiology and pathology.
UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training
Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
Technical Report, 2023.
In this paper, we propose a hierarchical knowledge-enhanced pre-training framework for universal brain MRI diagnosis, termed UniBrain. Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics.
Towards Generalist Foundation Model for Radiology
Chaoyi Wu*, Xiaoman Zhang*, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2023.
In this paper, we construct a complete pipeline for building medical foundation models, covering data collection, problem formulation, model design, training, and evaluation. We build the largest medical multi-modal database to date, and in terms of model capabilities, our model can process multiple 3D or 2D image inputs interleaved with text, which better fits practical usage. It significantly surpasses the latest open-source multi-modal foundation models.
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang*, Chaoyi Wu*, Ziheng Zhao, Weixiong Lin, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023.
In this paper, we introduce PMC-VQA, a large-scale medical visual question-answering dataset containing 227K VQA pairs over 149K images, covering various modalities and diseases.
PMC-LLaMA: Towards Building Open-source Language Models for Medicine
Chaoyi Wu*, Weixiong Lin*, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023.
In this report, we systematically investigate the process of adapting a general-purpose foundation language model to the medical domain. This involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions.
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents
Weixiong Lin*, Ziheng Zhao*, Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
MICCAI, 2023
We collect a biomedical dataset, PMC-OA, with 1.6M image-caption pairs gathered from PubMed Central's Open Access subset.
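For readers unfamiliar with contrastive language-image pre-training, below is a minimal PyTorch-style sketch of the symmetric InfoNCE objective such methods build on; the function name, temperature, and assumed pre-computed features are illustrative placeholders, not PMC-CLIP's exact implementation.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_feats, text_feats, temperature=0.07):
    # Normalize so dot products become cosine similarities.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    # Similarity between every image and every caption in the batch.
    logits = image_feats @ text_feats.t() / temperature
    # Matched image-caption pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy over both matching directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```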
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
Nature Communications, 2023. (Impact Factor: ~18)
In this paper, we propose a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. The algorithm, named Knowledge-enhanced Auto Diagnosis (KAD), first trains a knowledge encoder based on an existing medical knowledge graph, and then leverages the pre-trained knowledge encoder to guide the visual representation learning with paired chest X-rays and radiology reports.
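To make the two-stage idea concrete, here is a minimal PyTorch-style sketch of how a frozen, pre-trained knowledge encoder could guide image-representation learning via a contrastive alignment loss; all module and function names here are hypothetical stand-ins, not KAD's actual code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def embed_reports(knowledge_encoder, report_tokens):
    # The stage-1 knowledge encoder is frozen and defines the target space.
    return F.normalize(knowledge_encoder(report_tokens), dim=-1)

def alignment_loss(image_encoder, knowledge_encoder, xrays, report_tokens,
                   temperature=0.07):
    # Stage 2: align chest X-ray embeddings with knowledge-encoder
    # embeddings of the paired reports (matched pairs on the diagonal).
    img = F.normalize(image_encoder(xrays), dim=-1)        # (B, D)
    txt = embed_reports(knowledge_encoder, report_tokens)  # (B, D)
    logits = img @ txt.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```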
K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
MICCAI Big Task Small Data Workshop, 2023 (Oral)
In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework, that enables training visual representation with the guidance of medical domain knowledge.
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology
Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie
ICCV, 2023
We propose a medical knowledge-enhanced language-image pre-training method, significantly advancing the ability of pre-trained models to handle unseen diseases in zero-shot classification and grounding tasks.
Self-supervised Tumor Segmentation with Sim2Real Adaptation
Xiaoman Zhang, Weidi Xie, Chaoqin Huang, Ya Zhang, Xin Chen, Qi Tian, Yanfeng Wang
IEEE Journal of Biomedical and Health Informatics, 2023
We propose a two-stage Sim2Real training regime for unsupervised tumor segmentation: we first pre-train a model with simulated tumors, and then adopt a self-training strategy for downstream data adaptation.
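The sketch below illustrates the general shape of such a two-stage regime, assuming a binary-segmentation model with sigmoid outputs and a simple confidence threshold for pseudo-labeling; the helper names, threshold, and round count are illustrative assumptions, not the paper's actual settings.

```python
import torch

def pseudo_label(model, scans, conf_threshold=0.9):
    # Keep only voxels the current model is confident about as pseudo
    # ground truth for the next self-training round.
    model.eval()
    labeled = []
    with torch.no_grad():
        for scan in scans:
            prob = torch.sigmoid(model(scan))
            mask = (prob > conf_threshold).float()
            labeled.append((scan, mask))
    return labeled

def sim2real(model, simulated_pairs, real_scans, train_fn, rounds=3):
    # Stage 1: supervised pre-training on simulated tumors with known masks.
    model = train_fn(model, simulated_pairs)
    # Stage 2: iterative self-training on unlabeled real scans.
    for _ in range(rounds):
        model = train_fn(model, pseudo_label(model, real_scans))
    return model
```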
SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation
Xiaoman Zhang, Shixiang Feng, Yuhang Zhou, Yanfeng Wang, Ya Zhang
MICCAI, 2021


Based on a template by Jon Barron.