We propose PMC-OA, a biomedical dataset with 1.6M image-caption pairs collected from PubMed Central's Open Access subset. It covers diverse modalities and diseases, with the majority of image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption.
Project page  /  Github  /  Paper
We propose PMC-VQA, a large-scale medical visual question-answering dataset containing 227k VQA pairs over 149k images that cover various modalities and diseases. Additionally, we propose a test set that is significantly more challenging than all existing ones; even the best models struggle to solve it.
Project page  /  Github  /  Paper
MedKLIP is proposed to enhance self-supervised visual-language pre-training (VLP) with medical-specific knowledge, by exploiting the paired image-text reports from radiological daily practice. First, we propose a report filter that extracts medical entities as more useful supervision signals, simplifying complex raw reports with minimal information loss. Second, we translate the entities into detailed medical descriptions and embed them with a text encoder, enabling the network to understand complex, medical expert-level knowledge. Finally, a transformer-based structure is proposed for local region alignment (sketched below).
Project page  /  Github  /  Paper
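The snippet below is a minimal, illustrative sketch of the alignment step only: a transformer decoder whose queries are frozen text-encoder embeddings of the extracted entity descriptions, cross-attending over local image regions and predicting whether each entity is present. All module names, dimensions, and the present/absent head are assumptions for illustration, not the released MedKLIP implementation.

```python
# Sketch of entity-query alignment over local image regions (assumed design).
import torch
import torch.nn as nn

class EntityAlignmentSketch(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, num_entities=75):
        super().__init__()
        # Entity-description embeddings are assumed precomputed by a frozen text encoder.
        self.entity_embed = nn.Parameter(torch.randn(num_entities, txt_dim),
                                         requires_grad=False)
        self.img_proj = nn.Linear(img_dim, txt_dim)
        # Transformer decoder: entity queries attend over region features (local alignment).
        layer = nn.TransformerDecoderLayer(d_model=txt_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.cls_head = nn.Linear(txt_dim, 2)   # entity present / absent

    def forward(self, region_feats):
        # region_feats: (B, num_regions, img_dim) from a visual backbone.
        memory = self.img_proj(region_feats)
        queries = self.entity_embed.unsqueeze(0).expand(region_feats.size(0), -1, -1)
        fused = self.decoder(queries, memory)    # cross-attention does the local alignment
        return self.cls_head(fused)              # (B, num_entities, 2)

# Example: logits = EntityAlignmentSketch()(torch.randn(2, 196, 768))
```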
K-Diag is proposed for disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework that enables training visual representations under the guidance of medical domain knowledge. First, to explicitly incorporate experts' knowledge, we learn a neural representation for the medical knowledge graph via contrastive learning, implicitly establishing relations between different medical concepts. Second, while training the visual encoder, we keep the parameters of the knowledge encoder frozen and learn a set of prompt vectors for efficient adaptation (sketched below). Third, we adopt a Transformer-based disease-query module for cross-modal fusion, which naturally enables explainable diagnosis results via cross-attention.
Project page  /  Github  /  Paper
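As a rough illustration of the second point, the sketch below wraps a frozen knowledge encoder with a small set of learnable prompt vectors; only the prompts receive gradients during visual training. The encoder interface (a module mapping a (1, seq, dim) sequence to a sequence of the same shape), the prompt count, and all dimensions are assumptions, not the actual K-Diag code.

```python
# Sketch of prompt-based adaptation of a frozen knowledge encoder (assumed interface).
import torch
import torch.nn as nn

class PromptedKnowledgeEncoder(nn.Module):
    def __init__(self, knowledge_encoder: nn.Module, dim=768, num_prompts=16):
        super().__init__()
        self.encoder = knowledge_encoder
        for p in self.encoder.parameters():      # keep the knowledge encoder frozen
            p.requires_grad = False
        # Only these prompt vectors are updated while training the visual encoder.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, concept_embeds):
        # concept_embeds: (num_diseases, dim) embeddings of disease concepts.
        x = torch.cat([self.prompts, concept_embeds], dim=0).unsqueeze(0)
        out = self.encoder(x)                    # (1, num_prompts + num_diseases, dim)
        return out[:, self.prompts.size(0):]     # drop prompt positions, keep concepts

# Example with a stand-in transformer as the frozen knowledge encoder:
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, 8, batch_first=True), num_layers=2)
disease_embeds = PromptedKnowledgeEncoder(enc)(torch.randn(20, 768))
```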
KAD is a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. The algorithm, named Knowledge-enhanced Auto Diagnosis (KAD), first trains a text encoder on an existing medical knowledge graph to embed knowledge about concept definitions and concept relationships, and then leverages the pre-trained text encoder to enhance image-text contrastive learning from paired chest X-rays and radiology reports (the second stage is sketched below).
Project page  /  Github  /  Paper
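Below is a generic sketch of the second stage: CLIP-style symmetric contrastive learning between projected X-ray features and report features produced by the knowledge-pretrained text encoder. The function name, fixed temperature, and feature shapes are assumptions rather than the paper's exact formulation.

```python
# Symmetric InfoNCE loss over paired X-ray / report features (generic sketch).
import torch
import torch.nn.functional as F

def image_report_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """image_feats, text_feats: (B, D) projected features of paired X-rays and reports."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Match each X-ray to its own report and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```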
We fine-tune LLaMA on 4.8 million biomedical papers from PubMed; after several epochs, it already shows enhanced capabilities in the medical domain. The proposed model, PMC-LLaMA, achieves high performance on biomedical QA benchmarks. A minimal fine-tuning sketch is given below.
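For orientation, a heavily simplified causal-LM fine-tuning loop with Hugging Face transformers might look like the following. The checkpoint path, data file, and hyperparameters are placeholders, and a real run at 7B scale would additionally need distributed training and memory optimizations not shown here.

```python
# Placeholder sketch of continued pre-training on biomedical text (not the released recipe).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "path/to/llama-checkpoint"            # placeholder, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:                    # LLaMA tokenizers ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Plain-text corpus of biomedical papers, one document per line (hypothetical file).
corpus = load_dataset("text", data_files={"train": "pubmed_papers.txt"})["train"]
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pmc_llama_ckpt",
                           num_train_epochs=5,
                           per_device_train_batch_size=1),
    train_dataset=corpus,
    # mlm=False gives the standard next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```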