Authors: Bram de Wilde, Anindo Saha, Richard P. G. ten Broek, Henkjan Huisman
Diffusion-based models for text-to-image generation have gained immense popularity due to recent advancements in efficiency, accessibility, and quality. Although it is becoming increasingly feasible to perform inference with these systems using consumer-grade GPUs, training them from scratch still requires access to large datasets and significant computational resources. In the case of medical image generation, the availability of large, publicly accessible datasets that include text reports is limited due to legal and ethical concerns. While training a diffusion model on a private dataset may address this issue, it is not always feasible for institutions lacking the necessary computational resources. This work demonstrates that pre-trained Stable Diffusion models, originally trained on natural images, can be adapted to various medical imaging modalities by training text embeddings with textual inversion. In this study, we conducted experiments using medical datasets comprising only 100 samples from three medical modalities. Embeddings were trained in a matter of hours, while still retaining diagnostic relevance in image generation. Experiments were designed to achieve several objectives. Firstly, we fine-tuned the training and inference processes of textual inversion, revealing that larger embeddings and more examples are required. Secondly, we validated our approach by demonstrating a 2% increase in the diagnostic accuracy (AUC) for detecting prostate cancer on MRI, which is a challenging multi-modal imaging modality, from 0.78 to 0.80. Thirdly, we performed simulations by interpolating between healthy and diseased states, combining multiple pathologies, and inpainting to show embedding flexibility and control of disease appearance. Finally, the embeddings trained in this study are small (less than 1 MB), which facilitates easy sharing of medical data with reduced privacy concerns.
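The interpolation experiment mentioned in the abstract can be sketched in a few lines: textual inversion represents each learned concept as a small embedding vector, so blending "healthy" and "diseased" appearance reduces to linear interpolation between two such vectors before they are fed to the text encoder. The minimal sketch below is an illustration, not the authors' code; the function name, the 768-dimensional size (typical of CLIP text embeddings used by Stable Diffusion), and the random stand-in vectors are all assumptions.

```python
import numpy as np

def interpolate_embeddings(healthy: np.ndarray, diseased: np.ndarray, alpha: float) -> np.ndarray:
    """Linearly interpolate between two learned token embeddings.

    alpha = 0.0 reproduces the healthy concept, alpha = 1.0 the diseased one;
    intermediate values blend disease appearance in the generated image.
    """
    return (1.0 - alpha) * healthy + alpha * diseased

# Toy stand-ins for learned textual-inversion embeddings (random here;
# in practice these would be the trained <healthy> and <diseased> tokens).
rng = np.random.default_rng(0)
healthy_emb = rng.standard_normal(768)
diseased_emb = rng.standard_normal(768)

# Halfway point between the two concepts.
midpoint = interpolate_embeddings(healthy_emb, diseased_emb, 0.5)
```

In a generation pipeline, the interpolated vector would replace the embedding of a placeholder token in the prompt, so a single trained pair of embeddings yields a continuum of disease severities.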