LUMOS Dataset: Lumbar Multimodal Osteoporosis Screening with X-ray and CT images

1Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, College of Computer Science, Zhejiang University
2Dept. of Orthopedic Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine

ACM MM 2025 Dataset Track Under Review

Overview

The Lumbar Multimodal Osteoporosis Screening dataset (LUMOS) is the first multimodal dataset specifically designed for lumbar osteoporosis screening. LUMOS integrates clinical data from 803 patients, including 1,620 anteroposterior and lateral lumbar X-rays with BMD values and T-scores, comprehensive demographic information, and 280 lumbar CT scans.

The advent of LUMOS is expected to propel forward research on automated osteoporosis classification, BMD prediction, and other related tasks. Its standardized and multimodal nature fills critical gaps in lumbar osteoporosis data, providing high-quality data support for the development and validation of medical AI algorithms in the early detection of osteoporosis.

Our findings

Existing public datasets remain a significant bottleneck for osteoporosis screening research. They (1) lack anatomical imaging, (2) lack paired conventional X-ray or CT images, and (3) lack standardized BMD reference values. These gaps highlight the urgent need for more comprehensive, standardized, and multimodal public datasets to advance osteoporosis research and improve diagnostic practices.

Dataset Description

(1) Data collection. The LUMOS dataset includes cases with a time interval of no more than 6 months between modalities, images with a radiological quality assessment score of 8 or higher and without motion artifacts, and records with full demographic information and complete DXA results.
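
The inclusion criteria above can be read as a simple per-case filter. The sketch below is illustrative only: the record fields are hypothetical and not part of the LUMOS release, "6 months" is approximated as 183 days, and the interval is assumed to span every modality available for a case.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_INTERVAL_DAYS = 183   # "6 months" approximated as 183 days (assumption)
MIN_QUALITY_SCORE = 8     # radiological quality threshold stated in the text


@dataclass
class CaseRecord:
    # All field names are hypothetical, chosen for illustration only.
    xray_date: date
    dxa_date: date
    ct_date: Optional[date]        # None when the case has no CT scan
    quality_score: int
    has_motion_artifact: bool
    demographics_complete: bool
    dxa_complete: bool


def meets_inclusion_criteria(case: CaseRecord) -> bool:
    """Return True when a case satisfies all stated inclusion criteria."""
    dates = [d for d in (case.xray_date, case.dxa_date, case.ct_date) if d is not None]
    interval_days = (max(dates) - min(dates)).days
    return (
        interval_days <= MAX_INTERVAL_DAYS
        and case.quality_score >= MIN_QUALITY_SCORE
        and not case.has_motion_artifact
        and case.demographics_complete
        and case.dxa_complete
    )
```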

(2) Data composition.

  • The LUMOS dataset comprises 803 exclusively Asian patients, with 1,620 X-ray images and 280 CT scans.
  • A pronounced age skew toward older adults: 38.1% (n=306) aged 60–69 and 25% (n=201) aged 70–79, while only 0.4% (n=3) were under 20 years.
  • A significant female predominance is observed (80.7%, n=648 vs. 19.3% male, n=155).
  • Body Mass Index (BMI) distribution indicates 50.1% within the normal range (18.5–24), 33.1% overweight (24.1–28), and 9.2% obese (> 28).
  • Osteoporosis status is distributed as follows: 36% normal BMD (n=289), 36.2% osteopenia (n=291), and 27.8% osteoporosis (n=223), with a mean BMD of 0.95 g/cm² and a mean T-score of -1.38.
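
The BMI ranges quoted in the list above can be expressed as a small helper. This is a sketch under assumptions: height is taken to be recorded in centimetres, values below 18.5 are treated as underweight (the list does not name this group), and the boundary handling between 24 and 24.1 is a choice made here, not something the dataset documentation specifies.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (cm, assumed unit)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Map a BMI value to the ranges used in the data composition summary."""
    if value < 18.5:
        return "underweight"   # assumed name for the group below the normal range
    if value <= 24.0:
        return "normal"
    if value <= 28.0:
        return "overweight"
    return "obese"
```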

(3) Data analysis. The results reveal key relationships consistent with established clinical knowledge:

  • (a) The strong linear correlation between T-score and DXA-BMD aligns with the diagnostic definition of T-scores as BMD derivatives.
  • (b) The positive BMI-BMD relationship reflects the biomechanical principle that greater weight-bearing load stimulates the increase of bone density.
  • (c) The negative age-BMD correlation matches the universal trajectory of bone loss after peak bone mass.
  • (d) Osteoporosis was more prevalent in women than in men (39.2% vs. 23.9%), while men had a higher rate of normal BMD (31.8% vs. 11.0% in women), consistent with global epidemiological patterns of postmenopausal bone loss.
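
For point (a), the standard definition of the T-score makes the linear relationship explicit. The symbols below are introduced here for illustration: the mean and standard deviation refer to a young-adult reference population, whose exact values depend on the DXA device and reference database and are not given in this document.

```latex
T \;=\; \frac{\mathrm{BMD} - \mu_{\text{young}}}{\sigma_{\text{young}}}
\qquad\Longrightarrow\qquad
T \;=\; \frac{1}{\sigma_{\text{young}}}\,\mathrm{BMD} \;-\; \frac{\mu_{\text{young}}}{\sigma_{\text{young}}}
```

For a fixed reference population the T-score is therefore an affine function of BMD with slope $1/\sigma_{\text{young}}$. Pooling patients whose reference values differ (for example by sex or measured region) would be expected to give a strong but not perfectly linear correlation.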

Examples of X-ray and CT images

Here are some examples of X-ray and CT images in the LUMOS dataset. CT images of an osteoporotic case reveal a more porous trabecular structure and a thinner cortical layer than those of a normal case or an osteopenia case.

Examples of the clinical data

Here are some examples of the clinical data in the LUMOS dataset, including 3 demographic fields (age, gender, and ethnicity), 3 anthropometric measurements (height, weight, and body mass index, BMI), the osteoporosis label (0 = normal, 1 = osteopenia, 2 = osteoporosis), and 7 lumbar DXA measurements for each of the regions L1-L2, L1-L3, L1-L4, L2-L3, L2-L4, and L3-L4 (BMD, T-score, Z-score, bone mineral content (BMC), area, and computed average vertebral height and width).
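
The label encoding above pairs naturally with the widely used WHO T-score thresholds (normal at -1.0 or above, osteoporosis at -2.5 or below, osteopenia in between). The sketch below assumes LUMOS labels follow those thresholds; the document does not state which T-score or region determines the label, so treat this as illustrative rather than as the dataset's labelling rule.

```python
# Label codes as listed in the clinical data description.
NORMAL, OSTEOPENIA, OSTEOPOROSIS = 0, 1, 2


def label_from_t_score(t_score: float) -> int:
    """Map a T-score to a label using WHO thresholds (assumed to apply to LUMOS)."""
    if t_score <= -2.5:
        return OSTEOPOROSIS
    if t_score < -1.0:
        return OSTEOPENIA
    return NORMAL
```

Under this rule, the dataset's mean T-score of -1.38 would fall in the osteopenia range.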

Potential Applications

Osteoporosis Classification Task

We test various osteoporosis screening methods on the LUMOS dataset using accuracy (ACC) and F1 score (F1) as metrics; the results are shown in the table below. X-rays, accessible even in resource-limited areas, can be analyzed by machine learning algorithms to identify features such as cortical bone thinning, trabecular pattern alterations, and fractures, offering a cost-effective initial screening method. CT scans, with their high-resolution cross-sectional views, allow for the development of more sophisticated models that detect subtle microarchitectural changes not easily visible on X-rays, facilitating more accurate and nuanced diagnosis. Multimodal models that combine X-ray and CT leverage the strengths of each data source to capture complex relationships between imaging features and patient factors, achieving higher accuracy in osteoporosis diagnosis and prognosis than single-modality methods.
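
For reference, the two metrics can be computed as follows for the three-class label scheme (0, 1, 2). This is a minimal sketch that assumes macro-averaged F1; the document does not state which averaging the reported F1 uses.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = (0, 1, 2)) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```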

Bone mineral density (BMD) Prediction Task

We conduct BMD prediction experiments with multiple methods using LUMOS' multimodal data and measure prediction accuracy with Root Mean Square Error (RMSE) and Pearson Correlation Coefficient (PCC). The results are shown in the above table. By analyzing visual cues such as bone radiolucency and trabecular pattern density in lumbar X-rays, models trained on this dataset can estimate BMD, offering a viable option for initial assessments. The CT modality, with its high resolution, enables in-depth analysis of bone microarchitecture and thus more precise BMD prediction. The dataset's multimodal nature is most valuable here: multimodal models can integrate the quick initial assessments from X-rays, the detailed structural analysis from CT, the reference values from DXA, and the patient-specific factors from metadata. This integration helps overcome the limitations of individual modalities, supporting more accurate and reliable BMD predictions for osteoporosis management.
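
The two regression metrics named above have standard definitions, sketched here in plain Python. The function names are illustrative and the inputs are assumed to be equally long sequences of true and predicted BMD values.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must be equally long with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

The two metrics are complementary: a model with a constant offset from the true BMD can reach a PCC of 1 while still having a non-zero RMSE.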

Medical Image Segmentation Task

We use the point-prompt and box-prompt interfaces of the Segment Anything Model (SAM) and demonstrate robust automated segmentation results on LUMOS images, as illustrated in the above figure. Although these approaches may not match the accuracy of fully supervised methods trained with high-fidelity ground truth, they validate the LUMOS dataset's utility for segmentation research and highlight its potential to drive innovation in medical image segmentation.
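
A prompt-based call might look like the sketch below. It assumes a predictor object exposing `set_image` and `predict` in the style of `SamPredictor` from the `segment_anything` package, with points given as (x, y) pixels and boxes in XYXY format. Loading the model and checkpoint is omitted, the helper names are hypothetical, and the grayscale-to-RGB conversion is one simple choice, not necessarily the preprocessing used for the results above.

```python
import numpy as np


def to_rgb_uint8(gray):
    """Min-max scale a 2-D grayscale image to uint8 and replicate it to HxWx3."""
    gray = np.asarray(gray, dtype=np.float64)
    lo, hi = gray.min(), gray.max()
    scaled = np.zeros_like(gray) if hi == lo else (gray - lo) / (hi - lo)
    u8 = np.round(scaled * 255).astype(np.uint8)
    return np.stack([u8, u8, u8], axis=-1)


def segment_with_prompts(predictor, gray, point_xy=None, box_xyxy=None):
    """Run one point and/or box prompt and return (mask, score).

    `predictor` is assumed to behave like segment_anything.SamPredictor.
    """
    if point_xy is None and box_xyxy is None:
        raise ValueError("provide a point prompt, a box prompt, or both")
    predictor.set_image(to_rgb_uint8(gray))
    point_coords = point_labels = box = None
    if point_xy is not None:
        point_coords = np.array([point_xy], dtype=np.float32)  # shape (1, 2), (x, y)
        point_labels = np.array([1], dtype=np.int64)           # 1 marks foreground
    if box_xyxy is not None:
        box = np.array(box_xyxy, dtype=np.float32)             # (x0, y0, x1, y1)
    masks, scores, _ = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        box=box,
        multimask_output=False,  # request a single mask for the prompt
    )
    return masks[0], float(scores[0])
```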

Medical Image Synthesis Task

In regions with scarce medical resources, synthesizing CT images from X-rays holds significant clinical value. Previous studies such as X2CTGAN have attempted to use GANs to generate CT from two orthogonal X-rays. However, the scarcity of paired X-ray-CT datasets in medical research has forced reliance on synthetic X-rays created via CT-derived DRR (digitally reconstructed radiograph) projection for training and validation. The LUMOS dataset addresses this critical gap by providing paired CT and orthogonal X-ray images, offering a rare real-world dataset for X-ray-to-CT image synthesis tasks. Moreover, inspired by recent advances in neural radiance fields (NeRFs) for 3D image generation from a single natural image, LUMOS enables the exploration of NeRF-based methods that generate 3D CT-like volumetric data from single-view X-rays, removing the need for multi-angle acquisitions. LUMOS' aligned, multimodal imaging of the lumbar spine provides the essential foundation for developing and validating these techniques, helping translate AI-driven imaging solutions to resource-constrained environments.
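
To make the DRR idea concrete, the sketch below projects a CT volume into a synthetic radiograph with a parallel-beam simplification of the Beer-Lambert law. It is not the pipeline of any cited method: real DRR generators model cone-beam geometry and energy spectra, the water attenuation constant is a rough assumed value, and which array axis corresponds to an anteroposterior or lateral view depends on how the volume is oriented.

```python
import numpy as np

# Rough linear attenuation of water at diagnostic energies, about 0.2 per cm (assumed).
MU_WATER_PER_MM = 0.02


def hu_to_attenuation(volume_hu):
    """Convert Hounsfield units to linear attenuation: mu = mu_water * (1 + HU / 1000)."""
    mu = MU_WATER_PER_MM * (1.0 + np.asarray(volume_hu, dtype=np.float64) / 1000.0)
    return np.clip(mu, 0.0, None)  # attenuation cannot be negative


def parallel_drr(volume_hu, axis: int, voxel_size_mm: float):
    """Transmitted intensity fraction for parallel rays travelling along `axis`."""
    mu = hu_to_attenuation(volume_hu)
    line_integral = mu.sum(axis=axis) * voxel_size_mm  # integral of mu along each ray
    return np.exp(-line_integral)                      # Beer-Lambert: I / I0
```

Projecting the same volume along two perpendicular axes gives a pair of orthogonal synthetic views, which is the kind of input that paired real X-rays in LUMOS can replace.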

Medical Image Standardization Task

The LUMOS dataset encompasses a wide variety of images, with differences in magnetic field strengths, number of slices, slice thickness, and scanner manufacturers. This diversity makes it an ideal resource for developing domain generalization and image standardization techniques. Image standardization aims to transform images from different sources into a consistent format, reducing variability and improving the performance of AI models across different datasets and imaging devices. Researchers can use the LUMOS dataset to explore methods that normalize image intensity, align anatomical structures, and standardize image resolution. Such methods can lead to more effective image analysis techniques and may improve the diagnosis and treatment of diseases beyond osteoporosis.
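
Two of the standardization steps mentioned above, intensity normalization and harmonizing slice thickness, can be sketched with NumPy alone. The window settings and spacings used with these helpers are illustrative assumptions, and linear interpolation along the slice axis is only one of several possible resampling choices.

```python
import numpy as np


def window_normalize(volume_hu, center: float, width: float):
    """Clip intensities to a window and rescale them to the range [0, 1]."""
    if width <= 0:
        raise ValueError("width must be positive")
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(np.asarray(volume_hu, dtype=np.float64), lo, hi)
    return (clipped - lo) / (hi - lo)


def resample_slices(volume, spacing_mm: float, target_mm: float):
    """Linearly interpolate along axis 0 to change the slice spacing."""
    volume = np.asarray(volume, dtype=np.float64)
    n = volume.shape[0]
    extent = (n - 1) * spacing_mm
    n_new = int(np.floor(extent / target_mm + 1e-9)) + 1
    pos = np.arange(n_new) * target_mm / spacing_mm      # fractional source indices
    lo = np.minimum(np.floor(pos).astype(int), n - 1)
    hi = np.minimum(lo + 1, n - 1)
    # Interpolation weights, shaped to broadcast over the in-plane axes.
    w = (pos - lo).reshape(-1, *([1] * (volume.ndim - 1)))
    return (1.0 - w) * volume[lo] + w * volume[hi]
```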

LUMOS © 2025 by Keyue Shi is licensed under CC BY-NC 4.0