MIRACLE and Beijing Jishuitan Hospital jointly open-source the CTSpine1K dataset


The spine is an important part of the musculoskeletal system: it sustains body mobility and protects the spinal cord, the most important neural pathway in the body. The spine consists of 7 cervical vertebrae (C1-C7), 12 thoracic vertebrae (T1-T12), 5 lumbar vertebrae (L1-L5), the sacrum, and the coccyx. In rare cases, a person has a sixth lumbar vertebra, L6 (resulting from sacral lumbarization), or lacks L5 (resulting from lumbar sacralization). Because every vertebra bears part of the load of the human body, each one is at risk of disease. The spinal diseases most often seen in the clinic are spinal degenerative diseases and intervertebral disc herniation. Common spinal degenerative diseases include cervical spondylosis, lumbar spondylosis, and thoracic spinal stenosis [1,2]. Disc herniation includes cervical, thoracic, and lumbar disc herniation, of which lumbar disc herniation is the most common. Spine-related diseases have high morbidity and impose a heavy social cost.

Traditionally, because of the high contrast between bone and soft tissue, CT is the preferred method to study the spine. CT image segmentation of the spine is a key step in the automatic quantification of spine morphology and pathology. In recent years, deep learning has achieved remarkable success in various medical imaging applications, and many automatic spinal CT image segmentation methods have been proposed. However, these methods are all data-driven, and they have been validated only on private or small public datasets. For example, SpineWeb [3], a commonly used multimodal spine database, contains only two CT datasets, CSI2014 and xVertSeg, each of which contains only dozens of CT scans. To address the scarcity of large-scale data, Sekuboyina et al. organized the Large Scale Vertebrae Segmentation Challenge (VerSe), including VerSe’19 and VerSe’20 [4,5]. For VerSe’19, they released 160 CT scans to the public. For VerSe’20, they released 300 CT scans (including 86 from VerSe’19), which is also the largest public spine CT dataset so far. Nevertheless, these two datasets are still very small. In addition, most of the CT scans in these two datasets are cropped: each CT slice contains only a small region around the vertebrae, and the surrounding image information has been removed. Therefore, a large-scale spinal CT dataset is urgently needed.

To promote research on spinal image analysis, we have released a large-scale spinal dataset: CTSpine1K. Under the guidance of a team of expert physicians from Beijing Jishuitan Hospital, we collected and annotated a large spine CT dataset from multiple domains and different manufacturers, comprising 1,005 CT scans (more than 500,000 CT slices and 11,000 vertebrae), as shown in Table 1. We carefully designed the labeling process to ensure the quality of the annotations, as shown in Figure 1. To the best of our knowledge, CTSpine1K is the largest publicly available labeled spine CT dataset.
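As a rough illustration of how per-vertebra annotations can be handled in code, the sketch below maps integer mask labels to the vertebra names used above (C1-C7, T1-T12, L1-L5, plus the occasional L6) and lists which vertebrae appear in a mask. The label numbering (1 to 25, with 0 as background), the function names, and the toy mask are assumptions made for this example; they are not taken from the CTSpine1K release, so check the repository for the actual label convention.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vertebra_labels(include_l6: bool = True) -> Dict[int, str]:
    """Map hypothetical integer mask labels to vertebra names.

    Assumed numbering (not confirmed by the dataset): 0 is background,
    1-7 are C1-C7, 8-19 are T1-T12, 20-24 are L1-L5, and 25 is L6.
    """
    names = [f"C{i}" for i in range(1, 8)]      # 7 cervical vertebrae
    names += [f"T{i}" for i in range(1, 13)]    # 12 thoracic vertebrae
    # 5 lumbar vertebrae, plus L6 for cases of sacral lumbarization
    names += [f"L{i}" for i in range(1, 6 + int(include_l6))]
    return {idx: name for idx, name in enumerate(names, start=1)}


def vertebrae_present(mask_values: Iterable[int],
                      labels: Dict[int, str]) -> List[str]:
    """Return the names of the vertebrae found in a flattened mask,
    ordered from the top of the spine downward."""
    counts = Counter(mask_values)
    # Background (0) and unknown labels are skipped.
    return [labels[i] for i in sorted(counts) if i in labels]


VERTEBRA_LABELS = build_vertebra_labels()

# Toy flattened mask: background voxels plus three lumbar vertebrae.
toy_mask = [0, 0, 20, 20, 21, 0, 22, 22, 22]
print(vertebrae_present(toy_mask, VERTEBRA_LABELS))  # ['L1', 'L2', 'L3']
```

With a real annotation, the flattened voxel values of the segmentation volume would take the place of `toy_mask`.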

Open source

We are open-sourcing this dataset together with a pre-trained nnU-Net model [6]. We believe that this large dataset will facilitate further study of many spine-related image analysis tasks, including but not limited to spinal segmentation, landmark detection, three-dimensional spine reconstruction from biplanar X-ray images, image super-resolution, and image enhancement. For a detailed description of the dataset and download instructions, please refer to the following resources:
Paper: Yang Deng, Ce Wang, Yuan Hui, et al. CTSpine1K: A large-scale dataset for spinal vertebrae segmentation in computed tomography. arXiv preprint arXiv:2105.14711 (2021). (https://arxiv.org/abs/2105.14711)
Code and data: https://github.com/ICT-MIRACLE-lab/CTSpine1K
More open-source projects from MIRACLE: http://miracle.ict.ac.cn/?page_id=1525


[1] Faruqi, S. et al. Vertebral compression fracture after spine stereotactic body radiation therapy: a review of the pathophysiology and risk factors. Neurosurgery 83, 314–322 (2018).
[2] Balériaux, D. L. & Neugroschl, C. Spinal and spinal cord infection. Eur. Radiol. Suppl. 14, E72–E83 (2004).
[3] http://spineweb.digitalimaginggroup.ca/
[4] https://verse2019.grand-challenge.org/
[5] https://verse2020.grand-challenge.org/
[6] Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).