Experimenting with Deep Neural Networks for X-ray Image Segmentation

Sergey Kovalev


Deep neural networks are of great interest to the field of medical image segmentation. This article shares the results of the exploratory phase of research aimed at examining the potential of deep learning methods and encoder-decoder convolutional neural networks for lung image segmentation. The study was conducted by our partners at the Biomedical Image Analysis Department of the United Institute of Informatics Problems, National Academy of Sciences of Belarus.


Training data set

The training data set consisted of 354 chest X-ray images accompanied by the lung masks obtained through manual segmentation. Two different image sources were used:

  • 107 images from the Belarus tuberculosis portal manually segmented during the preliminary phase of this project
  • 247 images from the Japanese Society of Radiological Technology (JSRT) database

Examples of the original images and corresponding lung masks are illustrated in the following figure.


Examples of X-rays and corresponding lung masks


Network architecture and training parameters

In the figure below, you can find the neural network architecture that was used during the study.


Simplified scheme of encoder-decoder neural network architecture

The network had a typical deep architecture with the following key elements:

  • 26 convolutional layers
  • 25 batch normalization layers
  • 25 ReLU layers
  • 5 upsampling layers

All experiments and testing were performed using the Caffe framework. The input and output network fragments are illustrated in the figure below.



Input (top) and output (bottom) network elements

The neural network was trained on an Nvidia TITAN X graphics processor with 12 GB of GDDR5 memory. The network training parameters were:

  • Batch size: 6
  • Caffe solver: SGD
  • Number of iterations: 5,000
  • Number of epochs: 85
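These figures are mutually consistent, which a quick calculation confirms:

```python
# Sanity check of the training parameters: with 354 training images
# and a batch size of 6, one epoch takes 354 / 6 = 59 iterations,
# so 5,000 iterations correspond to roughly 85 epochs.

num_images = 354
batch_size = 6
iterations = 5000

iters_per_epoch = num_images // batch_size   # 59
epochs = iterations / iters_per_epoch        # ~84.7

print(iters_per_epoch)  # 59
print(round(epochs))    # 85
```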

The total time of the neural network training was approximately three hours. During the training stage, the neural network used approximately 11 GB of GPU memory.



The resultant segmentation accuracy was assessed by comparing the automatically obtained lung areas with the manually segmented versions using Dice’s coefficient, which is calculated as:

  DSC(T, S) = 2|T ∩ S| / (|T| + |S|)

where:

  • T is the lung area resulting from manual segmentation, considered as ground truth.
  • S is the area obtained through automatic segmentation using the neural network.
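The metric can be computed directly from the binary masks; here is a minimal NumPy sketch (the function name and the toy masks are ours, for illustration only):

```python
import numpy as np

def dice_coefficient(t_mask, s_mask):
    """Dice's coefficient between a ground-truth mask T and a predicted mask S.

    Both arguments are boolean (or 0/1) arrays of the same shape.
    """
    t = np.asarray(t_mask, dtype=bool)
    s = np.asarray(s_mask, dtype=bool)
    intersection = np.logical_and(t, s).sum()
    return 2.0 * intersection / (t.sum() + s.sum())

# Toy 1-D example: the two masks agree on 3 foreground pixels,
# and each contains 4 foreground pixels, so DSC = 2*3 / (4+4) = 0.75.
t = np.array([1, 1, 1, 1, 0, 0])
s = np.array([0, 1, 1, 1, 1, 0])
print(dice_coefficient(t, s))  # 0.75
```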

During the testing stage, the average accuracy was estimated at 0.962 (the minimum score was 0.926 and the maximum score was 0.974) with a standard deviation of 0.008.

Examples of the best and worst segmentation results are given in figures below.


Examples of segmentation results with the maximum Dice score

The red area in the image above shows the result of segmentation using the trained neural network, and the white line marks the ground truth lung mask boundary.


Examples of segmentation results with the minimum Dice score

Similar to the previous image, the red area shows the result of segmentation using the trained neural network, and the white line marks the ground truth lung mask boundary.

The results obtained during this study demonstrate that encoder-decoder convolutional neural networks can be considered a promising tool for automatic lung segmentation in large-scale projects. For more details about the conducted research, see Lung Image Segmentation Using Deep Learning Methods and Convolutional Neural Networks.

The described scenario was implemented with the Caffe deep learning framework. If you have tried to use Deeplearning4j, TensorFlow, Theano, or Torch for this purpose, share your experience in the comments.


About the author

Sergey Kovalev is a senior software engineer with extensive experience in high-load application development, big data and NoSQL solutions, cloud computing, data warehousing, and machine learning. He has strong expertise in back-end engineering, applying the best approaches to development, architecture design, and scaling. Sergey also has a solid background in various software development practices, such as the Agile methodology, prototyping, patterns, refactoring, and code review. Now, his main interest lies in big data distributed computing and machine learning.

To keep up with the latest updates, subscribe to our blog or follow @altoros.


  • Where’s your code?

    • Hi, Yihui He,

      The source code is based on SegNet, “a deep convolutional encoder-decoder architecture for semantic pixel-wise labelling”:

      In particular, a modified version of the Caffe framework is used here (with the upsampling layer info for the decoding process). The tool was created by the authors of SegNet:

      More details in this research paper by our partners from the National Academy of Sciences (Belarus):

  • Zhu Hui

    You use only 354 images to train this deep CNN; how many images are you using for the test? Are they from other image sources or just from your training dataset?

    • Hi, Zhu Hui,

      As the amount of segmented data for the X-ray images was limited to

      • Zhu Hui

        Hi Alex, thanks for your information.

        In this case, I guess your deep CNN may benefit a lot from the original SegNet trained weights and/or the ~6,100 CT images. Since the number of X-ray images is not so critical here (200, 300, or 400 images might produce similar training results), it will be more convincing to split the 354 images into training/testing sets as usual, in order to showcase its reliability. 😛

  • Huaiyang Gongzi

    Hi Alex, it is a very nice post. Just curious, are there any Tensorflow implementation for the SegNet structure you are using? Thanks.

    • Hi, Huaiyang Gongzi,

      Unfortunately, there was no implementation for SegNet during that project. However, the team managed to create another related solution based on a different model. You may also find it interesting:

  • gsedej

    Hi Alex!

    I am trying to use Caffe (segnet) for machine segmentation of head
    x-ray images (bones and tissue) for my diploma. Your work on lung images
    would be very useful for me – I intend to use it for supervised
    training on even smaller dataset of head x-ray images (that i will
    manually segment for ground truth).

    I am looking for dataset that you used – x-ray and segmented images.

    There are only x-ray images in the databases you mentioned, but no segmented images. Did you
    manually segment ~400 images? Is it possible to also get the segmented
    images that you used? Also, your .caffemodel weights would be useful if

    Thank you for the reply!
    Gašper Sedej
