Medical device prediction using a Convolutional Neural Network (CNN)

Author: Agustín Bignú

Physicist – Machine Learning Engineer.

Keywords: Machine Learning – Deep Learning – silicone stent – tracheal prosthesis – Convolutional Neural Network

Abstract

This is the first paper of a more ambitious Stening® project that aims to revolutionize the market and production of medical devices and custom-made prostheses. In this introductory paper, we trained a neural network to distinguish six different classes of devices. The results are very encouraging, with a classification accuracy above 90%, and show the potential of this type of technology for the future of medicine.

Introduction

At Stening® we have always been interested in research and innovation, which is why we continually renew ourselves. Artificial intelligence is a rapidly growing computing discipline with an increasing impact on medicine.

The main objectives of the study are the following:

to strengthen the company's level of innovation and research
to make a difference in the market sector in which Stening® competes, as pioneers in this type of research
to raise awareness of artificial intelligence and machine learning

Type of study

This is a computational study in the field of artificial intelligence.

Theoretical foundation

This section introduces the technology used: what Artificial Intelligence (AI) is, what neural networks are, and the type of network used in this case study, the Convolutional Neural Network (CNN).

Artificial intelligence

AI is a branch of computing focused on creating programs that perform operations and tasks normally associated with human intelligence, such as learning by themselves.

Specifically, we will focus on machine learning, the discipline within AI concerned with programs that learn from data instead of following explicitly programmed rules. Machine learning has three clearly differentiated branches:

supervised
unsupervised
reinforcement

In supervised learning, algorithms receive data for which the correct answers (labels) are known; they learn from these examples and then use what they have learned to make predictions. Unsupervised learning differs in that the answers are not known, so the problem is approached differently: these algorithms analyze the dataset itself to find structure in it. Reinforcement learning, unlike the other two, does not rely on large amounts of data; instead, it learns by trial and error while interacting with an environment. This type of learning is used in autonomous driving, robotics, and AI for games.

In this study, supervised learning was used. We built a dataset of labeled images of medical devices from six different classes. The images were then fed into a neural network, which learned from them and was then able to classify new devices belonging to these classes. The next section explains what a neural network is.
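To make the supervised workflow concrete (learn from labeled examples, then predict labels for new ones), here is a minimal sketch. It deliberately swaps the neural network for a much simpler nearest-centroid classifier, and the two-dimensional feature vectors and class names "A" and "B" are hypothetical toy data, not the study's images.

```python
import numpy as np

def fit_centroids(X, y):
    """'Training': compute the mean feature vector (centroid) of each labeled class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, X_new):
    """'Prediction': assign each new sample to the class with the nearest centroid."""
    # distances[i, k] = Euclidean distance from sample i to the centroid of class k
    distances = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]

# Hypothetical toy data: two features per sample and two labeled classes.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
y_train = np.array(["A", "A", "B", "B"])

classes, centroids = fit_centroids(X_train, y_train)
print(predict(classes, centroids, np.array([[0.1, 0.2], [1.0, 1.0]])))  # ['A' 'B']
```

The same two phases, fitting on labeled data and then predicting unseen samples, apply to the CNN; only the model in between is far more powerful.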

Neural Networks¹

Artificial neural networks are inspired by the workings of the brain. Humans can look at an image and describe what appears in it, or listen to a piece of music and recognize its genre. We can do this because we have seen or heard similar examples before and learned from them; in effect, we are applying supervised learning.

As stated earlier, the main objective of AI is to recreate this type of behavior and learning. Neural networks are one of the algorithms created for this purpose. They connect nodes called neurons, so named because they try to emulate the neurons of the brain, which process the information we receive through the senses: sight, hearing, touch, etc.
Figure 1: neural network²

Figure 1 shows a fairly simple neural network. The circles are the nodes, or neurons. The network consists of two layers: an input layer (at the bottom) and an output layer (at the top). The "x" are the data we feed into the network and the "y" are the outputs it produces. The "w" are numerical parameters called weights, which encode what the network has learned; they are updated as the network trains. Training stops once the network reaches the level of accuracy appropriate for our objective.
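The idea of weights that are updated during training can be sketched with the classic perceptron learning rule: a single layer of weights connecting the inputs to one output, adjusted after every mistake. This is a swapped-in, much simpler technique, not the way the CNN in this study was trained (Keras trains networks with gradient-based optimization), and the OR data is a hypothetical toy example. Training stops when every training example is classified correctly, a simple version of stopping once the required accuracy is reached.

```python
import numpy as np

def step(z):
    """Threshold activation: the output neuron fires (1) when its weighted input is positive."""
    return (z > 0).astype(int)

def add_bias(X):
    """Prepend a constant input of 1 so that the first weight acts as a bias."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Perceptron learning rule: adjust the weights w after every misclassified example."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for epoch in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(Xb, y):
            error = y_i - step(x_i @ w)
            if error != 0:
                w += lr * error * x_i  # the weight update: this is where learning happens
                mistakes += 1
        if mistakes == 0:  # stop once every training example is classified correctly
            return w, epoch + 1
    return w, max_epochs

def predict(w, X):
    return step(add_bias(X) @ w)

# Hypothetical toy data: the logical OR of two inputs x1 and x2.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w, epochs = train_perceptron(X, y)
print(w, epochs)       # [0. 1. 1.] 4
print(predict(w, X))   # [0 1 1 1]
```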

A neural network can have more than two layers of neurons. In practice, the most widely used networks include intermediate layers between the input and the output, called hidden layers. This case study uses a type of neural network particularly well suited to image-recognition tasks, the Convolutional Neural Network, explained in the following section.
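To illustrate what a hidden layer adds, here is a minimal sketch of a tiny network with one hidden layer of two neurons and hand-picked (not learned) weights. With these weights the network computes XOR (output 1 only when exactly one input is 1), which a network without hidden layers and a single threshold output cannot represent, because XOR is not linearly separable. The weights and the ReLU activation are illustrative assumptions.

```python
import numpy as np

def relu(z):
    """Rectified linear activation used in the hidden layer."""
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> one hidden layer -> output."""
    h = relu(W1 @ x + b1)  # hidden-layer activations
    return W2 @ h + b2     # output-layer value

# Hand-picked (not trained) weights that make the network compute XOR.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, -2.0]])
b2 = np.array([0.0])

outputs = [float(forward(np.array(x, dtype=float), W1, b1, W2, b2)[0])
           for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```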

Convolutional Neural Network³

A CNN is a type of neural network inspired by the structure and function of the early stages of the brain's visual system. This type of network was described by LeCun et al. in 1998⁴. A CNN is composed of two main parts: one extracts information (features) from the image and the other classifies it.

Figure 2: basic components of a CNN⁵

The first part of the network, feature extraction, pulls information out of the image by applying an operation called convolution. Here the network extracts the main characteristics of the image; in our case, the shape, position, and length of the device, among others.
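The convolution operation itself can be sketched in a few lines of NumPy. A small kernel slides over the image and, at each position, computes the weighted sum of the pixels beneath it, so the output is large wherever the image matches the pattern encoded in the kernel. The toy image and the hand-picked vertical-edge kernel are assumptions for illustration; in a CNN the kernel values are weights learned during training, and each layer applies many kernels.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNN layers.
    (Strictly this is cross-correlation: like deep-learning libraries, we do not flip the kernel.)"""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hand-picked vertical-edge kernel; in a CNN these 9 values would be learned weights.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

print(conv2d(image, kernel))
# [[3. 3. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]
```

The response is 3 where the window straddles the dark-to-bright edge and 0 over the uniform bright region; note also that the 5 × 5 image yields a 3 × 3 map.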

Broadly speaking, this stage progressively reduces the spatial dimensions of the image (mainly through pooling layers placed between the convolutional layers) while keeping the most relevant information, the part that contributes most to classifying the image correctly. This first phase contains most of the network's layers.
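In most CNNs, including VGG16, much of this dimension reduction comes from pooling layers interleaved with the convolutions. The sketch below shows 2 × 2 max pooling, the variant VGG16 uses, which keeps only the largest value in each window and halves the height and width of a feature map. The input values are arbitrary toy data.

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest value in each size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]  # drop edge rows/columns that do not fill a whole window
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
print(max_pool2d(fmap))
# [[ 5  7]
#  [13 15]]

# Each 2x2 pooling halves height and width, e.g. a 224x224 map becomes 112x112.
print(max_pool2d(np.zeros((224, 224))).shape)  # (112, 112)
```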

Once the first phase is over, the second phase, classification, begins. This phase contains only two layers of neurons: the first receives the features produced by the first phase and passes them to the output layer, which indicates the class to which the input image belongs.
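The classification phase can be sketched as a flatten step followed by two layers, with a softmax at the output so that the six scores become probabilities. All sizes, the ReLU activation, and the random (untrained) weights are hypothetical, chosen only to show how data flows through the head; the order of the class names, taken from figure 3, is also an assumption.

```python
import numpy as np

CLASSES = ["HE", "SET", "SY13", "SY16", "TF12", "TM"]  # the six device classes of figure 3

def softmax(z):
    """Convert raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(feature_maps, W1, b1, W2, b2):
    """Two-layer classification head: flatten -> dense + ReLU -> dense + softmax."""
    x = feature_maps.ravel()               # flatten the extracted feature maps into one vector
    h = np.maximum(W1 @ x + b1, 0.0)       # first layer: receives the extracted features
    return softmax(W2 @ h + b2)            # output layer: one probability per class

# Hypothetical sizes and random (untrained) weights, purely to show the data flow.
rng = np.random.default_rng(0)
feature_maps = rng.random((4, 4, 8))       # stand-in for the output of the feature-extraction part
W1, b1 = rng.normal(size=(32, 4 * 4 * 8)), np.zeros(32)
W2, b2 = rng.normal(size=(len(CLASSES), 32)), np.zeros(len(CLASSES))

probs = classify(feature_maps, W1, b1, W2, b2)
print(probs.round(3), "->", CLASSES[int(np.argmax(probs))])
```

With untrained weights the predicted class is arbitrary; training adjusts W1, b1, W2 and b2 so that the highest probability lands on the correct device.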

Results

In this section we describe how the dataset was built and the programs used to obtain the results, and then analyze those results.

Dataset

We built the dataset ourselves. Because high classification accuracy requires a large amount of data, we had to take an innovative approach: to reach a classification accuracy above 90%, a large number of images per class (more than 500) is needed. Since we did not have that many photographs, we modeled the devices in Blender 2.9, a free 3D design program available from its website (Blender: www.blender.org). All of the models are based on the stents manufactured by Stening®.

The resulting 3D models were the following:

Figure 3: 3D devices generated with the Blender program

In figure 3: A) HE (Thin-Wall Stent), B) SET (Tracheal Stenosis Stent), C) SY13 (Stening® Y), D) SY16 (Stening® Y), E) TF12 (Pharyngeal Tube) and F) TM (T-tube).

For each class, 1,400 images were generated: 1,000 for training and the remaining 400 for testing (prediction). To add variety to the images, the devices were rotated into different positions, encouraging the network to rely on a range of different features.
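The per-class split adds up as follows: 6 classes × 1,000 images gives 6,000 training images, and 6 × 400 gives 2,400 test images. The sketch below builds such a split. The file names are hypothetical, and the shuffle before splitting is an assumption, since the study does not say how the 1,000 training images of each class were chosen.

```python
import random

CLASSES = ["HE", "SET", "SY13", "SY16", "TF12", "TM"]  # the six device classes of figure 3
IMAGES_PER_CLASS = 1400
TRAIN_PER_CLASS = 1000                                   # the remaining 400 go to the test set

def split_class(image_names, n_train=TRAIN_PER_CLASS, seed=0):
    """Shuffle one class's images and split them into training and test subsets."""
    names = list(image_names)
    random.Random(seed).shuffle(names)  # assumption: the study does not say whether images were shuffled
    return names[:n_train], names[n_train:]

train, test = [], []
for label in CLASSES:
    # Hypothetical file names, for illustration only.
    names = [f"{label}_{i:04d}.png" for i in range(IMAGES_PER_CLASS)]
    train_names, test_names = split_class(names)
    train += [(name, label) for name in train_names]
    test += [(name, label) for name in test_names]

print(len(train), len(test))  # 6000 2400
```

Splitting within each class, rather than over the pooled images, guarantees that every device type has exactly 400 test images.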

Programs and results

We wrote two programs in Python 3.5. The first contains the neural network and performs the training; the second receives an image the network has never seen and predicts its class. We used the Keras library for machine learning and OpenCV to display the images.

The first program contains the neural network, which is built on a CNN called VGG16⁶, a 16-layer network for image classification. We removed the last layers of VGG16, which perform the classification, and replaced them with our own so that the network recognizes our device classes.
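The "16" in VGG16 counts its weight layers: 13 convolutional layers plus 3 fully connected ones. The sketch below encodes the published VGG16 layout and computes what the convolutional base outputs once the fully connected layers are removed. It assumes VGG16's standard 224 × 224 RGB input, which the study does not state. In Keras, the convolutional base alone is commonly loaded with `keras.applications.VGG16(include_top=False)`; whether the original program did exactly this is not stated.

```python
# VGG16 ("configuration D" in Simonyan and Zisserman): each number is a 3x3 convolutional
# layer with that many filters; "M" is a 2x2 max-pooling layer with stride 2.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # original classifier: three fully connected layers (1000 ImageNet classes)

def conv_base_output(size=224, config=VGG16_CONV):
    """Output shape of the convolutional base for a size x size RGB input.
    The 3x3 convolutions use 'same' padding, so only the pooling layers shrink the image."""
    channels = 3
    for layer in config:
        if layer == "M":
            size //= 2
        else:
            channels = layer
    return size, size, channels

n_conv = sum(1 for layer in VGG16_CONV if layer != "M")
print(n_conv, len(VGG16_FC), n_conv + len(VGG16_FC))  # 13 3 16

h, w, c = conv_base_output()
print((h, w, c), h * w * c)  # (7, 7, 512) 25088
```

Under these assumptions, the new classification layers would receive 7 × 7 × 512 = 25,088 values per image (after flattening) and end in six outputs, one per device class, in place of VGG16's original 1,000.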

The training and prediction results were the following:
Figure 4: training and prediction results

Figure 4 contains two graphs. The first shows the accuracy of the predictions over the course of training, on both the training and the test images. Training ran for a total of 25 epochs, and by the last epoch the accuracy exceeded 90% on both sets. The second graph shows the training and test loss, which measures the error between the network's predictions and the correct labels. The loss is very high in the first epoch and decreases as training progresses, until it almost reaches zero.
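The two quantities plotted in figure 4 can be computed as in the sketch below. Accuracy is the fraction of images whose most probable predicted class is the correct one. The study does not name the loss function it used; categorical cross-entropy, shown here, is the usual choice for multi-class classification in Keras, so treat it as an assumption. The probabilities and labels are hypothetical.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean of -log(probability given to the correct class): 0 for perfect, confident predictions."""
    p_correct = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_correct, eps, 1.0))))

# Hypothetical predicted probabilities for 4 images over 3 classes, and their true labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])

print(accuracy(probs, labels))                              # 0.75
print(round(categorical_cross_entropy(probs, labels), 3))   # 0.675
```

As training makes the network more confident and more often correct, accuracy rises toward 1 and the loss falls toward 0, which is the pattern described for figure 4.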

Conclusions

As we have seen, the results were very good, with an accuracy above 90%. Looking to the future, the dataset could be improved; in particular, the next step is to train the network on real photographs of the devices and evaluate the results.

References

1. E. Alpaydin. Introduction to Machine Learning, Second Edition. The MIT Press, 2010.
2. E. Alpaydin. Introduction to Machine Learning, Second Edition. The MIT Press, 2010. Fig. 11.2.
3. Convolutional Neural Networks. https://medium.com/x8-the-ai-community/cnn-9c5e63703c3f
4. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998.
5. ResearchGate. https://www.researchgate.net/figure/An-Example-CNN-architecture-for-a-handwritten-digit-recognition-task_fig1_220785200 Fig. 1.
6. K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, 2014.
