Virus X-ray Image Classification with TensorFlow Keras (Python) and Apache Spark (Scala)

Disclaimer:

This article is intended exclusively for educational purposes in the field of computer science. Only a radiologist certified by a government medical board can and should make a diagnosis from an X-ray image.

Introduction

Can an average person tell a picture of a cat from a picture of a dog? Probably yes. Can an average person look at an X-ray image and tell normal from virus-caused pneumonia? Not unless that person is a board-certified medical professional.
As a computer science exercise in machine learning: can a computer, after being trained on a dataset of labeled X-ray images that are taken as ground truth, tell normal from virus-caused pneumonia in an X-ray image? That is what this article sets out to find out.

Data Preparation

To begin with, I downloaded the X-ray image dataset from Kaggle (Coronahack chest Xray dataset) and built a neural network with TensorFlow Keras to train the machine.
A dataset for image recognition is usually not a single file of features and labels. It consists of many image files, such as JPEGs, plus a CSV file that gives the file name and label of each image.
For image classification, a common practice is to create one folder per label, name the folder after the label, and place all image files that belong to that label inside it.
Therefore, I placed the files in the directory structure below:
Train:
./
├── normal
└── virus
Validation:
./
├── normal
└── virus
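The folder-per-label layout described above can be produced from a metadata CSV with a few lines of Python. This is a minimal sketch, not the script used for this article: the function name `organize_by_label` and the column names `filename` and `label` are assumptions for illustration, so adjust them to match the header of the actual CSV file.

```python
import csv
import shutil
from pathlib import Path


def organize_by_label(csv_path, image_dir, out_dir,
                      filename_col="filename", label_col="label"):
    """Copy each image listed in the CSV into out_dir/<label>/.

    The column names are hypothetical defaults; pass the real ones
    from your metadata file. Returns the number of files copied.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    copied = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            src = image_dir / row[filename_col]
            if not src.is_file():
                continue  # skip rows whose image file is missing
            # Folder name is the lower-cased label, e.g. "normal" or "virus"
            dest_dir = out_dir / row[label_col].strip().lower()
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest_dir / src.name)
            copied += 1
    return copied
```

Calling it once for the training images and once for the validation images would give the two trees shown above. Copying rather than moving keeps the original download intact.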

Data Preprocessing

Exploring the image size with the Apache Spark image read API (Scala)

First, determine the image size with the following Scala code, which uses the Apache Spark image read API:
val df = spark.read.format("image").option("dropInvalid", true).load("file:///home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/train/normal")
df.select("image.origin", "image.width", "image.height").show(3)

/*
+--------------------+-----+------+
|              origin|width|height|
+--------------------+-----+------+
|file:///home/bigd...| 2619|  2628|
|file:///home/bigd...| 2510|  2543|
|file:///home/bigd...| 2633|  2578|
+--------------------+-----+------+
only showing top 3 rows
*/

Scala code to resize the JPEG images

The images are large, around 2500*2500 pixels, or about 6 megapixels. If every pixel is treated as a feature, or a column, this is like a table that has about 6 million columns.
Therefore, I need to downsize the images. I wrote the following Scala code to resize them from about 2500*2500 to 300*350, which is about 0.1 megapixels (105,000 pixels).
import java.awt.Image
import java.awt.image.BufferedImage
import java.io.File
import javax.imageio.ImageIO

// Get the paths of the training image files, both normal and virus
val normal = new File("/home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/train/normal/").listFiles
//val bacteria = new File("/home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/train/bacteria/").listFiles
val virus = new File("/home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/train/virus/").listFiles

/*
 * Helper function: resize each image file to the desired width and height
 * and save the resized image file into the target folder `base`.
 */
def resizeImage(images: Array[File], base: String, width: Int, height: Int): Unit = {
  for (file <- images) {
    // Load the image from disk; ImageIO.read returns null for non-image files
    val originalImage: BufferedImage = ImageIO.read(file)
    if (originalImage != null) {
      // Resize
      val resized = originalImage.getScaledInstance(width, height, Image.SCALE_DEFAULT)
      // Draw the resized image into an RGB buffer and save it back to disk
      val bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
      bufferedImage.getGraphics.drawImage(resized, 0, 0, null)
      ImageIO.write(bufferedImage, "JPEG", new File(base + file.getName))
    }
  }
}

// Resize the train/normal images to 300*350 and save them into the target path
resizeImage(normal, "/mnt/common/20200510/train/normal/", 300, 350)

// Resize the train/virus images to 300*350 and save them into the target path
resizeImage(virus, "/mnt/common/20200510/train/virus/", 300, 350)

// Get the paths of the validation image files, both normal and virus
val normalV = new File("/home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/validation/normal/").listFiles
val virusV = new File("/home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/validation/virus/").listFiles

// Resize the validation/normal images to 300*350 and save them into the target path
resizeImage(normalV, "/mnt/common/20200510/validation/normal/", 300, 350)

// Resize the validation/virus images to 300*350 and save them into the target path
resizeImage(virusV, "/mnt/common/20200510/validation/virus/", 300, 350)
After resizing the images to 300*350, the image files are located under /mnt/common/20200510:
./
├── train
│   ├── normal
│   └── virus
└── validation
    ├── normal
    └── virus
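To put the resize in numbers, the pixel counts before and after can be worked out directly. The snippet below is only an illustration of that arithmetic: `feature_count` is a hypothetical helper name, and 2500*2500 is the approximate original size quoted in this article, not an exact measurement.

```python
def feature_count(width, height, channels=1):
    """Number of input values if every pixel (per channel) is a feature."""
    return width * height * channels


original = feature_count(2500, 2500)  # approximate original image size
resized = feature_count(300, 350)     # target size used for resizing

print(f"original: {original:,} pixels")                 # original: 6,250,000 pixels
print(f"resized:  {resized:,} pixels")                  # resized:  105,000 pixels
print(f"reduction factor: {original / resized:.1f}x")   # reduction factor: 59.5x
```

With three color channels, as in an RGB input of shape (300, 350, 3), `feature_count(300, 350, channels=3)` gives 315,000 input values per image.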

Algorithm Selection

Image classification is typically done with a convolutional neural network (CNN). I use TensorFlow/Keras, so I now need to switch from Scala to Python to call the Keras APIs.
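To get a feel for what the convolution and pooling layers of a CNN do to an image, it helps to trace the feature-map sizes by hand. The sketch below does this for the kind of network used later in this article: a 300x350 input passed through three stages of a 2x2 convolution followed by 2x2 max pooling, with 32, 32, and 64 filters. It assumes the Keras defaults of unpadded ("valid") convolutions with stride 1 and a pooling stride equal to the pool size. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output length of non-overlapping max pooling along one axis."""
    return size // pool


def trace_shapes(height, width, conv_filters, kernel=2, pool=2):
    """Return the (height, width, channels) shape after each conv+pool stage."""
    shapes = []
    for filters in conv_filters:
        height, width = conv_out(height, kernel), conv_out(width, kernel)
        height, width = pool_out(height, pool), pool_out(width, pool)
        shapes.append((height, width, filters))
    return shapes


shapes = trace_shapes(300, 350, conv_filters=[32, 32, 64])
for stage, shape in enumerate(shapes, start=1):
    print(f"after stage {stage}: {shape}")
# after stage 1: (149, 174, 32)
# after stage 2: (74, 86, 32)
# after stage 3: (36, 42, 64)

h, w, c = shapes[-1]
print("flattened length:", h * w * c)  # flattened length: 96768
```

The flattened length is the number of inputs that the first fully connected layer receives, which is why most of the weights in a small CNN like this sit in that layer.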

Original X-ray image

This is an example of an image before resizing:
from IPython.display import Image
Image(filename='/home/bigdata2/dataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/validation/normal/NORMAL2-IM-1423-0001.jpeg')

Resized X-ray image

This is an example of a resized image that is labeled as normal:
Image(filename='/mnt/common/20200510/validation/normal/NORMAL2-IM-1423-0001.jpeg')
This is an example of a resized image that is labeled as pneumonia caused by a virus:
Image(filename='/mnt/common/20200510/validation/virus/person1609_virus_2791.jpeg')

X-ray CNN image classification with Keras

The following code trains a convolutional neural network to classify X-ray images as normal or pneumonia caused by a virus, using Keras with TensorFlow as the backend:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

# Create a convolutional neural network model
model = Sequential()
model.add(Conv2D(32, (2, 2), input_shape=(300, 350, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (2, 2)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (2, 2)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # converts 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Once the model is created, configure its loss and metrics with model.compile()
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

batch_size = 16

# Augmentation configuration for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation images are only rescaled, not augmented
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/mnt/common/20200510/train/',  # this is the target directory
    target_size=(300, 350),  # all images are resized to 300x350; Keras reads this as (height, width)
    batch_size=batch_size,
    class_mode='binary')  # since we use binary_crossentropy loss, we need binary labels

# Found 2016 images belonging to 2 classes.

# Generator for validation data
validation_generator = test_datagen.flow_from_directory(
    '/mnt/common/20200510/validation/',
    target_size=(300, 350),
    batch_size=batch_size,
    class_mode='binary')

# Found 670 images belonging to 2 classes.

# Fit the model on data yielded batch-by-batch by a Python generator
# (fit_generator is deprecated in newer TensorFlow releases, where model.fit accepts generators)
model.fit_generator(
    train_generator,
    steps_per_epoch=2000 // batch_size,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800 // batch_size)
Training output:
Epoch 1/50
125/125 [==============================] - 37s 293ms/step - loss: 0.7213 - accuracy: 0.6635 - val_loss: 0.4492 - val_accuracy: 0.8208
Epoch 2/50
125/125 [==============================] - 36s 292ms/step - loss: 0.4976 - accuracy: 0.7735 - val_loss: 0.1604 - val_accuracy: 0.8659
...
Epoch 49/50
125/125 [==============================] - 35s 282ms/step - loss: 0.2251 - accuracy: 0.9320 - val_loss: 0.4890 - val_accuracy: 0.8885
Epoch 50/50
125/125 [==============================] - 36s 284ms/step - loss: 0.2067 - accuracy: 0.9340 - val_loss: 0.5270 - val_accuracy: 0.9261
<keras.callbacks.callbacks.History at 0x7f90917a6ef0>
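The `125/125` in the log comes from `steps_per_epoch=2000 // batch_size`, a round number rather than the exact image count. The generators reported 2016 training and 670 validation images, so step counts derived from those numbers differ slightly. The sketch below works through the arithmetic; `steps_for` is a hypothetical helper, not part of Keras.

```python
import math


def steps_for(num_images, batch_size, drop_partial=True):
    """Number of batches needed for one pass over num_images.

    With drop_partial=True the last, smaller batch is ignored (floor);
    otherwise it is counted as a step of its own (ceiling).
    """
    if drop_partial:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


batch_size = 16
print(2000 // batch_size)                              # 125, as used in the training call
print(steps_for(2016, batch_size))                     # 126 full training batches
print(800 // batch_size)                               # 50, as used for validation_steps
print(steps_for(670, batch_size))                      # 41 full validation batches
print(steps_for(670, batch_size, drop_partial=False))  # 42 including the partial batch
```

Since 50 validation steps is more than the 42 batches in one pass over 670 images, and Keras directory iterators keep cycling through the data, some validation images are probably evaluated more than once per epoch. One option, offered here as a suggestion, is to derive both step counts from the `samples` attribute of each generator instead of hard-coding 2000 and 800.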
Save the model
# Always save the model weights after training
model.save_weights('/mnt/common/20200510/xray.h5')

Hardware Used:

By the way, the machine that ran this exercise has an Intel 8700 8th-generation CPU with 6 cores/12 threads, 64 GB of RAM, and an NVIDIA GTX 1060 GPU with 6 GB of GPU memory. Both TensorFlow and Keras are GPU-enabled versions.

Summary

With not many lines of Python code and about half an hour of training time (50 epochs at roughly 36 seconds each), deep learning with a CNN (convolutional neural network) using TensorFlow/Keras reached a training accuracy of about 93% and a validation accuracy of about 93% in the final epoch (about 89% in the epoch before). An accuracy of 93% means that, out of 100 X-ray images, the machine tells normal from pneumonia caused by a virus correctly on about 93 images and incorrectly on about 7.
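The accuracy figure can be made concrete with a small calculation. The model ends in a single sigmoid unit, so each prediction is a probability between 0 and 1, and binary accuracy counts a prediction as correct when thresholding it at 0.5 gives the true label. The sketch below is a plain-Python illustration of that definition, not the Keras implementation. The function name is hypothetical and the data are made-up toy values, not results from this experiment.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    correct = sum(
        int(prob > threshold) == label
        for label, prob in zip(y_true, y_prob)
    )
    return correct / len(y_true)


# Made-up toy data for illustration only (0 = normal, 1 = virus).
# flow_from_directory assigns class indices in alphabetical folder order.
labels = [0, 0, 1, 1, 1]
probs = [0.1, 0.7, 0.8, 0.4, 0.9]
print(binary_accuracy(labels, probs))  # 0.6
```

In the toy data, the second and fourth images are misclassified, so 3 of 5 are correct. Accuracy alone does not say which kind of mistake was made, a normal image flagged as virus or a virus image passed as normal.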

Disclaimer again

This article is intended exclusively for educational purposes in the field of computer science. Only a radiologist certified by a government medical board can and should make a diagnosis from an X-ray image.