ANALYSIS OF BRAIN TUMOR USING MRI
IMAGES
Submitted in partial fulfilment
of the requirements of the degree of
BACHELOR IN APPLIED DATA SCIENCE
of Noroff University College
Lewi Lie Uberg
Arendal, Norway
May 2022
Declaration
I declare that the work presented for assessment in this submission is my own, that it
has not previously been presented for another assessment, and that work completed
by others has been appropriately acknowledged.
Name: Lewi Lie Uberg Date: May 24, 2022
Abstract
The increasing incidence of deadly brain tumors in humans brings an increasing need
for highly educated medical personnel, such as neurologists and radiologists, for
diagnosis and treatment. Thus, to reduce the workload and the time from initial
suspicion of disease to diagnosis and a suitable treatment plan, there is a need to
implement a Computer-Aided-Disease-Diagnosis (CADD) system for brain tumor
classification. By studying the types of tumors involved, how the convolutional neural
network (CNN) functions, the evolution of its pre-defined architectures, models using
pre-trained weights, and their application in brain tumor classification, the likelihood of
producing a promising CADD system increases considerably. The research conducted
in this project presents the starting point of an open-source project to further develop
a CADD system for brain tumor classification with reliable results. The project
includes all components of a working CADD system: the data preprocessing pipeline,
the pipeline for defining and training CNN classification models, and a user interface
in the form of an API as the backend and a website as the frontend. The project is
intended to be open to the general public; however, its primary focus is on supporting
medical imaging researchers, medical students, radiologic technologists, and
radiologists.
Keywords: Brain Tumor Classification, Magnetic Resonance Imaging, Convolutional
Neural Networks, Machine Learning, Deep learning.
Acknowledgements
I want to thank my friends over at RealPython.com for their great articles and general
counsel with anything related to programming. I am also grateful to the people on
Stack Overflow who share their knowledge and feedback, Khan Academy for
straightforwardly explaining math, the content creators of YouTube, and the open-
source community, who openly share their code on GitHub.
Thanks to my classmate, friend, and sparring partner Zeljka Matic for all our discus-
sions and her help in previous demanding classes.
I am grateful for the general counsel I received from Maxine Brandal Vågnes on how
to structure this report.
I thank my supervisor, Professor Seifedine Kadry, for the guidance, help, and feed-
back he has given me.
I am grateful for the manually labeled dataset provided by Dr. Saed Khawaldeh.
I want to thank my mother for all the nights she has helped out looking after the kids
when my wife is working and an assignment is due.
Finally, I want to thank my wife and three sons for putting up with a husband and dad
that studies while working more than full-time. The time-debt I now owe them will be
paid back with interest after this final submission.
Contents
1 Introduction
1.1 Problem Statement
1.2 Research Objectives
1.3 Scope and Limits
1.4 Document Structure
2 Literature Review
2.1 Brain Tumors
2.2 Convolutional Neural Network
2.3 Evolution of CNN Architectures
2.4 Brain Tumor Classification
2.5 User Interface
3 Data, Design and Implementation
3.1 Data Collection
3.2 Data Preprocessing
3.3 Design
3.3.1 Model Pipeline
3.3.2 Architecture Selection
3.3.3 User Interface
3.4 Implementation
3.4.1 Model Pipeline
3.4.2 User Interface
3.4.3 Deployment
4 Results
4.1 Classification Models
4.2 User Interface
5 Conclusion
5.1 Introduction
5.2 Summary of Research
5.3 Research Objectives
5.4 Research Contribution
5.5 Future Work
A Short Paper
B Source Code Repository
List of Figures
3.1 Custom CNN Flowchart with The AlexNet Architecture as Comparison
3.2 Custom CNN Architecture
4.1 Training History of The MobileNet Architecture
4.2 Training History of The AlexNet Architecture
4.3 Running The API using Uvicorn
4.4 Expose Local Server To The Internet
4.5 Interactive Documentation
4.6 Running cURL Commands From Terminal
4.7 Home Page of the User Interface
4.8 Normally Classified MRI Image
4.9 LGG Classified MRI Image
List of Tables
3.1 Dataset Sample Distribution
3.2 Manually Labeled Dataset Sample Distribution
4.1 Concatenation of The Best Training Exploration Results
4.2 CNN Architecture Results
1 Introduction
1.1 Problem Statement
Glioblastoma is a very aggressive form of cancer and the most common form of
glioma. Occurrences of glioblastoma increase consistently, and with a five-year mor-
tality rate of 97% (Korja et al. 2018), it is considered the most deadly form of cancer.
Glioblastoma originates from glial cells in the brain, a cell type that acts as support
cells for the central nervous system. There is no specific blood test (tumor marker)
that can reveal the presence of a glioblastoma; therefore, imaging examinations of
the brain like Magnetic Resonance Imaging (MRI) or computerized tomography (CT)
are central to the discovery of glioblastoma. After imaging examinations of the brain
are performed, a diagnosis must be made, followed by a treatment plan. Highly
educated personnel, such as radiologists, are needed to make a diagnosis. However,
the availability of suitable personnel depends on the patient's location, and the waiting
list can be significant. Implementation of a Computer-Aided-Disease-Diagnosis (CADD)
system for brain tumor classification can reduce the workload of radiologists and
reduce the time from suspicion of disease to diagnosis and suitable treatment.
1.2 Research Objectives
The research project’s objective is to acquire knowledge about all the needed com-
ponents to develop and implement a working Computer-Aided-Disease-Diagnosis
(CADD) system for brain tumor classification.
- Research brain tumors themselves, as well as the conventions of their classification.
- Develop a good understanding of the needed data and how it is collected and preprocessed.
- Research the convolutional neural network (CNN) to understand its structure and functionality, and thereby how to manipulate it.
- Research different pre-defined CNN architectures, their evolution, and how they are implemented.
- Research transfer learning to take advantage of previously gained knowledge, and whether it can be beneficial for classifying MRI images of brain tumors.
- Research examples of related work to provide ideas and enable comparisons.
- Research ways for the project outcome to be usable to the general public.
- Make use of the research by implementing and deploying a fully working CADD system for brain tumor classification that is easy to maintain and build upon.
1.3 Scope and Limits
The research aims to explore the feasibility of a CADD system for brain tumor clas-
sification based on deep learning techniques such as CNNs in order to reduce the
workload of radiologists and reduce the time from suspicion of disease to diagno-
sis and suitable treatment. In addition, the CADD system must be easy to maintain
and build upon, as well as being usable to individuals with limited technical knowl-
edge.
The research is limited to data comprising labeled samples of non-tumorous, LGG,
and HGG MRI images. It is also limited to the data available to the general public,
since applying for research funding and the administration needed to follow up a
collaboration with radiologists would be too time-consuming for this project.
1.4 Document Structure
The document is sectioned into chapters representing the significant sections of the
research and implementation process. Chapter 2 covers the research in the form of a
literature review. Chapter 3 covers the data collection and preprocessing, as well as
the system design and implementation. Chapter 4 covers the results of the research
and implementation of the system. Finally, Chapter 5 covers a closing summary of
the project.
All the source code used in this project is provided in Appendix B.
2 Literature Review
2.1 Brain Tumors
The cell of origin and the features found when examining the cell tissue, i.e., the
histopathological characteristics, define central nervous system tumors and predict
their behavior (Louis et al. 2007). For example, cerebral gliomas are neuroepithelial
tumors originating from the supporting glial cells of the central nervous system (Forst
et al. 2014).
After meningiomas, usually benign tumors originating from the meningeal tissue of the
brain, gliomas are the most common primary brain tumors in adults overall, with an
incidence of 5 to 6 persons per 100,000 annually (Hu et al. 2020).
The World Health Organization (WHO) tissue classification system grades gliomas
from grade I (lowest) to grade IV (highest). Low-grade gliomas (LGG) consist of grade
I and grade II tumors (Forst et al. 2014), while high-grade gliomas (HGG) consist of
grade III and grade IV tumors (Hu et al. 2020). Grade I tumors are the least malignant
or benign, including Pilocytic Astrocytoma, Craniopharyngioma, Gangliocytoma, and
Ganglioglioma.
Grade II tumors are relatively slow-growing but may recur as a higher grade, including
Astrocytoma, Pineocytoma, and Pure Oligodendroglioma. Grade III tumors are
malignant and tend to recur as a higher grade, including Anaplastic Astrocytoma,
Anaplastic Ependymoma, and Anaplastic Oligodendroglioma. Finally, grade IV tumors
are the most malignant, aggressive, and prone to necrosis and recurrence, including
the tumor types Glioblastoma Multiforme (GBM), Pineoblastoma, Medulloblastoma,
and Ependymoblastoma (Louis et al. 2007, p. 107).
Often occurring in young, otherwise healthy patients, LGGs are a diverse group of
primary brain tumors. Generally, they have a relatively good prognosis and prolonged
survival rate (Forst et al. 2014). However, over 75% of gliomas are HGG, GBM being
the most common and aggressive, accounting for 56.1% of all gliomas. In addition,
HGGs, particularly GBM, can exhibit a distinct tumor cell population that confounds
clinical diagnosis and management. As a result, GBM has a grim prognosis, with
a median survival of 15 months, despite the best available treatments. Having a
relatively high occurrence frequency and being difficult to diagnose and treat has
made HGGs, and GBM, in particular, the subject of tremendous interest in neuro-
oncologic research (Hu et al. 2020).
Histopathologic examination to study tissue morphology is the gold standard for
diagnosing and grading brain tumors (Forst et al. 2014). However, surgical resection
for diagnosing a brain tumor is an invasive and risky method. Fortunately, there are
several non-invasive diagnostic methods, like neuroimaging. Neuroimaging techniques
widely used by the medical community include Computed Tomography (CT), Magnetic
Resonance Imaging (MRI), Positron Emission Tomography (PET), and Plain-Film
Radiography (PFR) (Luo et al. 2018).
Conventional MRI is the current imaging procedure of choice and identifies tumor
size and associated Peritumoral Edema (PTE), one of the main features of malignant
glioma (Wu et al. 2015), with or without contrast enhancement. Nevertheless, char-
acteristic MRI findings cannot determine the tumor grade alone (Forst et al. 2014).
Moreover, MRIs of HGG, GBM in particular, also lack the capability to resolve
intratumoral heterogeneity that may be present. However, more advanced imaging
procedures like PET offer a range of physiologic and biophysical image features that
can improve the accuracy of imaging diagnoses (Hu et al. 2020).
Studies comparing imaging features with tissue benchmarks have employed either
qualitative descriptions or have compared individual quantitative metrics in a uni-
variate fashion, which has produced solid correlations for certain clinical scenarios.
However, for other scenarios, the correlation of imaging and histopathological char-
acteristics may not be self-evident by visual inspection or sufficiently represented by
simple statistical features. Thus, in pursuit of brain tumor diagnosis without surgical
intervention, researchers have developed more advanced imaging methods such as
texture analysis, mechanic modeling, and machine learning (ML) that form a pre-
dictive multi-parametric image-based model. The application of ML models is an
emerging field in radiogenomics and represents a data-driven approach to identifying
meaningful patterns and correlations from often complex data sources. An ML model
is trained by feeding the ML algorithm a substantial amount of pre-classified image
data as input, such as MRI images, so that it learns which patterns belong to the
different classes.
The resulting ML model uses these previously learned patterns to predict the appro-
priate class for the new instance. These ML models have enabled researchers to
predict tumor cell density, and together with texture analysis, recognize regional and
genetically distinct subpopulations coexisting within a single GBM tumor (Hu et al.
2020).
2.2 Convolutional Neural Network
The convolutional neural network (CNN) is a concept introduced by Fukushima (1980)
as the Neocognitron, a model of the brain's visual cortex and an improvement of
Fukushima's (1975) previous model for visual pattern recognition. Lecun et al. (1998)
later developed the Neocognitron into one of the most successful pattern recognition
models, which has significantly impacted the field of computer vision.
The most common use case for CNNs is pattern detection in images. One or more
hidden convolutional layers use filters to convolve, or scan, over an input matrix, such
as a binary image. These filters closely resemble neurons in a dense layer of an
artificial neural network. Each filter learns to detect a specific pattern, such as an edge
or a circle; adding more filters to a convolutional layer enables more features to be
learned. The filter size is the size of the matrix convolving over the image matrix, and
the stride is the number of pixel shifts over the input matrix. The convolutional layer
performs matrix multiplication on the image and filter matrices for each stride taken.
The resulting output is called a feature map. Two options are available when the filter
does not fit the image. Either the part of the image matrix that does not fit the filter is
dropped, which is called valid padding, or zeros are added to the image matrix's edges
so that the filter matrix fits the image matrix entirely, which is called zero-padding.
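As an illustration of these terms, the following minimal Keras sketch (written for this report, not taken from the project code) shows how valid padding and zero-padding affect the feature-map size for a 3x3 filter with a stride of 1:

    import tensorflow as tf

    # One convolutional layer scanning a 224x224 RGB input with 32 filters of
    # size 3x3 and a stride of 1; only the padding option differs.
    inputs = tf.keras.Input(shape=(224, 224, 3))
    valid = tf.keras.layers.Conv2D(32, (3, 3), strides=1, padding="valid")(inputs)
    padded = tf.keras.layers.Conv2D(32, (3, 3), strides=1, padding="same")(inputs)
    print(valid.shape)   # (None, 222, 222, 32) - valid padding shrinks the map
    print(padded.shape)  # (None, 224, 224, 32) - zero-padding keeps the size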
In Fukushima's (1980) original paper, tanh is used as the activation function. However,
after the Rectified Linear Unit (ReLU) was introduced as the activation function by
Krizhevsky, Sutskever, and Hinton (2012) with AlexNet, it has become the most
common activation function for a convolutional layer. ReLU is an almost linear function
with a low computational cost. ReLU converges fast by transforming the input to the
maximum of zero or the input value, meaning that the positive linear slope does not
saturate or plateau when the input becomes large. Also, ReLU does not have a
vanishing gradient problem like sigmoid or tanh. Hidden dense, or fully-connected,
layers also tend to use ReLU as their activation function.
Spatial pooling, also called subsampling or downsampling, can be applied to reduce
the dimensionality of a feature map and thereby the number of tunable parameters in
the following layers. The most commonly used type of spatial pooling is Max-pooling.
By using filters and stride, the Max-pooling layer operates much like a convolutional
layer. However, the Max-pooling layer takes the maximum value in its filter matrix as
the output value. So, for example, a Max-pooling layer with an input matrix of 8x8 and
a filter size of 2x2 would have an output of 4x4 containing the largest value for each
region. Thus, Max-pooling's downsampling decreases the computational cost of the
following layers in the network, while also concluding the feature extraction part of the
CNN and initiating the feature learning part.
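A minimal sketch of the 8x8 example above, using the Keras MaxPooling2D layer (again an illustration added here, not project code):

    import tensorflow as tf

    # A 2x2 max-pooling filter with the default stride of 2 keeps only the
    # largest value in each region, reducing an 8x8 feature map to 4x4.
    feature_map = tf.random.uniform((1, 8, 8, 1))
    pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(feature_map)
    print(pooled.shape)  # (1, 4, 4, 1)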
Both convolutional and max-pooling layers output matrices, while fully-connected
layers only accept vectors. Adding a flattening layer reshapes the last convolutional
layer's output to the shape (-1, 1); in other words, it transforms the matrix directly into
a vector. Feeding this vector into fully-connected layers is a computationally cheap
way of learning non-linear combinations of the higher-level feature representations
from the convolutional or max-pooling layer's output.
The final layer, the output layer, is a dense or fully-connected layer with the same
number of neurons as there are classes to be classified. The activation function for
the final layer depends strongly on the loss function. For example, a single sigmoid-
activated neuron as the output, compiled with binary cross-entropy as the loss
function, yields an equivalent result to two softmax-activated neurons in a network
using categorical cross-entropy as the loss function; in other words, a binary
classification.
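The two equivalent binary-classification heads described above can be sketched in Keras as follows; this is illustrative only, and the label encoding differs between the two options:

    import tensorflow as tf

    # Option A: one sigmoid-activated neuron with binary cross-entropy
    # (labels are single values, 0 or 1).
    head_a = tf.keras.layers.Dense(1, activation="sigmoid")
    loss_a = tf.keras.losses.BinaryCrossentropy()

    # Option B: two softmax-activated neurons with categorical cross-entropy
    # (labels are one-hot vectors, [1, 0] or [0, 1]); the result is equivalent.
    head_b = tf.keras.layers.Dense(2, activation="softmax")
    loss_b = tf.keras.losses.CategoricalCrossentropy()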
2.3 Evolution of CNN Architectures
LeNet-5
LeNet-5's architecture of stacking convolutional layers, activation functions, and
pooling layers, concluding with fully-connected layer(s), has become the common
starting point when designing a CNN. With its two convolutional and three dense
layers, LeNet-5 is one of the most straightforward CNN architectures. It was initially
trained on 60,000 patterns and later on an additional 540,000 artificially generated
patterns, produced by randomly distorting the original dataset, to support the authors'
hypothesis that there was a strong correlation between training set size and the train
and test errors. While a breakthrough when introduced, the architecture is relatively
shallow. Therefore, it does not generalize well or perform well with color images
(Lecun et al. 1998).
AlexNet
AlexNet was the first to use ReLU as the activation function and the then recently
developed dropout method to reduce overfitting in fully-connected layers. Built upon
LeNet-5, AlexNet comprises five convolutional layers and three fully-connected layers,
with 60 million parameters and 650,000 neurons (Krizhevsky, Sutskever, and Hinton
2012). With its increased size and new activation function, the architecture performs
well with color images. However, compared to its successors, it struggles to learn the
dataset's features due to its limited depth. Tandel et al. (2020) proposed using transfer
learning on the AlexNet model for multiclass MRI brain tumor classification, with
positive results.
VGG-16
The motivation for developing the VGG family of architectures was to improve
AlexNet's performance, which was done by significantly increasing the depth of the
network. Several different configurations of the architecture were developed, for
example, VGG-16. While the architecture significantly increases accuracy, it also
suffers from the vanishing gradient problem. VGG-16 comprises thirteen ReLU-
activated convolutional layers, two ReLU-activated fully-connected layers, and finally,
one softmax-activated fully-connected layer, and it has 138 million parameters
(Simonyan and Zisserman 2015). Belaid and Loudini (2020) proposed combining
several CNNs based on pre-trained VGG-16s for MRI classification of three tumor
types: meningioma, glioma, and pituitary tumor. VGG-16 was also the best performing
of the three pre-trained networks Sevli (2021) used for performance comparison on
brain tumor classification.
Inception-v1
Inception-v1 uses small neural networks within the main neural network, called
inception modules (Lin, Chen, and Yan 2014). The inception modules use parallel
towers of convolutions and filters, combining the output of the small networks instead
of the linear approach seen in the previously mentioned architectures. In addition,
auxiliary networks are added to the main network to increase discrimination and
provide additional regularisation; the output of these auxiliary networks is discarded
after training. Inception-v1 has twenty-two layers in total. Nevertheless, the number of
parameters is significantly reduced compared to the previously mentioned
architectures, down to 5 million. However, the Inception architecture can be biased
towards certain classes in an unbalanced dataset, as well as being prone to overfitting
on smaller datasets (Szegedy, Liu, et al. 2015). Irmak (2021) proposes a custom CNN
model for brain tumor classification and uses Inception-v1 as one of five models for
comparison, where it achieves the poorest accuracy results.
Inception-v3
Built upon Inception-v1, Inception-v3 uses factorization of the convolutions. In
addition, it adds batch normalization to the auxiliary layers in the auxiliary network,
thereby stabilizing training and significantly reducing the number of epochs required to
train the network. However, with its 48 layers, Inception-v3 significantly increases the
number of parameters from its predecessor, now having 24 million. The large number
of parameters does, however, make the network more prone to overfitting and adds
computational cost (Szegedy, Vanhoucke, et al. 2016). Inception-v3 was also the least
performing of the three pre-trained networks Sevli (2021) used for performance
comparison on brain tumor classification.
ResNet-50
ResNet-50 was one of the first architectures to implement batch normalization, and it
addresses the problem of accuracy saturating and then rapidly degrading as networks
become deeper. The architecture is a quite deep neural network that allows gradients
to flow from layer to layer by using bridges or shortcuts called skip connections,
thereby solving the problem of vanishing gradients. However, many layers need to be
added to improve accuracy, thereby increasing the computational cost. With 50 layers
comprising 48 convolutional layers, a Max-pooling layer, and an Average-pooling
layer, ResNet-50 has 26 million parameters (He et al. 2016). ResNet-50 was also the
second-best performing of the three pre-trained networks Sevli (2021) used for
performance comparison on brain tumor classification.
Xception
By entirely replacing the inception modules of Inception-v3 with depthwise separable
convolutions, Xception handles the spatial dimensions of the image and kernel
separately from the depth dimension, i.e., the number of channels, of each processed
image. While the Xception architecture offers good memory usage and computational
speed, it comes at the cost of accuracy performance. Xception is 71 layers deep and
has 23 million parameters (Chollet 2017).
Inception-v4
Inception-v4 modified the initial layers before the first inception module, also called
the stem. Additionally, adding more inception modules and using the same filters
for every inception module increases model size. However, the researchers found
that with a filter number exceeding 1000, residual variants became unstable, and
the network suddenly “died” early in training. Inception-v4 is 22 layers deep with 43
million parameters (Szegedy, Ioffe, et al. 2017).
Inception-ResNet-V2
Inception-ResNet-V2 was introduced in the same paper as Inception-v4. Inception-
ResNet-V2 adds more previously seen inception modules as well as some modified
inception modules. It also adds residual inception blocks, such that the output of
a layer is added to another layer deeper in the network. Residual inception blocks
allow information to flow from one layer to another without any gates in their skip
connection. However, increasing the size of the network also increases the needed
computational resources and the number of parameters, making it inclined to over-
fitting on small datasets. Inception-ResNet-V2 is 164 layers deep and has 56 million
parameters (Szegedy, Ioffe, et al. 2017).
ResNeXt-50
ResNeXt-50 builds upon ResNet and the Inception family by adding parallel towers
within the modules as a new dimension called cardinality. Adding cardinality is a
more efficient way of increasing accuracy than expanding deeper or wider, since the
latter two give diminishing returns when expanded too far. However, adapting
ResNeXt-50 to a new dataset type is a significant task due to its many hyperpa-
rameters and computations. ResNeXt-50 is 50 layers deep/wide, and the number of
parameters is not given (Xie et al. 2017).
MobileNet
MobileNet is TensorFlow’s first architecture designed for mobile applications. The
architecture is a simple but efficient general-purpose CNN Architecture often used in
object detection and fine-grained image classification. The MobileNet Architecture
uses depth-wise separable convolutions, a combination of depth-wise and point-wise
convolutions. Depth-wise convolutions apply a single filter to each input channel, as
opposed to standard convolutions, which apply each filter to all the input channels.
The depth-wise convolutions do not combine the filter outputs to produce a new
feature. Therefore, an additional layer called a point-wise convolution is added. The
point-wise convolutional layer computes a linear combination of all the depth-wise
convolution outputs to produce a new feature. The MobileNet architecture is designed
to be as efficient as possible while still being easy to train (Howard et al. 2017).
DenseNet-121
DenseNet is a CNN architecture where all layers with matching feature-map sizes
are directly connected. The feed-forward nature is preserved by receiving additional
inputs from all preceding layers and passing them on to succeeding layers. The
DenseNet architecture solves the problem of vanishing gradients, increasing feature
propagation and feature reuse, and considerably reducing the number of parameters.
On the other hand, memory usage increases as the inputs from previous layers are
concatenated. DenseNet-121 is 121 layers deep with 8 million parameters (Huang et
al. 2019).
2.4 Brain Tumor Classification
Kang, Ullah, and Gwak (2021) proposed a fully automatic hybrid solution for brain
tumor classification comprised of several steps. First, pre-process the brain MRI im-
ages by cropping, resizing, and augmenting. Second, use pre-trained CNN models
for feature extraction with better generalization. Third, select the top three performing
features using fine-tuned ML classifiers and concatenate these features. Finally,
use the concatenated feature as input for the ML classifiers to predict the final output
for the brain tumor MRI.
The researchers selected three different publicly available brain tumor MRI datasets
for experimentation. The researchers established a naming convention of three parts:
the type, the size, and the number of classes; for example, a medium brain tumor
dataset with three classes is named “BT-medium-3c”. The first dataset, BT-small-2c,
comprises 253 images, 155 images classified as containing tumors, and 98 images
classified as without tumors. The second dataset, BT-large-2c, comprises 3000 im-
ages, 1500 images containing tumors, and 1500 images without tumors. The third
and final dataset, BT-large-4c, comprises 3064 images containing four classes: non-
tumorous, glioma tumor, meningioma tumor, and pituitary tumor. All the datasets
follow the standard convention of subdividing into 80% for training and 20% for test-
ing.
Most of the images in the datasets contain undesired spaces and areas. Cropping
each image to contain only the relevant area for analysis can therefore lead to better
classification performance. In addition, if a dataset is imbalanced or small,
augmentation may boost the learning capabilities. Augmentation creates multiple
copies of the images, modified in different ways, like mirroring, rotating, or adjusting
the image brightness. In addition to dataset augmentation, the images are resized to
fit the pre-trained CNNs' expected dimensions: 224x224 px, except for Inception V3,
which expects 299x299 px.
The proposed scheme uses a novel feature evaluation and selection mechanism,
an ensemble of 13 pre-trained CNNs, to extract robust and discriminative features
from the brain MRI images without human supervision. The CNN ensemble comprises
ResNet-50, ResNet-101, DenseNet-121, DenseNet-169, VGG-16, VGG-
19, AlexNet, Inception V3, ResNext-50, ResNext-101, ShuffleNet, MobileNet, and
MnasNet. Since the researchers use fairly small datasets for training, they take a
transfer learning-based approach by using the fixed weights on the bottleneck layers
of each CNN model pre-trained on the ImageNet dataset.
Using the features extracted from the CNN models, a synthetic feature is formed by
evaluating each feature from the CNN ensemble with an ensemble of nine different
ML classifiers and concatenating the top three features from the different CNNs.
Since different CNN architectures capture different aspects of the processed data,
the synthetic feature represents a more discriminative feature than features extracted
from a single CNN.
The ML classifier ensemble, implemented using the scikit-learn library, comprises a
fully-connected (FC) neural network (NN) layer, Gaussian Naïve Bayes (Gaussian
NB), Adaptive Boosting (AdaBoost), k-Nearest Neighbors (k-NN), Random Forest
(RF), Extreme Learning Machine (ELM), and Support Vector Machines (SVM) with
three different kernels: linear, sigmoid, and radial basis function (RBF).
The first classifier uses the conventional CNN approach: a softmax-activated FC layer
with a cross-entropy loss function, the most commonly used loss function for neural
networks. This classifier uses Adaptive Moment Estimation (Adam) optimization of the
layer weights with an initial learning rate of 0.001, adaptively recalculating the learning
rate. The highest average accuracy per run is collected over a total of 100 epochs.
The researchers also use the Gaussian variant of Naïve Bayes, which assumes the
attributes follow a Gaussian (normal) distribution with no covariance between the
attributes within each class.
The next classifier, Adaptive Boosting, or AdaBoost for short, is an ensemble learning
algorithm that combines multiple weak classifiers (decision trees with a single split,
called stumps) to improve performance. AdaBoost works iteratively and assigns
higher weights to mislabeled instances.
The following classifier is one of the simplest, k-Nearest Neighbors (k-NN). k-NN does
not train a model but calculates predictions directly from the data currently stored in
memory. Using Euclidean distance as the distance metric, the k-NN classifier finds
the k training instances closest to the given feature. It then assigns the class label by
the majority vote of those neighbors. The researchers set the number of nearest
neighbors from 1 to 4 and selected the value with the highest accuracy.
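A minimal scikit-learn sketch of this selection procedure; the synthetic features below only stand in for the CNN-extracted features, which are not reproduced here:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Placeholder features standing in for the CNN-extracted feature vectors.
    X, y = make_classification(n_samples=200, n_features=64, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Try k = 1..4 with Euclidean distance and keep the most accurate setting.
    best_k, best_acc = None, 0.0
    for k in range(1, 5):
        knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
        knn.fit(X_train, y_train)
        accuracy = knn.score(X_test, y_test)
        if accuracy > best_acc:
            best_k, best_acc = k, accuracy
    print(best_k, best_acc)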
Random Forest (RF) is a learning algorithm that creates multiple decision trees using
the bootstrap aggregation (bagging) method to classify features into a class, using the
Gini index as a cost function while creating the decision trees. RF selects n random
attributes or features to find the optimal split point, which reduces the correlation
among the trees and gives lower ensemble error rates. RF predicts by feeding
features into all the classification trees, counting the number of predictions for each
class, and choosing the class with the largest number of votes as the correct class for
the given feature. To find the optimal split, the researchers set the number of features
considered at each split to the square root of the total number of features, varied the
number of decision trees from 1 to 150, and selected the configuration with the
highest accuracy.
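A corresponding scikit-learn sketch, again on placeholder features rather than the actual CNN features, and searching only a subset of the 1-150 tree range:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, n_features=64, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Gini impurity as the split criterion and sqrt(n_features) candidate
    # features per split, keeping the forest size with the highest accuracy.
    best_rf, best_acc = None, 0.0
    for n_trees in (10, 50, 100, 150):
        rf = RandomForestClassifier(n_estimators=n_trees, criterion="gini",
                                    max_features="sqrt", random_state=0)
        rf.fit(X_train, y_train)
        accuracy = rf.score(X_test, y_test)
        if accuracy > best_acc:
            best_rf, best_acc = rf, accuracy
    print(best_acc)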
Extreme Learning Machine (ELM) is a learning algorithm for Single-Layer Feed-
Forward Neural Networks (SLFN) that provides good performance at a fast learning
speed. ELM is not an iterative algorithm like the back-propagation algorithm used in
traditional SLFNs. Instead, ELM randomly assigns the hidden-layer weights and
solves for the output weights analytically, tuning the weights only once. The
researchers tried 5000, 6000, 7000, 8000, 9000, and 10,000 hidden neurons and
selected the configuration with the highest accuracy.
The Support Vector Machine (SVM) uses a kernel function to transform the original
data space, defined by the number of features, into a higher-dimensional space. It
then aims to find a hyperplane in that space that distinctly classifies the given features.
The researchers use the three most common kernel functions: linear, sigmoid, and
radial basis function (RBF). In addition, the SVM has two hyper-parameters. The first,
C, the soft margin cost parameter that controls each support vector's influence, is set
to 0.1, 1, 10, 100, 1000, or 10000. The second, gamma, which decides the curvature
of the decision boundaries, is set to 0.00001, 0.0001, 0.001, or 0.01. The hyper-
parameter combination that yields the highest accuracy is then selected.
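The kernel and hyper-parameter search can be sketched with scikit-learn's GridSearchCV; the data here is again a synthetic placeholder for the extracted features:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=64, random_state=0)

    # The kernels and hyper-parameter values listed above, searched with
    # cross-validation; the best combination is kept.
    param_grid = {
        "kernel": ["linear", "sigmoid", "rbf"],
        "C": [0.1, 1, 10, 100, 1000, 10000],
        "gamma": [0.00001, 0.0001, 0.001, 0.01],
    }
    search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)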
Experimentation on the given datasets has two main tasks. First, compare the sev-
eral pre-trained CNN networks with several ML classifiers. Second, show the effec-
tiveness of the concatenation of the top 2 or 3 features with the best results from the
first experiment.
For example, the top three features on the BT-small-2c dataset are the DenseNet-
169, Inception V3, and ResNeXt-50 features. Then on the BT-large-2c dataset, the
DenseNet-121, ResNeXt-101, and MnasNet features are the top three. While on the
BT-large-4c dataset, the DenseNet-169, MnasNet, and ShuffleNet V2 features are
the top three.
Observations from the second experiment show that SVM with RBF kernel can find
a more effective and complex set of decision boundaries, outperforming the other
ML classifiers on the two most extensive datasets. However, this is not the case for
the smallest dataset since SVM with RBF tends to underperform when the number of
training data samples is smaller than the feature number for each data point. Further-
more, it is almost impossible that features extracted from the pre-trained CNNs are
entirely independent. Therefore, since Gaussian NB assumes that the features are
independent, it performs worst among the ML classifiers on all three datasets. On
the other hand, features extracted from the DenseNet architectures predict well on all
three datasets, since they capture features at all complexity levels, giving smoother
decision boundaries, which tend to predict especially well with insufficient training
data. Features from VGG, with its more basic architecture and no residual blocks,
yield the worst results. The effectiveness of the concatenated top 2 or 3 features is evident
for all ML classifiers on the two largest datasets. However, on the small dataset, it is
only shown when using AdaBoost and k-NN.
The FC, RF, and Gaussian NB classifiers have the shortest inference time. In com-
parison, k-NN has the longest since it is the only ML classifier that needs to evaluate
every data point during prediction. While the results from these experiments show
promise on large datasets, further research is needed, especially on model reduction
for real-time medical systems deployment.
2.5 User Interface
The initial technical aspect of the project will be implemented using the Python pro-
gramming language. Therefore, the user interface will be developed using Python-
based technologies. Such technologies evolve rapidly; therefore, the best informa-
tion source is often the official documentation of the given technology. Such user
interfaces are usually implemented as desktop applications but can also be
implemented as web applications. A web application running in the browser is the
best option since there is no need for the user to install any software or worry about
platform compatibility.
The first component to be developed for the user interface is an API that will be used
to communicate with either a terminal or a web browser. Python has many applicable
frameworks for this purpose, such as Django, Flask, or FastAPI (2022). Django is the
most popular framework, but it can be somewhat large and complicated to use. Flask
is a good option for this purpose; however, it does not have the same range of modern
features as FastAPI.
Being built upon other frameworks and standards, such as Starlette (2021), Pydantic
(2022), OpenAPI (2021), and JSON Schema (2020), FastAPI fully supports
asynchronous programming, type validation during runtime, and autogeneration of
interactive documentation. FastAPI is used to build RESTful APIs, the most common
API standard for web applications.
Since FastAPI is built to work with both Gunicorn and Uvicorn (2022), it operates at
high speed, and it is also easy to deploy the API to a web server. Furthermore, FastAPI
supports Jinja2 (2022) templating out of the box, making it much easier for developers
that primarily use Python to build a web application, since knowledge of JavaScript is
not needed for the most basic functionality.
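A minimal FastAPI sketch of the kind of backend described here; the endpoint names, template directory, and file handling are assumptions for illustration, not the project's actual routes, and file uploads additionally require the python-multipart package:

    from fastapi import FastAPI, File, Request, UploadFile
    from fastapi.templating import Jinja2Templates

    app = FastAPI(title="Brain Tumor CADD")             # hypothetical title
    templates = Jinja2Templates(directory="templates")  # assumed template folder

    @app.get("/")
    async def home(request: Request):
        # Render the frontend from a Jinja2 template (templates/index.html assumed).
        return templates.TemplateResponse("index.html", {"request": request})

    @app.post("/predict")
    async def predict(file: UploadFile = File(...)):
        # Receive an uploaded MRI image; model inference would happen here.
        contents = await file.read()
        return {"filename": file.filename, "size_bytes": len(contents)}

    # Served locally with, for example:  uvicorn main:app --reload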
3 Data, Design and Implementation
3.1 Data Collection
One could assume that with the whole internet as the arena, the likelihood of finding
a relevant dataset for something as familiar as brain tumor MRIs is high. However,
when the need for a dataset with specific criteria is high, the availability seems to de-
crease exponentially. Furthermore, the most readily available datasets were different
variations of the same source, usually lacking quality. Therefore, the data-gathering
part of the research project starts with outlining some criteria that the data needs
to meet to ensure quality in the later steps of the project. The data must contain
three types of samples: non-tumorous, LGG, and HGG. It is also essential that the
amount of samples between the types is well balanced to make the classification
model generalize well. The balance is essential since too much data augmentation
on medical images can disrupt the features of interest, thereby resulting in incorrect
classification.
With optimism, the search began on familiar places like Kaggle, Google Dataset
Search, GitHub, and paperswithcode. While some of the found sources looked
promising at first, none of them quite fit the bill. They were either too small, had
some but not all of the needed labels, or were very unbalanced between the labels.
However, the most significant problem is that most of them did not have the non-
tumorous label in the samples. This label is essential since the objective is to have
a classification system that can distinguish between non-tumorous, LGG, and HGG
MRIs. A combination of different datasets was considered for a period, but the idea
of not having a battle-tested dataset as the foundation did not sit well. With a good
foundation, each class can be supplemented with smaller batches of high-quality
data over time. So the search continued.
Finally, a well-suited candidate for this project was found at The Cancer Imaging
Archive (TCIA). The REMBRANDT (REpository for Molecular BRAin Neoplasia DaTa)
Dataset (Scarpace et al. 2019) seemed to have all the essential characteristics re-
quired by the project. In addition, the REMBRANDT dataset is one of the most
trusted publicly available datasets. The dataset comprises MRI scans from 130
subjects across three classes: non-tumorous, LGG, and HGG. Furthermore, the LGG
and HGG classes have subclasses that open the opportunity to make the classifi-
cation outcome more extensive in the future. For example, the LGG class includes
tumors of type Astrocytoma II and Oligodendroglioma II. On the other hand, the HGG
class includes tumors of type Astrocytoma III, Oligodendroglioma III, and Glioblas-
toma Multiforme IV. The dataset is a combination of metadata from various text and
spreadsheet files and the 110,020 images in the DICOM format, which also includes
a vast quantity of metadata.
3.2 Data Preprocessing
Due to the dataset size and the state of its metadata, preprocessing was a rather
large task. The preprocessing of the data uses the Visual Studio Code variant of the
Jupyter Notebook format “.ipynb”, which enables Markdown text cells and executable
Python code cells in the same file. Various helper functions are defined to aid in the
exploration of the data. The “Sample”, “Disease”, and “Grade” columns from the
metadata files are loaded into a pandas data frame; only these columns have value
for this particular task. The sample name in the metadata is compared to the paths
of the dataset. Differences like extra decimal points are manually removed; it is
important to ensure that future loading functions find the correct file.
The first step in preprocessing cleans the most common mistakes in any textual data.
Some of the data points have leading whitespace, which is removed. Data points with
missing values use differing conventions like “--” or “none”; to make them easier to
work with, all these values are changed to “NaN”. Next, all data points are converted
to lowercase, since both uppercase and lowercase are used. Any values that use “-”
to separate words have the separator replaced for consistency. Finally, a new column,
“label”, is added to the data frame, to be populated later.
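A hypothetical pandas sketch of these cleaning steps; the file name and exact replacement values are assumptions based on the description above:

    import numpy as np
    import pandas as pd

    # Load only the columns of interest from the (assumed) metadata file.
    df = pd.read_csv("metadata.csv", usecols=["Sample", "Disease", "Grade"])

    for col in ["Disease", "Grade"]:
        df[col] = df[col].str.strip()                      # drop leading whitespace
        df[col] = df[col].replace(["--", "none"], np.nan)  # unify missing markers
        df[col] = df[col].str.lower()                      # consistent casing

    df["label"] = np.nan  # new column, populated once the grades are resolved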
The next step is performed after studying the metadata files for more specific infor-
mation. For example, the grade column for disease matching “gbm” is empty, and
from searching brain tumor grading conventions online, it is clear that the appropriate
grade is IV, which is inserted for those data points. Some data points in the disease
column have the value “mixed”. Since it is unclear which diseases are associated
with the data point, they are removed. Data points with missing values for disease
and grade are data points where the disease is unknown. Usually, these data points
would be removed. However, by studying the metadata, it is clear that in this case, it
means that no disease is associated with the data point. Therefore, these data points
are given the value “none” instead. In some data points, the disease is known, and
the grade is not known. Take “oligodendroglioma”, for example; it can be grade II or
III. In these cases, the datapoint is unusable and removed. Only diseases of type
“gbm” with missing grades can be correctly labeled since they are always graded as
IV.
After verifying that there are no missing values remaining in the data frame, the
correct labels are assigned for all the data points. For example, diseases of type
“oligodendroglioma” and “astrocytoma” with grade II are labeled as “lgg”. On the
other hand, the same diseases with grade III are labeled “hgg”. Finally, all diseases
of type “gbm” with grade IV are labeled as “hgg”.
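Continuing the hypothetical data frame from the sketch above, these labelling rules can be expressed with boolean masks, assuming the grades are lowercase Roman numerals after cleaning:

    gliomas = df["Disease"].isin(["oligodendroglioma", "astrocytoma"])

    # Grade II gliomas become LGG; grade III gliomas and grade IV GBM become HGG.
    df.loc[gliomas & (df["Grade"] == "ii"), "label"] = "lgg"
    df.loc[gliomas & (df["Grade"] == "iii"), "label"] = "hgg"
    df.loc[(df["Disease"] == "gbm") & (df["Grade"] == "iv"), "label"] = "hgg"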
In order to make the files easier to use in the model training, an algorithm that finds all
files in all subfolders is implemented. This algorithm stores the file path for every file
in a “.csv” file. The content of the “.csv” file is then merged with the corresponding
samples of the data frame. Files discovered by the algorithm that are not linked to
any sample are removed since the information needed to label them correctly is not
present in the metadata.
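A short sketch of such a file-discovery step using pathlib; the folder name, file extension, and output file name are assumptions:

    from pathlib import Path
    import pandas as pd

    # Walk the dataset folder recursively and record every DICOM file path,
    # so the paths can later be merged with the metadata frame.
    paths = [str(p) for p in Path("dataset").rglob("*.dcm")]
    pd.DataFrame({"filepath": paths}).to_csv("file_paths.csv", index=False)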
After completing the preprocessing, the dataset is comprised of 123 patients and
105,265 slides, distributed as shown in Table 3.1.
At this point, it was discovered that one key bit of information was missing: how to
separate the MRI slides that contained the tumorous cells from the ones that did not
contain them. For every scan that is labeled tumorous, the tumor is only visible in 20-
30% of the slides. If the dataset had been used in this state, the model would learn to
classify most healthy tissue as tumorous. In other words, it would be useless.
Table 3.1: Dataset Sample Distribution

Disease             Grade   Label   Unique Samples   Sample Count
Astrocytoma         II      LGG     30               25286
Astrocytoma         III     HGG     17               16038
GBM                 IV      HGG     43               32837
Non-Tumorous        n/a     n/a     15               17041
Oligodendroglioma   II      LGG     11               9335
Oligodendroglioma   III     HGG     7                4728
Total                               123              105265
All of the source data were extensively reexamined, but the needed information was
never found. Therefore, alternative methods were explored to use the now prepro-
cessed dataset. For example, each patient has several MRI scans, and every scan
comprises a group of slides. By creating an animation of each group, the place-
ment of the tumors could be observed. The idea was to find some kind of pattern
for each label in order to filter out these particular slides and use only them during
model training and testing. Unfortunately, no usable patterns were discovered; also,
intuition waved a big red flag. Therefore the idea was discarded.
While searching online for how to find the needed key in the metadata or the DI-
COM files, one paper stood out (Khawaldeh et al. 2018). The paper used that same
dataset for a similar type of application. However, the number of slides used in this
paper was reduced from the original 105,265 to 4069 slides. By reading the dataset
section of the paper and evaluating the included tables and values, it became clear
that the authors of this paper had found a way to filter the data further.
Not finding this critical bit of information was becoming a significant problem for the
project and took a large portion of the allocated time. In order to learn how to repro-
duce the results of the found paper, a meeting was requested by reaching out to the
paper’s primary author, Dr. Khawaldeh.
While Dr. Khawaldeh's collaboration was much appreciated, it brought some
devastating news: the needed key was not part of the publicly available dataset. After
Dr. Khawaldeh's research team had cleaned the data to the same point as this project,
they had employed help from neurologists to go through each slide manually and label
them correctly. Fortunately, Dr. Khawaldeh offered to share a dataset of labeled
samples divided into Normal, LGG, and HGG. After some weeks, the dataset was
received; unfortunately, it was a much smaller sample than the 4069 slides described
in their paper. However, the received 736 correctly labeled slides were still a much
better candidate than anything publicly available.
This turn of events made the work done on data preprocessing obsolete. However, it
includes all the steps needed for any supplementary datasets based on the DICOM
format in the future.
After some further processing, like removing duplicates, the dataset has 735 samples,
distributed as shown in Table 3.2, and is now ready for training CNN models.
Table 3.2: Manually Labeled Dataset Sample Distribution

Label    Sample Count
Normal   168
LGG      287
HGG      280
Total    735
3.3 Design
3.3.1 Model Pipeline
Finding the CNN architecture with or without transfer learning that yields the best
results for the classification task is the primary goal of the research project. Therefore
designing a pipeline where different CNN architectures can easily be substituted is
essential.
Since the model training requires a massive amount of computational resources,
the actual model training will be performed on the Google Colab platform, which
uses the same notebook format as this project. The Google Colab platform allows
users to pay a monthly fee to take advantage of their powerful GPUs. However, the
implementation of the pipeline and further analysis of the results will be done on a
local machine since working in the cloud environment is a bit cumbersome and slow.
Therefore, the pipeline also needs to be able to automatically figure out if it is being
run on a local machine or in the Google Colab cloud platform. The reason is that
when running in Google Colab, the data needs to be uploaded to Google Drive
and mounted in the notebook. By implementing this auto-detection, the user may
use notebook features like “Run all cells” in both environments. Also, several Python
packages are not part of the Google Colab environment and need to be installed
every time the notebook is initialized.
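A minimal sketch of such environment auto-detection; the Drive and local paths are placeholders:

    import importlib.util

    # Detect whether the notebook is running inside Google Colab.
    IN_COLAB = importlib.util.find_spec("google.colab") is not None

    if IN_COLAB:
        from google.colab import drive
        drive.mount("/content/drive")                # dataset stored on Google Drive
        DATA_DIR = "/content/drive/MyDrive/dataset"  # hypothetical Drive path
    else:
        DATA_DIR = "./dataset"                       # hypothetical local path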
Since the pipeline is also used to find out if data augmentation is beneficial to the
results, the pipeline needs to be able to turn this feature on and off easily during
testing. Furthermore, the capability to turn augmentation on and off should be as
smooth as possible. Therefore, the data augmentation should be done in advance,
and a switch mechanism should only swap the input source when triggered.
The pipeline shall be used to find the optimal values for splitting the data into train-
ing, validation, and test sets. Also, the pipeline should make it easy for the user to
experiment with different optimizers like SGD, Adam, and Nadam. The loss func-
tion to use and which metrics to use for evaluation during training should also be
easily adjustable. The pipeline shall also be used to find the optimal values for the
hyperparameters of the CNN architecture.
The pipeline should be able to turn on and off callback functions that are helpful in the
early stages of model training exploration. These callback functions include “Early
stopping”, which stops the training before it has reached the previously determined
number of epochs if there have been no improvements to, for example, the validation
loss for a set number of epochs. The callback function “Model checkpoint” only saves
the model after an epoch if, for example, the validation loss has improved, thereby
keeping the weights for the best-performing model even though training continued.
The callback function “Learning rate plateau” reduces the learning rate when a
monitored metric, such as the validation loss, has not improved for a set number of
epochs. Finally, the callback function “Learning rate schedule” defines which learning
rates should be used for different series of epochs. For example, for epochs 1 to 5, the
learning rate could be set to 0.001; then, for epochs 6 to 10, it could be changed to
0.0001, and so on.
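These four callbacks map onto standard TensorFlow/Keras callback classes; a sketch with assumed monitored metrics and patience values:

    import tensorflow as tf

    def schedule(epoch, lr):
        # The example schedule from the text: 0.001 for epochs 1-5, then 0.0001.
        return 0.001 if epoch < 5 else 0.0001

    callbacks = [
        tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10),
        tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                           save_best_only=True),
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                             patience=5),
        tf.keras.callbacks.LearningRateScheduler(schedule),
    ]
    # The list is passed to model.fit(..., callbacks=callbacks) when enabled.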
As mentioned earlier, the pipeline will be partially run in Google Colab, and then the
results will be analyzed on a local machine. Therefore, the pipeline needs an easy
way to save the model weights and history to files. The hdf5 or h5 file format is an
appropriate choice for the model weights themselves. The pickle file format is the
most appropriate choice for the training history. It should be just as easy to load the
model weights and history from the files as it is to save them. The same applies
when the user wants to load the weights from a model checkpoint. However, due
to the training process and model checkpoint function in TensorFlow/Keras, it is not
possible to load the model history only at the point of the checkpoint.
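A sketch of the save and load steps, assuming a trained Keras model and its History object are available as model and history:

    import pickle

    # Persist the trained weights (HDF5/h5) and the training history (pickle).
    model.save_weights("model_weights.h5")
    with open("history.pkl", "wb") as f:
        pickle.dump(history.history, f)

    # Later, for example on the local machine, restore both for analysis.
    model.load_weights("model_weights.h5")
    with open("history.pkl", "rb") as f:
        history_dict = pickle.load(f)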
In the final stage of the pipeline, the model’s accuracy and loss should be evaluated.
The evaluation is done by using the test set, which contains never before seen image
samples. The accuracy and loss are calculated over the correct and incorrect
predictions made by the model on the test set.
3.3.2 Architecture Selection
From the research done in Chapter 2 Literature Review, it is decided to go forward
with a total of five of the researched CNN architectures, and to design a custom archi-
tecture to use in comparison. The predefined architectures are AlexNet (Krizhevsky,
Sutskever, and Hinton 2012), VGG16 (Simonyan and Zisserman 2015), MobileNet
(Howard et al. 2017), ResNext (Xie et al. 2017), and DenseNet121 (Huang et al.
2019).
The custom CNN architecture is designed to accept N 224x224 image matrices with
3 color channels. It consists of three convolutional layers of 32, 64, and 64 filters, each
with a filter size of 3x3, zero-padding, and ReLU as the activation function. The first
convolutional layer is followed by a max-pooling layer with a pool size of 4x4, and the
last two convolutional layers are followed by max-pooling layers with a pool size of
2x2. Furthermore, every max-pooling layer is followed by a dropout layer with a
dropout rate of 0.15. After the convolutional blocks, a flattening layer is added to
transform the matrix output into vector inputs accepted by the first dense layer. The
first dense layer comprises 512 ReLU-activated neurons, followed by a dropout layer
with a 0.5 dropout rate. The last hidden layer is a dense layer of 256 ReLU-activated
neurons. The model's final layer, its output, is a 3-neuron softmax-activated dense
layer for the classification of the three labels in the dataset. The flowchart of this model
is shown in Fig. 3.1, and the general architecture is provided in Fig. 3.2.
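Based on this description, the custom architecture could be expressed with the Keras sequential API roughly as follows; this is a sketch reconstructed from the text, not copied from the project repository:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Layer order, filter counts, pool sizes, and dropout rates follow the
    # description above.
    model = tf.keras.Sequential([
        layers.Input(shape=(224, 224, 3)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(4, 4)),
        layers.Dropout(0.15),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.15),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.15),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(256, activation="relu"),
        layers.Dense(3, activation="softmax"),
    ])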
In hindsight, it can be observed that this architecture resembles the AlexNet archi-
tecture, even though it was implemented before researching the AlexNet architec-
ture.
Figure 3.1: Custom CNN Flowchart with The AlexNet Architecture as Comparison.
Figure 3.2: Custom CNN Architecture.
3.3.3 User Interface
In order to make use of the models in a meaningful way, some form of a user interface
is required. Only using a Jupyter notebook for this task will exclude a lot of potential
users since there is a lot of knowledge and skill needed to get the notebook up and
running, as well as operating it. Therefore, a graphical user interface is to be imple-
mented. The user interface should be accessible on any operating system as long
as there is an internet connection, making a website the ideal choice. The website
should let the users learn about the project without leaving the website to understand
the classification systems’ capabilities, strengths, and weaknesses. Then, the user
should be able to upload an MRI image of their own for classification. Next, select
one or more classification models to use. Finally, the user gets a prediction of the
probability that no tumor is present or that a tumor is present and, if so, whether it is
an LGG or HGG tumor.
3.4 Implementation
3.4.1 Model Pipeline
The model training pipeline is implemented using the Visual Studio Code variant
of the Jupyter Notebook format “.ipynb”. Notebooks are particularly useful during
the development of a machine learning pipeline since they allow for the execution
of only parts of the pipeline at a time and let the user evaluate that particular step
of the process before moving forward. The model training pipeline is implemented
to function both locally and in Google Colab without altering the code. After all the
needed packages are installed and loaded, all relevant constants are defined, such as
the paths to the datasets, the location for saving model weights, and the class label
names used for classification.
The next step is to load the data for model training, validation, and testing. The
data is loaded using the TensorFlow/Keras ImageDataGenerator, which generates
batches of augmented image data. However, since this project aims to switch data
augmentation on and off easily, augmentation has already been performed in an
additional preparation step. This preparation step is carried out in a notebook with the
same preliminary steps as the model training pipeline. First, the preparation notebook
reads a file called “dataset.zip” in the dataset’s path and decompresses its content.
If a “.DS_Store” file has been included in any directory or subdirectory during
compression, it is removed so that it is not included as a sample. Then, the paths
needed to place the different subsets are created if they do not already exist.
Due to the relatively small dataset, only 20 files from each class are randomly se-
lected for model evaluation. The remaining files are then divided at random into 70%
training and 30% validation sets. The decompressed dataset is then deleted. The
different subsets are then listed in appropriately named variables, and their count is
displayed to the user for inspection.
Now the user can use a function that loads an image into an ImageDataGenerator
object and displays how it affects the image. This is useful for exploring the different
transformations that can be applied before deciding on the best transformations to
use going forward.
The transformations chosen for this dataset include a 15° rotation range, width and
height shifts of 0.5, a shear range of 10, a zoom range of 0.1, horizontal flipping, and a
random brightness range of 0.5 to 1.5.
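As a rough sketch, these settings correspond to a Keras ImageDataGenerator configured roughly as follows; the argument values are taken from the description above and may differ slightly from the project’s actual configuration.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings as described above (values assumed from the text).
augmenter = ImageDataGenerator(
    rotation_range=15,            # rotate by up to 15 degrees
    width_shift_range=0.5,        # horizontal shift
    height_shift_range=0.5,       # vertical shift
    shear_range=10,               # shear intensity
    zoom_range=0.1,               # zoom in or out by up to 10%
    horizontal_flip=True,         # random horizontal flipping
    brightness_range=(0.5, 1.5),  # random brightness adjustment
)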
The subset containing images without tumors is smaller than those containing LGGs
and HGGs. Therefore, the non-tumorous samples are augmented to three times their
original size, while the other two are augmented to twice their original size. Only the
images in the training set are augmented, since augmenting the others would not
improve the performance of the model but could decrease its ability to generalize.
With the augmented data ready, the “.flow_from_directory” method is used to load
the data into the ImageDataGenerator. The method takes the path to the directory
containing the images, the target size of the images, the batch size, and the classes
to label the images with, and it shuffles the samples. In order to reproduce this outcome
later, a seed is set to ensure the same randomization is used. Finally, the method
returns a generator that can be used to iterate over the images.
One such generator is created for each of the sets needed: training, validation, and
testing. This step of the process is then finished by asserting that the expected
number of images is loaded, and a sample batch is displayed to the user.
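A sketch of how these generators might be created is shown below; the directory paths, batch size, seed, and label names are illustrative assumptions.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)
BATCH_SIZE = 24
CLASS_NAMES = ["Normal", "LGG", "HGG"]  # label names used for classification

def make_generator(directory, shuffle=True):
    # One generator per subset; a fixed seed keeps the shuffling reproducible.
    return ImageDataGenerator().flow_from_directory(
        directory,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        classes=CLASS_NAMES,
        class_mode="categorical",
        shuffle=shuffle,
        seed=42,
    )

train_generator = make_generator("dataset/train")
valid_generator = make_generator("dataset/valid")
test_generator = make_generator("dataset/test", shuffle=False)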
The next step in the pipeline is to define the particular CNN architecture to use.
Some of the predefined architectures are readily available in the TensorFlow/Keras
library and can easily be imported, while others are imported from third-party
libraries. Some of the architectures are implemented using the Keras sequential API,
while others are implemented using the Keras functional API. For example, the
functional API is used for the ResNext CNN architecture, while the sequential API is
used for the AlexNet architecture.
When instantiating the architecture, it is decided whether the model should use
pre-trained weights or not. All pre-trained weights in this project are from the
ImageNet dataset. If the model uses pre-trained weights, a number of layers are
removed from the model and then redefined in order to make them trainable. The
architecture determines the number and type of layers that are removed. Since all
models with pre-trained weights from the ImageNet dataset are trained on 1000 output
classes, the last layer is removed and replaced with a new output layer that reflects
the number of classes to be predicted in this project.
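As an example, a DenseNet121 model with ImageNet weights and a new three-class output layer could be instantiated roughly as follows; how many layers are left trainable varies per architecture, so freezing the whole base here is a simplifying assumption.

from tensorflow.keras import Model, layers
from tensorflow.keras.applications import DenseNet121

# Load DenseNet121 with ImageNet weights, dropping its 1000-class output layer.
base = DenseNet121(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3), pooling="avg")

# Freeze the pre-trained layers (the number of retrained layers depends on the
# architecture; freezing everything here is a simplification).
for layer in base.layers:
    layer.trainable = False

# Add a new, trainable output layer matching the three classes in this project.
outputs = layers.Dense(3, activation="softmax")(base.output)
model = Model(inputs=base.input, outputs=outputs)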
After the model is defined, it is compiled using the Adam optimizer and the categorical
cross-entropy loss function, with accuracy as its metric. Then the number of epochs
to train the model is defined, along with the selection of callback functions to use
during training. Callback functions are usually only used while exploring the settings
of the various hyperparameters and are not used in the final model training.
The next step is a switch that lets the user decide whether a model should be trained,
loaded, or loaded from a checkpoint. If the user decides to train a new model, the “.fit”
method is used with appropriate parameters. The method takes the training data
generator, the validation data generator, the number of epochs to train the model, and
the number of steps per epoch. The method returns a history object, which contains
the training and validation loss and accuracy for each epoch. If the user instead
decides to load a model, the weights are loaded into the previously defined model,
and the history object is recreated from the saved file; when loading from a checkpoint,
only the weights are loaded and no history object is recreated.
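Continuing from the sketches above, the compile and training step could look roughly like this; the callback list and the steps-per-epoch calculation are assumptions.

# Compile with Adam, categorical cross-entropy, and accuracy as the metric.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

EPOCHS = 10  # baseline value; varied during the experiments

history = model.fit(
    train_generator,
    validation_data=valid_generator,
    epochs=EPOCHS,
    steps_per_epoch=len(train_generator),
    validation_steps=len(valid_generator),
    callbacks=[],  # callbacks only used while exploring hyperparameters
)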
With the model trained or loaded, it predicts on the test set using a function that
allows for predicting on generator data. The function takes the model and the test
data generator. It returns a NumPy array of predictions, an array of the probabilities
of the predictions, an array of the true labels for each prediction, and a list of valid
class label names. Another function gives the user a visual representation by marking
each correctly classified image in green and each incorrectly classified image in red
in a grid for each class. The user can then further inspect the incorrectly classified
images with a function that reports the prediction details for a specific image.
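A minimal sketch of such a prediction helper, continuing from the generators above, is shown below; the function name and return layout are illustrative assumptions.

import numpy as np

def predict_on_generator(model, generator):
    # Class probabilities for every test image, in (unshuffled) generator order.
    probabilities = model.predict(generator)
    predictions = np.argmax(probabilities, axis=1)
    true_labels = generator.classes
    class_names = list(generator.class_indices.keys())
    return predictions, probabilities, true_labels, class_names

predictions, probabilities, true_labels, class_names = predict_on_generator(
    model, test_generator)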
Finally, the model is evaluated using the Keras “.evaluate” method. The method
takes the test data generator and returns the loss and accuracy of the model on
the test set. The evaluation can be analyzed further by defining a confusion matrix
from the pycm library, which offers a wide range of metrics and visualizations for
analyzing the performance of the model. Metrics available in a pycm confusion matrix
include accuracy, precision, recall, F1 score, support, and imbalance, among many others.
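Continuing from the prediction sketch above, the evaluation step could look roughly like this.

from pycm import ConfusionMatrix

# Overall loss and accuracy on the held-out test set.
test_loss, test_accuracy = model.evaluate(test_generator)

# A pycm confusion matrix exposes a wide range of per-class metrics.
cm = ConfusionMatrix(actual_vector=true_labels, predict_vector=predictions)
print(cm.ACC)  # per-class accuracy
print(cm.F1)   # per-class F1 score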
In order not to bloat the notebook with code, many functions are implemented in
separate modules. The modules are imported into the main notebook, and their
functions are callable from the notebook. These modules contain helper functions for
many different tasks, such as data augmentation, parameter counting, prediction
comparisons, file manipulation, different visualizers, and a code injector that alters
the behavior of the pycm confusion matrix.
3.4.2 User Interface
The user interface’s backend is a RESTful API implemented using FastAPI. FastAPI
is a modern framework and one of the fastest API frameworks in Python since it is
based on the lightweight ASGI framework Starlette. Being based on Starlette also
gives FastAPI full support for asynchronous programming. FastAPI uses Pydantic for
data handling and type validation at runtime. FastAPI automatically generates
interactive documentation for the API in the OpenAPI format using Swagger UI and
ReDoc, which reduces the need for third-party testing tools like Postman or the VS Code
extension Thunder Client. FastAPI runs with the help of web servers like Gunicorn or
Uvicorn. Every operation performed in the graphical user interface can also be
performed using the API, either via a browser or using cURL commands in a
terminal.
For the frontend part of the website, the Jinja2 templating engine is used. Jinja2
renders HTML templates using a special syntax that allows Python-like code blocks
to be embedded in them. The templates are stored in the templates folder, which is
mounted in the FastAPI main application file. When FastAPI receives a request for a
page, it looks for the corresponding template in the templates folder and returns it if
found; if not, it returns a 404 error.
In addition to returning a template, functions can perform actions based on the
request, such as uploading images, deleting images, returning predictions, and more.
Functions that are not endpoints are not defined in the main application file but are
imported from separate modules. For example, the functions used for predicting on
an image are stored in the model module. Exceptions are the “startup” function,
which creates a temporary folder for the images to be uploaded to, and the “shutdown”
function, which deletes the temporary folder.
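A minimal sketch of this FastAPI and Jinja2 setup is given below; the route, folder names, and temporary-folder handling are simplified assumptions.

import shutil
import tempfile

from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI(title="Tumorclass.info")
templates = Jinja2Templates(directory="templates")

@app.on_event("startup")
def startup() -> None:
    # Create a temporary folder that uploaded images are written to.
    app.state.upload_dir = tempfile.mkdtemp()

@app.on_event("shutdown")
def shutdown() -> None:
    # Delete the temporary folder and any images still in it.
    shutil.rmtree(app.state.upload_dir, ignore_errors=True)

@app.get("/")
def home(request: Request):
    # Render the home page from the mounted templates folder.
    return templates.TemplateResponse("index.html", {"request": request})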
In order to make the uploading process operate as desired, some JavaScript is needed,
but due to a lack of experience with JavaScript, its use has been kept to a minimum.
The layout of the webpages is done in HTML rendered by the Jinja2 templating engine.
The styling is primarily done with Bootstrap’s CSS and JavaScript. However, some
elements required custom styling, which was included directly in the HTML.
3.4.3 Deployment
The initial plan for deployment was to use Heroku since it is known for its ease of
use. However, because the files containing the model weights were very large,
some over 500 MB, this was not possible due to Heroku’s file size limitations. The Deta
platform was also tested and was even easier to use than Heroku; however, it faced
the same problem. The next platform tried was Azure App Services, which is a little
less straightforward but, according to the available information, should have worked.
It did not; apparently, the Azure App Services platform limits an application to no more
than 10% of the available memory. Finally, the most complicated platform tested was
an Azure Virtual Machine. Even though the API was able to run, it was not possible to
make it accessible from the outside in due time, so this attempt had to be abandoned.
With all attempts failing, the plan was to make a repository containing the application
and only the needed files. In addition, the repository should contain an auto-install
script so that a user could clone the repository and run the application locally.
By coincidence, a service called ngrok was found while searching for unrelated topics.
Ngrok is a free service that allows localhost to be exposed to the internet in a secure
manner. With ngrok installed, the only steps needed were to run the FastAPI application
in Uvicorn on the local machine and then run ngrok in a separate terminal to expose
the local host to the internet. To make the setup a little more elegant, a domain name
was purchased, and its DNS was configured to point to ngrok. A monthly ngrok
subscription was also purchased in order to connect the autogenerated ngrok domain
name to the custom domain name.
4
Results
4.1 Classification Models
Developing the pipeline required finding the generally most suitable division of the
Training, Validation, and Test subsets and the most suitable baseline hyperparameter
settings to use as a starting point. It also required determining whether image
augmentation had enough impact to be included in the pipeline. For this task, VGG16
was selected, as it had been used for the same application with promising results
(Belaid and Loudini 2020; Sevli 2021).
Over a hundred training tests were performed using different configurations of
augmentation, the number of images extracted to the Test set, the split ratio of the
remainder into Training and Validation, learning rates, and the number of epochs. The
results of all these tests are too extensive to show here. However, Table 4.1 shows a
concatenated table of the best results. Values that are the same for all tests are not
shown in the table. For example, all tests with the best results use augmentation, a
learning rate of 0.001, and 20 images extracted from each class before splitting the
remainder into 70% for training and 30% for validation. Also, all the tests in this table
achieved an accuracy of 95%.
Table 4.1: Concatenation of The Best Training Exploration Results
Batch   Epochs   Loss     Accuracy   Val. Loss   Val. Accuracy   Test Loss
16      10       .0885    .9816      .3119       .9082           .3365
24      10       .1134    .9755      .2741       .9219           .3178
48      10       .1718    .9559      .4096       .8698           .3186
48      12       .1446    .9659      .3243       .8958           .3277
48      14       .1207    .9749      .2863       .9115           .3254
64      10       .2176    .9580      .3929       .8438           .3417
After experimentation was concluded, a baseline for training the other CNN architectures
was established. In the baseline, a total of 60 images, 20 from each of the three
classes, are picked at random for the Test set. The remaining images are then divided
into 30% for the Validation set and 70% for the Training set. In addition, the Training set
is augmented with rotation, width and height shifts, shearing, zoom and brightness
adjustment, and horizontal flipping. Experimenting with augmentation showed that
many transformations with minimal adjustments to the original gave better results than
fewer with more extensive adjustments. The Normal label has a little over half as many
samples as LGG and HGG and therefore produces a higher number of augmented
images. The Normal label in the Training set is increased from 103 to 309, LGG from
186 to 372, and HGG from 182 to 364. With a batch size of 24, the baseline uses 10
epochs for training, a learning rate of 0.001, and Adam for optimization with categorical
cross-entropy as the loss function.
All the CNN architectures implemented have outstanding results, as shown in Ta-
ble 4.2. Multiple runs were made on each architecture to obtain the given results.
The results indicate that DenseNet121 is the best candidate for the given classifica-
tion task, closely followed by AlexNet and the custom architecture.
Table 4.2: CNN Architecture Results
Architecture   Epochs   Initial learning rate   Loss     Accuracy & F1
VGG16          10       0.001                   0.3218   0.933
ResNext        12       0.001                   0.1244   0.933
MobileNet      28       0.001                   0.1141   0.950
Custom         22       0.000316                0.0989   1.00
AlexNet        28       0.001                   0.0582   0.983
DenseNet121    12       0.001                   0.0254   1.00
VGG16, ResNext, MobileNet, and DenseNet121 use transfer learning, that is, the
pre-trained weights from the ImageNet dataset, and only the last few layers are trained
on this project’s dataset. On the other hand, AlexNet and the custom model do not use
transfer learning. The test results did not indicate that transfer learning was helpful in
any way for classifying brain tumors in MRI images. For verification, the DenseNet121
model was trained a few times without the use of pre-trained weights, which also did
not show any significant change in the results. The weights from the ImageNet dataset
are trained on manually labeled images covering anything from the Norwegian flag to
dinosaurs. Most likely, these weights are too general to contribute to a dataset
containing such specific images as MRI brain scans.
It was believed that transfer learning would compensate for the small size of the
dataset. Unfortunately, this does not seem to be the case, and there are indications
of overfitting, especially in some of the architectures. By examining the MobileNet
model (Fig. 4.1), for example, it can be observed that this model rapidly reaches a
perfect training accuracy. However, the validation accuracy lags behind and never
closes the gap. These results indicate that the model is overfitting on the data.
Increasing the volume of samples, preferably from different sources, would most
likely improve these results significantly.
On the other hand, for the AlexNet model (Fig. 4.2), the prediction accuracy does not
increase as rapidly per epoch as for the MobileNet model. Also, the gap between
training accuracy and validation accuracy decreases with every epoch until it is
non-existent. The AlexNet model never reaches a perfect accuracy, but this is not due
to overfitting, merely that no more features are being learned from the data.
In the early stages of this research project, it was believed that the DenseNet121
architecture was the most promising. However, after analyzing the results, the
perception has shifted to AlexNet, closely followed by the custom architecture, the
two least complicated and least computationally costly models of the lot. A likely
reason for this may be the simple and consistent nature of the data the models are
trained on. However, a much larger test dataset is needed to verify this theory.
Figure 4.1: Training History of The MobileNet Architecture.
Figure 4.2: Training History of The AlexNet Architecture.
4.2 User Interface
The implemented user interface operates as intended. First, the API is started
on the local machine by running the main python file in the terminal, as shown in
Fig. 4.3.
Figure 4.3: Running The API using Uvicorn
The next step exposes localhost to the internet using the ngrok platform, as shown
in Fig. 4.4. The DNS of a custom domain, tumorclass.info, is set up to redirect traffic
to the ngrok domain.
Figure 4.4: Expose Local Server To The Internet
By navigating to tumorclass.info and adding “/docs” to the end of the URL, the
interactive documentation can be accessed, as shown in Fig. 4.5. It allows the
endpoints to be tested in the same manner as in API testing applications like Postman
or Thunder Client.
Figure 4.5: Interactive Documentation
The interactive documentation also autogenerates cURL commands that can be run
directly in the terminal, as shown in Fig. 4.6.
Figure 4.6: Running cURL Commands From Terminal
By entering the URL tumorclass.info in a web browser, the user is directed to the
home page as shown in Fig. 4.7. Next, the user clicks the “Choose a file” button on
the home page, which lets the user pick an image for classification.
Figure 4.7: Home Page of the User Interface
By default, all the classification models are selected. Then, by clicking the “Predict”
button, the results are returned to the user. For example, in Fig. 4.8 a prediction for
a non-tumorous MRI image is shown.
Figure 4.8: Normally Classified MRI Image
The user may also select only a few of the classification models to use, as shown in
Fig. 4.9, where an LGG tumor is detected.
Figure 4.9: LGG Classified MRI Image
The images are stored locally on the server, but they are deleted every time the
server shuts down. The user may also manually delete the uploaded image by click-
ing the “Delete” button.
5
Conclusion
5.1 Introduction
The research aimed to develop a CADD system for brain tumor classification that is
readily available to the general public, with a particular focus on radiologists. The
following sections in this chapter summarize the research and implementation
outcomes and discuss possibilities for future work.
5.2 Summary of Research
This project has researched several types of brain tumors, focusing on the LGG and
HGG categories. The research has been conducted with regard to available MRI im-
ages of brain tumors and similar research projects performed by other researchers.
The Convolutional Neural Network has been studied in-depth to understand how it
best could be used to classify brain tumors. Several different predefined architec-
tures have been investigated and selected for this project. To utilize the research
outcome in a meaningful way, methods of designing and implementing a user inter-
face have been investigated.
5.3 Research Objectives
All the research objectives defined in section 1.2 have been met to a satisfactory
degree, given the scope of this project.
After researching brain tumors and their classification methods, sufficient knowledge
for the CADD system has been gained, as can be observed in section 2.1 and
section 2.4.
Convolutional neural networks and how they can be implemented for the
classification of brain tumors have been studied in section 2.2 and section 2.3, and
satisfactory results have been obtained, as seen in section 4.1.
Transfer learning has been studied, and why its effect on this project is minimal is
understood and discussed in section 4.1.
A review of research performed by others has been conducted and is reflected in
section 2.4. In addition, that section describes many variations of technologies and
architectures that have been used to classify brain tumors.
In section 2.5, ways of designing and implementing a user interface have been
investigated, and suitable technologies have been selected for this project.
In section 4.2, the results of the user interface design and implementation can be
observed working as intended.
5.4 Research Contribution
The research conducted in this project is a good starting point for further developing
a usable CADD system. A pipeline for data preprocessing that makes preparing new
data samples both easy and fast has been developed. Similarly, a pipeline for
augmenting the training data and a pipeline for defining CNNs, training them, and
providing a classification model as the output have been developed successfully.
Also, to demonstrate the potential of the research in this project, a user interface has
been developed and deployed to a web server, which the general public can access
at tumorclass.info. All the source code used in this project is provided in appendix B.
5.5 Future Work
The main focus in further developing this project should be expanding the dataset.
The amount of suitable data available is limited. However, the REMBRANDT dataset
has a vast number of images suitable for manual labeling performed by radiologists.
Expanding the dataset will make it more transparent which architectures generalize
well, so that these architectures can become the focus of further development.
These days, MRI scans provide 3D images, so further development to facilitate 3D
classification is also a viable option. By implementing a 3D classification system, the
user interface can be expanded to include 3D images and animations. Tools such as
these could be helpful for medical imaging researchers, medical students, and
radiologists. Many new tools for explainable AI have been developed, such as SHAP
and tf-explain. These tools were tested in this project without success and had to be
set aside due to time constraints. With more time, these tools could be used to explain
how the classification model reaches its predictions.
Bibliography
Belaid, Ouiza Nait and Malik Loudini (2020). “Classification of Brain Tumor by
Combination of Pre-Trained VGG16 CNN”. In: Journal of Information Technology
Management 12.2, pp. 13–25. ISSN: 2008-5893. DOI: 10.22059/jitm.2020.75788.
Retrieved from
https://jitm.ut.ac.ir/article_75788_e36c948ee9258c82b9398f136692f3f5.pdf.
Chollet, François (Nov. 2017). “Xception: Deep Learning with Depthwise Separable
Convolutions”. In: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). IEEE, pp. 1800–1807. DOI: 10.1109/CVPR.2017.195.
Retrieved from https://arxiv.org/pdf/1610.02357.pdf.
FastAPI (May 2022). FastAPI. Version 0.75.1. URL:
https://fastapi.tiangolo.com/.
Forst, Deborah A. et al. (Apr. 2014). “Low-grade gliomas”. eng. In: The oncologist
19.4, pp. 403–413. ISSN: 1549-490X. DOI: 10.1634/theoncologist.2013-0345.
Fukushima, Kunihiko (Sept. 1975). “Cognitron: A self-organizing multilayered neural
network”. In: Biological Cybernetics 20.3, pp. 121–136. ISSN: 1432-0770. DOI:
10.1007/BF00342633.
Fukushima, Kunihiko (Apr. 1980). “Neocognitron: A self-organizing neural network model for a
mechanism of pattern recognition unaffected by shift in position”. In: Biological
Cybernetics 36.4, pp. 193–202. ISSN: 1432-0770. DOI: 10.1007/BF00344251.
Retrieved from
https://www.rctn.org/bruno/public/papers/Fukushima1980.pdf.
He, Kaiming et al. (June 2016). “Deep Residual Learning for Image Recognition”.
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
IEEE, pp. 770–778. DOI: 10.1109/CVPR.2016.90. Retrieved from
https://arxiv.org/pdf/1512.03385.pdf.
Howard, Andrew et al. (Apr. 2017). “MobileNets: Efficient Convolutional Neural
Networks for Mobile Vision Applications”. In: DOI:
https://doi.org/10.48550/arXiv.1704.04861.
Hu, Leland S. et al. (May 2020). “Imaging of intratumoral heterogeneity in
high-grade glioma”. eng. In: Cancer letters 477, pp. 97–106. ISSN: 1872-7980.
DOI: 10.1016/j.canlet.2020.02.025. Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7108976/.
Huang, Gao et al. (2019). “Convolutional Networks with Dense Connectivity”. In:
IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1. ISSN:
1939-3539. DOI: 10.1109/TPAMI.2019.2918284. Retrieved from
https://arxiv.org/pdf/1608.06993.pdf.
Irmak, Emrah (Sept. 2021). “Multi-Classification of Brain Tumor MRI Images Using
Deep Convolutional Neural Network with Fully Optimized Framework”. In: Iranian
Journal of Science and Technology, Transactions of Electrical Engineering 45.3,
pp. 1015–1036. ISSN: 2364-1827. DOI: 10.1007/s40998-021-00426-9.
Jinja2 (Mar. 2022). Jinja2. Version 3.1.1. URL:
https://jinja.palletsprojects.com/.
JSONSchema (May 2020). JSON Schema. Version 2020-12. URL:
https://json-schema.org/specification.html.
Kang, Jaeyong, Zahid Ullah, and Jeonghwan Gwak (Mar. 2021). “MRI-Based Brain
Tumor Classification Using Ensemble of Deep Features and Machine Learning
Classifiers”. In: Sensors 21.6. ISSN: 1424-8220. DOI: 10.3390/s21062222.
Khawaldeh, Saed et al. (2018). “Noninvasive Grading of Glioma Tumor Using
Magnetic Resonance Imaging with Convolutional Neural Networks”. In: Applied
Sciences 8.1. ISSN: 2076-3417. DOI: 10.3390/app8010027.
Korja, Miikka et al. (Oct. 2018). “Glioblastoma survival is improving despite
increasing incidence rates: a nationwide study between 2000 and 2013 in
Finland”. In: Neuro-Oncology 21.3, pp. 370–379. ISSN: 1522-8517. DOI:
10.1093/neuonc/noy164.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton (Dec. 2012). “ImageNet
Classification with Deep Convolutional Neural Networks”. In: Advances in Neural
Information Processing Systems. Ed. by F. Pereira et al. Vol. 25. NIPS’12. Lake
Tahoe, Nevada: Curran Associates, Inc., pp. 1097–1105. URL:
https://proceedings.neurips.cc/paper/2012/file/
c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
Lecun, Y. et al. (Nov. 1998). “Gradient-Based Learning Applied to Document
Recognition”. In: Proceedings of the IEEE 86.11, pp. 2278–2324. ISSN:
1558-2256. DOI: 10.1109/5.726791. Retrieved from
http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf.
Lie Uberg, Lewi (May 2022). Tumorclass.info. Version 0.1.0. URL:
https://github.com/lewiuberg/tumorclass.info.
Lin, Min, Qiang Chen, and Shuicheng Yan (2014). “Network In Network”. In: CoRR
abs/1312.4400. eprint: 1312.4400. URL:
https://www.semanticscholar.org/paper/Network-In-Network-Lin-
Chen/5e83ab70d0cbc003471e87ec306d27d9c80ecb16.
Louis, David N. et al. (Aug. 2007). “The 2007 WHO classification of tumours of the
central nervous system”. eng. In: Acta neuropathologica 114.2, pp. 97–109. ISSN:
0001-6322. DOI: 10.1007/s00401-007-0243-4.
Luo, Qian et al. (Aug. 2018). “Comparisons of the accuracy of radiation diagnostic
modalities in brain tumor: A nonrandomized, nonexperimental, cross-sectional
trial”. eng. In: Medicine 97.31. ISSN: 1536-5964. DOI:
10.1097/MD.0000000000011256. Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6081153/.
OpenAPI (Feb. 2021). OpenAPI. Version 3.1.0. URL:
https://spec.openapis.org/oas/latest.html.
Pydantic (May 2022). Pydantic. Version 1.9.0. URL:
https://pydantic-docs.helpmanual.io.
Scarpace, Lisa et al. (2019). “Data From REMBRANDT [Data set]”. In: DOI:
10.7937/K9/TCIA.2015.588OZUZB.
Sevli, Onur (June 2021). “Performance Comparison of Different Pre-Trained Deep
Learning Models in Classifying Brain MRI Images”. In: Acta Infologica 5, p. 2021.
URL:
http://iupress.istanbul.edu.tr/en/journal/acin/article/performance-
comparison-of-different-pre-trained-deep-learning-models-in-
classifying-brain-mri-images. Retrieved from https://cdn.istanbul.edu.
tr/file/JTA6CLJ8T5/99DD9C496BF14E44859851B33E49A006.
Simonyan, Karen and Andrew Zisserman (Sept. 2015). “Very Deep Convolutional
Networks for Large-Scale Image Recognition”. In: International Conference on
Learning Representations. URL: https://arxiv.org/pdf/1409.1556.pdf.
Starlette (Nov. 2021). Starlette. Version 0.17.1. URL: https://www.starlette.io.
Szegedy, Christian, Sergey Ioffe, et al. (Feb. 2017). “Inception-v4, Inception-ResNet
and the Impact of Residual Connections on Learning”. In: Proceedings of the
Thirty-First AAAI Conference on Artificial Intelligence. AAAI’17. San Francisco,
California, USA: AAAI Press, pp. 4278–4284. DOI: 10.5555/3298023.3298188.
URL: https://dl.acm.org/doi/10.5555/3298023.3298188. Retrieved from
https://arxiv.org/pdf/1602.07261.pdf.
Szegedy, Christian, Wei Liu, et al. (June 2015). “Going deeper with convolutions”.
In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
IEEE, pp. 1–9. URL: https://ieeexplore.ieee.org/document/7298594.
Retrieved from https://arxiv.org/pdf/1409.4842.pdf.
Szegedy, Christian, Vincent Vanhoucke, et al. (Dec. 2016). “Rethinking the
Inception Architecture for Computer Vision”. In: 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). IEEE, pp. 2818–2826. DOI:
10.1109/CVPR.2016.308. Retrieved from
https://arxiv.org/pdf/1512.00567.pdf.
Tandel, Gopal S. et al. (2020). “Multiclass magnetic resonance imaging brain tumor
classification using artificial intelligence paradigm”. In: Computers in Biology and
Medicine 122, p. 103804. ISSN: 0010-4825. DOI:
10.1016/j.compbiomed.2020.103804. Retrieved from
https://www.sciencedirect.com/science/article/pii/S0010482520301724.
Uvicorn (Mar. 2022). Uvicorn. Version 0.17.6. URL: https://www.uvicorn.org.
Wu, Chen-Xing et al. (Aug. 2015). “Peritumoral edema on magnetic resonance
imaging predicts a poor clinical outcome in malignant glioma”. In: Oncology
Letters 10.5, pp. 2769–2776. DOI: 10.3892/ol.2015.3639.
Xie, Saining et al. (July 2017). “Aggregated Residual Transformations for Deep
Neural Networks”. In: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). IEEE, pp. 5987–5995. DOI: 10.1109/CVPR.2017.634.
Retrieved from https://arxiv.org/pdf/1611.05431.pdf.
A
Short Paper
Analysis of Brain Tumor Using MRI Images
Lewi L. Uberg
Applied Data Science
Noroff University College
Kristiansand, Norway
Abstract—The increasing rates of deadly brain tumors in
humans correspondingly increase the need for highly educated
medical personnel for diagnosis and treatment. Therefore, to
reduce the workload and the time from suspicion of disease to
diagnosis and suitable treatment, there is a need to automate the
initial part of the process by implementing a Computer-Aided-
Disease-Diagnosis (CADD) system for brain tumor classification.
By studying the types of tumors involved, how the convolutional
neural network functions, some of its pre-trained models, and
their application in brain tumor classification, the likelihood
of producing a promising CADD system increases heavily. The
research shows that the DenseNet121 architecture, either fully
trained or using transfer learning, likely is the most appropriate
candidate for the CADD system in development.
Index Terms—Brain Tumor Classification, Medical Imaging,
Magnetic Resonance Imaging, Convolutional Neural Networks,
Machine Learning, Deep learning.
I. INTRODUCTION
The research undertaken is to acquire knowledge about all
the needed components to develop and implement a working
Computer-Aided-Disease-Diagnosis (CADD) system for brain
tumor classification. First, by researching the brain tumors
themselves and the convention of their classification. Second,
the convolutional neural network (CNN); to understand its
structure and functionality and how to manipulate it. Third,
find different CNN architectures with promising results in
similar tasks. Fourth, find and evaluate the applicability of
available data sources. Fifth, implement the most promising
solutions and explore the applicability of using transfer learn-
ing for the given task to take advantage of previously gained
knowledge. Finally, evaluate the results of the experiments.
II. RELATED WORK
A. Brain Tumors
The cell of origin and the features found when examining the
cell tissue, the histopathological characteristics, define central
nervous system tumors and predict their behavior [1]. For
example, cerebral gliomas are neuroepithelial tumors originating
from the supporting glial cells of the central nervous system
[2].
After meningiomas, a usually benign tumor originating from
the meningeal tissue of the brain, the most common primary
brain tumor in adults overall are gliomas, with a rate of 5 to
6 persons per 100,000 annually [3].
The World Health Organization (WHO) tissue classification
system categorizes gliomas with grade I as the lowest and
grade IV as the highest. Thus, low-grade gliomas (LGG) consist
of grade I and II tumors [2], while high-grade gliomas (HGG)
consist of grade III and IV [3]. Grade I tumors are the least malignant
or benign, including Pilocytic Astrocytoma, Craniopharyngioma,
Gangliocytoma, and Ganglioglioma. Grade II is
relatively slow-growing but may recur as higher grade, includ-
ing Astrocytoma, Pineocytoma, and Pure Oligodendroglioma.
Grade III are malignant and tend to recur as higher grade,
including Anaplastic Astrocytoma, Anaplastic Ependymoma,
and Anaplastic Oligodendroglioma. Finally, grade IV is the
most malignant, aggressive, necrosis and recurrence prone,
including tumor types Glioblastoma Multiforme (GBM), Pi-
neoblastoma, Medulloblastoma, and Ependymoblastoma [1,
107].
Often occurring in younger people, otherwise healthy pa-
tients, LGGs are a diverse group of primary brain tumors. Gen-
erally, they have a relatively good prognosis, and prolonged
survival rate [2]. However, over 75% of gliomas are HGG,
GBM being the most common and aggressive, accounting for
56.1% of all gliomas. In addition, HGGs, particularly GBM,
can exhibit a distinct tumor cell population that confounds
clinical diagnosis and management. As a result, GBM has
a grim prognosis, with a median survival of 15 months,
despite the best available treatments. Having a relatively high
occurrence frequency and being difficult to diagnose and treat
has made HGGs, and GBM, in particular, the subject of
tremendous interest in neuro-oncologic research [3].
Histopathologic examination to study tissue morphology in
order to diagnose and grade brain tumors is the gold standard [2].
However, surgical resection for diagnosing a brain tumor is an
invasive and risky method. Nevertheless, there are several non-
invasive diagnostic methods, like neuroimaging. Neuroimaging
techniques widely used by the medical community include
Computed Tomography (CT), Magnetic Resonance Imaging
(MRI), Positron Emission Tomography (PET), and Plain-Film
Radiography (PFR) [4].
Conventional MRI is the current imaging procedure of
choice and identifies tumor size and associated Peritumoral
Edema (PTE), one of the main features of malignant glioma
[5], with or without contrast enhancement. Nevertheless, char-
acteristic MRI findings cannot determine the tumor grade
alone [2]. Moreover, MRIs of HGG, GBM in particular, also
lack the capability to resolve intratumoral heterogeneity that
may be present. Nevertheless, more advanced imaging proce-
dures like PET offer a range of physiologic and biophysical
image features that can improve the accuracy of imaging
diagnoses [3].
Studies comparing imaging features with tissue benchmarks
have employed either qualitative descriptions or have com-
pared individual quantitative metrics in a univariate fashion,
which has produced solid correlations for certain clinical
scenarios. However, for other scenarios, the correlation of
imaging and histopathological characteristics may not be self-
evident by visual inspection or sufficiently represented by
simple statistical features. Thus, in pursuit of brain tumor diag-
nosis without surgical intervention, researchers have developed
more advanced imaging methods such as texture analysis,
mechanic modeling, and machine learning (ML) that form a
predictive multi-parametric image-based model. The applica-
tion of ML models is an emerging field in radiogenomics and
represents a data-driven approach to identifying meaningful
patterns and correlations from often complex data sources. ML
models train by feeding the ML algorithm a substantial amount
of pre-classified image data as input, such as MRI images,
to learn which patterns belong to the different classes. The
resulting ML model uses these previously learned patterns to
predict the appropriate class for the new instance. These ML
models have enabled researchers to predict tumor cell density,
and together with texture analysis, recognize regional and
genetically distinct subpopulations coexisting within a single
GBM tumor [3].
B. Convolutional Neural Network
The convolutional neural network (CNN) is a concept
introduced by [6] as the Neocognitron, as a model of the
brain’s visual cortex; an improvement of [7] previous model
for visual pattern recognition. Furthermore, [8] significantly
improved the Neocognitron to one of the most successful
pattern recognition models, which has significantly impacted
the field of computer vision.
The most common use case for CNNs is pattern detection
in images. One or more hidden convolution layers uses filters
to convolve or scan over an input matrix, such as a binary
image. These filters closely resemble neurons in a dense layer
of an artificial neural network. Here a filter is learned to detect
a specific pattern such as an edge or circle; adding more
filters to a convolutional layer will enable more features to
be learned. The filter size is the size of the matrix convolving
over the image matrix, and the stride is the number of pixel
shifts over the input matrix. The convolutional layer performs
matrix multiplication on the image and filter matrix for each
stride taken. The resulting output is called a feature map. Two
options are available when the filter does not fit the image.
First, the part of the image matrix that does not fit the filter gets
dropped; this is called valid padding. Alternatively, zeros are
added to the image matrix’s edges, enabling the filter matrix
to fit the image matrix entirely; which is called zero-padding.
In Fukushima’s original paper [6], tanh is used as the acti-
vation function. However, after Rectified Linear Unit (ReLU)
was introduced as the activation function by Krizhevsky in
2012 with AlexNet [9], it has become the most common
activation function for a convolutional layer. ReLU is an almost
linear function with a low computational cost. ReLU converges
fast by transforming the input to the maximum of zero or the
input value, meaning that the positive linear slope does not
saturate or plateau when the input becomes large. Also, ReLU
does not have a vanishing gradient problem like sigmoid or
tanh. Hidden dense layers, alternatively called fully-connected layers,
also tend to use ReLU as their activation function.
Spatial pooling, a subsampling or downsampling, can be
applied to reduce the number of tunable parameters, which is
the dimensionality of a feature map. The most commonly used
type of spatial pooling is Max-pooling. The Max-pooling layer
operates much like a convolutional layer by using filters and
stride. However, the Max-pooling layer takes the maximum
value in its filter matrix as the output value. So, for example,
a Max-pooling layer with an input matrix of 8x8 and a filter
size of 2x2 would have an output of 4x4 containing the largest
value for each region. Thus, the Max-pooling’s downsampling
decreases the computational cost for the following layers
in the network, while also concluding the feature extraction
part of the CNN and initiating the feature learning part.
Both convolutional and max-pooling layers output matrices,
while fully-connected layers only accept vectors. Adding a
flattening layer reduces the last convolutional layer’s output
dimensionality to the shape (-1, 1), or it directly transforms
the matrix into a vector. In addition, this operation is a com-
putationally cheap way of learning non-linear combinations of
higher-level feature representation from the convolutional or
max-pooling layer’s output.
The final layer, the output layer, is a dense or fully-
connected layer with the same number of neurons as classes
to be classified. The activation function for the final layer is
strongly dependent on the loss function. For example, a single
neuron sigmoid activated fully-connected layer as the output,
compiled with binary cross-entropy as the loss function, would
yield an equivalent result as two softmax activated neurons in
a network using categorical cross-entropy as the loss function;
in other words, a binary classification.
C. Brain Tumor Classification
In 2021 a fully automatic hybrid solution for brain tumor
classification comprised of several steps was proposed [10].
First, pre-process the brain MRI images by cropping, resizing,
and augmenting. Second, use pre-trained CNN models for
feature extraction with better generalization. Third, select the
top three performing features using fine-tuned ML classifiers
and concatenate these features. Finally, use the concatenated
feature as input for the ML classifiers to predict the final output
for the brain tumor MRI.
The researchers selected three different publicly available
brain tumor MRI datasets for experimentation. The researchers
established a naming convention of three parts, the type, the
size, and the number of classes, i.e., a medium brain tumor
dataset with three classes is named ”BT-medium-3c”. The
first dataset, BT-small-2c, comprises 253 images, 155 images
classified as containing tumors, and 98 images classified as
without tumors. The second dataset, BT-large-2c, comprises
3000 images, 1500 images containing tumors, and 1500 im-
ages without tumors. The third and final dataset, BT-large-4c,
comprises 3064 images containing four classes, not tumorous,
glioma tumor, meningioma tumor, and pituitary tumor. All the
datasets follow the standard convention of subdividing into
80% for training and 20% for testing.
Most of the images in the datasets contain undesired spaces
and areas. However, cropping the image only to contain the
relevant area for analysis can lead to better classification
performance. In addition, if a dataset is imbalanced or small,
augmentation may boost the learning capabilities. Augmen-
tation creates multiple copies of the images, modified in
different ways, like mirroring, rotating, or adjusting the image
brightness. In addition to dataset augmentation, the images
are resized to fit the pre-trained CNN’s expected dimensions;
224x224px, except Inception V3, which expects 299x299px.
The proposed scheme uses a novel feature evaluation and
selection mechanism, an ensemble of 13 pre-trained CNNs,
to extract robust and discriminative features from the brain
MRI images without human supervision. The CNN ensem-
ble, is comprised of ResNet-50, ResNet-101, DenseNet-121,
DenseNet-169, VGG-16, VGG-19, AlexNet, Inception V3,
ResNext-50, ResNext-101, ShuffleNet, MobileNet, and Mnas-
Net. Since the researchers use fairly small datasets for training,
they take a transfer learning-based approach by using the fixed
weights on the bottleneck layers of each CNN model pre-
trained on the ImageNet dataset.
Using the features extracted from the CNN models, a
synthetic feature is formed by evaluating each feature from
the CNN ensemble with an ensemble of nine different ML
classifiers and concatenating the top three features from the
different CNNs. Since different CNN architectures capture
different aspects of the processed data, the synthetic feature
represents a more discriminative feature than features extracted
from a single CNN.
The ML classifier ensemble, implemented using the scikit-learn
library, is comprised of a fully-connected (FC) neural
network (NN) layer, Gaussian Naïve Bayes (Gaussian NB),
Adaptive Boosting (AdaBoost), K-Nearest Neighbors (k-NN),
Random Forest (RF), Extreme Learning Machine (ELM), and
Support Vector Machines (SVM) with three different kernels:
linear, sigmoid, and radial basis function (RBF).
The first classifier uses the conventional CNN approach. A
softmax activated FC layer with a cross-entropy loss function;
the most commonly used loss function for neural networks.
This first classifier with an initial learning rate of 0.001 uses
Adaptive Moment Estimation (Adam) optimization of the layer
weights and adaptively recalculates the learning rate. Finally,
collecting the highest average accuracy per run for a total of
100 epochs.
The researchers also use the Gaussian variant of Naïve
Bayes, which follows the Gaussian (normal) distribution with no
covariance between the attributes in the classes.
The next classifier Adaptive Boosting, or AdaBoost for
short, is an ensemble learning algorithm that combines multi-
ple weaker classifiers (Decision trees with a single split, called
stumps.) to improve performance. AdaBoost works iteratively
and assigns higher weights to the mislabeled instances.
The following classifier is one of the simplest classifiers, the
k-Nearest Neighbors (kNN). kNN does not train a model but
calculates predictions directly from the data currently stored
in memory. Using Euclidean distance as the distance metric,
the kNN classifier finds the k nearest neighbors of the training
instances closest to the given feature. It then assigns the most
common class label among those neighbors, i.e., the majority vote. Setting
the nearest neighbors from 1 to 4, the one with the highest
accuracy was selected.
Random Forest (RF) is a learning algorithm that creates
multiple decision trees using the bootstrap aggregation (bag-
ging) method to classify features into a class—using the
Gini index as a cost function while creating the decision
trees. RF selects random n attributes or features to find the
optimal split point, reducing the correlation among the trees
and having lower ensemble error rates. RF predicts by feeding
features into all the classification trees, counting the number
of predictions for each class, and choosing the class with the
most significant number of votes as the correct class for the
given feature. To find the optimal split, the researchers set the
feature consideration number to the square root of the total
number of features and the number of decision trees from 1
to 150, thereby selecting the one with the highest accuracy.
Extreme Learning Machine (ELM) is a learning algorithm
for Single-Layer Feed-Forward Neural Networks (SLFN),
which provides good performance at a fast learning speed.
ELM is not an iterative algorithm, like the back-propagation
algorithm used in traditional SLFNs. Instead, ELM uses a
gradient-based technique, only tuning the weights once. The
researchers used hidden layer sizes of 5000, 6000, 7000, 8000, 9000,
and 10,000 and selected the one with the highest accuracy.
The Support Vector Machine (SVM) uses the kernel func-
tion to transform the original data space, the number of
features, into a higher-dimensional space. It then aims to find a
hyperplane in that higher-dimensional space that distinctly classifies
the given feature. The researchers use the three most common
kernel functions, linear, sigmoid, and radial basis function
(RBF). In addition, the SVM has two hyper-parameters. First,
C, the soft margin cost function that controls each support
vector’s influence; set to 0.1, 1, 10, 100, 1000, 10000. Sec-
ondly, Gamma, which decides the curvature of the decision
boundaries; set to 0.00001, 0.0001, 0.001, 0.01. The hyper-
parameter combination that yielded the highest accuracy is
then selected.
Experimentation on the given datasets has two main tasks.
First, compare the several pre-trained CNN networks with
several ML classifiers. Second, show the effectiveness of the
concatenation of the top 2 or 3 features with the best results
from the first experiment.
For example, the top three features on the BT-small-2c
dataset are the DenseNet-169, Inception V3, and ResNeXt-50
features. Then on the BT-large-2c dataset, the DenseNet-121,
ResNeXt-101, and MnasNet features are the top three. While
on the BT-large-4c dataset, the DenseNet-169, MnasNet, and
ShuffleNet V2 features are the top three.
Observations from the second experiment show that SVM
with RBF kernel can find a more effective and complex set of
decision boundaries, outperforming the other ML classifiers on
the two most extensive datasets. However, this is not the case
for the smallest dataset since SVM with RBF tends to under-
perform when the number of training data samples is smaller
than the feature number for each data point. Furthermore, it is
almost impossible that features extracted from the pre-trained
CNNs are entirely independent. Therefore, since Gaussian
NB assumes that the features are independent, it performs
worst among the ML classifiers on all three datasets. On the
other hand, features extracted from the DenseNet architectures
predict well on all three datasets since they have all complexity
levels, giving them smoother decision boundaries, which tend
to predict especially well on insufficient training data. Meanwhile,
features from VGG, with its more basic architecture and
no residual blocks, yield the worst results. The effectiveness
of the concatenated top 2 or 3 features is evident for all ML
classifiers on the two largest datasets. However, on the small
dataset, it is only shown when using AdaBoost and k-NN.
The FC, RF, and Gaussian NB classifiers have the shortest
inference time. In comparison, k-NN has the longest since it is
the only ML classifier that needs to evaluate every data point
during prediction. While the results from these experiments
show promise on large datasets, further research is needed,
especially on model reduction for real-time medical systems
deployment.
III. DATA GATHERING & ANALYSIS
One could assume that with the whole internet as the arena,
the likelihood of finding an applicable dataset for something
as common as brain tumor MRIs is high. However, when
the need for a specific dataset is high, the availability seems
to decrease. The data-gathering part of the research project
starts with outlining some criteria. The data should contain
non-tumorous, LGG, and HGG samples and be well balanced
between the labels. The latter is essential since too much data
augmentation on medical images often does not generalize
well. With optimism, the search began on familiar places like
Kaggle, Google Dataset Search, GitHub, and paperswithcode.
While some of the found sources looked promising at first,
none of them quite fit the bill. They were either too small, had
some but not all of the needed labels, or were very unbalanced
between the labels. However, the largest problem was that most
of them did not have the non-tumorous label in the samples. A
combination of different datasets was considered for a period,
but the idea of not having one battle-tested dataset at least
as the foundation did not sit well. So the search continued.
Finally, a well-suited candidate was found at The Cancer
Imaging Archive (TCIA). The REMBRANDT (REpository for
Molecular BRAin Neoplasia DaTa) Dataset [11] seemed to
have all the essential characteristics the project needs. The
dataset is one of the most trusted publicly available datasets,
comprised of MRI scans from 130 subjects of three classes,
non-tumorous, LGG, and HGG. Furthermore, the LGG and
HGG classes have subclasses that opened the opportunity to
make the classification outcome more extensive. The LGG
class includes Astrocytoma II and Oligodendroglioma II, while
HGG includes Astrocytoma III, Oligodendroglioma III, and
Glioblastoma Multiforme IV. The dataset is a combination
of metadata from various text and spreadsheet files and the
110,020 images in the DICOM format, which also includes
a vast quantity of metadata. Due to the dataset size and the
state of its metadata, preprocessing was a rather large task.
Preprocessing the data includes combining all the sources,
extracting the desired features, removing samples with missing
data points, and giving the samples a unified naming for
the labels. After completing preprocessing, the dataset is
comprised of 123 patients and 105,265 slides, distributed as
shown in Table I. At this point, it was discovered that one
key bit of information was missing: how to separate the MRI
slides that contained the tumorous cells from the ones that
did not contain them. All the source data were reexamined,
but the answer was not found. While searching online for
how to find the needed key, one paper stood out [12]. The
paper used that same dataset for a similar application. While
being reduced to 4069 slides, it seemed that the authors of this
paper had found a way to filter the data further. A meeting
to understand how to reproduce the dataset was arranged
by reaching out to the paper’s main author, Dr. Khawaldeh.
While being grateful for Dr. Khawaldeh’s collaboration, some
devastating news arose. His research team had employed help
from neurologists to go through each slide manually and label
them correctly. Fortunately, Dr. Khawaldeh offered to share a
dataset of labeled samples divided into Normal, LGG, and
HGG. After some further processing, the dataset has 735
samples distributed as shown in Table II, and is now ready
for training CNN models.
TABLE I
DATASET SAMPLE DISTRIBUTION
Disease             Grade   Label   Unique Samples   Sample Count
Astrocytoma         II      LGG     30               25286
Astrocytoma         III     HGG     17               16038
GBM                 IV      HGG     43               32837
Non-Tumorous        n/a     n/a     15               17041
Oligodendroglioma   II      LGG     11               9335
Oligodendroglioma   III     HGG     7                4728
Total                               123              105265
TABLE II
MANUALLY LABELED DATASET SAMPLE DISTRIBUTION
Label Sample Count
Normal 168
LGG 287
HGG 280
Total 735
Finding the CNN architecture with or without transfer
learning that yields the best results for the classification task is
the primary goal of the research project. Therefore, designing a
pipeline where the CNN architecture can easily be substituted
is essential. Doing so required finding the generally most
suitable division of the Training, Validation, and Test subsets, the
most suitable baseline hyperparameter settings, and whether image
augmentation should be used or not. For this task, VGG16
[13] was selected, as it had been used for the same application
with promising results [14], [15].
After training a large number of VGG16 models, a baseline
for training other CNN architectures was established. In the
baseline, a total of 60 images, 20 from each of the three classes,
are picked at random for the Test set. Then, the remaining images
are divided into 30% for the Validation set and 70% for
the Training set. In addition, the Training set is augmented
with rotation, width and height shift, shearing, zoom and
brightness adjustment, and horizontal flipping. Experimenting
with augmentation showed that many filters with minimal
adjustments to the original gave better results than fewer with
more extensive adjustments. The Normal label has a little over
half as many samples as LGG and HGG, and a higher number
of augmented images is therefore produced for it.
The Normal label in the Train set is increased from 103 to
309, LGG from 186 to 372, and HGG from 182 to 364. With
a batch size of 24, the baseline uses 10 epochs for training
with a learning rate of 0.001 and Adam for optimization with
categorical cross-entropy as the loss function.
In addition to the VGG16 architecture, four other CNN
architectures were used to train models, including MobileNet
[16], ResNext [17], DenseNet121 [18], and a custom-designed
architecture. The custom CNN architecture is designed to
accept N 224x224 image matrices with 3 color
channels; it consists of three convolutional layers of 32, 64,
and 64 filters, each of filter size 3x3, with zero-padding, and
ReLU as the activation function. The first convolutional layer
is followed by a max-pooling layer of pool size 4x4, and
the last two convolutional layers have a pool size of 2x2,
each followed by a dropout layer with a dropout rate of 0.15.
Next, a flattening layer is added to transform matrix output to
vector inputs to be accepted by the first dense layer, which
is comprised of 512 ReLU activated neurons, followed by a
dropout layer with a 0.5 dropout rate. The last hidden layer
is a dense layer of 256 ReLU activated neurons. The model’s
final layer, its output, is a 3 neuron softmax activated dense
layer for classification. The general architecture of this model
is shown in Fig. 1.
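The layer listing above can be expressed directly in Keras; the following sketch follows that description and is only an approximation of the model in Fig. 1 (the helper name build_custom_cnn is hypothetical).

from tensorflow.keras import layers, models

def build_custom_cnn(num_classes=3):
    """Custom CNN: three 3x3 convolutions (32, 64, 64 filters) with pooling
    and dropout, followed by two dense hidden layers and a softmax output."""
    return models.Sequential([
        layers.Input(shape=(224, 224, 3)),                        # 224x224 RGB input
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(4, 4)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.15),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.15),
        layers.Flatten(),                                         # matrix output to vector input
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(256, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),          # 3-class output
    ])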
IV. RESULTS
All the CNN architectures implemented to this point show outstanding results, as
presented in Table III. Multiple runs were made on each architecture to obtain the
given results, and there is an indication that the Test set in particular needs
expansion. Nevertheless, the results indicate that DenseNet121 is the best candidate
for the given classification task, with the custom architecture as a somewhat
surprising runner-up.
Fig. 1. Custom CNN Architecture.
TABLE III
CNN ARCHITECTURE RESULTS
Architecture Epochs Initial learning rate Loss Accuracy & F1
VGG16 10 0.001 0.3218 0.933
MobileNet 28 0.001 0.1141 0.950
ResNeXt 12 0.001 0.1244 0.933
DenseNet121 12 0.001 0.0254 1.00
Custom 22 0.000316 0.0989 1.00
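For context, one way that loss, accuracy, and F1 values of the kind reported in Table III could be computed on the held-out Test set is sketched below; test_gen is a hypothetical, non-augmented, non-shuffled generator over the 60 test images, model is one of the trained models from the earlier sketches, and scikit-learn is assumed only for the F1 score.

import numpy as np
from sklearn.metrics import f1_score

# Evaluate on the 60-image Test set; test_gen must be created with
# shuffle=False so predictions line up with test_gen.classes.
loss, accuracy = model.evaluate(test_gen)
y_prob = model.predict(test_gen)                # class probabilities, shape (60, 3)
y_pred = np.argmax(y_prob, axis=1)
y_true = test_gen.classes
f1 = f1_score(y_true, y_pred, average="weighted")
print(f"loss={loss:.4f} accuracy={accuracy:.3f} F1={f1:.3f}")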
V. CONCLUSION & FUTURE WORK
The main focus going forward is to expand the dataset
with samples from other applicable datasets, primarily to
increase the Test set. The goal is to obtain the same results
with the dataset split 70%, 15%, 15% or even 60%, 20%,
20% between Train, Validation, and Test sets. However, it is
essential to be selective and verify the sources before including
them in the main dataset. The DenseNet121 and custom architectures clearly show the
best results and will therefore be the two main architectures of the project going
forward; however, the other architectures will also be trained and tested. If
results degrade after expanding the dataset, measures such as adding more layers to
the architecture's tail, modifying parts of the architecture, and tuning
hyperparameters will be taken. Furthermore, a meeting with the inventor of the
Tsetlin Machine [19], Dr. Granmo, is scheduled in February 2022 to discuss
implementing the Convolutional Tsetlin Machine [20] as the classification model of
the research project.
REFERENCES
[1] D. N. Louis, H. Ohgaki, O. D. Wiestler, W. K. Cavenee, P. C. Burger, A. Jouvet, B. W. Scheithauer, and P. Kleihues, “The 2007 WHO classification of tumours of the central nervous system,” Acta Neuropathologica, vol. 114, no. 2, pp. 97–109, Aug. 2007. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/17618441
[2] D. A. Forst, B. V. Nahed, J. S. Loeffler, and T. T. Batchelor, “Low-grade gliomas,” The Oncologist, vol. 19, no. 4, pp. 403–413, Apr. 2014. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/24664484
[3] L. S. Hu, A. Hawkins-Daarud, L. Wang, J. Li, and K. R. Swanson, “Imaging of intratumoral heterogeneity in high-grade glioma,” Cancer Letters, vol. 477, pp. 97–106, May 2020. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/32112907
[4] Q. Luo, Y. Li, L. Luo, and W. Diao, “Comparisons of the accuracy of radiation diagnostic modalities in brain tumor: A nonrandomized, nonexperimental, cross-sectional trial,” Medicine, vol. 97, no. 31, Aug. 2018. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/30075495
[5] C.-X. Wu, G.-S. Lin, Z.-X. Lin, J.-D. Zhang, L. Chen, S.-Y. Liu, W.-L. Tang, X.-X. Qiu, and C.-F. Zhou, “Peritumoral edema on magnetic resonance imaging predicts a poor clinical outcome in malignant glioma,” Oncology Letters, vol. 10, no. 5, pp. 2769–2776, Aug. 2015. [Online]. Available: https://www.spandidos-publications.com/10.3892/ol.2015.3639
[6] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, no. 4, pp. 193–202, Apr. 1980. [Online]. Available: https://link.springer.com/article/10.1007/BF00344251
[7] ——, “Cognitron: A self-organizing multilayered neural network,” Biological Cybernetics, vol. 20, no. 3, pp. 121–136, Sep. 1975. [Online]. Available: https://doi.org/10.1007/BF00342633
[8] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998. [Online]. Available: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, ser. NIPS’12, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., vol. 25. Red Hook, NY, USA: Curran Associates, Inc., Dec. 2012, pp. 1097–1105. [Online]. Available: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
[10] J. Kang, Z. Ullah, and J. Gwak, “MRI-Based Brain Tumor Classification Using Ensemble of Deep Features and Machine Learning Classifiers,” Sensors, vol. 21, no. 6, Mar. 2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/6/2222
[11] L. Scarpace, A. E. Flanders, R. Jain, T. Mikkelsen, and D. W. Andrews, “Data From REMBRANDT [Data set],” 2019. [Online]. Available: https://wiki.cancerimagingarchive.net/display/Public/REMBRANDT#35392299515cc672b974080a1394cbe9c649c74
[12] S. Khawaldeh, U. Pervaiz, A. Rafiq, and R. S. Alkhawaldeh, “Noninvasive Grading of Glioma Tumor Using Magnetic Resonance Imaging with Convolutional Neural Networks,” Applied Sciences, vol. 8, no. 1, 2018. [Online]. Available: https://www.mdpi.com/2076-3417/8/1/27
[13] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” in International Conference on Learning Representations, Sep. 2015. [Online]. Available: https://arxiv.org/pdf/1409.1556.pdf
[14] O. N. Belaid and M. Loudini, “Classification of Brain Tumor by Combination of Pre-Trained VGG16 CNN,” Journal of Information Technology Management, vol. 12, no. 2, pp. 13–25, 2020. [Online]. Available: https://jitm.ut.ac.ir/article_75788.html
[15] O. Sevli, “Performance Comparison of Different Pre-Trained Deep Learning Models in Classifying Brain MRI Images / Beyin MR Görüntülerini Sınıflandırmada Farklı Önceden Eğitilmiş Derin Öğrenme Modellerinin Performans Karşılaştırması,” Acta Infologica, vol. 5, Jun. 2021. [Online]. Available: https://cdn.istanbul.edu.tr/file/JTA6CLJ8T5/99DD9C496BF14E44859851B33E49A006
[16] A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” arXiv, Apr. 2017. [Online]. Available: https://arxiv.org/pdf/1704.04861.pdf
[17] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated Residual Transformations for Deep Neural Networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jul. 2017, pp. 5987–5995. [Online]. Available: https://ieeexplore.ieee.org/document/8100117
[18] G. Huang, Z. Liu, G. Pleiss, L. Van Der Maaten, and K. Weinberger, “Convolutional Networks with Dense Connectivity,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8721151
[19] O.-C. Granmo, “The Tsetlin Machine - A Game Theoretic Bandit Driven Approach to Optimal Pattern Recognition with Propositional Logic,” arXiv, vol. abs/1804.01508, 2018. [Online]. Available: https://arxiv.org/abs/1804.01508
[20] O.-C. Granmo, S. Glimsdal, L. Jiao, M. Goodwin, C. W. Omlin, and G. T. Berge, “The Convolutional Tsetlin Machine,” arXiv, vol. abs/1905.09688, 2019. [Online]. Available: https://arxiv.org/abs/1905.09688
B
Source Code Repository
The source code repository may be found at the following URL: https://github.com/lewiuberg/tumorclass.info (Lie Uberg 2022).
Word count metrics
NUC Bachelor Project word count:
Total sum count: 11433
Words in text: 11276
Words in headers: 82
Words outside text (captions, etc.): 74
Number of headers: 48
Number of floats/tables/figures: 16
Number of math inlines: 1
Number of math displayed: 0
NOTE: References are excluded.