3000 images, 1500 images containing tumors, and 1500 im-
ages without tumors. The third and final dataset, BT-large-4c, comprises 3064 images spanning four classes: non-tumorous, glioma tumor, meningioma tumor, and pituitary tumor. All three datasets follow the standard convention of splitting into 80% for training and 20% for testing.
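For concreteness, the 80/20 split could be reproduced along the following lines; the array names and the use of stratification are assumptions, not details given by the researchers.

```python
# Illustrative 80/20 split; `images` and `labels` are assumed, pre-loaded arrays.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.20, stratify=labels, random_state=0)
```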
Most of the images in the datasets contain undesired spaces and areas, so cropping each image to contain only the region relevant for analysis can lead to better classification performance. In addition, if a dataset is imbalanced or small, augmentation may boost the learning capabilities. Augmentation creates multiple copies of the images, modified in different ways, such as mirroring, rotating, or adjusting the brightness. In addition to augmentation, the images are resized to fit each pre-trained CNN's expected input dimensions: 224x224 px, except for Inception V3, which expects 299x299 px.
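A preprocessing pipeline of this kind could be sketched with torchvision transforms as below; the specific augmentation parameters are illustrative assumptions, and the crop to the relevant brain region is assumed to happen beforehand.

```python
# Illustrative augmentation and resizing pipeline (not the authors' exact code).
from torchvision import transforms

input_size = 224  # 299 for Inception V3

train_transforms = transforms.Compose([
    transforms.Resize((input_size, input_size)),          # match the CNN input size
    transforms.RandomHorizontalFlip(),                     # mirroring
    transforms.RandomRotation(degrees=15),                 # rotation
    transforms.ColorJitter(brightness=0.2),                # brightness adjustment
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet statistics used
                         std=[0.229, 0.224, 0.225]),       # by the pre-trained CNNs
])
```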
The proposed scheme uses a novel feature evaluation and
selection mechanism, an ensemble of 13 pre-trained CNNs,
to extract robust and discriminative features from the brain
MRI images without human supervision. The CNN ensemble comprises ResNet-50, ResNet-101, DenseNet-121, DenseNet-169, VGG-16, VGG-19, AlexNet, Inception V3, ResNeXt-50, ResNeXt-101, ShuffleNet, MobileNet, and MnasNet. Since the researchers use fairly small datasets for training, they take a transfer-learning approach, keeping the ImageNet-pre-trained weights of each CNN's bottleneck layers fixed.
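The extraction step could look roughly as follows for one backbone; this is a sketch, not the authors' code, and it assumes a recent torchvision and an input batch `imgs` of shape (N, 3, 224, 224).

```python
# Frozen-feature extraction with one backbone (ResNet-50); the same pattern
# applies to the other twelve CNNs.
import torch
from torchvision import models

backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()   # drop the ImageNet classification head
backbone.eval()                     # bottleneck weights stay fixed (no fine-tuning)

with torch.no_grad():
    deep_features = backbone(imgs)  # (N, 2048) feature vectors for the ML classifiers
```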
Using the features extracted from the CNN models, a
synthetic feature is formed by evaluating each feature from
the CNN ensemble with an ensemble of nine different ML
classifiers and concatenating the top three features from the
different CNNs. Since different CNN architectures capture different aspects of the processed data, the resulting synthetic feature is more discriminative than a feature extracted from any single CNN.
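In outline, the selection could be sketched as below; `cnn_features` and `best_ensemble_accuracy` are hypothetical placeholder names, not the authors' identifiers.

```python
# Build the synthetic feature: score each CNN's features with the ML-classifier
# ensemble, keep the top three, and concatenate them.
import numpy as np

# cnn_features: dict mapping CNN name -> (N, d_i) feature matrix
scores = {name: best_ensemble_accuracy(X, y_train)   # best accuracy over the 9 classifiers
          for name, X in cnn_features.items()}

top3 = sorted(scores, key=scores.get, reverse=True)[:3]
synthetic_feature = np.concatenate([cnn_features[n] for n in top3], axis=1)
```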
The ML classifier ensemble, implemented using the scikit-learn library, comprises a fully-connected (FC) neural network (NN) layer, Gaussian Naïve Bayes (Gaussian NB), Adaptive Boosting (AdaBoost), k-Nearest Neighbors (k-NN), Random Forest (RF), Extreme Learning Machine (ELM), and Support Vector Machines (SVM) with three different kernels: linear, sigmoid, and radial basis function (RBF).
The first classifier uses the conventional CNN approach: a softmax-activated FC layer trained with a cross-entropy loss, the most commonly used loss function for neural networks. The layer weights are optimized with Adaptive Moment Estimation (Adam) at an initial learning rate of 0.001 that is adaptively recalculated, and the highest average accuracy per run over 100 epochs is recorded.
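A minimal PyTorch sketch of this classifier is given below; only the stated settings (Adam, learning rate 0.001, 100 epochs) come from the text, while the tensor names and the rest of the loop are illustrative.

```python
# FC softmax classifier on top of the extracted deep features.
import torch
import torch.nn as nn

fc = nn.Linear(deep_features.shape[1], num_classes)       # num_classes: 2 or 4
criterion = nn.CrossEntropyLoss()                         # softmax + cross-entropy
optimizer = torch.optim.Adam(fc.parameters(), lr=0.001)   # adaptive learning rate

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(fc(deep_features), labels)           # training features and labels
    loss.backward()
    optimizer.step()
```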
The researchers also use the Gaussian variant of Naïve Bayes, which assumes that the attributes within each class follow a Gaussian (normal) distribution with no covariance between them.
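With scikit-learn this amounts to a few lines; the data names are placeholders.

```python
# Gaussian Naive Bayes on the deep features (scikit-learn defaults).
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB().fit(X_train, y_train)
accuracy = gnb.score(X_test, y_test)
```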
The next classifier, Adaptive Boosting, or AdaBoost for short, is an ensemble learning algorithm that combines multiple weak classifiers (decision trees with a single split, called stumps) to improve performance. AdaBoost works iteratively, assigning higher weights to misclassified instances.
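An illustrative scikit-learn version is shown below; the default base learner of AdaBoostClassifier is a single-split decision tree (stump), and all other settings are assumed.

```python
# AdaBoost with decision-stump base learners.
from sklearn.ensemble import AdaBoostClassifier

ada = AdaBoostClassifier().fit(X_train, y_train)
accuracy = ada.score(X_test, y_test)
```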
The following classifier is one of the simplest classifiers, the
k-Nearest Neighbors (kNN). kNN does not train a model but
calculates predictions directly from the data currently stored
in memory. Using Euclidean distance as the distance metric, the kNN classifier finds the k training instances closest to the given feature and assigns the most common class label among those neighbors, i.e., the majority vote. The researchers vary the number of neighbors from 1 to 4 and select the value with the highest accuracy.
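The sweep over k could be sketched as below; evaluating on a held-out split is an assumption, as the text does not state how the best k is scored.

```python
# Sweep k from 1 to 4 and keep the best-performing value.
from sklearn.neighbors import KNeighborsClassifier

best_k, best_acc = None, 0.0
for k in range(1, 5):
    acc = KNeighborsClassifier(n_neighbors=k, metric="euclidean") \
              .fit(X_train, y_train).score(X_test, y_test)
    if acc > best_acc:
        best_k, best_acc = k, acc
```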
Random Forest (RF) is a learning algorithm that creates multiple decision trees using the bootstrap aggregation (bagging) method to classify features into a class, using the Gini index as the cost function while growing the trees. At each split, RF considers a random subset of n features to find the optimal split point, which reduces the correlation among the trees and lowers the ensemble error rate. RF predicts by feeding a feature into all the classification trees, counting the votes for each class, and choosing the class with the largest number of votes. To find the optimal configuration, the researchers set the number of features considered per split to the square root of the total number of features and vary the number of decision trees from 1 to 150, selecting the count with the highest accuracy.
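A scikit-learn sketch of this sweep follows; the stated choices are sqrt feature sampling, the Gini criterion, and 1 to 150 trees, while the evaluation split and random seed are assumptions.

```python
# Random Forest: sweep the number of trees and keep the best.
from sklearn.ensemble import RandomForestClassifier

best_n, best_acc = None, 0.0
for n_trees in range(1, 151):
    rf = RandomForestClassifier(n_estimators=n_trees, max_features="sqrt",
                                criterion="gini", random_state=0)
    acc = rf.fit(X_train, y_train).score(X_test, y_test)
    if acc > best_acc:
        best_n, best_acc = n_trees, acc
```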
Extreme Learning Machine (ELM) is a learning algorithm for Single-Layer Feed-Forward Neural Networks (SLFN) that provides good performance at a fast learning speed. Unlike the iterative back-propagation algorithm used to train traditional SLFNs, ELM assigns the hidden-layer weights randomly and solves for the output weights in a single step. The researchers tried 5000, 6000, 7000, 8000, 9000, and 10,000 hidden nodes and selected the configuration with the highest accuracy.
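ELM is not part of scikit-learn, so a minimal NumPy sketch is given below; 5000 hidden nodes is one of the values the researchers report trying, and the tanh activation and function names are assumptions.

```python
# Minimal ELM: random hidden weights, output weights solved by least squares.
import numpy as np

def elm_fit(X, Y_onehot, n_hidden=5000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y_onehot           # output weights in one step
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```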
The Support Vector Machine (SVM) uses a kernel function to transform the original feature space into a higher-dimensional space and then aims to find a hyperplane in that space that distinctly separates the classes. The researchers use the three most common kernel functions: linear, sigmoid, and radial basis function (RBF). In addition, the SVM has two hyper-parameters: C, the soft-margin cost parameter that controls each support vector's influence, set to 0.1, 1, 10, 100, 1000, and 10000; and gamma, which controls the curvature of the decision boundary, set to 0.00001, 0.0001, 0.001, and 0.01. The hyper-parameter combination that yields the highest accuracy is then selected.
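The search over these values could be expressed as a grid search; using cross-validation for selection, as GridSearchCV does by default, is an assumption rather than a stated detail.

```python
# Grid search over the stated kernels, C, and gamma values.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "kernel": ["linear", "sigmoid", "rbf"],
    "C": [0.1, 1, 10, 100, 1000, 10000],
    "gamma": [0.00001, 0.0001, 0.001, 0.01],
}
best_svm = GridSearchCV(SVC(), param_grid).fit(X_train, y_train).best_estimator_
```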
Experimentation on the given datasets has two main tasks: first, to compare the pre-trained CNN features across the ML classifiers; second, to show the effectiveness of concatenating the top two or three features that produced the best results in the first experiment.
For example, the top three features on the BT-small-2c
dataset are the DenseNet-169, Inception V3, and ResNeXt-50
features. Then on the BT-large-2c dataset, the DenseNet-121,
ResNeXt-101, and MnasNet features are the top three. While