
Lung Cancer Detection using Artificial Neural Network and Bayesian Network
Creating an image database

1. The images were provided as .mhd and .raw files. The .mhd files
contain the header data and the .raw files store the multidimensional
image data. We used the SimpleITK library to read the .mhd files. Each
CT scan has dimensions of 512 x 512 x n, where n is the number of axial
slices. There are about 200 slices in each CT scan.
2. There were 551065 annotations in total. Only 1351 of them were
labeled as nodules; the rest were labeled negative, so there was a
severe class imbalance. A simple way to deal with it was to
undersample the majority class and augment the minority class by
rotating images.
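
The balancing step in point 2 could be sketched roughly as follows. This is only an illustration, not the authors' code: the function name balance_classes, the use of 90-degree rotations, and the neg_per_pos ratio are all assumptions, since the slides only say that images were rotated and the majority class was undersampled.

```python
import numpy as np

def balance_classes(patches, labels, neg_per_pos=4, seed=0):
    """Augment positives by rotation and undersample negatives (hypothetical helper).

    patches: array of shape (N, H, W) with square patches (H == W)
    labels:  array of shape (N,), 1 = nodule, 0 = negative
    """
    rng = np.random.default_rng(seed)
    patches = np.asarray(patches)
    labels = np.asarray(labels)
    pos = patches[labels == 1]
    neg = patches[labels == 0]

    # Assumed augmentation: keep each positive patch and add its
    # 90, 180 and 270 degree rotations (4x as many positives).
    pos_aug = np.concatenate([np.rot90(pos, k=k, axes=(1, 2)) for k in range(4)])

    # Undersample negatives to at most neg_per_pos per (augmented) positive.
    n_neg = min(len(neg), neg_per_pos * len(pos_aug))
    keep = rng.choice(len(neg), size=n_neg, replace=False)

    x = np.concatenate([pos_aug, neg[keep]])
    y = np.concatenate([np.ones(len(pos_aug), dtype=int), np.zeros(n_neg, dtype=int)])

    # Shuffle so that batches mix both classes.
    order = rng.permutation(len(x))
    return x[order], y[order]
```

With neg_per_pos=4 the result has four negatives for every positive, provided enough negatives are available. Rotations by multiples of 90 degrees are used here only because they need no interpolation.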
Creating an image database

3. The images were cropped around the coordinates provided in the
annotations. The annotations were given in world (Cartesian)
coordinates, so they had to be converted to voxel coordinates. The image
intensity was defined on the Hounsfield scale, so it also had to be
rescaled for image processing purposes.
4. Augmentation resulted in an 80-20 class distribution, which was not
ideal. However, we did not want to augment the minority class much
further, because that might produce a minority class with little
variation.
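
The preprocessing in point 3 might look something like the sketch below. The helper names, the Hounsfield window of -1000 to 400, and the zero-padding at the image border are assumptions made for illustration; the conversion also assumes the scan axes are aligned with the world axes (identity direction matrix). SimpleITK is imported lazily so the other helpers can be tried without it.

```python
import numpy as np

def load_scan(mhd_path):
    """Read a .mhd/.raw pair. Returns (volume[z, y, x], origin[x, y, z], spacing[x, y, z])."""
    import SimpleITK as sitk  # only needed for this function
    image = sitk.ReadImage(mhd_path)
    volume = sitk.GetArrayFromImage(image)  # NumPy array in z, y, x order
    return volume, np.array(image.GetOrigin()), np.array(image.GetSpacing())

def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert world coordinates (mm) to integer voxel indices (x, y, z)."""
    world = np.asarray(world_xyz, dtype=float)
    return np.rint((world - origin_xyz) / spacing_xyz).astype(int)

def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Rescale Hounsfield units to [0, 1]; the window limits are assumed values."""
    v = (np.asarray(volume, dtype=np.float32) - hu_min) / (hu_max - hu_min)
    return np.clip(v, 0.0, 1.0)

def crop_patch(slice_2d, center_xy, size=50):
    """Crop a size x size patch centred on voxel (x, y), zero-padding at the borders."""
    half = size // 2
    padded = np.pad(slice_2d, half, mode="constant")
    x, y = center_xy
    # After padding, original pixel (y, x) sits at (y + half, x + half),
    # so the patch centred on it starts at (y, x) in the padded image.
    return padded[y:y + size, x:x + size]
```

A typical use would be to convert an annotation to (x, y, z), take the axial slice volume[z], normalize it, and crop a patch at (x, y).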
Building a CNN

1. We used tflearn to build a CNN with 3 convolutional layers.
2. The input to this CNN model was a 50 x 50 grayscale image, and its
output is the probability that the image contains a nodule.
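
A network of this shape could be written in tflearn roughly as below. Only the 50 x 50 grayscale input, the three convolutional layers, and the probability output come from the slides; the filter counts, kernel size, pooling, dense layer width, dropout, and optimizer are all assumed values chosen for illustration. The small pooled_size helper just shows how the feature map shrinks under the assumed "same"-padded 2 x 2 pooling.

```python
def pooled_size(size, n_pools, stride=2):
    """Spatial size after n_pools 'same'-padded poolings (ceiling division each time)."""
    for _ in range(n_pools):
        size = -(-size // stride)
    return size

def build_cnn(learning_rate=1e-3):
    """Build a 3-conv-layer CNN for 50 x 50 grayscale patches (hypothetical layout)."""
    import tflearn
    from tflearn.layers.conv import conv_2d, max_pool_2d
    from tflearn.layers.core import input_data, dropout, fully_connected
    from tflearn.layers.estimator import regression

    net = input_data(shape=[None, 50, 50, 1], name="input")
    # Three convolutional blocks; the filter counts are assumptions.
    for n_filters in (32, 64, 128):
        net = conv_2d(net, n_filters, 3, activation="relu", padding="same")
        net = max_pool_2d(net, 2, padding="same")
    net = fully_connected(net, 512, activation="relu")
    net = dropout(net, 0.5)  # tflearn's dropout takes the keep probability
    # Two-way softmax: one column is the probability that the patch holds a nodule.
    net = fully_connected(net, 2, activation="softmax")
    net = regression(net, optimizer="adam", learning_rate=learning_rate,
                     loss="categorical_crossentropy")
    return tflearn.DNN(net)
```

Under these assumptions the 50 x 50 input is reduced to 25, 13, and then 7 pixels per side before the fully-connected layers.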
Training the model

1. We had a total of 6881 images in our training set and 1622 images in
our validation set.
2. Because a CNN needs a large amount of training data, it was
desirable to train the model in batches. Loading all the training data
into memory is not always possible, because memory must hold both
the images and the features computed from them. So we stored all the
images in an HDF5 dataset using the h5py library.
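
One way to store the images with h5py and read them back batch by batch is sketched below. The dataset names "X" and "Y", the batch size, and both function names are assumptions. The batch iterator works on anything sliceable, so it can be tried on NumPy arrays as well as on h5py datasets.

```python
import numpy as np

def write_hdf5(path, images, labels):
    """Store images and labels in one HDF5 file (dataset names are assumed)."""
    import h5py  # only needed for this function
    with h5py.File(path, "w") as f:
        f.create_dataset("X", data=images, chunks=True)
        f.create_dataset("Y", data=labels)

def iter_batches(x, y, batch_size=128):
    """Yield consecutive (x, y) batches.

    When x and y are h5py datasets, each slice is read from disk on demand,
    so only one batch is held in memory at a time.
    """
    for start in range(0, len(x), batch_size):
        stop = start + batch_size
        yield np.asarray(x[start:stop]), np.asarray(y[start:stop])
```

For example, after write_hdf5("train.h5", images, labels), the file can be opened with h5py.File("train.h5", "r") and f["X"], f["Y"] passed to iter_batches; the file name here is hypothetical.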
Lung Segmentation
Final XGBoost Model

• For each candidate nodule generated by the segmentation approach,
we can crop out a 2D patch around its center.
• By applying the trained CNN model to this 2D patch, we can eliminate
candidate nodules that do not receive a high probability. All the
remaining nodules can be used to generate features.
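
The filtering step can be illustrated with a few lines of Python. The function name and the 0.5 cut-off are assumptions; the slides do not say which probability counts as "high".

```python
def filter_candidates(candidates, probabilities, threshold=0.5):
    """Keep only candidates whose CNN nodule probability reaches the threshold.

    candidates:    list of candidate descriptors (e.g. voxel coordinates)
    probabilities: CNN nodule probability for each candidate, same order
    threshold:     assumed cut-off for a "high" probability
    """
    if len(candidates) != len(probabilities):
        raise ValueError("candidates and probabilities must have the same length")
    return [c for c, p in zip(candidates, probabilities) if p >= threshold]
```

For instance, three candidates with probabilities 0.9, 0.1, and 0.5 would be reduced to the first and the third at the default threshold.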
Final XGBoost Model
The final feature set includes:
• nodule area, diameter, pixel intensity, and number of nodules
• aggregated features from the last fully-connected layers of the trained
CNN model
• aggregated features from the last fully-connected layer of the
pre-trained ResNet model
• simple features associated with the CT scan (i.e. resolution, number
of slices, slice thickness)
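
A per-scan feature vector of this kind could be assembled as in the sketch below. The slides do not say how the per-nodule values were aggregated, so the mean and maximum used here are assumptions, as are the function names and the column layout. The resulting vectors would then be stacked into a matrix and passed to the XGBoost model.

```python
import numpy as np

def _mean_max(values):
    """Column-wise mean and max of an (n, d) array; zeros if there are no rows."""
    if values.shape[0] == 0:
        return np.zeros(2 * values.shape[1])
    return np.concatenate([values.mean(axis=0), values.max(axis=0)])

def scan_feature_vector(nodule_stats, fc_activations, scan_meta):
    """Build one feature vector per CT scan (hypothetical layout).

    nodule_stats:   (n_nodules, 3) array of area, diameter, pixel intensity
    fc_activations: (n_nodules, d) array of last fully-connected layer outputs
    scan_meta:      1D array, e.g. resolution, number of slices, slice thickness
    """
    nodule_stats = np.asarray(nodule_stats, dtype=float).reshape(-1, 3)
    fc_activations = np.asarray(fc_activations, dtype=float)
    n_nodules = np.array([float(len(nodule_stats))])
    return np.concatenate([
        n_nodules,                   # number of nodules kept for this scan
        _mean_max(nodule_stats),     # aggregated area, diameter, intensity
        _mean_max(fc_activations),   # aggregated network features
        np.asarray(scan_meta, dtype=float),
    ])
```

The same aggregation could be applied separately to the CNN and ResNet activations and the two results concatenated.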
