EARLY DETECTION OF BREAST CANCER
USING SUPPORT VECTOR MACHINE
A THESIS SUBMITTED TO
THE GRADUATE SCHOOL OF APPLIED SCIENCES
OF
NEAR EAST UNIVERSITY
by
HÜSEYİN GÜNEY
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS
FOR
THE DEGREE OF MASTER OF SCIENCE
IN
COMPUTER ENGINEERING
NICOSIA 2013
Hüseyin Güney : Early Detection of Breast Cancer using Support
Vector Machines
Approval of the Graduate School of Applied
Sciences
Prof. Dr. İlkay Salihoğlu
Director
We certify this thesis is satisfactory for the award of the
Degree of Master of Science in Computer Engineering
Examining Committee in charge:
Assist. Prof. Dr. Kaan Uyar, Committee Chairman, Computer Engineering Department, NEU
Assist. Prof. Dr. İbrahim Erşan, Committee Member, Computer Engineering Department, GAU
Assist. Prof. Dr. Firudin Muradov, Committee Member, Computer Engineering Department, NEU
Assist. Prof. Dr. Boran Şekeroğlu, Co-supervisor, Committee Member, Computer Engineering Department, NEU
Prof. Dr. Rahib H. Abiyev, Supervisor, Committee Member, Chairman of Computer Engineering Department, NEU
I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.
Name, Last name: Huseyin Guney
Signature:
Date:
To my Wife and my Parents
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my supervisor, Prof. Dr. Rahib ABIYEV, who has shown plenty of encouragement, patience, and support as he guided me through this endeavor, fostering my development as a graduate student and scientist. In addition, I would like to thank my co-supervisor, Assist. Prof. Dr. Boran ŞEKEROĞLU, for his important ideas and help, which improved my work. I am also thankful for the contributions and comments of the teaching staff of the Department of Computer Engineering. I would also like to thank my friends at the Department of Computer Engineering, who helped me in one way or another. This research was generously supported by the Department of Computer Engineering of the Near East University. I am grateful to all supporters.
- LIST OF TABLES
- CHAPTER 1, INTRODUCTION
- 1.1 Overview on Image Classification
- 1.2 Aim of the Thesis
- 1.3 Thesis Overview
- CHAPTER 2, REVIEW OF IMAGE CLASSIFICATION
- 2.1 Review of Image Classification
- 2.2 Procedures of Image Classification
- 2.3 How Image Classification Works
- 2.4 Types of Image Classification
- 2.4.1 Supervised Image Classification
- 2.4.1.1 Advantages of Supervised Image Classification
- 2.4.1.2 Disadvantages of Supervised Image Classification
- 2.4.1.3 Procedures of Supervised Image Classification
- 2.4.2 Unsupervised Image Classification
- 2.4.2.1 Advantages of Unsupervised Image Classification
- 2.4.2.2 Disadvantages of Unsupervised Image Classification
- 2.4.2.3 Procedures of Unsupervised Image Classification
- 2.4.3 Supervised vs. Unsupervised Image Classification
- 2.5 Practical Applications of Image Classification
- 2.6 Medical Image Classification
- CHAPTER 3, TECHNIQUES FOR MEDICAL IMAGE CLASSIFICATION
- 3.1 Overview
- 3.2 Image Acquisition
- 3.2.1 Medical Image Acquisition
- 3.2.2 Medical Imaging
- 3.2.2.1 Magnetic Resonance Imaging (MRI)
- 3.2.2.2 Computer Tomography (CT)
- 3.2.2.3 Mammography
- 3.3 Image Enhancement
- 3.3.1 Contrast Enhancement
- 3.3.2 Contrast Stretching
- 3.3.3 Image Filtering
- 3.3.3.1 Min and Max Filtering
- 3.3.3.2 Mean and Median Filtering
- 3.3.3.3 Gaussian Smoothing Filtering
- 3.3.3.4 Top-Hat Filtering
- 3.3.3.5 Image Transform
- 3.3.3.5.1 Wavelet Transform
- 3.3.3.5.1.1 Continuous Wavelet Transform
- 3.3.3.5.1.2 Discrete Wavelet Transform
- 3.3.3.5.1.3 Complex Wavelet Transform
- 3.4 Feature Extraction and Selection
- 3.5 Classification
- 3.5.1 Artificial Neural Network
- 3.5.2 Support Vector Machines
- CHAPTER 4, SUPPORT VECTOR MACHINES
- 4.1 Overview of SVM
- 4.2 Kernel Methods of SVM
- 4.2.1 Linear SVM
- 4.2.2 Non-Linear SVM
- 4.2.2.1 Polynomial Kernel Function
- 4.2.2.2 Gaussian RBF Kernel Function
- 4.2.2.3 Sigmoid Kernel Function
- 4.3 Advantages and Disadvantages of SVM
- 4.4 SVM Training Algorithm
- 4.4.1 Kernel-Adatron Algorithm
- CHAPTER 5, DEVELOPMENT OF IMAGE CLASSIFICATION SYSTEMS USING SUPPORT VECTOR MACHINES
- 5.1 Development of the flowchart of the clustering algorithm
- 5.2 Image Acquisition
- 5.3 Image Enhancement
- 5.4 Feature Extraction and Selection
- 5.5 Classification
- 5.5.1 Usage of Support Vector Machines
- 5.5.2 Results of Image Classification with Train and Test Processes
- 5.5.2.1 Section 1: Results of Image Classification with Train and Test Processes
- 5.5.2.1.1 Feature 1 and 2's Train and Test Processes with 3 Input Images
- 5.5.2.1.2 Feature 3 and 4's Train and Test Processes with 3 Input Images
- 5.5.2.1.3 Feature 5 and 6's Train and Test Processes with 3 Input Images
- 5.5.2.1.4 Feature 7 and 8's Train and Test Processes with 3 Input Images
- 5.5.2.1.5 Feature 9 and 10's Train and Test Processes with 3 Input Images
- 5.5.2.1.6 Feature 1 and 10's Train and Test Processes with 3 Input Images
- 5.5.2.2 Section 2: Results of Image Classification with Train and Test Processes
- 5.5.2.2.1 Feature 1 and 2's Train and Test Processes
- 5.5.2.2.2 Feature 3 and 4's Train and Test Processes
- 5.5.2.2.3 Feature 5 and 6's Train and Test Processes
- 5.5.2.2.4 Feature 7 and 8's Train and Test Processes
- 5.5.2.2.5 Feature 9 and 10's Train and Test Processes
- 5.5.2.3 Accuracy of Classification Results
- CHAPTER 6, CONCLUSION
- REFERENCES
- APPENDICES
- APPENDIX 1 Matlab Code of Developed Software; Train Process
- APPENDIX 2 Matlab Code of Developed Software; Test Process
- APPENDIX 3 Matlab Code of Developed Software; Classification Process
LIST OF FIGURES
- Figure 2.1 Graphical Representation of Classification
- Figure 2.2 Graph of Feature Space: + sewing needles, o bolts
- Figure 2.3 Steps of Supervised Image Classification
- Figure 2.4 Steps of Image Classification
- Figure 2.5 Steps of Unsupervised Image Classification
- Figure 2.6 Spectral Classes Class Identification
- Figure 2.7 Supervised and Unsupervised Decision Process
- Figure 2.8 Supervised vs. Unsupervised Classification Algorithm and Chart
- Figure 3.1 Steps of Image Classification
- Figure 3.2 An example of Contrast Stretching Operation
- Figure 3.3 An example of Min Filter Operation
- Figure 3.4 An example of Max Filter Operation
- Figure 3.5 An example of Median Filter Operation
- Figure 3.6 An example of Mean Filter Operation
- Figure 3.7 An example of Gaussian Smoothing Filter
- Figure 3.8 An example of Top-Hat Filter
- Figure 3.9 DWT Decomposition
- Figure 3.10 General Overview of a Classification Process with Feature Steps
- Figure 3.11 SVM Input and Feature Spaces and Classification using Kernel Fn.
- Figure 3.12 The SVM learns a hyperplane which best separates the two classes
- Figure 4.1 Max margin hyperplanes for SVM with samples from two classes
- Figure 4.2 A sample of decision boundary and a linear classifier
- Figure 4.3 Kernel-Adatron Algorithm
- Figure 5.1 Flowchart of developed software
- Figure 5.2 A sample of abnormal breast mammographic image
- Figure 5.3 A sample of normal breast mammographic image
- Figure 5.4 A sample of normal breast mammographic image with unwanted region
- Figure 5.5 Binary image of unwanted region
- Figure 5.6 Cropped image of grayscale unwanted region
- Figure 5.7 Breast Mammographic image after removing unwanted region
- Figure 5.8 Matlab Code for removing unwanted region
- Figure 5.9 Gaussian Smoothing Filtered Image of abnormal breast after unwanted region removed from original image
- Figure 5.10 Matlab Code of Gaussian smoothing filter
- Figure 5.11 Contrast Stretched Image of abnormal breast
- Figure 5.12 Matlab Code for contrast stretching
- Figure 5.13 Top Hat Filtered Image of abnormal breast
- Figure 5.14 Matlab Code for top-hat filtering
- Figure 5.15 Discrete Wavelet Transform Image of abnormal breast
- Figure 5.16 Matlab Code for discrete wavelet transform
- Figure 5.17 Segmented Image after Image Classification Techniques applied
- Figure 5.18 Matlab Code for region segmentation
- Figure 5.19 Matlab Code for SVM training and test processes
- Figure 5.20 Image classification using SVM Test Process Input Images
- Figure 5.21 Image classification using SVM Train Process, Feature
- Figure 5.22 Image classification using SVM Test Process, Feature 1 2, Test Image (a)
- Figure 5.23 Image classification using SVM Test Process, Feature 1 2, Test Image (b)
- Figure 5.24 Image classification using SVM Test Process, Feature 1 2, Test Image (c)
- Figure 5.25 Image classification using SVM Train Process, Feature
- Figure 5.26 Image classification using SVM Test Process, Feature 3 4, Test Image (a)
- Figure 5.27 Image classification using SVM Test Process, Feature 3 4, Test Image (b)
- Figure 5.28 Image classification using SVM Test Process, Feature 3 4, Test Image (c)
- Figure 5.29 Image classification using SVM Train Process, Feature 5 6
- Figure 5.30 Image classification using SVM Test Process, Feature 5 6, Test Image (a)
- Figure 5.31 Image classification using SVM Test Process, Feature 5 6, Test Image (b)
- Figure 5.32 Image classification using SVM Test Process, Feature 5 6, Test Image (c)
- Figure 5.33 Image classification using SVM Train Process, Feature 7 8
- Figure 5.34 Image classification using SVM Test Process, Feature 7 8, Test Image (a)
- Figure 5.35 Image classification using SVM Test Process, Feature 7 8, Test Image (b)
- Figure 5.36 Image classification using SVM Test Process, Feature 7 8, Test Image (c)
- Figure 5.37 Image classification using SVM Train Process, Feature
- Figure 5.38 Image classification using SVM Test Process, Feature 9 10, Test Image (a)
- Figure 5.39 Image classification using SVM Test Process, Feature 9 10, Test Image (b)
- Figure 5.40 Image classification using SVM Test Process, Feature 9 10, Test Image (c)
- Figure 5.41 Image classification using SVM Train Process, Feature 1 2
- Figure 5.42 Image classification using SVM Test Process, Feature
- Figure 5.43 Image classification using SVM Train Process, Feature
- Figure 5.44 Image classification using SVM Test Process, Feature
- Figure 5.45 Image classification using SVM Train Process, Feature
- Figure 5.46 Image classification using SVM Test Process, Feature
- Figure 5.47 Image classification using SVM Train Process, Feature
- Figure 5.48 Image classification using SVM Test Process, Feature
- Figure 5.49 Image classification using SVM Train Process, Feature
- Figure 5.50 Image classification using SVM Test Process, Feature
LIST OF TABLES
Table 5.1 Accuracy Rates of SVM Classification of Developed Application
information. This can be done through two main approaches to image classification: supervised and unsupervised.

Unsupervised image classification does not rely on a training set. Instead, it uses clustering techniques that measure the distance between images and group images with common features together. Each group can then be labeled with a class identifier. Unsupervised classification can be defined as the identification of natural groups or structures within the data: it clusters the pixels in a data set based only on their statistics, without using prior knowledge about the spectral classes present in the image. Commonly used unsupervised classification methods include Isodata (Witten & Frank, 2005) and k-Means (Witten & Frank, 2005). In other words, unsupervised classification examines a large number of unknown pixels and divides them into a number of classes based on the natural groupings present in the image values. Unlike supervised classification, it does not require analyst-specified training data. The basic premise is that values within a given class should be close together in the measurement space (i.e. have similar grey levels), whereas data in different classes should be comparatively well separated (i.e. have very different grey levels) (Lillesand & Kiefer, 1994).

Supervised classification, by contrast, uses training sets of images to create a descriptor for each class. The training sets are carefully selected by hand to be representative of that class. The classifier then analyses the training set and generates a descriptor for the class based on the features its training images share. This descriptor can then be applied to other images to determine whether each of them belongs to that class.

Supervised image classification is a subset of supervised learning. Supervised learning can generate models of two types. Most commonly, it generates a global model that maps input objects to desired outputs. In some cases, however, the map is implemented as a set of local models. These local models are treated as inputs in such algorithms. Such algorithms are often implemented using neural networks, decision trees, support vector machines and Bayesian statistical methods. Support vector machines show great promise in this area.
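As a minimal sketch of the clustering idea described above (not the method developed in this thesis, which is implemented in Matlab), the following Python snippet groups a handful of grey levels into two clusters with a one-dimensional k-Means. The pixel values, the function name kmeans_1d and the evenly spread initial centroids are illustrative assumptions.

```python
def kmeans_1d(values, k=2, iters=20):
    """Cluster scalar grey levels into k groups (k >= 2) with plain k-Means."""
    lo, hi = min(values), max(values)
    # Assumption: start with centroids spread evenly between min and max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical grey levels: dark background pixels and bright dense pixels.
pixels = [12, 15, 10, 200, 210, 190, 14, 205]
labels, centroids = kmeans_1d(pixels, k=2)
print(labels)     # cluster index per pixel
print(centroids)  # mean grey level of each cluster
```

No training labels are supplied: the two groups emerge only from how close the grey levels are to each other, which is the premise stated above.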
1.2 Aim of the Thesis
The aim of this project is to design a medical image classification system that can filter and locate a tumor area in a grayscale mammographic image of the breast. Generally, the tumor area of the breast appears as a dense region in mammographic images. Therefore, the tumor area can be segmented using image enhancement techniques, and normal and abnormal breasts can then be classified accurately and efficiently using a machine learning algorithm such as SVM.

The main task of this project is to implement an image classification system using SVM for the early detection and classification of breast cancer. SVM is used because it is a very promising machine learning method, gives accurate results, and can be adapted to both linear and non-linear classification.

To sum up, the MIAS breast cancer mammographic image database, image enhancement techniques, SVM and Matlab are used to develop this application. The dataset is separated into train and test sections to develop the application and to test its results. The application can thus process images and classify them according to the type of breast. It plots all images in the feature space and separates them with a hyperplane, labeling them as normal or tumor images.
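The train and test workflow described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the Matlab software developed in this thesis: the two features per image are synthetic, the samples are mean-centred so that no bias term is needed, and a simple Pegasos-style sub-gradient method on the hinge loss stands in for the SVM training algorithm.

```python
import random


def make_samples(rng, n, low, high, label):
    """Draw n synthetic two-feature samples from [low, high]^2 with one label."""
    return [([rng.uniform(low, high), rng.uniform(low, high)], label)
            for _ in range(n)]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Sub-gradient descent on the regularised hinge loss (linear SVM, no bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # sample violates the margin: move towards it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (tumor) or -1 (normal) depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


rng = random.Random(0)
normal = make_samples(rng, 20, 0.1, 0.3, -1)  # hypothetical low-density features
tumor = make_samples(rng, 20, 0.7, 0.9, +1)   # hypothetical high-density features

# Separate the data set into train and test sections, 15 + 5 images per class.
train = normal[:15] + tumor[:15]
test = normal[15:] + tumor[15:]

# Centre every sample on the training mean so the hyperplane passes through 0.
mean = [sum(x[j] for x, _ in train) / len(train) for j in range(2)]


def center(x):
    return [xj - mj for xj, mj in zip(x, mean)]


w = train_linear_svm([center(x) for x, _ in train], [lab for _, lab in train])
test_accuracy = sum(predict(w, center(x)) == lab for x, lab in test) / len(test)
print(w, test_accuracy)
```

The synthetic classes are deliberately well separated, so a linear hyperplane is enough here; real mammographic features may overlap and may call for a non-linear kernel.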
1.3 Thesis Overview
The remaining chapters of this thesis are organized as follows:
- Chapter 2 introduces the types of image classification and their advantages and disadvantages. Supervised and unsupervised image classification techniques are described. The importance of medical image classification and its practical implementation are discussed.
- Chapter 3 describes the techniques used for medical image classification. The steps of image classification are given. Medical image acquisition using Magnetic Resonance Imaging (MRI), Computer Tomography (CT) and Mammography is explained. Image enhancement using contrast stretching, filtering and the wavelet transform is described. The feature extraction and classification steps are also presented.
- Chapter 4 presents the mathematical description of support vector machines (SVM). Linear and non-linear SVM and the kernel functions used are described. The importance of SVM in image classification is shown.
CHAPTER 2
REVIEW OF IMAGE CLASSIFICATION
2.1. Overview
In this chapter, brief information about image classification is given and the methodologies used for image classification are described. Medical image classification is explained in particular. The importance of the elements of artificial intelligence in image classification is presented, and real-world applications of image classification are mentioned.
2.1 Review of Image Classification
Image classification is one of the important topics in the field of computer vision. It plays an important role in medical diagnosis, remote sensing, image analysis and pattern recognition. Digital image classification is the operation of sorting images into a finite number of individual classes. A graphical representation of classification is given in Figure 2.1, where the data describing the image is classified into two classes.
In medical diagnosis, images have to be classified with maximum accuracy and efficiency. For instance, identifying cells that contain a tumor is one of the most important tasks in medical image analysis. The development of accurate image classification systems for finding and classifying tumors has therefore become an important problem in image processing. Such a system can help people carry out these tasks; inaccurate classification, by contrast, can lead to incomplete treatment of the corresponding disease.
Figure 2.1 Graphical Representation of Classification (Fisher, R., et al., 2003, ¶ 1).
Image classification comprises a range of techniques for classifying images according to the field in which the images were taken. All algorithms developed for image classification assume that every image has at least one feature, such as the spectral region of a piece of land in a remote sensing system or the tumor region in a medical image, and that each of these features belongs to one or more classes. These classes can be specified by an analyst who examines the images, which is supervised classification, or found by clustering the images automatically, which is unsupervised classification (Fisher, R., et al., 2003).
In other words, image classification uses the digital number representation of an image and tries to separate and classify each individual pixel according to the information required. The aim is to assign all related pixels to particular classes, such as water and forest in landscapes. The resulting classified image is a mosaic of such pixels and forms a “thematic map” of the original image (Natural Resources Canada, 2008, ¶ 1).
The main idea is that an image classification system automatically categorizes all pixels in an image into classes. In other words, it converts image data into information. There are two kinds of classes: information classes and spectral classes. Information classes define and separate particular parts of the image, such as different forest types or tree species, or different geologic units or rock types. Spectral classes are groups of pixels that are similar in their values, such as brightness, in the different spectral channels of the data. The aim of an image classification system in creating these classes is to match the spectral classes in the data to the information classes of interest. Sometimes there is a one-to-one match between the two kinds of classes. However, generally, those two groups do not