Prototype System to Detect Skin Cancer Through Images

This paper proposes the development of software that performs the pre-diagnosis of malignant melanoma, spinocellular carcinoma and basal cell carcinoma. The software is divided into five modules: digital imaging, analysis and processing, storage, feature extraction, and classification by means of an Artificial Neural Network (ANN). The results show the performance of the software for two different combinations of activation functions in the network. Using spectroscopic techniques for image acquisition and combining non-linear and linear activation functions in the ANN, the software achieves an effectiveness greater than 80%, which suggests that it can be an effective aid in the diagnosis of skin cancer.


Introduction
The skin is the largest organ of the body. It acts as a protective barrier that isolates the organism from its surrounding environment and consists of several layers (see fig.1). The main ones are: 1. The epidermis, the outermost layer. 2. The dermis, which lies beneath the epidermis. 3. The hypodermis, the deepest layer of the skin.
According to specialists, there are at least 200 types of cancer that humans can suffer. Cancer can arise in any of the 60 organs of the human body, but each type of cancer is unique. This work addresses skin cancer, and the system performs the pre-diagnosis of three types. Basal cell carcinoma. This is the most common type of skin cancer; it is locally invasive, grows slowly and carries a low risk of metastasis. Four types of lesion are frequently seen: exophytic, flat, ulcerated and pigmented (see fig.2). Spinocellular carcinoma. This type of cancer appears in areas of the body exposed to the sun, and it can also arise in scars or sores (see fig.3). It is more likely than basal cell carcinoma to grow into the deeper layers of the skin and spread to other parts of the body.
Melanoma. This cancer originates in the melanocytes, the skin cells that produce pigment (see fig.4). Melanocytes can also form benign (non-cancerous) growths known as moles [1]. Digital photographs were used as samples throughout this work. A digital image is a two-dimensional numerical representation of a scene: it is stored as a matrix in which each number is associated with a color. Creating a digital image (see fig.5) requires the following elements: 1. A scene: the area containing the object to be captured, in this case the real world. 2. Light: transverse electromagnetic waves, made up of photons, that propagate through the air. The waves illuminate the object; some are refracted and others are reflected, and the reflected light makes the object visible to the capture device. 3. A capture device, which captures the image with an electronic sensor and stores it in digital memory. 4. A storage device, where the digital image to be analyzed is kept. There are two main ways to represent the information that makes up a digital image: bitmap images and vector images.
1. Bitmap images are formed by a grid of cells called pixels, each of which is assigned a color and a luminance. 2. Vector images are generally much smaller than bitmaps, which makes their information easier to organize. They build the objects that form the image from geometric strokes defined by mathematical formulas, and they are rendered from paths that serve as references for each shape. Each object within a vector image can be modified without altering the others (Valerie, 2000).
In the RGB model, an image consists of three component images, one for each primary color. When they are fed into an RGB monitor, these three images combine on the screen to produce a composite color image. The number of bits used to represent each pixel in RGB space is called the pixel depth (see fig.6). Current digital cameras, with their high resolution and integrated hardware and software, allow almost point-by-point observation of biological tissue, as well as the decomposition of light into the basic RGB colors (red, green and blue) and the measurement of their luminous intensities through histograms.
A histogram is a graphical representation of a variable (in this case, each RGB channel) in the form of bars, where the area of each bar is proportional to the frequency of the values it represents, either per value or cumulatively. It gives a view of how a population or sample is distributed with respect to a quantitative, continuous characteristic. In an RGB histogram, the horizontal axis shows the intensity values from 0 to 255 and the vertical axis shows how many times each intensity occurs in the image, as shown in fig.7 (Valerie, 2000).

Figure-7. Example of histograms in RGB
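As an illustration of how the per-channel histograms in fig.7 are built, the following minimal sketch counts, for each of the R, G and B channels, how many pixels take each intensity from 0 to 255. It assumes the image is already available as a flat list of 8-bit (r, g, b) tuples; the function name and the sample pixels are hypothetical and not taken from the system described here.

```python
def rgb_histograms(pixels):
    """Count how many pixels take each intensity 0-255, separately for R, G and B.

    Assumption: `pixels` is an iterable of (r, g, b) tuples with 8-bit values.
    """
    hist = {"R": [0] * 256, "G": [0] * 256, "B": [0] * 256}
    for r, g, b in pixels:
        hist["R"][r] += 1  # one more pixel with red intensity r
        hist["G"][g] += 1
        hist["B"][b] += 1
    return hist


# A tiny made-up "image" of four pixels.
image = [(200, 120, 90), (200, 118, 95), (60, 40, 30), (210, 120, 90)]
h = rgb_histograms(image)
print(h["R"][200], h["G"][120], h["B"][90])  # -> 2 2 2
```

Plotting each 256-entry list as vertical bars gives exactly the kind of graph described above: intensity on the horizontal axis, pixel count on the vertical axis.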
The classifier used in this work is an Artificial Neural Network (ANN), defined as a massive arrangement of simple processing elements called neurons, with a high degree of interconnectivity or feedback. These arrangements are inspired by biological neurons. "ANNs are massively parallel interconnected networks of simple elements with hierarchical organization, which try to interact with real-world objects in the same way that the biological nervous system does" [2]. Some of their characteristics are: • Adaptive learning • Self-organization • Fault tolerance • Real-time operation • Easy construction in integrated circuits. The system uses supervised learning, in which training is controlled by an external agent (the supervisor) that determines the response the network should generate for a specific input. The supervisor checks the network output and, when it does not match the desired output, the connection weights are modified so that the obtained output approaches the desired one.
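The supervised weight correction described above can be illustrated on a single neuron. This is a minimal sketch, not the training code of the system: it assumes one hyperbolic-tangent neuron, a squared-error criterion and plain gradient descent (the delta rule); the learning rate, epoch count, helper names and toy samples are all hypothetical.

```python
import math


def train_neuron(samples, weights, bias, lr=0.1, epochs=200):
    """Delta-rule training of one tanh neuron on (inputs, desired_output) pairs."""
    for _ in range(epochs):
        for x, target in samples:
            net = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = math.tanh(net)
            error = target - y          # the "supervisor" compares output with the desired response
            grad = error * (1 - y * y)  # derivative of tanh(net) is 1 - tanh(net)^2
            # move each weight so that the output approaches the desired value
            weights = [w + lr * grad * xi for w, xi in zip(weights, x)]
            bias += lr * grad
    return weights, bias


def predict(x, weights, bias):
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy task (made up): first input active -> +1, second input active -> -1.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
w, b = train_neuron(samples, [0.0, 0.0], 0.0)
```

After training, `predict([1.0, 0.0], w, b)` is close to +1 and `predict([0.0, 1.0], w, b)` close to -1. A multilayer network applies the same idea layer by layer through backpropagation.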

Material and Method
The system is divided into five modules.

Module 1. Obtaining Digital Images
For this work it was important to define the devices used in developing the software and the hardware used to build the LED light source. The source uses a MOLEX 800 QXRA series 180081-22 white light-emitting diode, a 365 nm UV LED, and a RECOM RCD-24-1.00/PL/A LED driver, a regulated power source whose output, from 0 to 1000 mA, is controlled by a voltage from 0 to 4.2 V. A 12-bit digital-to-analog converter (DAC) sets the driver's control voltage. A MOTOROLA MC7805CT voltage regulator supplies 5 V to the PIC18F2550 microcontroller and the LM016L LCD, and an LM7812AC voltage regulator is also used. The camera used for taking pictures was a Sony Cyber-shot DSC-HX200V [3]. The digital image is essential for extracting the characteristics later used to obtain the pre-diagnosis of these conditions. The image base comprised 38 patients, of whom 5 were diagnosed with melanoma, 11 with basal cell carcinoma, 7 with spinocellular carcinoma and 14 with intradermal nevus (see Fig. 8).
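The relationship between the 12-bit DAC and the LED current can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that the DAC output span is matched to the driver's 0 to 4.2 V control range and that the driver responds linearly over it; neither assumption is stated in the text, and the real driver's dimming curve may differ. Constant and function names are hypothetical.

```python
DAC_BITS = 12
DAC_FULL_SCALE_V = 4.2        # assumption: DAC span matched to the driver's 0-4.2 V control input
DRIVER_MAX_MA = 1000.0        # driver output range stated in the text (0-1000 mA)
MAX_CODE = 2 ** DAC_BITS - 1  # 4095 for a 12-bit converter


def dac_code_for_current(target_ma):
    """12-bit DAC code that would request `target_ma`, assuming a linear driver response."""
    if not 0 <= target_ma <= DRIVER_MAX_MA:
        raise ValueError("requested current is outside the 0-1000 mA range")
    control_v = target_ma / DRIVER_MAX_MA * DAC_FULL_SCALE_V  # control voltage for the driver
    return round(control_v / DAC_FULL_SCALE_V * MAX_CODE)


def step_size():
    """Control-voltage and LED-current change produced by one DAC step."""
    return DAC_FULL_SCALE_V / MAX_CODE, DRIVER_MAX_MA / MAX_CODE


volts_per_step, ma_per_step = step_size()
print(f"{volts_per_step * 1000:.3f} mV and {ma_per_step:.3f} mA per step")
# -> 1.026 mV and 0.244 mA per step
```

Under these assumptions a 12-bit DAC gives roughly quarter-milliamp control over the LED current, which is ample for repeatable illumination between photographs.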

Module 2. Image Analysis and Processing
When the skin tissue is photographed, the resulting digital image is processed by the system: the area to be analyzed is selected together with the perilesional skin, and the histograms of both samples are obtained (see fig.9). A module was also needed in which the doctor can manipulate the image and observe characteristics that are not easily perceptible in the histogram.
1. Grayscale. The pixels of the image are traversed and, for each one, the values of its three channels are replaced by a single integer value, which is written into all three channels [4] (see Figure 10). 2. Binarization. The pixels of the image are traversed and their channel values are compared with 127: when the value is less than 127 the pixel is painted 255 (white), otherwise 0 (black), thus thresholding at the midpoint of the intensity range [5] (see fig.11).

Module 3. Storage
The histogram data stored in the database are: the maximum pixel value over all three channels, the position of the maximum per channel, the mean per channel, the variance per channel and the intensity percentage per channel. These data are saved for both the white-light and the ultraviolet images for later use in Module 4.
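The grayscale and binarization steps of Module 2 can be sketched as follows. The text does not say how the single grayscale integer is computed, so the plain average of the three channels is used here as an assumption; the binarization follows the rule stated in the text (values below 127 become white, the rest black), which is the inverse of the more common convention. Function names are hypothetical.

```python
def to_grayscale(pixels):
    """Replace each (r, g, b) pixel by (v, v, v).

    Assumption: v is the integer mean of the three channels; the original
    system may compute the single value differently.
    """
    gray = []
    for r, g, b in pixels:
        v = (r + g + b) // 3
        gray.append((v, v, v))
    return gray


def binarize(pixels, threshold=127):
    """Rule from the text: below the threshold -> 255 (white), otherwise 0 (black)."""
    out = []
    for r, g, b in pixels:
        v = 255 if (r + g + b) // 3 < threshold else 0
        out.append((v, v, v))
    return out


gray = to_grayscale([(200, 120, 90), (60, 40, 30)])
print(gray)            # -> [(136, 136, 136), (43, 43, 43)]
print(binarize(gray))  # -> [(0, 0, 0), (255, 255, 255)]
```

Applying `binarize` to an already-gray image keeps the comparison identical in all three channels, which is what the description above implies.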

Module 4. Feature Extraction
The characteristics were obtained from the histograms, for both white light and ultraviolet light, and served as discriminating factors among the images of the diagnosed patients. These values were normalized before being fed into the ANN (see Table 1).
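A minimal sketch of how statistics like those stored in Module 3 could be computed from one 256-bin channel histogram, followed by a feature normalization. Several points are assumptions, not taken from the text: "maximum pixel" is read as the histogram peak and "position" as the intensity where it occurs, the intensity percentage is taken as the mean intensity relative to 255, and min-max scaling to [0, 1] stands in for the unspecified normalization. Which 8 values enter the ANN is given in Table 1 and is not reproduced here.

```python
def histogram_features(hist):
    """Statistics of one 256-bin channel histogram (interpretations are assumptions)."""
    total = sum(hist)
    peak_count = max(hist)
    peak_position = hist.index(peak_count)  # first intensity with the highest count
    mean = sum(i * c for i, c in enumerate(hist)) / total
    variance = sum(c * (i - mean) ** 2 for i, c in enumerate(hist)) / total
    intensity_pct = mean / 255 * 100  # assumption: mean intensity as % of full scale
    return [peak_count, peak_position, mean, variance, intensity_pct]


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] across all samples (assumed method)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


# Made-up histogram: two pixels at intensity 100 and two at 200.
hist = [0] * 256
hist[100] = 2
hist[200] = 2
features = histogram_features(hist)  # peak 2 at 100, mean 150.0, variance 2500.0
```

Normalizing each column across patients keeps features with large ranges (such as the variance) from dominating those with small ranges when they are fed to the network.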

Module 5. Classifier
The histogram data stored in the database are compared, by means of the ANN, with previously diagnosed samples in order to classify the case and pre-diagnose the possible condition. Building the ANN required defining its inputs and the classes to be recognized: the 8 inputs defined in the previous module and the four classes (basal cell carcinoma, spinocellular carcinoma, melanoma and intradermal nevus). The geometric pyramid rule gives an approximation of the number of neurons in the hidden layer. The resulting ANN has 8 inputs (white dots), 5 neurons in the hidden layer (red dots), 4 outputs (green dots) and the thresholds (yellow dots) (see Fig.15).
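The geometric pyramid rule, in the form usually quoted, estimates the hidden-layer size as the geometric mean of the number of inputs and outputs. Applied to this network (8 inputs, 4 outputs), it gives the worked estimate below; reading the chosen 5 hidden neurons as this value rounded down is an interpretation, since the text does not say how the estimate was rounded.

```latex
h \approx \sqrt{n_{in} \cdot n_{out}} = \sqrt{8 \cdot 4} = \sqrt{32} \approx 5.66
```

The rule is only a starting heuristic; the final hidden-layer size is normally confirmed experimentally.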

Figure-15. ANN Structure
When designing an ANN, it must be established what values each neuron will take and which activation function (AF) each neuron uses to process its inputs. Because the input vector contains negative values and the responses expected from the ANN are not a linear function of the inputs, nonlinear, bounded neurons were used. The hyperbolic tangent transfer function was chosen, which yields values in the range (-1, 1).
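A minimal sketch of the forward pass of the 8-5-4 network in Fig.15, assuming each neuron's threshold is modeled as a bias term. The weights are random and untrained, purely for illustration, and the original experiments were run in Matlab rather than Python. The hidden layer uses the hyperbolic tangent; the output layer can use either tanh or a linear (identity) function, which is one plausible reading of the two activation combinations compared in this work. All helper names are hypothetical.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 8, 5, 4  # layer sizes given in the text


def init_layer(n_in, n_out, rng):
    # one row per neuron: n_in weights followed by the bias (threshold) term
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer(inputs, weights, activation):
    outs = []
    for row in weights:
        net = sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1]
        outs.append(activation(net))
    return outs


def identity(v):
    return v  # linear activation


def forward(x, w_hidden, w_out, output_activation=math.tanh):
    hidden = layer(x, w_hidden, math.tanh)  # nonlinear, bounded hidden neurons
    return layer(hidden, w_out, output_activation)


rng = random.Random(0)
w_hidden = init_layer(N_IN, N_HIDDEN, rng)
w_out = init_layer(N_HIDDEN, N_OUT, rng)
x = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7, 0.6]        # made-up normalized features
y_tanh = forward(x, w_hidden, w_out)                # every output lies in (-1, 1)
y_lin = forward(x, w_hidden, w_out, identity)       # unbounded linear outputs
```

With the same weights, each tanh output is simply the tanh of the corresponding linear output; the choice only changes how the output layer squashes its net input.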
The output vector obtained from the ANN is interpreted by the system according to the conditions shown in fig.16.

Figure-16. Output vector interpreted by the system
Table 2 shows the binary combinations that the system recognizes and the condition assigned to each combination.
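The interpretation step can be sketched as thresholding each output and looking the resulting binary combination up in a table. Both the threshold of 0 (chosen because tanh outputs are symmetric around zero) and the one-hot code table below are hypothetical placeholders; the actual conditions are those in fig.16 and the actual combinations are those listed in Table 2.

```python
# Hypothetical code table; the real binary combinations are listed in Table 2.
CODE_TABLE = {
    (1, 0, 0, 0): "basal cell carcinoma",
    (0, 1, 0, 0): "spinocellular carcinoma",
    (0, 0, 1, 0): "melanoma",
    (0, 0, 0, 1): "intradermal nevus",
}


def interpret(outputs, threshold=0.0):
    """Turn the 4 network outputs into a binary code and a condition name."""
    code = tuple(1 if v >= threshold else 0 for v in outputs)  # assumed threshold
    return code, CODE_TABLE.get(code, "unrecognized combination")


print(interpret([0.8, -0.6, -0.9, -0.7]))
# -> ((1, 0, 0, 0), 'basal cell carcinoma')
```

Combinations not present in the table are reported as unrecognized rather than forced into a class, which is a safer default for a pre-diagnosis aid.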

Results
After the data are compared, the system shows the final result: the pre-diagnosis. Because of the limited number of available images, the performance of the ANN was calculated with the re-substitution method, which gives the percentage of correct classifications for each class and for the total number of samples (see table 3) [6].
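Re-substitution evaluates the classifier on the same samples used to train it, so the resulting percentages tend to be optimistic compared with a held-out test set. The sketch below shows how per-class and overall success percentages of the kind reported in table 3 can be computed from true and predicted labels; the function name and the example labels are made up for illustration.

```python
from collections import Counter


def resubstitution_scores(true_labels, predicted_labels):
    """Per-class and overall percentage of correct classifications."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    per_class = {c: hits[c] / totals[c] * 100 for c in totals}
    overall = sum(hits.values()) / len(true_labels) * 100
    return per_class, overall


# Made-up example: labels predicted for the training samples themselves.
true = ["melanoma", "melanoma", "nevus", "nevus"]
pred = ["melanoma", "nevus", "nevus", "nevus"]
per_class, overall = resubstitution_scores(true, pred)
print(per_class, overall)
# -> {'melanoma': 50.0, 'nevus': 100.0} 75.0
```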
Studying the digital images through RGB histograms, to measure the intensity of the selected lesion and healthy-skin areas, showed that the percentage variation between these intensities helps to differentiate melanoma from the other lesions studied.
In the white-light images, most lesions are less intense than the healthy skin. Under UV light, some types of lesion become darker than under white light, while others fluoresce, giving the lesions greater intensity than the healthy skin.
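One simple way to express the lesion-versus-healthy-skin comparison as a percentage is the relative difference of their mean intensities, computed separately for the white-light and UV images. This formula is an assumption for illustration, since the text does not define the exact percentage used; the intensity values below are made up.

```python
def relative_intensity_pct(lesion_mean, healthy_mean):
    """Percentage by which the lesion is brighter (+) or darker (-) than healthy skin.

    Assumed definition; the text does not specify how the variation is computed.
    """
    return (lesion_mean - healthy_mean) / healthy_mean * 100


# Made-up mean intensities for one channel of one patient.
white = relative_intensity_pct(lesion_mean=90, healthy_mean=150)  # lesion darker: negative
uv = relative_intensity_pct(lesion_mean=120, healthy_mean=100)    # fluorescence: positive
```

A negative value under white light together with a positive value under UV would correspond to the fluorescence behavior described above for some conditions.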

Conclusion
The software developed offers advantages for the pre-diagnosis of skin cancer, giving the clinician a tool that supports diagnosis through digital photography, spectroscopic techniques and artificial intelligence. The differences in light intensity between lesions and healthy tissue were more evident when photographs under both white light and ultraviolet light were used. The method is accessible, economical and easy to use, so it could be very useful in centers where these conditions are treated.
The classifier, a multilayer artificial neural network trained with the backpropagation learning algorithm, gave better results when nonlinear and linear activation functions were combined, reaching an efficiency of 81.27%.
The tools used in this work, Qt, SQLite and Matlab (the latter for simulating and experimenting with the ANN), met the needs of the macro project.
Future work could consider the following: because the code is scalable, the system architecture can be adapted to the pre-diagnosis of other types of skin cancer, and the output layer of the neural network can classify more than four cases. Greater accuracy in the classification module requires populating the database with more standardized images adapted to both types of light, white and ultraviolet. Features from outside image analysis may also be added.