Text Extraction in Images Using DWT, Gradient Method And SVM Classifier

International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 4, Issue 6, June 2014)

Text Extraction in Images Using DWT, Gradient Method and SVM Classifier

Niti Syal (1), Naresh Kumar Garg (2)
(1) PG Scholar, GZS PTU Campus Bathinda, Punjab, India
(2) Assistant Professor, GZS PTU Campus Bathinda, Punjab, India

Abstract— Text embedded in an image may contain a large amount of metadata. Such metadata provides semantic information for indexing and retrieval purposes. This paper proposes a hybrid method to extract text from images, based on a fusion of the Daubechies DWT, gradient difference and an SVM, along with some pre-processing and post-processing steps. The algorithm is tested on images taken from various newspapers, magazines and commercial products, as well as images collected from the ICDAR 2003 dataset. Experimental results show that the proposed method correctly and effectively extracts the text regions.

Keywords— Daubechies DWT, Gradient Difference, Median Filter, Otsu Method, Support Vector Machine, Text Extraction

I. INTRODUCTION
Text extraction in real-world images is an open problem and a critical task in computer vision applications such as reading labels, keyword-based image search, identification of parts in industrial automation, tourist guides, street signs, and mobile reading systems for visually challenged persons. Text printed on magazine covers, book covers and similar material is often mixed with pictures. Such text may carry semantic and useful information, so separating text strings from images is a very important issue. Images can also be subjected to degradations such as blur, low resolution and uneven lighting, which make it difficult to recover text from a noisy image.

In this paper, we propose a robust approach for text extraction in images. First, the input image is pre-processed: the RGB image is converted into a gray-scale image. Edges are then detected using both the Canny and Sobel detectors, and a median filter is applied to remove noise. Next, the Daubechies DWT, which provides a powerful tool for characterizing textured images, is applied, and the gradient difference is computed on the transformed image to obtain the high-contrast text regions. To remove the non-text regions, global thresholding and the Otsu method are applied along with the Euclidean distance. The output is fed to a Support Vector Machine for recognizing text strings.

II. LITERATURE SURVEY
A lot of research has been done on text extraction. We surveyed the literature from 2010 onwards and highlight only articles published in the past five years. P. Nagabhushan et al. [1] proposed a hybrid approach that combines connected component analysis and texture feature analysis. The Canny detector is used to obtain the possible text regions, and connected component analysis yields the candidate text regions. Because background complexity can cause non-text regions to be collected as well, texture feature analysis is performed to remove them. The proposed method handles document images with complex backgrounds effectively. R. Chandrasekaran et al. [2] use morphological operations for text localization and an SVM for text recognition, along with some pre-processing and post-processing steps. Median filtering removes the noise, a LoG edge detector extracts the edges, and morphological operations localize the text. Non-text components are then removed using a connected-component technique, and the characters are classified with a Support Vector Machine. Uddin et al. [3] worked on a modified morphological filter to improve text extraction accuracy; because a single threshold value is inefficient, the input images are divided into different clusters depending on the size of the text. Matko et al. [4] proposed a text extraction method based on K-means clustering with a modified cylindrical distance in the HSI color space and the Euclidean distance in the RGB color space; the Euclidean distance gives good results as a color difference measure. B.H. Shekhar et al. [5] proposed a hybrid approach to robustly localize text in natural images based on an integration of the Haar DWT and gradient difference; morphological operations are then performed to generate the text regions.

In the proposed method, we use a fusion of the Daubechies DWT, the gradient difference, the Euclidean distance as a color space measure, the Otsu method for filtration, a global threshold value to remove non-text regions, and a Support Vector Machine for text classification and recognition.
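As a minimal sketch of the RGB-to-gray conversion used in the pre-processing step, the snippet below applies the paper's luminance weights (0.29, 0.58, 0.11, which sum to 0.98; the common ITU-R BT.601 weights are 0.299, 0.587, 0.114). The list-of-tuples pixel representation is an illustrative assumption.

```python
# Sketch of the RGB-to-gray conversion used in pre-processing.
# A pixel is assumed to be an (R, G, B) tuple of 0-255 values;
# the luminance weights follow the paper: Y = 0.29 R + 0.58 G + 0.11 B.

def rgb_to_gray(pixels):
    """Convert a list of (R, G, B) tuples to gray-scale intensities Y."""
    return [0.29 * r + 0.58 * g + 0.11 * b for (r, g, b) in pixels]

# Example: pure white maps near 249.9 (the weights sum to 0.98),
# pure red maps to 0.29 * 255.
gray = rgb_to_gray([(255, 255, 255), (255, 0, 0)])
print(gray)
```

In a full implementation the same weighting would be applied per pixel over a whole image array (e.g. Matlab's rgb2gray uses the BT.601 variant of these weights).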

III. PROPOSED METHOD

The proposed work is divided into seven modules, as shown in Figure.1: Input Image → Pre-Processing → Daubechies DWT → Gradient Difference → Non-Text Removal using Otsu and Thresholding → SVM → Results.

Figure.1 Module Diagram

A. Preprocessing
If the input image is already a gray-scale image, no conversion is needed and it is fed directly to the Daubechies DWT. If the image is an RGB image, it must first be converted into a gray-scale image. The intensity image Y is given as:

Y = 0.29 R + 0.58 G + 0.11 B

Image Y is then preprocessed with the 2-D Daubechies DWT. Y is taken as the Value component of the Hue-Saturation-Value (HSV) color space; during conversion, the Red-Green-Blue color space is converted into the HSV color space. The Canny edge detector is applied to obtain the edges of the image. Matlab's fill operation is then used to fill any holes present in the image, and the regionprops function is used to obtain image properties such as Area, FilledImage and PixelList. Median filtering is performed to free the gray-scale image of noise, after which the Sobel edge detector is applied to extract edges more effectively. Finally, connected-component analysis is carried out to extract those regions that are not captured by the boundaries.

B. Daubechies DWT
The processed image is then passed to the Daubechies Wavelet Transform to obtain the three kinds of edges and the texture missed by traditional edge detectors. With the Daubechies DWT, the detected edges become more precise and obvious. The main reason for applying a wavelet transform for edge detection is that the wavelet transform can remove noise, whereas conventional edge operators identify noisy pixels as edge pixels [9]. The Daubechies wavelet transforms are defined in the same way as the Haar wavelet transforms; the only difference between them lies in the scaling signals. The Daubechies wavelet has a higher-frequency coefficient spectrum than the Haar wavelet, so tasks such as audio denoising and compression are handled more pleasingly with it; for this reason we use the Daubechies DWT. The Daubechies discrete wavelet transform decomposes the signal into four components in the frequency domain: one average component (LL) and three detail components (HL, LH, HH). The result of the 2-D DWT decomposition, with sub-bands LL, HL, LH and HH, is shown in Figure.2.

Figure.2 2-D DWT Decomposition (sub-bands LL, HL, LH, HH)

The Daubechies DWT is more effective than other wavelets of the wavelet family, and it has been applied to multi-resolution representations of images.

C. Gradient Difference
The basis of the gradient difference technique is that the gradient information in text areas differs from that in non-text regions because of the high contrast of text. The gradient difference GD is obtained for each pixel as the difference between the maximum and minimum gradient values in its neighbourhood. Let G(x,y) [9] be the gradient image; the maximum and minimum gradient values are obtained through the following formulas:

Min(x,y) = min( G(x1, y1) )    (1)
Max(x,y) = max( G(x1, y1) )    (2)

From eq. (1) and (2) we get:

GD(x,y) = Max(x,y) − Min(x,y)    (3)

The bright regions correspond to pixels that belong to text regions.
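The gradient difference GD(x,y) = Max(x,y) − Min(x,y) defined above can be sketched as a sliding-window max-minus-min over the gradient image. The 3×3 window size is an assumption for illustration; the paper does not fix the neighbourhood size.

```python
# Sketch of the gradient difference: for each pixel, GD = Max - Min
# of the gradient image G over a local window. The 3x3 window
# (win=1) is an illustrative assumption.

def gradient_difference(G, win=1):
    """Return the GD map for a 2-D gradient image G (list of lists)."""
    h, w = len(G), len(G[0])
    GD = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            ys = range(max(0, y - win), min(h, y + win + 1))
            xs = range(max(0, x - win), min(w, x + win + 1))
            vals = [G[y1][x1] for y1 in ys for x1 in xs]
            GD[y][x] = max(vals) - min(vals)   # Max(x,y) - Min(x,y)
    return GD

# High-contrast (text-like) transitions get large GD values:
G = [[0, 0, 9],
     [0, 0, 9],
     [0, 0, 9]]
print(gradient_difference(G))  # [[0, 9, 9], [0, 9, 9], [0, 9, 9]]
```

Pixels near the sharp 0→9 edge receive GD = 9, while the flat region stays at 0, which is exactly why bright GD values mark candidate text regions.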

D. Non-Text Removal
The confident region obtained so far may still contain non-text regions. To eliminate them, we first apply the Euclidean distance measure to the color space values, then perform filtration with the Otsu method [12] to obtain text clusters. These text clusters are segmented from non-text using the set global threshold value. The image is then fed to the SVM to classify the text and image regions more accurately.

IV. IMPLEMENTATION
The overall procedure, in outline (pseudocode):

i. Read the input image.
ii. For each bit of the image (dps = size of the image): if bit.value > avg.value, reduce the bit value; apply the Canny transform and verify the output against the Sobel detector.
iii. Apply the DWT to the modified bits: value-bits[actual, quantization coefficients > 0] = dwt(modified-bit).
iv. If the color pattern has three channels, reduce the color and scale values to 1 and set the gradient difference to the enhanced value.
v. If the set value is less than the gradient difference value, apply the Canny edge value; if the edge value equals the segmented value, accept the characters; otherwise, compute the Euclidean distance of the images against the SVM set value.
vi. Set the feature set from the DWT values; if (region.value − dwt.value) > 0, apply Otsu filtration.
vii. If the filtered threshold equals the original threshold, repeat the Otsu step.
viii. Display the segmented image.

V. RESULTS AND DISCUSSIONS
The proposed method was implemented in Matlab 7.0. The technique was tested on 30 images (10-12 kB each) taken from various newspapers, magazines and commercial products, as well as from the ICDAR 2003 dataset. The algorithm was developed and tested on a PC with a Core 2 processor.

Figure.3 (a) Original image, (b) Extracted image
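The Otsu filtration used in the non-text removal step chooses the gray-level threshold that maximizes the between-class variance of the two resulting pixel classes. A minimal pure-Python sketch follows; the toy intensity list and the 256-level range are illustrative assumptions.

```python
# Minimal sketch of Otsu's method: pick the threshold t that maximizes
# the between-class variance of the classes {v <= t} and {v > t}.
# Intensities are assumed to lie in 0..255; the sample data is illustrative.

def otsu_threshold(values, levels=256):
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the "background" class
    sum0 = 0.0  # intensity sum of the "background" class
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                       # background mean
        m1 = (sum_all - sum0) / w1           # foreground mean
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated clusters: the threshold falls at the top of the
# lower cluster, so thresholding splits text from background.
pixels = [10, 12, 11, 10, 200, 205, 198, 202]
print(otsu_threshold(pixels))
```

Library implementations of the same rule exist, e.g. Matlab's graythresh and scikit-image's threshold_otsu, which is what a production version would call instead.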

Figure.4 (a) Original image, (b) Extracted image

From the results shown in Fig.3-4, it is clear that the proposed algorithm gives very good results. The performance of the proposed method is evaluated using the ICDAR-2003 evaluation metrics, defined as follows [10]:

Definition 1: False Negatives (FN) / misses are regions of the image that are actually text characters but have not been detected by the algorithm.

Definition 2: False Positives (FP) / false alarms are regions of the image that are not characters of a text but have been detected by the algorithm as text.

Definition 3: Recall rate (r) is the ratio of correctly detected characters to the sum of correctly detected characters plus false negatives:
r = correctly detected characters / [correctly detected characters + FN]

Definition 4: Precision rate (p) is the ratio of correctly detected characters to the sum of correctly detected characters plus false positives:
p = correctly detected characters / [correctly detected characters + FP]

Definition 5: F-score is the harmonic mean of the recall and precision rates:
F = 2pr / (p + r)

Table.1 Evaluation Performance: Precision, Recall and F-Score metrics for images

Parameters        P       R       F-Score
Proposed Method   95.71   92.69   94.22

Despite its good performance, the proposed method has a limitation: a text region whose pixel value is below the global threshold is treated as noise, and that text region is therefore eliminated, as shown in Figure.5. A solution to this problem will be addressed in future work.

Figure.5 (a) Original image, (b) Extracted image

VI. CONCLUSION
In this paper we present a fusion technique based on the integration of the Daubechies DWT, gradient difference and an SVM. The experiments show that the main emphasis of this work is on eliminating false positives, the major drawback of traditional approaches. There are several possible extensions of this work: implementing an OCR system to recognize the text, and using better methods for non-text removal.

REFERENCES
[1] P. Nagabhushan, S. Nirmala, "Text Extraction in Complex Color Document Images for Enhanced Readability", Intelligent Management System, pp. 120-133, 2010.
[2] R. Chandrasekaran, RM. Chandrasekaran, "Morphology based Text Extraction in Images", IJCST, pp. 103-107, 2011.
[3] Uddin, Sultana, M. Rahman, Busra, "Extraction of Text from Scene Image using Morphological Based Approach", IEEE, pp. 876-880, 2012.
[4] Matko Saric, Hrvoje Dujmic, Mladen Russo, "Scene Text Extraction in HSI Color Space using K-means and Modified Cylindrical Distance", PRZEGLĄD ELEKTROTECHNICZNY, pp. 117-121, 2013.
[5] B.H. Shekhar, Smitha M.L, P. Shivkumara, "Discrete Wavelet Transform and Gradient Difference Based Approach for Text Localization in Videos", IEEE, pp. 280-284, 2014.
[6] C. A. Bouman, Digital Image Processing, January 13, 2014.
[7] Fatma H. Elfouly, Mohamed I. Mahmoud, Moawad I. M. Dessouky, Salah Deyab, "Comparison between Daubechies Wavelet and Haar Transform using FPGA", World Academy of Science, Engineering and Technology, pp. 395-400, 2008.
[8] S. Audithan, Chandrasekaran, "Document Text Extraction from Document Images Using Haar Discrete Wavelet Transform", European Journal of Scientific Research, pp. 502-512, 2009.
[9] Punam Patel, Shamik Tiwari, "Text Segmentation from Images", International Journal of Computer Applications, pp. 25-28, 2013.
[10] R. Chandrasekaran, RM. Chandrasekaran, P. Natarajan, "Text Localization and Extraction using Support Vector Machine and Morphological Functions", IEEE, pp. 55-60, 2012.
[11] Neha Gupta, V.K. Banga, "Localization of Text in Complex Images Using Haar Wavelet Transform", International Journal of Innovative Technology and Exploring Engineering (IJITEE), pp. 111-115, 2012.
[12] A.J. Jadhav, Vaibhav Kolhe, Sagar Peshwe, "Text Extraction from Images: A Survey", IJARCSSE, pp. 333-337, 2013.
