Automatic classification of defects with the review of an appropriate feature extraction.
Alicia ROMERO RAMIREZ, Neil PEARSON and Dr. J. S. D. MASON, Swansea University, SA2 8PP, UK. Email: 436912@swansea.ac.uk
Abstract. A novel method for the automatic classification of defects detected by magnetic flux leakage (MFL) inspection is presented. The technique uses geometric measures to distinguish between the different types of defect caused by corrosion in petro-chemical storage tanks. To characterize each defect, a feature extraction process is proposed, and principal component analysis is then used to reduce the features to their most significant components. Two classifiers are compared: k-nearest neighbour and support vector machine. The results show that automatic classification of unseen test examples on steel plates is possible with an accuracy of 91%.
Introduction
Petro-chemical storage tanks are important assets in the industrial sector. The service life of a storage tank is between 20 and 40 years, although in some cases failures within the tank reduce this to 2-3 years [1]. Predicting a possible tank failure matters to the industry not only because of the value of the tank contents, which is very high in the case of petroleum tanks, but also because of the environmental damage a failure can cause and the fines the operator would consequently face. The main cause of failure of steel storage tanks is corrosion [2]. Magnetic flux leakage (MFL) can reliably detect metal loss due to corrosion, permitting a quantitative evaluation of the size of the defect [3]. MFL tools are equipped with sensors that collect information about the state of the tank floor. It is suggested in [4] that a deeper understanding of the shape of MFL signals could help to find more effective methods of separating defect types. Once a defect is detected, its shape and size are not easy to predict; this is the topic of this paper.
In recent years there has been much interest in the automatic classification of defect patterns, and many approaches use neural networks, for example [5-9]. Successful work using ultrasonic approaches is reported in [5,8,9]. Automatic defect classification using MFL is reported in [7] for pipe welds, achieving a classification success rate of 71% across three classes, namely spheres, parallelepipeds and cylinders. In [6], eddy current signals are used for defect classification in aluminium plates, with a reported error rate of 10%.
1. Data acquisition
MFL testing machines require a mapping of magnetic fields onto flaw geometry. The machine used in this work measures magnetic flux signals and converts them into estimates of percentage volumetric loss; a discussion of the reliability of this method can be found in [3]. Each potential defect is represented by an m × n matrix of data, from which the geometry of the defect is not easily determined. In very simple terms, a defect could be classified by the number of rows and the number of columns of this matrix in which the signal exceeds some threshold, and the percentage metal loss could be obtained by a suitable integration of the values in that region. However, we show here that a more useful classification is possible.
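The simple threshold-based description above can be sketched in a few lines. This is a minimal illustration, not the software of the inspection machine: it assumes the m × n matrix holds per-cell estimates of percentage volumetric loss, and the helper name `threshold_extent`, the example scan and the 20% threshold are all hypothetical.

```python
from typing import List, Tuple


def threshold_extent(loss: List[List[float]], threshold: float) -> Tuple[int, int, float]:
    """Baseline description of a defect from its m x n loss matrix (hypothetical helper).

    Returns the number of rows and of columns containing at least one cell
    above `threshold`, and the sum of those above-threshold values as a
    crude integrated metal-loss measure.
    """
    hot_rows = set()
    hot_cols = set()
    integrated = 0.0
    for i, row in enumerate(loss):
        for j, value in enumerate(row):
            if value > threshold:
                hot_rows.add(i)
                hot_cols.add(j)
                integrated += value
    return len(hot_rows), len(hot_cols), integrated


# Hypothetical 3 x 4 scan, values in % volumetric loss.
scan = [
    [0.0, 5.0, 0.0, 0.0],
    [0.0, 25.0, 30.0, 0.0],
    [0.0, 10.0, 22.0, 0.0],
]
print(threshold_extent(scan, 20.0))  # (2, 2, 77.0)
```

Two defects with very different depth profiles can produce the same row and column counts, which is one motivation for the richer description pursued in the next section.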
2. Feature extraction and feature transformation
Defect characterization and pattern recognition are essential for the development of an automatic classifier. The goal is to highlight similarities between defects of the same class and to draw attention to the differences between defects of different classes. A number of geometrically derived features are considered. These include: the relation between length and width, the positions of the maximum and minimum, the angles of the slopes, the gradient of the channel containing the maximum, mean values, the area seen from the top view, volume, and length relations. In total, 50 features are extracted per defect. By examining a number of examples and studying the different classes, features have been identified that lead to good classification performance. Clearly, there is significant redundancy across the set of 50 chosen features, so principal component analysis (PCA) [11] is used to reduce it. Figure 1 shows the results of PCA on a preliminary training data set. Note that subsequent experiments are performed on test sets that do not directly overlap with this PCA set. Figure 1 shows that when the 30 most significant components are retained, less than 1% of the information is lost.
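A few of the listed feature families can be computed directly from the thresholded loss matrix, as in the sketch below. The paper does not give the exact definitions of its 50 features, so this covers only illustrative versions of the length/width relation, the position of the maximum, the mean value, the top-view area and the volume. It assumes rows index positions along the scan and columns index sensor channels; the helper name `geometric_features` and the example data are hypothetical, and slope angles and channel gradients are omitted.

```python
from typing import Dict, List


def geometric_features(loss: List[List[float]], threshold: float) -> Dict[str, float]:
    """Compute a few illustrative geometric features of one defect (hypothetical helper).

    Assumption: rows index positions along the scan, columns index sensor
    channels, and only cells above `threshold` belong to the defect.
    """
    cells = [(i, j, v) for i, row in enumerate(loss)
             for j, v in enumerate(row) if v > threshold]
    if not cells:
        raise ValueError("no cell exceeds the threshold")
    rows = [i for i, _, _ in cells]
    cols = [j for _, j, _ in cells]
    length = max(rows) - min(rows) + 1  # extent along the scan, in cells
    width = max(cols) - min(cols) + 1   # extent across channels, in cells
    i_max, j_max, v_max = max(cells, key=lambda c: c[2])
    volume = sum(v for _, _, v in cells)  # summed loss over the defect footprint
    area = len(cells)                     # footprint seen from the top view, in cells
    return {
        "length_width_ratio": length / width,
        "max_row": float(i_max),
        "max_col": float(j_max),
        "max_value": v_max,
        "mean_value": volume / area,
        "top_view_area": float(area),
        "volume": volume,
    }


# Hypothetical elongated defect, values in % volumetric loss.
scan = [
    [0.0, 21.0, 0.0],
    [0.0, 40.0, 0.0],
    [0.0, 35.0, 0.0],
    [0.0, 24.0, 0.0],
]
features = geometric_features(scan, 20.0)
print(features["length_width_ratio"], features["mean_value"])  # 4.0 30.0
```

In practice each feature would be computed for every recording and stacked into a feature vector before the PCA step.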
Figure 1. Percentage of information retained versus number of PCA components retained (experiment on a preliminary data set).
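A curve like the one in Figure 1 can be computed from the eigenvalues of the covariance matrix of the training feature vectors. The sketch below assumes those eigenvalues are already available (for example from `numpy.linalg.eigvalsh`) and reads “information” as the fraction of total variance retained, which is the usual interpretation in PCA but an assumption here. The helper names and the example eigenvalues are hypothetical.

```python
from typing import List, Sequence


def cumulative_information(eigenvalues: Sequence[float]) -> List[float]:
    """Percentage of total variance retained by the first k components, k = 1..n."""
    ordered = sorted(eigenvalues, reverse=True)  # most significant component first
    total = sum(ordered)
    retained = []
    running = 0.0
    for lam in ordered:
        running += lam
        retained.append(100.0 * running / total)
    return retained


def components_needed(eigenvalues: Sequence[float], max_loss_percent: float = 1.0) -> int:
    """Smallest k whose discarded variance is strictly below max_loss_percent."""
    for k, retained in enumerate(cumulative_information(eigenvalues), start=1):
        if 100.0 - retained < max_loss_percent:
            return k
    return len(eigenvalues)


eigenvalues = [0.5, 6.0, 0.25, 2.0, 1.0, 0.25]  # hypothetical, unsorted
print(cumulative_information(eigenvalues))  # [60.0, 80.0, 90.0, 95.0, 97.5, 100.0]
print(components_needed(eigenvalues, max_loss_percent=3.0))  # 5
```

With 50 features, the same rule applied to the real spectrum would report how many components keep the loss below 1%.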
3. Classification
The overall objective of the work is to determine to which of a finite set of categories an incoming signal belongs. This is referred to as identification and is a 1-from-n classification task. Alternatively, verification can be considered; this is a 1-from-2 class task and can be regarded as a special case of identification. The classes form a closed set, chosen by taking into consideration the shapes of real defects that appear in petro-chemical storage tanks. Because of the thickness of the plates (between 6 and 12 mm), undercutting defects are unlikely to appear [15], so the shapes of our defects are restricted. Defects are assumed to have profiles similar to those shown in Figure 2, namely pipe, conical and lake.
Figure 2. Different defect profiles.
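The difference between identification and verification can be made concrete with three per-class scores such as Ps, Pc and Pl. In this sketch, which is an assumption rather than the paper's code, identification picks the highest-scoring class, while verification accepts or rejects a single claimed class against a threshold; the function names, scores and threshold are hypothetical.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """1-from-n identification: return the class with the highest score."""
    return max(scores, key=lambda label: scores[label])


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """1-from-2 verification: accept the claimed class if its score reaches the threshold."""
    return scores[claimed] >= threshold


scores = {"pipe": 0.1, "conical": 0.7, "lake": 0.2}  # hypothetical classifier output
print(identify(scores))                # conical
print(verify(scores, "conical", 0.5))  # True
print(verify(scores, "lake", 0.5))     # False
```

Because verification only ever answers yes or no for one class, its assessment does not depend on how many classes exist.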
4. Experimental work
The classifier is assessed in two different ways. The first uses a multi-class classifier with three classes, and the second uses three binary classifiers (one per class). The arrangements are shown in Figure 3. In both cases a ‘signal to test’ gives rise to three scores (Ps, Pc, Pl), which are subjected to a single decision threshold. The relative performance is shown on a detection error tradeoff (DET) curve [14] as the decision threshold is varied. Accuracy scores are also given.
[Figure 3 diagram. Multi class classifier: a signal to test produces the probability of being a pipe defect (Ps), a conical defect (Pc) and a lake defect (Pl). Pipe, conical and lake classifiers: the signal to test is passed to each binary classifier, which outputs the probability of being its defect type (Ps, Pc or Pl) and the probability of NOT being that defect type.]
Figure 3. Experimental setup.
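The “multi class” arrangement of Figure 3 can be illustrated with a k-nearest-neighbour scorer that returns one score per class. The paper does not state how its KNN scores were obtained; this sketch assumes the common choice of taking, for each class, the fraction of the k nearest training vectors that belong to it, with Euclidean distance. The function name, the k = 5 default and the toy two-dimensional training set are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_scores(train: List[Sample], x: Sequence[float], k: int = 5) -> Dict[str, float]:
    """Score every class by its share of the k nearest training vectors.

    Distances are Euclidean; sorted() is stable, so distance ties keep
    training order. Classes absent from the neighbourhood score 0.
    """
    if not train or k < 1:
        raise ValueError("need a non-empty training set and k >= 1")
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    labels = sorted({label for _, label in train})
    return {label: votes[label] / len(nearest) for label in labels}


train = [
    ((0.0, 0.0), "pipe"),
    ((0.0, 1.0), "pipe"),
    ((5.0, 5.0), "conical"),
    ((5.0, 6.0), "conical"),
    ((10.0, 10.0), "lake"),
]
scores = knn_scores(train, (0.0, 0.5), k=3)
print(scores)  # pipe about 0.67, conical about 0.33, lake 0.0
```

In the binary arrangement, each of the three classifiers is instead trained on one class against the other two and reports the probability of its own class and of NOT that class.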
DET curves are frequently used to measure the verification performance of a classification algorithm. Verification addresses a two-class problem in which the expected answer is either true or false. Here we focus on verification so that the assessment is independent of the number of classes. For each trial, the score is thresholded and a decision is made; the DET curve shows how the error rates vary as this threshold moves, which gives a robust indication of performance. Figure 4 presents the scheme of a feature vector entering the classifier, which produces a score that is then thresholded. Moving the threshold changes the balance between the two error rates (missed detections and false alarms), as indicated in the DET plots.
[Figure 4 diagram: Feature Vector → Classifier → SCORE → Threshold, with a plot of score density showing the two score distributions and the critical values.]
Figure 4. A feature vector enters the classifier, which produces a score that is then thresholded. The two distributions represent the actual in-class and out-of-class scores.
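The thresholding in Figure 4, and the DET curve obtained by moving the threshold, can be sketched as follows. Each distinct observed score is used in turn as the threshold, and two error rates are recorded: the miss rate (in-class scores rejected) and the false-alarm rate (out-of-class scores accepted). The function name and scores are hypothetical; the paper used the NIST plotting software [14], and DET plots conventionally draw these two rates on normal-deviate axes rather than linear ones.

```python
from typing import List, Sequence, Tuple


def det_points(in_class: Sequence[float],
               out_of_class: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Sweep a decision threshold over all observed scores.

    A trial is accepted when its score >= t. Returns (t, miss_rate,
    false_alarm_rate): the fraction of in-class scores rejected and the
    fraction of out-of-class scores accepted at each threshold t.
    """
    points = []
    for t in sorted(set(in_class) | set(out_of_class)):
        miss = sum(s < t for s in in_class) / len(in_class)
        false_alarm = sum(s >= t for s in out_of_class) / len(out_of_class)
        points.append((t, miss, false_alarm))
    return points


target = [0.9, 0.8, 0.4]            # hypothetical in-class scores
non_target = [0.1, 0.3, 0.5, 0.85]  # hypothetical out-of-class scores
for t, miss, fa in det_points(target, non_target):
    print(f"t={t:.2f}  miss={miss:.2f}  false alarm={fa:.2f}")
```

In the setup used here, each test recording presumably contributes one in-class score (for its true class) and two out-of-class scores, consistent with 408 test recordings giving 1224 scores per profile.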
Because it is difficult to obtain real field data with accurate ground truth, emulated corrosion has been used. Thirty defects in total, representing three classes (pipe, conical, lake), were created in steel plates; their profiles and sizes are shown in Table 1. The data set contains 816 recordings of these 30 defects, classified as pipe (121), conical (330) and lake (365). It is worth noting that the recordings were made on 7 different dates. There are fewer samples of pipe defects because of a limitation of the scanning tool, which captures only defects with a volume loss larger than 20%. The data set is divided into two groups: a) training (408 recordings) and b) testing (408 recordings), with no overlap between the two.
Table 1. Description of emulated defects. Plate thickness: 6 mm for all profile types. Each defect is given as diameter [mm] x maximum depth [mm].
Pipe: 2.0 x 2.0, 3.0 x 2.0, 4.0 x 2.0, 5.0 x 2.0, 6.0 x 2.0, 2.0 x 4.0, 3.0 x 4.0, 4.0 x 4.0, 5.0 x 4.0, 6.0 x 4.0
Conical: 10 x 1.2, 12.1 x 1.8, 13.7 x 2.4, 15.1 x 3.0, 16.3 x 3.6, 17.3 x 4.2, 18.2 x 4.8, 10 x 1.2, 13.7 x 2.4, 16.3 x 3.6
Lake: 20 x 1.0, 30 x 1.0, 40 x 1.0, 50 x 1.0, 100 x 1.0, 20 x 2.0, 30 x 2.0, 40 x 2.0, 50 x 2.0, 100 x 2.0
Table 2. Experimental results for k-nearest neighbour and support vector machine classifiers. The SVM with three verification (binary) classifiers seems to give the best results.
Performance with KNN
Accuracy using the “multiclass” classifier: 85.75%
Accuracy using three binary classifiers: 90.25%
Performance with SVM
Accuracy using the “multiclass” classifier: 87.46%
Accuracy using three binary classifiers: 91.44%
The experimental results in Table 2 show the accuracy and the DET curves for the two classifiers, k-nearest neighbour and support vector machine. In both cases the dashed curve corresponds to the ‘multiclass’ experimental setup and the continuous curve to the setup with three binary classifiers. Each curve is computed from 1224 scores (408 test recordings, each producing three scores).
5. Conclusion
Classification of emulated corrosion based on the proposed approach has been successfully achieved, with an accuracy of over 90% when classifying pipe, conical and lake defects. The final goal of this work is the characterization of real defects appearing in storage tank floors; here, a simplification of the problem based on emulated corrosion is presented. The feasibility of using emulated corrosion as training data for a classifier applied to real inspections is currently being tested.
Acknowledgements
This work is funded by the European Social Fund in collaboration with Silverwing Ltd.
References
[1] M.L. MEDVEDENA and T.D. TIAM. Classification of corrosion damage in steel storage tanks. Chemical and Petroleum Engineering, Vol. 34, Nos. 9-10, 1998.
[2] W.B. GEYER. Handbook of Storage Tank Systems: Codes, Regulations and Designs.
[3] K. REBER and A. BELANGER. Reliability of flaw size calculation based on magnetic flux leakage inspection of pipelines. ECNDT 2006.
[4] T. SCHMITTE. Modelling of magnetic flux leakage measurements of steel pipes. ECNDT 2006.
[5] O. KARPASH, M. KARPASH and V. MYNDJUK. Development of automatic neural network classifier of defects detected by ultrasonic means. ECNDT 2006.
[6] A. DOCEKAL. Signal preprocessing methods for automated analysis of eddy current signatures during manual inspection. ECNDT 2006.
[7] A.A. CARVALHO. MFL signals and artificial neural networks applied to detection and classification of pipe weld defects. NDT & E International, June 2005.
[8] J.B. SANTOS. Automatic defects classification: a contribution. NDT & E International, June 2000.
[9] A. MASNATA. Neural network classification of flaws detected by ultrasonic means. NDT & E International, October 1995.
[10] K. MANDAL and D.L. ATHERTON. A study of magnetic flux-leakage signals. July 1998.
[11] L.I. SMITH. Principal component analysis. February 2002.
[12] V.N. VAPNIK. Statistical Learning Theory. New York: Wiley, 1998.
[13] C.J.C. BURGES. Discov., Vol. 2, No. 2, pp. 1-47, 1998.
[14] NIST. DET-Curve Plotting software for use with MATLAB. Software available at http://www.nist.gov/speech/tools
[15] http://www.corrosion-doctors.org/Forms-pitting/Pitting.html