International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 260-290, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (calculated by GISI), www.jifactor.com

HYBRID ZERO-WATERMARKING AND MARKOV MODEL OF WORD MECHANISM AND ORDER TWO ALGORITHM FOR CONTENT AUTHENTICATION OF ENGLISH TEXT DOCUMENTS

Kulkarni U. Vasantrao1, Fahd N. Al-Wesabi2, Adnan Z. Alsakaf3
1 Professor, Department of Comp. Sci. and Engg., SGGS Institute of Engg. and Tech., Maharashtra, India.
2 PhD Candidate, Faculty of Engineering, SRTM University, Nanded, India; Assistant Teacher, Department of IT, Faculty of Computing and IT, UST, Sana'a, Yemen.
3 Professor, Department of IS, Faculty of Computing and IT, UST, Sana'a, Yemen.

ABSTRACT

Content authentication and tamper detection of digital text documents have become a major concern in the era of communication and information exchange via the Internet, yet very few techniques are available for content authentication of text documents using digital watermarking. This paper develops an English text zero-watermarking approach based on the word mechanism and order two of the Markov model for content authentication and tamper detection of text documents. In the proposed approach, the Markov model is used as a soft computing tool for text analysis and is hybridized with digital watermarking techniques in order to improve the accuracy and complexity of the earlier watermarking technique presented in [27]. The proposed approach is implemented in the PHP programming language. Furthermore, its effectiveness and feasibility are demonstrated through experiments on six datasets of varying lengths.
The experimental results show that the proposed approach is sensitive to all kinds of tampering attacks and achieves good tampering-detection accuracy. This accuracy is compared with that of other recent approaches under random insertion, deletion, and reorder attacks applied at multiple random locations in the experimental datasets. The comparative results show that the proposed approach outperforms the WO1 approach in terms of watermark complexity, capacity, and tampering-detection accuracy under insertion and deletion attacks, and is therefore recommended in those cases; however, it is not applicable under reorder tampering attacks, especially on large text documents.

Keywords: Digital watermarking, Markov model, order two, word mechanism, probabilistic patterns, information hiding, content authentication, tamper detection, copyright protection.

I. INTRODUCTION

With the increasing use of the Internet, e-commerce, and other efficient communication technologies, copyright protection and authentication of digital content have gained great importance. Most digital content is in text form, such as email, websites, chats, e-commerce, eBooks, news, and short messaging services (SMS) [1]. These text documents may be tampered with by malicious attackers, and the modified data can lead to fatally wrong decisions and transaction disputes [2]. Content authentication and tamper detection of digital images, audio, and video have long been of great interest to researchers. More recently, copyright protection, content authentication, and tamper detection of text documents have attracted research interest as well.
Moreover, during the last decade, research on text watermarking schemes has focused mainly on copyright protection and has given less attention to content authentication, integrity verification, and tamper detection [4].

Various techniques have been proposed for copyright protection, authentication, and tamper detection of digital text documents. Digital watermarking (DWM) techniques are considered among the most powerful solutions to these problems. Digital watermarking is a technology in which information such as an image, plain text, audio, video, or a combination of these can be embedded as a watermark in digital content for applications such as copyright protection, owner identification, content authentication, tamper detection, access control, and many others [2].

Traditional text watermarking techniques, such as format-based, content-based, and image-based methods, require transformations or modifications of the text document's contents in order to embed the watermark information within the text. A newer technique, zero-watermarking, has been proposed for text documents. Its main idea is that it does not change the contents of the original text document but instead uses the contents of the text itself to generate the watermark information [13].

In this paper, the authors present a new zero-watermarking technique for digital text documents. The technique exploits the probabilistic nature of natural languages, mainly through the second-order, word-level Markov model.

The paper is organized as follows. Section 2 provides an overview of previous work on text watermarking. The proposed generation and detection algorithms are described in detail in Section 3. Section 4 presents the experimental results for various tampering attacks, such as insertion, deletion, and reordering, with the performance of the proposed approach evaluated on multiple text datasets.
The last section concludes the paper along with directions for future work.

II. PREVIOUS WORK

Text watermarking techniques have been proposed and classified in many studies according to their features and embedding modes. We briefly examine some traditional classifications of digital watermarking from the literature. These techniques include text-image, content-based, format-based, feature-based, synonym-substitution, syntactic-structure, acronym-based, and noun-verb-based text watermarking algorithms, among others, reflecting various viewpoints [1][3][4].

A. Format-based Techniques

Format-based text watermarking techniques are layout-dependent. The authors of [5] proposed three embedding methods for text documents: line-shift coding, word-shift coding, and feature coding. In line-shift coding, each even line is shifted up or down depending on the corresponding watermark bit; typically, the line is shifted up if the bit is one and down otherwise. The odd lines serve as control lines used during decoding. Similarly, word-shift coding shifts words and modifies the inter-word spaces to embed the watermark bits. Finally, feature coding alters certain text features, such as character pixels or the lengths of character end lines, in a specific way to encode the zeros and ones of the watermark bits. Watermark detection is performed by comparing the original and watermarked documents.

B. Content-based Techniques

Content-based text watermarking techniques are structure-based and natural-language-dependent [4].
In [6][14], a syntactic approach was proposed that uses the syntactic structure of the cover text to embed watermark bits by applying syntactic transformations to the syntactic tree diagram, while preserving the natural properties of the text during the embedding process. In [18], synonym substitution was proposed to embed the watermark by replacing certain words with their synonyms without changing the sense and context of the text.

C. Binary Image-based Techniques

Text watermarking techniques for binary image documents rely on traditional image watermarking techniques based on the spatial and transform domains, such as the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Least Significant Bit (LSB) embedding [5]. Several formatted-text watermarking methods embed the watermark in a text image by shifting words and sentences right or left, or shifting lines up or down, as mentioned above for format-based watermarking [5][7].

D. Zero-based Techniques

Zero-watermarking techniques for text depend on the content features of the document. Several approaches designed for text documents have been proposed in the literature and are reviewed here [1][19][20][21]. The first algorithm, proposed in [19], detects tampering in plain text documents based on word lengths, using digital watermarking and certifying-authority techniques. The second, proposed in [20], improves text authenticity by using the contents of the text to generate a watermark that is later extracted to prove the authenticity of the text document. The third, proposed in [1], protects the copyright of text content based on the occurrence frequency of non-vowel ASCII characters and words.
The last algorithm, proposed in [21], protects open textual digital content from counterfeiting by logically inserting a watermark image in the text and extracting it later to prove ownership. In [22], a Chinese text zero-watermarking approach was proposed based on a space model, using the two-dimensional coordinates of the word level and the sentence weights of the sentence level.

E. Combined Techniques

Text is not the same as an image: language has a distinct syntactic nature that makes image-based techniques more difficult to apply. Text should therefore be treated as text rather than as an image, and the watermarking process should be performed accordingly. In [23], a combined method for copyright protection was proposed that merges the strengths of image-based and language-based text watermarking techniques.

The text watermarking approaches mentioned above are not suitable for all types of text documents under varying document sizes, types, and random tampering attacks, and their embedding and extraction mechanisms may be discovered easily by attackers. Moreover, these approaches are not designed specifically to solve the problem of authentication and tamper detection of text documents; they rely on modifying the original text document to embed added external information, which can later be used for purposes such as content authentication, integrity verification, tamper detection, or copyright protection.
This paper proposes a novel intelligent approach for content authentication and tamper detection of English text documents, in which watermark embedding and extraction are performed logically: the features of the contents are extracted through text analysis with a Markov model, and the original text document is never altered to embed the watermark.

III. THE PROPOSED APPROACH

This paper presents an improved intelligent English text zero-watermarking approach based on the word level and second order of the Markov model for content authentication and tamper detection of text documents. The improved approach relies on the word mechanism and order two of the Markov model to improve the performance, complexity, and tampering-detection accuracy of the similar approach based on order one of the Markov model presented in [27] and developed by F. Al-Wesabi et al. It performs the watermark generation, embedding, extraction, and detection processes with higher accuracy and stronger security measures, hybridizing text zero-watermarking techniques with soft computing tools for natural language processing to protect digital text documents. The Markov model is used to analyze the text and extract the interrelationships among its contents as probabilistic patterns, based on the word level and second order, in order to generate the watermark information. This watermark can later be extracted using the extraction algorithm and matched against the watermark generated from an attacked document using the detection algorithm, in order to identify any tampering and prove the authenticity of the text document.

Before we explain the watermark generation and detection processes, the next subsection presents a preliminary mathematical description of second-order, word-level Markov models for text analysis.

A.
Markov Models for Text Analysis

In this subsection, we explain how to model text using a Markov chain, which is a stochastic (random) model describing the way a process moves from state to state. For example, suppose that we want to analyze the following sentence:

"The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."

When we use a Markov model of order two with the word mechanism, each sequence of two words is a state. As the sample text above is processed, the system makes the following transitions:

"the quick" -> "quick brown" -> "brown fox" -> "fox jumps" -> "jumps over" -> "over the" -> "the brown" -> "brown fox" -> "fox who" -> "who is" -> "is slow" -> "slow jumps" -> ... etc.

Next, we present a simple method to build the Markov matrix of states and transitions, which is the most basic part of text analysis using a Markov model. In this approach, the size of the Markov matrix is not fixed: the number of states and transition probabilities varies with the contents of the given text. The numbers of possible states (Ps) and transitions (Pt) are given by equations (1) and (2):

Ps = n - 2 .......... (1)

Pt = (n - 2)^2 .......... (2)

where n is the length (in words) of the given text document. Thus, the matrix of state probabilities for the sample text above has 20 - 2 = 18 word pairs. For the matrix of transition probabilities, there are n - 2 possible transitions from each state. If the Markov chain is currently at the first state (the first two words) of the given text document, the possible states that could come next are [W(i+2), W(i+3), W(i+4), ..., W(i+n)]. The matrix of transition probabilities therefore has (n - 2)^2 entries.
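The order-two, word-level bookkeeping described above can be sketched in code. The following is a minimal Python illustration (the paper's implementation is in PHP; the function name here is only illustrative), which maps each two-word state to the counts of the words that follow it:

```python
from collections import Counter, defaultdict

def order_two_transitions(text):
    """Map each order-two state (pair of consecutive words) to a Counter
    of the words that follow that pair in the given text."""
    words = text.lower().split()
    transitions = defaultdict(Counter)
    # For n words there are n-2 transitions: state (w[i], w[i+1]) -> w[i+2]
    for i in range(len(words) - 2):
        transitions[(words[i], words[i + 1])][words[i + 2]] += 1
    return transitions

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead")
chain = order_two_transitions(sample)
print(len(chain))                                    # 11 unique states
print(sum(sum(c.values()) for c in chain.values()))  # 18 transitions in all
print(chain[("brown", "fox")])                       # "who" twice, "jumps" once
```

Running this on the 20-word sample sentence reproduces the figures quoted in the text: 11 unique present states and 18 = n - 2 transitions.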
For example, if the Markov chain is currently at the "the quick" state in the sample text above, the possible transitions that could come next are [brown, fox, jumps, over, the, ..., dead]. So the matrix of transition probabilities for the sample text has (20 - 2)^2 = 18^2 = 324 entries. In general, if the text yields n - 2 states, the matrix of transition probabilities needs (n - 2)^2 entries.

Analyzing the sentence above, which contains 20 words, with an order-two, word-level Markov model and representing the result as a Markov chain yields Figure 2, which shows the 11 present states (as word sets in the matrix of Markov chains, without repetitions) and all 18 (n - 2) possible transitions.

Fig. 2. Sample text states and transitions based on order 2 of a Markov model

Now consider the state "brown fox": its next-state transitions are "jumps", "who", and "who"; we observe that the transition to "who" occurs twice. For the analysis of large texts, we count the frequencies of occurrence of the next states and then convert these counts into probabilities.

A simple procedure to obtain an order-two Markov model for a given text is as follows. Build (and initialize to all zeroes) an (n-2)-by-(n-2) matrix M to store the transitions; entry M[x, y] keeps track of the number of times that state x is followed by word y within the given text. For i = 1 to n - 2, where n is the length of the text document in words, let x be the state formed by the i-th and (i+1)-st words and let y be the (i+2)-nd word; then increment M[x, y]. The matrix M now contains the counts of all transitions, and we want to turn these counts into probabilities. Here is a method that can do it.
For each row i, sum the entries in that row, i.e., let counter[i] = M[i, 1] + M[i, 2] + ... + M[i, n-2], and define P[i, j] = M[i, j] / counter[i] for all pairs i, j. This yields a matrix of probabilities: P[i, j] is the probability of making a transition from state i to word j. Hence we obtain a matrix of probabilities that describes an order-two Markov model for the given text.

B. Watermark Generation and Embedding Algorithm

The watermark generation and embedding algorithm takes as input the original text document (TO) provided by the author; as a pre-processing step, capital letters are converted to small letters. A watermark pattern is generated as the output of this algorithm. This watermark is then stored in the watermark database along with the main properties of the original text document, such as document identity, author name, and the current date and time.

This stage involves three algorithms: pre-processing and building the Markov matrix, text analysis, and watermark generation and embedding, as shown in Figure 3.

Fig. 3. Watermark generation and embedding processes

1) Pre-processing and Building the Markov Matrix

This algorithm takes the original text document as input and produces the preprocessed text document and the Markov matrix as outputs.
Building the states-and-transitions matrix is the most basic part of text analysis and watermark generation using the Markov model. A Markov matrix representing the possible states and transitions in the given text is constructed without repetitions. In this approach, each unique sequence of two words within the given text is represented as a state (word set) and transition in the Markov matrix. While building the Markov matrix, the proposed algorithm initializes all transition values to zero; these cells are later used to keep track of the number of times each state is followed by each word within the given text document.

The preProcessing algorithm executes as follows:

PROCEDURE preProcessing(TO)
- Input: original text document (TO)
- Output: preprocessed text document (TP); state matrix of the given text without repeats, arrayList[Ts]
- BEGIN
- Loop index = 0 to Text.Length - 2
  o // convert letters from upper case to lower case
  o IF UpperCharacter(TO[index]) = True THEN TP[index] = LowerCharacter(TO[index]);
  o // list every unique sequence of two words in the given text as a state in the array list
  o exist = TP[index];
  o Loop j = 0 to index
  o IF arrayList[j] <> exist THEN
  o arrayList[index] = exist;
- index++;
- END

where TO is the original text document, TP is the processed text document, arrayList is the state array of the given text after preprocessing, and index is the current word in the given text.
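As a companion to the pseudocode above, the same two steps can be sketched in Python: lowercasing plus collecting the unique order-two states, followed by zero-initializing the transition matrix. The authors' implementation is in PHP, so the function names here are illustrative only:

```python
def pre_process(text):
    """Lowercase the text and collect each unique order-two state
    (consecutive word pair) in order of first appearance."""
    words = text.lower().split()
    states, seen = [], set()
    for i in range(len(words) - 2):  # pairs that act as present states
        pair = (words[i], words[i + 1])
        if pair not in seen:
            seen.add(pair)
            states.append(pair)
    return words, states

def build_markov_matrix(states):
    """Build the states-and-transitions matrix with every cell set to zero."""
    return {ps: {ns: 0 for ns in states} for ps in states}

words, states = pre_process(
    "The quick brown fox jumps over the brown fox who is slow "
    "jumps over the brown fox who is dead")
matrix = build_markov_matrix(states)
print(len(states))  # 11 unique states for the 20-word sample sentence
```

A dictionary keyed by word pairs is used instead of a dense (n-2)-by-(n-2) array, since most transition cells remain zero for natural text.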
The Build_Markov_Matrix algorithm executes as follows:

PROCEDURE Build_Markov_Matrix(TP)
- Input: preprocessed text (TP)
- Output: Markov matrix with all values initialized to zero
- BEGIN
- // perform the preprocessing step
- Call preProcessing(TP)
- // build the states-and-transitions matrix of the Markov model and initialize it to all zeros
- Loop ps = 0 to arrayList.Length - 2
  o Loop ns = 0 to arrayList.Length
    MarkovMatrix[ps][ns] = 0;
  o ns++;
- ps++;
- END

where TP is the preprocessed text document, MarkovMatrix is the states-and-transitions matrix with zero values in all cells, ps is the present state, and ns is the next state.

2) Algorithm of Text Analysis

This algorithm takes the preprocessed text document as input and produces the watermark patterns as output. After the Markov matrix has been constructed, text analysis is performed using the order-two, word-level Markov model by finding the interrelationships between the words of the given text document. In other words, the proposed algorithm computes the number of occurrences of the next-state transitions for every present state. The matrix of transition counts, which records the number of occurrences of each transition from one state to another, is constructed by equation (3):

MarkovMatrix[ps][ns] = Total Number of Transitions[i][j], for i, j = 1, 2, ..., n-2 .......... (3)

where n is the total number of states, i refers to PS (the present state), j refers to NS (the next state), and P[i, j] is the probability of making a transition from state i to word j.

Text analysis of the given sentence based on the word mechanism and order two is shown as a Markov chain and proceeds as illustrated in Figure 4.

Fig. 4.
Text analysis processes based on order 2 of a Markov model

Let TP be the preprocessed text, and let MarkovMatrix[ps][ns] be the Markov matrix storing the number of times each state is followed by each word in the given text. The text analysis algorithm is presented formally and executes as follows:

PROCEDURE text_analysis(TP)
- Input: preprocessed text (TP)
- Output: Markov matrix with transition-count values
- BEGIN
- // build the states-and-transitions matrix of the Markov model
- Call Build_Markov_Matrix(TP)
- // compute the total frequencies of transitions for every state
- Loop ps = 0 to arrayList.Length - 2
  o Loop ns = 1 to arrayList.Length
    Loop counter = 2 to TP.Length - 1
    • MarkovMatrix[ps][ns] = Total Number of Transitions[ps][ns]
    counter++;
  o ns++;
- ps++;
- END

where TP is the preprocessed text document and MarkovMatrix is the states-and-transitions matrix holding the transition counts for every state.

3) Algorithm of Watermark Generation and Embedding

After performing the text analysis and extracting the probability features, the watermark is obtained by identifying all the nonzero values in the Markov matrix above. These nonzero values are concatenated sequentially to generate a watermark pattern, denoted WMPO, as given by equation (4) and presented in Figure 5:

WMPO &= MarkovMatrix[ps][ns], for ps, ns = nonzero values in the Markov matrix .......... (4)

Fig. 5. The original watermark patterns for a given sample text

The embedding process is performed logically during text analysis by keeping track of all nonzero transitions and their values in the Markov matrix, where the cells of nonzero transitions contain the number of times each state is followed by each word within the given text document.
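The generation and digest steps just described can be sketched in Python (the authors used PHP; the state-ordering convention below, order of first appearance, is an assumption, since the paper specifies only sequential concatenation of the nonzero values):

```python
import hashlib
from collections import Counter, defaultdict

def generate_watermark(text):
    """Concatenate the nonzero order-two transition counts into the
    watermark pattern WMPO (eq. (4)) and digest it with MD5 (eq. (5))."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for i in range(len(words) - 2):
        counts[(words[i], words[i + 1])][words[i + 2]] += 1
    # States are traversed in order of first appearance; the paper only
    # specifies sequential concatenation, so this ordering is an assumption.
    pattern = "".join(str(c) for state in counts
                      for c in counts[state].values())
    return pattern, hashlib.md5(pattern.encode()).hexdigest()

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead")
wmpo, wmo = generate_watermark(sample)
print(wmpo)  # 13 nonzero counts; their digit sum equals the 18 transitions
print(wmo)   # 32-character hex MD5 digest of the pattern
```

Only the counts themselves are stored in the pattern; the positions of the nonzero cells are kept implicitly as the "tracks" used later by the detection algorithm.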
These tracks can later be used by the detection algorithm to match against the tracks produced from the attacked text document. The watermark is then stored in the watermark database along with some properties of the original text document, such as document identity, author name, and the current date and time. After the watermark is generated as sequential patterns, an MD5 message digest is computed to obtain a secure and compact form of the watermark, as given by equation (5) and presented in Figure 6:

DWM = MD5(WMPO) .......... (5)

Fig. 6. The original watermark after MD5 digesting

The proposed watermark generation and embedding algorithm, using the order-two, word-level Markov model, is presented formally and executes as follows:

PROCEDURE watermark_gen_embed(M[i,j])
- Input: Markov matrix M[i,j]
- Output: original watermark patterns (WMPO)
- BEGIN
- // compute the total frequencies of transitions for every state of the original document
- Call text_analysis(TP)
- // concatenate the watermark patterns of every state in the Markov matrix
- Loop ps = 0 to MarkovMatrix[rows].Length - 2
  o Loop ns = 1 to MarkovMatrix[columns].Length
  o IF MarkovMatrix[ps][ns] != 0 // states that have nonzero transitions
  o WMPO &= MarkovMatrix[ps][ns]
  o ns++;
- ps++;
- Store WMPO in the DWM database.
- // digest the original watermark using the MD5 algorithm
- WMO = MD5(WMPO)
- Output WMPO, WMO
- END

where WMO is the original watermark, WMPO is the original watermark patterns, and MD5 is the hash algorithm.

C. Algorithms of Watermark Extraction and Detection

The watermark detection algorithm is based on zero-watermarking, so before detection on an attacked text document TA, the proposed algorithm must first generate the attacked watermark patterns (WMPA).
Once the attacked watermark patterns are obtained, the pattern matching rate and the watermark distortion are calculated in order to decide on tampering detection and content authentication. This stage includes two main processes: watermark extraction and detection. The detection algorithm extracts the watermark from the received attacked text document and matches it against the original watermark. The proposed watermark extraction algorithm takes the attacked text document and performs the same steps as the watermark generation algorithm to obtain the watermark pattern of the attacked text document, as shown in Figure 7.

Fig. 7. Watermark extraction and detection processes

1) Watermark Extraction Algorithm

In this algorithm, the proposed approach takes the attacked text document (TA) and the original watermark patterns or original text document as inputs; the procedure is similar to that of watermark generation. The output of this algorithm is the attacked watermark patterns (WMPA).

The watermark extraction algorithm executes as follows:

PROCEDURE watermark_extraction(TA)
- Input: attacked text document (TA)
- Output: attacked watermark patterns (WMPA)
- BEGIN
- // perform the preprocessing step on the attacked text document
- Call preProcessing(TA)
- // compute the total frequencies of transitions for every state of the attacked document
- Call text_analysis(TAp)
- // generate the attacked watermark patterns from the attacked text document
- Loop ps = 0 to MarkovMatrix'[rows].Length - 2
  o Loop ns = 0 to MarkovMatrix'[columns].Length
  o IF MarkovMatrix'[ps][ns] != 0
  o WMPA &= MarkovMatrix'[ps][ns]
  o ns++;
- ps++;
- Output WMPA
- END

where WMPA is the attacked watermark patterns, TA is the attacked text document, TAp is the preprocessed attacked text document, and MarkovMatrix'[ps][ns] is the Markov matrix of the attacked text document.

2) Watermark Detection Algorithm

After extracting the attacked watermark pattern, watermark detection is performed in three steps:

• Primary matching is performed on the whole watermark patterns of the original document (WMPO) and the attacked document (WMPA). If the two patterns are identical, the text document is declared authentic, with no tampering. If primary matching fails, the text document is declared not authentic and tampered, and we proceed to the next step.

• Secondary matching is performed by comparing the components associated with each state of the overall pattern: the extracted watermark pattern of each state is compared with the equivalent transition of the original watermark pattern. This process is described by equations (6) and (7):

PMRT[i][j] = min(WMPO[i][j], WMPA[i][j]) / max(WMPO[i][j], WMPA[i][j]), (0 < PMRT <= 1) .......... (6)

where PMRT is the pattern matching rate at the transition level; i and j are the indexes of states and transitions, respectively (i = 0 .. number of nonzero states in the given text, j = 0 .. number of nonzero transitions in the given text); WMPO[i][j] is the value of the original watermark at the transition level; and WMPA[i][j] is the value of the attacked watermark at the transition level.
PMRS[i] = (PMRT[i][1] + PMRT[i][2] + ... + PMRT[i][n]) / n, (0 < PMRS <= 1) .......... (7)

where n is the number of nonzero transitions of the state represented in the Markov matrix, i is the index of the state among the nonzero patterns represented in the Markov matrix, and PMRS[i] is the pattern matching rate at the state level.

After obtaining the pattern matching rate of every state, we find the weight of each state among all states in the Markov matrix, given by equation (8):

Sw[i] = PMRS[i] / N .......... (8)

where PMRS[i] is the total pattern matching rate of state i and N is the number of states of the given text document.

Finally, the PMR, which represents the pattern matching rate between the original and attacked text documents, is calculated by equation (9):

PMR = Sw[1] + Sw[2] + ... + Sw[N] .......... (9)

where N is the total number of states in the Markov matrix.

The watermark distortion rate (WDR) reflects the amount of tampering inflicted by attacks on the contents of the attacked text document and is given by equation (10):

WDR = 1 - PMR .......... (10)

This process is illustrated in figure 8.

Fig.
8: Watermark extraction process based on order 2 of a Markov model

The watermark detection algorithm executes as follows:

PROCEDURE watermark_detection(TP, TP')
- Input: preprocessed texts (TP, TP')
- Output: PMR, WDR
- BEGIN
- // obtain the watermark of the original document
- Call watermark_gen_embed(MarkovMatrix[ps][ns])
- // extract the watermark from the attacked document
- Call watermark_extraction(MarkovMatrix'[ps][ns])
- // perform the matching process between the original and attacked watermark patterns
- IF WMA = WMO
  o Print "Document is authentic and no tampering occurred"
  o PMR = 1
- ELSE Print "Document is not authentic and tampering occurred"
  o // compute the pattern matching rate at the transition level (equation (6))
- Loop i = 0 to MarkovMatrix'[rows].Length - 2
  o Loop j = 0 to MarkovMatrix'[columns].Length
  o IF WMPO[i][j] != 0
    patternCount += 1
    transPMRTotal += PMRT[i][j]
  o ELSE IF WMPA[i][j] != 0
    patternCount += 1
- // compute the pattern matching rate at the state level (equation (7))
- PMRS[i] = transPMRTotal / patternCount
- // compute the state weight and accumulate it over all states (equations (8) and (9))
- stateWeight = PMRS[i] / N
- PMR += stateWeight
- // compute the watermark distortion rate at the document level (equation (10))
- WDR = 1 - PMR
- END

where SW is the weight of the states correctly matched and WDR is the watermark distortion rate (0 < WDR <= 1).

IV. EXPERIMENTAL SETUP, RESULTS AND DISCUSSION

A. Experimental Setup

In order to test the proposed approach and compare it with the other approach, we conducted a series of simulation experiments. The experimental environment was as follows: CPU: Intel Core i5 M480 / 2.67 GHz; RAM: 8.0 GB; Windows 7; programming language: PHP, NetBeans IDE 7.0.
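For the simulations, the detection metrics of equations (6)-(10) can be sketched in Python. This is a hedged reading, not the authors' PHP implementation: it assumes the transition-level rate is the min/max ratio of corresponding counts and that each state contributes a uniform weight, consistent with the bounds stated for PMRT, PMRS, and WDR:

```python
def detection_metrics(wm_original, wm_attacked):
    """PMR and WDR between two watermark structures, each mapping a
    state (word pair) to {next_word: count}. Transition rate = min/max
    ratio of counts (eq. (6)), state rate = mean of its transition
    rates (eq. (7)), PMR = mean state rate (eqs. (8)-(9))."""
    states = set(wm_original) | set(wm_attacked)
    if not states:
        return 1.0, 0.0
    total = 0.0
    for state in states:
        orig = wm_original.get(state, {})
        att = wm_attacked.get(state, {})
        rates = []
        for t in set(orig) | set(att):
            a, b = orig.get(t, 0), att.get(t, 0)
            rates.append(min(a, b) / max(a, b))  # 0 when a transition is missing
        total += sum(rates) / len(rates)         # state-level matching rate
    pmr = total / len(states)  # document-level pattern matching rate
    wdr = 1.0 - pmr            # watermark distortion rate, eq. (10)
    return pmr, wdr
```

Identical watermarks give PMR = 1 and WDR = 0; a completely disjoint pair of watermarks gives PMR = 0 and WDR = 1, matching the intended interpretation of the distortion rate.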
With regard to the datasets used, six samples were taken from the datasets designed in [24]. These samples were categorized into three classes according to their size, namely Small Size Text (SST), Medium Size Text (MST), and Large Size Text (LST). Next, we define the types of attacks and their percentages as follows: insertion, deletion and reorder attacks, performed randomly at multiple locations of these datasets. The details of the dataset volumes and attack percentages used are shown in table I; they are similar to those performed in [25] for comparison purposes, and it should be mentioned that we also perform the reorder attack on the datasets, which is not contained in that paper.

TABLE I
ORIGINAL AND ATTACKED TEXT SAMPLES WITH INSERTION AND DELETION PERCENTAGE

Sample Text ID | Original Text Word Count | Insertion          | Deletion           | Reorder
[SST4]         | 179                      | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%
[SST2]         | 421                      | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%
[MST5]         | 469                      | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%
[MST2]         | 559                      | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%
[LST4]         | 2018                     | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%  | 5%, 10%, 20%, 50%

To measure the performance of our approach and compare it with others, the tampering accuracy, which is a measure of the watermark robustness, will be used. The PMR value gives the tampering accuracy of the given text document. The watermark distortion rate (WDR) is also measured and compared with other approaches. The values of both PMR and WDR range between 0 and 1. A larger PMR value, and correspondingly a lower WDR value, means more robustness, while a lower PMR value and a larger WDR value mean less robustness. The desirable values are a PMR close to 1 and a WDR close to 0. We categorize tamper detection states into three classes based on PMR threshold values: High when the PMR value is greater than 0.70, Mid when the PMR value is between 0.40 and 0.70, and Low when the PMR value is less than 0.40.
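The randomized attacks of table I can be simulated with a short sketch like the following (the function name, the word-level granularity, and the use of Python's random module are our assumptions for illustration; the paper does not publish its attack generator):

```python
import random

def attack(words, rate, kind, rng=random):
    """Apply an insertion, deletion, or reorder attack to a word list
    at the given rate (e.g. 0.05 for the 5% scenario)."""
    words = list(words)
    n = max(1, int(len(words) * rate))       # number of affected words
    if kind == "insertion":
        for _ in range(n):                   # insert copies at random positions
            pos = rng.randrange(len(words) + 1)
            words.insert(pos, rng.choice(words))
    elif kind == "deletion":
        for _ in range(min(n, len(words) - 1)):
            words.pop(rng.randrange(len(words)))
    elif kind == "reorder":                  # swap random word pairs
        for _ in range(n):
            i, j = rng.randrange(len(words)), rng.randrange(len(words))
            words[i], words[j] = words[j], words[i]
    return words
```

Insertion grows the document by roughly rate * length words, deletion shrinks it, and reorder preserves the word multiset while disturbing transition patterns, which is why the three attacks stress the Markov matrix differently.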
To evaluate the accuracy of the proposed approach, a series of experiments was conducted with all the well-known attacks, such as random insertion, deletion and reorder of words and sentences, on each sample of the datasets. These various kinds of attacks were applied at multiple locations in the datasets. The experiments were conducted first with individual attacks, then with all attacks at the same time, and comparative results of the proposed approach against a recent similar approach were obtained.

B. Experiments with the proposed approach

In this section, we evaluate the performance of the proposed approach. The character set of words covers all English letters, spaces, numbers, and special symbols. The experiments were conducted with the various kinds of attacks individually at rates of 5%, 10%, 20% and 50% respectively. The performance results of this approach under all the mentioned attacks are presented in tabular form in table II and graphically in figures 9, 10, 11 and 12 for the 5%, 10%, 20% and 50% attack scenarios respectively. These results are discussed below.
TABLE II
EXTRACTED WATERMARK MATCHING AND DISTORTION PERCENTAGE UNDER VARIOUS INDIVIDUAL ATTACKS

Sample (words) | Rate | Insertion PMR/WDR | Deletion PMR/WDR  | Reorder PMR/WDR
[SST4] (179)   | 5%   | 0.9409 / 0.0591   | 0.8936 / 0.1064   | 0.7354 / 0.2646
               | 10%  | 0.8929 / 0.1071   | 0.8506 / 0.1494   | 0.7825 / 0.2175
               | 20%  | 0.6930 / 0.3070   | 0.8652 / 0.1348   | 0.3630 / 0.6370
               | 50%  | 0.6386 / 0.3614   | 0.7576 / 0.2424   | 0.3008 / 0.6992
[SST2] (421)   | 5%   | 0.9246 / 0.0754   | 0.9448 / 0.0552   | 0.8835 / 0.1165
               | 10%  | 0.9052 / 0.0948   | 0.7423 / 0.2577   | 0.7412 / 0.2588
               | 20%  | 0.8182 / 0.1818   | 0.8083 / 0.1917   | 0.7535 / 0.2465
               | 50%  | 0.6622 / 0.3378   | 0.9144 / 0.0856   | 0.7624 / 0.2376
[MST5] (469)   | 5%   | 0.9473 / 0.0527   | 0.9854 / 0.0146   | 0.8589 / 0.1411
               | 10%  | 0.9068 / 0.0932   | 0.9553 / 0.0447   | 0.7484 / 0.2516
               | 20%  | 0.8233 / 0.1767   | 0.9475 / 0.0525   | 0.5715 / 0.4285
               | 50%  | 0.6428 / 0.3572   | 0.5480 / 0.4520   | 0.2619 / 0.7381
[MST2] (559)   | 5%   | 0.9463 / 0.0537   | 0.9565 / 0.0435   | 0.8916 / 0.1084
               | 10%  | 0.9006 / 0.0994   | 0.8230 / 0.1770   | 0.7544 / 0.2456
               | 20%  | 0.8282 / 0.1718   | 0.8269 / 0.1731   | 0.5697 / 0.4303
               | 50%  | 0.6576 / 0.3424   | 0.3258 / 0.6742   | 0.0493 / 0.9507
[LST4] (2018)  | 5%   | 0.0102 / 0.9898   | 0.9852 / 0.0148   | 0.8697 / 0.1303
               | 10%  | 0.0095 / 0.9905   | 0.9790 / 0.0210   | 0.0577 / 0.9423
               | 20%  | 0.0106 / 0.9894   | 0.0065 / 0.9935   | 0.0676 / 0.9324
               | 50%  | 0.0066 / 0.9934   | 0.0080 / 0.9920   | 0.0502 / 0.9498

Results of various attacks under the 5% scenario

The results show the PMR accuracy of the proposed algorithm, as applied on different datasets, under a 5% rate of insertion, deletion and reorder attacks. The PMR is more than 70% for all kinds of attacks except under the insertion attack with the large text document (LST4), as shown in figure 9. It can also be observed that the PMR is worst under the reorder attack and best under the deletion attack, for which the PMR still maintains a value close to or greater than 90% in all cases.
Fig. 9 PMR accuracy under 5% scenarios of various attacks

Results of various attacks under the 10% scenario

As applied on different datasets under a 10% rate of insertion, deletion and reorder attacks, as shown in figure 10, the PMR value is best under deletion attacks, for which the PMR still maintains a value greater than 70% for all datasets. However, under the insertion attack, the PMR still maintains values close to 90% for all datasets except the LST4 dataset, which indicates that the proposed approach is not applicable under 10% insertion attacks on large text documents. Finally, in the case of the reorder attack, the PMR value increases with small text documents and decreases with large documents.

Fig. 10 PMR accuracy under 10% scenarios of various attacks

Results of various attacks under the 20% scenario

Figure 11 shows experimental results as applied on different datasets under a 20% rate of insertion, deletion and reorder attacks. As shown in figure 11, the PMR accuracy is good with small and medium text documents, but bad with the large text document under all kinds of attacks, as shown with the LST4 dataset. This indicates that the proposed approach is not applicable to large documents under 20% rates of the various kinds of attacks.

Fig.
11 PMR accuracy under 20% scenarios of various attacks

Results of various attacks under the 50% scenario

As applied on different datasets under a 50% rate of insertion, deletion and reorder attacks, as shown in figure 12, the PMR accuracy is higher for small text documents, decreases for medium text documents, and is very bad for large text documents, for which the values are close to zero in all scenarios. As also shown in figure 12, the PMR still maintains a value greater than 60% for the small and medium datasets under the insertion attack.

Fig. 12 PMR accuracy under 50% scenarios of various attacks

C. Comparative Results

In order to compare the performance of the proposed approach, named here WO2, we consider the recently published text watermarking approach presented in [27] by F. Al-Wesabi et al., named here WO1. WO1 has the same environment and parameters as the proposed approach. Both the WO2 and WO1 approaches depend on the word mechanism of the Markov model. However, the core difference between them is the model order: the WO1 approach is based on order one of the Markov model, while the proposed approach (WO2) is based on order two. In these experiments, random multiple insertion, deletion and reorder attacks were performed individually on each sample of the datasets with the various attack rates shown above in table I. The ratios of successfully detected watermarks of the proposed algorithm as compared with reference [27] (WO1) are shown in table III and graphically represented in figures 13, 14, 15 and 16.
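The order-one versus order-two distinction comes down to the state definition: WO1 keys each transition on a single word, while WO2 keys it on a pair of consecutive words. A minimal sketch of building such word-level transition patterns (our own illustration in Python, not the authors' PHP code):

```python
from collections import defaultdict

def markov_patterns(text, order=2):
    """Build word-level Markov transition counts: each state is `order`
    consecutive words; the pattern counts which word follows that state."""
    words = text.split()
    matrix = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - order):
        state = " ".join(words[i:i + order])
        matrix[state][words[i + order]] += 1
    return matrix

# order one collapses repeated contexts: "to" is followed by "be" twice here,
# while order two keeps "to be" and "not to" as distinct, sparser states.
m1 = markov_patterns("to be or not to be", order=1)
m2 = markov_patterns("to be or not to be", order=2)
```

The sparser, more distinctive order-two states are what the paper leverages for finer tamper localization, at the cost of a larger matrix.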
TABLE III
COMPARATIVE PERFORMANCE ACCURACY OF THE PROPOSED ALGORITHM WITH WO1 UNDER INDIVIDUAL ATTACKS

Successfully detected watermark, Reference 27 (WO1) vs. the proposed approach (WO2):

Sample (words) | Rate | WO1 Ins | WO1 Del | WO1 Reo | WO2 Ins | WO2 Del | WO2 Reo
[SST4] (179)   | 5%   | 94.59   | 86.02   | 81.85   | 94.09   | 89.36   | 73.54
               | 10%  | 89.53   | 86.39   | 81.65   | 89.29   | 85.06   | 78.25
               | 20%  | 67.47   | 88.02   | 45.91   | 69.30   | 86.52   | 36.30
               | 50%  | 64.21   | 71.56   | 42.68   | 63.86   | 75.76   | 30.08
[SST2] (421)   | 5%   | 91.91   | 95.19   | 72.05   | 92.46   | 94.48   | 88.35
               | 10%  | 90.34   | 69.49   | 78.69   | 90.52   | 74.23   | 74.12
               | 20%  | 81.42   | 73.95   | 81.46   | 81.82   | 80.83   | 75.35
               | 50%  | 65.54   | 75.13   | 82.59   | 66.22   | 91.44   | 76.24
[MST5] (469)   | 5%   | 94.64   | 97.33   | 88.21   | 94.73   | 98.54   | 85.89
               | 10%  | 90.60   | 93.19   | 78.85   | 90.68   | 95.53   | 74.84
               | 20%  | 80.93   | 89.18   | 63.20   | 82.33   | 94.75   | 57.15
               | 50%  | 63.03   | 40.96   | 0.70    | 64.28   | 54.80   | 26.19
[MST2] (559)   | 5%   | 94.80   | 93.69   | 90.02   | 94.63   | 95.65   | 89.16
               | 10%  | 89.99   | 78.71   | 80.13   | 90.06   | 82.30   | 75.44
               | 20%  | 82.43   | 73.97   | 65.60   | 82.82   | 82.69   | 56.97
               | 50%  | 66.08   | 27.86   | 7.89    | 65.76   | 32.58   | 4.93
[LST4] (2018)  | 5%   | 94.76   | 96.62   | 88.61   | 1.02    | 98.52   | 86.97
               | 10%  | 89.67   | 93.02   | 60.80   | 0.95    | 97.90   | 5.77
               | 20%  | 4.09    | 1.20    | 9.09    | 1.06    | 0.65    | 6.76
               | 50%  | 1.07    | 1.54    | 7.37    | 0.66    | 0.80    | 5.02

Comparative Results for Individual Datasets

In order to evaluate the accuracy of the proposed approach, we compare its experimental results with the WO1 approach under various scenarios of insertion, deletion and reorder attacks. To perform this comparison, we chose three classes of experimental datasets: SST4 as a small dataset, MST5 as a medium dataset, and LST4 as a large dataset.

Comparative Results under Various Scenarios for Individual Datasets

As shown in figure 13, under a 5% volume of the various attacks, the pattern matching rate (PMR) of WO2 is better than the PMR of WO1 in terms of deletion attacks for all datasets.
However, in terms of insertion and reorder attacks, the PMR of WO1 is better than the PMR of WO2 for all datasets except the MST5 dataset under the insertion attack. This means the proposed approach provides added value for all sizes of text documents under deletion attacks.

Fig. 13 Comparison results between (WO1) and (WO2) under 5% of various attacks

As shown in figure 14, the performance of the WO2 approach is better than WO1 on medium text documents under insertion and deletion attacks. On the other hand, WO1 is better than WO2 under reorder attacks for all datasets, and under the insertion attack on large text documents such as LST4. This means the proposed approach is robust against deletion attacks for all sizes of text documents, recommended for small and medium text documents under this range of insertion attacks, and not applicable under insertion and reorder attacks for large text documents.

Fig.
14 Comparison results between (WO1) and (WO2) under 10% of various attacks

As shown in figure 15, for the 20% scenario of the various attacks, the robustness of the proposed approach (WO2) improves for small and medium text documents as the volume of the insertion and deletion attacks increases. On the other hand, the comparative results also show that the robustness of the WO2 approach is worse than the WO1 approach under this rate of reorder attack for all datasets. This means that applying the proposed approach (WO2) under attack volumes of 20% and less is applicable for small and medium text documents but not recommended for large text documents.

Fig. 15 Comparison results between (WO1) and (WO2) under 20% of various attacks

As shown in figure 16, compared with the previously discussed scenarios of the various attacks, the robustness of the proposed approach is still better than the WO1 approach under this rate (50%), especially under insertion and deletion attacks for all datasets, and the robustness value decreases for large text documents.
In other words, the proposed approach provides added value in terms of robustness on small and medium text documents, especially under insertion and deletion attacks.

Fig. 16 Comparison results between (WO1) and (WO2) under 50% of various attacks

Comparative Results under Various Scenarios for All Datasets

Figure 17 shows the performance of the two approaches as applied under 5% of the various kinds of attacks on different datasets. As shown, for all datasets, the proposed approach WO2 performs better under insertion and deletion attacks. However, WO1 performs better than the WO2 approach under reorder attacks, which shows in general that the proposed approach is recommended under low volumes of all tampering attacks for all sizes of text documents.

Fig. 17 Comparison results between (WO1) and (WO2) under 5% of various attacks for all datasets

Figure 18 illustrates the comparative results under a 10% rate of the various attacks. As shown, for all datasets, the WO1 and WO2 approaches are close together under insertion and deletion attacks, except on the large dataset (LST4), where WO1 is better under insertion attacks. Also, under the reorder attack, the comparison results show that the WO1 approach is better than WO2 on all datasets.
Fig. 18 Comparison results between (WO1) and (WO2) under 10% of various attacks for all datasets

As applied under 20% of the different attacks for all datasets, we can say the performance of the two approaches is the same under insertion and deletion attacks, as shown in figure 19. However, the performance of the WO1 approach is better than WO2 under reorder attacks.

Fig. 19 Comparison results between (WO1) and (WO2) under 20% of various attacks for all datasets

Figure 20 illustrates the comparative results under a high rate (50%) of the various attacks. As shown, for all datasets, the proposed approach WO2 has the best performance and provides added value under insertion and deletion attacks, while it is not effective under reorder attacks.

Fig. 20 Comparison results between (WO1) and (WO2) under 50% of various attacks for all datasets

Comparative Results of PMR Standard Deviation for Individual Datasets

In order to evaluate the performance of the proposed approach (WO2), we find the PMR deviation between the WO1 and WO2 approaches (PMR of WO2 - PMR of WO1) for all scenarios of each attack applied on each dataset, as shown in Table IV.

TABLE
IV STANDARD DEVIATION OF ALL SCENARIOS FOR ALL DATASETS UNDER VARIOUS ATTACKS

PMR, Reference 27 (WO1) vs. this approach (WO2):

Dataset | WO1 Ins | WO1 Del | WO1 Reo | WO2 Ins | WO2 Del | WO2 Reo
SST4    | 78.95   | 83.00   | 99.44   | 79.14   | 84.18   | 54.54
SST2    | 82.30   | 78.44   | 99.52   | 82.76   | 85.25   | 78.52
MST5    | 82.30   | 80.17   | 87.53   | 83.01   | 85.91   | 61.02
MST2    | 83.33   | 68.56   | 98.19   | 83.32   | 73.31   | 56.63
LST4    | 47.40   | 48.10   | 99.23   | 0.92    | 49.47   | 26.13

The averages of the standard deviation of all scenarios for the small dataset (SST4), the medium dataset (MST5), and the large dataset (LST4) are shown in figure 21. As shown, in the case of the SST4 dataset, the proposed approach WO2 is observed to be the best under insertion and deletion attacks. On the other side, the WO1 approach is the best under the reorder tampering attack, for which the difference of the standard deviation average from the proposed approach WO2 is (-44.9); this means that the WO1 approach is recommended for detecting reorder attacks, while the performance is improved by the proposed approach WO2 under insertion and deletion attacks. In the case of the MST5 dataset, the performance of WO2 has improved under insertion and deletion attacks, especially under deletion attacks, with a difference of the standard deviation average from the WO1 approach of approximately (5.74); we also observed that the PMR of the proposed approach has improved under reorder attacks with the medium text document (MST5) compared with the small text document (SST4), but the WO1 approach is still the best under reorder tampering attacks. Finally, in the case of the large dataset LST4, the comparative results show that the PMR standard deviation of the WO2 approach is still the best under deletion attacks, and decreases under insertion and reorder attacks.
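The differences quoted above (such as the -44.9 gap for SST4 under the reorder attack) are simple WO2 minus WO1 differences of the Table IV values; a sketch, using the SST4 row as an example:

```python
# Per-attack PMR values for the SST4 row of Table IV (the paper labels
# these "standard deviation" values; we use them as given).
wo1 = {"insertion": 78.95, "deletion": 83.00, "reorder": 99.44}
wo2 = {"insertion": 79.14, "deletion": 84.18, "reorder": 54.54}

# Difference (WO2 PMR - WO1 PMR): negative means WO1 performed better.
diff = {a: round(wo2[a] - wo1[a], 2) for a in wo1}
# diff["reorder"] == -44.9, matching the figure quoted in the text.
```

The same computation over the other rows yields the per-dataset comparisons discussed for MST5 and LST4.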
Fig. 21 PMR standard deviation of all scenarios for the SST4, MST5 and LST4 datasets under various attacks

Comparative Results of PMR Standard Deviation for All Datasets

As shown in figure 22, the average of the standard deviation of all scenarios for all datasets shows that the proposed approach WO2 has a positive difference from the WO1 approach (WO2 PMR - WO1 PMR) in terms of the deletion attack, equal to (3.97), and has negative differences under the insertion (-9.03) and reorder (-41.42) attacks. Thus, WO2 provides added value and is recommended under deletion attacks, and it is not recommended for insertion and reorder attacks.

Fig. 22 PMR standard deviation of all scenarios for all datasets under various attacks

V. CONCLUSION

Based on the word mechanism of the Markov model of order two, the authors have designed a text zero-watermark approach based on text analysis. The algorithm uses text features as probabilistic patterns of states and transitions in order to generate and detect the watermark. The proposed approach is implemented using the PHP programming language. The experimental results show that the proposed approach is sensitive to all kinds of random tampering attacks and has good tampering detection accuracy.
Compared with the recent previous watermark approach named WO1, presented in reference [27], under random insertion, deletion and reorder attacks at multiple locations of five variable-size text datasets, the comparative results show that the watermark complexity is increased with the proposed approach, and that it is not effective under reorder attacks. However, the tampering detection accuracy of the proposed approach is improved under all rates of deletion attacks for all sizes of text documents, and its accuracy is close to that of the WO1 approach under insertion attacks. This means that the proposed approach provides added value and is recommended in these cases, but it is not robust against reorder attacks, especially for large text documents.

REFERENCES

[1] Z. Jalil, A. Hamza, S. Shahid, M. Arif, A. Mirza, A Zero Text Watermarking Algorithm based on Non-Vowel ASCII Characters. International Conference on Educational and Information Technology (ICET 2010), IEEE.
[2] M. A. Suhail, Digital Watermarking for Protection of Intellectual Property. Book published by the University of Bradford, UK, 2008.
[3] L. Robert, A Study on Digital Watermarking Techniques. International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp. 223-225, 2009.
[4] X. Zhou, S. Wang, S. Xiong, Security Theory and Attack Analysis for Text Watermarking. International Conference on E-Business and Information System Security, IEEE, pp. 1-6, 2009.
[5] J. T. Brassil, S. Low, and N. F. Maxemchuk, Copyright Protection for the Electronic Distribution of Text Documents. Proceedings of the IEEE, vol. 87, no. 7, July 1999, pp. 1181-1196.
[6] M. Atallah, V. Raskin, M. C. Crogan, C. F. Hempelmann, F. Kerschbaum, D. Mohamed, and S. Naik, Natural language watermarking: Design, analysis, and implementation.
Proceedings of the Fourth Information Hiding Workshop, vol. LNCS 2137, 25-27, 2001.
[7] N. F. Maxemchuk and S. Low, Marking Text Documents. Proceedings of the IEEE International Conference on Image Processing, Washington, DC, Oct. 26-29, 1997, pp. 13-16.
[8] D. Huang, H. Yan, Interword distance changes represented by sine waves for watermarking text images. IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1237-1245, 2001.
[9] N. Maxemchuk, S. Low, Performance Comparison of Two Text Marking Methods. IEEE Journal on Selected Areas in Communications (JSAC), vol. 16, no. 4, pp. 561-572, 1998.
[10] S. Low, N. Maxemchuk, Capacity of Text Marking Channel. IEEE Signal Processing Letters, vol. 7, no. 12, pp. 345-347, 2000.
[11] M. Kim, Text Watermarking by Syntactic Analysis. 12th WSEAS International Conference on Computers, Heraklion, Greece, 2008.
[12] H. Meral, B. Sankur, A. Sumru, T. Güngör, E. Sevinç, Natural language watermarking via morphosyntactic alterations. Computer Speech and Language, 23, pp. 107-125, 2009.
[13] Z. Jalil, A. Mirza, A Review of Digital Watermarking Techniques for Text Documents. International Conference on Information and Multimedia Technology, pp. 230-234, IEEE, 2009.
[14] M. Atallah, C. McDonough, S. Nirenburg, V. Raskin, Natural Language Processing for Information Assurance and Security: An Overview and Implementations. Proceedings of the 9th ACM/SIGSAC New Security Paradigms Workshop, pp. 51-65, 2000.
[15] H. Meral, E. Sevinc, E. Unkar, B. Sankur, A. Ozsoy, T. Gungor, Syntactic tools for text watermarking. In Proc. of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, pp. 65050X-65050X-12, 2007.
[16] O. Vybornova, B. Macq, Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis. IEEE International Conference on Information Reuse and Integration, IEEE, 2007.
[17] M. Atallah, V. Raskin, C. Hempelmann, et al., Natural language watermarking and tamperproofing.
Proceedings of the 5th International Information Hiding Workshop, Noordwijkerhout, Netherlands, pp. 196-212, 2002.
[18] U. Topkara, M. Topkara, M. J. Atallah, The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions. In Proceedings of the ACM Multimedia and Security Conference, Geneva, 2006.
[19] Z. Jalil, A. Mirza, H. Jabeen, Word Length Based Zero-Watermarking Algorithm for Tamper Detection in Text Documents. 2nd International Conference on Computer Engineering and Technology, pp. 378-382, IEEE, 2010.
[20] Z. Jalil, A. Mirza, M. Sabir, Content based Zero-Watermarking Algorithm for Authentication of Text Documents. (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 2, 2010.
[21] Z. Jalil, A. Mirza, T. Iqbal, A Zero-Watermarking Algorithm for Text Documents based on Structural Components. pp. 1-5, IEEE, 2010.
[22] M. Yingjie, G. Liming, W. Xianlong, G. Tao, Chinese Text Zero-Watermark Based on Space Model. In Proceedings of the 3rd International Workshop on Intelligent Systems and Applications, pp. 1-5, IEEE, 2011.
[23] S. Ranganathan, A. Johnsha, K. Kathirvel, M. Kumar, Combined Text Watermarking. International Journal of Computer Science and Information Technologies, Vol. 1 (5), pp. 414-416, 2010.
[24] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, "A Zero Text Watermarking Algorithm based on the Probabilistic weights for Content Authentication of Text Documents", in Proc. of the International Journal of Computer Applications (IJCA), U.S.A., pp. 388-393, 2012.
[25] Fahd N. Al-Wesabi, Adnan Z. Alsakaf and Kulkarni U.
Vasantrao, "A Zero Text Watermarking Algorithm Based on the Probabilistic Patterns for Content Authentication of Text Documents", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, 2013, pp. 284-300, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[26] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, "English Text Zero-Watermark Based on Markov Model of Letter Level Order Two", Inderscience, International Journal of Applied Cryptography (IJACT), submitted.
[27] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, "Content Authentication of English Text Documents Using Word Mechanism Order ONE of Markov Model and Zero-Watermarking Techniques", Elsevier, International Journal of Applied Soft Computing, submitted.