INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976 – 6367 (Print)
ISSN 0976 – 6375 (Online)
Volume 4, Issue 3, May – June (2013), pp. 260-290
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com

     HYBRID ZERO-WATERMARKING AND MARKOV MODEL OF
     WORD MECHANISM AND ORDER TWO ALGORITHM FOR
    CONTENT AUTHENTICATION OF ENGLISH TEXT DOCUMENTS

              Kulkarni U. Vasantrao1, Fahd N. Al-Wesabi2, Adnan Z. Alsakaf3
   1
     Professor, Department of Comp. Sci. and Engg., SGGS Institute of Engg. and Tech.,
                                      Maharashtra, INDIA.
   2
     PhD Candidate, Faculty of Engineering, SRTM University, Nanded, INDIA;
     Assistant Teacher, Department of IT, Faculty of Computing and IT, UST, Sana'a, Yemen.
   3
     Professor, Department of IS, Faculty of Computing and IT, UST, Sana'a, Yemen.


  ABSTRACT

          Content authentication and tamper detection of digital text documents have become a
  major concern in the era of communication and information exchange via the Internet. Very
  few techniques are available for content authentication of text documents using digital
  watermarking.
          An English text zero-watermark approach based on the word mechanism and order two
  of the Markov model is developed in this paper for content authentication and tamper detection
  of text documents. In the proposed approach, the Markov model is used as a soft computing
  tool for text analysis and is hybridized with digital watermarking techniques in order to
  improve the accuracy and complexity of the previous watermark technique presented in
  reference [27].
          The proposed approach is implemented using the PHP programming language.
  Furthermore, the effectiveness and feasibility of the proposed approach are demonstrated with
  experiments using six datasets of varying lengths. The experimental results show that the
  proposed approach is sensitive to all kinds of tampering attacks and has good tampering
  detection accuracy. The accuracy of tampering detection is compared with that of other recent
  approaches under random insertion, deletion, and reorder attacks at multiple random locations
  in the experimental datasets. The comparative results show that the proposed approach is better
  than the WO1 approach in terms of watermark complexity, capacity, and accuracy of tampering
  detection under insertion and deletion attacks, which means the proposed approach is
  recommended in these cases; however, it is not applicable under reorder tampering attacks,
  especially on large text documents.



Keywords: Digital watermarking, Markov Model, order two, word mechanism, probabilistic
patterns, information hiding, content authentication, tamper detection, copyright protection.
I. INTRODUCTION
         With the increasing use of the Internet, e-commerce, and other efficient communication
technologies, copyright protection and authentication of digital contents have gained great
importance. Most of these digital contents are in text form, such as email, websites, chats,
e-commerce, eBooks, news, and short messaging services (SMS) [1].
         These text documents may be tampered with by malicious attackers, and the modified
data can lead to fatally wrong decisions and transaction disputes [2].
         Content authentication and tamper detection of digital images, audio, and video have
been of great interest to researchers. Recently, copyright protection, content authentication,
and tamper detection of text documents have also attracted the interest of researchers.
Moreover, during the last decade, research on text watermarking schemes focused mainly on
copyright protection, giving less attention to content authentication, integrity verification, and
tamper detection [4].
         Various techniques have been proposed for copyright protection, authentication, and
tamper detection of digital text documents. Digital Watermarking (DWM) techniques are
considered the most powerful solutions to most of these problems. Digital watermarking is a
technology in which various kinds of information, such as an image, plain text, audio, video,
or a combination of these, can be embedded as a watermark in digital content for applications
such as copyright protection, owner identification, content authentication, tamper detection,
access control, and many others [2].
         Traditional text watermarking techniques such as format-based, content-based, and
image-based approaches require applying some transformations or modifications to the
contents of a text document in order to embed watermark information within the text. A new
technique, called zero-watermarking, has been proposed for text documents. The main idea
of zero-watermarking techniques is that they do not change the contents of the original text
document, but utilize the contents of the text itself to generate the watermark information [13].
         In this paper, the authors present a new zero-watermarking technique for digital text
documents. This technique utilizes the probabilistic nature of natural languages, mainly the
word-level, second-order Markov model.
         The paper is organized as follows. Section 2 provides an overview of the previous work
done on text watermarking. The proposed generation and detection algorithms are described in
detail in section 3. Section 4 presents the experimental results for the various tampering attacks
such as insertion, deletion and reordering. Performance of the proposed approach is evaluated
by multiple text datasets. The last section concludes the paper along with directions for future
work.
II.    PREVIOUS WORK
        Text watermarking techniques have been proposed and classified in many works of
the literature based on several features and embedding modes. We briefly examine some
traditional classifications of digital watermarking as found in the literature. These techniques
involve text-image-based, content-based, format-based, feature-based, synonym-substitution-
based, syntactic-structure-based, acronym-based, noun-verb-based, and many other text
watermarking algorithms that depend on various viewpoints [1][3][4].



A. Format-based Techniques
        Text watermarking techniques based on format are layout dependent. In [5], three
different embedding methods for text documents were proposed: line-shift coding, word-shift
coding, and feature coding. In the line-shift coding technique, each even line is shifted up or
down depending on the bit value in the watermark bits. Typically, the line is shifted up if the
bit is one; otherwise, the line is shifted down. The odd lines are treated as control lines and
used at decoding. Similarly, the word-shift coding technique shifts words and modifies the
inter-word spaces to embed the watermark bits. Finally, in the feature coding technique, certain
text features, such as the pixels of characters or the lengths of character end lines, are altered
in a specific way to encode the zeros and ones of the watermark bits. The watermark detection
process is performed by comparing the original and watermarked documents.

B. Content-based Techniques
         Text watermarking techniques based on content are structure-based and natural
language dependent [4]. In [6][14], a syntactic approach has been proposed which uses the
syntactic structure of the cover text for embedding watermark bits by performing syntactic
transformations on the syntactic tree diagram, taking into account the preservation of the
natural properties of the text during the watermark embedding process. In [18], synonym
substitution has been proposed to embed the watermark by replacing certain words with their
synonyms without changing the sense and context of the text.

C. Binary Image-based Techniques
        Text watermarking techniques for binary image documents depend on traditional
image watermarking techniques that are based on the space domain and transform domain,
such as the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and
Least Significant Bit (LSB) [5]. Several formal text watermarking methods have been
proposed based on embedding the watermark in the text image by shifting words and
sentences right or left, or shifting lines up or down, to embed watermark bits, as mentioned
above in the section on format-based watermarking [5][7].

D. Zero-based Techniques
        Text watermarking techniques based on zero-watermarking are content-feature
dependent. Several such approaches designed for text documents have been proposed in the
literature and are reviewed in this paper [1][19][20][21].
   The first algorithm was proposed by [19] for tamper detection in plain text documents
based on the lengths of words, using digital watermarking and certifying authority techniques.
The second algorithm was proposed by [20] to improve text authenticity; it utilizes the contents
of the text to generate a watermark, and this watermark is later extracted to prove the
authenticity of the text document. The third algorithm was proposed by [1] for copyright
protection of text contents based on the occurrence frequency of non-vowel ASCII characters
and words. The last algorithm was proposed by [21] to protect all open textual digital contents
from counterfeiting; it inserts the watermark image logically in the text and extracts it later to
prove ownership. In [22], a Chinese text zero-watermark approach has been proposed based
on a space model, using the two-dimensional model coordinates at the word level and the
sentence weights at the sentence level.





E. Combined-based Techniques
        One can say that text is dissimilar to an image. Language has a distinct syntactical
nature that makes such techniques more difficult to apply. Thus, text should be treated as text
instead of as an image, and the watermarking process should be performed differently. In [23],
a combined method has been proposed for copyright protection that combines the best of both
image-based and language-based text watermarking techniques.
        The above-mentioned text watermarking approaches are not appropriate for all types
of text documents under varying document sizes, types, and random tampering attacks, and
their mechanisms for embedding and extracting the watermark may be discovered easily by
attackers. On the other hand, these approaches are not designed specifically to solve the
problem of authentication and tamper detection of text documents; they are based on making
some modifications to the original text document to embed added external information, which
can be used later for various purposes such as content authentication, integrity verification,
tamper detection, or copyright protection. This paper proposes a novel intelligent approach for
content authentication and tamper detection of English text documents in which the watermark
embedding and extraction processes are performed logically, based on text analysis and
extraction of content features using a Markov model, so that the original text document is not
altered to embed the watermark.

III.      THE PROPOSED APPROACH

        This paper presents an improved intelligent approach to English text zero-
watermarking based on the word level and second order of the Markov model for content
authentication and tampering detection of text documents.
        The improved approach depends on the word mechanism and order two of the Markov
model to improve the performance, complexity, and tampering detection accuracy of the
similar approach that used order one of the Markov model, presented in [27] and developed
by F. Al-Wesabi et al. The improved approach should perform the watermark generation,
embedding, extraction, and detection processes with higher accuracy and security. It
hybridizes text zero-watermarking techniques with soft computing tools for natural language
processing to protect digital text documents. A Markov model is used for text analysis and for
extracting the interrelationships between the contents of a text as probabilistic patterns, based
on the word level and second order of the Markov model, in order to generate the watermark
information. This watermark can later be extracted using the extraction algorithm and matched
with the watermark generated from the attacked document using the detection algorithm, in
order to identify any tampering and prove the authenticity of the text document.
        Before we explain the watermark generation and detection processes, in the next
subsection we present a preliminary mathematical description of the second order of Markov
models based on the word mechanism for text analysis.
A. Markov Models for Text Analysis

    In this subsection, we explain how to model text using a Markov chain, which is defined
as a stochastic (random) model for describing the way that a process moves from state to
state. For example, suppose that we want to analyse the following sentence:
       “The quick brown fox jumps over the brown fox who is slow jumps over the brown
                                    fox who is dead.”


   When we use a Markov model of order two with the word mechanism, each sequence of
two words is a state. As the above sample text is processed, the system makes the following
transitions:

"the quick" -> "quick brown" -> "brown fox" -> "fox jumps" -> "jumps over" -> "over
the" -> "the brown" -> "brown fox" -> "fox who" -> "who is" -> "is slow" -> "slow jumps"
-> …etc
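This sequence of states can be sketched as follows. Python is used here purely for illustration (the paper's actual implementation is in PHP); the variable names are our own.

```python
# Illustrative sketch: derive the order-2 (two-word) states from the sample
# sentence. Each pair of consecutive words is one state; the chain advances
# one word at a time, so repeated states such as "brown fox" appear again.
text = ("the quick brown fox jumps over the brown fox who is slow "
        "jumps over the brown fox who is dead")
words = text.split()

# All states the chain visits, in order of appearance (repeats included).
states = [(words[i], words[i + 1]) for i in range(len(words) - 1)]

print(" -> ".join('"%s %s"' % s for s in states[:6]))
# "the quick" -> "quick brown" -> "brown fox" -> "fox jumps" -> "jumps over" -> "over the"
```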

   Next we present a simple method to build a Markov matrix of states and transitions,
which is the most basic part of text analysis using a Markov model.
   Based on this approach, the size of the Markov matrix is not fixed, which means the number
of states and transition probabilities vary based on the contents of the given text. The numbers
of all possible states and transitions can be computed by equations (1) and (2):

                        Ps = (n - 2)                   ..........                  (1)

                        Pt = (n - 2) ^ 2               ..........                  (2)

  Where,
    - n: is the number of words in the given text document.
    - Ps: is the number of possible states; Pt: is the number of possible transition entries.

So the matrix of state probabilities for the above sample text should have (20 - 2) = 18
word pairs (states).

       For the matrix of transition probabilities, from each state there are (n - 2) possible
transitions. If the Markov chain is currently at the first state (the first two words) in the given
text document, the possible states that could come next are [Wi+2, Wi+3, Wi+4, ..., Wi+n]. So
the matrix of transition probabilities should have (n - 2) ^ 2 entries. For example, in the above
sample text, if the Markov chain is currently at the "the quick" state, the possible transitions
that could come next are [brown, fox, jumps, over, the, ..., dead].

       So the matrix of transition probabilities for the above sample text should have
(20 - 2) ^ 2 = 18 ^ 2 = 324 entries.

       In general, if the text has n words, then there are (n - 2) states, and the matrix of
transition probabilities needs (n - 2) ^ 2 entries.
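As a quick check of equations (1) and (2) for the 20-word sample sentence (a minimal sketch; the variable names are illustrative):

```python
# Sketch of equations (1) and (2): for a text of n words, the model has
# (n - 2) states and at most (n - 2)^2 transition-matrix entries.
n = 20                            # words in the sample sentence
num_states = n - 2                # equation (1): 18
num_transitions = (n - 2) ** 2    # equation (2): 324
print(num_states, num_transitions)  # 18 324
```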

       As a result of applying a Markov model of order two with the word mechanism to
analyse the above sentence, which contains 20 words, after it is processed by the system and
represented as a Markov chain, we obtain figure 2, which gives the 11 present states as word
sets in the matrix of Markov chains without repetitions, and the 18 (n - 2) possible transitions.








         Fig. 2. Sample text states and transitions based on order 2 of a Markov model

  Now if we consider the state "brown fox", the next state transitions are "jumps", "who",
and "who". We observe that the transition "who" occurs twice.

         For the analysis of large-sized text, we calculate the frequencies of occurrences of the
next states to finally obtain the probabilities. Next is a simple procedure to obtain a Markov
model of order two for a given text.
         Build (and initialize to all zeroes) an (n - 2) by (n - 2) matrix M to store the transitions.
The entry M[x,y] will be used to keep track of the number of times that the state x is followed
by the word y within the given text. For each position i in the text, where i runs from 1 to the
length of the text document minus 2, let x be the state formed by the ith and (i+1)st words,
and let y be the (i+2)nd word in the text. Then increment M[x,y]. Now the matrix M contains
the counts of all transitions. We want to turn these counts into probabilities. Here is a method
that can do it. For each row i of M, sum the entries on that row, i.e., let counter[i] = M[i,1] +
M[i,2] + ... + M[i,n-2]. Now define P[i,j] = M[i,j] / counter[i] for all pairs i,j. This gives a
matrix of probabilities; in other words, P[i,j] is now the probability of making a transition
from state i to word j. Hence a matrix of probabilities that describes a Markov model of order
two for the given text is obtained.
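The procedure above can be sketched as follows (Python is used for illustration only; a dictionary of dictionaries stands in for the matrix M, and probabilities are computed per row as described):

```python
# Sketch: count how often each order-2 state (two consecutive words) is
# followed by a given next word, then normalize the counts into transition
# probabilities, row by row.
from collections import defaultdict

def markov_order2(text):
    words = text.lower().split()
    counts = defaultdict(lambda: defaultdict(int))
    # state = two consecutive words; next symbol = the word right after them
    for i in range(len(words) - 2):
        state = (words[i], words[i + 1])
        counts[state][words[i + 2]] += 1
    # turn the counts into probabilities (divide each row by its row sum)
    probs = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        probs[state] = {w: c / total for w, c in nxt.items()}
    return counts, probs

counts, probs = markov_order2(
    "The quick brown fox jumps over the brown fox who is slow "
    "jumps over the brown fox who is dead")
print(dict(counts[("brown", "fox")]))   # {'jumps': 1, 'who': 2}
```

Note that the state "brown fox" is followed once by "jumps" and twice by "who", matching the observation about figure 2 above.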

B. Watermark Generation and Embedding Algorithm

        The watermark generation and embedding algorithm requires the original text
document (To) as input, which is provided by the author; then, as a pre-processing step,
capital letters are converted to small letters. A watermark pattern is generated as the output of
this algorithm. This watermark is then stored in the watermark database along with the main
properties of the original text document, such as the document identity, author name, and
current date and time.
        This stage involves three algorithms: pre-processing and building the Markov matrix,
text analysis, and watermark generation and embedding, as shown in figure 3.








[Flowchart: Original Text Document (TDO) -> Text Pre-processing -> Building Markov
matrix (word level, order two) -> Text analysis using Markov model: compute the number of
occurrences of NS transitions for every PS (present state (PS), next state (NS)) ->
Probabilistic patterns -> loop until terminated -> store WMO (patterns, DocID, date, time) in
the watermark DB -> digest WM patterns using the MD5 algorithm]

                  Fig. 3. Watermark generation and embedding processes


 1) Pre-processing and Building the Markov Matrix
      This algorithm requires the original text document as input, and provides the
   preprocessed text document and the Markov matrix as outputs. Building the states and
   transitions matrix is the most basic part of text analysis and watermark generation using
   the Markov model. A Markov matrix that represents the possible states and transitions
   available in the given text is constructed without repetitions. In this approach, each unique
   sequence of two words within the given text is represented as a state (word set) and a
   transition in the Markov matrix. During the building process of the Markov matrix, the
   proposed algorithm initializes all transition values to zero, so that these cells can later
   keep track of the number of times that one word is followed by another within the given
   text document.





   The preProcessing algorithm executes as follows:

     PROCEDURE preProcessing(TO)
     - Input: Original Text Document (TO)
     - Output: preprocessed text document (TP), state matrix of the given text without
       repeats arrayList[Ts]
     - BEGIN
     - Loop index = 0 to Text.Length - 2,
             o // convert letter case from capital to small
             o IF UpperCharacter(TO[index]) = True THEN
                        TP[index] = LowerCharacter(TO[index]);
             o // list every unique sequence of two words within the given text as a state
                in the array list
             o exist = TP[index];
             o Loop j = 0 to index
             o IF arrayList[j] <> exist THEN
             o arrayList[index] = exist;
     - index ++;
     - END
      Where,
       o TO: represents the original text document; TP: represents the preprocessed text
           document; arrayList: represents the states array of the given text after the
           preprocessing process; index: represents the current word in the given text.
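A minimal Python sketch of the same pre-processing step (lowercasing and collecting each two-word state once, without repeats); the function and variable names are illustrative, not the paper's PHP code:

```python
# Sketch of preProcessing: lowercase the text and list each unique two-word
# state once, in order of first appearance (states "without repeats").
def pre_processing(original_text):
    words = original_text.lower().split()      # capital -> small letters
    states = []
    seen = set()
    for i in range(len(words) - 1):
        state = (words[i], words[i + 1])
        if state not in seen:                  # skip states already listed
            seen.add(state)
            states.append(state)
    return words, states

words, states = pre_processing("The quick brown fox jumps over the brown fox")
print(states)   # 7 unique states; the repeated "brown fox" appears only once
```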

      The Build_Markov_Matrix algorithm executes as follows:

      PROCEDURE Build_Markov_Matrix(TP)
      - Input: preprocessed text (TP)
      - Output: Markov matrix with zeros initial value
      - BEGIN
      - // perform preprocessing process
      - Call preProcessing (TP)
      - // Build states and transitions matrix of Markov model and initialize all zeros
      - Loop ps = 0 to arrayList.Length - 2,
               o Loop ns = 0 to arrayList.Length,
                        MarkovMatrix[ps][ns] = 0;
               o ns ++;
      - ps ++;
      - END
      Where,
      o TP: represents the preprocessed text document; MarkovMatrix: the states and
           transitions matrix with zero values for all cells; ps: the present state; ns: the next
           state.

 2) Algorithm of Text Analysis
      This algorithm takes the preprocessed text document as input, and provides the
   watermark patterns as output. After the Markov matrix is constructed, the text analysis
   process is done using the Markov model, based on order two of the word mechanism, by
   finding the interrelationships between the words of the given text document. In other
   words, the proposed algorithm computes the number of occurrences of the next-state
   transitions for every present state. A matrix of transition probabilities that represents the
   number of occurrences of transitions from one state to another is constructed by equation
   (3) as follows.

    MarkovMatrix[ps][ns] = Total Number of Transitions[i][j], for i, j = 1, 2, ..., n-2   .......(3)

      Where,
      o n: is the total number of states.
      o i: refers to PS, "the present state".
      o j: refers to NS, "the next state".
      o P[i,j]: is the probability of making a transition from word i to word j.

     The text analysis of the given sentence based on the word mechanism and order two is
    shown as a Markov chain and proceeds as illustrated in figure 4.




             Fig. 4. Text analysis processes based on order 2 of a Markov model


      Let TP be the preprocessed text, and let MarkovMatrix[ps][ns] represent the Markov
    matrix that stores the number of times one word is followed by another in the given text.
    The text analysis algorithm is presented formally and executes as follows:

     PROCEDURE text_analysis(TP)
     - Input: preprocessed text (TP)
     - Output: Markov matrix with values of transition probabilities
     - BEGIN
     - // build states and transitions matrix of Markov model
     - Call Build_Markov_Matrix (TP)
     - // compute the total frequencies of transitions for every state
     - Loop ps = 0 to arrayList.Length - 2,
              o Loop ns = 1 to arrayList.Length,
                         Loop counter = 2 to TP.length - 1,
                             • MarkovMatrix[ps][ns] = Total Number of Transition[ps][ns]
                         counter ++;


           o ns ++;
     - ps ++;
     - END
     Where,
      o TP: represents the preprocessed text document; MarkovMatrix: the states and
        transitions matrix with transition probability values for every state.

 3) Algorithm of Watermark Generation and Embedding
      After performing the text analysis and extracting the probability features, the
    watermark is obtained by identifying all the nonzero values in the above Markov matrix.
    These nonzero values are sequentially concatenated to generate a watermark pattern,
    denoted by WMPO, as given by equation (4) and presented in figure 5.

      WMPO &= MarkovMatrix[ps][ns], for all ps, ns with nonzero values in the Markov
                              matrix .............. (4)




               Fig. 5. The original watermark patterns for a given sample text

      The embedding process is done logically during the text analysis process by keeping
    track of all nonzero transitions and their values shown in the Markov matrix, in which
    the cells of nonzero transitions contain the number of times one word is followed by
    another within the given text document. These tracks can be used later by the detection
    algorithm for matching against the tracks that will be produced from the attacked text
    document.

      This watermark is then stored in the watermark database along with some properties
    of the original text document, such as the document identity, author name, and current
    date and time. After watermark generation as sequential patterns, an MD5 message digest
    is generated to obtain a secure and compact form of the watermark, notationally as given
    by equation (5) and presented in figure 6.

                    DWM = MD5(WMPO)                 ………………..           (5)




                     Fig. 6. The original watermark after MD5 digesting





 The proposed watermark generation and embedding algorithm, using the second order,
word-level Markov model, is presented formally and executes as follows:

      PROCEDURE watermark_gen_embed(M[I,j])
      - Input: Markov matrix[i,j]
      - Output: original watermark patterns (WMPO)
      - BEGIN
      - // compute the total frequencies of transitions for every state of original document
      - Call text_analysis(TP)
      - // concatenate watermark patterns of every states shown in Markov matrix
      - Loop ps = 0 to MarkovMatrix[rows].Length - 2,
               o Loop ns = 1 to MarkovMatrix[columns].Length,
               o IF MarkovMatrix [ps][ns] != 0 // states that have nonzero transitions
               o WMPO &= MarkovMatrix [ps] [ns]
               o ns ++;
      - ps ++;
      - Store WMPO in the DWM database.
      - // Digest the original watermark using MD5 algorithm
      - WMO = MD5(WMPO)
      - Output WMPO, WMO
      - END


      Where,
      o WMO: original watermark; WMPO: original watermark patterns; MD5: hash
        algorithm.
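The generation step above can be sketched in Python (illustrative only: the dictionary-of-dictionaries `counts` stands in for the paper's MarkovMatrix, and MD5 comes from the standard library):

```python
# Sketch of watermark_gen_embed: concatenate every nonzero transition count
# in the Markov matrix into the pattern string WMPO, then digest it with MD5
# to obtain the compact watermark WMO.
import hashlib

def generate_watermark(markov_counts):
    wmpo = ""
    for state in markov_counts:                    # each present state (row)
        for nxt, count in markov_counts[state].items():
            if count != 0:                         # keep nonzero transitions only
                wmpo += str(count)
    wmo = hashlib.md5(wmpo.encode()).hexdigest()   # equation (5)
    return wmpo, wmo

# Toy matrix: "brown fox" -> jumps once, who twice; "the quick" -> brown once.
counts = {("brown", "fox"): {"jumps": 1, "who": 2},
          ("the", "quick"): {"brown": 1}}
wmpo, wmo = generate_watermark(counts)
print(wmpo)        # "121"
print(len(wmo))    # 32 (hex characters of the MD5 digest)
```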


C. Algorithms of Watermark Extraction and Detection

       The watermark detection algorithm is based on zero-watermarking, so before detection
for an attacked text document TA, the proposed algorithm still needs to generate the attacked
watermark patterns. Once the attacked watermark patterns are received, the matching rate of
the patterns and the watermark distortion are calculated in order to determine tampering and
content authentication.
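A hedged sketch of this matching step (assuming, as described above, that detection compares the original watermark with the one regenerated from the received document; the function name is illustrative):

```python
# Sketch of detection: digest both the original and the regenerated (attacked)
# watermark patterns with MD5 and compare; any difference flags tampering.
import hashlib

def detect_tampering(wmpo_original, wmpo_attacked):
    wmo = hashlib.md5(wmpo_original.encode()).hexdigest()
    wma = hashlib.md5(wmpo_attacked.encode()).hexdigest()
    return "authentic" if wmo == wma else "tampered"

print(detect_tampering("121", "121"))  # authentic
print(detect_tampering("121", "122"))  # tampered
```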
       This stage includes two main processes which are watermark extraction and detection.
Extracting the watermark from the received attacked text document and matching it with the
original watermark will be done by the detection algorithm.
       The proposed watermark extraction algorithm takes the attacked text document, and
performs the same watermark generation algorithm to obtain the watermark pattern for the
attacked text document as shown in figure 7.






[Flowchart: Attacked Text Document (TDA) -> Text Pre-processing -> Building Markov
matrix (word level, order two) -> Text analysis using Markov model: compute the number of
occurrences of NS transitions for every PS (present state (PS), next state (NS)) ->
Probabilistic patterns -> loop until terminated -> EWMA; WM pattern matching against
WMO from the watermark DB (patterns, DocID, date, time): if matched, the text document is
authentic; otherwise, the text document is tampered]

                      Fig. 7. Watermark Extraction and Detection processes


 1) Watermark Extraction Algorithm
     In this algorithm the proposed approach takes the attacked text document (TA) and the
   original watermark patterns (or the original text document) as inputs; the procedure is
   similar to that of watermark generation. The output of this algorithm is the attacked
   watermark patterns (WMPA).





  The watermark extraction algorithm executes as follows:

      PROCEDURE watermark_extraction(TA)
      - Input: Attacked text document (TA)
      - Output: Attacked watermark patterns (WMPA)
      - BEGIN
      - // perform the pre-processing process for the attacked text document
      - Call preProcessing(TA)
      - // compute the total frequencies of transitions for every state of the attacked document
      - Call text_analysis(TAp)
      - // generate the attacked watermark patterns from the attacked text document
      - Loop ps = 0 to MarkovMatrix’[rows].Length - 2
              o Loop ns = 0 to MarkovMatrix’[columns].Length
                     IF MarkovMatrix’[ps][ns] != 0
                            WMPA &= MarkovMatrix’[ps][ns]
                     ns ++
              o ps ++
      - Output WMPA
      - END

       Where,
       o WMPA: attacked watermark patterns, TA: attacked text document, TAp: preprocessed
         attacked text document, MarkovMatrix’[ps][ns]: Markov matrix of the attacked
         text document.
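  To make the word-level, order-two construction concrete, the following Python sketch
builds the Markov matrix (present state = a pair of consecutive words, next state = the word
that follows) and collects its non-zero transition counts into a pattern. The function names
and the tuple-based pattern representation are illustrative assumptions, not the paper's own
implementation (which uses PHP):

```python
from collections import defaultdict

def preprocess(text):
    # Illustrative pre-processing: lowercase and split into words.
    return text.lower().split()

def build_order2_matrix(words):
    # Present state (PS) = a pair of consecutive words; next state (NS) = the
    # word that follows. Counts the occurrences of each NS for every PS.
    matrix = defaultdict(lambda: defaultdict(int))
    for k in range(len(words) - 2):
        ps = (words[k], words[k + 1])
        ns = words[k + 2]
        matrix[ps][ns] += 1
    return matrix

def extract_watermark(text):
    # Collect the non-zero transition counts, state by state, into the
    # watermark pattern, mirroring the generation algorithm.
    matrix = build_order2_matrix(preprocess(text))
    pattern = []
    for ps in sorted(matrix):
        for ns in sorted(matrix[ps]):
            pattern.append((ps, ns, matrix[ps][ns]))
    return pattern
```

For example, `extract_watermark("a b c a b c")` yields one transition per observed word pair,
with the pair ("a", "b") followed by "c" twice.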

  2) Watermark Detection Algorithm
  After extracting the attacked watermark pattern, the watermark detection is performed in
three steps:
       • Primary matching is performed on the whole watermark pattern of the original
           document (WMPO) and of the attacked document (WMPA). If the two patterns
           are identical, the text document is declared authentic with no tampering. If the
           primary matching fails, the text document is declared not authentic, tampering
           has occurred, and we proceed to the next step.
       • Secondary matching is performed by comparing the components associated with
           each state of the overall pattern: the extracted watermark pattern of each state is
           compared with the equivalent transition of the original watermark pattern. This
           process is described by equations (6) and (7).

              PMRT_ij = min(WMPO_ij, WMPA_ij) / max(WMPO_ij, WMPA_ij),  (0 < PMRT <= 1) ……….. (6)

       Where,
       o PMRT_ij: the value of the pattern matching rate at the transition level.
       o i, j: the indexes of states and transitions respectively; i = 0 .. number of
         non-zero states in the given text, j = 0 .. number of non-zero transitions in the
         given text.
       o WMPO_ij: the value of the original watermark at the transition level.
       o WMPA_ij: the value of the attacked watermark at the transition level.


              PMRS_i = (1/n) Σ_{j=1..n} PMRT_ij,  (0 < PMRS <= 1) …….. (7)
Where,
      o n: the number of non-zero transitions of every state represented in the matrix of
        the Markov model.
      o i: the index over the non-zero states represented in the matrix of the Markov
        model.
      o PMRS_i: the value of the pattern matching rate at the state level.

  After obtaining the pattern matching rate of every state, we find the weight of every state
among all the states in the Markov matrix, given by equation (8):


      State weight Sw_i = PMRS_i / N   …………………...…….. (8)
       Where,
       o PMRS_i: the total pattern matching rate of state i.
       o N: the number of states of the given text document.


  Finally, the PMR, which represents the pattern matching rate between the original and the
attacked text document, is calculated by equation (9):

              PMR = Σ_{i=1..N} Sw_i   ……………….…….. (9)
Where,
      o N: the total number of states in the Markov matrix.

  The watermark distortion rate (WDR) refers to the amount of tampering caused by attacks
on the contents of the attacked text document; it is obtained by equation (10):

              WDR = 1 − PMR   ……………………. (10)

  This process is illustrated in figure 8.




           Fig. 8: Watermark extraction process based on order 2 of a Markov model
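  The matching computation above can be sketched in Python. This is a minimal sketch,
under the assumption that the transition-level rate is the ratio of the smaller to the larger of
the two transition counts; the dictionary-of-dictionaries watermark representation is also an
assumption made for illustration:

```python
def matching_rates(wm_orig, wm_att):
    # wm_orig / wm_att: dicts mapping present state -> {next state: count}.
    # The transition-level rate is assumed here to be min/max of the two
    # counts; the state-level rate averages over the state's non-zero
    # transitions; dividing by the number of states weights each state, so
    # PMR is the mean of the state-level rates, and WDR = 1 - PMR.
    states = set(wm_orig) | set(wm_att)
    n_states = len(states)
    pmr = 0.0
    for ps in states:
        orig_t = wm_orig.get(ps, {})
        att_t = wm_att.get(ps, {})
        transitions = set(orig_t) | set(att_t)
        pmrt_total = 0.0
        for ns in transitions:
            a, b = orig_t.get(ns, 0), att_t.get(ns, 0)
            pmrt_total += min(a, b) / max(a, b) if min(a, b) > 0 else 0.0
        pmrs = pmrt_total / len(transitions) if transitions else 0.0
        pmr += pmrs / n_states  # state weight accumulated into PMR
    return pmr, 1.0 - pmr      # (PMR, WDR)
```

With identical original and attacked watermarks this returns PMR = 1 and WDR = 0; any
missing or altered transition lowers PMR and raises WDR accordingly.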


    The watermark detection algorithm executes as follows:

        PROCEDURE watermark_detection(TP, TP’)
        - Input: preprocessed text (TP, TP’)
        - Output: PMR, WDR
        - BEGIN
        - // get the watermark of the original document
        - Call watermark_gen_embed(MarkovMatrix[ps][ns])
        - // extract the watermark from the attacked document
        - Call watermark_extraction(MarkovMatrix’[ps][ns])
        - // perform the matching process between the original and attacked watermark patterns
        - IF WMPA = WMPO
               o Print “Document is authentic and no tampering occurred”
               o PMR = 1
        - Else
               o Print “Document is not authentic and tampering occurred”
               o PMR = 0
               o // compute the pattern matching rate on the transition level
               o Loop i = 0 to MarkovMatrix’[rows].Length - 2
                      patternCount = 0, transPMRTotal = 0
                      Loop j = 0 to MarkovMatrix’[columns].Length
                             IF WMPO[i][j] != 0
                                    patternCount += 1
                                    transPMRTotal += min(WMPO[i][j], WMPA[i][j]) /
                                                     max(WMPO[i][j], WMPA[i][j])
                             Else IF WMPA[i][j] != 0
                                    patternCount += 1
                      // compute the pattern matching rate on the state level
                      statePMR = transPMRTotal / patternCount
                      // compute the state weight
                      stateWeight = statePMR / N
                      // accumulate the pattern matching rate on the document level
                      PMR += stateWeight
        - // compute the watermark distortion rate on the document level
        - WDR = 1 – PMR
        - END

-    Where,
       o SW: the weight of the states correctly matched.
       o WDR: the value of the watermark distortion rate (0 < WDR <= 1).


IV.     EXPERIMENTAL SETUP, RESULTS AND DISCUSSION

A. Experimental Setup
        In order to test the proposed approach and compare it with another approach, we con-
ducted a series of simulation experiments. The experimental environment was as follows:
CPU: Intel Core™ i5 M480 / 2.67 GHz; RAM: 8.0 GB; OS: Windows 7; programming lan-
guage: PHP, using NetBeans IDE 7.0. With regard to the data sets used, six samples were
taken from the data sets designed in [24]. These samples were categorized into three classes
according to their size, namely Small Size Text (SST), Medium Size Text (MST), and Large
Size Text (LST).
        Next, we define the types of attacks and their percentages as follows: insertion, dele-
tion, and reorder attacks were performed randomly at multiple locations of these datasets.
        The details of the dataset volumes and attack percentages used are shown in table I.
They are similar to those performed in [25] for comparison purposes; it should be mentioned
that we also perform the reorder attack on the datasets, which is not contained in that paper.
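  The three attack types can be simulated with a small helper. The sketch below is
illustrative only: the word-level granularity, the function name, and the `seed` parameter are
assumptions, not part of the paper's experimental code. It applies one attack kind at the given
rate at randomly chosen positions:

```python
import random

def attack(words, kind, rate, seed=0):
    # Apply a tampering attack at the given rate to a list of words; the
    # attacked positions are chosen at random. At least one edit is made.
    rng = random.Random(seed)
    words = list(words)
    n = max(1, int(len(words) * rate))
    if kind == "insertion":
        for _ in range(n):
            pos = rng.randrange(len(words) + 1)
            words.insert(pos, rng.choice(words))  # duplicate an existing word
    elif kind == "deletion":
        for _ in range(n):
            words.pop(rng.randrange(len(words)))
    elif kind == "reorder":
        for _ in range(n):
            i, j = rng.randrange(len(words)), rng.randrange(len(words))
            words[i], words[j] = words[j], words[i]
    return words
```

Note that a reorder attack preserves the word multiset while insertion and deletion change
the document length, which is why the three attacks stress the Markov matrix differently.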

                                             TABLE I
        ORIGINAL AND ATTACKED TEXT SAMPLES WITH INSERTION AND DELETION PERCENTAGE

      Sample     Original Text                     Attacks Percentage
      Text ID     Word Count        Insertion           Deletion            Reorder
      [SST4]          179
      [SST2]          421
      [MST5]          469      5%, 10%, 20%, 50%  5%, 10%, 20%, 50%  5%, 10%, 20%, 50%
      [MST2]          559
      [LST4]         2018

        To measure the performance of our approach and compare it with others, the tamper-
ing accuracy, which is a measure of the watermark robustness, will be used. The PMR value
gives the tampering accuracy of the given text document. The watermark distortion rate
(WDR) is also measured and compared with other approaches. The values of both PMR and
WDR range between 0 and 1. A larger PMR value, and correspondingly a lower WDR value,
means more robustness, while a lower PMR value and a larger WDR value mean less
robustness.
        The desirable value of PMR is therefore close to 1, and that of WDR is close to 0. We
categorize tamper detection states into three classes based on PMR threshold values: High
when the PMR value is greater than 0.70, Mid when the PMR value is between 0.40 and
0.70, and Low when the PMR value is less than 0.40.
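  The three-class categorization can be written directly; this is a minimal sketch, with the
assumption that a PMR exactly on a boundary falls into the lower class, following the
"greater than 0.70" wording above:

```python
def tamper_detection_class(pmr):
    # Classify tamper-detection accuracy by the PMR thresholds used here:
    # High above 0.70, Mid between 0.40 and 0.70, Low below 0.40.
    if pmr > 0.70:
        return "High"
    if pmr >= 0.40:
        return "Mid"
    return "Low"
```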
        To evaluate the accuracy of the proposed approach, a series of experiments was con-
ducted with all the well-known attacks, such as random insertion, deletion, and reorder of
words and sentences, on each sample of the datasets. These attacks were applied at multiple
locations in the datasets. The experiments were conducted first with individual attacks, then
with all attacks at the same time, and the results of the proposed approach were compared
with a recent similar approach.




B. Experiments with the proposed approach
        In this section, we evaluate the performance of the proposed approach. The character
set of words covers all English letters, space, numbers, and special symbols. The experiments
were conducted with the various kinds of attacks individually, at attack rates of 5%, 10%,
20%, and 50%. The performance results of this approach under all the mentioned attacks are
presented in tabular form in table II and graphically represented in figures 9, 10, 11, and 12
for the 5%, 10%, 20%, and 50% attack scenarios respectively. These results are discussed
below.

                                 TABLE II
 EXTRACTED WATERMARK MATCHING AND DISTORTION PERCENTAGE UNDER VARIOUS INDIVID-
                                UAL ATTACKS

    Sample     Original      AT-       Extracted watermark matching and accuracy under 3 attacks
     Text     Text Word     TACKS        Insertion            Deletion             Reorder
   Category     Count      Volume      PMR      WDR        PMR      WDR        PMR      WDR
   [SST4]     179        5%      0.9409     0.0591     0.8936    0.1064    0.7354     0.2646
                        10%      0.8929     0.1071     0.8506    0.1494    0.7825     0.2175
                        20%       0.693      0.307     0.8652    0.1348     0.363     0.637
                        50%      0.6386     0.3614     0.7576    0.2424    0.3008     0.6992
   [SST2]     421        5%      0.9246     0.0754     0.9448    0.0552    0.8835     0.1165
                        10%      0.9052     0.0948     0.7423    0.2577    0.7412     0.2588
                        20%      0.8182     0.1818     0.8083    0.1917    0.7535     0.2465
                        50%      0.6622     0.3378     0.9144    0.0856    0.7624     0.2376
  [MST5]      469        5%      0.9473     0.0527     0.9854    0.0146    0.8589     0.1411
                        10%      0.9068     0.0932     0.9553    0.0447    0.7484     0.2516
                        20%      0.8233     0.1767     0.9475    0.0525    0.5715     0.4285
                        50%      0.6428     0.3572      0.548    0.452     0.2619     0.7381
  [MST2]      559        5%      0.9463     0.0537     0.9565    0.0435    0.8916     0.1084
                        10%      0.9006     0.0994      0.823    0.177     0.7544     0.2456
                        20%      0.8282     0.1718     0.8269    0.1731    0.5697     0.4303
                        50%      0.6576     0.3424     0.3258    0.6742    0.0493     0.9507
   [LST4]     2018       5%      0.0102     0.9898     0.9852    0.0148    0.8697     0.1303
                        10%      0.0095     0.9905      0.979    0.021     0.0577     0.9423
                        20%      0.0106     0.9894     0.0065    0.9935    0.0676     0.9324
                        50%      0.0066     0.9934      0.008    0.992     0.0502     0.9498



  Results of various attacks under the 5% scenario
    The results show the PMR accuracy of the proposed algorithm, as applied on different
  datasets, under a 5% rate of insertion, deletion, and reorder attacks. The PMR is more than
  70% for all kinds of attacks except the insertion attack on the large text document
  (LST4), as shown in figure 9. It can also be observed that the PMR is worst under the
  reorder attack and best under the deletion attack, in which the PMR maintains a value
  close to or greater than 90% in all cases.




              [Bar chart: PMR (%) for datasets SST4, SST2, MST5, MST2, and LST4
              under 5% insertion, deletion, and reorder attacks.]

                  Fig. 9 PMR accuracy under the 5% scenario of various attacks

   Results of various attacks under the 10% scenario
       As applied on different datasets under a 10% rate of insertion, deletion, and reorder at-
tacks, as shown in figure 10, the PMR value is best under deletion attacks, in which the PMR
maintains a value greater than 70% for all datasets. Under insertion attacks, the PMR main-
tains values close to 90% for all datasets except the LST4 dataset, which indicates that the
proposed approach is not applicable under 10% insertion attacks on large text documents.
Finally, in the case of reorder attacks, the PMR value increases for small text documents and
decreases for large documents.

            [Bar chart: PMR (%) for datasets SST4, SST2, MST5, MST2, and LST4
            under 10% insertion, deletion, and reorder attacks.]

                 Fig. 10 PMR accuracy under the 10% scenario of various attacks




   Results of various attacks under the 20% scenario
        Figure 11 shows the experimental results as applied on different datasets under a 20%
rate of insertion, deletion, and reorder attacks. As shown in figure 11, the PMR accuracy is
good for small and medium text documents, but poor for the large text document (LST4)
under all kinds of attacks. This indicates that the proposed approach is not applicable to large
documents under 20% rates of the various kinds of attacks.

            [Bar chart: PMR (%) for datasets SST4, SST2, MST5, MST2, and LST4
            under 20% insertion, deletion, and reorder attacks.]

                 Fig. 11 PMR accuracy under the 20% scenario of various attacks

   Results of various attacks under the 50% scenario
        As applied on different datasets under a 50% rate of insertion, deletion, and reorder at-
tacks, as shown in figure 12, the PMR accuracy is higher for small text documents, decreases
for medium text documents, and is very poor for large text documents, for which the values
are close to zero in all scenarios. Figure 12 also shows that the PMR maintains a value
greater than 60% for the small and medium datasets under insertion attacks.

            [Bar chart: PMR (%) for datasets SST4, SST2, MST5, MST2, and LST4
            under 50% insertion, deletion, and reorder attacks.]

                 Fig. 12 PMR accuracy under the 50% scenario of various attacks


C. Comparative Results
        We compare the performance of the proposed approach, named here WO2, with a re-
cently published text watermarking approach presented in [27] by F. Al-wesabi et al., named
here WO1. WO1 uses the same environment and parameters as the proposed approach. Both
the WO2 and WO1 approaches depend on the word mechanism of the Markov model; the
core difference between them is the order: the WO1 approach is based on order one of the
Markov model, while the proposed approach (WO2) is based on order two.
        In these experiments, random multiple insertion, deletion, and reorder attacks were
performed individually on each sample of the datasets with the various attack rates shown
above in table I. The ratios of successfully detected watermarks of the proposed algorithm,
compared with reference [27] (WO1), are shown in table III and graphically represented in
figures 13, 14, 15, and 16.

                                           TABLE III
 COMPARATIVE PERFORMANCE ACCURACY OF THE PROPOSED ALGORITHM WITH WO1 UNDER
                                     INDIVIDUAL ATTACKS
    Sample     Original     AT-                 Successfully detected watermark (%)
     Text     Text Word    TACKS        Reference 27 (WO1)         The proposed approach (WO2)
   Category     Count      Volume    Insertion Deletion Reorder    Insertion Deletion Reorder
                          5%      94.59      86.02     81.85     94.09    89.36    73.54
                         10%      89.53      86.39     81.65     89.29    85.06    78.25
   [SST4]      179
                         20%      67.47      88.02     45.91      69.3    86.52     36.3
                         50%      64.21      71.56     42.68     63.86    75.76    30.08
                          5%      91.91      95.19     72.05     92.46    94.48    88.35
                         10%      90.34      69.49     78.69     90.52    74.23    74.12
   [SST2]      421
                         20%      81.42      73.95     81.46     81.82    80.83    75.35
                         50%      65.54      75.13     82.59     66.22    91.44    76.24
                          5%      94.64      97.33     88.21     94.73    98.54    85.89
                         10%       90.6      93.19     78.85     90.68    95.53    74.84
  [MST5]       469
                         20%      80.93      89.18     63.2      82.33    94.75    57.15
                         50%      63.03      40.96      0.7      64.28    54.8     26.19
                          5%       94.8      93.69     90.02     94.63    95.65    89.16
                         10%      89.99      78.71     80.13     90.06    82.3     75.44
  [MST2]       559
                         20%      82.43      73.97     65.6      82.82    82.69    56.97
                         50%      66.08      27.86     7.89      65.76    32.58     4.93
                          5%      94.76      96.62     88.61      1.02    98.52    86.97
                         10%      89.67      93.02     60.8       0.95    97.9      5.77
   [LST4]     2018
                         20%       4.09        1.2     9.09       1.06    0.65      6.76
                         50%       1.07       1.54     7.37       0.66      0.8     5.02





        Comparative Results for Individual Datasets
        In order to evaluate the accuracy of the proposed approach, we compare its experi-
mental results with the WO1 approach under various scenarios of insertion, deletion, and
reorder attacks. To perform this comparison, we chose three classes of experimental datasets:
SST4 as a small dataset, MST5 as a medium dataset, and LST4 as a large dataset.

        Comparative Results under Various Scenarios for Individual Datasets
        As shown in figure 13, under a 5% volume of the various attacks, the pattern matching
rate (PMR) of WO2 is better than that of WO1 for deletion attacks on all datasets. However,
for insertion and reorder attacks, the PMR of WO1 is better than that of WO2 for all datasets
except the MST5 dataset under insertion attack. This means the proposed approach provides
added value for all sizes of text documents under deletion attacks.




            [Bar charts: PMR (%) of Reference 27 (WO1) vs. this approach (WO2) on the
            SST4, MST5, and LST4 datasets under 5% insertion, deletion, and reorder
            attacks.]

         Fig. 13 Comparison results between (WO1) and (WO2) under 5% of various attacks





        As shown in figure 14, the performance of the WO2 approach is better than that of
WO1 on medium text documents under insertion and deletion attacks. On the other hand,
WO1 is better than WO2 under reorder attacks for all datasets, and under insertion attacks on
large text documents such as LST4. This means the proposed approach is robust against de-
letion attacks for all sizes of text documents, is recommended for small and medium text
documents under this range of insertion attacks, and is not applicable under insertion and
reorder attacks for large text documents.


            [Bar charts: PMR (%) of Reference 27 (WO1) vs. this approach (WO2) on the
            SST4, MST5, and LST4 datasets under 10% insertion, deletion, and reorder
            attacks.]

     Fig. 14 Comparison results between (WO1) and (WO2) under 10% of various attacks


         As shown in figure 15, under the 20% scenario of the various attacks, the robustness
of the proposed approach (WO2) improves for small and medium text documents as the vol-
ume of insertion and deletion attacks increases. On the other hand, the comparative results
also show that the robustness of the WO2 approach is worse than that of the WO1 approach
under this rate of reorder attack for all datasets. This means that the proposed approach
(WO2), under attack volumes of 20% and less, is applicable for small and medium text
documents but not recommended for large text documents.

            [Bar charts: PMR (%) of Reference 27 (WO1) vs. this approach (WO2) on the
            SST4, MST5, and LST4 datasets under 20% insertion, deletion, and reorder
            attacks.]

     Fig. 15 Comparison results between (WO1) and (WO2) under 20% of various attacks


         As shown in figure 16, compared with the previously discussed scenarios of the vari-
ous attacks, the robustness of the proposed approach is still better than that of the WO1 ap-
proach under this rate (50%), especially under insertion and deletion attacks, for all datasets,
although the robustness decreases for large text documents. In other words, the proposed
approach provides added value in terms of robustness on small and medium text documents,
especially under insertion and deletion attacks.


            [Bar charts: PMR (%) of Reference 27 (WO1) vs. this approach (WO2) on the
            SST4, MST5, and LST4 datasets under 50% insertion, deletion, and reorder
            attacks.]


     Fig. 16 Comparison results between (WO1) and (WO2) under 50% of various attacks


        Comparative Results under Various Scenarios for All Datasets
        Figure 17 shows the performance of the two approaches as applied under 5% of the
various kinds of attacks on the different datasets. As shown, for all datasets, the proposed
approach WO2 performs better under insertion and deletion attacks. However, WO1 per-
forms better than the WO2 approach under reorder attacks, which shows in general that the
proposed approach is recommended under low volumes of all tampering attacks for all sizes
of text documents.

            [Bar chart: PMR (%) of WO1 vs. WO2 on the SST4, SST2, MST5, MST2, and
            LST4 datasets under 5% insertion, deletion, and reorder attacks.]

   Fig. 17 Comparison results between (WO1) and (WO2) under 5% of various attacks for all datasets

        Figure 18 illustrates the comparative results under a 10% rate of various attacks. As
shown, for all datasets, the WO1 and WO2 approaches are close under insertion and deletion
attacks, except on the large dataset (LST4), where WO1 is better under insertion attacks. Also,
under reorder attacks, the comparative results show that the WO1 approach is better than WO2
on all datasets.

       [Grouped bar chart: PMR of WO1 and WO2 under 10% insertion, deletion, and reorder
       attacks for the SST4, SST2, MST5, MST2, and LST4 datasets]

  Fig. 18 Comparison results between (WO1) and (WO2) under 10% of various attacks for all datasets



        Under 20% of different attacks for all datasets, the performance of the two approaches
is approximately the same under insertion and deletion attacks, as shown in Figure 19. However,
the performance of the WO1 approach is better than WO2 under reorder attacks.

       [Grouped bar chart: PMR of WO1 and WO2 under 20% insertion, deletion, and reorder
       attacks for the SST4, SST2, MST5, MST2, and LST4 datasets]

  Fig. 19 Comparison results between (WO1) and (WO2) under 20% of various attacks for all datasets

        Figure 20 illustrates the comparative results under a high rate (50%) of various attacks.
As shown, for all datasets, the proposed approach WO2 has the best performance and provides
added value under insertion and deletion attacks, but it is not effective under reorder attacks.

       [Grouped bar chart: PMR of WO1 and WO2 under 50% insertion, deletion, and reorder
       attacks for the SST4, SST2, MST5, MST2, and LST4 datasets]

  Fig. 20 Comparison results between (WO1) and (WO2) under 50% of various attacks for all datasets



       Comparative Results of PMR Standard Deviation for Individual Datasets

         In order to evaluate the performance of the proposed approach (WO2), we compute the
PMR standard deviation for the WO1 and WO2 approaches, and the difference between them
(PMR of WO2 - PMR of WO1), for all scenarios of each attack applied on each dataset, as
shown in Table IV.

                                    TABLE IV
    STANDARD DEVIATION OF ALL SCENARIOS FOR ALL DATASETS UNDER VARIOUS ATTACKS

                               PMR Standard Deviation

               Reference 27 (WO1)               This approach (WO2)
    Dataset    Insertion  Deletion  Reorder     Insertion  Deletion  Reorder
    SST4         78.95      83.00     99.44       79.14      84.18     54.54
    SST2         82.30      78.44     99.52       82.76      85.25     78.52
    MST5         82.30      80.17     87.53       83.01      85.91     61.02
    MST2         83.33      68.56     98.19       83.32      73.31     56.63
    LST4         47.40      48.10     99.23        0.92      49.47     26.13

        The averages of the standard deviation of all scenarios for the small dataset (SST4),
the medium dataset (MST5), and the large dataset (LST4) are shown in Figure 21. As shown, in
the case of the SST4 dataset, the proposed approach WO2 is the best under insertion and
deletion attacks. On the other side, the WO1 approach is the best under reorder tampering
attacks, where the difference of the standard deviation average from the proposed approach
WO2 is (-44.9); this means that the WO1 approach is recommended for detecting reorder
attacks, while performance has been improved by the proposed approach WO2 under insertion
and deletion attacks.
        In the case of the MST5 dataset, the performance of WO2 has improved under insertion
and deletion attacks, especially under deletion attacks, with a difference of the standard
deviation average from the WO1 approach approximately equal to (5.74). We also observed that
the PMR of the proposed approach has improved under reorder attacks with the medium-size
text document (MST5) compared with the small-size text document (SST4), but the WO1
approach is still the best under these reorder tampering attacks.



        Finally, in the case of the large dataset LST4, the comparative results show that the
PMR standard deviation of the WO2 approach is still the best under deletion attacks, but
decreases under insertion and reorder attacks.


       [Bar charts: PMR standard deviation of Reference 27 (WO1) vs. this approach (WO2)
       under insertion, deletion, and reorder attacks; one panel each for the SST4, MST5,
       and LST4 datasets]

   Fig. 21 PMR standard deviation of all scenarios for SST4, MST5 and LST4 datasets under
                                      various attacks


        Comparative Results of PMR Standard Deviation for All Datasets
        As shown in Figure 22, the average of the standard deviation of all scenarios for all
datasets shows that the proposed approach WO2 has a positive difference from the WO1
approach (WO2 PMR - WO1 PMR) under deletion attacks, equal to (3.97), and negative
differences under insertion (-9.03) and reorder (-41.42) attacks. Thus, WO2 provides added
value and is recommended under deletion attacks, but it is not recommended for insertion and
reorder attacks.
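These averages, and the per-dataset differences quoted earlier (e.g. -44.9 for SST4 under reorder attacks), follow directly from the values in Table IV. The following is a minimal sketch of that arithmetic; the variable and function names are ours, not the authors', and this is a reconstruction rather than the authors' PHP implementation:

```python
# PMR standard-deviation values taken from Table IV,
# stored as (insertion, deletion, reorder) triples per dataset.
WO1 = {
    "SST4": (78.95, 83.00, 99.44),
    "SST2": (82.30, 78.44, 99.52),
    "MST5": (82.30, 80.17, 87.53),
    "MST2": (83.33, 68.56, 98.19),
    "LST4": (47.40, 48.10, 99.23),
}
WO2 = {
    "SST4": (79.14, 84.18, 54.54),
    "SST2": (82.76, 85.25, 78.52),
    "MST5": (83.01, 85.91, 61.02),
    "MST2": (83.32, 73.31, 56.63),
    "LST4": ( 0.92, 49.47, 26.13),
}
ATTACKS = ("insertion", "deletion", "reorder")

def diff(dataset, attack):
    """Per-dataset difference (WO2 - WO1) for one attack type."""
    i = ATTACKS.index(attack)
    return round(WO2[dataset][i] - WO1[dataset][i], 2)

def avg_diff(attack):
    """Average of (WO2 - WO1) over all five datasets for one attack type."""
    i = ATTACKS.index(attack)
    return round(sum(WO2[d][i] - WO1[d][i] for d in WO1) / len(WO1), 2)

print(diff("SST4", "reorder"))   # -44.9, as quoted for the SST4 discussion
print(diff("MST5", "deletion"))  # 5.74, as quoted for the MST5 discussion
print(avg_diff("deletion"))      # 3.97
print(avg_diff("insertion"))     # -9.03
print(avg_diff("reorder"))       # -41.41 (the text reports -41.42, presumably
                                 # due to a slightly different rounding order)
```

This confirms the sign pattern the text draws conclusions from: a positive average difference only under deletion attacks, and negative averages under insertion and reorder attacks.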



       [Bar chart: average PMR standard deviation over all datasets for Reference 27 (WO1)
       vs. this approach (WO2) under insertion, deletion, and reorder attacks]

     Fig. 22 PMR standard deviation of all scenarios for all datasets under various attacks


V. CONCLUSION

        Based on the word mechanism of Markov model order two, the authors have designed a
text zero-watermark approach based on text analysis. The algorithm uses text features as
probabilistic patterns of states and transitions in order to generate and detect the watermark.
The proposed approach is implemented using the PHP programming language. The experimental
results show that the proposed approach is sensitive to all kinds of random tampering attacks
and has good accuracy of tampering detection. Compared with the recent watermark approach
named WO1, presented in reference [27], under random insertion, deletion, and reorder attacks
at multiple locations of 5 variable-size text datasets, the comparative results show that the
watermark complexity is increased with the proposed approach, and that it is not effective
under reorder attacks. However, the accuracy of tampering detection of the proposed approach
is improved under all rates of deletion attacks with all sizes of text documents, and it is close
to the accuracy of the WO1 approach under insertion attacks. This means that the proposed
approach provides added value and is recommended in these cases, but it is not robust against
reorder attacks, especially for large text documents.





 REFERENCES

 [1] Z. Jalil, A. Hamza, S. Shahid, M. Arif, A. Mirza, A Zero Text Watermarking Algorithm
     based on Non-Vowel ASCII Characters. International Conference on Educational and In-
     formation Technology (ICET 2010), IEEE.
 [2] Suhail M. A., Digital Watermarking for Protection of Intellectual Property. A Book
     Published by University of Bradford, UK, 2008.
 [3] L. Robert, A Study on Digital Watermarking Techniques. International Journal of Recent
     Trends in Engineering, Vol. 1, No. 2, pp. 223-225, 2009.
 [4] X. Zhou, S. Wang, S. Xiong, Security Theory and Attack Analysis for Text Watermarking.
     International Conference on E-Business and Information System Security, IEEE, pp. 1-6,
     2009.
 [5] J. Brassil, S. Low, and N. F. Maxemchuk, Copyright Protection for the Electronic Dis-
     tribution of Text Documents. Proceedings of the IEEE, vol. 87, no. 7, July 1999, pp.
     1181-1196.
 [6] M. Atallah, V. Raskin, M. C. Crogan, C. F. Hempelmann, F. Kerschbaum, D. Mohamed,
     and S. Naik, Natural language watermarking: Design, analysis, and implementation. Pro-
     ceedings of the Fourth International Information Hiding Workshop, LNCS 2137, 2001.
 [7] N. F. Maxemchuk and S. Low, Marking Text Documents. Proceedings of the IEEE Inter-
     national Conference on Image Processing, Washington, DC, Oct 26-29, 1997, pp. 13-16.
 [8] D. Huang, H. Yan, Interword distance changes represented by sine waves for water-
     marking text images. IEEE Trans. Circuits and Systems for Video Technology, Vol. 11,
     No. 12, pp. 1237-1245, 2001.
 [9] N. Maxemchuk, S. Low, Performance Comparison of Two Text Marking Methods. IEEE
     Journal of Selected Areas in Communications (JSAC), vol. 16 no. 4, pp. 561-572, 1998.
[10] S. Low, N. Maxemchuk, Capacity of Text Marking Channel. IEEE Signal Processing
     Letters, vol. 7, no. 12, pp. 345-347, 2000.
[11] M. Kim, Text Watermarking by Syntactic Analysis. 12th WSEAS International Confe-
     rence on Computers, Heraklion, Greece, 2008.
[12] H. Meral, B. Sankur, A. Sumru, T. Güngör, E. Sevinç, Natural language watermarking
     via morphosyntactic alterations. Computer Speech and Language, 23, pp. 107-125, 2009.
[13] Z. Jalil, A. Mirza, A Review of Digital Watermarking Techniques for Text Documents.
     International Conference on Information and Multimedia Technology, pp. 230-234,
     IEEE, 2009.
[14] M. Atallah, C. McDonough, S. Nirenburg, V. Raskin, Natural Language Processing for
     Information Assurance and Security: An Overview and Implementations. Proceedings
     9th ACM/SIGSAC New Security Paradigms Workshop, pp. 51-65, 2000.
[15] H. Meral, E. Sevinc, E. Unkar, B. Sankur, A. Ozsoy, T. Gungor, Syntactic tools for text
     watermarking. In Proc. of the SPIE International Conference on Security, Steganography,
     and Watermarking of Multimedia Contents, pp. 65050X-65050X-12, 2007.
[16] O. Vybornova, B. Macq., Natural Language Watermarking and Robust Hashing Based
     on Presuppositional Analysis. IEEE International Conference on Information Reuse and
     Integration, IEEE, 2007.
[17] M. Atallah, V. Raskin, C. Hempelmann, et al., Natural language watermarking and
     tamperproofing. Proc. of 5th International Information Hiding Workshop, Noordwijkerhout,
     Netherlands, pp. 196-212, 2002.


[18] U. Topkara, M. Topkara, M. J. Atallah, The Hiding Virtues of Ambiguity: Quantifiably
     Resilient Watermarking of Natural Language Text through Synonym Substitutions. In
     Proceedings of ACM Multimedia and Security Conference, Geneva, 2006.
[19] Z. Jalil, A. Mirza, H. Jabeen, Word Length Based Zero-Watermarking Algorithm for
     Tamper Detection in Text Documents. 2nd International Conference on Computer Engi-
     neering and Technology, pp. 378-382, IEEE, 2010.
[20] Z. Jalil, A. Mirza, M. Sabir, Content based Zero-Watermarking Algorithm for Authentica-
     tion of Text Documents. (IJCSIS) International Journal of Computer Science and Infor-
     mation Security, Vol. 7, No. 2, 2010.
[21] Z. Jalil, A. Mirza, T. Iqbal, A Zero-Watermarking Algorithm for Text Documents based
     on Structural Components. pp. 1-5, IEEE, 2010.
[22] M. Yingjie, G. Liming, W. Xianlong, G. Tao, Chinese Text Zero-Watermark Based on
     Space Model. In Proceedings of 3rd International Workshop on Intelligent Systems and
     Applications, pp. 1-5, IEEE, 2011.
[23] S. Ranganathan, A. Johnsha, K. Kathirvel, M. Kumar, Combined Text Watermarking. In-
     ternational Journal of Computer Science and Information Technologies, Vol. 1 (5), pp.
     414-416, 2010.
[24] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, “A Zero Text Watermark-
     ing Algorithm based on the Probabilistic weights for Content Authentication of Text
     Documents”, in Proc. On International Journal of Computer Applications(IJCA), U.S.A,
     pp. 388 - 393, 2012.
[25] Fahd N. Al-Wesabi, Adnan Z. Alsakaf and Kulkarni U. Vasantrao, “A Zero Text
     Watermarking Algorithm Based on the Probabilistic Patterns for Content Authentication
     of Text Documents”, International Journal of Computer Engineering & Technology
     (IJCET), Volume 4, Issue 1, 2013, pp. 284 - 300, ISSN Print: 0976 – 6367, ISSN Online:
     0976 – 6375.
[26] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, “English Text Zero-
     Watermark Based on Markov Model of Letter Level Order Two”, Inderscience, Interna-
     tional Journal of Applied Cryptography (IJACT), Submitted.
[27] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, “Content Authentication of
     English Text Documents Using Word Mechanism Order ONE of Markov Model and Ze-
     ro-Watermarking Techniques”, Elsevier, International journal of applied soft compu-
     ting, Submitted.



