Steganography and Digital Watermarking


           Jonathan Cummins,
           Patrick Diskin,
           Samuel Lau,
           Robert Parlett.
Steganography (covered writing, covert channels)

   • Protection against detection (data hiding)

   • Protection against removal (document marking)
      – Watermarking (all objects are marked in the same way)
      – Fingerprinting (every object is marked individually, so each one can be identified)

                                                                                   Source: Richard Popa.
             Why not Encryption?

• Steganography: hide the existence of the secret message, but do not encrypt it.
   – Ideally, nobody can see that the two parties are communicating in secret.
   – Looks innocent.

• Encryption: encrypt the message, but do not hide its existence.
   – Anybody can see that the two parties are communicating in secret.
   – Looks suspicious.

                                                                             Source: Richard Popa.
• 440 B.C.
   – Histiaeus shaved the head of his most trusted slave and tattooed a message on it,
     which was hidden once the hair had regrown. The message was to instigate a revolt
     against the Persians.

• 1st and 2nd World Wars
   – German spies used invisible ink to print very small dots on letters.
   – Microdots – Blocks of text or images scaled down to the size of a regular dot.

• Current
   – Special inks are used to write hidden messages on bank notes.
   – Industry demands for digital watermarking and fingerprinting of audio and video.
        Copyright Watermarking
Why is it so important?

• The Internet has led to widespread sharing of information, e.g. digital libraries.

• However, this introduces the problem of ownership.

• People can simply copy shared information and claim it is theirs.

• Without a mark, we cannot detect when people have violated the copyright of material.

• This could cause massive losses in revenue for the copyright holders.

• Global piracy costs the music recording industry over £2.8bn a year.

• Need a way of prosecuting individuals for unauthorised use of copyrighted materials.
          Hiding Information Digitally


• The integrity of the secret data must be preserved after it is embedded in the stego object.

• The stego object must appear unchanged, or almost unchanged, to the naked eye.

• In watermarking, changes to the stego object should have no effect on the watermark.

• We assume the attacker knows that secret data is hidden inside the stego object.
Basic Principle in Steganography
[Figure: block diagram labelled Secret Image, Stego Object, Communications, Decoder]

           Types of Steganography
• Fragile
   – Hidden information destroyed as soon as object is modified.

    – Protocols tend to be easy to implement.

    – Useful in proving objects have not been manipulated and changed e.g.
      evidence in a court of law.

• Robust
   – It should be infeasible to remove the hidden data without degrading the
     perceived quality of the data.

    – Protocols are more complex.

    – One single protocol may not withstand all object manipulations.

    – Useful in copyright watermarking.
      Steganography Techniques

• Binary Files
• Text
   – Document
   – XML
• Images
   – LSB, DCT, Wavelet
• Audio
   – Midi, MP3
• Other Types
Information Hiding in Binary Files
 • If we change or remove something in a binary file, execution could be different.

 • We can use a serial key or the author's logo to achieve copyright protection.

 • Cracks and key generators are widely available for common programs.

 • Method for watermarking the binary source:

 a = 2;                 b = 3;                b = 3;                b = 3;
 b = 3;                 c = b + 3;            a = 2;                c = b + 3;
 c = b + 3;             a = 2;                c = b + 3;            d = b + c;
 d = b + c;             d = b + c;            d = b + c;            a = 2;

 • b, c, d must be done in same order, but a can be executed at any time.
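
The four orderings above can be checked mechanically. The sketch below is a minimal illustration, assuming the dependencies are written out by hand in a dictionary (`DEPENDS_ON` and `valid_orderings` are hypothetical names, not part of any tool). It enumerates every ordering of the four statements and keeps those in which each statement comes after the statements it reads from.

```python
from itertools import permutations

# Hand-written dependencies: each statement lists the statements it must follow.
DEPENDS_ON = {
    "a = 2": [],
    "b = 3": [],
    "c = b + 3": ["b = 3"],
    "d = b + c": ["b = 3", "c = b + 3"],
}


def valid_orderings(depends_on):
    """Return every ordering in which all dependencies are respected."""
    valid = []
    for order in permutations(depends_on):
        position = {stmt: n for n, stmt in enumerate(order)}
        if all(position[dep] < position[stmt]
               for stmt, deps in depends_on.items()
               for dep in deps):
            valid.append(order)
    return valid


orderings = valid_orderings(DEPENDS_ON)
for order in orderings:
    print("; ".join(order))
```

Four orderings survive, one for each position of `a = 2`, so the choice of ordering could in principle carry two bits of watermark.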
Information Hiding in Binary Files
 W = {w1, w2, w3, w4, ..., wn}   (Watermark)    wi ∈ {0, 1}

 • Divide program into n blocks.

 • 0 = code left unchanged, 1 = two instructions are switched.

 • To decode we need the original binary file.

 • Comparing the original and marked binary files, we can recover W.

 • Not resistant to attacks.

 • If the attacker has enough copies, he can recover W.
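
A minimal sketch of the block scheme above, assuming each block is a list of statements whose first two statements are independent, so swapping them does not change the program's behaviour. The names `embed` and `extract` are hypothetical, and a real tool would work on machine instructions rather than strings.

```python
def embed(blocks, bits):
    """Return a marked copy: swap the first two statements of block i when bit i is 1."""
    if len(blocks) != len(bits):
        raise ValueError("need exactly one watermark bit per block")
    marked = []
    for block, bit in zip(blocks, bits):
        block = list(block)  # copy, so the original stays untouched
        if bit == 1:
            block[0], block[1] = block[1], block[0]
        marked.append(block)
    return marked


def extract(original, marked):
    """Recover W by comparing each marked block with the original block."""
    return [0 if o == m else 1 for o, m in zip(original, marked)]


# Two blocks; in each, the first two statements do not depend on each other.
original = [
    ["a = 2", "b = 3", "c = b + 3", "d = b + c"],
    ["x = 1", "y = 5", "z = x + y"],
]
watermark = [1, 0]
marked = embed(original, watermark)
recovered = extract(original, marked)
print(recovered)
```

Decoding needs the original blocks, which matches the point above that the original binary file is required to recover W.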
Information Hiding in Documents
• Originals and photocopied materials look different on paper.

• Electronic originals and copied materials are identical.

• Data hiding in documents uses embedded marks.

• Marks can be the same on every copy or different on each copy.

• Can be achieved by:
    – Altering text formatting,
    – Altering characteristics of characters.

• Alterations not visible but decodable.
Information Hiding in Documents

     •   General protocol for hiding data in documents:
     •   A page of the document is represented by a function, f.
     •   The codeword assigned to the document decides which lines are to be altered.
     •   A differential encoding technique is generally used.
     •   Requires an encoder and a decoder.


[Figure: Original Document and Marked Documents]
 Information Hiding in Documents
• One of three techniques is applied to hide data:

   – Line Shift Coding - vertical shifting of lines
        Lines are shifted slightly up or down.
        The lines to be shifted are decided by the codebook.
        [Figure: a line shifted up slightly, with line spacings labelled h-i and h+i]

   – Word Shift Coding - horizontal spacing between words
        Words are shifted slightly left or right, as decided by the codebook.
        [Figure: "An Example of this" shown twice; in the marked copy “Example” is
        shifted to the left and “this” is shifted to the right]

   – Feature Coding - analyse the document, then pick features to change,
     e.g. text height
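
A minimal sketch of line shift coding with differential decoding, under several assumptions that are not in the slides: a page is reduced to a list of line baseline positions (larger values are lower on the page), every second line carries one bit, its neighbours stay unshifted as references, and a 1 means "shifted down" while a 0 means "shifted up". The function names are hypothetical.

```python
def encode_line_shifts(baselines, bits, shift=1):
    """Shift every second line down (bit 1) or up (bit 0) by `shift` units."""
    carriers = list(range(1, len(baselines) - 1, 2))  # lines with unshifted neighbours
    if len(bits) > len(carriers):
        raise ValueError("not enough lines for this codeword")
    marked = list(baselines)
    for k, bit in zip(carriers, bits):
        marked[k] += shift if bit else -shift
    return marked


def decode_line_shifts(marked, n_bits):
    """Compare the gap above and below each carrier line (differential decoding)."""
    carriers = list(range(1, len(marked) - 1, 2))[:n_bits]
    bits = []
    for k in carriers:
        gap_above = marked[k] - marked[k - 1]
        gap_below = marked[k + 1] - marked[k]
        bits.append(1 if gap_above > gap_below else 0)
    return bits


baselines = [100 + 20 * n for n in range(7)]  # 7 lines, uniform spacing h = 20
codeword = [1, 0, 1]
marked = encode_line_shifts(baselines, codeword)
decoded = decode_line_shifts(marked, len(codeword))
print(marked, decoded)
```

Because the decoder only compares neighbouring gaps, it does not need the original page, only the knowledge of which lines carry bits.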
                    Text Techniques

• White Space manipulation
   – Text viewers do not show white space at the end of lines, so it can carry hidden data.

• Using a document’s grammar to hide information
   – “The auto drives fast on a slippery road over the hill” changed to
     “Over the slope the car travels quickly on an ice-covered street”.

• Encoding text with a different meaning
   – By using a cipher key.
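
The white space manipulation idea above can be sketched in a few lines. This is an illustrative scheme only, assuming one bit per line where a single trailing space means 1 and no trailing space means 0; the function names are hypothetical.

```python
def hide_in_trailing_spaces(lines, bits):
    """Append one trailing space to line n when bit n is 1."""
    if len(bits) > len(lines):
        raise ValueError("cover text has too few lines")
    marked = []
    for n, line in enumerate(lines):
        line = line.rstrip()  # remove any existing trailing white space
        if n < len(bits) and bits[n] == 1:
            line += " "
        marked.append(line)
    return marked


def read_trailing_spaces(lines, n_bits):
    """Read one bit per line from the presence of a trailing space."""
    return [1 if line.endswith(" ") else 0 for line in lines[:n_bits]]


cover = ["Dear Bob,", "see you on Monday.", "Regards,", "Alice"]
secret_bits = [1, 0, 1, 1]
marked = hide_in_trailing_spaces(cover, secret_bits)
recovered = read_trailing_spaces(marked, len(secret_bits))
print(recovered)
```

The mark is fragile: any editor or mail system that strips trailing white space destroys it.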
                    Text Techniques
• Text being hidden: “I'm having a great time learning about computer security”.

• Resulting cover text:

     Dear Friend , Especially for you - this red-hot
     intelligence. We will comply with all removal requests .
     This mail is being sent in compliance with Senate bill
     2116 , Title 9 ; Section 303 ! THIS IS NOT A GET RICH
     SCHEME.Why work for somebody else when you can
     become rich inside 57 weeks . Have you ever noticed
     most everyone has a cellphone & people love
     convenience . Well, now is your chance to capitalize on
     this . WE will help YOU SELL MORE and sell more !
     You are guaranteed to succeed because we take all the
     risk ! But don't believe us . Ms Simpson of Washington
     tried us and says "My only problem now is where to park
     all my cars" . This offer is 100% legal . You will blame
     yourself forever if you don't order now ! Sign up a friend
     and you'll get a discount of 50% . Thank-you for your
     serious consideration of our offer . Dear Decision maker
     ; Thank-you for your interest in our briefing . If you are
     not interested in our publications and wish to . . .
                   Text Techniques


• XML

  – Universal format for structured data and documents.

  – Basic technology for information exchange.

  – Because of this, security is a growing concern, which steganography can help address.

  – Data can be hidden in different components – css, dtd, xsl.
                    Text Techniques

• Using tag structure to hide data

   Stego key:
      <img></img> … 0
      <img/> … 1

   Stego data:
      <img src="foo1.jpg"></img>
      <img src="foo2.jpg"/>
      <img src="foo3.jpg"/>
      <img src="foo4.jpg"/>
      <img src="foo5.jpg"></img>

   Bit String: 01110
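
A minimal sketch of this stego key, assuming the hidden bits are carried only by empty elements and that a regular expression is good enough to tell `<img .../>` from `<img ...></img>` (a production tool would need a tokenizer that preserves the original tag syntax). The function names are hypothetical.

```python
import re

# Matches an empty element written either as <name .../> or as <name ...></name>.
EMPTY_ELEMENT = re.compile(r"<(\w+)\b[^<>]*?(/>|></\1>)")


def encode_tag_structure(sources, bits):
    """Write one <img> element per bit: self-closing for 1, open/close pair for 0."""
    lines = []
    for src, bit in zip(sources, bits):
        if bit == "1":
            lines.append(f'<img src="{src}"/>')
        else:
            lines.append(f'<img src="{src}"></img>')
    return "\n".join(lines)


def decode_tag_structure(xml_text):
    """Read the bit string back from the way each empty element is closed."""
    return "".join(
        "1" if match.group(2) == "/>" else "0"
        for match in EMPTY_ELEMENT.finditer(xml_text)
    )


sources = [f"foo{n}.jpg" for n in range(1, 6)]
stego_data = encode_tag_structure(sources, "01110")
print(stego_data)
print(decode_tag_structure(stego_data))
```

Both spellings mean the same thing to an XML parser, which is why the choice between them is free to carry information.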
                 Text Techniques
• Using white space in tags

   Stego key:
      <tag>, </tag>, or <tag/> … 0
      <tag >, </tag >, or <tag /> … 1

   Stego data:
      <user >
         <name>Alice</name >
         <id >01</id>
      </user>

   Bit String: 101100
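
A minimal decoding sketch for this key, assuming every tag in the document carries one bit and that tags can be found with a simple regular expression. The sample string follows the example above, with a closing `</user>` tag assumed so that the fragment is well-formed and yields six bits. `decode_tag_whitespace` is a hypothetical name.

```python
import re


def decode_tag_whitespace(xml_text):
    """One bit per tag: 1 if white space precedes the closing '>' or '/>', else 0."""
    bits = []
    for tag in re.findall(r"<[^<>]+>", xml_text):
        inner = tag[:-1]              # drop the final '>'
        if inner.endswith("/"):
            inner = inner[:-1]        # drop the '/' of a self-closing tag
        bits.append("1" if inner[-1].isspace() else "0")
    return "".join(bits)


stego_data = "<user ><name>Alice</name ><id >01</id></user>"
print(decode_tag_whitespace(stego_data))
```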
                 Text Techniques

• Containment of elements

                          stego key:
      <favorite><fruit>SOMETHING</fruit></favorite> … 0
      <fruit><favorite>SOMETHING</favorite></fruit> … 1

• Using the order of elements

                        stego key:
      <user><name>NAME</name><id>ID</id></user> … 0
      <user><id>ID</id><name>NAME</name></user> … 1
                 Image Techniques

• Simple Watermarking

   – A simple way of watermarking images is to embed another image into them.

   – This embedded image can be a company logo or name etc.

[Figure: image + embedded image = watermarked image]
                  Image Techniques

• LSB – Least Significant Bit

   – A simple yet effective way of hiding data in an image for any purpose.

   – The least significant bits of the host image are used to hide the most
     significant bits of the hidden image (for image-in-image hiding).

   – The least significant bits can also be used to hide other data types.

   – The next example will show how image-in-image hiding works via this method.
                    Image Techniques

•   Store host image and hidden image in memory.

•   Pick the number of bits of each host pixel in which to hide the hidden image.

•   Scan through the host image and replace its LSBs with the hidden image's MSBs.
    So when 4 bits are used to hide information…

                             Host Pixel: 10110001
                            Secret Pixel: 00111111
                          New Image Pixel: 10110011

•   To extract the hidden image, take the LSBs of each pixel in the new image and
    use them as the MSBs of the corresponding pixel in a recovered image.
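
The pixel example above can be reproduced with two bit operations. This is a minimal sketch for single 8-bit pixel values; the names `hide_pixel` and `reveal_pixel` are hypothetical, and a full implementation would apply them to every pixel (and every colour channel) of the two images.

```python
def hide_pixel(host, secret, bits=4):
    """Replace the `bits` lowest bits of host with the `bits` highest bits of secret."""
    keep_mask = (0xFF << bits) & 0xFF          # e.g. 0b11110000 for bits=4
    return (host & keep_mask) | (secret >> (8 - bits))


def reveal_pixel(stego, bits=4):
    """Move the hidden low bits back to the top; the lost low bits become zero."""
    low_mask = (1 << bits) - 1                 # e.g. 0b00001111 for bits=4
    return (stego & low_mask) << (8 - bits)


host = 0b10110001
secret = 0b00111111
stego = hide_pixel(host, secret)
print(f"{stego:08b}")                 # new image pixel
print(f"{reveal_pixel(stego):08b}")   # recovered (approximate) secret pixel
```

The recovered pixel is only an approximation of the secret pixel, because the secret's four lowest bits were never stored.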
Image Techniques

[Figure: original images and the result at bit level 1]
                    Image Techniques

•   This method works best when both the hidden image and host image have equal
    priority in terms of the number of bits used.

•   Not a very good way of watermarking as it is easy to remove the hidden data.

•   The hidden data can easily be corrupted by noise.

•   The LSB’s can be used to store other information like text – the only limitation is
    the size of the data you wish to store.
                 Image Techniques

• DCT (Discrete Cosine Transform)

• DCTs convert images from the spatial domain to the
  frequency domain.

   – High frequencies correspond to rapidly changing pixel values.
   – Low frequencies correspond to slowly changing pixel values.

• Used in JPEG image compression, and can be used as part of an
  information hiding technique.
                 Image Techniques

• A Quantizer is used as part of the JPEG compression technique.

• It lowers the accuracy of the DCT coefficients which are obtained by
  executing a DCT on 8x8 blocks of the host image.

• These values can be tweaked to be all even or all odd.

                             All even = 1
                              All odd = 0

• An image can store 1 bit of information per 8x8 block.
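
A minimal sketch of the parity rule above, assuming the coefficients of a block have already been transformed and quantized to integers (the DCT and the quantizer themselves are not shown, and a short list stands in for a full 8x8 block). The names `embed_bit` and `read_bit` are hypothetical. Nudging a coefficient away from zero is one arbitrary choice, and the majority vote on decoding is an assumption added so that a few altered coefficients do not flip the bit.

```python
def embed_bit(coeffs, bit):
    """Make every quantized coefficient even (bit 1) or odd (bit 0)."""
    want_even = (bit == 1)
    marked = []
    for c in coeffs:
        is_even = (c % 2 == 0)
        if is_even != want_even:
            c += 1 if c >= 0 else -1   # change parity by moving away from zero
        marked.append(c)
    return marked


def read_bit(coeffs):
    """Majority vote on parity: mostly even means 1, mostly odd means 0."""
    evens = sum(1 for c in coeffs if c % 2 == 0)
    return 1 if 2 * evens >= len(coeffs) else 0


block = [12, -7, 3, 0, 0, 1, -2, 5]    # stand-in for quantized DCT coefficients
marked_one = embed_bit(block, 1)
marked_zero = embed_bit(block, 0)
print(marked_one, read_bit(marked_one))
print(marked_zero, read_bit(marked_zero))
```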
                Image Techniques

• DCT example

    Original            Watermarked           JPEG Compressed

                 Pretty much no difference!
                 Image Techniques

• Wavelet Transformation

   – Wavelets are mathematical functions for image
     compression and digital signal processing.

   – Used in the JPEG2000 standard.

   – Wavelets are better for higher compression levels than the DCT method.

   – Generally wavelets are more robust and are a good way of hiding data.
                    Image Techniques

•   Wavelets are used to store the “detail” in images.

•   They store the high frequency information while the low frequency information is
    stored separately.

•   This allows for high compression as the detail is never lost and yet the low
    frequency image parts can be compressed continually.

•   Same techniques as used with DCT during the quantizer step.

•   Currently an ongoing research area.
                 Sound Techniques
• Midi

   – Midi files are made up of a number of different messages – some of which
     are silent, some of which are audible.

   – A message called Program Change is used to change the current instrument.

   – If a number of these messages are placed together, only the last change takes effect.

   – Store the hidden information in the preceding “fake” program changes!
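
A minimal sketch of the idea, working on (status, data) message pairs rather than a complete MIDI file; delta-times, running status and file headers are left out. A Program Change message has status byte 0xC0 plus the channel number and one data byte in the range 0 to 127, so each fake message can carry 7 bits. The function names and the padding scheme are assumptions for illustration.

```python
PROGRAM_CHANGE = 0xC0  # status byte for Program Change on channel 0


def hide_in_program_changes(secret, real_program, channel=0):
    """Pack secret bytes into fake Program Changes, then append the real one."""
    status = PROGRAM_CHANGE | channel
    bitstring = "".join(f"{byte:08b}" for byte in secret)
    bitstring += "0" * (-len(bitstring) % 7)          # pad to a multiple of 7 bits
    messages = [(status, int(bitstring[i:i + 7], 2))
                for i in range(0, len(bitstring), 7)]
    messages.append((status, real_program))           # the instrument actually wanted
    return messages


def recover_from_program_changes(messages, n_bytes):
    """Read the 7-bit payloads of every message except the last (real) one."""
    bitstring = "".join(f"{program:07b}" for _, program in messages[:-1])
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


messages = hide_in_program_changes(b"Hi", real_program=40)
print(messages)
print(recover_from_program_changes(messages, 2))
```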
                 Sound Techniques
• MP3

  – The data to be hidden is stored as the MP3 file is
    created – in the compression stage.

  – As the sound file is being compressed, data is selectively lost depending on
    the bit rate the user has specified.

  – The hidden data is encoded in the parity bit of this information.

  – To retrieve the data all you need to do is uncompress the MP3 file and read
    the parity bits.
                  Other Techniques

• Video

   – A mixture of both image and sound techniques is used.

• DNA

   – Use different DNA bases to code secret messages via some cipher key.

   – DNA is so small it can be hidden in a dot like the microdot method.
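
A minimal sketch of coding a message as DNA bases, assuming a very simple key in which each pair of bits maps to one base (00 = A, 01 = C, 10 = G, 11 = T). This mapping is purely illustrative and is not the key used by any published scheme; the function names are hypothetical.

```python
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def text_to_dna(message):
    """Encode each ASCII character as four bases, most significant bits first."""
    strand = []
    for byte in message.encode("ascii"):
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def dna_to_text(strand):
    """Decode groups of four bases back into ASCII characters."""
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("ascii")


strand = text_to_dna("Hi")
print(strand)
print(dna_to_text(strand))
```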
            Limitations And Attacks
• Five categories of attacks:

   - Basic attacks take advantage of weaknesses in embedding technique.

   - Robustness attacks attempt to diminish or remove the watermark.

   - Presentation attacks modify the content of the file to prevent detection of
     the watermark.

   - Interpretation attacks involve finding a situation which prevents assertion of
     ownership.

   - Implementation attacks take advantage of poorly implemented software.
                       Basic Attacks

• Simple spread spectrum techniques are vulnerable to timing errors.

• Adjusting the length of an audio file, while leaving the pitch unaffected
  can remove hidden data.
                Robustness Attacks

• Need to cope with common transformations to prevent accidental
  removal of mark.

• Many techniques can survive individual transformations but are
  vulnerable to combinations of them.

• Try to anticipate pirate’s actions and design to cope with them.

• Use of benchmarking can help determine vulnerabilities.
    Robustness Attacks - StirMark
• Performs a series of almost
  unnoticeable distortions to
  attempt to remove mark:

    • Geometric distortion

    • Low frequency deviation

    • Transfer function

• Applying StirMark to (a) and (c)
  produces (b) and (d).
     Presentation Attacks - Mosaic
• Takes advantage of minimum size requirements
  for embedding.

• Split image into small tiles to prevent detection of
  the mark.

• Recombine when displaying.

• Attempt to remove mark inserted by Digimarc.

• Tiles bordered in red show no sign of the mark.

• Even with 16 tiles, 6 still contain the mark.
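
The splitting and recombining steps can be sketched as follows, assuming the image is a plain list of pixel rows and the tiles are keyed by their top-left corner. The function names are hypothetical, and in the attack described above the recombination would be done by the viewer's browser rather than by code like this.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Cut the image into tiles, keyed by the (top, left) corner of each tile."""
    tiles = {}
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            tiles[(top, left)] = [row[left:left + tile_w]
                                  for row in image[top:top + tile_h]]
    return tiles


def recombine(tiles, height, width):
    """Paste every tile back at its recorded position."""
    image = [[None] * width for _ in range(height)]
    for (top, left), tile in tiles.items():
        for dy, row in enumerate(tile):
            image[top + dy][left:left + len(row)] = row
    return image


image = [[10 * r + c for c in range(6)] for r in range(4)]   # 4 rows, 6 columns
tiles = split_into_tiles(image, tile_h=2, tile_w=3)
restored = recombine(tiles, height=4, width=6)
print(len(tiles), restored == image)
```

Each tile is served as a separate file, so a detector that needs a minimum image size sees only fragments, while the viewer sees the complete picture.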
              Interpretation Attacks

• Cannot tell which watermark is inserted first.

• Copyright owner publishes document d with watermark w, i.e. d + w.

• Pirate adds watermark w’ and claims that original is d + w – w’.

• Clear that someone is lying but no way of telling who is genuine owner.
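
The deadlock can be shown with a few numbers. This is a toy illustration only, assuming documents and watermarks are short lists of integers and that embedding is plain element-wise addition; the values are made up.

```python
def add(x, y):
    return [a + b for a, b in zip(x, y)]


def subtract(x, y):
    return [a - b for a, b in zip(x, y)]


d = [100, 120, 90, 110]            # owner's true original
w = [1, -1, 1, -1]                 # owner's watermark
published = add(d, w)              # d + w, the document everyone can see

w_pirate = [-1, 1, 1, -1]          # pirate's watermark w'
pirate_original = subtract(published, w_pirate)   # pirate's claimed original, d + w - w'

# Each party subtracts "their" original from the published document.
owner_evidence = subtract(published, d)
pirate_evidence = subtract(published, pirate_original)
print(owner_evidence == w, pirate_evidence == w_pirate)
```

Both checks succeed, so the arithmetic alone cannot show whose original came first.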
            Implementation Attacks

• If software implementation is poor it can allow some attacks.

• Digimarc requires users to register ID and password.

• Attacker broke into software and disabled password checks.

• Could then change the ID, affecting already marked images and
  bypassing checks for existing marks to overwrite them.

                     Confidentiality   Integrity   Unremovability

Encryption                Yes             No            Yes

Digital Signatures         No            Yes            No

Steganography            Yes/No        Yes/No           Yes
•   Steganography will become increasingly important as more copyrighted material
    becomes available online.

•   Many techniques are not robust enough to prevent detection and removal of
    embedded data.

•   For a technique to be considered robust:
     –   The quality of the media should not noticeably degrade upon embedding data.
     –   Data should be undetectable without secret knowledge, typically the key.
     –   If multiple marks are present they should not interfere with each other.
     –   The marks should survive attacks that don’t degrade the perceived quality of the work.

•   Methods of embedding and detecting are likely to continue to improve.