					      Chapter 7: Data Structures
• Earlier in the semester we introduced data types
  – Data types: built-in types for the language
     • primitive types: integer, real, character, boolean, string (in
       some languages)
     • more exotic types: arrays, records, pointers
• Here, we introduce data structures
  – Data structures: more elaborate data types usually
    composed out of primitive types using arrays,
    records and pointers
     • lists, queues, stacks, trees, graphs, objects
     • leads to ADTs
• While primitive types allow us to store one item, an array
  allows us to store many items of the same type
   – homogeneous storage
   – define type to be stored and array size
       • size is denoted by indices
            – some languages have fixed lower end (1 or 0)
            – other languages allow the user to define the lower and
              upper ends of the indices
   – to access an array element, we must specify an index as in
     A[i] or A[5]
• We use arrays to store lists
   – Arrays offer quick access to any element, known as random access
       • easy to work with, pass entire array as one parameter
       • we can also use arrays to represent other structures such
         as queues, stacks and trees
• One drawback of the array is its fixed size
   – most languages require that the upper limit on the array size
     be specified at compile time (or run time)
   – too small of an upper limit and your array is too small
   – too large of an upper limit and you waste memory
  Language Implementation of Arrays
• The language must have a mechanism for
  determining the storage location of an array item
  given the array index
  – Arrays are stored in consecutive memory locations no
    matter what the “shape” of the array is
  – This conversion mechanism is the mapping function
     • 1 dimensional array: location = offset + (i-1)*unit_size
         – i is the index, unit_size is the size in bytes of each element
     • 2-d array: location = offset + [(i - 1) * c + (j - 1)] * unit_size
         – i, j are the indices and c is the number of columns in the array
     • The equation for the 2-d array assumes row-major order; column-major
       order is also possible
         – column-major order is used in FORTRAN (and in some later
           languages such as MATLAB and R)
         – see figures 7.1 and 7.2 p. 323
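The mapping functions above can be sketched in C. This is only an illustration: the function names are made up for this example, indices are assumed to start at 1 (as in the formulas), and offset is treated as a plain byte address. The column-major version is the analogous formula with r, the number of rows, as an assumed parameter.

```c
#include <stddef.h>

/* 1-d array: location = offset + (i-1)*unit_size, with i starting at 1 */
size_t location_1d(size_t offset, size_t i, size_t unit_size)
{
    return offset + (i - 1) * unit_size;
}

/* 2-d array, row-major: c is the number of columns */
size_t location_2d_row_major(size_t offset, size_t i, size_t j,
                             size_t c, size_t unit_size)
{
    return offset + ((i - 1) * c + (j - 1)) * unit_size;
}

/* 2-d array, column-major: r is the number of rows */
size_t location_2d_col_major(size_t offset, size_t i, size_t j,
                             size_t r, size_t unit_size)
{
    return offset + ((j - 1) * r + (i - 1)) * unit_size;
}
```

For example, in a 3 x 4 array of 4-byte items starting at address 1000, element (2, 3) is at 1024 in row-major order and at 1028 in column-major order.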
• Lists represent ordered or unordered
  information that can be accessed randomly
  – We can implement this using an array
  – We can also implement this using individual
    elements connected together by pointers
     • A pointer is a variable which stores a memory location
     • An advantage of a list made with pointers is that it is
       dynamic: it can grow or shrink, so it uses only as much
       memory as its current items require
     [Figure: a record storing 2 items, a datum and a pointer to another datum]
        List Implementations
• Contiguous list: using an array
  – Problems:
     • fixed size (not dynamic)
     • if the list is ordered (sorted) then
         – adding into the list requires shifting elements down
         – deleting from the list requires shifting elements up
     • see figure 7.4 p. 328 for a contiguous list of strings
• Linked list: using pointers
     • adding and deleting are simplified, but
     • unlike the array, there is no random access: we have
       to follow the chain of pointers when searching for
       an item, whether the list is sorted or not
                  More on Pointers
• Pointers are studied in detail in 2380
  – here we will only briefly talk about them
     • Languages require instructions to
           – create a data structure that is accessed by a pointer
           – destroy a structure accessed by a pointer
           – access the block of memory through the pointer
     • Languages will also have a special value for a pointer,
       called NIL, for when the pointer is currently not pointing at
       anything
     • A list also requires a head pointer (points to first item)

           Adding and Deleting in Lists
• Array based adding and deleting:
• To add at position i, we shift the elements from i to n down 1
  position each
   – for (j = n; j >= i; j--)
         a[j+1] = a[j];
     a[i] = new_item;
     n = n+1;
• To delete the item at position i, we shift the elements from i+1
  to n up 1 position each and subtract 1 from n
   – for (j = i+1; j <= n; j++)
         a[j-1] = a[j];
     n = n-1;
• Pointer-based adding and deleting:
• First search for the correct position to do the adding or deleting
• To Add:
   – New item’s pointer must point to what previous item’s pointer
     points to (set this first, so the rest of the list is not lost)
   – Previous item’s pointer must now point to the new item
       • See figure 7.7 p. 330
• To Delete:
   – Previous item’s pointer must point to the item after the item
     to be deleted
   – Delete item (return to heap)
       • See figure 7.6 p. 330
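The pointer-based steps can be sketched in C as follows. This is a minimal sketch, not the textbook's code: the names struct node, list_add and list_delete are made up here, the list is assumed to hold integers in ascending order, and C's NULL plays the role of NIL. Each function returns the (possibly changed) head pointer.

```c
#include <stdlib.h>

struct node {
    int info;
    struct node *next;
};

/* Add value to a sorted list; returns the head pointer (unchanged if malloc fails). */
struct node *list_add(struct node *head, int value)
{
    struct node *item = malloc(sizeof *item);
    if (item == NULL)
        return head;                      /* out of heap memory */
    item->info = value;
    if (head == NULL || value < head->info) {
        item->next = head;                /* new item becomes the first item */
        return item;
    }
    struct node *prev = head;             /* search for the correct position */
    while (prev->next != NULL && prev->next->info < value)
        prev = prev->next;
    item->next = prev->next;              /* new item points at what prev pointed at */
    prev->next = item;                    /* prev now points at the new item */
    return head;
}

/* Delete the first item equal to value, if any; returns the head pointer. */
struct node *list_delete(struct node *head, int value)
{
    if (head == NULL)
        return NULL;
    if (head->info == value) {
        struct node *rest = head->next;
        free(head);                       /* return the item to the heap */
        return rest;
    }
    struct node *prev = head;
    while (prev->next != NULL && prev->next->info != value)
        prev = prev->next;
    if (prev->next != NULL) {
        struct node *doomed = prev->next;
        prev->next = doomed->next;        /* bypass the item to be deleted */
        free(doomed);
    }
    return head;
}
```

A caller would start with an empty list (a NULL head) and write head = list_add(head, 5); and so on, always storing the returned head.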
                      Conceptual Lists
– A list must support certain operations:
   •   sorting
   •   searching
   •   accessing the ith item
   •   adding
   •   deleting
   •   printing
   •   destroying
– Once a list is created with these operations, we can use
  the list without worrying about its implementation; this is the
  concept behind the ADT
– Notice however that a list implemented as an array has the
  drawback of static size while the list implemented using pointers
  has the drawback of no random access
– We use computational complexity analysis to determine which
  implementation is better for a given situation
   • Arrays offer O(1) access, O(log n) searching with binary search
     (if the array is sorted), O(n) adding and deleting
   • Linked lists offer O(n) access, O(n) searching, and O(n) adding and
     deleting (the pointer changes take O(1), but finding the position
     takes O(n))
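To illustrate why random access matters for searching, here is a sketch of binary search on a sorted array in C. The function name and the use of 0-based indices are choices made for this example only. Each step halves the range still to be searched, which is where the O(log n) comes from; a linked list cannot do this because it has no way to jump to the middle item.

```c
/* Returns the index of target in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int a[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of the remaining range */
        if (a[mid] == target)
            return mid;
        if (a[mid] < target)
            low = mid + 1;                  /* target can only be in the upper half */
        else
            high = mid - 1;                 /* target can only be in the lower half */
    }
    return -1;
}
```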
           Problem with pointers
• Pointers cause several kinds of problems, and programmers
  often make mistakes with them until they are used to them
  – dangling pointers
     • destroying an item while a pointer still points at it
  – lost objects
     • changing a pointer to point at something else when nothing
       else points at the initial object, so the initial object is lost
  – using pointer arithmetic
     • in C, it was common to use integers as pointers and use
       arithmetic to adjust pointers, but this is not a good idea!
  – running out of heap memory
               Stacks and Queues
• In a list, you can access any of the elements
• In a stack, you can only access the top element
• In a queue, you can only access the top (head) and
  bottom (tail) elements
   – you insert at the tail and remove from the head
[Figure: a column of elements; the first is labeled "Top, Head" and the interior elements carry a label about queues and stacks]
• A stack is similar to a list, but all accesses are made at one
  location called the top
   – Add items only at the top
   – Remove item only at the top
   – The stack is a LIFO (last in first out) structure
       • Always remove the most recently placed item
   – Oldest item will be the last removed
       • Like a stack of trays in a cafeteria
   – Operations:
       • create, destroy
       • push -- add an item to the top
       • pop -- remove an item from the top
       • empty and full
• Stacks are mostly used for backtracking purposes
   – you want to get back to the point you were at before trying the
     latest move
• Uses/Applications
   – used in mazes or game playing to “undo” a move or decision
   – OS uses a run-time stack to keep track of where you are at in your
     program
   – When a procedure or function is called, the location of the call is
     pushed onto the stack; returning is easy, just pop the location of
     the procedure call off the run-time stack
       • See figure 7.9 p. 333
             Stack Implementation 1
• Array-based
  – array of N items and an integer, the stack pointer, called Top
     • Top indicates the index of the top of stack
         – pushing: increment Top, insert new item there
         – popping: remove item at Top, decrement Top
         – full: if Top == N
         – empty: if Top == 0
     • Notice that the top of the stack moves up and down, while the
       bottom of the stack remains fixed at position 1 in the array
  – because the array is fixed size, our stack is
    limited in size and so, while this implementation
    is easy, it may not be a good implementation
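A sketch of the array-based stack in C, following the slide's conventions: positions 1 to N are used and Top is 0 when the stack is empty. The names and the capacity STACK_CAPACITY (standing in for N) are assumptions for this example. Because C arrays start at index 0, the array is declared one element larger and index 0 is left unused.

```c
#include <stdbool.h>

#define STACK_CAPACITY 100          /* plays the role of N; value is arbitrary */

struct array_stack {
    int items[STACK_CAPACITY + 1];  /* positions 1..N hold items; index 0 is unused */
    int top;                        /* index of the top item; 0 means empty */
};

void stack_create(struct array_stack *s)      { s->top = 0; }
bool stack_empty(const struct array_stack *s) { return s->top == 0; }
bool stack_full(const struct array_stack *s)  { return s->top == STACK_CAPACITY; }

/* Push: increment top, insert the new item there. Returns false if full. */
bool stack_push(struct array_stack *s, int item)
{
    if (stack_full(s))
        return false;
    s->top++;
    s->items[s->top] = item;
    return true;
}

/* Pop: copy out the item at top, decrement top. Returns false if empty. */
bool stack_pop(struct array_stack *s, int *item)
{
    if (stack_empty(s))
        return false;
    *item = s->items[s->top];
    s->top--;
    return true;
}
```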
             Stack Implementation 2
• Pointer-based
  – Use a record structure for each stack element, with an info field and a
    next pointer
     • Need a pointer, Top, to point to the current top of the stack, initially NIL
         – pushing: create new item, pointed to by temp, set temp’s next pointer
            to top, set top to temp
         – popping: set temp to top, set top to temp’s next pointer,
            delete temp
         – empty: if top == NIL
         – full: assume it is never full
     • The pointer-based implementation is not fixed size and is only full if you run
       out of heap memory
         – Unlike the list implemented using pointers, the stack is not inefficient
            because we never try to randomly access into the stack
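The pointer-based push and pop can be sketched in C like this. The names struct stack_item, linked_push and linked_pop are invented for the example, NULL stands in for NIL, and the top pointer is passed by address so the functions can change it. Unlike the slide, push reports failure if the heap is exhausted rather than assuming the stack is never full.

```c
#include <stdbool.h>
#include <stdlib.h>

struct stack_item {
    int info;
    struct stack_item *next;
};

/* Push: create a new item (temp), point it at the old top, make it the top. */
bool linked_push(struct stack_item **top, int info)
{
    struct stack_item *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;              /* heap memory exhausted */
    temp->info = info;
    temp->next = *top;
    *top = temp;
    return true;
}

/* Pop: set temp to top, move top to temp's next, delete temp. */
bool linked_pop(struct stack_item **top, int *info)
{
    if (*top == NULL)
        return false;              /* empty stack */
    struct stack_item *temp = *top;
    *info = temp->info;
    *top = temp->next;
    free(temp);
    return true;
}
```

An empty stack is simply struct stack_item *top = NULL; and calls look like linked_push(&top, 7).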

• A queue is a FIFO structure (first in, first out)
   – Used to represent lists that are processed in order
       • such as a list of jobs waiting at a printer
       • Queues are used throughout the operating system
• Unlike the stack, the queue offers access at two ends:
   – at the front end: we remove items
       • called the head
   – at the rear end: we add items
       • called the tail
• Since we access the queue at two points, we need two
  means of accessing it; we call these the head (or front)
  pointer and the tail (or rear) pointer
• Operations:
   – Enqueue: add item at the tail
   – Dequeue: remove item at the head
   – Create, Destroy, Empty, Full
             Queue Implementation 1
• Array based:
  – Use an array of N items and two integers, tail and
    head, that indicate the indices of the two ends of the queue
     • Enqueue: Add item at q[tail] and increment tail
     • Dequeue: Remove item at q[head] and increment head
     • Empty: if tail == head; Full: if tail == n
  – Note that the queue may seem full but there may
    be empty spaces at the front if head <> 1!
     • So, we use a circular queue instead:
         – if tail == n and we want to add, we reset tail = 1
           (head wraps around the same way when we remove)
         – this queue is full if tail == head - 1
           (or if tail == n and head == 1)
              » see fig 7.15 p. 340
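A sketch of the circular queue in C. It differs from the slide in one labeled way: indices run from 0 to QUEUE_SLOTS - 1 and wrap with the remainder operator instead of being reset to 1 by hand. All names and the slot count are assumptions for this example. As on the slide, one slot is always left unused so that a full queue (tail just behind head) can be told apart from an empty one (tail equal to head).

```c
#include <stdbool.h>

#define QUEUE_SLOTS 8          /* arbitrary; the queue holds at most QUEUE_SLOTS - 1 items */

struct cqueue {
    int q[QUEUE_SLOTS];
    int head;                  /* index of the next item to remove */
    int tail;                  /* index of the next free slot */
};

void cqueue_create(struct cqueue *c)      { c->head = 0; c->tail = 0; }
bool cqueue_empty(const struct cqueue *c) { return c->tail == c->head; }

/* Full when advancing tail by one (with wrap-around) would reach head. */
bool cqueue_full(const struct cqueue *c)
{
    return (c->tail + 1) % QUEUE_SLOTS == c->head;
}

/* Enqueue: add item at q[tail], then advance tail, wrapping to the start. */
bool cqueue_enqueue(struct cqueue *c, int item)
{
    if (cqueue_full(c))
        return false;
    c->q[c->tail] = item;
    c->tail = (c->tail + 1) % QUEUE_SLOTS;
    return true;
}

/* Dequeue: remove item at q[head], then advance head, wrapping to the start. */
bool cqueue_dequeue(struct cqueue *c, int *item)
{
    if (cqueue_empty(c))
        return false;
    *item = c->q[c->head];
    c->head = (c->head + 1) % QUEUE_SLOTS;
    return true;
}
```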
           Queue Implementation 2
• Use a linked list accessed through two pointers, head and tail
   – The initial, empty queue has head and tail = NIL
   – Enqueue: create a new item pointed to by temp, set tail’s next
     pointer to point to temp and set tail to point at temp
      • note: if it’s the first item in the list, there is no tail item yet,
        and we also set head to point at it
   – Dequeue: set temp to head, set head to point to the item after
     head, delete temp and return it to the heap
      • note: if that was the last item in the list, we also set tail to NIL
   – Empty: if head and tail are NIL
   – Full: assume it is never full
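The linked queue can be sketched in C as below. The type and function names are invented for this example and NULL stands in for NIL. The two special cases are handled explicitly: enqueueing into an empty queue sets head as well as tail, and dequeueing the last item resets tail.

```c
#include <stdbool.h>
#include <stdlib.h>

struct queue_item {
    int info;
    struct queue_item *next;
};

struct linked_queue {
    struct queue_item *head;   /* item to remove next; NULL when empty */
    struct queue_item *tail;   /* most recently added item; NULL when empty */
};

void lqueue_create(struct linked_queue *q)      { q->head = NULL; q->tail = NULL; }
bool lqueue_empty(const struct linked_queue *q) { return q->head == NULL; }

bool lqueue_enqueue(struct linked_queue *q, int info)
{
    struct queue_item *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;              /* heap memory exhausted */
    temp->info = info;
    temp->next = NULL;
    if (q->tail == NULL)
        q->head = temp;            /* first item: head points at it too */
    else
        q->tail->next = temp;      /* old tail's next pointer points to temp */
    q->tail = temp;
    return true;
}

bool lqueue_dequeue(struct linked_queue *q, int *info)
{
    if (q->head == NULL)
        return false;              /* empty queue */
    struct queue_item *temp = q->head;
    *info = temp->info;
    q->head = temp->next;          /* head moves to the item after it */
    if (q->head == NULL)
        q->tail = NULL;            /* that was the last item */
    free(temp);                    /* return temp to the heap */
    return true;
}
```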

• A tree is a structure in which items can have more
  than one successor
   – A parent is a node that has successors; the successors
     are the parent’s children
   – The root node is the first node, which has no parent
   – Leaf nodes are at the other end of the tree, nodes with
     no children
   – Siblings (or twins) are nodes that share the same parent node
   – A subtree is a structure within a tree that runs from a given
     node onwards to the leaf nodes, with the given node being a subroot
   – The depth of the tree is the number of levels, or the number
     of edges from the root node to the furthest leaf node
     (the edge count is one less than the level count)
   – A general tree is a tree in which nodes can have any number of
     children
   – A binary tree is a tree in which nodes have up to 2 children
            Why use binary trees?
• While general trees can be used to represent such
  information as classification hierarchies,
  organizational charts (see fig 7.16 p. 342) and
  family trees…
  – We use binary trees as a means of ordered storage
  – Each node has a Left and Right child such that
     • The left child (and all nodes in the left child’s subtree) are
       less than the given node
     • The right child and all nodes in the right child’s subtree are
       greater than the given node
  – Unlike a sorted list in an array, adding and deleting
    become easier
          Representing a Tree
• As with lists, we can represent trees using
  arrays and pointers
  – Array based: root node goes in array index 1,
    left child in 2, right child in 3, etc
     • Node i has a left child at index 2*i and a right child
       at index 2*i+1
         – See figure 7.19 p. 345
     • Unfortunately, if the tree is sparse, then a lot of the
       array locations will be empty (see fig 7.20 p. 346)
  – Pointer based: each node has three items, an
    information field, a left pointer and a right
    pointer (see figure 7.18 p. 344)
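Both representations can be sketched in a few lines of C. The function and type names are made up for this example. The index functions assume the root is stored at index 1, as on the slide; the parent function is an addition that follows from the same numbering (integer division undoes both 2*i and 2*i+1).

```c
/* Array-based: node i has its children at 2*i and 2*i+1 (root at index 1). */
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }
int parent(int i)      { return i / 2; }   /* gives 0 for the root, which has no parent */

/* Pointer-based: an information field, a left pointer and a right pointer. */
struct tree_node {
    int info;
    struct tree_node *left;    /* NULL when there is no left child */
    struct tree_node *right;   /* NULL when there is no right child */
};
```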
     Example Binary Tree


             13                          41

     6                 21          33           48

2        7        16             30     40    44     51

    Where would you add 20? What about 23?
       Pointer-based Binary Tree
• Each node in the tree has a left and right pointer and an info
  field (here storing the node’s name)
• To move from one node to another, have a TEMP pointer which
  is updated to point at TEMP’s LEFT or RIGHT child
[Figure: a Root node pointing to a Left Child and a Right Child, each of which points to its own Left Child and Right Child]
                  Trees and Recursion
• If we use the pointer based implementation, then how do we
  go from a node to its parent?
   – We do not usually have parent pointers in our tree
   – So there is no easy way to get back up the tree
      • If we use recursion to implement tree algorithms, then we have a
        LIFO structure (the run-time stack) that allows us to easily get back to a given
        node’s parent, because the parent was the most recently visited node prior to the given node
      • We implement traversal, add, delete, print all recursively
           – some of these algorithms are given in figures 7.24 and 7.26
   – To search the tree, we do not need recursion as we are only going
     down the tree, not back up
      • The binary search tree algorithm is given in figure 7.22 p. 348
          – We will skip the details of trees here; you will study them in detail in
            both 2380 and 3333
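The contrast between searching (no recursion needed) and adding or traversing (recursion) can be sketched in C. This is an illustration under assumptions, not the textbook's figures: the names are invented, the tree stores integers, duplicates are ignored, and the traversal copies values into a caller-supplied array instead of printing them.

```c
#include <stdlib.h>

struct bst_node {
    int info;
    struct bst_node *left;
    struct bst_node *right;
};

/* Search without recursion: temp only ever moves down the tree. */
struct bst_node *bst_search(struct bst_node *root, int key)
{
    struct bst_node *temp = root;
    while (temp != NULL && temp->info != key)
        temp = (key < temp->info) ? temp->left : temp->right;
    return temp;                     /* NULL if the key is not in the tree */
}

/* Recursive add: returns the root of the subtree after the key is added.
   When a call returns, we are back at the parent, thanks to the run-time stack. */
struct bst_node *bst_add(struct bst_node *root, int key)
{
    if (root == NULL) {
        struct bst_node *item = malloc(sizeof *item);
        if (item == NULL)
            return NULL;             /* out of heap memory: subtree stays empty */
        item->info = key;
        item->left = NULL;
        item->right = NULL;
        return item;
    }
    if (key < root->info)
        root->left = bst_add(root->left, key);
    else if (key > root->info)
        root->right = bst_add(root->right, key);
    return root;
}

/* Recursive in-order traversal: left subtree, node, right subtree.
   Writes values into out[] starting at position count; returns the new count. */
int bst_inorder(const struct bst_node *root, int out[], int count)
{
    if (root == NULL)
        return count;
    count = bst_inorder(root->left, out, count);
    out[count++] = root->info;
    return bst_inorder(root->right, out, count);
}
```

Because smaller values go left and larger values go right, the in-order traversal visits the values in ascending order.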
         Customized Data Types
• You saw in 1380 that C/C++ allows a
  programmer to define their own data types
• This allows for customized structures
  including records which can be used to define
  linked list elements:
  – Ex: struct foo {
           char info[7];
           struct foo *next;
        };
  [Figure: a record with an Info field and a Next field]
                  Abstract Data Types
• One form of user-defined type is the ADT
   – The programmer defines the data structure and also the operations
     that will be used on the data structure
   – The data type and data structure are encapsulated together
       • In Pascal this would be a Unit, in C++ it is a class, in
         Ada it is a package (see fig 7.27 for an Ada package for a
         Stack)
   – By encapsulating the structure and operations together, the
     details are not available for a programmer to misuse
• We will explore ADTs in detail in 2380 and 3333, but for now,
  let’s consider why we might use them:
   – we could use other people’s code (the idea of reusability or
     off-the-shelf components)
   – we can let someone else use our code where we know that they
     will not be able to misuse it
   – promotes the idea of modularity more than procedures by
     themselves; now we have process modularity and data modularity
               Information Hiding
• In order to make sure that other programmers do
  not know how the data structure is implemented,
  the details are encapsulated with the processes
  and hidden from view
  – Different languages achieve information hiding in
    different ways
     • In C++ and Java, it is through private sections in the class
       or struct definitions
     • In Ada, it is also through a private section
         – see figure 7.28 p. 357
     • In standard Pascal, it is not possible -- one of the reasons that Pascal
       is not used much in “real world programming”
            Example of Using ADTs
• The Interface provides the means by which program code can
  access the ADT
   – in C++ and Java, the Interface is public and the ADT code is
     private (see p.
[Figure: "Your code" declares adt a and accesses a through procedure calls; the calls pass through the Interface (procedure names and parameters) to reach the adt definitions]
    Pointers in Machine Language
• Notice that in this chapter, we used pointers to
  reference the next item in the list
  – A pointer variable stores a memory location rather
    than a value
  – In our machine language (from chapter 2), we had
    two load operations
     • Load register with value from given memory location
     • Load register with given value
  – Unfortunately, we do not have a load that will load
    register with value stored in the memory location
    pointed at by the given memory location
               Addressing Modes
– This idea of loading a value that is stored at a memory
  location that we reference through another memory
  location is known as indirect addressing
– We now have three loads:
   • Load Direct
   • Load Immediate
   • Load Indirect
– We will similarly need 2 saves, Save Direct and Save Indirect
– We can enhance our machine language with two
  additional instructions
– Op code D
   • DRXY
   • Load register R with the value stored at the memory location
     that is stored in XY
– Op code E
   • ERXY
   • Store the value in register R at the memory location that is
     stored at memory location XY
– Addressing modes are covered in detail in 2333
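The three loads and the indirect store can be simulated in C to make the difference concrete. This is a toy model, not the chapter 2 machine itself: the memory size (256 one-byte cells), the register count (16) and the function names are assumptions for this example.

```c
#include <stdint.h>

uint8_t memory[256];   /* assumed: 256 one-byte memory cells */
uint8_t reg[16];       /* assumed: 16 one-byte registers */

/* Load Direct: register gets the value stored at location xy. */
void load_direct(int r, uint8_t xy)    { reg[r] = memory[xy]; }

/* Load Immediate: register gets the value xy itself. */
void load_immediate(int r, uint8_t xy) { reg[r] = xy; }

/* Load Indirect (DRXY): location xy holds an address; load from that address. */
void load_indirect(int r, uint8_t xy)  { reg[r] = memory[memory[xy]]; }

/* Save Direct: store the register at location xy. */
void save_direct(int r, uint8_t xy)    { memory[xy] = reg[r]; }

/* Save Indirect (ERXY): location xy holds an address; store at that address. */
void save_indirect(int r, uint8_t xy)  { memory[memory[xy]] = reg[r]; }
```

Here the cell at xy plays the part of a pointer variable: it stores a memory location rather than a value.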
