# Chapter 7: Data Structures


Chapter 7: Data Structures
• Earlier in the semester we introduced data types
– Data types: built-in types for the language
• primitive types: integer, real, character, boolean, string (in
some languages)
• more exotic types: arrays, records, pointers
• Here, we introduce data structures
– Data structures: more elaborate data types usually
composed out of primitive types using arrays,
records and pointers
• lists, queues, stacks, trees, graphs, objects
Arrays
• While primitive types allow us to store one item, an array allows us to store many items of the same type
– homogeneous storage
– define type to be stored and array size
• size is denoted by indices
– some languages have fixed lower end (1 or 0)
– other languages allow the user to define the lower and upper ends of the indices
– to access an array element, we must specify an index as in A[i] or A[5]
• We use arrays to store lists
– Arrays offer quick access to any element, known as random access
• easy to work with, pass entire array as one parameter
• we can also use arrays to represent other structures such as queues, stacks and trees
• One drawback of the array is its fixed size
– most languages require that the upper limit on the array size be specified at compile time (or run time)
– too small of an upper limit and your array is too small
– too large of an upper limit and you waste memory
Language Implementation of Arrays
• The language must have a mechanism for
determining the storage location of an array item
given the array index
– Arrays are stored in consecutive memory locations no
matter what the “shape” of the array is
– This conversion mechanism is the mapping function
• 1 dimensional array: location = offset + (i-1)*unit_size
– i is the index, unit_size is the size in bytes of each element
• 2-d array: location = offset + [(i - 1) * c + (j - 1)] * unit_size
– i, j are the indices and c is the number of columns in the array
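• The equation for the 2-d array assumes row-major order; column-major order is also possible
– FORTRAN is the best-known language that uses column-major order (MATLAB and R do as well)
– for column-major order with r rows: location = offset + [(j - 1) * r + (i - 1)] * unit_size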
– see figures 7.1 and 7.2 p. 323
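The mapping functions above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, indices are 1-based as in the slide formulas, and the column-major version (with r rows) is derived by symmetry rather than taken from the text.

```c
#include <stddef.h>

/* 1-d array: location = offset + (i - 1) * unit_size, with i starting at 1. */
size_t addr_1d(size_t offset, size_t i, size_t unit_size) {
    return offset + (i - 1) * unit_size;
}

/* 2-d array, row-major order: c is the number of columns. */
size_t addr_2d_row_major(size_t offset, size_t i, size_t j,
                         size_t c, size_t unit_size) {
    return offset + ((i - 1) * c + (j - 1)) * unit_size;
}

/* 2-d array, column-major order: r is the number of rows.
   (Assumption: obtained by swapping the roles of rows and columns.) */
size_t addr_2d_col_major(size_t offset, size_t i, size_t j,
                         size_t r, size_t unit_size) {
    return offset + ((j - 1) * r + (i - 1)) * unit_size;
}
```

For example, with offset 1000, 4-byte elements and 4 columns, element (2, 3) in row-major order sits 6 elements past the start of the array.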
Lists
• Lists represent ordered or unordered
information that can be accessed randomly
– We can implement this using an array
– We can also implement this using individual
elements connected together by pointers
• A pointer is a variable which stores a memory location
• An advantage of a list made with pointers is that it is
dynamic: it can grow or shrink, so it uses only as much
memory as its current items require
[Figure: a record storing 2 items, a datum and a pointer to another datum]
List Implementations
• Contiguous list: using an array
– Problems:
• fixed size (not dynamic)
• if the list is ordered (sorted) then
– adding into the list requires shifting elements down
– deleting from the list requires shifting elements up
• see figure 7.4 p. 328 for a contiguous list of strings
• Linked list: using pointers
– adding and deleting are simplified, but
– unlike the array, there is no random access; we have
to follow the chain of pointers when searching for
an item, whether the list is sorted or not
More on Pointers
• Pointers are studied in detail in 2380
– here we will only briefly talk about them
• Languages must provide instructions to
– create a data structure that is accessed by a pointer
– destroy a structure accessed by a pointer
– access the block of memory through the pointer
• Languages also have a special pointer value, called NIL (NULL in C),
for a pointer that is currently not pointing at anything
• A list also requires a head pointer (points to first item)
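As a rough illustration of these instructions in C: malloc creates a block, -> accesses it through the pointer, free destroys it, and NULL plays the role of NIL. The struct and function names below are hypothetical.

```c
#include <stdlib.h>

/* One list item: a datum plus a pointer to the next item. */
struct node {
    int datum;
    struct node *next;
};

/* Create: allocate a node on the heap; returns NULL if the heap is exhausted. */
struct node *node_create(int datum) {
    struct node *p = malloc(sizeof *p);
    if (p != NULL) {
        p->datum = datum;   /* access the block through the pointer */
        p->next = NULL;     /* NIL: not pointing at anything yet */
    }
    return p;
}

/* Destroy: release the block and hand back NULL so the caller can
   overwrite its pointer instead of leaving it dangling. */
struct node *node_destroy(struct node *p) {
    free(p);
    return NULL;
}
```

A head pointer is then just a struct node * variable that starts out as NULL and is set to the first node once one exists.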

Adding and deleting in a contiguous list (array):
• To add at position i, we shift the elements from i to n down 1
– for (j = n; j >= i; j--)
a[j+1] = a[j];
a[i] = new_item;
n = n+1;
• To delete the item at position i, we shift the elements from i+1
to n up 1 position each and subtract 1 from n
– for (j = i+1; j <= n; j++)
a[j-1] = a[j];
n = n-1;
• See figure 7.6 p. 330
Adding and deleting in a linked list (with pointers):
• First search for the correct position to do the adding or deleting
• To Add
– New item’s pointer must now point to what previous item’s pointer used
to point to (set this first, before the previous item’s pointer is changed)
– Previous item’s pointer must now point to the new item
• See figure 7.7 p. 330
• To Delete
– Previous item’s pointer must point to the item after the item to be
deleted
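The pointer updates for a linked list can be sketched in C as below. The names are hypothetical, and the sketch assumes the search has already found prev, the item just before the position being changed.

```c
#include <stdlib.h>

struct node {
    int datum;
    struct node *next;
};

/* Add a new item right after prev. Returns the new node, or NULL if
   the heap is exhausted. */
struct node *insert_after(struct node *prev, int datum) {
    struct node *item = malloc(sizeof *item);
    if (item == NULL)
        return NULL;
    item->datum = datum;
    item->next = prev->next;  /* new item points to what prev used to point to */
    prev->next = item;        /* prev now points to the new item */
    return item;
}

/* Delete the item right after prev. Returns 1 if an item was removed,
   0 if prev was already the last item. */
int delete_after(struct node *prev) {
    struct node *victim = prev->next;
    if (victim == NULL)
        return 0;
    prev->next = victim->next;  /* skip over the item being deleted */
    free(victim);
    return 1;
}
```

Note that neither operation shifts any elements; only one or two pointers change, whatever the length of the list.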
Conceptual Lists
– A list must support certain operations:
• sorting
• searching
• accessing the ith item
• deleting
• printing
• destroying
– Once a list is created, with these operations, we can use the list without worrying about its implementation, this is the …
– Notice however that a list implemented as an array has the drawback of static size while the list implemented using pointers has the drawback of no random access
– We use computational complexity analysis to determine which implementation is better for a given situation
• Arrays offer O(1) access, O(log n) searching with binary search (if the array is sorted), O(n) adding and deleting
• Linked lists offer O(n) access, O(n) searching, and O(n) adding and deleting
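To make the O(log n) search claim concrete, here is a minimal binary search sketch in C. It assumes the array is already sorted in ascending order and uses C's 0-based indexing; the function name is hypothetical.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent.
   Each pass halves the range still to be searched, hence O(log n). */
long binary_search(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;              /* key, if present, is in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;               /* discard the lower half */
        else
            hi = mid;                   /* discard the upper half */
    }
    return -1;
}
```

This relies on random access (jumping straight to a[mid]), which is why it does not carry over to a linked list.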
Problems with Pointers
• Pointers cause several kinds of errors, and programmers often
struggle with them until they gain experience
– dangling pointers
• destroying an item but leaving a pointer that still refers to it
– lost objects
• changing a pointer to point at something else, so that nothing points
at the initial object and it can no longer be reached or destroyed
– using pointer arithmetic
• in C, it is common to adjust pointers with integer arithmetic, but this
is error-prone because the result can point outside the intended data
– running out of heap memory
Stacks and Queues
• In a list, you can access any of the elements
• In a stack, you can only access the top element
• In a queue, you can only access the top (head) and bottom (tail) elements
– you insert at the tail and remove at the head
• Interior elements are inaccessible for queues and stacks
Stacks
• Similar to a list but all accesses are made at one location called the top
– Add items only at the top
– Remove item only at the top
– The stack is a LIFO (last in first out) structure
• Always remove the most recently placed item
– Oldest item will be the last removed
• Like a stack of trays in a cafeteria
– Operations:
• create, destroy
• push -- add an item to the top
• pop -- remove an item from the top
• empty and full
• See figure 7.9 p. 333
• Stacks are mostly used for backtracking purposes
– you want to get back to the point you were at before trying the latest move
• Uses/Applications
– used in mazes or game playing to “undo” a move or decision
– a run-time stack keeps track of where you are at in your program
– When a procedure or function is called, the location of the call is pushed onto the run-time stack; returning is easy, just pop that location off the stack
Stack Implementation 1
• Array-based
– array of N items and an integer -- the stack pointer called the top
• Top indicates the index of the top of stack
– pushing: increment Top, insert new item there
– popping: remove item at Top, decrement Top
– full: if Top == N
– empty: if Top == 0
• Notice that the top of the stack moves up and down, the bottom of the stack remains fixed at position 1 in the array
– because the array is fixed size, our stack is limited in size and so, while this implementation is easy, it is a poor choice when the maximum stack size is not known in advance
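A minimal C sketch of the array-based stack described above. The capacity N = 4 and all names are hypothetical; slot 0 of the array is left unused so that positions run from 1 to N as on the slide.

```c
#include <stdbool.h>

#define N 4   /* hypothetical capacity */

struct stack {
    int items[N + 1];   /* positions 1..N are used; slot 0 is unused */
    int top;            /* index of the top item; 0 means empty */
};

void stack_create(struct stack *s) { s->top = 0; }
bool stack_empty(const struct stack *s) { return s->top == 0; }
bool stack_full(const struct stack *s) { return s->top == N; }

/* Push: increment top, then store the new item there. */
bool stack_push(struct stack *s, int item) {
    if (stack_full(s))
        return false;
    s->top++;
    s->items[s->top] = item;
    return true;
}

/* Pop: take the item at top, then decrement top. */
bool stack_pop(struct stack *s, int *out) {
    if (stack_empty(s))
        return false;
    *out = s->items[s->top];
    s->top--;
    return true;
}
```

Pushing onto a full stack and popping from an empty one are reported through the return value rather than treated as fatal errors; that is a design choice of this sketch, not something the slide prescribes.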
Stack Implementation 2
• Pointer-based
– Use record structures for each stack element, an info field and a
next pointer
• Need pointer, Top, to point to current TOP of the stack, initially NIL
– pushing: create new item, pointed to by temp, set temp’s next pointer
to top, set top to temp
– popping: set temp to top, set top to temp’s next pointer, delete temp
– empty: if top == NIL
– full: assume it is never full
• The pointer-based implementation is not fixed size and only full if you run
out of heap memory
– Unlike the list implemented using pointers, the stack loses no efficiency,
because we never try to randomly access into the stack
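The pointer-based push and pop steps might look like this in C. The names are hypothetical; NULL stands in for NIL, and the caller passes the address of its top pointer so that the functions can update it.

```c
#include <stdbool.h>
#include <stdlib.h>

struct item {
    int info;
    struct item *next;
};

/* Push: create a new item (temp), point it at the old top, make it the top. */
bool push(struct item **top, int info) {
    struct item *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;          /* "full" only when the heap is exhausted */
    temp->info = info;
    temp->next = *top;
    *top = temp;
    return true;
}

/* Pop: remember the top in temp, move top to temp's next, delete temp. */
bool pop(struct item **top, int *out) {
    struct item *temp = *top;
    if (temp == NULL)
        return false;          /* empty stack */
    *out = temp->info;
    *top = temp->next;
    free(temp);
    return true;
}
```

An empty stack is simply a top pointer equal to NULL, so no separate create operation is needed in this sketch.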

Queues
• FIFO structure (first in, first out)
– Used to represent lists that are processed in order
• such as a list of jobs waiting at a printer
• Queues are used throughout the operating system
• Unlike the stack, the queue offers access at two ends:
– at the front end: we remove items
• called the head
– at the rear end: we add items
• called the tail
• Since we access the queue at two points, we need two means of accessing it; we call these the head (or front) pointer and the tail (or rear) pointer
• Operations:
– Enqueue: add item at the tail
– Dequeue: remove item at the head
– Create, Destroy, Empty, Full
Queue Implementation 1
• Array based:
– Use an array of N items and two integers, tail and
head, that indicate the index of the two ends of the
queue
• Enqueue: Add item at q[tail] and increment tail
• Dequeue: remove item at q[head] and increment head
• Empty: if tail == head; Full: if tail == n
– Note that the queue may seem full but there may be free
locations at the front of the array, left behind by earlier dequeues
• So, we use a circular queue instead:
– if tail == n and we want to add, we reset tail = 1 (head wraps around the same way)
– this queue is full if tail == head - 1 (or tail == n when head == 1)
» see fig 7.15 p. 340
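A minimal sketch of the circular queue in C, following the slide's conventions: positions run from 1 to n, tail is the position where the next item will go, and one position is always left unused so that "full" and "empty" can be told apart. The capacity QN = 4 and all names are hypothetical.

```c
#include <stdbool.h>

#define QN 4   /* hypothetical array size; at most QN - 1 items fit */

struct queue {
    int q[QN + 1];   /* positions 1..QN are used; slot 0 is unused */
    int head;        /* position of the next item to remove */
    int tail;        /* position where the next item will be added */
};

/* Advance a position by one, wrapping from QN back to 1. */
static int next_pos(int pos) { return pos == QN ? 1 : pos + 1; }

void queue_create(struct queue *qu) { qu->head = 1; qu->tail = 1; }
bool queue_empty(const struct queue *qu) { return qu->tail == qu->head; }

/* Full when tail is one position behind head, allowing for wrap-around. */
bool queue_full(const struct queue *qu) { return next_pos(qu->tail) == qu->head; }

bool enqueue(struct queue *qu, int item) {
    if (queue_full(qu))
        return false;
    qu->q[qu->tail] = item;
    qu->tail = next_pos(qu->tail);
    return true;
}

bool dequeue(struct queue *qu, int *out) {
    if (queue_empty(qu))
        return false;
    *out = qu->q[qu->head];
    qu->head = next_pos(qu->head);
    return true;
}
```

Because both indices wrap around, locations freed by dequeues at the front of the array are reused by later enqueues.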
Queue Implementation 2
• Pointer based:
– The initial, empty queue has head and tail = NIL
– Enqueue: create a new item pointed to by temp, set tail’s next
pointer to point to temp and set tail to point at temp
• note: if it’s the first item in the list, then we also set HEAD to point at it
– Dequeue: set temp to head, set head to point to the item after
head, delete temp and return it to the heap
• note: if that was the last item in the list, then we also set TAIL to NIL
– Empty: if head and tail are NIL
– Full: assume it is never full
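The pointer-based queue operations can be sketched in C as follows. All names are hypothetical, and NULL stands in for NIL.

```c
#include <stdbool.h>
#include <stdlib.h>

struct qnode {
    int info;
    struct qnode *next;
};

struct queue {
    struct qnode *head;   /* item to be removed next */
    struct qnode *tail;   /* most recently added item */
};

void queue_create(struct queue *q) { q->head = NULL; q->tail = NULL; }
bool queue_empty(const struct queue *q) { return q->head == NULL; }

/* Enqueue: link the new item after the tail, then make it the tail. */
bool enqueue(struct queue *q, int info) {
    struct qnode *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;             /* heap exhausted */
    temp->info = info;
    temp->next = NULL;
    if (q->tail == NULL)
        q->head = temp;           /* first item: head points at it too */
    else
        q->tail->next = temp;
    q->tail = temp;
    return true;
}

/* Dequeue: unlink the head item and return its storage to the heap. */
bool dequeue(struct queue *q, int *out) {
    struct qnode *temp = q->head;
    if (temp == NULL)
        return false;             /* empty queue */
    *out = temp->info;
    q->head = temp->next;
    if (q->head == NULL)
        q->tail = NULL;           /* queue is empty again */
    free(temp);
    return true;
}
```

Resetting tail when the last item leaves keeps the "head and tail are NIL" test for emptiness true, and prevents tail from dangling at a freed item.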

Trees
• A tree is a structure in which items can have more than one successor
– A parent is a node that has successors; the successors are the parent’s children
– The root node is the first node, which has no parent
– Leaf nodes are at the other end of the tree, nodes with no children
– Siblings (or twins) are nodes that share the same parent node
– A subtree is a structure within a tree that starts at a given node onwards to the leaf nodes, with the given node being a subroot
– The depth of the tree is the number of levels (some texts instead count the number of edges from the root node to the furthest leaf node, which is one less)
– A general tree is a tree in which nodes can have any number of children
– A binary tree is a tree in which nodes have up to 2 children
Why use binary trees?
• While general trees can be used to represent such
information as classification hierarchies,
organizational charts (see fig 7.16 p. 342) and
family trees…
– We use binary trees as a means of ordered storage (a binary search tree)
– Each node has a Left and Right child such that
• The left child (and all nodes in the left child’s subtree) are
less than the given node
• The right child (and all nodes in the right child’s subtree) are
greater than the given node
– Unlike a sorted list in an array, adding and deleting
become easier because no elements have to be shifted
Representing a Tree
• As with lists, we can represent trees using
arrays and pointers
– Array based: root node goes in array index 1,
left child in 2, right child in 3, etc
• Node i has a left child at index 2*i and a right child
at index 2*i+1
– See figure 7.19 p. 345
• Unfortunately, if the tree is sparse, then a lot of the
array locations will be empty (see fig 7.20 p. 346)
– Pointer based: each node has three items, an
information field, a left pointer and a right
pointer (see figure 7.18 p. 344)
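The array-based index arithmetic is short enough to write out in C. The function names are hypothetical, indices start at 1 as on the slide, and the parent function is an addition of this sketch (the slide only gives the two child formulas).

```c
#include <stddef.h>

/* Node i has its left child at index 2*i and its right child at 2*i+1. */
size_t left_child(size_t i)  { return 2 * i; }
size_t right_child(size_t i) { return 2 * i + 1; }

/* Assumption: inverting the formulas above, the parent of node i is at
   index i/2 (integer division); a result of 0 means i is the root. */
size_t parent(size_t i) { return i / 2; }
```

A node missing from the tree still reserves its array slot, which is where the wasted space in a sparse tree comes from.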
Example Binary Tree

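• Level 1 (root): 22
• Level 2: 13 and 41 (children of 22)
• Level 3: 6 and 21 (children of 13); 33 and 48 (children of 41)
• Level 4: 2 and 7 (children of 6); 16 (left child of 21); 30 and 40 (children of 33); 44 and 51 (children of 48)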
Pointer-based Binary Tree
• Each node in the tree has a left and right pointer and an info field (here storing the node’s name)
• To move from one node to another, have a TEMP pointer which is updated to point at TEMP’s LEFT or RIGHT child
[Figure: a Root node with Left Child and Right Child pointers; each child node has its own Left Child and Right Child pointers; TEMP points at the current node]
Trees and Recursion
• If we use the pointer based implementation, then how do we
go from a node to its parent?
– We do not usually have parent pointers in our tree
– So there is no easy way to get back up the tree
• If we use recursion to implement tree algorithms, then we have a LIFO structure
(the run-time stack) that allows us to easily get back to a given node’s parent,
because the parent was the most recently visited node prior to the given node
• We implement traversal, add, delete, print all recursively
– some of these algorithms are given in figures 7.24 and 7.26
– To search the tree, we do not need recursion as we are only going
down the tree, not back up
• The binary search tree algorithm is given in figure 7.22 p. 348
– We will skip the details of the tree here, you will study them in detail in
both 2380 and 3333
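As a rough sketch of these ideas in C (not the textbook's algorithms from figures 7.22, 7.24 and 7.26): adding and traversal are recursive, while searching only moves a temp pointer down the tree. All names are hypothetical, and duplicates are simply ignored.

```c
#include <stdbool.h>
#include <stdlib.h>

struct tnode {
    int info;
    struct tnode *left;
    struct tnode *right;
};

/* Recursive add: returns the root of the subtree after the insertion.
   When a call returns, we are back at the parent automatically. */
struct tnode *tree_add(struct tnode *root, int info) {
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->info = info;
            n->left = NULL;
            n->right = NULL;
        }
        return n;
    }
    if (info < root->info)
        root->left = tree_add(root->left, info);
    else if (info > root->info)
        root->right = tree_add(root->right, info);
    return root;
}

/* Search without recursion: temp only ever moves down the tree. */
bool tree_search(const struct tnode *root, int key) {
    const struct tnode *temp = root;
    while (temp != NULL) {
        if (key == temp->info)
            return true;
        temp = (key < temp->info) ? temp->left : temp->right;
    }
    return false;
}

/* Recursive in-order traversal: copies the values into out in ascending
   order, starting at position count, and returns the new count. */
size_t tree_inorder(const struct tnode *root, int *out, size_t count) {
    if (root == NULL)
        return count;
    count = tree_inorder(root->left, out, count);
    out[count++] = root->info;
    return tree_inorder(root->right, out, count);
}
```

In tree_inorder, the run-time stack remembers each parent while its left subtree is being visited, which is exactly the LIFO behaviour described above.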
Customized Data Types
• You saw in 1380 that C/C++ allows a
programmer to define their own data types
• This allows for customized structures
including records which can be used to define …
– Ex: struct foo {
char info[7];
struct foo *next;    /* pointer to the next foo record */
};
[Figure: a record with two fields, Info and Next]
Abstract Data Types
• One form of user-defined type is the ADT
– The programmer defines the data structure and also the operations that will be used on the data structure
– The data type and data structure are encapsulated into one set of definitions
• In Pascal this would be a Unit, in C++ it is a class, in Ada it is a package (see fig 7.27 for an Ada package for a Stack)
– By encapsulating the structure and operations together, the details are not available for a programmer to misuse
• We will explore ADTs in detail in 2380 and 3333, but for now, let’s consider why we might use them:
– we could use other people’s code (the idea of reusability or off-the-shelf components)
– we can let someone else use our code where we know that they will not be able to misuse it
– promotes the idea of modularity more than procedures by themselves; now we have process modularity and data modularity
Information Hiding
• So that other programmers cannot depend on (or tamper with) how
the data structure is implemented, the details are encapsulated
with the processes and hidden from view
– Different languages achieve information hiding in
different ways
• In C++, it is through private sections in the class or struct
definitions; in Java, through private members of a class
• In Ada, it is also through a private section
– see figure 7.28 p. 357
• In standard Pascal, it is not possible -- one of the reasons that Pascal
is not used much in “real world programming”
The Interface
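• The interface is the means by which outside code can access the ADT
• In C++ and Java, the interface is public and the rest is private (see p. 359)
[Figure: outside code accesses the ADT (structure and procedures) through the interface (procedure names and parameters) by procedure calls]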
Pointers in Machine Language
• Notice that in this chapter, we used pointers to
reference the next item in the list
– A pointer variable stores a memory location rather
than a value
– In our machine language (from chapter 2), we had
• Load register with value from given memory location
• Load register with given value
– To follow a pointer, we need a third kind of load: load
register with value stored in the memory location
pointed at by the given memory location
– This idea of loading a value that is stored at a memory
location that we reference through another memory
location is known as indirect addressing
– We now have three loads:
• Load Direct
• Load Immediate
• Load Indirect
– We will similarly need 2 saves, Save Direct and Save Indirect
– We can enhance our machine language with two additional instructions
– Op code D
• DRXY
• Load register R with the value stored at the memory location
that is stored in XY
– Op code E
• ERXY
• Store the value in register R at the memory location that is
stored at memory location XY
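The difference between the three loads can be illustrated with a tiny simulation in C. This is only a sketch: the machine layout (256 one-byte memory cells, 16 registers) and all names are assumptions made for the example, and instruction decoding is left out.

```c
#include <stdint.h>

#define MEM_SIZE 256   /* assumed: one-byte addresses XY in 00..FF */
#define NUM_REGS 16    /* assumed: registers 0..F */

struct machine {
    uint8_t mem[MEM_SIZE];
    uint8_t reg[NUM_REGS];
};

/* Load Direct: R gets the value stored at memory location XY. */
void load_direct(struct machine *m, int r, uint8_t xy) {
    m->reg[r] = m->mem[xy];
}

/* Load Immediate: R gets the value XY itself. */
void load_immediate(struct machine *m, int r, uint8_t xy) {
    m->reg[r] = xy;
}

/* Load Indirect (op code D): XY holds a pointer; R gets the value
   stored at the location that pointer refers to. */
void load_indirect(struct machine *m, int r, uint8_t xy) {
    m->reg[r] = m->mem[m->mem[xy]];
}

/* Save Direct: the value in R is stored at memory location XY. */
void save_direct(struct machine *m, int r, uint8_t xy) {
    m->mem[xy] = m->reg[r];
}

/* Save Indirect (op code E): the value in R is stored at the location
   whose address is held in memory location XY. */
void save_indirect(struct machine *m, int r, uint8_t xy) {
    m->mem[m->mem[xy]] = m->reg[r];
}
```

If location 10 holds the address 20 and location 20 holds a value, a direct load from 10 yields the address, while an indirect load from 10 yields the value, which is what following a list's next pointer requires.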