Chapter 4 Linked Lists

The problem with arrays

  A built-in array has a fixed size
    For a statically declared array, the size is
     determined at compile time
    Resizing a dynamically allocated array is cumbersome:
     we must allocate a new block and copy everything over
  Setting aside memory that we never use
   is not an efficient solution to a
   problem
Linked Lists

  We could alter the way that we look at an array
   to make it more efficient
  Rather than having a static block of memory
   allocated to an array, what if we only locked off
   what we needed?
     In a manner of speaking, we could make an array of
      one element and then link it somehow to other
      elements as they are added
     We would link it to the other elements via a pointer
How to do a linked list

  We start with the primitive of our list; for
   the sake of argument, we’ll say that we
   are dealing with integers
  We create a new data type that has both
   an integer and a pointer contained within it
    The integer is the data we want to keep
    The pointer points to the next item in the
     list
Inserting items into the
linked list
  If we want to insert an item, we simply
   instantiate our data type
  If the list is unsorted, the new element can go anywhere; for
   example, we set the pointer of the last element in the
   list to point to the newly created element
  If the list is sorted, we find where it goes and
   change two pointers
     The neophyte element copies the pointer value of
      the element before it
     The element before our neophyte has its pointer
      changed to our new element
Deleting items from the
linked list
  To delete an item, we look at the element
   before the one we want to delete
    Change that element’s pointer to the pointer value
     of the one we’re deleting
    We then de-allocate the deleted item to
     ensure that we do not create a memory leak!
    We can leave the deleted element around for a bit if
     we think we’re going to use it again, but be sure that
     you remember to de-allocate it at some point
A holistic look at the
linked list
Why we call it a linked list

 The items in the data structure are linked to
  one another, i.e. the first item points to
  the second which in turn points to the
  third which in turn points to the fourth…
A few pointers

  A pointer variable, more commonly
   referred to as simply a pointer, contains
   the location or address in memory where
   the item in question resides
  By using pointers, we can quickly locate
   data no matter where in memory it
   happens to reside
The Pointer sisters
  Let’s say that Jennifer Pointer moves about
   quite a bit
     She shops at all the trendiest places
     She has many friends that she likes to visit
  If I want to find Jennifer Pointer, I need to know
   some constant about her that will allow me to
   locate her
     Unfortunately, Jennifer is a bit of a technophobe and
      does not like cellular telephones
     However, I do know where she lives
Where in the world is
Jennifer Pointer?
  Since I know where she lives and that she is not given
   to moving her bivouac willy-nilly, I can drive to her
   abode and talk to her more homebody sibling, Melody
   Pointer
  Melody always knows where I can find Jennifer
  Melody tells me where Jennifer is and then I drive to
   that location to see her
  In this case, Melody becomes a pointer to Jennifer;
   she is a static (non-moving) reference to a dynamic
   (moving) resource; I can always find where Jennifer is
   because I know where to find Melody
A picture of what I mean
All this sounds great, but...

  How do I get a pointer variable p to point
   to a memory cell (location)?
  How do you use p to get to the content of
   the memory cell to which p points?
  Whoa there, partner, you’re a bit ahead of
   yourself! First, let us declare p as a
   pointer:
                   int *p;
Be careful with your
syntax!
 int *p, q; and int* p,q;
 are the same as saying:
 int *p;
 int q;

 To declare both as pointers, we do the
  following:
 int *p, *q;
When is this memory
allocated?
  The memory for p and q themselves is allocated
   statically, which means that the compiler
   arranges for it at compile time
  We therefore refer to these variables as
   statically allocated variables.
Working with pointers
  In addition to being careful with our syntax, we
   also need to be a bit careful with how we
   handle pointers
  If we declare an integer x, we cannot just set
   p=x; for this statement will be rejected by the
   compiler because p and x have different
   types (p is an int*, x is an int)
  We can, however, use the address or
   address-of operator, &
                         p = &x;
   which places the address of x into p.
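
  To answer the second question from earlier (how do we use p to get to the content of the cell?), here is a minimal sketch. The helper name addThroughPointer is made up for illustration and is not part of the slides; the point is only that *p reads or writes the cell whose address p holds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the slides): adds amount to the int
// that p points to.
void addThroughPointer(int *p, int amount)
{
    assert(p != NULL);    // never dereference a NULL pointer
    *p = *p + amount;     // *p is the content of the cell p points to
}
```

  After int x = 5; int *p = &x; a call such as addThroughPointer(p, 3) changes x itself, because p holds the address of x.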
Dynamic memory
allocation
  We can make a variable to be allocated at run-time;
   this is called dynamic memory allocation.
  A variable of this type is said to be a dynamically
   allocated variable (real shocker, eh?)
  We accomplish this by use of the C++ keyword new
                       p = new int;
  It is important to note that after executing this
   statement, p holds the address of the new integer, but
   the value of *p is indeterminate; it is not
   initialized to some particular value
  You therefore need to initialize *p to some value to
   prevent weird things from happening to your code!
Unused memory
  Suppose that we no longer require the services
   of a pointer
  We could set the pointer to NULL, which
   makes the pointer point to the language-default
   never-land, i.e. a pointer that is NULL
   must never be dereferenced until its value is
   reassigned to something more tangible such as
   the address 0xfcde0895 (just an example).
  But what if we know we’re not going to use that
   pointer again?
Deleting a variable to
recover memory
  In this case, we want to delete the variable to
   which p points so that its memory is returned
   to the system for reuse
  If we do not do this, the memory cell remains
   allocated to the program, thus producing the
   much-feared memory leak
  We could delete the variable to which p points in the following manner:
                     delete p;
A few caveats on delete
  When we delete p, we de-allocate the variable
   to which p points, not the pointer p itself
    The contents of the memory cell become undefined
    The cell is no longer reserved for our use; the
     system may hand it out again at any time
  p still holds the old address even though the cell
   no longer belongs to us (a dangling pointer);
   referencing *p after we have done a delete
   p; can be disastrous!
  We stave off this problem by assigning
                    p = NULL;
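
  Putting the pieces together, the whole life cycle (new, initialize, delete, set to NULL) might look like the sketch below. The helper names makeInt and destroyInt are hypothetical; they only package the statements shown on the slides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: allocates one int at run time and gives it a value.
int *makeInt(int initialValue)
{
    int *p = new int;     // p holds a valid address; *p is not yet initialized
    *p = initialValue;    // initialize before anyone reads *p
    return p;             // the caller owns the int and must delete it
}

// Hypothetical helper: returns the int to the system and clears the pointer.
// The pointer is passed by reference so that the caller's copy is cleared.
void destroyInt(int *&p)
{
    delete p;             // deleting a NULL pointer is harmless
    p = NULL;             // p no longer dangles
}
```

  Typical use would be int *p = makeInt(42); followed later by destroyInt(p); after which p compares equal to NULL.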
Why delete doesn’t do this

  So why doesn’t delete automatically set
   the value of the pointer to NULL?
  The system cannot always clearly
   determine which pointers should be set to NULL.
    You may have more than one pointer that
     points to that location
    It will therefore remain the responsibility of
     the developer to set each of them to NULL.
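
  A small sketch of the "more than one pointer" situation, under the assumption that we bundle two aliases in a made-up struct called TwoAliases. delete only ever sees the one pointer it is handed, so clearing the other alias is up to us.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: two pointers to one dynamically allocated int.
struct TwoAliases
{
    int *first;
    int *second;
};

TwoAliases makeAliases(int value)
{
    TwoAliases a;
    a.first = new int(value);   // allocate and initialize in one step
    a.second = a.first;         // both pointers now hold the same address
    return a;
}

// delete is given only a.first; it has no way to know a.second exists,
// so our own code must clear every alias by hand.
void releaseBoth(TwoAliases &a)
{
    delete a.first;             // the int is gone; a.second now dangles
    a.first = NULL;
    a.second = NULL;            // clear the alias before anyone uses it
}
```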
An incorrect pointer to a
non-protected node
   Dynamic array allocation
 We can allocate arrays dynamically with a bit of chicanery
int arraySize = 50;
int *anArray = new int[arraySize];
 The pointer variable anArray will point to the first item in the
  array.
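 Since arraySize is a variable, we can choose the size at run time, but
  the array cannot grow once it is allocated; to change its size
      We create a new array of the desired size
      Copy the old array to the new
      Release the old array with delete [] and point anArray at the new one
      This can be inefficient if anArray is sufficiently large!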
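
 The resize steps above can be sketched as follows. The function name resizeArray and its parameters are assumptions made for this example; zero-filling the extra cells is a choice made here, not something the slides prescribe.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: returns a new array of newSize ints that holds the
// first min(oldSize, newSize) items of oldArray, then releases oldArray.
int *resizeArray(int *oldArray, int oldSize, int newSize)
{
    assert(oldSize >= 0 && newSize >= 0);
    int *newArray = new int[newSize]();               // () zero-initializes
    int itemsToCopy = std::min(oldSize, newSize);
    std::copy(oldArray, oldArray + itemsToCopy, newArray);
    delete [] oldArray;                               // arrays need delete []
    return newArray;                                  // caller now owns this
}
```

 A call would look like anArray = resizeArray(anArray, arraySize, 100); every item is copied once, which is exactly the cost the last bullet warns about.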
 Pointer arithmetic
 C++ treats the name of an array as a pointer to the first
  element in the array, e.g.
           *anArray is equivalent to anArray[0]
        *(anArray+1) is the same as anArray[1]
 This is called pointer arithmetic.
 Be careful! Pointer arithmetic counts in elements, not bytes: if a
  pointer points into an array of integers, adding 1 moves it
  sizeof(int) bytes to the next value
    The language guarantees this scaling on every system
    Never add sizeof(int) yourself, or you will skip elements!
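
 A quick way to convince yourself how pointer arithmetic is scaled. Both helper names below are invented for this sketch: one walks an array with a pointer, the other measures how many bytes apart two neighboring ints are.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums n ints by walking a pointer through the array.
int sumWithPointer(const int *first, int n)
{
    int total = 0;
    for (const int *p = first; p != first + n; ++p)   // ++p moves one int
        total += *p;
    return total;
}

// Hypothetical helper: distance in bytes between an int and its neighbor.
std::size_t bytesBetweenNeighbors(const int *p)
{
    const char *here = reinterpret_cast<const char *>(p);
    const char *next = reinterpret_cast<const char *>(p + 1);
    return static_cast<std::size_t>(next - here);     // equals sizeof(int)
}
```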
Deleting the array

  To effectively de-allocate the array, use
   the following notation:
            delete [] anArray;
  Remember that this memory is returned
   to the system for future use
    The old values may still appear to be there, but
     reading them is undefined behavior
    Set the pointer to NULL so that others (and
     you) will not be tempted to use it!
Pointer-based linked lists

  Each component is called a node
  Each node has two components
    The data itself
    A pointer to the next item in the list
  Since each node in the linked list
   contains just two pieces of data and no operations, it
   is natural to conclude that a node
   should be a struct instead of a class.
Some pointers on pointers
  A pointer can point to almost anything:
       Integers
       Chars
       Arrays
       Floats
       Structs
  A pointer cannot, however, point to a file (there
   are special file pointers to do this)
  Therefore a pointer in our node structure can
   point to another node structure
More on linked lists
 We have all the elements pointing to one
   another, but what about the beginning and end
   of the list?
  We have an additional pointer that points to the
   beginning of the list – the head pointer or
   head of the list
     The head is usually set to NULL when the list is
      initialized
     The head is also set to NULL if the list becomes
      empty
  The next pointer of the last item in the linked list is set to
   NULL to indicate that it is indeed the last thing
   on the list.
A picture of the linked list
Displaying the contents of
a linked list
  If we want to show all the elements in a
   linked list, we can simply employ a loop
   to iterate through the entire list displaying
   each one of the elements
  This solution requires that we keep
   around an additional pointer cur which
   keeps track of the current node to which
   we are pointing
Some code
 //Display the data in a linked list
 //Loop invariant: cur points to the next
 //node to be displayed.
 for(Node *cur=head; cur!=NULL; cur=cur->next)
   cout << cur->item << endl;
Some N.B.s on the code

 A common error is to compare cur->next with
  NULL instead of cur.
    When cur points to the last node of a non-empty
     linked list, cur->next == NULL.
    This means you won’t display the last item in the list!
 Displaying a linked list is an example of a
  common operation called list traversal which
  sequentially visits each node in the list until it
  reaches the end of the list.
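
 For something you can compile and poke at, here is a self-contained sketch of list traversal. The Node layout (an int item plus a next pointer) is assumed from the earlier slides, and the helper names listToString and countNodes are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

struct Node            // assumed node layout: an int item plus a next pointer
{
    int item;
    Node *next;
};

// Hypothetical helper: visits every node and returns the items as text,
// each followed by a single space.
std::string listToString(const Node *head)
{
    std::ostringstream out;
    for (const Node *cur = head; cur != NULL; cur = cur->next)
        out << cur->item << ' ';
    return out.str();
}

// Hypothetical helper: counts the nodes with the same traversal pattern.
int countNodes(const Node *head)
{
    int count = 0;
    for (const Node *cur = head; cur != NULL; cur = cur->next)
        ++count;
    return count;
}
```

 Because the loop tests cur rather than cur->next, both helpers also behave for an empty list (head == NULL).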
Deleting a node from the
linked list
  We simply take the next pointer of the
   previous entry in the list and set it equal to
   the next pointer of the item we wish to delete
  The node we have removed still remains in
   existence! It must be de-allocated with delete.
         prev->next = cur->next;
Does this work for every
node in the list?
  Unfortunately, the answer is no.
  If we try to delete the first element in the list,
   there is no previous node, so prev->next is undefined.
  Fortunately, there is a simple solution:
               head = cur->next;
Avoiding a memory leak

  After we unlink a node from the list, the
   node still exists
  We must return the node to the system
   with delete
 cur->next = NULL;
 delete cur;
 cur = NULL;
To delete, we perform 3
tasks
  Locate the node we wish to delete by list
   traversal
    We can delete the ith node
    We can delete a node that has a particular data item
  Disconnect the node from the list by changing
   pointers
  Return the disconnected node to the system
   with delete
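
  The three tasks can be sketched in one function. This is only an illustration under stated assumptions: Node is the int-plus-next struct from the earlier slides, and removeValue is a made-up name for the "delete a node that has a particular data item" variant.

```cpp
#include <cassert>
#include <cstddef>

struct Node            // assumed node layout
{
    int item;
    Node *next;
};

// Hypothetical helper: removes the first node whose item equals target.
// Returns true if a node was removed, false if target was not found.
bool removeValue(Node *&head, int target)
{
    // Task 1: locate the node by traversal, keeping a trailing pointer.
    Node *prev = NULL;
    Node *cur = head;
    while (cur != NULL && cur->item != target)
    {
        prev = cur;
        cur = cur->next;
    }
    if (cur == NULL)
        return false;               // target is not in the list

    // Task 2: disconnect the node by changing pointers.
    if (prev == NULL)
        head = cur->next;           // special case: first node
    else
        prev->next = cur->next;

    // Task 3: return the node to the system.
    cur->next = NULL;
    delete cur;
    return true;
}
```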
Inserting a node in the
linked list
  We do just the opposite of the deletion code
  We create our new node
 newPtr = new Node;
  We initialize our new node with our data
  We traverse the list until we find where we wish to insert the node
 newPtr->next = cur;
 prev->next = newPtr;
Inserting at the beginning
of the linked list
  As you might have suspected, this is a special case
  We just point head to the newly minted node and let
   the neophyte’s next item be the old head
 newPtr->next = head;
 head = newPtr;
To insert, we perform 3
tasks
  Traverse the list to determine the point of
   insertion
  Create a new node and store the new
   data in it
  Connect the new node to the linked list
   by changing pointers
More on inserting

  In order to insert an item in our list, we
   need to keep a trailing pointer prev that
   points to the previous item in the list
  In this manner, we can look at the value
   of the current node and see if the new
   node has to go before it
  We can then back up a node and do the
   insertion
Determining the point of
insertion/deletion
 //Determine the point of insertion/deletion
 //for a sorted linked list
 for (prev = NULL, cur = head;
      (cur != NULL) && (newValue > cur->item);
      prev = cur, cur = cur->next)
    ;   // empty body: the loop header does all the work
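
 Here is how that search loop might sit inside a complete insertion, as a sketch only. Node is the assumed int-plus-next struct and sortedInsert is a hypothetical name; the list is assumed to be kept in ascending order.

```cpp
#include <cassert>
#include <cstddef>

struct Node            // assumed node layout
{
    int item;
    Node *next;
};

// Hypothetical helper: inserts newValue into an ascending sorted list.
void sortedInsert(Node *&head, int newValue)
{
    // Determine the point of insertion: afterwards cur is the first node
    // whose item is >= newValue (or NULL) and prev is the node before it.
    Node *prev = NULL;
    Node *cur = head;
    for ( ; (cur != NULL) && (newValue > cur->item);
          prev = cur, cur = cur->next)
    {
        // empty body
    }

    Node *newPtr = new Node;
    newPtr->item = newValue;
    newPtr->next = cur;

    if (prev == NULL)
        head = newPtr;          // special case: insert at the beginning
    else
        prev->next = newPtr;    // general case: insert after prev
}
```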
Pointer-based ADT List
  Unlike the array-based implementation, there is
   no shifting of items necessary during
   insertion/deletion
    Once the position is found, insertion and deletion only
     change a couple of pointers, however large the list is
    Memory is allocated only for the items actually in the
     list, at the cost of one extra pointer per item
  It also does not impose a strict
   minimum/maximum for the size of the list other
   than the amount of memory available to the
   system
ADT List redefined

  Suppose that we define a function find(i)
   that finds the ith node of the list
  To insert/delete at this node, we also need to
   know the location of the previous node
  Since we’ve taken 2315, we think smartly and
   realize that we could call find(i-1) to find the
   (i-1)th node, which leaves cur pointed to the
   (i-1)th node and cur->next pointed to the ith
   item
More on find()

  It isn’t a specified ADT operation
  Moreover, find() returns a pointer
    Recall that pointers are powerful and we
     don’t want just anyone to have them
    We would therefore not want any client of
     the class to call it
General observation on
ADTs
  In general, it is perfectly reasonable for
   an ADT to define variables and functions
   that the rest of the program should not
   access.
  Many ADTs require a special constructor
   called a copy constructor so that your
   code can correctly handle
        List yourList = myList;
Shallow or deep copy?

  If we only need a shallow copy of a data
   structure, we do not need to provide a
   copy constructor as the compiler’s
   rendition will suffice
  If we need a deep copy (as we do for the
   ADT List), we must provide our own copy
   constructor
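What’s the difference?

  Shallow copy
    Only the data members (size and head) are copied
    Both lists share one set of nodes, so changing or
     destroying one list damages the other
  Deep copy
    Every node is duplicated
    Each list owns its own nodes and can change independently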
Destructors

  Classes that only use statically allocated
   memory can rely upon the compiler-
   generated destructor to free up memory
  However, classes that use dynamically
   allocated memory need to have their own
   custom written destructor that returns all
   used resources to the system
  A destructor for List would be ~List()
 Header file of ADT List
// *********************************************************
// Header file ListP.h for the ADT list.
// Pointer-based implementation.
// *********************************************************
#include <cstddef>   // for NULL
#include "ListException.h"
#include "ListIndexOutOfRangeException.h"
typedef desired-type-of-list-item ListItemType; // placeholder: substitute a real type
class List {
public:
// constructors and destructor:
    List();                         // default constructor
    List(const List& aList);        // copy constructor
    ~List();                        // destructor
// list operations:
    bool isEmpty() const;
    int getLength() const;
    // The exceptions each operation may throw are listed as comments;
    // throw(...) specifications were removed from the language in C++17.
    void insert(int index, ListItemType newItem);
            // throws ListIndexOutOfRangeException, ListException
    void remove(int index);
            // throws ListIndexOutOfRangeException
    void retrieve(int index, ListItemType& dataItem) const;
            // throws ListIndexOutOfRangeException
private:
    struct ListNode     // a node on the list
    {
            ListItemType item;  // a data item on the list
            ListNode    *next;  // pointer to next node
    }; // end struct
    int     size; // number of items in list
    ListNode *head; // pointer to linked list of items
    ListNode *find(int index) const;
    // Returns a pointer to the index-th node
    // in the linked list.
}; //end class
// End of header file.
The implementation file

  Since we can’t put everything on a single page,
   we will do the implementation file piecemeal
  The default constructor is simple:
 List::List(): size(0), head(NULL)
 {
 //Nothing needed here
 }//end default constructor
 Copy constructor
List::List(const List& aList): size(aList.size)
{
    if (aList.head == NULL)
          head = NULL; //original list empty
    else
    {     //Copy first element
          head = new ListNode;
          assert(head != NULL); //check allocation
          head->item = aList.head->item;
          //Copy rest of list
          ListNode *newPtr = head; //Last node in new list
          for (ListNode *origPtr = aList.head->next; origPtr != NULL; origPtr = origPtr->next)
          {
                    newPtr->next = new ListNode;
                    assert(newPtr->next != NULL);
                    newPtr = newPtr->next;
                    newPtr->item = origPtr->item;
          } //end for
          newPtr->next = NULL;
    }
}
Destructor

 We can de-allocate the entire list by
   continually removing an element until the
   list is empty
 List::~List()
 {
   while (!isEmpty())
        remove(1);
 } //end destructor
List operations
 bool List::isEmpty() const
 {
   return bool(size == 0);
 }

 int List::getLength() const
 {
   return size;
 }
More list operations

  Because the list doesn’t allow direct
   access to elements, the retrieval, insertion
   and deletion operations must all traverse
   the list from the beginning until the
   specified point is reached
  Because of this, we define the operation
   find(i).
find(i)
 List::ListNode *List::find(int index) const
 //Locates a node in the list
 //Precondition: index is number of node desired
 //Postcondition: Returns pointer to desired node.   If node not
 // located, NULL returned.
 {
    if ((index < 1) || (index > getLength()))
         return NULL;
    else
    {
         ListNode *cur = head;
         for (int skip = 1; skip < index; ++skip)
                 cur = cur->next;
         return cur;
    }
 }
 retrieve(i)
void List::retrieve(int index, ListItemType& dataItem)
  const
{
  if ((index < 1) || (index > getLength()))
      throw ListIndexOutOfRangeException("Index out of range.");
  else
  {
      ListNode *cur = find(index);
      dataItem = cur->item;
  }
}
insert(i, newItem)
void List::insert(int index, ListItemType newItem)
{
    int newLength = getLength() + 1;
    if ((index < 1) || (index > newLength))
          throw ListIndexOutOfRangeException(
              "ListOutOfRangeException: insert index out of range");
    else
    { // create new node and place newItem in it
          ListNode *newPtr = new ListNode;
          // Note: standard new reports failure by throwing std::bad_alloc,
          // so this check only matters where new can return NULL
          if (newPtr == NULL)
                throw ListException(
                    "ListException: insert cannot allocate memory");
          else
          {
                size = newLength;
                newPtr->item = newItem;
                // attach new node to list
                if (index == 1)
                { // insert new node at beginning of list
                      newPtr->next = head;
                      head = newPtr;
                }
                else
                { // insert new node after node to which prev points
                      ListNode *prev = find(index-1);
                      newPtr->next = prev->next;
                      prev->next = newPtr;
                } //end if
          } //end if
    } //end if
} //end insert
remove(i)
void List::remove(int index)
{
   ListNode *cur;
   if ((index < 1) || (index > getLength()))
         throw ListIndexOutOfRangeException(
             "ListOutOfRangeException: remove index out of range");
   else
   {
         --size;
         if (index == 1)
         { // delete the first node from the list
           cur = head; // save pointer to node
           head = head->next;
         }
         else
         {
                  ListNode *prev = find(index-1);
                  cur = prev->next; //Save pointer to node
                  prev->next = cur->next;
         } //end if
         cur->next = NULL;
         delete cur;
         cur = NULL;
   } //end if
} //end remove
Comparing the array-based
and pointer-based
implementations
  As usual, there are pros and cons to each
   implementation strategy
  You should carefully weigh these pros and cons before
   selecting a strategy
     Arrays have a fixed size
     Arrays have direct access because their elements are stored
      one after the other
         This is called an implicit address
         A pointer-based list has to explicitly store the next address
     Because an array-based implementation doesn’t need address
      information for the next element, it requires less
      memory per item
     Arrays don’t require you to traverse the entire list to find your
      element
     Arrays require you to shift the data anytime you insert or delete
      elements
Saving and restoring a
linked list from a file
  The algorithm that restores a linked list also
   demonstrates how you can build a linked list
   from scratch
  Writing the pointers to a file serves no purpose
   because those pointers become invalid once
   the program terminates
  Therefore, writing out the entire node to a file is
   not an elegant solution
  All you really need to save in the file is the data
   portion of each node (easy to do if each item
   has a fixed size, but a bit trickier if you’re
   storing strings or other variable length data)
More restoring from a file

  You can use the list’s own insert() code to
   keep adding items to your list
  However, each insertion at the end
   requires a traversal to the end of the
   list
    We could save the file in reverse order of the list so
     we always insert at the head of the list
    We could make a tail pointer that points to the
     end of the list
       tail could be local and destroyed after the list is created
       Or it could exist as long as head exists – it’s up to you!
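
  A sketch of the tail-pointer approach, assuming the file simply holds whitespace-separated integers (the slides do not fix a file format). Node is the assumed int-plus-next struct and buildListFromStream is a made-up name. It takes any std::istream, so an open std::ifstream can be passed in just as well.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>     // std::istringstream, handy for trying this without a file

struct Node            // assumed node layout
{
    int item;
    Node *next;
};

// Hypothetical helper: reads whitespace-separated ints from in and appends
// each one to the end of a new list, keeping the file's order.
Node *buildListFromStream(std::istream &in)
{
    Node *head = NULL;
    Node *tail = NULL;             // local; discarded once the list is built
    int value;
    while (in >> value)
    {
        Node *newPtr = new Node;
        newPtr->item = value;
        newPtr->next = NULL;
        if (head == NULL)
            head = newPtr;         // first node: it is also the head
        else
            tail->next = newPtr;   // link after the current last node
        tail = newPtr;             // no traversal needed to find the end
    }
    return head;                   // caller owns the nodes
}
```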
Passing a linked list to a
function
  It is sufficient to pass the head pointer to the function
  This should not be done if the function is outside of
   the class’ scope (remember, pointers are powerful and
   this would violate the wall of the ADT!)
  Recursive functions might need the head pointer as an
   argument
     These must not be in the public section of the class
     This keeps our ADT safe from others
  Pass the head pointer by reference if the function may change it
     Passing the head pointer copies only the pointer (a shallow
      copy); the function still works on the original nodes
     Passing a List object by value causes a deep copy by the copy
      constructor
Recursively processing
linked lists
  If the recursive functions are members of a
   class they should not be public because they
   require the linked list’s head pointer as an
   argument
  One such recursive function would be list
   traversal for writeBackward()
  Another example would be repeated insertion
   which eliminates the need for both a trailing
   pointer and a special case of inserting at the
   beginning of a list
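
  One possible shape for the recursive writeBackward() traversal, offered as a sketch: the signature below (a head pointer plus an output stream) is an assumption, and Node is the assumed int-plus-next struct. Inside a class, such a function would live in the private section, as noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>     // std::ostringstream, handy for capturing the output

struct Node            // assumed node layout
{
    int item;
    Node *next;
};

// Hypothetical version of writeBackward: writes the items from last to
// first, each followed by a single space.
void writeBackward(const Node *headPtr, std::ostream &out)
{
    if (headPtr != NULL)                       // base case: empty list
    {
        writeBackward(headPtr->next, out);     // first the rest, backward
        out << headPtr->item << ' ';           // then this item
    }
}
```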
Repeated insertion

  Suppose we want to insert into a sorted linked
   list with a recursive function
  The linked list to which head points is sorted if
    head == NULL (the list is empty), or
    head->next == NULL (the list has one node), or
    head->item < head->next->item and the
     pointer head->next points to a sorted linked list
  The first two of those cases become our base
   cases
 Some code to do just so!
void linkedListInsert(Node *& headPtr, ItemType newItem)
{
   if ((headPtr == NULL) || (newItem < headPtr->item))
   { //base case: insert newItem at beginning
        Node *newPtr = new Node;
        if (newPtr == NULL)
          throw ListException(
                "ListException: insert cannot allocate memory");
        else
        {
                newPtr->item = newItem;
                newPtr->next = headPtr;
                headPtr = newPtr;
        } //end if
   }
   else
        linkedListInsert(headPtr->next, newItem);
} //end linkedListInsert
Some N.B.s

  The function inserts at one of the base
   cases
    Either when the list is empty
    Or when the data item is smaller than all the
     other data items in the list
  In either one of these cases, you need to
   insert the item at the front of the list
General Insert (yes, sir!)

  The general case in which the item is
   inserted somewhere in the innards of the
   list is very similar
  When the base case is reached, the
   next pointer of the node is the argument
   that corresponds to the headPtr in our
   recursive definition
The general insert case
Variations on a theme

  There are different flavors of a linked list
      Circular(ly) linked lists
      Dummy head nodes
      Doubly linked lists
      Circular(ly) doubly linked lists
  Which one you should use depends on
   what you are trying to do within your
   design
Circular(ly) linked lists
  If we make the next pointer of the last element
   point to the first element in the list, we have
   created a circularly linked list
    No node contains NULL in its pointer
    We must be careful when traversing to the end of
     the list or we will create an infinite loop
    We can save the current pointer and keep
     traversing until we hit that pointer again
  We don’t have to keep track of the head
   pointer, just the current pointer
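
  The "save the current pointer and stop when we see it again" idea can be sketched like this. Node is the assumed int-plus-next struct, and sumCircular is a made-up example operation; any per-node work would follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>

struct Node            // assumed node layout
{
    int item;
    Node *next;
};

// Hypothetical helper: sums the items of a circular list, given a pointer
// to any one of its nodes (NULL means the list is empty).
int sumCircular(const Node *start)
{
    if (start == NULL)
        return 0;
    int total = 0;
    const Node *cur = start;       // remember where we began
    do
    {
        total += cur->item;
        cur = cur->next;
    } while (cur != start);        // stop when we come back around
    return total;
}
```

  A do/while loop fits here because a non-empty circular list always has at least one node to visit before the stopping test makes sense.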
Dummy Head Nodes
 Both the insertion and deletion algorithms
  require a special case for the head node
 The Dummy Head Node method eliminates the
  need for this special case
 The Dummy Head Node is present even when
  the list is empty
 In this implementation, the first item in the list is
  actually the second item in the linked list
 The insertion and deletion algorithms initialize
  prev to point to the dummy head node instead
  of to NULL.
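
 To see the special case disappear, compare this sketch with the earlier head-pointer code. The names sortedInsertWithDummy and removeWithDummy are hypothetical, and Node is the assumed int-plus-next struct. Note that neither function ever has to change a head pointer.

```cpp
#include <cassert>
#include <cstddef>

struct Node            // assumed node layout
{
    int item;
    Node *next;
};

// dummy points to the dummy head node, whose item is never used.
// Hypothetical helper: inserts newValue into an ascending sorted list.
void sortedInsertWithDummy(Node *dummy, int newValue)
{
    assert(dummy != NULL);
    Node *prev = dummy;                    // prev starts at the dummy, not NULL
    while (prev->next != NULL && newValue > prev->next->item)
        prev = prev->next;
    Node *newPtr = new Node;
    newPtr->item = newValue;
    newPtr->next = prev->next;
    prev->next = newPtr;                   // same code for every position
}

// Hypothetical helper: removes the first node holding target, if any.
bool removeWithDummy(Node *dummy, int target)
{
    assert(dummy != NULL);
    Node *prev = dummy;
    while (prev->next != NULL && prev->next->item != target)
        prev = prev->next;
    if (prev->next == NULL)
        return false;                      // target not found
    Node *cur = prev->next;
    prev->next = cur->next;
    delete cur;
    return true;
}
```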
Doubly linked lists
  When deleting a node from a list, it would be handy to
   not have to remember a prev pointer or to have to re-
   traverse the list to find the previous node
  With a doubly linked list, we have two pointers
   packaged with the data item
     A next pointer which points to the next node
     A prev pointer which points to the previous item
  Because there are more pointers, the mechanics of
   doing an insert or delete are a bit more involved,
   especially at the head or tail of the list
  It is common to use a dummy head node with a doubly linked
   list to eliminate these special cases
Circular doubly linked list

  You can take a doubly linked list and
   change the next pointer for the last item
   to make it a circular doubly linked list
  That pointer will now point to the head
   node/dummy head node of the linked list
  Likewise, the prev pointer of the head node/dummy
   head node will point to the last item
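
  A minimal sketch of a circular doubly linked list with a dummy head node. All names here (DNode, makeEmptyList, insertAfter, removeNode) are invented for illustration. With this layout every real node always has a valid prev and next, so insertion and deletion need no special cases and no trailing pointer.

```cpp
#include <cassert>
#include <cstddef>

struct DNode           // assumed layout: an item plus prev and next pointers
{
    int item;
    DNode *prev;
    DNode *next;
};

// Creates the dummy head node of an empty list; it points to itself.
DNode *makeEmptyList()
{
    DNode *dummy = new DNode;
    dummy->item = 0;               // never used
    dummy->prev = dummy;
    dummy->next = dummy;
    return dummy;
}

// Inserts newValue right after the node where; returns the new node.
DNode *insertAfter(DNode *where, int newValue)
{
    assert(where != NULL);
    DNode *newPtr = new DNode;
    newPtr->item = newValue;
    newPtr->prev = where;
    newPtr->next = where->next;
    where->next->prev = newPtr;    // four pointers change in total
    where->next = newPtr;
    return newPtr;
}

// Unlinks and deletes cur. Precondition: cur is a real node, not the dummy.
void removeNode(DNode *cur)
{
    assert(cur != NULL);
    cur->prev->next = cur->next;   // no prev pointer to hunt for
    cur->next->prev = cur->prev;
    delete cur;
}
```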

						