Storage
27-Jun-11
Parts of a computer
For purposes of this talk, we will assume three main parts to a computer:
Main memory, once upon a time called core, but these days called RAM (Random Access Memory)
RAM consists of a very long sequence of bits, organized into bytes (8-bit units) or words (longer units)
Peripheral memory, these days called disks (even when they aren’t) or drives
Peripheral memory consists of a very, very long sequence of bits, organized into pages of words or bytes
Peripheral memory is thousands of times slower than RAM
The CPU (Central Processing Unit), which manipulates these bits, and moves them back and forth between main memory and peripheral memory
It’s all bits
Everything in a computer is represented by a sequence of bits: integers, floating point numbers, characters, and, most importantly, instructions
Bits are the ultimate flexible representation, at least until we have working quantum computers, which use qubits (quantum bits)
Modern languages use strong typing to prevent you from accidentally treating a floating point number as a boolean, or a string as an integer
A weakly typed language provides some protection, but there are ways around it
But it wasn’t always this way...
Storage is storage
At one time, words representing machine instructions and words representing data could be intermixed
Strong typing was a thing of the future
It was the programmer’s responsibility to avoid executing data, or doing arithmetic on instructions
Both of these things could be done, either accidentally or deliberately
Machine instructions are just a sequence of bits
They can be manipulated like any other sequence of bits
Hence, programmers could change any instruction into any other instruction (of the same size), or rewrite whole blocks of instructions
A self-modifying program is one that changes its own instructions
Self-modifying programs
Once upon a time, self-modifying programs were thought of as a good thing
Just think of how flexible your programs could be!
...yes, and smoking was once considered good for your health
The usual way to step through an array was by adding one to the address part of a load or store instruction
You could write some really clever self-modifying programs
But, as the poet Piet Hein says:
Here’s a good rule of thumb:
Too clever is dumb.
Preparation for next example
In the next example, we will talk about how a higher-level language might be translated into assembly language
Here are some of the assembly instructions we will use:
The load instruction copies a value from a memory location into a special register called the accumulator
Example: load 53 gets whatever is in location 53 and puts it into the accumulator
The enter instruction puts a given value into the accumulator
Example: enter 53 puts 53 itself into the accumulator
All arithmetic is done in the accumulator
Example: add 53 adds the contents of location 53 to the accumulator
The store instruction copies a value from the accumulator into memory
Example: store 53 puts whatever is in the accumulator into location 53
Procedure calls
Consider the following:
    a = add(b, c);
    ...
    function add(x, y) {
        return x + y;
    }
Here’s how it might have been translated to assembly language in the old days (some values, such as the return address in location 70 and the copies of b and c in locations 71 and 72, are filled in as the program runs):
    42 [     0]                     // a
    43 [    10]                     // b
    44 [    15]                     // c

    20 [load from 43]               // addr of b
    21 [store in 71]
    22 [load from 44]               // addr of c
    23 [store in 72]
    24 [enter 27]                   // the return addr
    25 [store in addr part of 70]
    26 [jump to 73]
    27 [store in 42]                // addr of a
    ...
    70 [jump to 27]                 // gets return addr
    71 [    10]                     // will receive b
    72 [    15]                     // will receive c
    73 [load value at addr 71]
    74 [add value at addr 72]
    75 [jump to 70]
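To see the self-modification actually happen, here is a minimal Java sketch of a toy accumulator machine that runs the listing above. The numeric encoding (opcode * 1000 + address), the opcode numbers, and the halt instruction placed at location 28 are all assumptions made for illustration; the slides do not specify any of them.

```java
// Toy accumulator machine (hypothetical encoding: opcode * 1000 + address).
class ToyMachine {
    static final int HALT = 0, LOAD = 1, ENTER = 2, ADD = 3, STORE = 4,
                     STORE_ADDR = 5, JUMP = 6;

    final int[] mem = new int[100];
    int acc = 0; // the accumulator

    static int instr(int op, int addr) {
        return op * 1000 + addr;
    }

    void run(int pc) {
        while (true) {
            int op = mem[pc] / 1000;
            int addr = mem[pc] % 1000;
            pc++;
            switch (op) {
                case HALT: return;
                case LOAD: acc = mem[addr]; break;        // load from addr
                case ENTER: acc = addr; break;            // enter a value
                case ADD: acc += mem[addr]; break;        // add value at addr
                case STORE: mem[addr] = acc; break;       // store in addr
                case STORE_ADDR:                          // store in addr part of an instruction:
                    mem[addr] = (mem[addr] / 1000) * 1000 + acc; // the program rewrites itself
                    break;
                case JUMP: pc = addr; break;
                default: throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    // Builds memory exactly as in the listing, plus a HALT at 28 (an assumption).
    static ToyMachine addExample() {
        ToyMachine m = new ToyMachine();
        m.mem[42] = 0;  m.mem[43] = 10;  m.mem[44] = 15;  // a, b, c
        m.mem[20] = instr(LOAD, 43);
        m.mem[21] = instr(STORE, 71);
        m.mem[22] = instr(LOAD, 44);
        m.mem[23] = instr(STORE, 72);
        m.mem[24] = instr(ENTER, 27);
        m.mem[25] = instr(STORE_ADDR, 70);
        m.mem[26] = instr(JUMP, 73);
        m.mem[27] = instr(STORE, 42);
        m.mem[28] = instr(HALT, 0);
        m.mem[70] = instr(JUMP, 0);   // address part is filled in at run time
        m.mem[73] = instr(LOAD, 71);
        m.mem[74] = instr(ADD, 72);
        m.mem[75] = instr(JUMP, 70);
        return m;
    }

    public static void main(String[] args) {
        ToyMachine m = addExample();
        m.run(20);
        System.out.println(m.mem[42]);        // prints 25 (a = b + c)
        System.out.println(m.mem[70] % 1000); // prints 27 (the patched return address)
    }
}
```

After the run, location 42 holds 25, and the instruction at 70 has been rewritten by the program itself to jump back to 27.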
Problems with the previous code
In this example, storage was static: you always knew where everything was (and it didn’t move around)
If you called a function, you told it where to return to by storing the return address in the function itself
Hence, you could call the function from (almost) anywhere, and it would find its way back
You stored the parameter values in the function itself
This worked fine until recursion was invented
Recursion requires:
Multiple return addresses
Multiple copies of parameters and local variables
In other words, recursion requires dynamic storage
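Here is a small, hypothetical Java sketch of why a single fixed location per parameter breaks recursion. staticFactorial imitates the old scheme by keeping its parameter in one static field; Java still manages the return addresses on its own stack, so only the parameter storage is static here.

```java
class StaticVsStack {
    // One fixed location for the parameter, shared by every call
    // (imitates old-style static storage).
    static int n;

    static int staticFactorial(int arg) {
        n = arg;                            // store the parameter in the fixed location
        if (n <= 1) return 1;
        int rest = staticFactorial(n - 1);  // the recursive call overwrites n...
        return n * rest;                    // ...so n is 1 here, not our original value
    }

    // Normal Java: each activation gets its own copy of k on the stack.
    static int stackFactorial(int k) {
        return k <= 1 ? 1 : k * stackFactorial(k - 1);
    }

    public static void main(String[] args) {
        System.out.println(staticFactorial(4)); // prints 1 (wrong)
        System.out.println(stackFactorial(4));  // prints 24
    }
}
```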
The end of an era
What really killed off self-modifying programs was the advent of timesharing computers
Multiple users, or at least multiple programs, could share the computer, taking turns
But there isn’t always enough main memory to satisfy everybody
When one program is running, another program (or parts of it) may need to be copied to disk, and brought back in again later
This is really, really slow
If only the data changed, not the program, we wouldn’t have to save the program (which is often the largest part) over and over and over...
Besides, with the new emphasis on understandable programs, self-modifying programs were turning out to be a Really Bad Idea
And think about what a security nightmare self-modifying programs could be!
An aside—compilers and loaders
Although self-modifying code is a bad idea, it is still necessary for computers to be able to create and modify machine instructions
This is what a compiler does: it creates machine instructions
A loader takes a compiled program and puts it somewhere in the computer’s memory
It can’t always put it in the same place, so it has to be able to modify the addresses in the instructions
Still, compilers and loaders don’t modify themselves
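A minimal sketch of what a relocating loader does, in Java. It assumes a hypothetical encoding (opcode * 1000 + address) and that every word of the program is an instruction whose address is relative to the program's start; real loaders instead rely on relocation information to know which words hold addresses.

```java
class Loader {
    // Copies 'program' into 'memory' at 'base', relocating each address part.
    // Assumption: every word is an instruction encoded as opcode * 1000 + address,
    // with the address relative to the start of the program.
    static void load(int[] program, int[] memory, int base) {
        for (int i = 0; i < program.length; i++) {
            int op = program[i] / 1000;
            int addr = program[i] % 1000;
            memory[base + i] = op * 1000 + (addr + base); // patch the address part
        }
    }

    public static void main(String[] args) {
        int[] program = {1002, 6000, 3001}; // three encoded instructions, addresses 2, 0, 1
        int[] memory = new int[200];
        load(program, memory, 50);
        System.out.println(memory[50] + " " + memory[51] + " " + memory[52]); // prints 1052 6050 3051
    }
}
```

The loader modifies the instructions it is loading, never its own code.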
Static and dynamic storage
In the beginning, storage was static: you declared your variables at the beginning of the program, and that was all you got
A procedure or function with, say, three parameters got three words in which to store them
The parameters went in a fixed, known location in memory, assigned to them by the compiler
Recursion had not yet been invented
The programming language Algol 60 was one of the first languages to have recursive functions and procedures
Parameters went onto a stack
Hence, parameters were dynamically assigned to memory locations, not by the compiler, but by the running program itself
Storage was dynamically allocated and deallocated as needed
Stacks
Stacks obey a simple regimen: last in, first out (LIFO)
When you enter a function or procedure or method, storage is allocated for you on the stack
When you leave, the storage is released
In Java, this is even more fine-grained: local variables come and go with individual blocks, and even with the variables declared in a for statement
Since this is so well-defined, your compiler writes the code to do it for you
But it’s still dynamic: it is done by your running program
Since virtually every language supports recursion these days (and all the popular languages do), computers typically provide machine-language instructions to simplify stack operations
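A minimal Java sketch of stack-style allocation, assuming the stack is nothing more than a fixed number of words plus a "top" index. It tracks only which words are in use, not their contents, and the class and method names are hypothetical.

```java
class StackAllocator {
    private final int capacity; // total words available for the stack
    private int top = 0;        // first free word; everything below it is in use

    StackAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Allocate a frame of n words on top of the stack; returns its start index.
    int push(int n) {
        if (top + n > capacity) throw new StackOverflowError();
        int frame = top;
        top += n;
        return frame;
    }

    // Release the frame that starts at 'frame' (and anything pushed after it).
    void pop(int frame) {
        if (frame > top) throw new IllegalStateException("frame already released");
        top = frame;
    }

    int wordsInUse() {
        return top;
    }

    public static void main(String[] args) {
        StackAllocator stack = new StackAllocator(100);
        int caller = stack.push(3);   // words 0-2
        int callee = stack.push(5);   // words 3-7, packed right after the caller
        System.out.println(callee);   // prints 3
        stack.pop(callee);            // callee returns
        stack.pop(caller);            // caller returns
        System.out.println(stack.wordsInUse()); // prints 0
    }
}
```

Because frames are released in the reverse order of allocation, a single index is all the bookkeeping needed, and no gaps can form.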
Heaps
Stacks are great, but they have their limitations
Suppose you want to write a method to read in an array
You enter the method, and declare the array, thus dynamically allocating space for it on the stack
You read values into the array
You return from the method and POOF! your array is gone
You need something more flexible: something where you have control over allocation and deallocation
The invention that allows this (which came somewhat later than the stack, I’m not sure when) is the heap
You explicitly get storage via malloc (C) or new (Java)
The storage remains until you are done with it
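In Java, the method described above does work, because new places the array on the heap; only the local reference disappears when the method returns. A small sketch (readValues is a hypothetical name):

```java
import java.util.Arrays;
import java.util.Scanner;

class ReadArray {
    // The reference 'values' lives in this method's stack frame,
    // but the array object itself is allocated on the heap by new.
    static int[] readValues(Scanner in, int count) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = in.nextInt();
        }
        return values; // the frame goes away; the heap array survives
    }

    public static void main(String[] args) {
        int[] a = readValues(new Scanner("3 1 4 1 5"), 5);
        System.out.println(Arrays.toString(a)); // prints [3, 1, 4, 1, 5]
    }
}
```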
Stacks vs. heaps
Stack allocation and deallocation are very regular
Heap allocation and deallocation are unpredictable
Stack allocation and deallocation are handled by the compiler
Heap allocation is at the whim of the programmer
Heap deallocation may also be up to the programmer (C, C++), or it may be handled by the programming language system (Java)
Values on stacks are typically small and uniform in size
In Java, arrays and objects don’t go on the stack; references to them do
Values on the heap can be any size
Stacks are tightly packed, with no wasted space
Deallocation can leave gaps in the heap
Implementing a heap
A heap is a single large area of storage
When the program requests a block of storage, it is given a pointer (reference) to some part of this storage that is not already in use
The task of the heap routines is to keep track of which parts of the heap are available and which are in use
To do this, the heap routines create a linked list of blocks of varying sizes
Every block, whether available or in use, contains header information about the block
We will describe a simple implementation in which each block header contains two items of information:
A pointer to the next block, and
The size of this block
    pointer to next
    size of block
    User data (an Object)       <- the user gets from here on down
Anatomy of a block
Here is our simple block:
    ptr-2              pointer to next
    ptr-1              size of block
    ptr ... ptr+N-1    User data (an Object): the user gets N words, from here (ptr) to the end of the block
Java Objects hold more information than this (for example, the class of the object)
Notice that our implementation will return a pointer to the first word available to the user
Data with negative offsets are header data
ptr-1 contains the size of this block, including header information (so a block with N user words has size N + 2)
ptr-2 will be used to construct a free space list of available blocks
The heap, I
    0        next = 0
    1        size = 20
    2-19     available                <- free
Initially, the user has no blocks, and the free space list consists of a single block
In our implementation, we will allocate space from the end of the block
To begin, let’s assume that the user asks for a block of two words
The heap, II
    0        next = 0
    1        size = 16
    2-15     available                <- free
    16       next = 0
    17       size = 4
    18-19    ////////////             <- given to user
The user has asked for a block of size 2
The “free” block is reduced in size from 20 to 16 (two words asked for by the user, plus two for a new header)
The new block has size 4 and the next field is not used
Next, assume the user asks for a block of three words
The heap, III
    0        next = 0
    1        size = 11
    2-10     available                <- free
    11       next = 0
    12       size = 5
    13-15    ////////////             <- given to user
    16       next = 0
    17       size = 4
    18-19    ////////////
The user has asked for a block of size 3
The “free” block is reduced in size from 16 to 11 (three words asked for by the user, plus two for a new header)
The new block has size 5 and the next field is not used
Next, assume the user asks for a block of just one word
The heap, IV
    0        next = 0
    1        size = 8
    2-7      available                <- free
    8        next = 0
    9        size = 3
    10       ////////////             <- given to user
    11       next = 0
    12       size = 5
    13-15    ////////////
    16       next = 0
    17       size = 4
    18-19    ////////////
The user has asked for a block of size 1
The “free” block is reduced in size from 11 to 8 (one word for the user, plus two for a new header)
The new block has size 3 and the next field is not used
Next, the user releases the second block (at 13)
The heap, V
    0        next = 0
    1        size = 8
    2-7      available
    8        next = 0
    9        size = 3
    10       ////////////
    11       next = 2
    12       size = 5
    13-15    available                <- free
    16       next = 0
    17       size = 4
    18-19    ////////////
The user has released the block of size 5
The freed block is added to the front of the free space list:
Its next field is set to the old value of free
free is set to point to this block
Next, the user requests a block of size 4
The first block on the free list isn’t large enough (it has size 5, but 4 user words plus a 2-word header need 6), so we have to go to the next free block
The heap, VI
    0        next = 0
    1        size = 2
    2        next = 0
    3        size = 6
    4-7      ////////////             <- given to user
    8        next = 0
    9        size = 3
    10       ////////////
    11       next = 2
    12       size = 5
    13-15    available                <- free
    16       next = 0
    17       size = 4
    18-19    ////////////
The user requests a block of size 4
The size of the free block at 2 is now 2 (just its header), and its next field does not change
The user gets a pointer to the new block
Now the user releases the smallest block (at 10)
Again, this will be added to the beginning of the free space list
The heap, VII
    0        next = 0
    1        size = 2
    2        next = 0
    3        size = 6
    4-7      ////////////
    8        next = 13
    9        size = 3
    10       available                <- free
    11       next = 2
    12       size = 5
    13-15    available
    16       next = 0
    17       size = 4
    18-19    ////////////
The user releases the smallest block (at 10)
The freed block is added to the front of the free space list:
Its next field is set to the old value of free
free is set to point to this block
Now the user requests a block of size 4
Currently, we cannot satisfy this request
We have enough space (the free blocks total 10 words, and the request needs 6), but no single block is large enough
However, free blocks 10 and 13 are adjacent to each other
We can coalesce blocks 10 and 13
The heap, VIII
    0        next = 0
    1        size = 2
    2        next = 0
    3        size = 6
    4-7      ////////////
    8        next = 2
    9        size = 8
    10-15    available                <- free
    16       next = 0
    17       size = 4
    18-19    ////////////
Blocks at 10 and 13 have now been coalesced
The size of the new block is the sum of the sizes of the old blocks (3 + 5 = 8)
We had to adjust the links: the merged block takes over the next field of block 13, which points to 2
Now we can give the user a block of size 4
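The whole walkthrough can be replayed with a small Java simulation of the 20-word heap. It is a sketch under assumptions the slides do not state: address 0 doubles as the "null" link, the free list is searched first-fit, and a free block is split only if it keeps at least its own two-word header (which is what happens to the block at 2 above). As on the anatomy slide, a block's next field is at ptr-2 and its size at ptr-1.

```java
class TinyHeap {
    final int[] mem;
    int free; // pointer to the first free block; 0 means the list is empty

    TinyHeap(int words) {
        mem = new int[words];
        mem[0] = 0;      // next = 0
        mem[1] = words;  // size, including the two header words
        free = 2;        // pointers refer to the first user word
    }

    int next(int p) { return mem[p - 2]; }
    int size(int p) { return mem[p - 1]; }

    // Returns a pointer to n user words, or 0 if no single free block is big enough.
    int allocate(int n) {
        int need = n + 2; // user words plus a new two-word header
        for (int p = free; p != 0; p = next(p)) {
            if (size(p) >= need + 2) {       // the free block must keep its own header
                mem[p - 1] -= need;          // shrink the free block...
                int start = p - 2 + size(p); // ...and carve the new block from its end
                mem[start] = 0;              // next field: unused while allocated
                mem[start + 1] = need;       // size field
                return start + 2;
            }
        }
        return 0;
    }

    // A released block goes on the front of the free space list.
    void release(int p) {
        mem[p - 2] = free;
        free = p;
    }

    // Merge free blocks that are adjacent in memory, until no more merges are possible.
    void coalesce() {
        boolean merged = true;
        while (merged) {
            merged = false;
            search:
            for (int a = free; a != 0; a = next(a)) {
                int prev = 0;
                for (int b = free; b != 0; prev = b, b = next(b)) {
                    if (b == a + size(a)) {            // b's header starts right where a ends
                        if (prev == 0) free = next(b); // unlink b from the free list
                        else mem[prev - 2] = next(b);
                        mem[a - 1] += size(b);         // a absorbs b
                        merged = true;
                        break search;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        TinyHeap h = new TinyHeap(20);
        int b1 = h.allocate(2);  // 18
        int b2 = h.allocate(3);  // 13
        int b3 = h.allocate(1);  // 10
        h.release(b2);
        int b4 = h.allocate(4);  // 4, carved from the block at 2
        System.out.println(b1 + " " + b2 + " " + b3 + " " + b4); // prints 18 13 10 4
        h.release(b3);
        System.out.println(h.allocate(4)); // prints 0: no single block is big enough
        h.coalesce();                      // blocks 10 and 13 merge into one of size 8
        System.out.println(h.allocate(4)); // prints 12
    }
}
```

The unlinking step inside coalesce is the "adjust the links" step: when block 13 is absorbed, block 10 inherits its next field.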
Pointers
Allocating storage from the heap is easy:
Person p = new Person();
In Java, you request storage from the heap with new; there is no other way to get storage on the heap
All Objects are on the heap
In C and C++ you get a pointer to the new storage; in Java you get a reference
The implementation is essentially the same; the difference is that there are more operations on pointers than on references
C and C++ provide operations on pointers
C and C++ let you do arithmetic on pointers, for example, p++;
Pointers are pervasive in C and C++; you can't avoid them
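A short Java sketch of what getting a reference means in practice: assigning a reference copies the reference, not the object. The name field of Person is hypothetical; the slide shows only the constructor call.

```java
class Person {
    String name; // hypothetical field, for illustration only
}

class ReferenceDemo {
    public static void main(String[] args) {
        Person p = new Person();     // the object's storage comes from the heap
        p.name = "Ada";
        Person q = p;                // copies the reference, not the object
        q.name = "Grace";
        System.out.println(p.name);  // prints Grace
        System.out.println(p == q);  // prints true: both refer to the same object
    }
}
```

The only things you can do with p and q are follow them, compare them, and assign them; there is no p++.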
Advantages/disadvantages
Pointers give you:
Greater flexibility and (maybe) convenience
A much more complicated syntax
More ways to create hard-to-find errors
Serious security holes
References give you:
Less flexibility (no pointer arithmetic)
Simpler syntax, more like that of other variables
Much safer programs with fewer mysterious bugs
Pointer arithmetic is inherently unsafe
You can accidentally point to the wrong thing
You cannot be sure of the type of the thing you are pointing to
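Java has no pointer arithmetic; the closest equivalent is an index into an array, and every array access is checked against the array's length. A minimal sketch (safeRead is a hypothetical helper). In C, the same mistake with a pointer is not checked at run time, and reading past the end of an array is undefined behavior.

```java
class BoundsDemo {
    // Returns the element at index i as text, or a message if i is outside the array.
    static String safeRead(int[] data, int i) {
        try {
            return Integer.toString(data[i]);
        } catch (ArrayIndexOutOfBoundsException e) {
            return "rejected index " + i;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(safeRead(data, 2));  // prints 30
        System.out.println(safeRead(data, 3));  // prints rejected index 3
    }
}
```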
Deallocation
There are two potential errors when de-allocating (freeing) storage yourself:
De-allocating too soon, so that you have dangling references (pointers to storage that has been freed and possibly reused)
A dangling reference is not a null link: it points to something (you just don’t know what)
Forgetting to de-allocate, so that unused storage accumulates and you have a memory leak
If you have to de-allocate storage yourself, a good strategy is to keep track of which function or method “owns” the storage
The function that owns the storage is responsible for de-allocating it
Ownership can be transferred to another function or method
You just need a clearly defined policy for determining ownership
In practice, this is easier said than done
Discipline
Most C/C++ advocates say:
It's just a matter of being disciplined
I'm disciplined, even if other people aren't
Besides, there are good tools for finding memory problems
However:
Virtually all large C/C++ programs have memory problems
Garbage collection
Garbage is storage that has been allocated but is no longer accessible to the program
It's easy to create garbage:
Allocate some storage and save the pointer to it in a variable
Assign a different value to that variable
A garbage collector automatically finds and de-allocates garbage
This is far safer (and more convenient) than having the programmer do it
Dangling references cannot happen
Memory leaks, while not impossible, are pretty unlikely
Practically every modern language, not including C++, uses a garbage collector
Garbage collection algorithms
There are two well-known algorithms (and several not so well known ones) for doing garbage collection:
Reference counting
Mark and sweep
Reference counting
When a block of storage is allocated, it includes header data that contains an integer reference count
The reference count keeps track of how many references the program has to that block
Any assignment to a reference variable modifies reference counts
If the variable previously referenced an object (was not null), the reference count of that object is decremented
If the new value is an object (not null), the reference count for the new object is incremented
When a reference count reaches zero, the storage can immediately be garbage collected
For this to work, the reference count has to be at a known displacement from the reference (pointer)
If arbitrary pointer arithmetic is allowed, this condition cannot be guaranteed
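A minimal Java sketch of the bookkeeping described above, with hypothetical class and method names: every store into a reference variable goes through assign(), which adjusts the counts and "frees" a block (here, just records its name) when its count reaches zero. It also reproduces the cycle problem discussed next. One small departure from the order on the slide: it increments before decrementing, so that assigning an object to a variable that already holds it cannot free it by mistake.

```java
import java.util.ArrayList;
import java.util.List;

// A hypothetical reference-counted block with a single outgoing reference.
class RcObject {
    final String name;
    int refCount = 0;
    RcObject field;
    RcObject(String name) { this.name = name; }
}

class RefCounting {
    static final List<String> freed = new ArrayList<>(); // names of "freed" blocks

    // Every assignment to a reference variable goes through here.
    static RcObject assign(RcObject oldValue, RcObject newValue) {
        if (newValue != null) newValue.refCount++;
        if (oldValue != null) decrement(oldValue);
        return newValue;
    }

    static void decrement(RcObject obj) {
        if (--obj.refCount == 0) {
            freed.add(obj.name);          // collected immediately
            RcObject child = obj.field;   // the freed block's own reference goes away too
            obj.field = null;
            if (child != null) decrement(child);
        }
    }

    public static void main(String[] args) {
        RcObject x = assign(null, new RcObject("A"));
        x = assign(x, new RcObject("B"));   // A's count drops to 0, so A is freed
        System.out.println(freed);          // prints [A]

        RcObject c = assign(null, new RcObject("C"));
        RcObject d = assign(null, new RcObject("D"));
        c.field = assign(c.field, d);       // C -> D
        d.field = assign(d.field, c);       // D -> C: a cycle
        c = assign(c, null);                // C's count: 2 -> 1
        d = assign(d, null);                // D's count: 2 -> 1
        System.out.println(freed);          // still [A]: C and D are never freed
    }
}
```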
Problems with reference counting
If object A points to object B, and object B points to object A, then each is referenced, even if nothing else in the program references either one
This fools the garbage collector, which doesn't collect either object A or object B
Thus, reference counting is imperfect and unreliable; memory leaks still happen
However, reference counting is a simple technique and is occasionally used
Mark and sweep
When memory runs low, languages that use mark-and-sweep temporarily pause the program and run the garbage collector
The collector marks every block
It then does an exhaustive search, starting from every reference variable in the program, and unmarks all the storage it can reach
When done, every block that is still marked cannot be accessible from the program; it is garbage that can be freed
In order for this technique to work:
It must be possible to find every block (so they are in a linked list)
It must be possible to find and follow every reference
The mark has to be at a known displacement from the reference
Again, this is not compatible with arbitrary pointer arithmetic
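A minimal Java sketch of the collector described above, with hypothetical names. It follows the slide's order (mark every block, then unmark everything reachable, then free whatever is still marked); many textbooks present the mirror image (mark what is reachable, sweep what is unmarked), which finds the same garbage. The roots list stands in for the program's reference variables.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class Block {
    final String name;
    final List<Block> refs = new ArrayList<>(); // references stored in this block
    boolean marked;
    Block(String name) { this.name = name; }
}

class MarkAndSweep {
    final List<Block> allBlocks = new ArrayList<>(); // every allocated block can be found
    final List<Block> roots = new ArrayList<>();     // the program's reference variables

    Block allocate(String name) {
        Block b = new Block(name);
        allBlocks.add(b);
        return b;
    }

    // Returns the names of the blocks it frees.
    List<String> collect() {
        for (Block b : allBlocks) b.marked = true;  // 1. mark every block
        Deque<Block> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {                // 2. unmark everything reachable
            Block b = toVisit.pop();
            if (!b.marked) continue;                // already visited (handles cycles)
            b.marked = false;
            toVisit.addAll(b.refs);
        }
        List<String> freed = new ArrayList<>();     // 3. still-marked blocks are garbage
        for (Iterator<Block> it = allBlocks.iterator(); it.hasNext(); ) {
            Block b = it.next();
            if (b.marked) {
                freed.add(b.name);
                it.remove();
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        MarkAndSweep heap = new MarkAndSweep();
        Block a = heap.allocate("A");
        Block b = heap.allocate("B");
        Block c = heap.allocate("C");
        Block d = heap.allocate("D");
        heap.allocate("E");                 // never referenced at all
        heap.roots.add(a);
        a.refs.add(b);
        c.refs.add(d);                      // C and D reference each other,
        d.refs.add(c);                      // but nothing reachable references them
        System.out.println(heap.collect()); // prints [C, D, E]
    }
}
```

Unlike reference counting, the cycle between C and D does not prevent collection, because only reachability from the roots matters.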
Problems with mark and sweep
Mark-and-sweep is a complex algorithm that takes substantial time
Unlike reference counting, the basic algorithm must be done all at once: nothing else can be going on
The program stops responding during garbage collection
This is unsuitable for many real-time applications
Garbage collection in Java
Java uses mark-and-sweep (modern JVMs use more elaborate tracing collectors, built on the same idea of finding everything reachable)
Mark-and-sweep is highly reliable, but may cause unexpected slowdowns
You can ask Java to do garbage collection at a time you feel is more appropriate
The call is System.gc();
But this is only a request, and not all implementations respect it
This problem is known and is being worked on
There is also a “Real-time Specification for Java”
No garbage collection in C or C++
C and C++ do not have garbage collection: it is up to the programmer to explicitly free storage when it is no longer needed by the program
C and C++ have pointer arithmetic, which means that pointers might point anywhere
There is no way to do reference counting if the programming language does not have strict control over pointers
There is no way to do mark-and-sweep if the programming language does not have strict control over pointers
Pointer arithmetic and garbage collection are incompatible: it is essentially impossible to have both
The End