SICP 10b
Storage Allocation & Garbage Collection
1 Encoding expressions
Gödel invented a method for encoding statements as numbers. Using a similar method, we can represent expressions as numbers:
(cons x y) → 2^x · 3^y
(car x) → the number of factors of 2 in x
(cdr x) → the number of factors of 3 in x
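For concreteness, here is a minimal sketch of the encoding in ordinary Scheme (my own illustration, not code from the lecture); it assumes the encoded components are non-negative integers (or are themselves encodings), and the names gcons, gcar, gcdr and count-factors are mine:

;; gcons packs two numbers into one; gcar/gcdr recover them
;; by counting prime factors.
(define (gcons x y)
  (* (expt 2 x) (expt 3 y)))

(define (count-factors n k)   ; how many times k divides n
  (if (zero? (remainder n k))
      (+ 1 (count-factors (quotient n k) k))
      0))

(define (gcar z) (count-factors z 2))
(define (gcdr z) (count-factors z 3))

;; (gcar (gcons 4 7)) => 4
;; (gcdr (gcons 4 7)) => 7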
This representation of expressions, however, is not viable in practice: even modest expressions encode into astronomically large numbers.
We are accustomed to visualising expressions in terms of tree structures, but semiconductor manufacturers provide linear memory. How can we represent trees in a linear space?
Representation of ((1 2) 3 4) in linear memory (pN = pointer to index N, nK = the number K, e0 = the empty list):

Index    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
the-cars | -- | p5 | n3 | -- | n4 | n1 | -- | n2 | --
the-cdrs | -- | p2 | p4 | -- | e0 | p7 | -- | e0 | --
This implementation requires vectors.
(vector-ref <vector> <index>)

(assign a (car (fetch b)))
====> (assign a (vector-ref (fetch the-cars) (fetch b)))

(assign a (cdr (fetch b)))
====> (assign a (vector-ref (fetch the-cdrs) (fetch b)))

(perform (set-car! (fetch a) (fetch b)))
====> (perform (vector-set! (fetch the-cars) (fetch a) (fetch b)))

(perform (set-cdr! (fetch a) (fetch b)))
====> (perform (vector-set! (fetch the-cdrs) (fetch a) (fetch b)))
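As a sanity check, here is a small sketch (my own, not from the lecture) that stores the table above in two Scheme vectors and rebuilds the original list; the tagged-cell representation and the name decode are assumptions of the sketch:

(define the-cars
  (vector '() '(p . 5) '(n . 3) '() '(n . 4) '(n . 1) '() '(n . 2) '()))
(define the-cdrs
  (vector '() '(p . 2) '(p . 4) '() '(e . 0) '(p . 7) '() '(e . 0) '()))

;; Rebuild the ordinary Scheme object denoted by a tagged cell:
;; n = number, e = empty list, p = pointer to another heap index.
(define (decode cell)
  (case (car cell)
    ((n) (cdr cell))
    ((e) '())
    ((p) (cons (decode (vector-ref the-cars (cdr cell)))
               (decode (vector-ref the-cdrs (cdr cell)))))))

;; (decode '(p . 1)) => ((1 2) 3 4)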
2 Allocation
The free-list allocation scheme links all of the free memory cells together into a linked list. When a new pair needs to be allocated, take the first cell of the free list and re-assign the free list to be its cdr.
With the free-list method of allocation, cons becomes:

(assign a (cons (fetch b) (fetch c)))
====>
(assign a (fetch free))
(assign free (vector-ref (fetch the-cdrs) (fetch free)))
(perform (vector-set! (fetch the-cars) (fetch a) (fetch b)))
(perform (vector-set! (fetch the-cdrs) (fetch a) (fetch c)))
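The following sketch builds a fresh, empty toy heap in ordinary Scheme to show the same idea (my own illustration, re-using the names the-cars and the-cdrs; heap-size, init-free-list! and my-cons are names I made up, and the register code above is the authoritative version):

(define heap-size 8)
(define the-cars (make-vector heap-size '()))
(define the-cdrs (make-vector heap-size '()))
(define free 0)

;; Chain every cell onto the free list: cell i's cdr holds i+1,
;; and the last cell holds the marker 'end.
(define (init-free-list!)
  (do ((i 0 (+ i 1)))
      ((= i heap-size))
    (vector-set! the-cdrs i
                 (if (= i (- heap-size 1)) 'end (+ i 1)))))

;; Allocate by grabbing the first free cell and advancing free.
(define (my-cons b c)
  (if (eq? free 'end)
      (error "out of memory -- time for the garbage collector")
      (let ((a free))
        (set! free (vector-ref the-cdrs free))
        (vector-set! the-cars a b)
        (vector-set! the-cdrs a c)
        a)))          ; the "pair" is just an index into the heap

;; (init-free-list!)
;; (my-cons 3 'end) => 0, leaving 3 in cell 0's car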
3 Garbage collection
Unfortunately, memory is finite, so cells from previous allocations that are no longer in use need to be reclaimed and reused.
An example source of garbage:

(define (rev-loop x y)
  (if (null? x)
      y
      (rev-loop (cdr x) (cons (car x) y))))

(define (append u v)
  (rev-loop (rev-loop u '()) v))

Many of the pairs allocated in the intermediate steps will not and cannot be accessed again. For example, (append '(1 2) '(3 4)) first builds the reversed list (2 1), whose pairs become unreachable once the final result (1 2 3 4) has been built. There is a technique to prove that such a piece of junk will never be accessed again.
The entire consciousness of the machine is in its registers. If we start with the pointers held in the registers and recursively follow them, marking every place we can reach through the car and cdr selectors, then eventually we will have marked everything that can be reached. Everything else is garbage.
gc
  (assign thing (fetch root))
  (assign continue sweep)
mark
  (branch (not-pair? (fetch thing)) done)
pair
  (assign mark-flag (vector-ref (fetch the-marks) (fetch thing)))
  (branch (= (fetch mark-flag) 1) done)
  (perform (vector-set! (fetch the-marks) (fetch thing) 1))
mcar
  (push thing)
  (push continue)
  (assign continue mcdr)
  (assign thing (vector-ref (fetch the-cars) (fetch thing)))
  (goto mark)
mcdr
  (pop continue)
  (pop thing)
  (assign thing (vector-ref (fetch the-cdrs) (fetch thing)))
  (goto mark)
done
  (goto (fetch continue))
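The same mark phase can be written recursively in ordinary Scheme against the toy heap sketched earlier (again my own illustration; it assumes any in-range integer stored in a cell is a pointer to another cell, where a real system would decide with type tags):

(define the-marks (make-vector heap-size 0))

;; Assumption of the sketch: an in-range index is a pointer;
;; everything else is atomic data.
(define (pair-index? thing)
  (and (integer? thing) (<= 0 thing) (< thing heap-size)))

;; Mark everything reachable from thing.  The recursion here plays the
;; role of the explicit stack (push/pop) in the register code above.
(define (mark! thing)
  (if (and (pair-index? thing)
           (= (vector-ref the-marks thing) 0))
      (begin
        (vector-set! the-marks thing 1)
        (mark! (vector-ref the-cars thing))
        (mark! (vector-ref the-cdrs thing)))))

;; Called once per register, e.g. (mark! root).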
Because of the recursive nature of this algorithm, the garbage collector requires auxiliary memory of its own to run, putting limitations on the depth of nesting allowed in LISP systems.
Peter Deutsch, Herbert Schorr and William Waite found a clever modification (pointer reversal) which allows the algorithm to run without auxiliary memory.
sweep
  (assign free '())
  (assign scan (-1+ (fetch memtop)))
slp
  (branch (negative? (fetch scan)) end)
  (assign mark-flag (vector-ref (fetch the-marks) (fetch scan)))
  (branch (= (fetch mark-flag) 1) unmk)
  (perform (vector-set! (fetch the-cdrs) (fetch scan) (fetch free)))  ; unmarked: link onto free list
  (assign free (fetch scan))
  (assign scan (-1+ (fetch scan)))
  (goto slp)
unmk
  (perform (vector-set! (fetch the-marks) (fetch scan) 0))
  (assign scan (-1+ (fetch scan)))
  (goto slp)
end
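And the corresponding sweep over the toy heap, as a sketch in ordinary Scheme (same caveats as before):

;; Walk the whole heap from the top down: marked cells are kept (and
;; unmarked for the next collection); unmarked cells go back on the
;; free list.
(define (sweep!)
  (set! free 'end)
  (do ((scan (- heap-size 1) (- scan 1)))
      ((negative? scan))
    (if (= (vector-ref the-marks scan) 1)
        (vector-set! the-marks scan 0)
        (begin
          (vector-set! the-cdrs scan free)
          (set! free scan)))))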
One serious disadvantage of mark-sweep algorithms is their unsuitability for systems with a lot of memory. The larger the memory, the more costly it becomes to scan all of the memory. Ideally, only the useful sections would be scanned.
The Minsky-Fenichel-Yochelson garbage collector is the fastest known. It depends on having double the address space, with only half actively in use at any time. The transition from FROMSPACE to TOSPACE is explained in depth in the book and demonstrated in the video.
Henry Baker later tinkered with this algorithm so that it can run in real time.
One thing you should realise is that garbage collectors have to be small. Not because they have to be fast, but because no one can debug a complicated garbage collector. A garbage collector, if it doesn't work, will trash your memory in such a way that you cannot figure out what the hell happened. You need an audit trail. Because it rearranges everything, and how do you know what happened there? So this is the only kind of program where it really seriously matters that if you stare at it long enough, you believe that it works.
4 Halt
Are there any impossible programs? Yes. One example is a checker that takes a procedure (and its input) and determines whether running that procedure will terminate or run forever.
Supposing we could create such a procedure, (safe? p x), which determines whether it is safe to compute the value of (p x) (where p is a procedure and x its arguments), we quickly run into contradictions.
Proof 1
(define diag1
  (lambda (p)
    (if (safe? p p)
        (inf)
        3)))

(define inf
  (lambda ()
    ((lambda (x) (x x))
     (lambda (x) (x x)))))

(diag1 diag1)
Is (diag1 diag1) safe? If yes, diag1 enters the infinite loop, which is a contradiction. If no, it returns 3, so it is safe, which is also a contradiction. Hence no such procedure safe? can exist.
Proof 2
(define diag2
  (lambda (p)
    (if (safe? p p)
        (other-than (p p))
        false)))

(define other-than
  (lambda (x)
    (if (eq? x 'a) 'b 'a)))
Then (diag2 diag2) = (other-than (diag2 diag2)), which is evidently nonsensical: other-than never returns its own argument, so no value can satisfy this equation.
Both proofs name the procedure diag because they bear a similarity to Cantor's diagonal argument for the existence of multiple infinities.
So I suppose that pretty much wraps it up. I've just proved what we call the halting theorem, and I suppose with that we're going to halt. I hope you've had a good time.
And with that, I suppose this is the last sentence.