

The Coffee Automaton:
Quantifying the Rise and Fall of
Complexity in Closed Systems

              Scott Aaronson (MIT)
Joint work with Lauren Ouellette and Sean Carroll
 It all started with a talk
  Sean Carroll gave last
summer on a cruise from
  Norway to Denmark…
  Our modest goal: Understand the rise and
   fall of complexity quantitatively in some
              simple model system
Proposed system: the
“coffee automaton”
n×n grid (n even), initially in
the configuration to the right
(Half coffee and half cream)
At each time step, choose 2
horizontally or vertically
adjacent squares uniformly
at random and swap them if
they’re colored differently
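The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `initial_grid`, `step`, and `run` are hypothetical, and putting the cream (1) in the top half is an assumption, since the slide only refers to a figure.

```python
import random

def initial_grid(n):
    """n x n grid (n even): cream (1) in the top half, coffee (0) below.
    Which half holds the cream is an assumption; the slide shows a figure."""
    return [[1 if row < n // 2 else 0 for _ in range(n)] for row in range(n)]

def step(grid, rng):
    """Choose one horizontally or vertically adjacent pair uniformly at
    random and swap the two squares if they are colored differently."""
    n = len(grid)
    pairs_per_direction = n * (n - 1)      # horizontal pairs == vertical pairs
    k = rng.randrange(2 * pairs_per_direction)
    if k < pairs_per_direction:            # horizontal pair (r, c)-(r, c+1)
        r, c = divmod(k, n - 1)
        r2, c2 = r, c + 1
    else:                                  # vertical pair (r, c)-(r+1, c)
        r, c = divmod(k - pairs_per_direction, n)
        r2, c2 = r + 1, c
    if grid[r][c] != grid[r2][c2]:
        grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]

def run(n, steps, seed=0):
    """Run the interacting automaton for a given number of time steps."""
    rng = random.Random(seed)
    grid = initial_grid(n)
    for _ in range(steps):
        step(grid, rng)
    return grid
```

Because every move is a swap, the number of cream squares stays at n²/2 throughout, which is a handy sanity check on any implementation.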
  “Control Experiment”: The Non-
   Interacting Coffee Automaton
The starting configuration is the same, but now we
let an arbitrary number of cream particles occupy
the same square (and treat the coffee as just an inert background)
Dynamics: Each cream particle follows an
independent random walk
Intuitively, because of the lack of interaction,
complexity should never become large in this system
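A comparable sketch for the non-interacting control experiment follows. Several details are assumptions the slide does not fix: one cream particle starts on each square of the top half, one randomly chosen particle moves per time step, and a move that would leave the grid is simply skipped. The function name `run_noninteracting` is hypothetical.

```python
import random

def run_noninteracting(n, steps, seed=0):
    """Independent random walks of cream particles; returns per-square counts."""
    rng = random.Random(seed)
    # Assumption: one particle per square in the top half at time 0.
    particles = [(r, c) for r in range(n // 2) for c in range(n)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(steps):
        i = rng.randrange(len(particles))   # assumption: one particle moves per step
        r, c = particles[i]
        dr, dc = rng.choice(moves)
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n:     # assumption: off-grid moves are skipped
            particles[i] = (nr, nc)
    # Any number of particles may share a square, so we return counts.
    counts = [[0] * n for _ in range(n)]
    for r, c in particles:
        counts[r][c] += 1
    return counts
```

Unlike the interacting version, a square here can hold 0, 1, 2, or more cream particles, so the state is a grid of counts rather than a bitmap.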
  How to Quantify “Complexity”?
Lots of Approaches in the Santa Fe Community

Fundamental requirement: Need to assign a value
near 0 to both “completely ordered” states and
“completely random” states, but assign large values
to other states (the “complex” states)
Also, should be possible to compute or approximate
the measure efficiently in cases of interest
 Warmup: How to Quantify Entropy
 $H(\{p_x\}) = \sum_x p_x \log_2 \frac{1}{p_x}$
 Problem: Approximating H is SZK-complete!
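As a small worked example of the entropy formula, the sketch below computes H directly from a list of probabilities and from the empirical frequencies of a sample. The function names are illustrative. The hardness result on the slide concerns approximating H when the distribution is given implicitly, which this toy code does not address.

```python
import math
from collections import Counter

def shannon_entropy(probs):
    """H = sum over x of p_x * log2(1 / p_x), skipping zero-probability outcomes."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def empirical_entropy(samples):
    """Entropy of the empirical distribution of a sequence of samples."""
    counts = Counter(samples)
    total = len(samples)
    return shannon_entropy(c / total for c in counts.values())
```

For instance, a fair coin has one bit of entropy, while a distribution concentrated on a single outcome has zero.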
 K(x) = Kolmogorov complexity of x = length of the
 shortest program that outputs x
 Old, well-understood connection between K and H:

 $\mathbb{E}_{x \sim D_n}[K(x)] = H(D_n) + O(1)$

 K(x) is uncomputable—worse than SZK-complete!!
 But in some (not all) situations, one can approximate
 K(x) by G(x) ≥ K(x), where G(x) is the gzip file size of x
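The gzip approximation is easy to try. The sketch below uses Python's standard `gzip` module, and the helper name `gzip_size` is hypothetical. It shows the two extremes: a highly ordered string compresses to almost nothing, while random bytes do not compress at all.

```python
import gzip
import os

def gzip_size(data: bytes) -> int:
    """G(x): size in bytes of the gzip-compressed form of x.
    This is only a computable stand-in for K(x), not K(x) itself."""
    return len(gzip.compress(data, compresslevel=9))

ordered = bytes(10000)          # 10000 zero bytes
noise = os.urandom(10000)       # 10000 random bytes

print(gzip_size(ordered), gzip_size(noise))
```

The caveat "in some (not all) situations" matters: gzip only finds certain kinds of regularity, so G(x) can be far larger than K(x) for strings whose structure it cannot detect.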
   Approach 1 to Quantifying
Complexity: Coarse-Grained Entropy
 Let f(x) be a function that outputs only the
 “important, macroscopic” information in a state x,
 washing or averaging out the “random fluctuations”
 Then look at H(f(x)) ≤ H(x). Intuitively, H(f(x)) should
 be maximized when there’s “interesting structure”
 Advantage of coarse-graining:
 Something physicists do all the time in practice
 Disadvantage:
 Seems to some like a “human” notion: who decides
 which variables are important or unimportant?
 Approach 2: “Causal Complexity”
                 (Shalizi et al. 2004)
Given a point (x,t) in a cellular automaton’s
spacetime history, let P and F be its past and future
lightcones respectively:
[Figure: the point (x,t) at time t, with its past lightcone P and future lightcone F]
Then consider the expected mutual information
between the configurations in P and F:
  $\mathbb{E}[I(P,F)] = \mathbb{E}[H(P) + H(F) - H(P,F)]$
If dynamics are “simple” then I(P,F) ≈ 0 since H(P) ≈ H(F) ≈ 0
If dynamics are “random” then I(P,F) ≈ 0 since H(F|P) ≈ H(F)
In “intermediate” cases, I(P,F) can be large since the past
has nontrivial correlations with the future
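The three regimes can be illustrated on toy joint distributions over a one-bit "past" and a one-bit "future". This is only a sketch of the identity I(P,F) = H(P) + H(F) − H(P,F), not of lightcones in an actual cellular automaton. The function names and the three example distributions are illustrative.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcome -> probability."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I(P,F) = H(P) + H(F) - H(P,F) for a dict mapping (p, f) -> probability."""
    past, future = defaultdict(float), defaultdict(float)
    for (p, f), prob in joint.items():
        past[p] += prob
        future[f] += prob
    return entropy(past) + entropy(future) - entropy(joint)

simple = {(0, 0): 1.0}                                         # nothing ever varies
noisy = {(p, f): 0.25 for p in (0, 1) for f in (0, 1)}         # future independent of past
correlated = {(0, 0): 0.5, (1, 1): 0.5}                        # future copies the past

for name, joint in [("simple", simple), ("noisy", noisy), ("correlated", correlated)]:
    print(name, mutual_information(joint))
```

The "simple" and "noisy" cases both give zero mutual information, for the two different reasons stated above, while the correlated case gives one full bit.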
Advantages of causal complexity:
Has an operational meaning
Depends only on causal structure, not on arbitrary
choices of how to coarse-grain
Disadvantages:
Not a function of the current state only
Requires going arbitrarily far into past and future
I(P,F) can be large simply because not much is changing
      Approach 3: Logical Depth
                    (Bennett 1988)
Depth(x) = Running time of the shortest program that
outputs x
Depth(0^n) = Depth(random string) = n
But there must exist very deep strings, since otherwise
Kolmogorov complexity would become computable!
Advantage:
Connects “Santa Fe” and computational complexity
Disadvantages:
There are intuitively complex patterns that aren’t deep
Computability properties are terrible
       Approach 4: Sophistication
          (Kolmogorov 1983, Koppel 1987)
Given a set S⊆{0,1}^n, let K(S) be the length of the shortest
program that lists the elements of S
Given x∈{0,1}^n, let Soph_c(x) be the minimum of K(S), over
all S⊆{0,1}^n such that x∈S and K(S)+log2|S| ≤ K(x)+c
Sophistication is often thought of in terms of a “two-part code”:
    [ Program for S | Incompressible index of x in S ]
    Soph_c(x) = size of the first part (the program for S)
   In a near-minimal program for x, the smallest number of
   bits that need to be “code” rather than “random data”
Soph_c(0^n) = O(1), for take S = {0^n}
Soph_c(random string) = O(1), for take S = {0,1}^n
On the other hand, one can show that there exist x with
Soph_c(x) ≥ n − O(log n)
   Special Proof Slide for Hebrew U!
Theorem (far from tight; follows Antunes-Fortnow): Let
c=o(n); then there exists a string z∈{0,1}^n with Soph_c(z) ≥ n/3
Proof: Let A = {x∈{0,1}^n : x belongs to some set S⊆{0,1}^n
with K(S) ≤ n/3 and K(S)+log2|S| ≤ 2n/3 }
Let z be the lexicographically-first n-bit string not in A (such
a z must exist by counting)
K(z) ≤ n/3+o(n), since among all programs that define a set S
with K(S) ≤ n/3, we simply need to specify which one runs for
the longest time
Suppose Soph_c(z) < n/3. Then there’s a set S containing z
such that K(S) < n/3 and K(S)+log2|S| ≤ K(z)+c ≤ n/3+o(n).
But that means z∈A, contradiction.
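The "by counting" step in the proof can be spelled out as follows. This is a sketch that reads the definition of A as requiring K(S) ≤ n/3 and K(S) + log2|S| ≤ 2n/3. It uses two facts: at most 2^k sets have K(S) = k, and each such set in the definition has at most 2^{2n/3 − k} elements.

```latex
\begin{align*}
|A| &\le \sum_{k=0}^{\lfloor n/3 \rfloor} \#\{S : K(S) = k\} \cdot 2^{2n/3 - k} \\
    &\le \sum_{k=0}^{\lfloor n/3 \rfloor} 2^{k} \cdot 2^{2n/3 - k} \\
    &= \left(\lfloor n/3 \rfloor + 1\right) 2^{2n/3} \;<\; 2^{n}
       \quad \text{for all sufficiently large } n .
\end{align*}
```

Since A contains fewer than 2^n strings, some n-bit string lies outside A, so the lexicographically first such z exists.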
Problem: Soph_c(x) is tiny for typical states x of the
coffee automaton! Why? Because we can let S be the
ensemble of sampled states at time t; then x is almost
certainly an incompressible element of S

Solution: Could use resource-bounded sophistication, e.g.,
minimal length of p in a minimal 2-part code consisting of
(polytime program p outputting AC^0 circuit C, input to C)

Advantage of resource-bounded sophistication:
The two-part code “picks out a coarse-graining for free”
without our needing to put it in by hand
Disadvantages: Hard to compute; approximations to
Soph_c^efficient(x) didn’t work well in experiments
    Our “Complextropy” Measure
Let I = coffee-cup bitmap (n^2 bits)
Let C(I) = coarse-graining of I. Each pixel gets colored
by the mean of the surrounding L×L block (with, say,
L~n), rounded to one of (say) 10 creaminess levels

Complextropy := K(C(I))
G(C(I)) ≥ K(C(I)): gzip file size of C(I); approximation
to complextropy that we’re able to compute
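A minimal sketch of the computable estimate G(C(I)) is given below. The details are assumptions rather than the authors' implementation: blocks are clipped at the grid boundary, creaminess in [0,1] is binned into `levels` equal-width levels, each coarse pixel is stored as one byte, and the block size `L` is left as a free parameter. All function names are hypothetical.

```python
import gzip

def coarse_grain(grid, L, levels=10):
    """C(I): replace each pixel by the mean of its surrounding L x L block,
    rounded to one of `levels` creaminess levels (0 .. levels-1)."""
    n = len(grid)
    half = L // 2
    out = []
    for r in range(n):
        row = []
        for c in range(n):
            # Block of side L around (r, c), clipped at the boundary (assumption).
            r0, r1 = max(0, r - half), min(n, r - half + L)
            c0, c1 = max(0, c - half), min(n, c - half + L)
            total = sum(grid[i][j] for i in range(r0, r1) for j in range(c0, c1))
            mean = total / ((r1 - r0) * (c1 - c0))
            row.append(min(levels - 1, int(mean * levels)))
        out.append(row)
    return out

def estimated_complextropy(grid, L, levels=10):
    """G(C(I)): gzip size in bytes of the coarse-grained image, one byte per pixel."""
    coarse = coarse_grain(grid, L, levels)
    data = bytes(v for row in coarse for v in row)
    return len(gzip.compress(data))
```

Applied to a 0/1 bitmap such as the output of a coffee-automaton simulation, `estimated_complextropy(grid, L)` can be tracked over time. For very small grids the fixed gzip header dominates the byte count, so only differences between values are meaningful.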
Complextropy’s connection to sophistication and
two-part codes:
[ Compressed coarse-grained image | Remaining info in image ]

K(C(I)) = size of the first part

Complextropy can be seen as an extremely resource-
bounded type of sophistication!

Complextropy’s connection to causal complexity:
The regions over which we coarse-grain aren’t
totally arbitrary! They can be derived from the
coffee automaton’s causal structure
         The Border Pixels Problem
Even in the non-interacting case, rounding effects cause a
“random” pattern in the coarse-grained image, at the
border between the cream and the coffee
Makes K(C(I)) artificially large

Hacky Solution: Allow rounding ±1 to the most common color in each
row. That gets rid of the border pixel artifacts, while hopefully
still preserving structure in the interacting case
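One possible reading of this hack, sketched below, is: in each row of the coarse-grained image, find the most common creaminess level, and snap any pixel whose level is within one step of it to that level. This interpretation and the name `smooth_border_pixels` are assumptions. The slide does not give the exact rule.

```python
from collections import Counter

def smooth_border_pixels(coarse):
    """For each row, move pixels that differ by at most 1 from the row's
    most common level onto that level; leave all other pixels unchanged."""
    out = []
    for row in coarse:
        mode = Counter(row).most_common(1)[0][0]
        out.append([mode if abs(v - mode) <= 1 else v for v in row])
    return out

print(smooth_border_pixels([[5, 5, 4, 6, 5, 8]]))   # [[5, 5, 5, 5, 5, 8]]
```

Pixels that differ from the row's dominant level by two or more steps are kept, which is what lets genuine structure in the interacting case survive.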
Behavior of G(I) and G(C(I)) in Interacting Case
Behavior of G(I) and G(C(I)) in Non-Interacting Case
Qualitative Pattern Doesn’t Depend on
         Compression Program
  Dependence on the Grid Size n

Maximum entropy G(I) increases like ~n^2 for an n×n coffee cup
Maximum coarse-grained entropy G(C(I)) increases like ~n
       Analytic Understanding?
We can give a surprisingly clean proof that K(C(I))
never becomes large in the non-interacting case

Let a_t(x,y) be the number of cream particles at
point (x,y) at step t
Claim: E[a_t(x,y)] ≤ 1 for all x,y,t
Proof: True when t=0; apply induction on t

Now let a_t(B) = Σ_{(x,y)∈B} a_t(x,y) be the number
of cream particles in an L×L square B after t steps
Clearly E[a_t(B)] ≤ L^2 for all t,B by linearity
By a Chernoff bound,
  $\Pr\left[\,\left|a_t(B) - \mathbb{E}[a_t(B)]\right| > 0.1 L^2\,\right] < 2\exp\left(-\frac{L^2}{300}\right)$
So by a union bound, provided L ≫ √(log n),
  $\Pr\left[\,\left|a_t(B) - \mathbb{E}[a_t(B)]\right| \le 0.1 L^2 \;\; \forall B\,\right] \ge 1 - o(1)$

If the above happens, then by symmetry, each row
of C(I) will be a uniform color, depending only on the
height of the row and t

Hence K(C(I)) ≤ log2 n + log2 t + O(1)
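The union-bound step in this argument can be checked with a short calculation. It assumes the Chernoff bound reads Pr[|a_t(B) − E[a_t(B)]| > 0.1L²] < 2exp(−L²/300) and that there are at most n² blocks B, one per pixel. The choice L² = 900 ln n is purely illustrative and not taken from the slides.

```latex
\begin{align*}
\Pr\left[\exists B : \left|a_t(B) - \mathbb{E}[a_t(B)]\right| > 0.1 L^2\right]
  &\le \sum_{B} \Pr\left[\left|a_t(B) - \mathbb{E}[a_t(B)]\right| > 0.1 L^2\right] \\
  &< n^2 \cdot 2\exp\left(-\frac{L^2}{300}\right) \\
  &= 2 n^2 \cdot n^{-3} = \frac{2}{n}
     \qquad \text{when } L^2 = 900 \ln n .
\end{align*}
```

So any L growing at least like 30√(ln n) makes the failure probability vanish as n grows, which is the sense in which L must dominate √(log n).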
                 Open Problems
Prove that, in the interacting case, K(C(I)) does
indeed become Ω(n) (or even Ω(log n))
   Requires understanding detailed behavior of a Markov chain
   prior to mixing—not so obvious what tools to use
   Maybe the 1D case is a good starting point?

Clarify relations among coarse-grained entropy,
causal complexity, logical depth, and sophistication

Find better methods to approximate entropy and to
deal with border pixel artifacts
Long-range ambition: “Laws” that, given any mixing
process, let us predict whether or not coarse-grained
entropy or other types of complex organization will
form on the way to equilibrium
                        So far…
Theorem: In a “gas” of non-interacting particles, no
nontrivial complextropy ever forms
Numerically-supported conjecture: In a “liquid” of
mutually-repelling particles, some nontrivial
complextropy does form
Effects of gravity / viscosity / other more realistic
