
Algorithms
Modified from slides by Sanjoy Dasgupta

Why you should know Algorithms
• Courses: CSE 21 (Discrete Math), CSE 100 (Data Structures), CSE 101 (Algorithms)
• Motivation: If you want to be a good programmer, why should you know this stuff?
• Definition: Algorithms are recipes for faster, smarter computation, in lieu of brute force (more hardware)
• Apps: Many real-world problems and systems (Roomba, Google, Facebook, the Internet) have a few core algorithms
• Bottom line: Knowing algorithms can help you achieve spectacular speed wins in your problem area.

Some big algorithm wins
• Compression: Lempel and Ziv. Gzip for files → faster downloads, less disk storage
• Fast Fourier Transforms (FFTs): Cooley-Tukey. Signal processing → modern cell phones
• Shortest Path Algorithm: Dijkstra. Computing Internet routes → fast response when communication links crash
• Primality Testing: Rabin-Miller. Finding large primes for public key encryption → PGP
• String Matching: Boyer-Moore. Fast search for a keyword in a file → grep

Algorithm Areas
Useful recipes in different areas
• Mathematical Computing – App: Solid modeling, more interactive
• String matching – App: Google search, faster search
• Graph – App: Computing Internet routes, fast failure recovery
• Scientific Computation – App: Drug design, faster design

Area 1: Computing Mathematics

Counting rabbits
A royal mathematical challenge (1202): Suppose that rabbits take exactly one month to become fertile, after which they produce one child per month, forever. Starting with one rabbit, how many are there after n months?
Leonardo da Pisa, aka Fibonacci

The proliferation of rabbits
[Figure: fertile and not-fertile rabbits, shown initially and after one, two, three, four, and five months]

Formula for rabbits
Let $F_n$ = number of rabbits at month $n$
$F_1 = 1$
$F_2 = 1$
$F_n = F_{n-1} + F_{n-2}$
These are the Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …
They grow very fast: $F_{30} = 832{,}040$ is already close to $10^6$! In fact, $F_n \approx 2^{0.694n}$, exponential growth.
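The exponent 0.694 in the growth estimate above is not derived on the slide. One standard way to see where it comes from, offered here as a worked note rather than as part of the original deck, uses the closed form for Fibonacci numbers (the symbol $\varphi$ for the golden ratio is notation introduced for this note):

```latex
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618,
\qquad
F_n = \frac{\varphi^{n} - (1-\varphi)^{n}}{\sqrt{5}} \approx \frac{\varphi^{n}}{\sqrt{5}},
\qquad
\varphi^{n} = 2^{\,n \log_2 \varphi} \approx 2^{0.694\,n}.
```

Since $|1-\varphi| < 1$, the second term vanishes as $n$ grows, so $F_n$ grows like $2^{0.694n}$ up to the constant factor $1/\sqrt{5}$.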
Useful in CSE: merge sort, compression

Computing Fibonacci numbers
function F(n)
  if n = 1 return 1
  if n = 2 return 1
  return F(n-1) + F(n-2)

Call tree for F(5):
F(5)
  F(4)
    F(3)
      F(2)
      F(1)
    F(2)
  F(3)
    F(2)
    F(1)
A recursive algorithm

Two questions we always ask
Does it work correctly? (Proofs)
Yes: it directly implements the definition of Fibonacci numbers.
How long does it take? (Time analysis)
This is not so obvious…

Running time analysis
For the recursive F(n) above, let T(n) = number of steps needed to compute F(n). Then:
$T(n) > T(n-1) + T(n-2)$
But recall that $F_n = F_{n-1} + F_{n-2}$. Therefore
$T(n) > F_n \approx 2^{0.694n}$ !
Exponential time.

How bad is exponential time?
Need $2^{0.694n}$ operations to compute $F_n$.
E.g. computing $F_{200}$ needs about $2^{140}$ operations.
How long does this take on a fast computer?

NEC Earth Simulator
Can perform up to 40 trillion operations per second (about $2^{45}$).

Is exponential time all that bad?
The Earth Simulator needs $2^{95}$ seconds for $F_{200}$.
Time in seconds    Interpretation
$2^{10}$           17 minutes
$2^{20}$           12 days
$2^{30}$           34 years
$2^{40}$           cave paintings

Post mortem
What takes so long? Let’s unravel the recursion…
F(n)
F(n-1)  F(n-2)
F(n-2)  F(n-3)  F(n-3)  F(n-4)
F(n-3)  F(n-4)  F(n-4)  F(n-5)  F(n-4)  F(n-5)  F(n-5)  F(n-6)
The same subproblems get solved over and over again!

A better algorithm
There are n subproblems $F_1, F_2, \ldots, F_n$. Solve them in order from low to high (unlike the recursion, which works from high to low).
function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]
The usual two questions:
1. Does it return the correct answer?
2. How fast is it?

Running time analysis
For the array-based F(n) above, the number of operations is proportional to n. [Previous method: $2^{0.7n}$]
$F_{200}$ is now reasonable to compute, as are $F_{2000}$ and $F_{20000}$.
Motto: the right algorithm makes all the difference.
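To make the gap between the two versions of F(n) concrete, here is a minimal Python sketch of both. It is an illustration, not code from the slides: the names `fib_recursive` and `fib_iterative` and the call counter are assumptions added so the repeated work can be measured.

```python
def fib_recursive(n, counter):
    """Direct translation of the recursive definition.

    counter is a one-element list (an illustrative device, not part of the
    slides) whose entry is incremented on every call.
    """
    counter[0] += 1
    if n <= 2:          # covers the two base cases F(1) = F(2) = 1
        return 1
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)


def fib_iterative(n):
    """Bottom-up version: fill in F_1, F_2, ..., F_n once each, low to high."""
    fib = [0] * (max(n, 2) + 1)   # index 0 is unused, mirroring fib[1..n]
    fib[1] = 1
    fib[2] = 1
    for i in range(3, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]
    return fib[n]


calls = [0]
value = fib_recursive(25, calls)
print("recursive F(25) =", value, "using", calls[0], "calls")
print("iterative F(25) =", fib_iterative(25), "using 23 loop iterations")
print("iterative F(200) has", fib_iterative(200).bit_length(), "bits")
```

The recursive version makes $2F_n - 1$ calls in total, so its call count grows as fast as the Fibonacci numbers themselves, while the loop runs n - 2 times.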
Big-O notation
function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]
Running time is proportional to n. But what is the constant: is it 2n or 3n or what?
The constant depends upon:
• The units of time: minutes, seconds, milliseconds, …
• Specifics of the computer architecture.
It is much too hairy to figure out exactly. Moreover, it is nowhere near as important as the huge gulf between $n$ and $2^n$. So we simply say the running time is O(n).

A more careful analysis
Dishonest accounting: the O(n) operations in F(n) aren’t all constant-time.
The numbers get very large: $F_n \approx 2^{0.7n}$ has about $0.7n$ bits.
Adding numbers of this size is hardly a unit operation.

Addition
Adding two n-bit numbers
[22]    1 0 1 1 0
[13]      1 1 0 1
-----------------
[35]  1 0 0 0 1 1
Takes O(n) operations… and we can’t hope for better.
Addition takes linear time.

Revised time analysis
F(n) takes O(n) simple operations and O(n) additions, each on numbers of O(n) bits… a total of $O(n^2)$, quadratic time.
The inefficient algorithm: not $O(2^{0.7n})$ but $O(n \, 2^{0.7n})$.

Polynomial vs. exponential
Running times like $n$, $n^2$, $n^3$ are polynomial.
Running times like $2^n$ and $e^n$ are exponential.
To an excellent first approximation:
  polynomial is reasonable
  exponential is not reasonable
This is the most fundamental dichotomy in algorithms.

Multiplication
Intuitively harder than addition… a time analysis lets us quantify this.
[22]           1 0 1 1 0
[5]                1 0 1
------------------------
               1 0 1 1 0
             0 0 0 0 0
           1 0 1 1 0
------------------------
[110]      1 1 0 1 1 1 0
To multiply two n-bit numbers: create an array of n intermediate sums, and add them up.
Each addition is O(n)… a total of $O(n^2)$.
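The grade-school method above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `multiply_binary` and the returned row count are made up for this sketch, and Python's built-in integers stand in for the arrays of bits a real analysis would count.

```python
def multiply_binary(x, y):
    """Multiply non-negative integers the grade-school way, in binary.

    For each bit of y (lowest first) there is one intermediate row: a copy
    of x shifted left by the bit's position if the bit is 1, or zero if the
    bit is 0. Returns the product and the number of rows that were formed.
    """
    total = 0
    shift = 0
    rows = 0
    while y > 0:
        if y & 1:                 # this bit of y is 1: row is x shifted left
            total += x << shift
        # (a 0 bit contributes an all-zero row, so nothing is added)
        rows += 1
        y >>= 1                   # move on to the next bit of y
        shift += 1
    return total, rows


product, rows = multiply_binary(22, 5)
print(product, "from", rows, "intermediate rows")
```

With n-bit inputs there are n rows, and adding each row costs O(n) bit operations, which is where the quadratic total comes from.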
It seems that addition is linear, while multiplication is quadratic.

Euro-multiplication
There are other ways to multiply!
   22      5
   11     10
    5     20
    2     40
    1     80
------------
         110
Halve the left column (dropping remainders) and double the right column; then add the right-hand entries whose left-hand entry is odd: 10 + 20 + 80 = 110.
Still quadratic, but other methods are close to linear!

Area 2: String Matching

Imaginary Scenario
• Context: You agree to help a historian build a web site of famous speeches
• Problem: Historian wants a keyword facility: e.g., all lines with keyword “truth”
• Measure: Need fast response to search so that the site feels “interactive”

Keyword Search in File
• Naïve: search for keyword at every offset in file, one character at a time
[Figure: the keyword “truths” slid along the text “We hold these truths”, one offset at a time]

Usual Questions
• How fast does it run? If file length is F and keyword length is K, it can take F * K time.
• e.g. F = 10 Mbyte, K = 10: 10 M → 10 sec
[Figure: worst case, the keyword “aaaaz” tried at every offset of a long run of a’s]

Boyer-Moore String Matching
• Ideas: Compare last character first; on failure consult table to find next offset to try
• e.g. F = 10 Mbyte, K = 10: 0.2 M → 0.2 sec
[Figure: keyword “aaaaz” against the text “aaaabaaaabaaaaz”; a table computed at start gives the skip after a failure, e.g. “Failure a for z: Skip 5”]

Area 3: Graph Algorithms

A cartographer’s problem
Color this map (one color per country) using as few colors as possible.
Neighboring countries must be different colors.

Graphs
[Figure: the map redrawn as a graph with nodes numbered 1 to 13]
Graph specified by nodes and edges.
node = country
edge = neighbors
The graph coloring problem: color the nodes of the graph with as few colors as possible, such that there is no edge between nodes of the same color.

Exam scheduling
Schedule final exams:
- use as few time slots as possible
- can’t schedule two exams in the same slot if there’s a student taking both classes.
This is also graph coloring!
Node = exam
Edge = some student is taking both endpoint-exams
Color = time slot
[Figure: example graph with nodes numbered 1 to 5]

Building short networks
Cities on the map: Wichita, Albuquerque, Amarillo, Tulsa, Little Rock, Abilene, Dallas, El Paso,
Houston, San Antonio
[Figure: the cities joined by a minimum spanning tree]

Finding the minimum spanning tree
Kruskal’s algorithm (1956):
Repeat until nodes are connected:
  Add the shortest edge which doesn’t create a cycle
Greedy strategy: at each stage, make the move which gives the greatest immediate benefit.
Greedy: a popular algorithmic paradigm.

Network Example: Shortest Path
• Sally to Jorge via ISP ATT
[Figure: network linking Sally (Seattle), New York, Chicago, Los Angeles, and Jorge (Boston), with link costs 1, 2, 3, 12, 14, 15, 2]

Dijkstra’s Algorithm (Greedy)
• At each step, add the closest remaining node to the tree
[Figure: partially built shortest-path tree on the same network, with distance labels 3, 2, 12, 16]

Network Example: Shortest Path
[Figure: finished shortest-path tree on the same network, with distance labels 3, 2, 12, 6]

Area 4: Scientific Computation

A slight variation of MST
[Figure: Chicago, Boston, and Atlanta joined through an added “Steiner point”: the minimum Steiner tree]

A slight variation
[Figure: Wichita, Albuquerque, Amarillo, Tulsa, Little Rock, Abilene, Dallas, El Paso, Houston, and San Antonio joined by a minimum Steiner tree]

Bad news
The existing methods for Steiner tree fall into two categories:
1. Correct answer, but exponential time.
2. Efficient, but often the wrong answer.
Let’s look at the second category…

Biological inspiration
What are the underlying algorithmic paradigms behind evolution?
1. Mutation
2. Crossover (during reproduction)
3. Survival of the fittest
Can these ideas be used to design algorithms?

Hill climbing
• Start with a solution: in this case the minimum spanning tree, but more generally any scientific optimization
• Produce a mutation: add/remove/move a Steiner point and its connections
• Keep it if it works well, i.e. if it reduces the total length
Length keeps improving at each step…

Hill climbing/Simulated Annealing
[Figure: plot of length against solutions, marking a local optimum and the optimal solution]

Genetic algorithms
“Breed” good solutions…
Maintain a population of candidate solutions.
At each time step, have them reproduce:
  Solution 1 [set of Steiner points] + Solution 2 [set of Steiner points] → “Child” solution
Routinely eliminate “unfit” solutions.
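The hill-climbing loop described above (mutate, and keep the change only if it shortens the network) can be sketched for a three-city case. This is a minimal sketch, not the slides' method: the city coordinates are made up, only a single Steiner point is moved (no adding or removing points), and the step size, iteration count, and names `total_length` and `hill_climb` are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical planar coordinates for three cities (not real map data).
CITIES = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]


def total_length(steiner, cities=CITIES):
    """Length of the network that joins every city to one Steiner point."""
    return sum(math.dist(steiner, city) for city in cities)


def hill_climb(start, steps=2000, step_size=0.5, seed=0):
    """Repeatedly mutate the Steiner point; keep a mutation only if it helps."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    best = start
    best_len = total_length(best)
    for _ in range(steps):
        # Mutation: nudge the point by a small random offset.
        candidate = (best[0] + rng.uniform(-step_size, step_size),
                     best[1] + rng.uniform(-step_size, step_size))
        cand_len = total_length(candidate)
        if cand_len < best_len:        # survival of the fittest
            best, best_len = candidate, cand_len
    return best, best_len


# Minimum spanning tree of three cities: keep the two shortest sides.
sides = sorted(math.dist(a, b) for a, b in
               [(CITIES[0], CITIES[1]),
                (CITIES[0], CITIES[2]),
                (CITIES[1], CITIES[2])])
mst_length = sides[0] + sides[1]

point, length = hill_climb(start=CITIES[0])
print(f"MST length: {mst_length:.3f}")
print(f"Length with one Steiner point: {length:.3f}")
```

With a single movable point the length has no false local optima, so plain hill climbing suffices here; the local-optimum picture on the slide arises in larger problems, where points can also be added or removed.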
Applications to scientific computing
• Problems in physics, chemistry and biology boil down to searching for an “optimal” solution
• For example, recall Bafna’s lecture: finding the best arrangement of primers for cancer
• Simulated annealing and genetic algorithms are very useful for such problems
• No strong bounds on performance, yet they do very well in practice.

Questions
1. Write in big-O notation.
   (a) 24n
   (b) $320n^2$
2. Draw the graph which captures the adjacency relationships in this map (the countries are Guatemala, Belize, El Salvador, Honduras, Nicaragua, Costa Rica).

Conclusions
• Most programs have a few pieces of performance-critical code that can benefit from a knowledge of algorithms
• Beyond application, it’s a fundamental way of thinking
• This way of thinking is impacting all the sciences: physics, chemistry, linguistics
• Even if you love to program, take CSE 21 and 101 seriously. It will help you!

Specs for Earth Simulator
Distributed-memory parallel computing system in which 640 processor nodes are interconnected by a single-stage crossbar network
Processor node: 8 vector processors with shared memory
Peak performance: 40 Tflops
Total main memory: 10 TB
Purpose: climate change

IBM said that Blue Gene/L is special for its size compared to the Yokohama, Japan-based Earth Simulator, which gauges climate changes. Blue Gene/L is one-hundredth the physical size (320 vs. 32,500 square feet) and consumes one-twentieth the power (216 kilowatts vs. 6,000 kilowatts) compared to the Earth Simulator.

Leonardo da Pisa a.k.a. Fibonacci
His Liber abaci (1202) begins:
These are the nine figures of the Indians: 9 8 7 6 5 4 3 2 1. With these nine figures, and with the sign 0 which in Arabic is called zephirum, any number can be written, as will be demonstrated.


