6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology 6.001 Notes: Section 31.1 Slide 3l.ll Trees, graphs and Search In previous lectures we have seen a number of important themes, which relate to designing code for complex syste One was the idea of proof by induction, meaning that we could Induction for proving code correctness reason formally about whether and why our code would work Data structures can gather information together into abstract units correctly based on an inductive proof. This specifically said that Code to manipulate data abstractions tends to reflect if we could prove(using basic axioms) that the code correctly structure of abstraction handled a base case, and if we could prove that assuming the code ran correctly for all cases whose input was less than some given size, then it ran correctly for input of that given size, then we could conclude that it ran correctly on all correct inputs 922003 60015e A second was the idea that we can gather related information ogether into higher-level data units which we could abstract into treating as simple elements. Lists were a good example. And we saw that so long as the constructor and selectors of the abstraction obeyed a contract, we could suppress the details of the abstraction from its The third was that code written to manipulate data abstractions frequently had a structure that reflected the underlying structure of the abstraction: often we would use selectors to extract out subparts, manipulate those parts, and then use the constructor to create a new version of the abstraction Trees, Graphs and Search Slide 31.1.2 oday we are going to explore these themes in more detail. We Key themes from previous lectures are going to use them to build more complex data structures, Induction for proving code correctness Data structures can gather information together into and are particularly going to see how we can use inductive reasoning to design procedures to manipulate such structures Code to manipulate data abstractions tends to reflect Specifically, we are going to look at two very useful data abstractions trees and graphs. And we are going to see how Exploring the combination of these themes procedures that search for information in large collections can be nicely designed to interact with such structures Graph Algorithms for trees and graph
6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. 6.001 Notes: Section 31.1 Slide 31.1.1 In previous lectures we have seen a number of important themes, which relate to designing code for complex systems. One was the idea of proof by induction, meaning that we could reason formally about whether and why our code would work correctly based on an inductive proof. This specifically said that if we could prove (using basic axioms) that the code correctly handled a base case, and if we could prove that assuming the code ran correctly for all cases whose input was less than some given size, then it ran correctly for input of that given size, then we could conclude that it ran correctly on all correct inputs. A second was the idea that we can gather related information together into higher-level data units, which we could abstract into treating as simple elements. Lists were a good example. And we saw that so long as the constructor and selectors of the abstraction obeyed a contract, we could suppress the details of the abstraction from its use. The third was that code written to manipulate data abstractions frequently had a structure that reflected the underlying structure of the abstraction: often we would use selectors to extract out subparts, manipulate those parts, and then use the constructor to create a new version of the abstraction. Slide 31.1.2 Today we are going to explore these themes in more detail. We are going to use them to build more complex data structures, and are particularly going to see how we can use inductive reasoning to design procedures to manipulate such structures. Specifically, we are going to look at two very useful data abstractions: trees and graphs. And we are going to see how procedures that search for information in large collections can be nicely designed to interact with such structures
6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.1.3 Let's start by revisiting lists. Remember that we said a list was Revisiting lists imply a sequence of cons cells, connected by cdr's, ending in an empty list(also known as nil) Remember that we said lists had the property of closure. If we List: sequence of cons pairs, ending in nil were to cons anything onto the beginning of the list, we would Closed under the action of cons get a new sequence of cons cells, ending in the empty list, Closed (except for the empty list) under the action of hence a list. And with the exception of the empty list, taking the| -Induction shows that all lists satisfy this property cdr of a list results in a sequence of cons cells, ending in the empty list, ng really has an inductive flavor to it The base case Note that this is an empty list. Given that every collection of cons cells of size 4 marms 6001S less than n, ending in an empty list, is a list; then clearly consing a new element onto such a list yields a list, and thus by induction, every collection of cons cells ending in the empty list is a list. We should be able to use this idea to reason about procedures that manipulate lists Code that manipulates lists Slide 31.1. 4 Here is one of our standard procedures for manipulating lists (define (map proc 1st) kif (null? lst) our old friend map. Intuitively, we know how this code should work, but can we establish formally that it does what we cans(proc《car1st) (map proc (cdr lst))) Inductive expect? Sure. We just use our notion of induction here, both on the data (ma 1234)) structure and on the code itself. for the base case we have the base case data structure for a list, namely an empty list. Thus, the code clearly returns the right structure. Now, assume that map correctly returns a list in which each element of the input I list has had proc applied to it, and that the order of the elements is preserved, for any list of size smaller than the current list. Then we know, given Ist, that by induction on the data structure (cdr lst) is a list. By induction on the procedure map we know this willl return a list of the same size, with each element replaced by the application of proc to that element. We can then process the first element of the list, and by induction on the data structure, cons will return a new list of the appropriate size that also satisfies the conditions of map. Thus by induction, we know that map will correctly perform as expected Slide 31.1.5 Code that manipulates lists Now, is there anything explicit in the code that says this applies only to lists of numbers? p proc Ist) 主f(nu11?1st) Of course not. It could be lists of symbols, or lists of strings. Or con(Px。《cax1at) Base case ists of anything. That can be seen by looking at the type ( map proc (cdr lst)))) Inductive definition for a list (1234) As with pairs, we will represent a list by the symbol list Va1ue:(14916) followed by angle brackets containing a type definition for the Is there anyting sedal about lists o numben? elements of the list. Since lists are created out of pairs, we can Type Definition Listanunber and whose second element is a List of the same type, or(which Liban b an i mor dta) be more explicit about this definition, by noting that a list of some type is either a Pair whose first element is of that type pafr<c, List<o> I noll 600SC is indicated by the vertical bar) the list is the special empty list
6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Slide 31.1.3 Let's start by revisiting lists. Remember that we said a list was simply a sequence of cons cells, connected by cdr's, ending in an empty list (also known as nil). Remember that we said lists had the property of closure. If we were to cons anything onto the beginning of the list, we would get a new sequence of cons cells, ending in the empty list, hence a list. And with the exception of the empty list, taking the cdr of a list results in a sequence of cons cells, ending in the empty list, hence a list. Note that this really has an inductive flavor to it. The base case is an empty list. Given that every collection of cons cells of size less than n, ending in an empty list, is a list; then clearly consing a new element onto such a list yields a list, and thus by induction, every collection of cons cells ending in the empty list is a list. We should be able to use this idea to reason about procedures that manipulate lists. Slide 31.1.4 Here is one of our standard procedures for manipulating lists: our old friend map. Intuitively, we know how this code should work, but can we establish formally that it does what we expect? Sure. We just use our notion of induction here, both on the data structure and on the code itself. For the base case, we have the base case data structure for a list, namely an empty list. Thus, the code clearly returns the right structure. Now, assume that map correctly returns a list in which each element of the input list has had proc applied to it, and that the order of the elements is preserved, for any list of size smaller than the current list. Then we know, given lst, that by induction on the data structure (cdr lst) is a list. By induction on the procedure map we know this willl return a list of the same size, with each element replaced by the application of proc to that element. We can then process the first element of the list, and by induction on the data structure, cons will return a new list of the appropriate size that also satisfies the conditions of map. Thus, by induction, we know that map will correctly perform as expected. Slide 31.1.5 Now, is there anything explicit in the code that says this applies only to lists of numbers? Of course not. It could be lists of symbols, or lists of strings. Or lists of anything. That can be seen by looking at the type definition for a list. As with pairs, we will represent a list by the symbol List followed by angle brackets containing a type definition for the elements of the list. Since lists are created out of pairs, we can be more explicit about this definition, by noting that a List of some type is either a Pair whose first element is of that type, and whose second element is a List of the same type, or (which is indicated by the vertical bar) the list is the special empty list
6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Notice the nice recursive definition of a list, with the closure property that the cdr of a list is itself a list Notice that nothing in the type definition of a list says that the elements hanging off the cars of the list have to be numbers. In fact, our definition uses the arbitrary type C. This means that those structures could be anything, including other lists This leads nicely to a more general data structure called a tree, which lets us capture much more complex kinds of relationships and structures. So what does a tree look like, and are there common procedures associated with the manipulation of trees? 6.001 Notes: Section 31.2 Slide 31.2.1 Trees Here is the conceptual idea behind a tree. a tree has a root, or starting point. At the root, and indeed at every other point in th tree, there are a set of branches that connect different parts of the tree together. each branch is said to contain a child or sub tree, and that sub-tree could itself be a tree, or could simply terminate in a leaf. In the example shown here, all the leaves are numbers, but of course they could be other objects such as strings Note that trees have a nice recursive property, much like lists, in which taking the element hanging off a branch gives us 412m3 60153C another tree. This suggests that procedures designed to manipulate trees should have a very similar recursive structure Trees Slide 31.2.2 网国国属 To implement a tree, we can just build it out of lists. Each level of the tree will be a list the branches at a level will be the cars of the associated list. Those branches may themselves be sub trees, that is, lists of lists. In the example shown, the first top- level branch is just the leaf, 2, but the second top-level branch is a tree. with two branches. 6 and 8. Thus. we have lists of lists as our implementation of a tree 601 SIC?
6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Notice the nice recursive definition of a list, with the closure property that the cdr of a list is itself a list. Notice that nothing in the type definition of a list says that the elements hanging off the cars of the list have to be numbers. In fact, our definition uses the arbitrary type C. This means that those structures could be anything, including other lists. This leads nicely to a more general data structure called a tree, which lets us capture much more complex kinds of relationships and structures. So what does a tree look like, and are there common procedures associated with the manipulation of trees? 6.001 Notes: Section 31.2 Slide 31.2.1 Here is the conceptual idea behind a tree. A tree has a root, or starting point. At the root, and indeed at every other point in the tree, there are a set of branches that connect different parts of the tree together. Each branch is said to contain a child, or subtree, and that sub-tree could itself be a tree, or could simply terminate in a leaf. In the example shown here, all the leaves are numbers, but of course they could be other objects such as strings. Note that trees have a nice recursive property, much like lists, in which taking the element hanging off a branch gives us another tree. This suggests that procedures designed to manipulate trees should have a very similar recursive structure. Slide 31.2.2 To implement a tree, we can just build it out of lists. Each level of the tree will be a list. The branches at a level will be the cars of the associated list. Those branches may themselves be subtrees, that is, lists of lists. In the example shown, the first toplevel branch is just the leaf, 2, but the second top-level branch is a tree, with two branches, 6 and 8. Thus, we have lists of lists as our implementation of a tree
6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.2.3 So what can we say about a tree? First, we can define its type Tr rees By analogy to a list, we have a tree of type C defined as either a list of trees of type C, reflecting the fact that there could be arbitrary branches, or that a tree is just a leaf of type C. Note the recursive definition of the tree structure. and a leaf of some Type Definition: type is just an element of that type Tree<c> List<Tree<c>>I Leaf<c> In fact, as defined here, it would appear that a tree of type c ald al ways have leaves of the same type, e.g. numbers or strings. Of course, we could generalize this to allow variations of types of elements in the leaves of a tree Examples: countleaves caletree Associated with a tree, we expect to see some standard 92 operations. For example, we should have a predicate, leaf? that tests where an element is a leaf or if it is a more complex sub-tree. In addition, we would like to have procedures that generalize operations on lists, such as counting the number of leaves(or basic elements )of a tree We expect that these procedures should be more general than the lists versions, to reflect the fact that elements of a tree may themselves be complex things, like trees Tree Procedures Slide 31.2. 4 Given trees, built as lists of lists, what kinds of procedures can a tree is a list, we can use list procedures on them we create to manipulate them? First, because a tree is (define my-tree (list 4 (list 5 7)2)) implemented as a list, in principle we could use list operations m-tree on the tree. although that starts to infringe on the data abstraction barrier that separates the use of an abstraction from 国网网 its implementation Here is a simple example. I have defined a test tree, and given it a name. I have also shown the box-and-pointer diagram that represents the actual implementation of this tree 6001 SICP Slide 31.2.5 Tree Procedures Suppose I ask for the length of this structure. Ideally, length should be applied to lists, but because I have chosen. A tree is a list, we can use list procedures on them to represent a tree as a list of list, this procedure can be applied(define my-tree (list 4 (list 5 7)2)) here. For this example, it returns the value 3, which probably isn't what you expected. After all, we would like to think of a tree in terms of the number of leaves, not the number of top level branches Why did this happen? Well, recall that length applies to a list, and it simple counts the number of cons pairs in that list, without ever looking at what is hanging off those pairs. In this (arms 1 001 SICP case. this data structure is a list of three elements some of which happen to be lists as well So if i want the number of top-level branches, this does the right thing, but suppose I really want to know how many leaves are in the tree?
6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Slide 31.2.3 So what can we say about a tree? First, we can define its type. By analogy to a list, we have a tree of type C defined as either a list of trees of type C, reflecting the fact that there could be arbitrary branches, or that a tree is just a leaf of type C. Note the recursive definition of the tree structure. And a leaf of some type is just an element of that type. In fact, as defined here, it would appear that a tree of type C would always have leaves of the same type, e.g. numbers or strings. Of course, we could generalize this to allow variations of types of elements in the leaves of a tree. Associated with a tree, we expect to see some standard operations. For example, we should have a predicate, leaf? that tests where an element is a leaf or if it is a more complex sub-tree. In addition, we would like to have procedures that generalize operations on lists, such as counting the number of leaves (or basic elements) of a tree. We expect that these procedures should be more general than the lists versions, to reflect the fact that elements of a tree may themselves be complex things, like trees. Slide 31.2.4 Given trees, built as lists of lists, what kinds of procedures can we create to manipulate them? First, because a tree is implemented as a list, in principle we could use list operations on the tree, although that starts to infringe on the data abstraction barrier that separates the use of an abstraction from its implementation. Here is a simple example. I have defined a test tree, and given it a name. I have also shown the box-and-pointer diagram that represents the actual implementation of this tree. Slide 31.2.5 Suppose I ask for the length of this structure. Ideally, length should be applied to lists, but because I have chosen to represent a tree as a list of list, this procedure can be applied here. For this example, it returns the value 3, which probably isn't what you expected. After all, we would like to think of a tree in terms of the number of leaves, not the number of toplevel branches. Why did this happen? Well, recall that length applies to a list, and it simple counts the number of cons pairs in that list, without ever looking at what is hanging off those pairs. In this case, this data structure is a list of three elements, some of which happen to be lists as well. So if I want the number of top-level branches, this does the right thing, but suppose I really want to know how many leaves are in the tree?
6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.2.6 Tree Procedures What I would like is another procedure, count-leaves a tree is a list; we can use list procedures on therm that would in this case return the value 4 to indicate there are (define my-tree (list 4 (list 5 7)2)) takes advantage of the data structure of the tree to accomplis four basic elements in this tree. I need a procedure that direc countleaves my-tree) Slide 31.2.7 countleaves So let's think about this, using the general idea that pro should reflect the structure of the data abstraction to which they the cound of a tree is the sun of the countleaves of eac To count the number of leaves in a tree. we can devise a nice recursive strategy. The base case says if the tree is empty, then there are no leaves, obviously. However, we also have a second base case here. which is different than what we have seen before. This second base case says that if the tree is just a single leave. then the value to return is 1 For the recursive strategy we need to be careful. When we were4 dealing with lists, the recursive strategy said"do something to the car of the list" and then add that into whatever we are doing tree, so we have to count the leaves in each child, then add all of those lp er tree as each branch, or child, of the applied again to the rest of the list. Remember that a tree could have anoth countleaves Slide 31.2.8 So let's implement that idea. My definition for count leaves has one argument, a tree It starts with the two base he tatey. the court of tec s the sum of the coumleave f ach I cases. If the tree is empty, using the fact that the tree is implemented as a list, then return 0. If the tree is a leaf, which detine (countleaves tree cond ((null? tree) 0) base case we can find by testing that it is not a pair(since that wo bRse mean it was a list, and hence a subtree), then return o cs (countleaves (cdr tree))))) Otherwise, I add up the number of leaves in the first branch of this tree. and combine that with the number of leaves in the tree formed by pruning that first branch off. 9203 601 SICP I Aha! This is a different form than what we saw earlier Remember for lists, we would have just done something to the car of the tree and then combined that with a recursive call to count-leayes on the cdr of the tree But since the car of a tree could itself be a tree, and not just an element, we have to recursively apply the procedure to both the car and the cdr of the tree So what kind of order of growth should we expect for such a procedure? Notice that there are two calls to the procedure in the recursive case, and that should remind you of a type of procedure we saw before, namely exponential Step back for a second. Notice how again the definition of this procedure reflects the structure of the associated data abstraction to which it will be applied. Compare this to what we saw with lists
6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Slide 31.2.6 What I would like is another procedure, count-leaves that would in this case return the value 4 to indicate there are four basic elements in this tree. I need a procedure that directly takes advantage of the data structure of the tree to accomplish this. Slide 31.2.7 So let's think about this, using the general idea that procedures should reflect the structure of the data abstraction to which they apply. To count the number of leaves in a tree, we can devise a nice recursive strategy. The base case says if the tree is empty, then there are no leaves, obviously. However, we also have a second base case here, which is different than what we have seen before. This second base case says that if the tree is just a single leave, then the value to return is 1. For the recursive strategy we need to be careful. When we were dealing with lists, the recursive strategy said "do something to the car of the list" and then add that into whatever we are doing applied again to the rest of the list. Remember that a tree could have another tree as each branch, or child, of the tree, so we have to count the leaves in each child, then add all of those up. Slide 31.2.8 So let's implement that idea. My definition for countleaves has one argument, a tree. It starts with the two base cases. If the tree is empty, using the fact that the tree is implemented as a list, then return 0. If the tree is a leaf, which we can find by testing that it is not a pair (since that would mean it was a list, and hence a subtree), then return 1. Otherwise, I add up the number of leaves in the first branch of this tree, and combine that with the number of leaves in the tree formed by pruning that first branch off. Aha! This is a different form than what we saw earlier. Remember for lists, we would have just done something to the car of the tree, and then combined that with a recursive call to count-leaves on the cdr of the tree. But since the car of a tree could itself be a tree, and not just an element, we have to recursively apply the procedure to both the car and the cdr of the tree. So what kind of order of growth should we expect for such a procedure? Notice that there are two calls to the procedure in the recursive case, and that should remind you of a type of procedure we saw before, namely exponential. Step back for a second. Notice how again the definition of this procedure reflects the structure of the associated data abstraction to which it will be applied. Compare this to what we saw with lists