Sunday, March 29, 2009

Ari N. Schulman - Why Minds Are Not Like Computers

The human brain will never be mimicked by any kind of computer that currently exists. It pains me every time I hear someone equate the brain with a computer and assume that we will one day build a computer that mimics the brain. This article refutes all of that.

Why Minds Are Not Like Computers

Ari N. Schulman

When the blackbird flew out of sight,
It marked the edge
Of one of many circles.
—Wallace Stevens

People who believe that the mind can be replicated on a computer tend to explain the mind in terms of a computer. When theorizing about the mind, especially to outsiders but also to one another, defenders of artificial intelligence (AI) often rely on computational concepts. They regularly describe the mind and brain as the “software and hardware” of thinking, the mind as a “pattern” and the brain as a “substrate,” senses as “inputs” and behaviors as “outputs,” neurons as “processing units” and synapses as “circuitry,” to give just a few common examples.

Those who employ this analogy tend to do so with casual presumption. They rarely justify it by reference to the actual workings of computers, and they misuse and abuse terms that have clear and established definitions in computer science—established not merely because they are well understood, but because they in fact are products of human engineering. An examination of what this usage means and whether it is correct reveals a great deal about the history and present state of artificial intelligence research. And it highlights the aspirations of some of the luminaries of AI—researchers, writers, and advocates for whom the metaphor of mind-as-machine is dogma rather than discipline.

Conceptions of the Computer

Before any useful discussion about artificial intelligence can proceed, it is important to first clarify some basic concepts. When the mind is compared to a computer, just what is it being compared to? How does a computer work?

Broadly speaking, a computer is a machine that can perform many different procedures rather than just one or a few. In computer parlance, a procedure is known as an algorithm—a set of distinct, well-defined steps. Suppose, for example, that you work in an office and your boss asks you to alphabetize the books on his shelf. There are many ways you could do it. For example, one approach would be to look through all of the books and find the first alphabetically (say, Aesop’s Fables), and swap it with the first book on the shelf. Then look through the remaining unsorted books again, find the next highest, and swap it with the book after Aesop’s Fables. Keep going until you have no unsorted books left. This procedure is known as “selection sort” because the approach is to select the highest unsorted book and put it with the sorted books.

The algorithmic approach, as this example shows, is to break up a problem into a series of simple steps, each of which requires little thought or effort. In this particular procedure, the number of specified steps is fairly small—but when you actually perform a selection sort to organize a bookshelf, the number of steps you execute will be much larger, because most of the steps are repeated for each book. The heart of most useful algorithms is repetition; selection sort accomplishes a task with one basic operation that, when performed over and over, completes the whole task. An algorithm doesn’t necessarily have to involve repetition, but any task performed on a large set of data usually will use such repeated steps, known as “loops.” Selection sort also has a well-defined start state (the unsorted shelf) and end state (the sorted shelf), which can be referred to as its input and output. Algorithms have a well-defined set of steps for transforming input to output, so anyone who executes an algorithm will perform the same steps, and an algorithm’s output for a given input will be the same every time it is executed (even so-called “randomized” algorithms typically rely on pseudo-random number generators, which produce the same sequence whenever they are started from the same seed).
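
To make the procedure concrete, here is a minimal sketch of selection sort in Python, using a list of title strings to stand in for the shelf; the function and variable names are illustrative, not anything prescribed by the article.

    # A minimal sketch of selection sort over a "shelf" of book titles.
    # The shelf is just a Python list; alphabetical order comes from
    # ordinary string comparison.
    def selection_sort(shelf):
        for i in range(len(shelf)):
            # Among the books not yet sorted (positions i onward), find
            # the title that comes first alphabetically.
            first = i
            for j in range(i + 1, len(shelf)):
                if shelf[j] < shelf[first]:
                    first = j
            # Swap it into place, just after the already-sorted books.
            shelf[i], shelf[first] = shelf[first], shelf[i]
        return shelf

    print(selection_sort(["Moby-Dick", "Aesop's Fables", "Walden"]))
    # ["Aesop's Fables", "Moby-Dick", "Walden"]

The two nested loops are the repetition described above: the outer loop runs once per shelf position, and the inner loop scans the remaining unsorted books each time.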

Algorithms involve several forms of abstraction. First, an algorithm consists of clear specifications for what should be performed in each step, but not necessarily clear specifications for how. In essence, an algorithm takes a problem specifying what should be achieved and breaks it into smaller problems with simpler requirements for what should be achieved. An algorithm should specify steps simple enough that what becomes identical to how as far as the person or machine executing the algorithm is concerned. How specific the steps need to be in order for this identity to occur depends on the intelligence of the executor. Returning to the example of sorting your boss’s books, the step in which you select the highest unsorted book is more complex than, say, the one in which you pull that book off the shelf and swap it with another. For a highly intelligent sorter, how to execute this step may be self-evident; a less intelligent sorter may need the details spelled out (perhaps like this: write down the first unsorted title; for each remaining book, check its title; if it’s higher, cross it off and write it down instead, along with where it is on the shelf so you can quickly find it again). This routine can be considered a sub-procedure of the original algorithm.

Suppose you wanted to pay someone else to organize the books for you using selection sort. You could simply write the original few steps on a pad of paper in the level of detail at which they were first described. But when you include the step about “selecting the highest unsorted book,” since another person might not know how to do it, you could include the note “see page 2 for instructions on how to do this,” and then list the steps of this sub-routine on page 2. The intelligence of the sorter would lead you to specify more or fewer detailed sub-routines, depending on what steps the sorter already knows how to do. The tasks that an executor can perform in which the what can be specified without the how are known as “primitive” operations.
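
In code, the note on “page 2” becomes a sub-procedure of its own: the main procedure needs to know only what it returns, not how it arrives at the answer. A rough sketch, with hypothetical names of my own:

    # "Page 2": find the unsorted title that comes first alphabetically,
    # spelled out for a sorter who needs the details -- remember the best
    # title seen so far and where it sits on the shelf.
    def position_of_first_unsorted(shelf, start):
        best = start
        for position in range(start + 1, len(shelf)):
            if shelf[position] < shelf[best]:
                best = position
        return best

    # "Page 1" can now treat that step as a primitive operation:
    #     j = position_of_first_unsorted(shelf, i)
    #     shelf[i], shelf[j] = shelf[j], shelf[i]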

There is also an abstraction in the description of the objects involved in the algorithm. Certain assumptions are made about their nature. In our example, the books have titles composed of known characters, allowing for alphabetization; the shelf has an ordering (beginning to end, or left to right); the books are objects that can fit onto the shelf and be moved about; and so on. These characteristics may seem rather obvious—so much so that they are inextricable from the concepts of “book” and “shelf”—but what is important is that only these few properties are relevant for the purposes of the algorithm. You, as the sorter, need know nothing about the full nature of a book in order to execute the algorithm—you need only have knowledge of shelf positions, titles, and how titles are ordered relative to one another. This abstraction is useful because the objects involved in the algorithm can easily be represented by symbols that describe only these relevant properties.
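
This kind of abstraction has a direct counterpart in code: a record that carries only the properties the algorithm needs. A hypothetical sketch:

    from dataclasses import dataclass

    # Only the properties relevant to sorting are represented; everything
    # else about a real book (author, weight, binding, contents) is simply
    # left out of the symbol.
    @dataclass
    class BookSymbol:
        title: str            # compared to alphabetize
        shelf_position: int   # used to find and move the physical book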

These two forms of abstraction are at the core of what enables the execution of procedures on a computer. At the level of its basic operations, a computer is both extremely fast and exceedingly stupid, meaning that the type of task it can perform in which the what is the same as the how is very simple. For a computer to perform the selection sort algorithm, for example, it would have to be described in terms of much simpler primitive steps than the version offered here. The type of steps a computer can perform are usually about as complex as “tell me if this number is greater than that number” and “add these two numbers and tell me the result.” The power of the computer derives not from its ability to perform complex operations, but from its ability to perform many simple operations very quickly. Any complex procedure that a computer performs must be reduced to the primitive operations that a computer can execute, which may require many levels at which the procedure is broken down into simpler and still simpler steps.

Manipulating Symbols

Imagine that you have a computer with three useful abilities: it has a large number of memory slots in which you can store numbers; you can tell it to move existing numbers from one slot to another; and it can compare the numbers in any two slots, telling you which is greater. You can give the computer a sequential list of instructions to execute, some examples of which could be, “Store the number ‘25’ in slot 93,” “Copy the number from slot 76 into slot 41,” “Tell me whether the numbers in slots 17 and 58 are equal,” and “If the last two numbers compared were equal, jump back four instructions, otherwise keep going.” Could you use such instructions to perform your book-sorting task?
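
Before answering that question, it may help to picture such a machine as a tiny interpreter for those four kinds of instruction. The sketch below is a made-up encoding, not anything standard, but it captures the flavor of the instruction list described above.

    # A toy machine: numbered memory slots plus four instruction types.
    # Each instruction is a tuple; "pc" is the position of the current
    # instruction in the program.
    def run(program, slots):
        pc = 0
        last_equal = False               # result of the most recent comparison
        while 0 <= pc < len(program):
            op = program[pc]
            if op[0] == "STORE":             # ("STORE", value, slot)
                slots[op[2]] = op[1]
            elif op[0] == "COPY":            # ("COPY", from_slot, to_slot)
                slots[op[2]] = slots[op[1]]
            elif op[0] == "COMPARE":         # ("COMPARE", slot_a, slot_b)
                last_equal = slots[op[1]] == slots[op[2]]
            elif op[0] == "JUMP_IF_EQUAL":   # ("JUMP_IF_EQUAL", offset)
                if last_equal:
                    pc += op[1]              # e.g. -4 jumps back four steps
                    continue
            pc += 1
        return slots

    slots = {}
    run([("STORE", 25, 93), ("COPY", 93, 41), ("COMPARE", 93, 41)], slots)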

To do so, you must be able to represent the problem in terms that the computer can understand—but the computer only knows what numbers and memory slots are, not titles or shelves. The solution is to recognize that there is a correspondence between the objects that the computer understands and the relevant properties of the objects involved in the algorithm: for example, numbers and titles both have a definite order. You can use the concepts that the computer understands to symbolize the concepts of your problem: assign each letter to a number so that they will sort in the same way (1 for A, 26 for Z), and write a title as a list of letters represented by numbers; the shelf is in turn represented by a list of titles. You can then reduce the steps of your sorting job into steps at the level of simplicity of the computer’s basic operations. If you do this correctly, the computer can execute your algorithm by performing a series of arithmetical operations. (Of course, getting the computer to physically move your boss’s books is another matter, but it can give you a list ordered the way your boss wanted.)
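
A rough sketch of that encoding, assuming we keep only the letters and ignore case (punctuation and spaces are simply dropped here for simplicity):

    # Encode a title as a list of numbers (1 for A ... 26 for Z), so that
    # comparing encoded titles number by number matches alphabetical order.
    def encode(title):
        return [ord(c) - ord("a") + 1 for c in title.lower() if c.isalpha()]

    print(encode("Aesop's Fables"))   # [1, 5, 19, 15, 16, 19, 6, 1, 2, 12, 5, 19]

    # Sorting the encoded titles with purely numerical comparisons puts the
    # shelf in the same order as sorting by the titles themselves.
    shelf = ["Moby-Dick", "Aesop's Fables", "Walden"]
    print(sorted(shelf, key=encode))  # ["Aesop's Fables", "Moby-Dick", "Walden"]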

This is why the computer is sometimes called a “symbol-manipulation machine”: what the computer does is manipulate symbols (numbers) according to instructions that we give it. The physical computer can thus solve problems in the limited sense that we imbue what it does with a meaning that represents our problem.

It is worth dwelling for a moment on the dualistic nature of this symbolism. Symbolic systems have two sides: the abstract concepts of the symbols themselves, and an instantiation of those symbols in a physical object. This dualism means that symbolic systems and their physical instantiations are separable in two important (and mirrored) ways. First, a physical object is independent of the symbols it represents: Any object that represents one set of symbols can also represent countless other symbols. A physical object and a symbolic system are only meaningfully related to each other through a particular encoding scheme. Thus it is only partially correct to say that a computer performs arithmetic calculations. As a physical object, the computer does no such thing—no more than a ball performs physics calculations when you drop it. It is only when we consider the computer through the symbolic system of arithmetic, and the way we have encoded it in the computer, that we can say it performs arithmetic.

Second, a symbolic system is independent of its representation, so it can be encoded in many different ways. Again, this means not just that it is independent of any particular representation, but of any particular method of representation—much as an audio recording can exist in any number of formats (LP, CD, MP3, etc.). The same is true for programs, in which higher-level concepts may be represented in any number of different ways.

This is a crucial property of algorithms and programs—another way of stating that an algorithm specifies what should be done, but not necessarily how to do it. This separation of what and how allows for a division of knowledge and labor that is essential to modern computing. Computer users know that most popular programs (say, Microsoft Word or Mozilla Firefox) work the same way no matter what computer they’re running on. You, as a user, don’t need to know that the instructions a Windows machine uses to run the program are entirely different from those used by an Apple machine. This view of the interaction between user and program is known to software engineers as a “black box,” because the user can see everything on the outside of the box—what it does—but nothing on the inside—how it does it.

Black boxes pervade every aspect of computer design because they employ three distinct abstractions, each offering tremendous advantages for programmers and users. The first has already been described: a user needs to know only what a program does, so he need not repeat the programmer’s labor of understanding how it does it. The same is true for programmers themselves, who need to know only what operations the computer is capable of performing, and don’t need to concern themselves with how it performs them. Second, black box programming allows for simple machines to be easily combined to create more complex machines; this is called modular programming, as each black box functions as a module that can be fitted to other modules. The final abstraction of modular programming is perhaps its greatest advantage: the how can be changed without affecting the what. This allows the programmer to conceive of new ways to increase the efficiency of the program without changing its input-output behavior. More importantly, it allows for the same program to be executed on a wide variety of different machines. Most modern computer processors offer the same set of instructions that have been used by processors for decades, but execute them in such a dramatically different way that they are performed many thousands of times faster than they were in the past.
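
As a small illustration of changing the how without touching the what, imagine a sorting module whose callers see only a single function; which implementation sits behind it can be swapped freely. The names here are hypothetical:

    # The "what": given a shelf, return the same titles in alphabetical order.
    # Callers depend only on this promise.
    def sort_shelf(shelf):
        return _implementation(list(shelf))

    # One "how": the selection sort described earlier.
    def _selection_sort(shelf):
        for i in range(len(shelf)):
            j = min(range(i, len(shelf)), key=lambda k: shelf[k])
            shelf[i], shelf[j] = shelf[j], shelf[i]
        return shelf

    # Swapping in a different "how" (here, Python's built-in sort) changes
    # nothing that callers of sort_shelf can observe -- except, perhaps, speed.
    _implementation = _selection_sort      # or: _implementation = sorted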

Computers as Black Boxes

Let’s return to the hypothetical task of sorting your boss’s bookshelves. Suppose that your employer has specified what you should do, but not how—in other words, suppose he is concerned only with transforming the start state of the shelf to a desired end state. You might sort the shelf a number of different ways—selection sort is just one option, and not always a very good one, since it is exceedingly slow to perform for a large number of books. You might instead decide to sort the books a different way: first pick a book at random, and then move all the books that alphabetically precede it to its left, and all the books that alphabetically follow to its right; then sort each of the two smaller sections of books in the same way. You’ll notice that the operations you use are quite different, but your employer, if he notices any change at all, will only note that you completed the task faster than last time. (Called “quicksort,” this is in fact one of the fastest known sorting algorithms.)
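
A compact Python sketch of that procedure, written the simple way (building new lists) rather than the in-place way real implementations use; the random pivot mirrors the “pick a book at random” step above.

    import random

    # Quicksort on a list of titles: pick a pivot at random, split the rest
    # into the titles that come before it and after it, then sort each part
    # the same way.
    def quicksort(shelf):
        if len(shelf) <= 1:
            return shelf
        pivot = random.choice(shelf)
        before = [t for t in shelf if t < pivot]
        equal = [t for t in shelf if t == pivot]
        after = [t for t in shelf if t > pivot]
        return quicksort(before) + equal + quicksort(after)

    print(quicksort(["Moby-Dick", "Walden", "Aesop's Fables"]))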

Or, as suggested, you might pay a friend to sort the books—then potentially you would not even know how the sorting was performed. Or you could hire several friends, and assign to each of them one of the simpler parts of the task; you would then have been responsible for taking a complex task and breaking it into simpler tasks, but you would not have been responsible for how the simpler tasks themselves were performed. Black box programming creates hierarchies of tasks in this way. Each level of the hierarchy typically corresponds to a differing degree of complexity in the instructions it uses. In the sorting example, the highest level of the hierarchy is the instruction “sort the bookshelf,” while the lowest is a collection of simple instructions that might each look something like “compare these two numbers.”

Computers, then, have engineered layers of abstraction, each deriving its capabilities from joining together simpler instructions at a lower layer of abstraction. But each layer uses its own distinct concepts, and each layer is causally closed—meaning that it is possible to understand the behavior of one layer without recourse to the behavior of a higher or lower layer. For instance, think about your home or office computer. It has many abstraction layers, typically including (from highest to lowest): the user interface, a high-level programming language, a machine language running on the processor, the processor microarchitecture, Boolean logic gates, and transistors. Most computers will have many more layers than this, sitting between the ones listed. The higher and lower layers will likely be the most familiar to laymen: the user interface creates what you see on the screen when you interact with the computer, while Boolean logic gates and transistors give rise to the common description of the computer as “just ones and zeroes.”

The use of layers of abstraction in the computer unifies several essential aspects of programming—symbolic representation, the divide-and-conquer approach of algorithms, and black box encapsulation. Each layer of a computer is designed to be separate and closed, but dependent upon some lower layer to execute its basic operations. A higher level must be translated into a lower level in order to be executed, just as selection sort must be translated into lower-level instructions, which must be translated into instructions at a still lower level.

The hierarchy of a computer is not turtles all the way down: there is a lowest layer that is not translated into something lower, but instead is implemented physically. In modern computers this layer is composed of transistors, minuscule electronic switches with properties corresponding to basic Boolean logic. As layers are translated into other layers, symbolic systems can thus be represented using other symbols, or using physical representations. The perceived hierarchy derives partially from the fact that one layer is represented physically, thus making its relationship to the physical computer the easiest to understand.

But it would be incorrect to take the notion of a hierarchy to mean that the lowest layer—or any particular layer—can better explain the computer’s behavior than higher layers. Suppose that you open a file sitting on your computer’s desktop. The statement “when I clicked the mouse, the file opened” is causally equivalent to a description of the series of state changes that occurred in the transistors of your computer when you opened the file. Each is an equally correct way of interpreting what the computer does, as each imposes a distinct set of symbolic representations and properties onto the same physical computer, corresponding to two different layers of abstraction. The executing computer cannot be said to be just ones and zeroes, or just a series of machine-level instructions, or just an arithmetic calculator, or just opening a file, because it is in fact a physical object that embodies the unity of all of these symbolic interpretations. Any description of the computer that is not solely physical must admit the equivalent significance of each layer of description.

The concept of the computer thus seems to be based on a deep contradiction between dualism and unity. A program is independent of the hardware that executes it; it could run just as well on many other pieces of hardware that work in very different ways. But a program is dependent on some physical representation in order to execute—and in any given computer, the seemingly independent layers do not just exist simultaneously, but are in fact identical, in that they are each equivalent ways of describing the same physical system.

More importantly, a description at a lower level may be practically impossible to translate back into an original higher-level description. Returning again to our sorting example, suppose now that a friend hires you to do some task that his boss asked him to perform. All he gives you is a list of instructions, each of which is about as simple as “decide if these two numbers are equal.” When you follow these instructions, you will perform the task exactly as your friend has specified, but you may have no idea what task you are performing beyond comparing lots of numbers. Even if you are able to figure out that, say, you are also doing some kind of sort, it could be impossible to know whether you are sorting books rather than addresses or names. The steps you execute still clearly embody the higher-level concepts designed by your friend and intended by his boss, but simply knowing those steps may not be sufficient to allow you to deduce those original concepts. In the computer, then, a low-level description of a program does provide a causally closed description of its behavior, but it obscures the higher-level concepts originally used to create the program. One may very likely, then, be unable to deduce the intended purpose and design of a program, or its internal structure, simply from its lower-level behavior.

The Mind as Black Box

Since the inception of the AI project, the use of computer analogies to try to describe, understand, and replicate mental processes has led to their widespread abuse.
Read the rest of the article.

