Recent advances in sketch recognition*

by NICHOLAS NEGROPONTE
Massachusetts Institute of Technology
Cambridge, Massachusetts

In a shocking and almost silly interview with Max Jacobson, Christopher Alexander1 recounted the following story:

"There was a conference which I was invited to a few months ago where computer graphics was being discussed as one item and I was arguing very strongly against computer graphics simply because of the frame of mind that you need to be in to create a good building. Are you at peace with yourself? Are you thinking about smell and touch, and what happens when people are walking about in a place? But particularly, are you at peace with yourself? All of that is completely disturbed by the pretentiousness, insistence and complicatedness of computer graphics and all the allied techniques. So my final objection to that and to other types of methodology is that they actually prevent you from being in the right state of mind to do the design, quite apart from the question of whether they help in a sort of technical sense, which, as I said, I don't think they do."

While we find notions of a "frame of mind ... to create a good building" extremely distasteful (and paternalistic), we wholeheartedly admit that computer graphics is guilty of great complication and noise. In general, computer graphics research has been totally self-serving, aptly fitting Weizenbaum's2 analogy: "It is rather like an island economy in which the natives make a living by taking in each other's laundry."

The following paper describes a specific experiment in computer graphics, one with which Alexander might someday be at ease: sketch recognition. The effort is particularly exciting (to us) because it allows for a wide variety of approaches (some contradictory), modestly executable, with the acknowledgment that the limiting case, a computer that can recognize a hand-drawn sketch with the same reliability as an onlooking human, will require a machine intelligence. The following pages report upon the salient characteristics of an actual computer program, but most of the major issues are far broader than the experience can admit. The reader should seriously wonder (as we continually do): if drawing is a two-dimensional language, does sketching have a syntax and semantics? Is any of HUNCH more than the syntactical processing of a hand drawing?

The founding work in computer graphics was called SKETCHPAD.3 While this was an effective name, in some way it polluted the notion of "sketching" in any sense of the word. In contrast to SKETCHPAD, "We view the problem of sketch recognition as the step by step resolution of the mismatch between the user's intentions (of which he himself may not be aware) and his graphical articulations. In a design context, the convergence to a match between the meaning and the graphical statement of that meaning is complicated by continually changing intentions that result from the user's viewing his own graphical statements."4

Sketching can be considered both as a form of introspection, communicating with oneself, and as a form of presentation, communication with others. In the first case, the machine is holding the same pencil, eavesdropping so to speak. In the second case you are sharing a piece of paper with the machine, and both of you are drawing on the same sheet, each with his (its) own stylus. In both instances memory is the drawing medium (for the human at least) and the design vehicle for looping into the physical world. We are not suggesting that the heart magically tells the wrist something that embellishes a concept passing from mind to medium. We are proposing that a nebulous idea is characterized by not knowing when you begin a sentence exactly what you are going to say at the end. Furthermore, the final "phrases" are in fact flavored (for better or for worse) by your initial tack and your, our, or the computer's reaction to it.

Consequently, in an act like sketching, the graphical nature of the drawing or doodle (that is, the wobbliness of lines, the collections of overtracings, and the darkness of inscriptions) has important meanings, meanings that must not be, but are for the most part, overlooked in computer graphics. "A straight line 'sketch' on a cathode ray tube could trigger an aura of completeness injurious to the designer as well as antagonistic to the design."5

* This work has been supported by The Office of Computing Activities, The National Science Foundation.


From the collection of the Computer History Museum (www.computerhistory.org)


National Computer Conference, 1973

In contrast to most graphical systems, we have built a sketch recognition system called HUNCH that faithfully records wobbly lines and crooked corners in anticipation of drawing high-level inferences about ... ! The goal of HUNCH is to allow a user to be as graphically freewheeling and inaccurate as he would be with a human partner; thus the system is compatible with any degree of formalization of the user's own thoughts. Unlike the SKETCHPAD paradigm, which is a rubber-band pointing-and-tracking vernacular, HUNCH takes in every nick and bump, storing a voluminous history of your tracings on both magnetic tape and storage tube. HUNCH is not looking at the sketch as much as it is looking at you sketching; it is dealing with the verb rather than the noun. It behaves like a person watching you sketch, seeing lines grow, and saying nothing until asked or triggered by a conflict recognized at a higher level of application.

Figure 1

Unlike a completed sketch, that is, a two-dimensional representation, what we have just described is so far one dimensional. The information is recorded serially at the rate of 200 X, Y, and limited Z coordinates per second. This coordinate information is augmented by measurements of pressure upon the stylus, from zero to fifty ounces. At this writing, position and pressure are the only recorded data; one can imagine measuring how hard the sketcher is squeezing the pen or taking his galvanic skin resistance. In addition to position and pressure, the method of reporting X, Y, Z (that is, a continuous updating 200 times per second) is in fact a built-in form of clock, which provides the added and crucial features of speed and acceleration.

Figure 3

Either on-line or upon command, HUNCH performs certain transformations on the stream of data and then examines it for the purpose of recognizing your intentions at three levels: (1) what you meant graphically, in two dimensions; (2) what you meant physically, in three dimensions; and (3) what you meant architecturally. Each category is progressively more difficult. They range from recognizing a square, to a cube, to your being a new brutalist.

GRAPHICAL INTENTIONS
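The recording scheme described above treats the fixed reporting rate of 200 samples per second as a built-in clock. A minimal sketch of how speed and acceleration might be recovered from such a stream by simple differencing follows. It is an illustration only, not HUNCH's actual code: the function names, the two-dimensional points, and the sample stroke are all assumptions, and Z and pressure are left out.

```python
import math

SAMPLE_RATE_HZ = 200  # from the text: 200 coordinate reports per second


def speeds(points):
    """Speed over each sample interval, in coordinate units per second.

    Because samples arrive at a fixed rate, the distance between two
    consecutive points times the rate is the speed over that interval.
    """
    return [math.dist(p, q) * SAMPLE_RATE_HZ for p, q in zip(points, points[1:])]


def accelerations(speed_values):
    """Change in speed between consecutive intervals, in units per second squared."""
    return [(b - a) * SAMPLE_RATE_HZ for a, b in zip(speed_values, speed_values[1:])]


# Hypothetical stroke: the pen covers 1, then 2, then 3 units per sample,
# so it is steadily speeding up.
stroke = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
v = speeds(stroke)
a = accelerations(v)
```

No timestamps are stored anywhere in this sketch; the position of a sample in the list stands in for time, which is the sense in which the reporting method acts as a clock.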

Figure 2

This section describes the most primitive level of recognition, which involves graphical intentions at the level of finding lines, corners, and two-dimensional geometric properties. For human