The Institute for Systems Research
Maryland Robotics Center
Brain and Behavior Institute
Professor Aloimonos holds a Ph.D. in Computer Science from the University of Rochester.
His research is devoted to the principles governing the design and analysis of real-time systems that possess perceptual capabilities, for the purpose of both explaining animal vision and designing seeing machines. Such capabilities have to do with the ability of the system to control its motion and the motion of its parts using visual input (navigation and manipulation) and the ability of the system to break up its environment into a set of categories relevant to its tasks and recognize these categories (categorization and recognition).
The work is being done in the framework of Active and Purposive Vision, a paradigm also known as Animate or Behavioral Vision. In simple terms, this approach suggests that Vision has a purpose, a goal. This goal is action; it can be theoretical, practical or aesthetic. When Vision is considered in conjunction with action, it becomes easier. The reason is that the descriptions of space-time that the system needs to derive are not general purpose, but are purposive. This means that these descriptions are good for restricted sets of tasks, such as tasks related to navigation, manipulation and recognition.
If Vision is the process of deriving purposive space-time descriptions as opposed to general ones, one is faced with the difficult question of where to start (with which descriptions). Understanding moving images is a capability shared by all "seeing" biological systems. It was therefore decided to start with descriptions that involve time. Another reason for this is that motion problems are purely geometric, and understanding the geometry amounts to solving the problems. This led to a consideration of the problems of navigation. Within navigation, once again, one faces the same question: in which order should navigational capabilities be developed? This led to the development of a synthetic approach, according to which the order of development is related to the complexity of the underlying model. The appropriate starting point is the capability of understanding self-motion. By performing a geometric analysis of motion fields, global patterns of partial aspects of motion fields were found to be associated with particular 3D motions. This gave rise to a series of algorithms for recovering egomotion through pattern matching. The qualitative nature of the algorithms, in conjunction with the well-defined nature of the input (the input is the normal flow, i.e. the component of the flow along the gradient of the image), makes the solution stable against noise.
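As an illustration of the input mentioned above, normal flow can be computed directly from image derivatives under the standard brightness-constancy assumption. The following is a minimal sketch and not the implementation used in this research; the function name `normal_flow`, the central-difference derivative scheme, and the synthetic ramp frames are assumptions made for the example.

```python
def normal_flow(frame0, frame1, eps=1e-6):
    """Return {(row, col): (u, v)}: the flow component along the image gradient.

    Assumes brightness constancy, Ix*u + Iy*v + It = 0, which constrains only
    the component of the flow parallel to the gradient (Ix, Iy).
    """
    rows, cols = len(frame0), len(frame0[0])
    flow = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Spatial derivatives by central differences on the first frame.
            ix = (frame0[r][c + 1] - frame0[r][c - 1]) / 2.0
            iy = (frame0[r + 1][c] - frame0[r - 1][c]) / 2.0
            # Temporal derivative between the two frames.
            it = frame1[r][c] - frame0[r][c]
            grad_sq = ix * ix + iy * iy
            if grad_sq < eps:
                continue  # no gradient, so normal flow is undefined here
            scale = -it / grad_sq
            flow[(r, c)] = (scale * ix, scale * iy)
    return flow


# Synthetic example: a horizontal intensity ramp shifted one pixel to the right.
frame0 = [[float(c) for c in range(5)] for _ in range(5)]
frame1 = [[float(c - 1) for c in range(5)] for _ in range(5)]
flow = normal_flow(frame0, frame1)
print(flow[(2, 2)])
```

Because the ramp's gradient points along the x axis, the recovered normal flow at each interior pixel is one pixel per frame in x and zero in y. No full optical flow has to be estimated, which is why this input is well defined.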
Other problems, higher in the hierarchy of navigation, are independent motion detection, estimation of ordinal depth, and learning of space. To illustrate these topics, consider the case of ordinal depth. Traditionally, systems were supposed to estimate depth. Such metric information is too much to expect from systems that are supposed to just navigate successfully. Many tasks can be achieved by using an ordinal depth representation. Such a representation can be extracted without knowledge of the exact image motion or displacement. Recent studies on visual space distortion have triggered a new framework for understanding visual shape. A study of a spectrum of shape representations lying between the projective and Euclidean layers is currently underway.
The learning of space can be based on the principle of learning routes. A system knows the space around it if it can successfully visit a set of locations. With more memory available, relationships between the representations of different routes give rise to partial geocentric maps.
In hand-eye coordination, the concept of a perceptual kinematic map has been introduced. This is a map from the robot's joints to image features. Currently under investigation is the problem of creating a classification of the singularities of this map.
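To make the idea of a map from joints to image features concrete, the sketch below uses a toy setup that is entirely hypothetical: a planar two-link arm moving in a plane parallel to the image plane, viewed by an ideal pinhole camera. The link lengths, focal length, depth, and the function names `kinematic_map` and `jacobian_det` are assumptions for the example, not parameters from this work. The Jacobian determinant of such a map vanishes at its singular configurations.

```python
import math

# Hypothetical parameters for the toy setup.
LINK1, LINK2 = 1.0, 0.8      # link lengths (metres)
FOCAL, DEPTH = 500.0, 4.0    # focal length (pixels), distance of the arm plane (metres)


def kinematic_map(q1, q2):
    """Map joint angles to the image position (u, v) of the end effector."""
    x = LINK1 * math.cos(q1) + LINK2 * math.cos(q1 + q2)
    y = LINK1 * math.sin(q1) + LINK2 * math.sin(q1 + q2)
    # Pinhole projection of a point at constant depth.
    return (FOCAL * x / DEPTH, FOCAL * y / DEPTH)


def jacobian_det(q1, q2, h=1e-6):
    """Determinant of d(u, v)/d(q1, q2), estimated by central differences."""
    u_a, v_a = kinematic_map(q1 + h, q2)
    u_b, v_b = kinematic_map(q1 - h, q2)
    u_c, v_c = kinematic_map(q1, q2 + h)
    u_d, v_d = kinematic_map(q1, q2 - h)
    du_dq1, dv_dq1 = (u_a - u_b) / (2 * h), (v_a - v_b) / (2 * h)
    du_dq2, dv_dq2 = (u_c - u_d) / (2 * h), (v_c - v_d) / (2 * h)
    return du_dq1 * dv_dq2 - du_dq2 * dv_dq1


det_bent = jacobian_det(0.3, math.pi / 2)   # elbow bent: map is locally invertible
det_straight = jacobian_det(0.3, 0.0)       # arm fully extended: singular
print(det_bent, det_straight)
```

In this toy case the determinant equals (FOCAL / DEPTH)^2 * LINK1 * LINK2 * sin(q2), so the only singularities are the fully extended and fully folded arm. A general perceptual kinematic map, with more joints and a camera in general position, can have a richer set of singularities.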
The work on active, anthropomorphic vision led to the study of fixation and the development of TALOS, a system that implements dynamic fixation. Since fixation is a principle of Active Vision and fixating observers build representations relative to fixations, it is important to solve fixation in real time and demonstrate it in hardware. TALOS consists of a binocular head/eye system augmented with additional sensors. It is designed to perform fixation while it is moving, in real time.
The ideas of Purposive Vision have led to the study of Intelligence as a purposive activity. A four-valued logic is being developed for handling reasoning in a system of interacting purposive agents.
Since the early 2000s, Professor Aloimonos has been working on the integration of sensorimotor information with the conceptual system, bridging the gap between signals and symbols. This led to the introduction of language tools into the Robotics community. During the past five years his research has been supported by the European Union under the cognitive systems program in the projects POETICON and POETICON++, by the National Science Foundation under the Cyber-Physical Systems program in the project Robots with Vision that find objects, and by the National Institutes of Health in the project Human Activity Languages.
Here is an example of going from language to action (asking a robot to do something). Note how the robot announces that it has to think for a moment before performing the action, but does not reveal its thinking. Here, some of the thinking is revealed.
For the dual problem of going from action to language (observing an activity and describing in natural language what is going on), see our demos in the Telluride Neuromorphic Cognition Engineering workshops.