At the edge of a cognitive space
A group of four people walk into a room and the leader says, "Watson, bring me the last working session." The computer recognizes and greets the group, then retrieves the materials used in the last meeting and displays them on three large screens. Settling down to work, the leader approaches one screen, and swipes his hands apart to zoom into the information on display. The participants interact with the room through computers that can understand their speech, and sensors that detect their position, record their roles and observe their attention. When the topic of discussion shifts from one screen to another, but one participant remains focused on the previous point, the computer asks a question: "What are you thinking?"
Source: Rensselaer Polytechnic Institute
It's a simple scene that illustrates a milestone in the development of environments allowing humans to interact naturally with machines. In a collaboration between Rensselaer Polytechnic Institute and IBM Research, the Cognitive and Immersive Systems Laboratory (CISL) has reached that milestone, and is poised to advance cognitive and immersive environments for collaborative problem-solving in situations like board rooms, classrooms, diagnosis rooms, and design studios.
"This new prototype is a launching point -- a functioning space where humans can begin to interact naturally with computers," said Hui Su, director of CISL. "At its core is a multi-agent architecture for a cognitive environment created by IBM Watson Research Center to link human experience with technology. In CISL, we created this architecture to integrate technologies that register different kinds of human behavior captured by sensors as individual events and forward them to the cognitive agents behind the scene for interpretation. Enhancing this architecture will allow us to link new sensing technologies and computer vision technologies into the system, and to enable collaborative decision making tools on top of these technologies."
The current capabilities of the space are rudimentary in comparison with human understanding. The room can understand and register speech, three specific gestures, the position of occupants of the room, their roles, and the spatial orientation of those occupants, triggering the correct cognitive computing agents to take action and bring data and information relevant to the discussion into the room in real time. But the promise is clear.
"From this point, we can build the capability for better interpreting what happens in the room," said Su. "Our architecture provides a framework for incorporating new technologies such as more cognitive computing capabilities that interpret human behavior. That allows us to really dig in to what people mean during a discussion, triggering the cognitive computing agents to bring valuable analysis and insights to the discussion. In terms of interpreting behavior, we are at the very beginning, but from here the terrain gets very interesting."
CISL is developing its prototype "situations room" using Studio 2 in the Curtis R. Priem Experimental and Performing Arts Center (EMPAC) at Rensselaer. Studio 2 was designed as an "exceptionally versatile space for the integration of digital technology with human expression and perception," and easily incorporates the technology CISL is creating. The prototype relies on several cognitive technologies developed by Rensselaer and IBM, as well as sensors -- such as microphones, cameras, and Kinect motion sensors -- linked by the CISL architecture.
Within Studio 2, sensors detect human activity, such as a change in the position of an occupant of the room, speech, gesture, and head movement. Absent the CISL architecture, each of the cognitive technologies acts in solitude, responding to a specific activity detected by a single type of sensor and provided to the computer for interpretation. A sensor provides an input, and the computer provides an output. The interaction between human and machine is based on a single action with a finite duration.
The CISL architecture makes it possible for the computer to register and track activities from multiple sensors, passing them through a message queue for interpretation by multiple cognitive technologies. The sensors and cognitive technologies work in concert to register and interpret "multimodal" human behavior through multiple activities over an extended duration. When a person enters the environment, sensors capture different kinds of activity, and -- through the CISL architecture -- the computer records each activity as a specific event and forwards it to cognitive technologies for interpretation and response.
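The flow described above, with sensors recording activities as events and a message queue forwarding them to cognitive technologies, can be illustrated with a small sketch. This is not the CISL implementation: the names `Event`, `EventBus`, `speech_agent`, and `gesture_agent` are hypothetical, and an in-process Python queue stands in for whatever messaging system the laboratory actually uses.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """One activity captured by a sensor (hypothetical structure)."""
    timestamp: float  # seconds since the session started
    modality: str     # e.g. "speech", "gesture", "position"
    person: str       # who performed the activity
    payload: str      # what the sensor registered


class EventBus:
    """Minimal stand-in for a message queue that links sensors to agents."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Event]" = queue.Queue()
        self._agents: Dict[str, List[Callable[[Event], str]]] = {}

    def subscribe(self, modality: str, agent: Callable[[Event], str]) -> None:
        # An agent registers interest in one kind of event.
        self._agents.setdefault(modality, []).append(agent)

    def publish(self, event: Event) -> None:
        # Sensors call this; events are kept in arrival order.
        self._queue.put(event)

    def drain(self) -> List[str]:
        # Forward each queued event to every agent subscribed to its modality.
        responses: List[str] = []
        while not self._queue.empty():
            event = self._queue.get()
            for agent in self._agents.get(event.modality, []):
                responses.append(agent(event))
        return responses


def speech_agent(event: Event) -> str:
    return f"{event.person} said: {event.payload}"


def gesture_agent(event: Event) -> str:
    return f"{event.person} gestured: {event.payload}"


bus = EventBus()
bus.subscribe("speech", speech_agent)
bus.subscribe("gesture", gesture_agent)

# Two sensors report activities from the same person, in order.
bus.publish(Event(0.0, "speech", "leader", "bring me the last working session"))
bus.publish(Event(1.5, "gesture", "leader", "zoom_in"))

responses = bus.drain()
for line in responses:
    print(line)
```

Because every sensor writes to the same queue, the agents see speech and gesture events in the order they occurred, which is the property Su describes as a first step toward interpreting multimodal behavior.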
"Humans don't stop to distinguish between the modalities they use to communicate. You point to something on the screen, move your hands and you talk about it, and I understand which parts are significant and interpret them," Su said. "The first step to bridging that barrier is to make it possible for the machine to absorb that behavior in the correct order and understand which part is significant. They have to absorb and interpret multiple modalities simultaneously."
The new CISL prototype draws on several technologies from the IBM Bluemix cloud platform that interpret text -- first translating speech to text, then using natural language processing through Watson to interpret the text -- and trigger the appropriate cognitive computing agents to take the correct actions. Cognitive technologies developed at Rensselaer can interpret three gestures (hands swiping apart or together to zoom in or out of a window on screen, or swiping in one direction to close a window), track and interpret the position of occupants in the room, and track and interpret the orientations of those occupants. The machine also tracks and registers the information displayed on the screens installed in the space, so that machines can interpret it and support long-term human activities such as a mergers and acquisitions discussion. The interaction is fluid and continuous, and the future is within grasp.
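As a rough illustration of how three recognized gestures might be turned into actions on a displayed window, consider the sketch below. It is an assumption-laden example rather than the Rensselaer gesture software: the gesture labels, the `Window` class, the `ZOOM_STEP` value, and the choice that swiping apart zooms in while swiping together zooms out are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Window:
    """A window shown on one of the room's screens (hypothetical model)."""
    title: str
    scale: float = 1.0
    is_open: bool = True


ZOOM_STEP = 1.25  # assumed zoom factor per gesture


def zoom_in(window: Window) -> None:
    window.scale *= ZOOM_STEP


def zoom_out(window: Window) -> None:
    window.scale /= ZOOM_STEP


def close(window: Window) -> None:
    window.is_open = False


# Assumed labels that a gesture recognizer might emit for the three gestures.
GESTURE_ACTIONS: Dict[str, Callable[[Window], None]] = {
    "swipe_apart": zoom_in,
    "swipe_together": zoom_out,
    "swipe_one_direction": close,
}


def apply_gesture(window: Window, gesture: str) -> bool:
    """Apply a recognized gesture; return False if it was ignored."""
    action = GESTURE_ACTIONS.get(gesture)
    if action is None or not window.is_open:
        return False  # unknown gesture, or the window is already closed
    action(window)
    return True


window = Window("Last working session")
apply_gesture(window, "swipe_apart")      # zoom in
apply_gesture(window, "swipe_apart")      # zoom in again
apply_gesture(window, "swipe_together")   # zoom back out one step
handled_unknown = apply_gesture(window, "wave")  # not one of the three gestures
apply_gesture(window, "swipe_one_direction")     # close the window
```

A lookup table keeps the set of supported gestures explicit, so adding a fourth gesture would mean adding one entry rather than changing the dispatch logic.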
"This work is important because now we can start to do more interpretation," Su said. "Now we can add modalities -- more than just basic movement and speech, and richer understanding and interpretation. We can begin to talk about the subtleties of human behavior like bias and emotion. With this step, we have opened up a broad horizon. This helps us build a symbiotic relationship between humans and machines."