One Size Fits All: A Universal User Interface

Author: Rafe Dennis

Ishan Khetarpal Poolesville High School, Poolesville, MD Under Donald Perlis University of Maryland, College Park, MD

(October 28, 2009)

A key limitation of speech recognition systems is their inability to apply common sense reasoning. The Metacognitive Loop (MCL) is a device that provides this ability to Artificial Intelligence (AI) systems. Currently, MCL is being applied to Active Logic for Reason-Enhanced Dialogue (ALFRED), a universal interface for all computer systems. The final version of ALFRED will accept verbal input, and the goal is to use MCL to reduce errors in speech recognition. After a speech recognition interface for ALFRED was created, data were collected on common errors in utterance recognition. Most randomly generated sentences were interpreted with no errors, but purposefully ambiguous sentences (with homonyms) were often misinterpreted because the computer could not reason about context. Also, due to the nature of the software used to create the verbal interface to ALFRED, utterances made under altered conditions (such as background noise or a faster utterance rate) were prone to misinterpretation. The results were consistent with past studies and emphasize the need for MCL.

Keywords: Speech Recognition, Common Sense Reasoning, Artificial Intelligence, Natural Language Processing

1. Introduction

As the number of technologies a person must use grows, it becomes impractical for a user to learn the user interface of each one [1]. Active Logic for Reason-Enhanced Dialogue (ALFRED) attempts to remedy this situation by serving as a universal translator: it accepts natural-language input, interprets it to determine which technology the user is trying to control, translates the command into the appropriate programming language for that technology, and transmits it to the technology. The current version of ALFRED is also a study in Artificial Intelligence (AI), as an AI agent called the Metacognitive Loop (MCL) is to monitor it. The scope of this research is to adapt ALFRED to accept spoken natural-language utterances, and to collect data on common errors in speech recognition for further study. The data confirmed a need for MCL to distinguish between ambiguous words, such as homonyms, and to recognize when background noise or an altered utterance rate is affecting understandability.

The Metacognitive Loop

There are two components to intelligence: cleverness and common sense. Traditional AI systems focused on cleverness, that is, on knowing the best strategies for a specific problem, such as checkers. However, such programs are unable to realize when they are confused, and thus cannot adapt to new situations or even seek assistance in doing so. A new type of AI system has been suggested that can learn when exposed to new and unexpected situations [2]. It can learn from mistakes, associate action with reasoning, and seek and take advice, which allows it to respond to error and confusion. Such a system will not be clever in the face of new situations, but it will be able to ask for help and learn accordingly. This is an improvement over the checkers-type system mentioned above, which cannot reason about new situations [2].
A key component of this new method is perturbation tolerance: the ability to keep going even when expectations are violated. Changes can occur in the Knowledge Base (KB), such as an ambiguous goal, a change in sensor information, or typographical errors in input. The system must never fail or crash, must be able to determine when a perturbation has occurred, must be able to assess possible strategies for repair, and must be able to implement a selected strategy [2]. This proposed system will keep track of its past actions so that it may learn from them, and will thereby have a history of its own [2]. Humans seem to use a very similar procedure to address everyday situations in a continually changing world. While this may be due to evolutionary knowledge or an unorganized collection of mechanisms, evidence suggests that this procedure is discrete and can be replicated [2].

Speech-Recognition Component

The aim for ALFRED is a universal interface. That is, a person should be able to make statements such as “turn on the lights” or “book a ticket to Hawaii,” and ALFRED should be able to transmit them to the home and to a travel agency, respectively, each in its appropriate language. ALFRED is currently being tested with simulated Mars and Afghanistan domains, but will eventually serve as an interface for any or all computer systems. Before this research was conducted, ALFRED accepted only keyboard input. This research added speech recognition to ALFRED and collected data on errors in speech recognition for future incorporation into the project. The data supported previous studies’ findings on difficulties in speech recognition [3].
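The perturbation-tolerance requirements described under The Metacognitive Loop (detect a perturbation, assess possible repairs, implement the chosen one, and keep a history) can be illustrated with a minimal Java sketch. It is not MCL's code: the class `MonitorSketch`, the `Repair` strategies, and the idea of treating any utterance outside a fixed command set as a perturbation are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a perturbation-tolerant monitor; not MCL's actual design.
final class MonitorSketch {
    // Repair strategies invented for this example.
    enum Repair { NONE, ASK_FOR_REPHRASE }

    private final Set<String> knownCommands;                  // expectations, stored in lower case
    private final List<String> history = new ArrayList<>();   // record of past episodes

    MonitorSketch(Set<String> knownCommands) {
        this.knownCommands = knownCommands;
    }

    private static String normalize(String utterance) {
        return utterance.trim().toLowerCase(Locale.ROOT);
    }

    // Step 1: determine whether an expectation has been violated.
    boolean isPerturbed(String utterance) {
        return !knownCommands.contains(normalize(utterance));
    }

    // Step 2: assess which repair strategy fits (only one is modeled here).
    Repair assess(String utterance) {
        return isPerturbed(utterance) ? Repair.ASK_FOR_REPHRASE : Repair.NONE;
    }

    // Step 3: implement the selected strategy and log the episode.
    String handle(String utterance) {
        Repair repair = assess(utterance);
        history.add(utterance + " -> " + repair);
        if (repair == Repair.ASK_FOR_REPHRASE) {
            return "Could you rephrase that?";
        }
        return "OK: " + normalize(utterance);
    }

    List<String> history() {
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        MonitorSketch monitor = new MonitorSketch(Set.of("stop and deboard"));
        System.out.println(monitor.handle("Stop and deboard"));  // OK: stop and deboard
        System.out.println(monitor.handle("Stop Randy Board"));  // Could you rephrase that?
    }
}
```

A real monitor would presumably hold richer expectations than an exact match against a command list; the point here is only the cycle of detecting, assessing, and repairing, with a request to rephrase as the fallback rather than a crash.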

2. Materials and Methods

The commercially available Dragon NaturallySpeaking speech-recognition software was used as a user interface to ALFRED. Dragon was chosen over alternatives such as Carnegie Mellon Sphinx because its built-in learning algorithms make it significantly more accurate. However, a disadvantage of Dragon is that it requires training for each specific user, which would not be feasible in the real world given the vast variety of users. Because of this limitation, the research proceeded in a manner that would allow Dragon to be exchanged for another product later: the Graphical User Interface (GUI) has text boxes that accept either keyboard input or the output of any speech recognition program.

The current version of ALFRED runs only on Linux, but Dragon runs only on Windows. As a result, a significant portion of the program written serves to bridge the two via Transmission Control Protocol (TCP). Extensibility for possible text-based output from ALFRED was also included: a future researcher can readily modify the TCP connection for use in both directions, allowing ALFRED to transmit statements to the user interface (the complete program code is found in Appendix A).

Data on common errors were collected using randomly generated sentences, purposefully ambiguous sentences (with homonyms), and crafted phrases specific to the Mars and Afghanistan domains.
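As a rough illustration of the kind of bridge described above, the following Java sketch sends one recognized utterance as a line of text over TCP and reads it on the other side. It is not the program in Appendix A: the class and method names, the newline-terminated UTF-8 message format, and the loopback demonstration are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a one-way text bridge over TCP.
final class TcpBridgeSketch {

    // Receiving side: accept one connection and read one line of text.
    static String receiveOne(ServerSocket server) throws IOException {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    // Sending side: connect and transmit the utterance as a single line.
    static void send(String host, int port, String utterance) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(utterance);
        }
    }

    // Loopback demo: both ends run in one process; port 0 lets the OS pick a free port.
    static String roundTrip(String utterance) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(5000); // give up instead of hanging if the sender fails
            String host = server.getInetAddress().getHostAddress();
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    send(host, port, utterance);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            String received = receiveOne(server);
            sender.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("turn on the lights"));
    }
}
```

In a two-machine setup, `send` would run on the Windows side next to the speech recognizer and `receiveOne` on the Linux side next to ALFRED. Making the link bidirectional, as the text suggests a future researcher could, would amount to also reading from the sender's socket and writing to the receiver's.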

3. Data and Results

For utterances from the random sentence generator and for purposefully ambiguous sentences, a rubric was used to evaluate understanding of the utterance, with 1 corresponding to a perfect interpretation and 3 corresponding to major errors. As expected, purposefully ambiguous phrases with homonyms were often interpreted incorrectly. “It’s not easy to wreck an ice beach” was interpreted as an earlier utterance, “It’s not easy to recognize speech.” However, in some cases, even arbitrary sentences from the random sentence generator had a large degree of complexity. For example, “The interfering sophisticate oils a carpet” was rated a 2 and “The ethical key chambers a double photo” was rated a 3. The data confirm an association of errors with ambiguous and complex utterances (the complete data can be found in Table 1 of Appendix B).

Utterances designed for the Mars and Afghanistan domains were tested under four conditions: quiet, noise, fast, and slow. Each utterance was stated three times under each condition, and Dragon’s exact interpretation of the utterance was recorded. The data reveal errors associated with background noise and altered utterance rate. For example, the utterance “Stop and deboard” was interpreted as “Stop and divorce” with noise and as “Stop Randy Board” with a rapid utterance rate (see Table 2 of Appendix B for the complete data).
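The study scored interpretations with a 1-to-3 rubric and recorded raw transcripts. One possible way to put a number on recorded transcripts like these, offered here only as an illustration and not as the method used in this research, is a word-level edit distance between the intended utterance and the recognizer's output. The class `TranscriptErrorSketch` and its case-insensitive, whitespace-based tokenization are assumptions.

```java
import java.util.Locale;

// Hypothetical sketch: counts word substitutions, insertions, and deletions
// needed to turn the intended utterance into the recognized one.
final class TranscriptErrorSketch {

    static int wordEdits(String intended, String recognized) {
        String[] a = tokens(intended);
        String[] b = tokens(recognized);
        // d[i][j] = edits needed to turn the first i words of a into the first j words of b
        int[][] d = new int[a.length + 1][b.length + 1];
        for (int i = 0; i <= a.length; i++) {
            d[i][0] = i; // delete all i words
        }
        for (int j = 0; j <= b.length; j++) {
            d[0][j] = j; // insert all j words
        }
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                int substitute = d[i - 1][j - 1] + (a[i - 1].equals(b[j - 1]) ? 0 : 1);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[a.length][b.length];
    }

    // Lower-case the text and split it on whitespace; an empty string has no words.
    private static String[] tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(wordEdits("Stop and deboard", "Stop and divorce")); // prints 1
        System.out.println(wordEdits("Stop and deboard", "Stop Randy Board")); // prints 2
    }
}
```

Under this measure the noisy-condition example differs from the intended utterance by one word and the fast-rate example by two. A count like this says nothing about whether the meaning survived, which is the gap the rubric and, ultimately, common sense reasoning are meant to address.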

4. Conclusion

The data confirm that complex utterances, ambiguous utterances with homonyms, and utterances made in non-standard conditions are not as easily recognized by Dragon. This may be partially due to Dragon’s requirement of calibration and training before use, which leaves it vulnerable to changes in conditions. While additional data are needed, from other speakers and with other speech recognition systems, the data confirm ALFRED’s need for MCL’s common sense reasoning capabilities. Even if MCL cannot decipher a garbled utterance, it can suggest that ALFRED ask the user to rephrase. This demonstration of the need for MCL’s common sense reasoning in the realm of natural language is the first step toward demonstrating that any AI system can benefit from MCL.

Appendix A: Complete Program Code

ConnectorToAlfred.java

import java.io.*;
import java.net.*;
import javax.swing.JOptionPane;

//Created by Ishan Khetarpal under authority
//of Dr. Don Perlis and Dr. Scott Fults at
//The University of Maryland, College Park
//Date of Initial Creation: July 27, 2009
//Date of Last Modification: August 12, 2009
public class ConnectorToAlfred {
    private ServerSocket myHost = null;
    private Socket myConnection = null;
    private int myPortNumber = 0;
    private String myHostName = null;
    private PrintWriter toAlfred = null;
    private boolean myStatus = false;
    private int myUtteranceNumber = 0;
    public static final int MIN_PORT = 1025;
    public static final int MAX_PORT = 49150;

    public ConnectorToAlfred(File writeTo) {
        for(int i=MIN_PORT; i
