Scientific Emphasis Proceedings, June 01, 2001 © Essem 2001

 

Comments on the Nuance SpeechObject Submission to W3C

Stéphane H. Maes, smaes@us.ibm.com


1.0 Introduction

This document provides some comments on Nuance’s SpeechObject Submission to the W3C.

 

We believe that the Nuance SpeechObject proposal is an interesting framework for object-oriented development of directed dialog applications for Interactive Voice Response (IVR) systems.

 

However, we have some comments regarding its applicability as a standard reusable dialog component framework within the context addressed by the W3C voice browser working group.

2.0 Comments

·         We believe that the reusable component framework adopted by the W3C voice browser activity should remain independent of specific implementation platforms. We would therefore recommend keeping such components independent of additional or proprietary platforms such as SpeechChannels.

·         We would prefer the W3C reusable component requirements to be met within the VoiceXML framework via a markup-based solution that integrates cleanly with VoiceXML:

o         This would keep the framework independent of specific implementation platforms, as it would be fully supported by the VoiceXML interpreter while preserving VoiceXML's declarative authoring model.

§         Enabling reusable dialog components within the VoiceXML framework would allow users to learn from and re-use standard components as they get deployed.

·         We would like the reusable component framework to work well over HTTP in a client-server environment.

o        As expressed, the framework described in the Nuance submission appears to be IVR-centric.

o        Going forward, it is important to support speech applications distributed across client and server, with the client performing various levels of processing depending on the size of the client.

3.0 Additional Comments beyond the W3C Reusable Component Requirements

We believe that the reusable component framework needs to address the issues of context sharing, parallel activation and mixed initiative across objects.

 

More specifically:

·         SpeechObjects may need to share context in order to come up with the right NL interpretation.

o        This may include access to intermediate states of an object;

o        As described, at present these are not available to other objects, since an object returns only after it is done executing.

·         We need to address parallel activation of SpeechObjects beyond the first prompt or entry point in the object, as also mentioned in the updated W3C reusable dialog component requirements (section 2.2). (Note: this document is not publicly accessible; it is available only to W3C members.)

o        As described, SpeechObjects must run to completion, either by reaching the end of the associated dialog or by terminating with an error event. There are no provisions to interrupt and resume an object or to switch between objects.

o        As described, it is not possible to decide on the fly which of a set of parallel engines should process a given input.

o        As described, it is not possible to use the result of an object running in parallel to update the intermediate state of another object.

·         Multi-modal applications will also require context-sharing between modalities, including explicit time-sharing of input events in different modalities. It is not clear how this will be achievable within the current specification of the SpeechObjects.

·         It would be advantageous to retain the reusable component framework within declarative XML markup in order to easily enable future mixed initiative extensions. This would make it easier to follow the evolution of the VoiceXML interpreter and its associated Form Interpretation Algorithm as they come to support advanced mixed initiative capabilities. Adopting an object-oriented extension framework at this stage could hinder future development of applications that support mixed initiative and context sharing across objects, subdialogs and documents, as well as across all the different types of form items.
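
To make the context-sharing, interruption and routing points above more concrete, the following is a minimal sketch of the behaviour we have in mind. It is an illustration only: it is written in Python rather than in markup, and every name in it (SharedContext, DialogComponent, DialogManager, the slot names and the word lists) is hypothetical. None of it is taken from the Nuance submission, from VoiceXML or from the W3C requirements. It assumes that each component is driven one input at a time, so that it never blocks until completion, and that all components read and write a common context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SharedContext:
    """Hypothetical store visible to every component, including mid-dialog."""
    slots: Dict[str, str] = field(default_factory=dict)


class DialogComponent:
    """Hypothetical reusable component that fills a fixed set of slots.

    It is driven one input at a time instead of returning only when done,
    so it can be suspended after any turn and resumed later.
    """

    def __init__(self, name: str, vocabulary: Dict[str, List[str]]):
        self.name = name
        self.vocabulary = vocabulary  # slot name -> words accepted for it

    def _slot_for(self, word: str) -> Optional[str]:
        for slot, accepted in self.vocabulary.items():
            if word in accepted:
                return slot
        return None

    def score(self, utterance: str) -> int:
        """Number of words in the utterance that this component understands."""
        words = utterance.lower().split()
        return sum(1 for word in words if self._slot_for(word) is not None)

    def step(self, utterance: str, context: SharedContext) -> None:
        """Consume one input and write intermediate results to the context."""
        for word in utterance.lower().split():
            slot = self._slot_for(word)
            if slot is not None:
                context.slots[slot] = word

    def is_done(self, context: SharedContext) -> bool:
        return all(slot in context.slots for slot in self.vocabulary)


class DialogManager:
    """Keeps all components active and decides per input which one handles it."""

    def __init__(self, components: List[DialogComponent]):
        self.components = components
        self.context = SharedContext()

    def handle(self, utterance: str) -> Optional[str]:
        """Route the input to the best-scoring component; return its name."""
        best = max(self.components, key=lambda c: c.score(utterance))
        if best.score(utterance) == 0:
            return None  # no active component understands this input
        best.step(utterance, self.context)
        return best.name


date = DialogComponent("date", {"day": ["monday", "tuesday"],
                                "month": ["june", "july"]})
city = DialogComponent("city", {"destination": ["paris", "boston"]})
manager = DialogManager([date, city])

manager.handle("monday")  # the date component starts; "month" is still open
manager.handle("paris")   # the city component takes over; date is suspended
manager.handle("june")    # the date component resumes and completes
```

In this sketch the value of "day" is visible to the city component while the date component is still incomplete, and the user can switch between the two components at any turn. These are the capabilities that, as described, SpeechObjects do not appear to provide. A declarative equivalent would leave this routing and bookkeeping to the interpreter rather than to component code.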

 

Finally, we believe that a reusable dialog component framework should in the future support sharing of logic across multi-channel or multi-modal applications. Today it is not clear how SpeechObjects integrate with, for example, web and wireless applications.

4.0 Conclusions

We recommend a two-step standardization approach:

·         To build a reusable VoiceXML dialog component based on the set of components proposed in the W3C Reusable Dialog Requirements and update proposal with parameters, function and result object structure similar to what is proposed in the SpeechObject Submission to the W3C.

·         To evolve the component framework based on the lessons learned from deploying the above. This parallels the evolution of HTML to cover those constructs that are demanded by real-world WWW usage scenarios.

 

In addition, we recommend that the issues of context sharing be addressed and that the W3C Reusable Dialog Requirements be updated accordingly.

 

Appendix A: Acknowledgements

These comments are based upon key contributions from Jaroslav Gergic, Rafah Hosn, Jan Kleindienst, Stéphane H. Maes, TV Raman, Jan Sedivy and Ladislav Seredi.