Scientific Emphasis Proceedings, June 01, 2001 © Essem 2001
This document
provides some comments on Nuance’s SpeechObject Submission to the W3C.
We believe that
the Nuance SpeechObject proposal is an interesting framework for
object-oriented development of directed dialog applications for Interactive
Voice Response (IVR) systems.
However, we have
some comments regarding its applicability as a standard reusable dialog
component framework within the context addressed by the W3C voice browser
working group.
·
We
believe that the reusable component framework adopted by the W3C voice browser
activity should remain independent of specific implementation platforms.
We therefore recommend keeping such components independent of
additional or proprietary platforms such as SpeechChannels.
·
We would prefer the W3C reusable component requirements to be met
within the VoiceXML framework via a markup-based solution that integrates
cleanly with VoiceXML:
o
This would keep the
framework independent of specific implementation platforms, as it would be
fully supported by the VoiceXML interpreter while maintaining VoiceXML
declarative authoring.
§
Enabling
reusable dialog components within the VoiceXML framework would allow users to
learn from and re-use standard components as they get deployed.
·
We
would like the reusable component framework to work
well over HTTP in a client-server environment.
o
As expressed, the framework described in the Nuance submission
appears to be IVR-centric.
o
Going
forward, it is important to support speech
applications distributed across client and server, with the client performing
various levels of processing depending on the size of the client.
We believe that the reusable component framework needs to address the
issues of context sharing, parallel activation and mixed initiative across
objects.
More specifically:
·
SpeechObjects may need to share context in order to come up with
the right NL interpretation.
o
This may include access to intermediate states of an object;
o
As described, these states are not currently available to other objects,
since an object returns only after it has finished executing.
·
We need to address parallel activation of SpeechObjects beyond
the first prompt or entry point in the object, as also mentioned in section 2.2
of the updated W3C reusable dialog component requirements. (Note: this
document is not publicly accessible; it is available only to W3C members.)
o
As described, SpeechObjects must run to completion, either by
reaching the end of the associated dialog or by terminating with an error event.
There are no provisions to interrupt and resume an object or to switch between
objects.
o
As described, it is not possible to decide on the fly which of a
set of parallel engines should process a given input.
o
As described, it is not possible to use the result of an object
running in parallel to update the intermediate state of another object.
·
Multi-modal applications will also require context sharing between
modalities, including explicit time-sharing of input events in different
modalities. It is not clear how this can be achieved within the current specification
of SpeechObjects.
·
It would be advantageous to retain the reusable component
framework within declarative XML markup in order to easily enable future mixed
initiative extensions. This would make it easier to follow the evolution
of the VoiceXML interpreter and its associated Form Interpretation Algorithm to
support advanced mixed initiative capabilities. Adopting an object-oriented
extension framework at this stage might hinder future development of
applications that support mixed initiative and context sharing across objects,
subdialogs and documents as well as all the different types of form items.
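The context-sharing and parallel-activation concerns listed above can be illustrated with a minimal Python sketch. It is not taken from the SpeechObject submission or from VoiceXML: the names SharedContext, Component and Dispatcher, and the slot names, are hypothetical and serve only to show the behavior we have in mind. Two components are active at the same time, each input is routed on the fly to whichever component can use it, and a value collected by one component updates the intermediate state of the other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SharedContext:
    """Hypothetical store that all active components read and write."""
    slots: Dict[str, str] = field(default_factory=dict)


class Component:
    """Hypothetical dialog component that collects a fixed list of slots."""

    def __init__(self, name: str, slot_names: List[str], context: SharedContext):
        self.name = name
        self.slot_names = slot_names
        self.context = context

    def pending(self) -> List[str]:
        # Intermediate state is derived from the shared context, so it is
        # visible to (and can be changed by) other components at any time.
        return [s for s in self.slot_names if s not in self.context.slots]

    def accepts(self, slot: str) -> bool:
        return slot in self.pending()

    def handle(self, slot: str, value: str) -> None:
        self.context.slots[slot] = value

    def done(self) -> bool:
        return not self.pending()


class Dispatcher:
    """Routes each input to the first active component that can use it."""

    def __init__(self, components: List[Component]):
        self.components = components

    def dispatch(self, slot: str, value: str) -> Optional[str]:
        for component in self.components:
            if component.accepts(slot):
                component.handle(slot, value)
                return component.name
        return None  # no active component can use this input


# Usage: a flight and a hotel component run in parallel.
context = SharedContext()
flight = Component("flight", ["city", "date", "airline"], context)
hotel = Component("hotel", ["city", "date", "nights"], context)
dispatcher = Dispatcher([flight, hotel])

dispatcher.dispatch("city", "Boston")  # routed to the flight component
dispatcher.dispatch("nights", "2")     # routed to the hotel component
```

After these two inputs, neither component has finished, yet the hotel component no longer needs to ask for the city, because the flight component already placed it in the shared context. This is the kind of interaction that is not possible when an object exposes its result only after it has finished executing.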
Finally, we believe that a reusable dialog component framework
should in the future support sharing of logic across multi-channel
or multi-modal applications. Today it is not clear how SpeechObjects integrate
with, for example, web and wireless applications.
We recommend a two-step standardization approach:
·
To build reusable VoiceXML dialog components based on
the set of components proposed in the W3C Reusable Dialog
Requirements and its update
proposal, with a parameter, function and result object structure similar to
what is proposed in the SpeechObject Submission to the W3C.
·
To evolve the
component framework based on the lessons learned from deploying the above. This
parallels the evolution of HTML to cover those constructs that are demanded by
real-world WWW usage scenarios.
In addition, we
recommend that the issues of context sharing be addressed and that the
W3C Reusable Dialog Requirements be updated accordingly.
These
comments are based upon key contributions from Jaroslav Gergic, Rafah Hosn, Jan Kleindienst, Stéphane H. Maes, TV Raman, Jan Sedivy and Ladislav Seredi.