VoiceXML Over HTTP
Communication between the VXML Server and the voice browser is based on request-response cycles using VoiceXML over HTTP. VoiceXML documents are linked together by Uniform Resource Identifiers (URIs), a standardized way to reference resources within a network. User input is collected through forms similar to HTML web forms: a form contains input fields that the user fills in, and the values are sent back to the server.
Resources for the voice browser are located on the VXML Server. These resources are VoiceXML files, digital audio, instructions for speech recognition (grammars), and scripts. Every exchange between the VoiceXML browser and the voice application has to be initiated by the VoiceXML browser as a request to the VXML Server. For this purpose, VoiceXML files contain grammars that specify expected words and phrases. A link contains the URL that refers to the voice application. The browser connects to that URL as soon as it recognizes a match between spoken input and one of the grammars.
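The link-and-grammar mechanism described above can be sketched as follows. This is a minimal illustration, not how a voice browser is actually implemented: the VoiceXML document, the host name vxml.example.com, and the helper names `links_from_document` and `next_request` are all assumptions made for the example, and plain string comparison stands in for the speech recognizer.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical VoiceXML document, as a VXML Server might return it.
# The <link> carries the target URI and an inline grammar of expected phrases.
VXML_DOC = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <link next="billing.vxml">
    <grammar type="application/srgs+xml" root="root" version="1.0">
      <rule id="root" scope="public">
        <one-of>
          <item>billing</item>
          <item>my bill</item>
        </one-of>
      </rule>
    </grammar>
  </link>
</vxml>"""

VXML_NS = {"v": "http://www.w3.org/2001/vxml"}


def normalize(phrase):
    """Lowercase and collapse whitespace so phrases compare reliably."""
    return " ".join(phrase.split()).lower()


def links_from_document(xml_text, base_url):
    """Map each grammar phrase to the absolute URL of its enclosing link."""
    root = ET.fromstring(xml_text)
    table = {}
    for link in root.findall("v:link", VXML_NS):
        # Relative URIs are resolved against the document's own URL.
        target = urljoin(base_url, link.get("next"))
        for item in link.findall(".//v:item", VXML_NS):
            table[normalize(item.text)] = target
    return table


def next_request(table, utterance):
    """Return the URL the browser would request next, or None on no match."""
    return table.get(normalize(utterance))


if __name__ == "__main__":
    table = links_from_document(VXML_DOC, "http://vxml.example.com/app/main.vxml")
    print(next_request(table, "my bill"))
```

In this sketch the browser side always originates the next request: the server only supplies the document, and the match against a grammar decides which URL is fetched next.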
Note: In Unified CVP Release 9.0(1) and later releases, the CVP installer installs the CVP Call Server, CVP VXML Server, and Media Server together. After running the installer, you can configure only the Call Server, only the VXML Server, only the Media Server, or any combination of them, as required.
When determining the VXML Server performance, consider the following key aspects:
-
QoS and network bandwidth between the Web application server and the voice gateway
For details, see Network Infrastructure Considerations.
-
Performance on the VXML Server
For details, see the Hardware and System Software Specification for Cisco Unified Customer Voice Portal at http://www.cisco.com/en/US/products/sw/custcosw/ps1006/prod_technical_reference_list.html.
-
Use of prerecorded audio versus Text-to-Speech (TTS)
Voice user-interface applications tend to use prerecorded audio files wherever possible, because recorded audio sounds better than TTS. Choose the audio file quality so that files do not take long to download or to be processed by the browser. Make recordings in 8-bit mu-law, 8 kHz format.
-
Audio file caching
Ensure that the voice gateway is set to cache audio content, to avoid delays caused by downloading files from the media source. For details about prompt management on supported gateways, see Cisco IOS Caching and Streaming.
-
Use of Grammars
A voice application, like any user-centric application, is prone to certain problems that might be discovered only through formal usability testing or observation of the application in use. Poor speech recognition accuracy is one type of problem common to voice applications, and it is most often caused by poor grammar implementation. When users mispronounce words or say things that the grammar designer does not expect, the recognizer cannot match their input against the grammar. Poorly designed grammars that contain many difficult-to-distinguish entries also result in many incorrectly recognized inputs, which decreases performance on the VXML Server. Grammar tuning is the process of improving recognition accuracy by modifying a grammar based on an analysis of its performance.
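The analysis step of grammar tuning could look like the following sketch. It assumes that recognition results are available as pairs of what the caller actually said (from transcription) and what the recognizer returned, with `None` for a no-match. The function name `confusion_report` and the sample data are hypothetical and are not taken from any Unified CVP log format.

```python
from collections import Counter


def confusion_report(results):
    """Summarize recognition results.

    results: iterable of (spoken, recognized) pairs, where recognized is
    None when the recognizer could not match the input to the grammar.
    Returns (accuracy, no_match_count, confusions), with confusions sorted
    from most to least frequent.
    """
    total = correct = no_match = 0
    confusions = Counter()
    for spoken, recognized in results:
        total += 1
        if recognized is None:
            no_match += 1          # input fell outside the grammar
        elif recognized == spoken:
            correct += 1
        else:
            confusions[(spoken, recognized)] += 1  # misrecognized entry
    accuracy = correct / total if total else 0.0
    return accuracy, no_match, confusions.most_common()


# Hypothetical transcribed calls against a city-name grammar.
SAMPLE = [
    ("austin", "austin"),
    ("boston", "austin"),
    ("boston", "boston"),
    ("boston", "austin"),
    ("dallas", "dallas"),
    ("houston", None),
    ("dallas", "dallas"),
    ("austin", "boston"),
]

if __name__ == "__main__":
    accuracy, no_match, confusions = confusion_report(SAMPLE)
    print(f"accuracy={accuracy:.2f} no_match={no_match}")
    for (spoken, recognized), count in confusions:
        print(f"{spoken!r} heard as {recognized!r}: {count}")
```

Frequently confused pairs point to grammar entries that are hard to distinguish, while no-match counts point to phrases the grammar designer did not anticipate.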
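For the prerecorded audio item in the list above, the recommended 8-bit mu-law, 8 kHz format works out to 8000 bytes (64 kbit) per second of audio. The sketch below shows how prompt length translates into file size and approximate download time. The link speed is an assumed input, and HTTP and network overhead are ignored, so treat the result as a lower bound.

```python
SAMPLE_RATE_HZ = 8000    # 8 kHz sampling, per the recommended format
BITS_PER_SAMPLE = 8      # 8-bit mu-law, one byte per sample


def prompt_size_bytes(duration_s):
    """Size of the raw audio payload for a prompt of the given length."""
    return int(duration_s * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 8)


def download_time_s(duration_s, link_kbps):
    """Idealized transfer time over a link of link_kbps kilobits per second.

    Ignores file headers, HTTP overhead, and competing traffic.
    """
    bits = prompt_size_bytes(duration_s) * 8
    return bits / (link_kbps * 1000)


if __name__ == "__main__":
    # A 10-second prompt fetched over an assumed 256 kbit/s link.
    print(prompt_size_bytes(10))      # 80000
    print(download_time_s(10, 256))   # 2.5
```

Figures like these illustrate why gateway caching matters: without it, each uncached prompt costs at least this transfer time before playback can start.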