Transcribe

The Transcribe element in Call Studio can be used to engage the Google Speech-to-Text service. The Transcribe element is located under the Customer Virtual Assistant group in the Call Studio Elements. This element is an extension of the Form element; it engages the Speech Server resource on VVB to communicate with the Google Speech-to-Text server. To indicate the Speech-to-Text server resource requirement, Call Studio creates a specific grammar, builtin:speech/transcribe, and sends it to VVB in a VXML page. It does not specify which transcription service is to be used; that is configured in VVB.
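
The exact VXML page is generated by Call Studio and executed on VVB; purely as illustration, a minimal sketch of the kind of markup that carries this grammar might look as follows (the form and field names are hypothetical, and the property values simply mirror default settings listed below):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: the actual page is generated by Call Studio. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="transcribe">
    <field name="result">
      <!-- The builtin grammar tells VVB to allocate a Speech Server resource
           and stream the caller audio to the configured Speech-to-Text service. -->
      <grammar src="builtin:speech/transcribe"/>
      <property name="timeout" value="5s"/>            <!-- NoInput Timeout -->
      <property name="maxspeechtimeout" value="30s"/>  <!-- Max Input Time -->
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
      </noinput>
      <filled>
        <!-- result holds the transcribed text returned by the service -->
        <log>Transcription: <value expr="result"/></log>
      </filled>
    </field>
  </form>
</vxml>
```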

Note

 
  • If a non-barge-in prompt is longer than 120 seconds, VVB barges in after 120 seconds of playback and creates a recognition session for the Transcribe element. This causes the prompt to pause and skip a few seconds.

  • The Transcribe element works with both the Cisco DTMF and Nuance ASR adaptors.

Settings

| Name (Label) | Type | Req'd | Single Setting Value | Substitution Allowed | Default | Notes |
|---|---|---|---|---|---|---|
| Input Mode | string | Yes | true | false | voice | The type of entry allowed for input. Possible values are voice (voice input only) and dtmf+voice (voice and DTMF input). |
| NoInput Timeout | int ≥ 0 | Yes | true | true | 5s | The maximum time allowed for silence before a noinput event is thrown. Possible values are standard time designations consisting of a non-negative number and a time unit, for example 3s (seconds) or 3000ms (milliseconds). |
| Max NoInput Count | int ≥ 0 | Yes | true | true | 3 | The maximum number of noinput events allowed during input capture. Possible values are int ≥ 0, where 0 means an unlimited number of noinput events is allowed. |
| Max NoMatch Count | int ≥ 0 | Yes | true | true | 3 | The maximum number of nomatch events allowed during DTMF input capture. Possible values are int ≥ 0, where 0 means an unlimited number of nomatch events is allowed. |
| DTMF Grammar | string | Yes | true | true | None | Mandatory only when the input mode selected is dtmf+voice. Supports Cisco DTMF regex. |
| Secure Logging | boolean | Yes | true | true | false | Whether to enable secure logging of the element's potentially sensitive data. If set to true, the element's potentially sensitive data is not logged. |
| Termination Character | string | No | true | true | # | The character that terminates the voice stream or DTMF collection. |
| Max Input Time | int ≥ 0 | Yes | true | true | 30s | The maximum time (in seconds) that voice input is allowed to last. Possible values are positive integer values followed by s, for example 50s. |
| Final Silence | int > 0 | Yes | true | true | 1s | The interval of silence (in seconds or milliseconds) that indicates the end of speech. Possible values are positive integer values followed by s or ms, for example 3s or 3000ms. |
| Recognize.phraseHints | String | No | true | true | None | A comma-separated string that lists hints for recognition. Hints are used to recognize a phrase or a word that is pronounced differently. For example: Savings, Current. |
| Recognize.alternateLanguages | String | No | true | true | None | A comma-separated string of up to three additional BCP-47 language tags, listing possible alternative languages of the supplied audio other than the default language. For example: en-US, en-IN. |

Custom VoiceXML Properties

| Name (Label) | Type | Notes |
|---|---|---|
| Recognize.NBestCount | Integer | Specifies the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than max_alternatives. Valid values are 0-30. A value of 0 or 1 returns a maximum of one alternative; if omitted, a maximum of one is returned. |
| Recognize.regionId | String | Specifies the region used by the cloud speech-to-text transcription. Configure this property in the root document of the project. |
| Recognize.singleUtterance | Boolean | Indicates whether the request should automatically end after speech is no longer detected. If this parameter is enabled, cloud speech-to-text detects pauses, silence, or non-speech audio to determine when to end recognition. If it is disabled, the stream continues to listen and process audio until either the stream is closed directly or the stream's length limit is reached. The default is true. |
| Recognize.model | String | Specifies the machine learning model used by the cloud speech-to-text transcription to improve recognition results. For available models, see https://cloud.google.com/speech-to-text/docs/basics. |
| Recognize.modelEnhanced | Boolean | Indicates whether the enhanced model is enabled. If enabled, the cloud speech-to-text transcription uses an enhanced speech recognition model to recognize speech and produce more accurate transcription. The default is true. You can enable or disable data logging for the enhanced speech model; for more information, see https://cloud.google.com/speech-to-text/docs/enhanced-models. |
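
As an illustration only, these properties could be expressed with standard VoiceXML property syntax in the project's root document along the following lines (a hedged sketch; the values shown are placeholders, not recommendations):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: property values are example placeholders. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Region used by the cloud speech-to-text transcription (example value) -->
  <property name="Recognize.regionId" value="us"/>
  <!-- Keep listening across pauses instead of ending on the first detected silence -->
  <property name="Recognize.singleUtterance" value="false"/>
  <!-- Example model name; see the Google Cloud documentation for available models -->
  <property name="Recognize.model" value="phone_call"/>
  <property name="Recognize.modelEnhanced" value="true"/>
  <!-- Return up to three SpeechRecognitionAlternative messages per result -->
  <property name="Recognize.NBestCount" value="3"/>
</vxml>
```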

Element Data

| Element Data | Notes |
|---|---|
| value | The transcribed text or the DTMF collected. |
| input_type | Indicates the type of input captured (dtmf or dtmf+voice). |
| confidence | The speech recognition confidence, between 0.0 and 1.0. A higher number indicates a greater probability that the recognized words are correct. The default of 0.0 is a sentinel value indicating that confidence was not set. |
| language_code | The language code that was triggered during recognition. See also Recognize.alternateLanguages under Settings. |
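
As an illustration, if a Transcribe element instance were named Transcribe_01 (a hypothetical name), a later element's setting could reference the captured data with Call Studio substitution syntax along these lines:

```
{Data.Element.Transcribe_01.value}
{Data.Element.Transcribe_01.confidence}
```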

Exit States

| Name | Notes |
|---|---|
| done | Transcription completed. |
| max_noinput | The maximum number of noinput events has occurred. If the maximum is 0, this exit state does not occur. |
| max_nomatch | The maximum number of nomatch events has occurred. This exit state does not occur if the maximum number of nomatch events is 0 and the input type is voice. If the input type is dtmf, max_nomatch counts DTMF entries that do not match the DTMF Grammar regex. |

Audio Group

Form Data Capture

| Name (Label) | Required | Max 1 | Notes |
|---|---|---|---|
| initial_audio_group (Initial) | Yes | Yes | Played when the voice element begins. |
| nomatch_audio_group (NoMatch) | No | No | Played when a NoMatch event occurs. Applicable only when the input mode selected is dtmf+voice. |
| noinput_audio_group (NoInput) | No | No | Played when a NoInput event occurs. |

End

| Name (Label) | Required | Max 1 | Notes |
|---|---|---|---|
| done_audio_group (Done) | No | Yes | Played when the form data capture is completed and the voice element exits with the done exit state. |

Folder and Class Information

| Studio Element Folder Name | Class Name |
|---|---|
| Form | com.audium.server.voiceElement.form. |

Events

| Name (Label) | Notes |
|---|---|
| Event Type | You can select Java Exception, VXML Event, or Hotlink as an event handler for this element. |