Speech Recognition Documentation

Step 0 – Imports

Most users will require the speechrec.grammar library to build a grammar to recognize against, as well as the speech library to build an automated prompt. The speech library is not necessary if you have prerecorded audio assets to play in its place.

local speech = require "summit.speech"
local grammar = require "summit.speechrec.grammar"

Step 1 – Building a Grammar

The speechrec.grammar library provides methods that allow you to create a grammar object from smaller grammar fragments. This library allows grammars to be created programmatically, embedding user input and variable data into the grammar, and allows chunks of grammars to be reused within your application.

The following code creates a grammar for a simple directory that connects the user to the sales department when they say “sales”, to the marketing department when they say “marketing”, and to a person when they say “agent”, “help”, or “let me speak to a human”.

local grm = grammar.choice({
    'sales',
    'marketing',
    grammar.choice({
        'agent',                  -- these options
        'help',                   -- are all interpreted
        'let me speak to a human' -- as "help"
    }, 'help'),
})
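Because grammar fragments are ordinary Lua values, a fragment can be built once and embedded in several grammars. The sketch below assumes the same speechrec.grammar library shown above; the local variable names (human_fragment, directory_grm, fallback_grm) and the 'main menu' phrase are our own illustration, not part of the library.

local grammar = require "summit.speechrec.grammar"

-- A reusable fragment: any of these utterances is interpreted as "help".
local human_fragment = grammar.choice({
    'agent',
    'help',
    'let me speak to a human'
}, 'help')

-- The same fragment can be embedded in more than one grammar.
local directory_grm = grammar.choice({ 'sales', 'marketing', human_fragment })
local fallback_grm  = grammar.choice({ 'main menu', human_fragment })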

Step 2 – Recognition

Once a grammar is built, it can be passed to the channel.recognize command, which returns two values: the recognition result and a possible error value. The error value should only be significant when there is a problem outside of the summit developer’s control (or when a naughty summit developer supplies a malformed grammar by hand – this is why using the grammar-building library is strongly encouraged unless the developer is already familiar with Nuance grammar formats).

The recognize command takes a grammar instance directly and implicitly compiles it into a form the speech recognition server can use. There is no need for the developer to do anything with the grammar itself.

See the documentation for the channel.recognize command for a more in-depth description of the command’s input parameters.

local result, err = channel.recognize(grm, {
    play=speech.speech('Say a department and we will direct you'),
})

Step 3 – Response

Assuming that the error value is nil, the summit application can react to the metadata in the result object. If the speech recognition server successfully matched the user’s utterance against the supplied grammar, the recognized method will return true. Otherwise, there was an issue with recognition (unclear audio, an utterance that did not match the grammar, DTMF input that failed the validation supplied by the summit application developer, etc.). In this case, the cause property of the result object will contain a short string denoting the nature of the failure.

if not result:recognized() then
    channel.say('Sorry, I did not catch that.')
    return -- Or ask again, assume a default, etc
end
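In practice, rather than giving up after one failure, an application will often re-prompt the caller a few times. The following is a sketch using the grm, speech, and channel values from the earlier steps; the three-attempt limit is an arbitrary choice for illustration.

local result
for attempt = 1, 3 do
    local err
    result, err = channel.recognize(grm, {
        play=speech.speech('Say a department and we will direct you'),
    })
    if err == nil and result:recognized() then
        break -- got a usable interpretation
    end
    channel.say('Sorry, I did not catch that.')
end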

See the documentation for RecognizeResult for a more in-depth description about the properties on the result object.

if result.interpretation == "sales" then
    channel.say('Directing you to the sales department.')
elseif result.interpretation == "marketing" then
    channel.say('Directing you to the marketing department.')
else
    channel.say('Connecting you to an agent.')
end

Step 4 – Handling Multiple Interpretations

It is possible that the speech recognition server could not distinguish with high confidence between two interpretations. In this case, additional possible interpretations are supplied in decreasing order of confidence. Note: the input, interpretation, input-mode, and confidence properties on the result object always reflect the highest-confidence interpretation. Lower-confidence interpretations can often be ignored, so this step is optional in almost all cases.

for i=2,result.num_results do
    channel.say(
        'You may also have said ' ..
        result:nth(i).input ..
        ' meaning ' .. result:nth(i).interpretation
    )
end
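Since the alternatives arrive in decreasing order of confidence, an application may want to discard low-confidence ones rather than read them all back. The sketch below assumes each result:nth(i) entry exposes a per-alternative confidence field (this mirrors the confidence property described above, but is an assumption here, not something documented in this section); the 0.5 cutoff is arbitrary.

for i=2,result.num_results do
    local alt = result:nth(i)
    -- alt.confidence is assumed; skip alternatives below the cutoff.
    if alt.confidence and alt.confidence > 0.5 then
        channel.say('You may also have said ' .. alt.input)
    end
end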