speechrec.grammar

Summit voice recognition grammar builder.

summit.speechrec.grammar provides a builder interface for constructing grammars to use with channel.recognize. Because grammars written by hand are error-prone and difficult to debug, we currently require using this interface, which produces grammars that are reusable and easier to change and maintain.

Creating a grammar

The functions in this library create a GrammarRule object, which represents a grammar fragment. These fragments can be combined by sequence, choice, and repetition to make more sophisticated grammars. A grammar rule can be given to the speech recognition server for interpretation through channel.recognize.

local speech = require "summit.speech"
local grammar = require "summit.speechrec.grammar"

local grm = grammar.choice({
    'sales',
    'marketing',
    grammar.choice({
        'agent',                  -- these options
        'help',                   -- are all interpreted
        'let me speak to a human' -- as "help"
    }, 'help'),
})

-- Now, grm can be passed to channel.recognize

local result, err = channel.recognize(grm, {
    play=speech.speech('Say a department and we will direct you'),
})

Classes

Name Summary
GrammarRule

Represents a complete grammar or a grammar fragment.

Functions

Name Summary
builtin

Creates a reference to a builtin grammar fragment.

choice

Creates a reference to a grammar fragment which matches (exactly) one child from a list of child fragments.

optional

Creates a reference to a grammar fragment which can be matched (exactly) once, or be omitted by the speaker.

repetition

Creates a reference to a grammar fragment which can be repeated a number of times.

Important: If a meaning is not supplied, then rules with a zero value for min_times will be interpreted as an empty string when the user does not provide a matching utterance. If an alternate meaning is supplied, then this meaning will be the verbatim meaning of the rule when it is matched regardless of the number of times the utterance repeated the rule (zero times, one time, and many times will all have the same interpretation). For concrete examples, see grm3 and grm4 in the usage below.

If you need a custom interpretation and the number of times the user repeated the rule is also meaningful, wrap the children using wrap, which converts the meaning of each child individually before repetition is applied.

sequence

Creates a reference to a grammar fragment which matches a list of child fragments spoken in order.

text

Creates a reference to a grammar fragment which matches literal text.

wrap

Wraps a reference to a grammar fragment with an updated meaning.

Classes : speechrec.grammar

GrammarRule

Represents a complete grammar or a grammar fragment.

This class should not be instantiated directly. Instead, it is recommended that grammar rules are created by the factory methods provided by this library. This class does not have any useful user-facing features, and should be treated opaquely. Grammars are implicitly compiled once they are passed to channel.recognize.

Functions : speechrec.grammar

builtin(name, parameters)

Creates a reference to a builtin grammar fragment. The following builtin grammars are available for use. Each grammar interprets input in a specific way and returns a normalized string containing the perceived meaning of the user’s utterance. Some sample utterances (input) and interpretations (output) are given below.

| Grammar | Example Utterance | Interpretation |
| --- | --- | --- |
| zipcode | five three two one two | 53212 |
| | five three two one two one two three four | 532121234 |
| boolean | yes | true |
| | correct | true |
| | no | false |
| creditcard | three seven eight two eight two two four six three one zero zero zero five | 378282246310005 |
| ccexpdate | twelve sixteen | 20161231 |
| | november eighteen | 20181130 |
| | march twenty nineteen | 20190331 |
| currency | seventeen dollars | 17.00 |
| | twenty three dollars and forty nine cents | 23.49 |
| | two thousand and fifty bucks | 2050.00 |
| digits | seven two three nine nine | 72399 |
| phone | five five five four six seven nine | 5554679 |
| | four one four five five five four six seven nine | 4145554679 |
| | one four one four five five five four six seven nine | 4145554679 |
| | four one four five five five four six seven nine extension two three | 4145554679x23 |
| socialsecurity | five seven four three two one one zero eight | 574321108 |
| | five seven four three two one one oh eight | 574321108 |
| alphanum | L 3 I I 9 z | l3ii9z |
| number | two point three five | 2.35 |
| | three hundred and eighty seven | 387 |
| | a hundred ten | 110 |
| time | three forty seven A M | 0347a |
| | noon | 1200p |
| | six in the morning | 0600a |
| | nine oh four | 0904? |
| date | august second seventeen | ??170802 |
| | august second two thousand seventeen | 20170802 |
| | six four nineteen ninety seven | 19970604 |
| | sunday twenty may two thousand one | 20010520 |
| | yesterday | -1 |
| | today | 0 |
| | tomorrow | +1 |
| cancel | cancel | cancel |
| exit | exit | exit |
| help | help | help |
| operator | operator | operator |
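
The time and date interpretations are compact strings, so application code usually has to decode them before use. The following is a minimal sketch in plain Lua, based only on the sample outputs listed for the time and date grammars. `decode_time` and `decode_date` are hypothetical helper names, not part of this library, and reading the `a`/`p` suffix as a 12-hour clock is an assumption drawn from the examples.

```lua
-- Hypothetical helpers (not part of summit.speechrec.grammar).

-- Decodes a `time` interpretation such as "0347a", "1200p", or "0904?".
-- Assumption: "a"/"p" mark a 12-hour clock value; "?" means AM/PM was not spoken.
local function decode_time(interp)
    local hh, mm, suffix = interp:match("^(%d%d)(%d%d)([ap%?])$")
    if not hh then
        return nil, "unrecognized time interpretation: " .. interp
    end
    local hour, minute = tonumber(hh), tonumber(mm)
    if suffix == "?" then
        -- Keep the spoken hour and flag it so the caller can ask a follow-up.
        return { hour = hour, minute = minute, ambiguous = true }
    end
    local hour24 = hour % 12
    if suffix == "p" then
        hour24 = hour24 + 12
    end
    return { hour = hour24, minute = minute, ambiguous = false }
end

-- Decodes a `date` interpretation: "20170802", "??170802", or a relative
-- offset in days such as "-1", "0", "+1".
local function decode_date(interp)
    local cc, yy, mm, dd = interp:match("^([%d%?][%d%?])(%d%d)(%d%d)(%d%d)$")
    if cc then
        local century = tonumber(cc) -- nil when the century is unknown ("??")
        return {
            year = century and (century * 100 + tonumber(yy)) or nil,
            year_in_century = tonumber(yy),
            month = tonumber(mm),
            day = tonumber(dd),
        }
    end
    if interp:match("^[%+%-]?%d+$") then
        return { relative_days = tonumber(interp) }
    end
    return nil, "unrecognized date interpretation: " .. interp
end
```

Both helpers return `nil` plus a message for strings they do not recognize, so a caller can fall back to re-prompting instead of guessing.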

The cancel, exit, help, and operator grammars match one word, but can be optionally preceded by a sequence of “uh”, “huh”, and “umm” noises.

In addition, the creditcard grammar validates American Express, Diners Club, Discover, Mastercard, Private, and Visa formats. Vocal input will not match if it does not conform to one of these formats. The zipcode grammar ensures that the 5-digit zipcode is not equal to 00000. The socialsecurity grammar ensures that the input seems legal: it must not have an all-zero group or serial, and the area (the first three digits) must not exceed 800, because larger area numbers are reserved.
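
The zipcode and socialsecurity checks described above are performed by the recognizer itself. As an aid to reasoning about which inputs can match (for example, when writing test fixtures), they can be sketched in plain Lua as follows. The function names are hypothetical, and the code models only the rules stated in this section, not the recognizer's actual implementation.

```lua
-- Hypothetical models of the documented validation rules.

-- A social security number is split into area (3 digits), group (2 digits),
-- and serial (4 digits).
local function plausible_ssn(digits)
    local area, group, serial = digits:match("^(%d%d%d)(%d%d)(%d%d%d%d)$")
    if not area then
        return false
    end
    if group == "00" or serial == "0000" then
        return false -- all-zero group or serial
    end
    return tonumber(area) <= 800 -- the area must not exceed 800
end

-- A zipcode interpretation has 5 or 9 digits; the 5-digit part may not be 00000.
local function plausible_zipcode(digits)
    local is_five = digits:match("^%d%d%d%d%d$") ~= nil
    local is_nine = digits:match("^%d%d%d%d%d%d%d%d%d$") ~= nil
    if not (is_five or is_nine) then
        return false
    end
    return digits:sub(1, 5) ~= "00000"
end
```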

You can also pass the following parameters to the given builtin grammars to customize their validation.

| Grammar | Parameter | Description |
| --- | --- | --- |
| time | minallowed/maxallowed | If the input is outside this range, then the recognized input will not match. By default the values are `0000` and `2359`. |
| | minexpected/maxexpected | If the input is outside this range, then the recognized input will have lower confidence. By default the values are `0000` and `2359`. |
| | granularityallowed | If the input is not a multiple of the granularity, then the recognized input will not match. If not supplied, the granularity is not checked. |
| | granularityexpected | Like `granularityallowed`, but will lower confidence instead of rejecting the input outright. |
| phone | minextension/maxextension | If an extension exists and is outside of this range, then the recognized input will not match. By default the values are `0000` and `9000`. |
| | nondigitstylepenalty | The penalty to give the interpretation when the user says `fourteen` instead of `one four`. By default, this value is `200` (higher numbers give higher penalties). |
| currency | minallowed/maxallowed | If the input is outside this range, then the recognized input will not match. By default the values are `0` and `999999.99`. |
| | minexpected/maxexpected | If the input is outside this range, then the recognized input will have lower confidence. By default the values are `0` and `999999.99`. |
| | granularityallowed | If `input * 100` is not a multiple of `granularity * 100`, then the recognized input will not match. If not supplied, the granularity is not checked. |
| | granularityexpected | Like `granularityallowed`, but will lower confidence instead of rejecting the input outright. |
| | disambiguationmode | If equal to `assume_larger`, then the interpretation will favor larger numbers when the input is ambiguous (e.g. `20210` over `2210` when the user says `twenty two ten`). |
| date | minallowed/maxallowed | If the input is outside this range, then the recognized input will not match. By default the values are `19000101` and `21991231`. |
| | minexpected/maxexpected | If the input is outside this range, then the recognized input will have lower confidence. By default the values are `19000101` and `21991231`. |
| number | minallowed/maxallowed | If the input is outside this range, then the recognized input will not match. By default the values are `0` and `999999.99`. |
| | minexpected/maxexpected | If the input is outside this range, then the recognized input will have lower confidence. By default the values are `0` and `999999.99`. |
| | maxdecimal | The maximum number of digits to allow after the decimal point. The default value is `2`. |
| | granularityallowed | If the input is not a multiple of the granularity, then the recognized input will not match. If not supplied, the granularity is not checked. |
| | granularityexpected | Like `granularityallowed`, but will lower confidence instead of rejecting the input outright. |
| creditcard | typesallowed | A plus-separated list of valid credit card types. By default, the value is `visa+mastercard+amex+dinersclub+discover+private`. Additional types cannot be added to this value, but card types can be taken away. |
| ccexpdate | referencedate | The date (see the output of this grammar for the format) to use as a reference for "today". Defaults to the current year and month. |
| | maxallowed | The maximum number of months in the future from `referencedate`. If the recognized input exceeds this value, it will not match. |
| | maxexpected | The maximum number of months in the future from `referencedate`. If the recognized input exceeds this value, it will have a lower confidence. |

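
The interaction between the "allowed" and "expected" parameters can be easier to follow as a worked example. The sketch below models the currency rules from the table in plain Lua: values outside the allowed range or granularity do not match, and values outside the expected range or granularity match with lower confidence. `fits_granularity` and `classify_currency` are hypothetical names, the string labels are illustrative only (the recognizer adjusts a confidence score rather than returning labels), and the order in which the checks run is an assumption.

```lua
-- Hypothetical model of the documented currency parameter rules.

-- Converts an amount to whole hundredths, rounding away float noise.
local function to_hundredths(value)
    return math.floor(value * 100 + 0.5)
end

-- True when `input * 100` is a multiple of `granularity * 100`.
local function fits_granularity(input, granularity)
    local step = to_hundredths(granularity)
    if step == 0 then
        return false -- a granularity below one hundredth cannot be checked
    end
    return to_hundredths(input) % step == 0
end

local function classify_currency(amount, params)
    local minallowed = params.minallowed or 0
    local maxallowed = params.maxallowed or 999999.99
    if amount < minallowed or amount > maxallowed then
        return "no match"
    end
    if params.granularityallowed
        and not fits_granularity(amount, params.granularityallowed) then
        return "no match"
    end
    local minexpected = params.minexpected or 0
    local maxexpected = params.maxexpected or 999999.99
    if amount < minexpected or amount > maxexpected then
        return "lower confidence"
    end
    if params.granularityexpected
        and not fits_granularity(amount, params.granularityexpected) then
        return "lower confidence"
    end
    return "match"
end
```

For instance, with `granularityallowed = 0.25` the amount 23.49 becomes 2349 hundredths, which is not a multiple of 25, so it is rejected, while 23.50 (2350 hundredths) passes.
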
Parameters
Name Type Default Summary

name

string

The name of the grammar file.

parameters

table or nil

nil

Additional parameters to send along with the builtin grammar.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

local grm1 = grammar.builtin("phone")
local grm2 = grammar.builtin("digits", { length=3 })
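
Parameter tables for builtin grammars can also be assembled programmatically. The sketch below builds a `typesallowed` value by removing card types from the documented default list, since types can be taken away but not added. `types_allowed` is a hypothetical helper, and passing the time bounds as strings (to preserve leading zeros) is an assumption; this reference does not state the expected value types. The `grammar.builtin` calls are shown as comments because they only run on the platform.

```lua
-- Card types in the order of the documented default `typesallowed` value.
local ALL_CARD_TYPES = {
    "visa", "mastercard", "amex", "dinersclub", "discover", "private",
}

-- Hypothetical helper: returns the plus-separated list of card types that
-- remain after removing every name in `excluded`.
local function types_allowed(excluded)
    local skip = {}
    for _, name in ipairs(excluded) do
        skip[name] = true
    end
    local kept = {}
    for _, name in ipairs(ALL_CARD_TYPES) do
        if not skip[name] then
            kept[#kept + 1] = name
        end
    end
    return table.concat(kept, "+")
end

local card_params = { typesallowed = types_allowed({ "private", "dinersclub" }) }

-- Assumption: time bounds are passed as strings in the same hhmm form as the
-- documented defaults.
local time_params = { minallowed = "0900", maxallowed = "1700" }

-- On the platform, these tables would be passed as the second argument:
-- local grammar = require "summit.speechrec.grammar"
-- local card_grm = grammar.builtin("creditcard", card_params)
-- local time_grm = grammar.builtin("time", time_params)
```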

choice(children, meaning)

Creates a reference to a grammar fragment which matches (exactly) one child from a list of child fragments.

Parameters
Name Type Default Summary

children

array

A sequence of child fragments. Bare strings are also supported and will be implicitly cast into a text grammar rule with the default meaning.

meaning

string or nil

nil

The interpretation to use if the rule is matched. If no meaning is supplied, the interpretation will match the literal text. The meaning of all child nodes will be discarded if this value is supplied.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

-- The interpretation will, by default, be the spoken word
local grm1 = grammar.choice({"red", "green", "blue"})

-- Saying "red", "green", or "blue" will give "color" as the
-- literal interpretation.
local grm2 = grammar.choice({"red", "green", "blue"}, "color")
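
Because a choice with a meaning collapses all of its children into one interpretation, a menu of synonyms can be generated from a plain Lua table. The following is a minimal sketch: `build_menu` and the `departments` table are hypothetical, and the grammar module is passed in as an argument only so that the function can be exercised off-platform with a stand-in. It relies solely on `choice(children, meaning)` as documented above.

```lua
-- Each entry lists the phrases a caller might say and the single
-- interpretation that all of them should produce.
local departments = {
    { meaning = "sales",   phrases = { "sales", "buy something", "place an order" } },
    { meaning = "support", phrases = { "support", "help", "technical support" } },
}

-- Hypothetical helper: builds one inner choice per entry (carrying that entry's
-- meaning) and combines them in an outer choice without a meaning, so the
-- inner meanings are not discarded.
local function build_menu(grammar, entries)
    local options = {}
    for _, entry in ipairs(entries) do
        options[#options + 1] = grammar.choice(entry.phrases, entry.meaning)
    end
    return grammar.choice(options)
end

-- On the platform:
-- local menu = build_menu(require "summit.speechrec.grammar", departments)
```

The outer choice deliberately omits a meaning: supplying one would discard the meanings of the child nodes, as described for the meaning parameter.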

optional(child, meaning)

Creates a reference to a grammar fragment which can be matched (exactly) once, or be omitted by the speaker. This function is a convenience wrapper around repetition.

Parameters
Name Type Default Summary

child

GrammarRule

The child to optionally match.

meaning

string or nil

nil

The interpretation to use if the rule is matched. If no meaning is supplied, the interpretation will match the literal text. The meaning of all child nodes will be discarded if this value is supplied.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

local grm = grammar.sequence({
  grammar.optional("very"), -- optional
  "good"                    -- required
})

repetition(min_times, max_times, children, meaning)

Creates a reference to a grammar fragment which can be repeated a number of times.

Important: If a meaning is not supplied, then rules with a zero value for min_times will be interpreted as an empty string when the user does not provide a matching utterance. If an alternate meaning is supplied, then this meaning will be the verbatim meaning of the rule when it is matched regardless of the number of times the utterance repeated the rule (zero times, one time, and many times will all have the same interpretation). For concrete examples, see grm3 and grm4 in the usage below.

If you need a custom interpretation and the number of times the user repeated the rule is also meaningful, wrap the children using wrap, which converts the meaning of each child individually before repetition is applied.

Parameters
Name Type Default Summary

min_times

int

The minimum number of times that the children must match.

max_times

int

The maximum number of times that the children may match.

children

GrammarRule or array

The child to repeatedly match. An array of children is also supported and will implicitly be cast into a sequence grammar rule with the default meaning.

meaning

string or nil

nil

The interpretation to use if the rule is matched. If no meaning is supplied, the interpretation will match the literal text. The meaning of all child nodes will be discarded if this value is supplied.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

-- Matches the word "tomato" one to three times
local grm1 = grammar.repetition(1, 3, "tomato")

-- Matches two to five occurrences of the phrase "red green blue" (the array is
-- implicitly cast into a sequence). If matched, the interpretation will be
-- "colors", regardless of the input.
local grm2 = grammar.repetition(2, 5, {"red", "green", "blue"}, "colors")

-- Matches zero to two occurrences of the phrase "pear bookcase ennui". If the
-- user says nothing, then the interpretation is an empty string. If the user
-- says "pear bookcase ennui" or "pear bookcase ennui pear bookcase ennui",
-- then the spoken phrase is the interpretation.
local grm3 = grammar.repetition(0, 2, grammar.sequence({
    "pear",
    "bookcase",
    "ennui",
}))

-- Matches zero to two occurrences of the phrase "pear bookcase ennui". Whether
-- the user says nothing, "pear bookcase ennui", or "pear bookcase ennui pear
-- bookcase ennui", the interpretation is "precipice".
local grm4 = grammar.repetition(0, 2, grammar.sequence({
    "pear",
    "bookcase",
    "ennui",
}), "precipice")
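
The interpretation rules for grm3 and grm4 can be summarized as a small model in plain Lua. This is only an illustration of the behavior documented above, not the recognizer's implementation; `expected_interpretation` is a hypothetical name, and joining repeated phrases with a single space is an assumption.

```lua
-- Hypothetical model: what a repetition rule is documented to yield when the
-- caller repeats `phrase` exactly `times` times.
local function expected_interpretation(times, phrase, meaning)
    if meaning ~= nil then
        -- A meaning on the repetition wins regardless of the repeat count.
        return meaning
    end
    local parts = {}
    for i = 1, times do
        parts[i] = phrase
    end
    -- Zero repetitions produce the empty string.
    return table.concat(parts, " ")
end
```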

sequence(children, meaning)

Creates a reference to a grammar fragment which matches a list of child fragments spoken in order.

Parameters
Name Type Default Summary

children

array

An ordered sequence of child fragments. Bare strings are also supported and will be implicitly cast into a text grammar rule with the default meaning.

meaning

string or nil

nil

The interpretation to use if the rule is matched. If no meaning is supplied, the interpretation will match the literal text. The meaning of all child nodes will be discarded if this value is supplied.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

local grm1 = grammar.sequence({
    grammar.choice({ "red", "green", "blue" }),
    grammar.choice({ "dog", "house", "pots" }),
})

local grm2 = grammar.sequence({
    grammar.choice({ "red", "green", "blue" }),
    grammar.choice({ "red", "green", "blue" }),
}, "two colors")

local grm3 = grammar.sequence({
    "beginning",
    grammar.builtin("phone"),
    "end",
})

text(text, meaning)

Creates a reference to a grammar fragment which matches literal text.

Parameters
Name Type Default Summary

text

string

The text to match.

meaning

string or nil

nil

The interpretation to use if the rule is matched. If no meaning is supplied, the interpretation will match the literal text.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

local grm1 = grammar.text("green")
local grm2 = grammar.text("violet", "purple") -- interprets "violet" as "purple"

wrap(child, meaning)

Wraps a reference to a grammar fragment with an updated meaning.

Parameters
Name Type Default Summary

child

GrammarRule

The child whose interpretation should change upon match.

meaning

string

The interpretation to use if the rule is matched.

Returns
Type Summary

GrammarRule

Usage
local grammar = require "summit.speechrec.grammar"

-- Speaking "sticker felony" will be interpreted as "matched matched".
local grm = grammar.repetition(1, 3, grammar.wrap(grammar.choice({
    "sticker",
    "glassmaker",
    "felony",
}), "matched"))
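
Since each wrapped child contributes its own meaning, the number of repetitions can be recovered by counting tokens in the interpretation. The helper below is a minimal sketch in plain Lua; `count_matches` is a hypothetical name, and it assumes the interpretation is available as a string in which repeated meanings are separated by whitespace, as in the "matched matched" example above. How that string is read from the recognition result is not covered in this reference.

```lua
-- Hypothetical helper: counts how many whitespace-separated tokens in
-- `interpretation` equal `token`.
local function count_matches(interpretation, token)
    local count = 0
    for word in interpretation:gmatch("%S+") do
        if word == token then
            count = count + 1
        end
    end
    return count
end
```

This only works when the wrapped meaning is a single word; a meaning containing spaces would need a different separator or a different counting strategy.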