AssessSpeechLanguageModel

The AssessSpeechLanguageModel action returns the perplexity and unknown word statistics for a sample of input text. Use this action to assess whether a language pack, optionally combined with a custom language model, is suitable for processing your audio.
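Exact formulas for these statistics are not given here. As a rough aid to reading them, the following Python sketch uses the conventional definitions: perplexity is exp(-(1/N) Σ log p) over the N words of the text, and an unknown word is one that is not in the model vocabulary. The function names, the toy inputs, and the assumption that both rates are percentages are illustrative only. Media Server might, for example, treat unknown words differently when it calculates perplexity.

```python
import math


def perplexity(word_probs):
    """Conventional perplexity: exp of the negative mean log probability.

    word_probs holds the probability that the model assigned to each word
    of the text, in order.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    mean_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-mean_log_prob)


def unknown_word_rates(words, vocabulary):
    """Return (unknown word rate, unique unknown word rate) as percentages.

    Assumption: the first rate counts every occurrence of an unknown word;
    the second counts each distinct word once.
    """
    if not words:
        raise ValueError("need at least one word")
    unknown_tokens = sum(1 for w in words if w not in vocabulary)
    distinct = set(words)
    unknown_distinct = sum(1 for w in distinct if w not in vocabulary)
    return (100.0 * unknown_tokens / len(words),
            100.0 * unknown_distinct / len(distinct))


# Toy example: "eduction" is the only word missing from the vocabulary.
words = "the server ran the eduction job".split()
vocabulary = {"the", "server", "ran", "job"}
print(unknown_word_rates(words, vocabulary))  # 1 of 6 words, 1 of 5 distinct words
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # uniform 1/4 probabilities: about 4
```

Lower perplexity and lower unknown word rates indicate that the language model is a better fit for the text.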

The text that you use to assess the language model must be different from the text that you used to train it; otherwise the statistics overstate how well the model fits new speech. Micro Focus recommends that you create a transcript of some of the speech that you intend to process, and assess the language model using that transcript.

Type: synchronous

Parameters

CustomLanguageModel (optional)
    The name and interpolation weight of a custom language model to use to supplement the base language pack. Separate the name and the interpolation weight with a colon (:), for example ProductNames:0.1.

LanguagePack (required)
    The base language pack.

MaxUnknownWords (optional)
    The maximum number of unknown words to return in the response.

TextData (set either TextData or TextPath)
    The text to use to assess the language model. Text files must be uploaded as multipart/form-data. For more information about sending data to Media Server, refer to the Media Server Administration Guide.

TextPath (set either TextData or TextPath)
    The path of a text file that contains the text to use to assess the language model. The path must be absolute, or relative to the Media Server executable file.

Example

curl http://localhost:14000 -F action=AssessSpeechLanguageModel \
                            -F LanguagePack=ENUS \
                            -F CustomLanguageModel=ProductNames:0.1 \
                            -F TextData=@SomeText.txt
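If you call Media Server from code rather than curl, the following Python sketch sends the same request as multipart/form-data. It assumes that the third-party requests package is installed and that Media Server listens on http://localhost:14000, as in the curl example. build_fields and assess are hypothetical helper names, not part of Media Server. build_fields also checks two of the parameter rules: exactly one of TextData or TextPath, and a custom language model given as name:weight.

```python
# Sketch: send AssessSpeechLanguageModel from Python instead of curl.
# Assumptions: the third-party `requests` package is installed, and
# build_fields / assess are hypothetical helper names, not a Media Server API.
import os


def build_fields(language_pack, text_file=None, text_path=None,
                 custom_lm=None, max_unknown_words=None):
    """Return (form fields, files to upload) for AssessSpeechLanguageModel.

    custom_lm is an optional (name, weight) tuple, sent as "name:weight".
    Set exactly one of text_file (a local file, uploaded as TextData) or
    text_path (a path that Media Server itself can read, sent as TextPath).
    """
    if (text_file is None) == (text_path is None):
        raise ValueError("set exactly one of text_file or text_path")
    fields = {"action": "AssessSpeechLanguageModel",
              "LanguagePack": language_pack}
    if custom_lm is not None:
        name, weight = custom_lm
        if ":" in name:
            raise ValueError("the model name must not contain ':'")
        fields["CustomLanguageModel"] = f"{name}:{weight}"
    if max_unknown_words is not None:
        fields["MaxUnknownWords"] = str(max_unknown_words)
    if text_path is not None:
        fields["TextPath"] = text_path
    files = {} if text_file is None else {"TextData": text_file}
    return fields, files


def assess(url, **kwargs):
    """Send the request as multipart/form-data, like curl -F; return the XML text."""
    import requests  # imported here so build_fields works without requests

    fields, files = build_fields(**kwargs)
    # A (None, value) tuple makes requests send a plain form field
    # (no filename) inside the multipart body.
    parts = {name: (None, value) for name, value in fields.items()}
    handles = []
    try:
        for field, path in files.items():
            handle = open(path, "rb")
            handles.append(handle)
            parts[field] = (os.path.basename(path), handle)
        response = requests.post(url, files=parts, timeout=60)
        response.raise_for_status()
        return response.text
    finally:
        for handle in handles:
            handle.close()


# Equivalent of the curl example (needs a running Media Server):
# assess("http://localhost:14000", language_pack="ENUS",
#        custom_lm=("ProductNames", 0.1), text_file="SomeText.txt")
```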

Response

The following XML is an example response:

<autnresponse>
  <action>ASSESSSPEECHLANGUAGEMODEL</action>
  <response>SUCCESS</response>
  <responsedata>
    <perplexity>93.67</perplexity>
    <unknownWordRate>1.54</unknownWordRate>
    <uniqueUnknownWordRate>2.44</uniqueUnknownWordRate>
    <unknownWord>
      <word>Eduction</word>
      <count>1</count>
    </unknownWord>
    <unknownWord>
      <word>OmniGroupServer</word>
      <count>1</count>
    </unknownWord>
    ...
  </responsedata>
</autnresponse>

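The response includes the following information:

perplexity
    The perplexity of the language model (the language pack, combined with any custom language model) with respect to the input text. A lower value means that the language model predicts the text better.

unknownWordRate
    The rate of unknown words in the input text, that is, words that are not in the vocabulary of the language model. Every occurrence of an unknown word counts.

uniqueUnknownWordRate
    The same rate calculated over the distinct words in the input text, so that each unknown word counts once.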

The response also includes an unknownWord element for each unknown word, up to the limit set by MaxUnknownWords. Each unknownWord element contains a word element, which holds the unknown word, and a count element, which gives the number of times that word appears in the input text.
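To use these statistics in a script, you can parse the XML response. The following Python sketch uses the standard library xml.etree.ElementTree module and the element names from the example response. parse_assessment is a hypothetical helper name, and EXAMPLE is a trimmed copy of the example response, without the elided entries.

```python
import xml.etree.ElementTree as ET


def parse_assessment(xml_text):
    """Extract the statistics and unknown words from an AssessSpeechLanguageModel response."""
    root = ET.fromstring(xml_text)
    if root.findtext("response") != "SUCCESS":
        raise RuntimeError("AssessSpeechLanguageModel did not succeed")
    data = root.find("responsedata")
    return {
        "perplexity": float(data.findtext("perplexity")),
        "unknownWordRate": float(data.findtext("unknownWordRate")),
        "uniqueUnknownWordRate": float(data.findtext("uniqueUnknownWordRate")),
        # One (word, count) pair per unknownWord element, in response order.
        "unknownWords": [
            (el.findtext("word"), int(el.findtext("count")))
            for el in data.findall("unknownWord")
        ],
    }


EXAMPLE = """<autnresponse>
  <action>ASSESSSPEECHLANGUAGEMODEL</action>
  <response>SUCCESS</response>
  <responsedata>
    <perplexity>93.67</perplexity>
    <unknownWordRate>1.54</unknownWordRate>
    <uniqueUnknownWordRate>2.44</uniqueUnknownWordRate>
    <unknownWord><word>Eduction</word><count>1</count></unknownWord>
    <unknownWord><word>OmniGroupServer</word><count>1</count></unknownWord>
  </responsedata>
</autnresponse>"""

print(parse_assessment(EXAMPLE))
```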