There are several factors that affect the memory and CPU requirements for Speech Server, including:
You should estimate the hardware requirements based on these factors. You might find that the best solution is not a single instance of Speech Server, but a distribution across several servers, and potentially across several boxes.
Each task has its own memory and CPU requirements. The following table covers some of the key Speech Server tasks, and gives a rough approximation of how much memory a single channel (a single instance of a task) uses. It also gives information on the maximum number of simultaneous channels that HPE recommends that you process on a single CPU core to achieve real-time processing speed.
Speech Server function | Memory per channel | Channels per core |
---|---|---|
Speech-to-text (single channel) | 200 MB. Note: This task also requires a language resource, which uses around 600 MB, although you can share this between multiple channels that are using the resource. | 1 |
Speaker identification | 150 MB. Note: Memory usage for speaker identification is dependent on the number of speakers; this figure is just an example for running speaker identification with a limited set of speakers. | 1 |
Audio fingerprinting | 200 MB | 50 |
Language identification | 250 MB | 1 |
Transcript alignment | 250 MB | 1 |
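As a rough illustration of how the figures in the table above combine, the following Python sketch totals the cores and memory for a mix of simultaneous channels. The function name, the dictionary layout, and the `shared_resources_mb` parameter are assumptions made for this example and are not part of Speech Server. The sketch also assumes that different task types do not share a core.

```python
import math

# Figures copied from the table above:
# task name -> (memory per channel in MB, channels per core).
TASK_PROFILES = {
    "speech-to-text": (200, 1),
    "speaker identification": (150, 1),
    "audio fingerprinting": (200, 50),
    "language identification": (250, 1),
    "transcript alignment": (250, 1),
}


def estimate_requirements(channels, shared_resources_mb=0):
    """Return (cores, memory_mb) for a mix of simultaneous channels.

    channels: mapping of task name to number of simultaneous channels.
    shared_resources_mb: hypothetical parameter for memory that is shared
    between channels, such as a loaded language resource.
    """
    cores = 0
    memory_mb = shared_resources_mb
    for task, count in channels.items():
        mem_per_channel, channels_per_core = TASK_PROFILES[task]
        # Round up: a partly used core is still a core you must provide.
        cores += math.ceil(count / channels_per_core)
        memory_mb += count * mem_per_channel
    return cores, memory_mb


cores, memory_mb = estimate_requirements(
    {"speech-to-text": 8, "audio fingerprinting": 100},
    shared_resources_mb=600,  # one shared language resource
)
print(cores, memory_mb)  # 10 22200
```

The result is only a starting point for sizing. Actual usage depends on factors such as the number of speakers and the language resources that are loaded.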
In most cases, the key limiting factors in terms of the hardware configuration of Speech Server are the number of available CPUs, and the memory available.
For most tasks, HPE recommends that each task runs on its own CPU (as shown in the previous table). In this case, you should configure the server with the same number of task managers as there are CPUs available. Each of these should be set to run only one simultaneous task (each task manager will run all its active tasks on a single core).
However, because tasks such as audio fingerprinting are very fast, you can run several of these on a single CPU and still achieve real-time speed. To achieve this, you can configure the task managers to run multiple tasks simultaneously.
You cannot run task managers that are configured to run only one simultaneous task on the same server as task managers that are configured to run multiple simultaneous tasks. As a result, if you are running audio fingerprinting alongside other Speech Server tasks such as speech-to-text, and you want all tasks to run in real time, it might be beneficial to run two separate servers: one for audio fingerprinting requests, and one for all other tasks. You can then set the number of task managers that each server uses, to control the resources available for each type of task.
It might not be possible to process the data in real time, or at a speed that fits your requirements, on a single box.
For example, if you want to run a speech-to-text task to generate a transcript for 2,400 hours of audio data, and you want the task to be complete in 24 hours, you need to split this data across several parallel tasks to achieve that goal. Assuming that you are running the task at real-time speed (that is, using the relative running mode that ensures that one hour of audio takes one hour to process), you need 100 simultaneous processes (2,400 hours of audio divided by 24 hours). If each of your hardware boxes has 10 cores available, you need approximately 10 machines, each running 10 channels of speech-to-text, to process the audio (bearing in mind the previously stated recommendation that you run only a single channel of speech-to-text on a single core).
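The arithmetic in this example can be sketched in Python as follows. The function and parameter names are illustrative assumptions; `speed_factor` is a hypothetical parameter where 1.0 means real-time processing.

```python
import math


def machines_needed(audio_hours, deadline_hours, cores_per_machine,
                    channels_per_core=1, speed_factor=1.0):
    """Return (channels, machines) needed to finish within the deadline.

    speed_factor is the hours of audio one channel processes per hour
    of wall-clock time (1.0 means real-time speed).
    """
    # Each channel handles deadline_hours * speed_factor hours of audio.
    channels = math.ceil(audio_hours / (deadline_hours * speed_factor))
    cores = math.ceil(channels / channels_per_core)
    machines = math.ceil(cores / cores_per_machine)
    return channels, machines


# 2,400 hours of audio, 24-hour deadline, 10 cores per machine,
# one speech-to-text channel per core.
print(machines_needed(2400, 24, 10))  # (100, 10)
```

Rounding up at each step gives a lower bound on the hardware. In practice you might add headroom for the operating system and for uneven file lengths.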
Likewise, if you are processing live audio streams (which could be running indefinitely) you should dedicate a single core to process each channel. You can achieve this by having as many task managers as you have audio streams, each processing just a single task. Depending on how many streams you need to process, you might need to run multiple servers across multiple boxes.
If you are running Speech Server on a 32-bit platform, memory usage can be an important factor. Each language resource that you load uses around 500 MB of memory. If you run speech-to-text across many languages, you might find it better to run multiple servers, each dedicated to processing a certain language (or small set of languages). In this way you can run large numbers of parallel tasks without having too many languages loaded on a single box.
In general, in a setup where there are several Speech Servers running, there is no reason that you should run a particular task on one server rather than another. However, this might be different if:
You can use the checkResources action to find out whether a particular server has the resources (available task slots, language resources) to run a task. Run the action with the same parameters that you use when you run a task using the addTask action.
You can also use the getStatus action to return the number of tasks currently running or queued on a particular server. You can use the information returned to help select the most appropriate server to submit the task to.
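One possible way to use this information is sketched below in Python. The sketch does not call Speech Server and does not reflect the actual response format of the checkResources or getStatus actions. It assumes that you have already parsed the responses into a dictionary with the hypothetical keys `has_resources`, `running`, and `queued`, and it simply prefers the eligible server with the fewest queued, then fewest running, tasks.

```python
def pick_server(statuses):
    """Choose a server to submit a task to.

    statuses: mapping of server name to a dict with hypothetical keys:
      'has_resources' (bool) - parsed from a checkResources response
      'running' (int), 'queued' (int) - parsed from a getStatus response
    Returns the chosen server name, or None if no server is eligible.
    """
    candidates = {
        name: status
        for name, status in statuses.items()
        if status["has_resources"]
    }
    if not candidates:
        return None
    # Fewest queued tasks first, then fewest running; name breaks ties.
    return min(
        candidates,
        key=lambda name: (
            candidates[name]["queued"],
            candidates[name]["running"],
            name,
        ),
    )


statuses = {
    "speech1": {"has_resources": True, "running": 4, "queued": 2},
    "speech2": {"has_resources": True, "running": 3, "queued": 0},
    "speech3": {"has_resources": False, "running": 0, "queued": 0},
}
print(pick_server(statuses))  # speech2
```

The selection rule is an assumption made for illustration. You might weight running and queued tasks differently depending on task length.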