Description
Platform: Windows C++ app built with VS2022. My PC is a Dell laptop with a quad-core i5.
When I pass a 3-second audio clip of the word "six" spoken three or four times, the call can take up to a minute of CPU time to return, and the output sometimes includes odd gibberish.
Here is an example. I am speaking "six, six, six" as clearly as I can and sending the audio buffer to Whisper. The lines labeled "erase" are simply silence in my audio buffer and are not sent to Whisper. The lines with timings in seconds are Whisper processing chunks of approximately 3 seconds of "six, six, six":
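
For readers trying to reproduce this, here is a minimal sketch of how the silence gating described above could work. It is not the code from my app: the names `rms_level` and `is_silence` and the 0.01 threshold are hypothetical, and it assumes mono float PCM samples in the range [-1, 1].

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square level of a mono float PCM buffer (samples assumed in [-1, 1]).
// Returns 0 for an empty buffer.
float rms_level(const std::vector<float>& samples) {
    if (samples.empty()) return 0.0f;
    double sum_sq = 0.0;
    for (float s : samples) sum_sq += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum_sq / static_cast<double>(samples.size())));
}

// Hypothetical gate: true when the chunk is quiet enough to skip ("erase")
// instead of being sent to Whisper. The threshold is an assumed value and
// would need tuning for a real microphone.
bool is_silence(const std::vector<float>& samples, float threshold = 0.01f) {
    return rms_level(samples) < threshold;
}
```

A chunk for which `is_silence` returns true would be dropped; anything else would be passed on for inference.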

As you can see, there are two correct inferences there, at 11 and 17 seconds. The others take quite a bit of time, and one has a bit of gibberish at the end. I have also seen longer strings of gibberish and longer times. Here's an 80-second CPU grind:

Here are my init parameters:
// get default Whisper parameters
m_params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
// overrides
m_params.print_progress = false;
m_params.print_timestamps = false;
m_params.no_context = true;
m_params.single_segment = true;
m_params.max_tokens = 0; // no limit
char BinFilename[] = "ggml-tiny.en.bin";
m_ctx = whisper_init_from_file(BinFilename);
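
To compare timings like the ones reported above, one option is a small wrapper that times a single inference call. This is only a sketch: `time_call` is a hypothetical helper, it measures wall-clock time rather than CPU time, and the `pcm` buffer in the usage comment is an assumed `std::vector<float>` of 16 kHz mono samples.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs fn once and returns the elapsed wall-clock time
// in seconds. This is elapsed time, not per-process CPU time.
template <typename Fn>
double time_call(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Possible usage with the context and parameters above (pcm is assumed):
//   double secs = time_call([&] {
//       whisper_full(m_ctx, m_params, pcm.data(), static_cast<int>(pcm.size()));
//   });
```

Logging the returned value next to each chunk makes it easier to see which inputs trigger the long runs.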