Speech Recognizer and Video Setup

205 days ago by Soul Shell (@Soulshellgames)

Hi, I have managed to integrate OpenAI Whisper into Unreal Engine 5 using the whisper.cpp project. I also made a simple system that controls the video quality and resolution. Let me know what you think. For example, do you know of any other local speech recognizer written in C++ that could be used?

Specific progress:

Integration of the OpenAI Whisper thread that handles the speech recognition.
Development of the system that controls the video quality and resolution.
Addition of the SETUP state in the state machine.
SETUP user interface
GRAPHICS user interface
VOICE SYNTHESIS user interface
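
The post does not show how the recognition thread is built, so the following is only a minimal sketch of one common arrangement, in plain C++ without any Unreal Engine or whisper.cpp calls. The names `RecognizerWorker`, `TranscribeFn`, `Submit` and `TryPopResult` are hypothetical, and the transcription step is a callback where the real call into the speech library would go. The game thread hands over audio and polls for text, so a slow transcription never blocks a frame.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical signature: mono float PCM samples in, recognized text out.
// The real call into the speech library would be wrapped in this callback.
using TranscribeFn = std::function<std::string(const std::vector<float>&)>;

class RecognizerWorker {
public:
    explicit RecognizerWorker(TranscribeFn fn) : transcribe_(std::move(fn)) {
        assert(transcribe_ && "a transcription callback is required");
        worker_ = std::thread([this] { Run(); });
    }

    // Finishes any queued audio, then stops the thread.
    ~RecognizerWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        worker_.join();
    }

    // Game thread: queue a chunk of audio and return immediately.
    void Submit(std::vector<float> samples) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(samples));
        }
        wake_.notify_one();
    }

    // Game thread (for example once per frame): non-blocking poll for text.
    bool TryPopResult(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (results_.empty()) return false;
        out = std::move(results_.front());
        results_.pop();
        return true;
    }

private:
    void Run() {
        for (;;) {
            std::vector<float> samples;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;  // stopping and nothing left to do
                samples = std::move(pending_.front());
                pending_.pop();
            }
            // The slow part runs outside the lock so Submit/TryPopResult stay fast.
            std::string text = transcribe_(samples);
            std::lock_guard<std::mutex> lock(mutex_);
            results_.push(std::move(text));
        }
    }

    TranscribeFn transcribe_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::vector<float>> pending_;
    std::queue<std::string> results_;
    bool stopping_ = false;
    std::thread worker_;
};
```

In an engine, `TryPopResult` would typically be called from the per-frame update, and the recognized text forwarded to the chat input once it arrives.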

Next steps: LLaMA integration, mainly to explore the impact of using GPUs and whether it is possible to completely remove censorship from the model.

Status	In development
Author	Soul Shell
Genre	Simulation, Interactive Fiction, Visual Novel
Tags	ai, artificial-intelligence, chatbot, chatgpt, Dating Sim
Languages	German, English, Spanish; Castilian, Spanish; Latin America, French, Italian, Portuguese (Portugal), Portuguese (Brazil)

Comments

XenoCow 197 days ago

Nice progress. Does Whisper run locally or is it an API call to OpenAI's servers? If I find a speech to text program for C++, I'll let you know. I am going to be starting a project with some features in common with yours soon(tm).

I know it's not exactly related to speech to text, but in this new version, will there be mouse controls for the camera without having to hold down a button? I remember that with the current version it took me a while to get used to holding down a button to look around. Maybe holding down a keyboard key like Left Alt could enable the cursor and disable looking so you can navigate the UI.

Soul Shell 197 days ago

"Nice progress."

Thanks a lot!

"Does Whisper run locally or is it an API call to OpenAI's servers?"

It runs locally. The focus of the game is that everything can run locally, with nothing on servers (I use the whisper.cpp project, which is written in C++).

"If I find a speech to text program for C++, I'll let you know."

Perfect, thanks!

"I am going to be starting a project with some features in common with yours soon(tm)."

Best wishes!

"will there be mouse controls for the camera without having to hold down a button?"

Sorry, I don't quite understand. If you put the cursor in the text box the control goes to the UI. If you click on the screen the control goes to the game viewport and you can use the mouse to rotate the camera.

XenoCow 192 days ago

Thank you.

"Sorry, I don't quite understand. If you put the cursor in the text box the control goes to the UI. If you click on the screen the control goes to the game viewport and you can use the mouse to rotate the camera."

I mean that in the original game, I think I remember having to hold down either the right or left mouse button in order to rotate the camera. It took some getting used to. Most FPS games that also need the cursor make rotation the default and have a button that enables the cursor and takes control away from the rotation, whereas you had the cursor as the default and the rotation as secondary. I hope that makes more sense.
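
The FPS-style scheme described here can be sketched in a few lines. This is an illustration in plain C++, not Unreal Engine code: `InputState`, `CameraController`, the Left Alt binding and the sensitivity value are all assumptions made for the example. Mouse movement rotates the camera by default, and holding the cursor key suspends rotation so the pointer can be used on the UI.

```cpp
#include <algorithm>
#include <cassert>

enum class InputMode { Look, Cursor };

// Hypothetical per-frame input snapshot.
struct InputState {
    bool cursor_key_held = false;  // e.g. Left Alt (assumed binding)
};

struct CameraController {
    float yaw = 0.0f;           // degrees
    float pitch = 0.0f;         // degrees, clamped to avoid flipping over
    float sensitivity = 0.25f;  // degrees per mouse unit (illustrative value)

    // Looking is the default; the cursor is only active while the key is held.
    InputMode Mode(const InputState& in) const {
        return in.cursor_key_held ? InputMode::Cursor : InputMode::Look;
    }

    void OnMouseMove(const InputState& in, float dx, float dy) {
        assert(sensitivity > 0.0f);
        if (Mode(in) == InputMode::Cursor) return;  // mouse drives the UI pointer
        yaw += dx * sensitivity;
        pitch = std::max(-89.0f, std::min(89.0f, pitch - dy * sensitivity));
    }
};
```

With this arrangement the player never holds a mouse button to look around; the held key is the exception rather than the rule.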

Soul Shell 186 days ago

Ok, nice, I understand the point.

About TTS, I've been doing some research. The TTS I need must run locally, be written in C++, be multilingual, and provide me with audio analysis, i.e. the visemes.

The one I see as potentially usable is this one:

https://github.com/PABannier/bark.cpp

It would be perfect because it is a sibling of llama.cpp and whisper.cpp, and it also uses the Suno AI technology with which I made the music for the video game.

However, as far as I can see, it doesn't have the audio analysis for the visemes :'(. I'll keep an eye on this project.

(If I could integrate this technology, then MTW could run on Windows, Linux and macOS.)

XenoCow 185 days ago

That does sound pretty good. Is there another layer you could add on top to detect the visemes? I know that conventional techniques for doing that have been around since at least 2015, because there is a plugin for Adobe Animator CS6 that can detect the various sounds in audio. You might want to look in the VTuber space for something real-time and lightweight.
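
As a rough idea of what the lightest possible layer on top of the TTS output could look like, here is a sketch that turns PCM audio into a per-frame "mouth openness" value from the signal's loudness (RMS). This is not viseme detection: it cannot tell an "O" from an "M", which would need some form of phoneme classification. The function name `MouthOpenness` and the `full_open_rms` threshold are assumptions made for the example, not part of bark.cpp or any other library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns one value in [0, 1] per complete frame of `frame_size` samples.
// `pcm` holds mono samples in [-1, 1]; a trailing partial frame is ignored.
// `full_open_rms` is the loudness treated as a fully open mouth (assumed value).
std::vector<float> MouthOpenness(const std::vector<float>& pcm,
                                 std::size_t frame_size,
                                 float full_open_rms = 0.25f) {
    assert(frame_size > 0 && full_open_rms > 0.0f);
    std::vector<float> openness;
    for (std::size_t start = 0; start + frame_size <= pcm.size(); start += frame_size) {
        double sum_sq = 0.0;
        for (std::size_t i = 0; i < frame_size; ++i) {
            const double s = pcm[start + i];
            sum_sq += s * s;
        }
        const double rms = std::sqrt(sum_sq / static_cast<double>(frame_size));
        const double scaled = rms / full_open_rms;
        openness.push_back(static_cast<float>(scaled > 1.0 ? 1.0 : scaled));
    }
    return openness;
}
```

At a 16 kHz sample rate, a `frame_size` of 320 gives one value every 20 ms, which could drive a single jaw blend shape until a real viseme source is available.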
