Speech Recognizer and Video Setup
Hi, I have managed to integrate OpenAI Whisper into Unreal Engine 5 using the whisper.cpp project. I also made a simple system that controls the video quality and resolution. Let me know what you think. For example, do you know of any other local speech recognizer written in C++ that could be used?
Specific progress:
- Integration of the OpenAI Whisper thread that runs the speech recognition.
- Development of the system that controls the video quality and resolution.
- Addition of the SETUP state in the state machine.
- SETUP user interface
- GRAPHICS user interface
- VOICE SYNTHESIS user interface
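The recognition thread mentioned in the list above could be sketched roughly as follows. This is a minimal, engine-independent illustration and not the code used in the game: `RecognizerWorker`, `Submit` and `TryPopResult` are hypothetical names, and the `Transcriber` callback is only a stand-in for wherever the whisper.cpp call would go. The idea is that the game thread hands over audio and polls for text, so a slow transcription never stalls a frame.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker that runs a speech-to-text callback off the game thread.
class RecognizerWorker {
public:
    using Samples = std::vector<float>;  // assumed: mono PCM samples
    using Transcriber = std::function<std::string(const Samples&)>;

    explicit RecognizerWorker(Transcriber transcribe)
        : transcribe_(std::move(transcribe)) {
        assert(static_cast<bool>(transcribe_));  // a callback is required
        thread_ = std::thread([this] { Run(); });
    }

    ~RecognizerWorker() { Stop(); }

    // Game thread: hand over a chunk of recorded audio.
    void Submit(Samples audio) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(audio));
        }
        wake_.notify_one();
    }

    // Game thread: poll for finished text (for example once per frame).
    bool TryPopResult(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (results_.empty()) return false;
        out = std::move(results_.front());
        results_.pop();
        return true;
    }

    // Processes whatever is still queued, then joins the thread.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

private:
    void Run() {
        for (;;) {
            Samples audio;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;  // stop requested and queue drained
                audio = std::move(pending_.front());
                pending_.pop();
            }
            std::string text = transcribe_(audio);  // slow call, made without the lock
            std::lock_guard<std::mutex> lock(mutex_);
            results_.push(std::move(text));
        }
    }

    Transcriber transcribe_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<Samples> pending_;
    std::queue<std::string> results_;
    bool stopping_ = false;
    std::thread thread_;
};
```

In an Unreal Engine project the same pattern would more likely be built on the engine's own threading facilities, and the callback would wrap the actual whisper.cpp inference call; both of those are assumptions here.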
Next steps: LLaMA integration, mainly to explore the impact of using GPUs and whether it is possible to completely remove censorship from the model.
More than words
Status | In development |
Author | Soul Shell |
Genre | Simulation, Interactive Fiction, Visual Novel |
Tags | ai, artificial-intelligence, chatbot, chatgpt, Dating Sim |
Languages | German, English, Spanish; Castilian, Spanish; Latin America, French, Italian, Portuguese (Portugal), Portuguese (Brazil) |
More posts
- State machine for animations (8 days ago)
- LlaMaCpp integration in Unreal Engine 5 (21 days ago)
- Text to speech and Lipsync on UE5 (51 days ago)
- More than words to Unreal Engine 5 (69 days ago)
- Activation of the Mirostat algorithm and fixing the bug about Open door (Oct 30, 2024)
- Now the android can remember previous conversations! (Oct 25, 2024)
- Change your android's personality and brain (Sep 30, 2024)
- Next update September 27: customize the android's personality (Sep 16, 2024)
- Update to LLAMA 3.1 and support for multiple languages (Sep 03, 2024)
Comments
Nice progress. Does Whisper run locally or is it an API call to OpenAI's servers? If I find a speech to text program for C++, I'll let you know. I am going to be starting a project with some features in common with yours soon(tm).
I know it's not exactly related to speech to text, but in this new version, will there be mouse controls for the camera without having to hold down a button? I remember that with the current version it took me a while to get used to holding down a button to look around. Maybe holding down a keyboard key like Left Alt could enable the cursor and disable looking so you can navigate the UI.
"Nice progress."
Thanks a lot!
"Does Whisper run locally or is it an API call to OpenAI's servers?"
It runs locally. The focus of the game is that everything runs locally, with nothing coming from servers (I use the whisper.cpp project, which is written in C++).
"If I find a speech to text program for C++, I'll let you know."
Perfect, thanks!
"I am going to be starting a project with some features in common with yours soon(tm)."
Best wishes!
"will there be mouse controls for the camera without having to hold down a button?"
Sorry, I don't quite understand. If you put the cursor in the text box, control goes to the UI. If you click on the screen, control goes to the game viewport and you can use the mouse to rotate the camera.
Thank you.
I mean that in the original game, I think I remember having to hold down either the right or left mouse button in order to rotate the camera. It took some getting used to. Most FPS games that also need the cursor make rotation the default and have a button that enables the cursor and takes control away from the rotation, whereas you had the cursor as the default and the rotation as secondary. I hope that makes more sense.
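To make the suggestion concrete, here is a small engine-independent sketch of the "look by default, cursor on demand" scheme. Every name in it (`PointerMode`, `LookInput`, `UpdateCamera`, the choice of Left Alt, the pitch limits) is a hypothetical illustration, not something taken from the game.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical pointer modes: Look rotates the camera, Cursor drives the UI.
enum class PointerMode { Look, Cursor };

// Per-frame input snapshot (assumed to be filled in by the engine's input layer).
struct LookInput {
    bool cursorKeyHeld = false;   // e.g. Left Alt held down
    bool textBoxFocused = false;  // typing in the chat box
    float mouseDeltaX = 0.0f;
    float mouseDeltaY = 0.0f;
};

struct Camera {
    float yawDegrees = 0.0f;
    float pitchDegrees = 0.0f;
};

// Looking is the default; the cursor only takes over while it is needed.
PointerMode ResolvePointerMode(const LookInput& in) {
    return (in.cursorKeyHeld || in.textBoxFocused) ? PointerMode::Cursor
                                                   : PointerMode::Look;
}

// Applies mouse movement to the camera only while in Look mode.
void UpdateCamera(Camera& cam, const LookInput& in, float sensitivity) {
    assert(sensitivity > 0.0f);
    if (ResolvePointerMode(in) != PointerMode::Look) return;
    cam.yawDegrees += in.mouseDeltaX * sensitivity;
    const float pitch = cam.pitchDegrees + in.mouseDeltaY * sensitivity;
    cam.pitchDegrees = std::max(-89.0f, std::min(89.0f, pitch));  // avoid flipping over
}
```

With this arrangement no button has to be held to look around, and releasing the cursor key hands the mouse straight back to the camera.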
Ok, nice, I understand the point.
About TTS, I've been doing some research. The TTS I need must run locally, be written in C++, be multilingual, and provide me with audio analysis, i.e. the visemes.
The one I could potentially use is this one:
https://github.com/PABannier/bark.cpp
It would be perfect because it is the sibling of llama.cpp and whisper.cpp, and it also uses the Suno AI technology with which I made the music for the video game.
However, as far as I can see, it doesn't provide the audio analysis for the visemes :'( I'll keep an eye on this project.
(If I could integrate this technology, then MTW could run on Windows, Linux and macOS.)
That does sound pretty good. Is there another layer you could add on top to detect the visemes? I know that conventional techniques for doing that have been around since at least 2015, since there is a plugin for Adobe animator cs6 that can detect the various sounds in audio. You might want to look in the VTuber space for something real-time and lightweight.
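One possible shape for such a layer, sketched under heavy assumptions: if some upstream step (a forced aligner or a phoneme recognizer, neither of which is provided here) produced phonemes with timestamps, turning them into a viseme track is mostly a lookup table. The phoneme symbols, the six mouth shapes and all names below are simplified and hypothetical, not a standard viseme set.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified set of mouth shapes.
enum class Viseme { Rest, Closed, Open, Round, Wide, Teeth };

// Assumed input: phonemes with timestamps from an upstream aligner.
struct TimedPhoneme {
    std::string symbol;
    float startSeconds;
    float endSeconds;
};

struct VisemeKey {
    Viseme shape;
    float startSeconds;
    float endSeconds;
};

// Maps a phoneme to a mouth shape; unknown phonemes fall back to Rest.
Viseme ToViseme(const std::string& phoneme) {
    static const std::unordered_map<std::string, Viseme> table = {
        {"p", Viseme::Closed}, {"b", Viseme::Closed}, {"m", Viseme::Closed},
        {"f", Viseme::Teeth},  {"v", Viseme::Teeth},
        {"a", Viseme::Open},
        {"o", Viseme::Round},  {"u", Viseme::Round},  {"w", Viseme::Round},
        {"e", Viseme::Wide},   {"i", Viseme::Wide},
    };
    const auto it = table.find(phoneme);
    return it == table.end() ? Viseme::Rest : it->second;
}

// Builds a viseme track, merging consecutive phonemes that share a shape.
std::vector<VisemeKey> BuildVisemeTrack(const std::vector<TimedPhoneme>& phonemes) {
    std::vector<VisemeKey> track;
    for (const auto& p : phonemes) {
        assert(p.endSeconds >= p.startSeconds);
        const Viseme shape = ToViseme(p.symbol);
        if (!track.empty() && track.back().shape == shape) {
            track.back().endSeconds = p.endSeconds;
        } else {
            track.push_back({shape, p.startSeconds, p.endSeconds});
        }
    }
    return track;
}
```

The hard part is still getting timed phonemes out of the synthesized audio; this sketch only covers the step after that.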