Text-to-Speech and Lipsync on UE5
More than words » Devlog
I have successfully integrated Microsoft SAPI into Unreal Engine 5 to generate text-to-speech (TTS) along with my lip-syncing method. Let me know what you think.
Progress on the update of More than words to Unreal Engine 5:
- Main interface [OK]
- Updated Marian's 3D model [OK]
- Microsoft SAPI integration in UE5 to generate text-to-speech [OK]
- State machine integration: Wait, Think, Speak, and Listen [OK]
- State machine integration for lipsync [OK]
+ Lipsync now uses a simpler algorithm with better results.
- Integration of the thread that handles speaking [OK]
+ Fixed a minor bug that prevented the thread from being completely destroyed.
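The thread fix above is the classic shutdown problem: signal the worker to stop, then actually wait for it to exit before tearing it down. As a standalone sketch of that pattern (using std::thread rather than UE5's FRunnable, with all names invented for illustration — this is not the game's actual code):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Illustrative speaking worker with a clean shutdown path.
class SpeakerThread {
public:
    void Start() {
        bStopRequested = false;
        Worker = std::thread([this] { Run(); });
    }

    // The bug class being fixed: requesting the stop but never joining
    // leaves the thread alive. Signaling AND joining guarantees the
    // worker is completely destroyed.
    void Shutdown() {
        bStopRequested = true;
        if (Worker.joinable())
            Worker.join();  // block until the worker really exits
    }

    bool IsRunning() const { return Worker.joinable(); }

private:
    void Run() {
        while (!bStopRequested) {
            // ...feed text to the TTS engine, emit phoneme events...
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    std::atomic<bool> bStopRequested{false};
    std::thread Worker;
};
```

Calling Shutdown() before the owner is destroyed ensures the worker is gone before its memory is reclaimed, which is presumably the failure mode the fix addresses (in UE5 the equivalent would be FRunnable::Stop plus FRunnableThread::WaitForCompletion).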
Next steps: Integrate speech recognition.
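The Wait/Think/Speak/Listen cycle from the list above can be modeled as a small state machine. A minimal standalone sketch — the state and event names here are my assumptions, not the game's actual code:

```cpp
// Hypothetical states mirroring the devlog's Wait, Think, Speak, Listen cycle.
enum class EAndroidState { Wait, Think, Speak, Listen };

// Events that drive transitions (names are illustrative assumptions).
enum class EEvent { PlayerSpoke, SilenceTimeout, ReplyReady, SpeechFinished };

EAndroidState NextState(EAndroidState Current, EEvent Event) {
    switch (Current) {
        case EAndroidState::Wait:    // idle until the player starts talking
            return Event == EEvent::PlayerSpoke ? EAndroidState::Listen : Current;
        case EAndroidState::Listen:  // capture input until the player stops
            return Event == EEvent::SilenceTimeout ? EAndroidState::Think : Current;
        case EAndroidState::Think:   // wait for the LLM to produce a reply
            return Event == EEvent::ReplyReady ? EAndroidState::Speak : Current;
        case EAndroidState::Speak:   // TTS + lipsync until speech ends
            return Event == EEvent::SpeechFinished ? EAndroidState::Wait : Current;
    }
    return Current;  // unrecognized events leave the state unchanged
}
```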
More than words
Status | In development |
Author | Soul Shell |
Genre | Simulation, Interactive Fiction, Visual Novel |
Tags | ai, artificial-intelligence, chatbot, chatgpt, Dating Sim |
Languages | German, English, Spanish (Castilian), Spanish (Latin America), French, Italian, Portuguese (Portugal), Portuguese (Brazil) |
Comments
Looks good. Any chance you could add face (camera) tracking too so that she looks at the player? Bonus points if you could do so with a model that incorporates glances away instead of just staring the whole time.
On the TTS specifically, I'm sure it's for computational load and ease of programming that you're using the Microsoft built-in system, but have you looked into local TTS models? This game uses one that is pretty convincingly human-sounding for what it is: https://jetro30087.itch.io/ai-companion-miku If you can get in contact with the dev, maybe he'll tell you what system he used.
"Any chance you could add face (camera) tracking so that she looks at the player? Bonus points if you could do so with a model that incorporates glances away instead of just staring the whole time."
What you say can be achieved with BlendSpace animations, when I work on the animations module I'll see how far I can go.
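As a rough illustration of the "glances away" idea, the target-switching part can be as simple as a timer that alternates between eye contact and short glances; the durations and names below are assumptions, and in practice the result would drive a BlendSpace/Aim Offset rather than return a bool:

```cpp
// Timing logic for "mostly look at the player, occasionally glance away".
// A real implementation would feed a yaw/pitch target into a UE5
// BlendSpace; here we only model the target switching.
struct GazeController {
    float TimeUntilSwitch = 4.0f;  // seconds until the next switch (assumed)
    bool  bLookingAtPlayer = true;

    // Advance the timer; returns true while looking at the player.
    bool Tick(float DeltaSeconds) {
        TimeUntilSwitch -= DeltaSeconds;
        if (TimeUntilSwitch <= 0.0f) {
            bLookingAtPlayer = !bLookingAtPlayer;
            // Hold eye contact much longer than a glance away.
            TimeUntilSwitch = bLookingAtPlayer ? 4.0f : 0.8f;
        }
        return bLookingAtPlayer;
    }
};
```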
"I'm sure it's for computational load and ease of programming that you're using the Microsoft built in system, but have you looked into local TTS models?"
Currently, to generate the static voice I am using this one (Cortana's voice in the new update):
But as you say, the main reason is performance: the Microsoft system generates both the voice and the voice analysis (it gives you the list of phonemes) very efficiently. It represents practically no load for the game, and since it runs at the same time as Unreal Engine's graphics and the LLMs, that's why I still hold on to it. But I'm not closing my eyes to alternatives. We'll see how much TTS models advance this year and whether a similar performance can be achieved.
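For context on that "list of phonemes": while speaking, SAPI also raises viseme events (IDs 0–21, roughly the Disney viseme set), and a lipsync pass can collapse those into a handful of mouth poses. The grouping below is an illustrative assumption, not the game's actual mapping:

```cpp
// Collapse SAPI viseme IDs (0..21, delivered via SPEI_VISEME events)
// into a few mouth poses for a simple lipsync rig. The pose names and
// the grouping are illustrative assumptions.
enum class EMouthPose { Closed, Open, Wide, Round, FV, Teeth };

EMouthPose PoseForViseme(int VisemeId) {
    switch (VisemeId) {
        case 0:                          return EMouthPose::Closed; // silence
        case 21:                         return EMouthPose::Closed; // p, b, m
        case 18:                         return EMouthPose::FV;     // f, v
        case 15: case 16: case 17:
        case 19: case 20:                return EMouthPose::Teeth;  // s, sh, th, t/d, k/g
        case 3:  case 7:  case 8:
        case 9:  case 10:                return EMouthPose::Round;  // ao, uw, ow, aw, oy
        case 6:                          return EMouthPose::Wide;   // iy, ih
        default:                         return EMouthPose::Open;   // open vowels, etc.
    }
}
```

Because the engine delivers these events in real time alongside the audio, a mapping like this is essentially free per frame, which matches the near-zero load described above.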