Text-to-speech and lip-sync in UE5


I have successfully integrated Microsoft SAPI into Unreal Engine 5 to generate text-to-speech (TTS) along with my lip-syncing method. Let me know what you think.

Progress on the update of More than words to Unreal Engine 5:

- Main interface [OK].
- Updated Marian's 3D model [OK].
- Microsoft SAPI integration in UE5 to generate text-to-speech [OK].
- State machine integration: Wait, Think, Speak, and Listen [OK].
- State machine integration for lip-sync [OK].
  + The lip-sync now uses a simpler algorithm with better results.
- Integration of the thread that handles speaking [OK].
  + Fixed a minor bug that prevented the thread from being completely destroyed.

Next steps: Integrate speech recognition.

Comments


Looks good. Any chance you could add face (camera) tracking too so that she looks at the player? Bonus points if you could do so with a model that incorporates glances away instead of just staring the whole time.

On the TTS specifically, I'm sure it's for computational load and ease of programming that you're using the Microsoft built-in system, but have you looked into local TTS models? This game uses one that sounds pretty convincingly human for what it is: https://jetro30087.itch.io/ai-companion-miku If you can get in contact with the dev, maybe he'll tell you what system he used.

"Any chance you could add face (camera) tracking so that she looks at the player? Bonus points if you could do so with a model that incorporates glances away instead of just staring the whole time."

What you describe can be achieved with BlendSpace animations; when I work on the animation module, I'll see how far I can go.

"I'm sure it's for computational load and ease of programming that you're using the Microsoft built in system, but have you looked into local TTS models?"

Currently, to generate the static voice I am using this one (Cortana's voice in the new update):

But as you say, the main reason is performance: the Microsoft system generates both the voice and the voice analysis (it gives you the list of phonemes) very efficiently. It adds practically no load to the game, and since it has to run at the same time as Unreal Engine's rendering and the LLMs, that's why I'm still holding on to it. But I'm not closing my eyes to alternatives. We'll see how much local TTS advances this year and whether similar performance can be achieved.