Connecting Emotions and Expressions
- nathanenglish
- Mar 8, 2023
- 2 min read
Next is the matter of connecting the emotional context of Raith's responses to his facial expressions. Luckily, this proved simple to solve thanks to OpenAI's functionality.
I added an instruction inside the script that controls the ChatGPT responses, as such:
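(The screenshot of the instruction didn't carry over here, but based on the behavior described below, it amounts to something along these lines. The exact wording and the constant name are my assumptions, not the original.)

```csharp
// Appended to the system prompt sent with each ChatGPT request.
// Wording is approximate: the real instruction was in a screenshot.
private const string emotionInstruction =
    "Begin every response with one emotion keyword in square brackets, " +
    "chosen from: [happy], [sad], [angry], [curious], [neutral]. " +
    "For example: [happy] It's good to see you again!";
```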

This returns mostly reliable results. Occasionally it doesn't follow the instructions exactly: sometimes it picks its own keyword that I haven't accounted for, places the emotion keyword somewhere random in the sentence, or puts the keyword in quotes instead of brackets, any of which messes up my parsing code. But I'd give it a solid 9/10 for reliability. Such is the nature of AI.

Here, I separate the ChatGPT response string into two strings, using the left bracket [ as the delimiter. The strings get stored in an array where they can be accessed when needed. Each string then gets sent through some functions to clean it up a little: the emotion keyword gets the closing bracket ] removed, and the main string has any quotation marks removed and missing punctuation corrected.
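The splitting code itself was shown as a screenshot in the original post; a rough sketch of those same steps might look like this (the method name and the neutral fallback are my assumptions):

```csharp
// Splits a response like "[happy] Hello there!" into an emotion keyword
// and the sentence to speak: { "happy", "Hello there!" }.
public static string[] ParseResponse(string response)
{
    // Everything after the left bracket starts the emotion keyword.
    string[] parts = response.Split(new[] { '[' }, 2);
    string remainder = parts.Length > 1 ? parts[1] : parts[0];

    int close = remainder.IndexOf(']');
    string emotion = close >= 0 ? remainder.Substring(0, close).Trim() : "neutral";
    string sentence = close >= 0 ? remainder.Substring(close + 1) : remainder;

    // Clean-up: strip quotation marks, restore missing end punctuation.
    sentence = sentence.Replace("\"", "").Trim();
    if (sentence.Length > 0 && ".!?".IndexOf(sentence[sentence.Length - 1]) < 0)
        sentence += ".";

    return new[] { emotion, sentence };
}
```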

From there, they're ready to be sent off to their next destinations. The main response gets sent to the TTS and the UI, and the keyword gets sent to a switch case. This switch case controls a float that blends between the different facial expressions. Currently it just sets the float to a fixed value, but in an ideal world I could blend dynamically from 0 to 1 to control the severity of each facial expression. The way it's set up now doesn't work very well (it has to blend through all the other emotions to get to the desired one, and I couldn't find a good configuration to make the blending make sense).
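A minimal sketch of that switch case, assuming a 1D Animator blend parameter; the keyword set, blend values, and parameter name are placeholders rather than the original's:

```csharp
using UnityEngine;

public class ExpressionController : MonoBehaviour
{
    [SerializeField] private Animator animator;

    // Maps an emotion keyword to a position on the 1D blend tree.
    // The values are placeholders; the real ones depend on the tree layout.
    public void SetExpression(string emotion)
    {
        float blend;
        switch (emotion)
        {
            case "happy":   blend = 0.25f; break;
            case "curious": blend = 0.5f;  break;
            case "sad":     blend = 0.75f; break;
            case "angry":   blend = 1f;    break;
            default:        blend = 0f;    break; // neutral / unrecognized keyword
        }
        animator.SetFloat("Expression", blend);
    }
}
```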

Unity blend tree, with dynamic blending between different animation clips.
Had I more time (and more brains), I would figure out how to control the severity of emotions. Something like:
- If Raith gives two or more of the same emotional response in a row, the facial expression gradually becomes more severe (a rough sketch follows this list)
- Dynamic blending between similar expressions, like happy and curious, or sad and angry
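A rough sketch of the first idea, continuing the hypothetical ExpressionController above; the ramp amounts are arbitrary:

```csharp
// Hypothetical severity ramp: repeated emotions push a severity value
// toward its maximum instead of snapping to a fixed amount.
private string lastEmotion;
private int repeatCount;

public void SetExpressionWithSeverity(string emotion)
{
    repeatCount = (emotion == lastEmotion) ? repeatCount + 1 : 0;
    lastEmotion = emotion;

    // Each repeat adds 25% severity, capped at full strength.
    float severity = Mathf.Min(1f, 0.5f + 0.25f * repeatCount);
    animator.SetFloat("Severity", severity);
}
```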
Thinking
Finally, it's time to fill in the awkward silence between when the AI response is ready and when the TTS speech is ready. Now that I have the responses shortened to 150 characters for debugging purposes, this awkward silence is not very long. But when the responses are longer, the silence is longer.
I account for this in-between space by using bools to track when the AI response is ready and when the TTS is actively speaking.

If the ChatGPT response is ready (chat.textResponse != null) but the TTS is not actively speaking yet, Raith is still thinking about his response. Once the TTS is actively speaking aloud, Raith is no longer in the in-between thinking phase.
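A minimal sketch of that check: chat.textResponse comes from the post, while isThinking's placement and tts.IsSpeaking are assumed names on my part.

```csharp
void Update()
{
    // Raith is "thinking" once a response exists but before the TTS
    // starts speaking it aloud.
    isThinking = chat.textResponse != null && !tts.IsSpeaking;
}
```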

isThinking is used to control a couple of things. First, Raith's eyes look upward toward a target above his head, like he's currently in thought. Second, it enables and disables a simple thought cloud sprite animation above his head.
The chat.NewTextArea.Text = newSentence line only runs once the TTS is actively speaking, which prevents the UI text box from updating before Raith starts talking.
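Put together, the thinking state might drive the visuals something like this (thoughtCloud, eyeLook, and the two target objects are placeholder names; chat.NewTextArea.Text and newSentence are from the post):

```csharp
void UpdateThinkingVisuals()
{
    // Toggle the thought-cloud sprite animation with the thinking state.
    thoughtCloud.SetActive(isThinking);

    // Look at a point above Raith's head while thinking; otherwise look
    // back toward the camera.
    eyeLook.target = isThinking ? eyeTargetAboveHead : cameraTarget;

    // Only push the new sentence to the UI once the TTS is speaking.
    if (tts.IsSpeaking)
    {
        chat.NewTextArea.Text = newSentence;
    }
}
```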

An Eye Target game object, offset slightly from the main camera. Raith looks toward this target to mimic a "thinking" expression.
Finally, here's everything all together once more!