New Podcast: Pioneering Audio-Driven Facial Animation
On the latest episode of the Building Rapport podcast, we sat down with Michael Berger, CTO and Co-Founder of Speech Graphics and Rapport. Berger’s lifelong passion for language and speech has been the driving force behind innovations that have reshaped the landscape of facial animation. Listen to the full podcast below.
A Lifelong Passion for Language and Speech
Michael Berger’s journey into linguistics started remarkably early. "I actually started studying linguistics when I was 12. I was participating in my stepmother's college course in linguistics," he shared. He later joined a PhD program at the University of California, where he worked on one of the first-ever 3D animated faces, with the goal of making its speech more realistic.
Reflecting on the primal role of face-to-face communication in human history, Berger described it as "our first technology... as old as language itself." This deep understanding of the interplay between speech and visual expression became the foundation for his work.
From Academia to Industry
While pursuing a second PhD at the University of Edinburgh, where he worked on speech synthesis and audio-driven animation, Berger met Gregor Hofer, his co-founder at Speech Graphics and Rapport. Together, they identified a gap in the gaming industry: facial animation lagged behind other aspects of game animation.
"Games were becoming very story-driven," Berger explained, "but facial animation was horrible... there was a lot of hand-keyed animations or motion capture. But this was a very expensive and not a scalable approach." Their first big break came with Shadow of Mordor, where their technology drove the orcs' facial animation and had to adapt to unusual rigging challenges such as tusks and exaggerated features. "We had to invent a lot of new tech as we went," Berger recalled.
Challenges and Innovations
Speech Graphics focused on pushing the boundaries of realism while addressing scalability. "There’s a grid with three axes: ease of use and scalability, quality, and creative control," Berger noted, emphasizing the importance of balancing these factors.
Berger also highlighted how machine learning has transformed the field, enabling models that classify audio into emotional categories and use those classifications to inform nonverbal behavior. This technology helps bridge the gap between auditory and visual communication, producing animations that are both lifelike and emotionally resonant.
Rapport and the Evolution of AI Conversations
Building on the success of Speech Graphics, the founders created Rapport to serve a different market. Rapport aims to bring realistic facial animation to other sectors through real-time conversational experiences.
Unlike rigid, finite-state dialogue systems of the past, Rapport leverages large language models to create empathetic and engaging interactions. "Large language models have opened the way to much more broad-ranging and realistic conversations," Berger explained.
Berger emphasized that Rapport is about more than mimicking human behavior: "We’re not trying to fool people into thinking they’re talking to a real person. It’s about creating the impression that the thing you’re talking to is alive in some way."