The Growth of Audio-Driven Animation and Speech Graphics

 

Leading the Future of Performance-Driven Animation

An update from Gregor Hofer & Michael Berger, Co-founders of Speech Graphics

The audio-driven animation sector is buzzing with innovation, and at Speech Graphics we're thrilled to see this growth in a field we've nurtured over the last decade.

Audio-driven animation is a technique that uses nothing but audio clips to create realistic animations. Most solutions use visemes to represent the key poses in observed speech. Speech Graphics, however, takes a unique approach, using audio-driven technology to match sounds to muscle maps and so increase the accuracy of animations.
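To make the viseme idea concrete, here is a minimal sketch of the generic lookup approach that most solutions are described as using. It is not Speech Graphics' muscle-based method. The phoneme labels, viseme names, and the `visemes_from_phonemes` helper are all hypothetical, and the sketch assumes that a separate step has already extracted timed phonemes from the audio.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table: several phonemes share one mouth pose.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "jaw_open", "ae": "jaw_open",
    "uw": "lips_rounded", "ow": "lips_rounded",
}


@dataclass
class Keyframe:
    time: float   # seconds from the start of the clip
    viseme: str   # name of the mouth pose to hit at that time


def visemes_from_phonemes(timed_phonemes):
    """Turn (start_time, phoneme) pairs into viseme keyframes.

    Consecutive phonemes that map to the same viseme are merged into one
    keyframe. Phonemes missing from the table fall back to a neutral pose.
    """
    keyframes = []
    for start, phoneme in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if keyframes and keyframes[-1].viseme == viseme:
            continue  # same pose as the previous keyframe, nothing new to key
        keyframes.append(Keyframe(start, viseme))
    return keyframes


# Illustrative input for a word like "mob": m, aa, b
frames = visemes_from_phonemes([(0.00, "m"), (0.08, "aa"), (0.20, "b")])
```

A table like this is simple to build, but it maps many distinct sounds onto a small set of fixed poses, which is one reason a lookup-only approach can lose detail.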

The Rise in Popularity of Audio-Driven Animation

Several factors are fueling the increasing demand for audio-driven animation:

  1. Efficient Animation: In the games and entertainment industries, it is common for projects to contain vast amounts of recorded voiceover. Animating all of this dialogue, either by hand or through motion capture, is very expensive and time-consuming. Audio-driven animation provides a fast, low-cost alternative with consistent output.

  2. Rise of AI Experiences: The rapidly growing popularity of AI-driven applications, like ChatGPT, has led to an increased demand for immediate and engaging digital conversations. Audio-driven animation allows for the creation of real-time interactive experiences, bringing new AI applications to life.

  3. Talent Scalability: Other animation techniques, like motion capture, can be very costly for studios and time-consuming for actors to record. Audio-driven animation offers the ability to animate an actor’s face rig across multiple languages and projects in a more cost-effective manner.

  4. Text-to-Speech (TTS) Advancements: Technological innovators like Google and Amazon, as well as more specialized platforms like ElevenLabs, have created Text-to-Speech (TTS) systems with realistic and expressive voices for their customers, which are elevated by high-quality accompanying facial animation.

As pioneers in this sector, we at Speech Graphics are excited to see new players entering this space. This growth validates our long-held belief that audio-driven animation is the future of high-quality, performance-driven character animation.

Speech Graphics: Consistently Ahead of the Curve

As this sector has grown, Speech Graphics has experienced significant growth in both revenue and customer numbers. This success stems from our unwavering commitment to staying ahead of the curve with our unique technology and our focus on seamlessly integrating into client workflows. Our products are used in some of the world’s most popular games, including The Last of Us: Part 2, Hogwarts Legacy, and Resident Evil: Village.

Our product suite, tailored for adaptability, continues to set the standard in audio-driven animation.

SGX: Powering AAA Video Game Development

Our multi-award-winning procedural animation product, SGX, is built from over 20 years of R&D in speech technology, linguistics, machine learning, and procedural facial dynamics. It converts audio files into animation files, which can then be imported into game engines or other 3D animation systems.

The main focus of SGX development over the years has been to optimize the automatic quality you get out of batch processing. But we've also invested a lot of R&D into giving users more creative control through interactive processing. Some of the key benefits of SGX are:

  • Rig Agnostic: Our software works with any type of 3D rigging style, standard, and art style, whether human or non-human, 2 eyes or 20. Read our case study for Remnant 2, where we animated a 14-foot-tall, Root-infested Wolf creature with 10 eyes!

  • Language-Specific Optimizations: Although we animate lip sync in any language, we provide specific language modules for eleven major languages and dialects for optimal pronunciation quality.

  • Creative Control: Our SGX Director tool enables quick and effective editing of animation and expressions, keeping creative control in the hands of animators.

  • Universal Compatibility: SGX outputs animation to any DCC tool or game engine, as well as offering direct plug-ins for Maya & Unreal Engine.

  • Automatic Emotion Detection: SGX uses AI to detect several vocal qualities, triggering automatic behavior modes: positivity, negativity, effort, and laughter.

  • Transcript + Audio Synergy: By combining transcript and audio data, we achieve superior quality compared to audio-only solutions.

  • Behavior Modes & Modifiers: Set the emotional style of the animation to match the character's vocal performance.

  • Non-Verbal Animation: Unique support for non-verbal (non-speech) animation such as breath, blinks, eye darts, and head motion.

SG Com: Real-Time Animation on Any Device

SG Com is our real-time animation system that converts audio into facial animation with a latency of only 50 milliseconds, running locally on the device. Whether you’re looking to enhance player experience through avatar puppeteering or speed up production cycles by generating animation at runtime, SG Com has the power to deliver. Some of the key benefits of SG Com are:

  • Proven at Scale: SG Com has powered player-to-player chat functions within Fortnite lobbies, driven by the player’s mic input.

  • Performant: The only system that runs in real time on the CPU alone, without using any GPU resources.

  • Cross-Platform: Works on PlayStation, Xbox, PC, Mac, Android, and Apple devices.

  • Listening Behavior: Our software analyzes voice qualities to trigger reactive facial and body animations.

  • Engine Flexibility: Supports custom engines beyond Unity and Unreal Engine.

Rapport: Democratizing Digital Character Animation

With the rapid growth of the audio-driven animation market, Speech Graphics identified the need to provide a more accessible animation platform for customers wanting to create engaging digital experiences. Our latest product innovation, Rapport, provides a complete tool suite for anyone to create, animate, and deploy real-time digital character experiences:

  • Forefront of AI: Rapport remains at the cutting edge of AI experiences by integrating with ChatGPT, Google Gemini, Llama 2 on Groq, and custom AI solutions to generate natural conversations instantly.

  • Wider Integrations: The Rapport platform offers every integration needed for users in one place - AI Types, Speech-To-Text, Text-To-Speech, and character providers.

  • Multiple Character Styles: Offers a variety of character creator platforms, including Reallusion CC4, AvatarOS, 2DNAC, Metahuman, ReadyPlayerMe, Copresence, ARKit, and many more.

  • Best In Class Animation: Rapport’s audio-driven animation is powered by Speech Graphics technology.

  • Emotional Intelligence: Emotional modifiers in-platform allow you to give the character insight and personality.

  • Flexible Rendering: Supports WebGL and Pixel streaming rendering.

  • Deployment Options: Rapport projects can be deployed anywhere - whether fully cloud-hosted and managed or run on your own infrastructure.

Leading the Future of Animation

This combination of products, each with its own benefits and use cases, means customers consistently turn to Speech Graphics for their audio-driven animation needs. We're proud to offer solutions that cater to a wide range of industries and applications, from AAA game development to interactive digital assistants.

As we continue to innovate and push the boundaries of what's possible in audio-driven animation, we remain committed to our core values:

  1. Excellence: In everything we do - technological, aesthetic, scientific, and commercial - we strive to be the best. We know that achieving excellence in our interdisciplinary field requires skill and continuous innovation in multiple areas. Our products are compelling because they are backed by substance and depth.

  2. Courage: We have the confidence to take risks, and the will to create the future we envision. This requires determination, belief in ourselves, energy, hard work, and sometimes a leap into the unknown. Among ourselves, we are not afraid to take alternate views and embrace spirited debate in pursuit of the best ideas.

  3. Open-Mindedness: We embrace diversity of views, disciplines, and cultures. We approach everything with respect, curiosity, and a willingness to have our assumptions tested. Differences make us stronger, and listening is as important as talking.

  4. Commerciality: We do everything for a reason, and the customer is at the center of it. Without knowing what the customer really needs, we waste precious time and resources.

  5. Rapport: Our company is all about creating meaningful connections – rapport. Internally, the rapport among our interdisciplinary team builds collaboration and unity of purpose. Externally, the relationships with our customers and partners enable us to learn and grow. And the very purpose of our technology is face-to-face communication in the service of telling stories and building relationships.

Embracing Industry Growth

At Speech Graphics, we don't just welcome the growth of the audio-driven animation industry – we celebrate it. This expansion validates the vision we've held since our inception and pushes us to continually refine and improve our offerings.

As more players enter the field, we're excited about the potential for collaboration, innovation, and the creation of even more immersive and engaging animated experiences. We believe that this growth will lead to new opportunities not just for us, but for the entire entertainment and interactive media landscape.

Looking Ahead

As we look to the future, we see endless possibilities for audio-driven animation. From more realistic NPCs in video games to personalized AI assistants and beyond, the applications are limited only by our imagination.

At Speech Graphics, we're committed to leading this charge, continuing to develop technologies that push the boundaries of what's possible in character animation. We invite you to join us on this exciting journey as we shape the future of audio-driven animation together.

Whether you're a game developer, a tech innovator, or simply passionate about the future of animation, we'd love to hear from you. Let's explore how Speech Graphics can help bring your characters to life in ways you never thought possible.

Contact us today to learn more about our suite of audio-driven animation solutions and how they can benefit your projects.

 