Beyond words – The intelligent voice data revolution

Our Managing Director for Studios reflects on voice data and vocal biometrics, and why they are another rich, important dataset in the brave new world of large language AI models.


Diverse People, Diverse Perspectives

Waracle is an inclusive, inspiring & developmental home to the most talented and diverse people in our industry. The perspectives offered in our insights represent the views and opinions of the individual authors and not necessarily an official position of Waracle.

As we immerse ourselves ever more deeply in a new age of intelligent digital products, it is important not to overlook the emerging technologies that seem to have fallen out of favour in the ongoing furore over generative AI.

The reality of generative AI, as it stands, is that large language models need high-quality, clean data to train on and to refine their outputs, which will ultimately benefit personalisation, operational efficiency, value generation and the refinement of business processes. So if we are looking for rich, nuanced, non-standard datasets, then maybe it’s time to give some attention and appreciation to the potential of voice data.

Hear us out.

Beyond Words

This isn’t merely about using our voices to communicate with our devices… but rather about understanding the data generated from these innocuous interactions and how it can impact and transform various aspects of our lives, from health to wealth, energy use to brand engagement … and beyond!

Imagine a world where a personal device not only responds to what you say but understands how you feel, based on the tone of your voice, your energy levels as you speak and other vocal cues! It’s a reality that is almost within our reach.

Companies, armed with advanced analytics, are now capable of drawing invaluable insights from voice data in a way that extends far beyond the spoken word.

Spoken words as Biomarkers

Businesses that are accruing voice data won’t necessarily want to understand individual biomarkers. However, some of the larger advances in audio technology over the last few years have come from distributed learning, which focuses on analysing patterns and aggregating data into very high-quality central trends.

Provided they’ve loaded the right references into their machine learning and artificial intelligence systems, these patterns can provide incredibly useful information that can be leveraged in a variety of fascinating ways.
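To make the aggregation idea concrete, here is a minimal Python sketch. It assumes each device computes its own summary locally and shares only that summary, never the raw audio. The `DeviceSummary` type, the `mean_stress` score and the `aggregate_trend` function are hypothetical names chosen for illustration, and the weighted average is a deliberately simplified stand-in for real distributed learning, not a description of how any particular vendor does it.

```python
from dataclasses import dataclass


@dataclass
class DeviceSummary:
    """Hypothetical per-device summary; raw audio never leaves the device."""
    n_utterances: int   # how many utterances the device analysed locally
    mean_stress: float  # assumed on-device score between 0 and 1


def aggregate_trend(summaries):
    """Combine per-device scores into one central trend.

    Each device's score is weighted by its utterance count, so a device
    that heard more speech contributes proportionally more.
    """
    total = sum(s.n_utterances for s in summaries)
    if total == 0:
        raise ValueError("no utterances to aggregate")
    weighted = sum(s.mean_stress * s.n_utterances for s in summaries)
    return weighted / total


# Two hypothetical devices reporting only their local summaries
summaries = [DeviceSummary(10, 0.2), DeviceSummary(30, 0.6)]
trend = aggregate_trend(summaries)
print(f"Aggregated stress trend: {trend:.2f}")
```

The point of the design is that the central system only ever sees counts and averages, which is why a business can learn about the overall trend without needing to understand any one individual's voice.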

Imagine, for example, that Microsoft Teams, with OpenAI’s APIs plugged into it, could surface aggregated high-level trends about what is being talked about on calls, what your clients’ priorities are and how they actually feel about your business as a service provider, by reading cues in their voices, from cadence to tone, word choice to stress levels!

This kind of audio data tracking is already far more widespread than most people would ever think.

Whether you’re at home, in your car, or standing next to your smart fridge, your devices are listening – to an extent (not in the creepy, devious way… but they have to be able to hear you when you wake them up!).

And when you do wake them up to direct them toward a task, digital assistants like Alexa may well be the first entities you engage with that detect your emotional state, even before the people around you do.

While this may sound somewhat intimidating, awareness of these practices is vital. When we think about biometrics, we tend to think of facial recognition, fingerprint IDs and so on, but the sensor-rich devices around us are capable of detecting many more inputs than you might imagine.

Voice data and ethics

As consumers become more cognisant of the technology that is being employed around them, the demand for privacy and data control is likely to escalate.

However, as the understanding of privacy and data control increases, consumers may well see the benefit in agreeing to ambient data collection as it ushers in a wave of new personalised ecosystems, enabling individuals to have better control over their work life and home life and how these align with their mental and physical well-being.

There’s an undeniable power in these technologies that, when used responsibly, can benefit users and forward-thinking organisations alike.

Hence, it’s high time we turned our attention to the four key questions:

  • Who is listening?
  • What are they listening to?
  • Why are they listening?
  • What benefit can listening provide back?

These key questions provide a grounding to understand what the art of the possible is and why the big players are listening in.

Who is doing what?

Major manufacturers and service providers such as Google, Facebook, Amazon, Apple, and Samsung are the primary entities collecting and analysing your voice data.

The general consumer tends to feel suspicion and frustration about this (maybe even verging on anger), yet so many consumer apps ask for microphone access during installation and, at this moment in time, most people still don’t really understand what they’re opting into.

The reason the aforementioned businesses are so keenly interested in the data you generate is that they aim to use it for business expansion and growth. But it does raise the question: what exactly is on their listening radar, and why are they so desperate to hear from you?

Naturally, the focus is primarily driven by market and revenue opportunities. As advertising accounts for the largest chunk of Google’s revenue and a growing share of Amazon’s, it’s no surprise that marketing takes precedence in the application of voice analytics.

But outside of the primary revenue model of the internet, things are getting interesting…

We would propose an initial hierarchy for the application of voice analytics:

Marketing first, then general well-being, then health, and finally wealth.

This progression mirrors the potential of the data that can be collected.

It is easy to infer from voice data that someone is in the market to buy something… it is then slightly more difficult to infer if someone is feeling low and could benefit from mindfulness, device breaks etc… then the really nuanced stuff starts to occur. Can we tell if someone is on their way towards an actual health diagnosis from their voice metadata? And could they be heading towards a period of financial vulnerability because of a confluence between mental health and physical health issues?

You can see where we are headed.

Why voice data and why now?

The “why” in this equation might seem self-evident now.

Amassing comprehensive data sets on consumers is fundamental to most businesses’ future growth and competitive edge. The exact parameters that individual organisations listen for and measure will only become more intricate as they delve deeper.

While the notion of continuous voice data collection might seem daunting, the intention here isn’t to alarm. Instead, we aim to shed light on how the growth and application of voice biometrics highlight the relevance and vitality of biometric sensors in general for the world of intelligent digital products.



David Low
Managing Director - Studios