AI-powered Voice Recognition

A major global call center wanted to analyze the voices of its customers and agents in order to:

  • Determine customer tone and mood.
  • Improve customer experience.
  • Help agents avoid giving in to strong emotions.

What did we do?

  • Gramener built a prediction solution that analyzes a customer's voice and determines their tone.
  • To make the model more robust, we applied data augmentation and preprocessing, such as noise removal and changes to audio tempo and gain.
  • We designed a 9-layer neural network architecture consisting of a series of convolutional, max-pooling, dropout and fully connected layers.
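The tempo and gain changes mentioned above can be sketched with NumPy. This is a minimal illustration under our own assumptions, not Gramener's pipeline: audio is assumed to be a mono float array in [-1, 1], the function names and perturbation ranges are hypothetical, and noise removal is not shown. The tempo change here is plain resampling, which shifts pitch as well as speed; pitch-preserving time stretching is available in libraries such as librosa (`librosa.effects.time_stretch`).

```python
import numpy as np

def change_gain(signal: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale the amplitude by gain_db decibels and clip to the [-1, 1] range."""
    factor = 10.0 ** (gain_db / 20.0)
    return np.clip(signal * factor, -1.0, 1.0)

def change_tempo(signal: np.ndarray, rate: float) -> np.ndarray:
    """Play faster (rate > 1) or slower (rate < 1) by linear resampling.

    This naive approach changes pitch along with speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    new_len = int(round(len(signal) / rate))
    # Fractional positions in the original signal that each output sample reads
    positions = np.arange(new_len) * rate
    return np.interp(positions, np.arange(len(signal)), signal)

def augment(signal: np.ndarray, rng: np.random.Generator,
            gain_range_db=(-6.0, 6.0), rate_range=(0.9, 1.1)) -> np.ndarray:
    """Return one randomly perturbed copy of signal (hypothetical ranges)."""
    rate = rng.uniform(*rate_range)
    gain_db = rng.uniform(*gain_range_db)
    return change_gain(change_tempo(signal, rate), gain_db)

# Usage: expand one clip into several training variants
rng = np.random.default_rng(0)
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clip = 0.5 * np.sin(2 * np.pi * 220 * t)  # 1 s synthetic tone standing in for speech
variants = [augment(clip, rng) for _ in range(4)]
```

Each variant keeps the content of the original clip while differing in speed and loudness, which is the kind of variation a model sees across real callers and recording setups.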
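The exact ordering and sizes of the nine layers are not described, so the following NumPy forward pass is a hypothetical arrangement that only uses the four layer types named above: two convolution, max-pooling and dropout blocks, followed by two fully connected layers with dropout between them. The input shape (a 40 x 100 log-mel spectrogram), filter counts, dropout rates and the four output classes are all assumptions, and the weights are random, so the output probabilities mean nothing until the model is trained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_relu(x, kernels, bias):
    """Valid 2-D convolution (cross-correlation) followed by ReLU.
    x: (C_in, H, W), kernels: (C_out, C_in, kh, kw)."""
    kh, kw = kernels.shape[2:]
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C_in, H', W', kh, kw)
    out = np.einsum("chwij,ocij->ohw", windows, kernels) + bias[:, None, None]
    return np.maximum(out, 0.0)

def max_pool2d(x, size=2):
    c, h, w = x.shape
    h, w = h - h % size, w - w % size  # drop rows/cols that do not fill a window
    return x[:, :h, :w].reshape(c, h // size, size, w // size, size).max(axis=(2, 4))

def dropout(x, rate, rng, training):
    """Inverted dropout: zero random units while training, identity at inference."""
    if not training or rate == 0.0:
        return x
    return x * (rng.random(x.shape) >= rate) / (1.0 - rate)

def dense(x, weights, bias):
    return weights @ x + bias

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(rng, in_shape=(1, 40, 100), n_classes=4):
    """Random weights for the hypothetical layer sizes used below."""
    c, h, w = in_shape
    # Spatial size after conv(3x3) -> pool(2) -> conv(3x3) -> pool(2)
    h2, w2 = ((h - 2) // 2 - 2) // 2, ((w - 2) // 2 - 2) // 2
    return {
        "k1": rng.normal(0, 0.1, (8, c, 3, 3)), "b1": np.zeros(8),
        "k2": rng.normal(0, 0.1, (16, 8, 3, 3)), "b2": np.zeros(16),
        "w3": rng.normal(0, 0.01, (64, 16 * h2 * w2)), "b3": np.zeros(64),
        "w4": rng.normal(0, 0.1, (n_classes, 64)), "b4": np.zeros(n_classes),
    }

def forward(spec, p, rng, training=False, drop=0.25):
    x = conv2d_relu(spec, p["k1"], p["b1"])                  # 1 convolution
    x = max_pool2d(x)                                        # 2 max-pooling
    x = dropout(x, drop, rng, training)                      # 3 dropout
    x = conv2d_relu(x, p["k2"], p["b2"])                     # 4 convolution
    x = max_pool2d(x)                                        # 5 max-pooling
    x = dropout(x, drop, rng, training)                      # 6 dropout
    x = np.maximum(dense(x.ravel(), p["w3"], p["b3"]), 0.0)  # 7 fully connected + ReLU
    x = dropout(x, 0.5, rng, training)                       # 8 dropout
    return softmax(dense(x, p["w4"], p["b4"]))               # 9 fully connected + softmax

# Usage: one random "spectrogram" through the untrained network
rng = np.random.default_rng(0)
params = init_params(rng)
spec = rng.random((1, 40, 100))  # stand-in for a log-mel spectrogram
probs = forward(spec, params, rng)
```

In practice a framework such as Keras or PyTorch would handle training; the sketch only shows how the named layer types can compose into a 9-layer stack.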


The solution predicted customer tone and mood at 3-second intervals and displayed the results visually. This in-depth analysis of voice emotions helped agents improve the customer experience.
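A prediction every 3 seconds suggests the audio is scored in fixed windows. The sketch below, again an assumption rather than the documented design, splits a recording into non-overlapping 3-second chunks, zero-pads the last one, and returns a (start, end, label) timeline that could drive a chart. `placeholder_predict` is a hypothetical stand-in that only measures loudness; a trained emotion model would take its place.

```python
import numpy as np

def segment(signal, sample_rate, window_s=3.0):
    """Yield (start_seconds, chunk) for consecutive, non-overlapping windows.

    The last partial window is zero-padded so every chunk has the same length."""
    win = int(window_s * sample_rate)
    for start in range(0, len(signal), win):
        chunk = signal[start:start + win]
        if len(chunk) < win:
            chunk = np.pad(chunk, (0, win - len(chunk)))
        yield start / sample_rate, chunk

def mood_timeline(signal, sample_rate, predict, window_s=3.0):
    """Score each window with `predict` and return (start_s, end_s, label) rows."""
    return [(start, start + window_s, predict(chunk))
            for start, chunk in segment(signal, sample_rate, window_s)]

def placeholder_predict(chunk, threshold=0.3):
    """Hypothetical stand-in for a trained model: labels a window by loudness only."""
    rms = float(np.sqrt(np.mean(chunk ** 2)))
    return "high-energy" if rms > threshold else "low-energy"

# Usage: 7 s of synthetic audio, quiet for 3 s and then loud
sr = 8_000
t = np.arange(7 * sr) / sr
signal = np.where(t < 3.0, 0.1, 0.8) * np.sin(2 * np.pi * 200 * t)
timeline = mood_timeline(signal, sr, placeholder_predict)
for start, end, label in timeline:
    print(f"{start:4.1f}-{end:4.1f} s: {label}")
```

Because the final window is padded, its end time can run past the end of the recording; overlapping windows are another common choice when smoother updates are needed.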
