Speaker Recognition using Voice biometrics

In the field of enterprise mobility, Uniphore stands out for its guarantee to provide fully secure solutions. One way we accomplish this is through Voice Biometric technology, which authenticates a user's identity by validating their unique voiceprint. Using nothing more than a mobile phone, users rely on this technology to securely access and enter personal information and to carry out transactions.

To ensure that we consistently offer the most sophisticated Voice Biometric-based solutions, Uniphore has partnered with several global leaders in the field of speech technology. For our commercial deployments of Voice Biometrics, we work with Nuance, the leading provider of speech and imaging solutions around the world. To conduct proactive R&D on speech technology, we’ve teamed up with IIT-Madras. This latter partnership has enabled us to create critical Intellectual Property in Voice Biometrics and to develop commercially ready solutions that are used today for a variety of applications, including financial transactions at the bottom of the pyramid.

In this post, we want to give you an inside view of some of the R&D work we’re doing on Voice Biometrics, and the insights we’ve gained in the process. This post was written by Swetha Bharathi, one of our interns who is currently pursuing her M.S. in Speech Technology at IIT-M.



Voice Biometrics is a ground-breaking technology, and one that is relatively new to the market. One of the key focus areas of our R&D team is therefore to develop standardized recommendations and technological designs that buyers and regulatory authorities can use as common evaluation criteria. Over time, these criteria will go on to define standards in the global Voice Biometrics industry, enabling large-scale adoption and open access for developers.

To use the voice biometric system, every user must first enroll by registering their ‘voiceprint’, a combination of pitch, frequency, tone and other characteristics that is as unique to them as their fingerprint or iris scan. Combined with the user’s private passphrase, the voiceprint is used thereafter to verify their identity and give them access to Uniphore’s application.

However, users are sometimes denied access to an account that is legitimately theirs, an error known as a ‘false negative’ or false rejection. Though these mistakes don’t happen often, we are working toward a False Rejection Rate (the share of legitimate verification attempts that are denied) of less than 3%. We aim to reach this goal by fine-tuning the core technology, the application design, the User Interface, and the various elements of adoption optimization.
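As a rough illustration of the metric, here is a minimal Python sketch of how a False Rejection Rate could be computed from the scores of genuine verification attempts. The function name, the score scale, the threshold and the sample scores are all hypothetical and chosen for illustration; they are not taken from our production system.

```python
def false_rejection_rate(genuine_scores, threshold):
    """Return the fraction of genuine attempts scoring below the threshold.

    Assumption for this sketch: a higher score means a closer match, and an
    attempt is accepted when its score is at or above the threshold.
    """
    if not genuine_scores:
        raise ValueError("need at least one genuine attempt")
    rejected = sum(1 for score in genuine_scores if score < threshold)
    return rejected / len(genuine_scores)


# Made-up scores from ten attempts by legitimate users (illustration only).
sample_scores = [0.91, 0.84, 0.42, 0.77, 0.95, 0.88, 0.69, 0.93, 0.81, 0.90]
frr = false_rejection_rate(sample_scores, threshold=0.5)
print(f"FRR = {frr:.1%}")  # prints "FRR = 10.0%"
```

Raising the threshold makes the system stricter and pushes this number up, which is why the rate has to be tuned together with security requirements rather than on its own.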

For example, one variable that we are currently testing is handset variation. In practice, we found that some users who enroll with one handset model but attempt to verify their identity with a different model have trouble authenticating. We conducted several experiments in which we enrolled different users on various handset models and monitored the performance of each voiceprint. The data we gathered led us to conclude that the effect of handset variability is caused by the acoustic characteristics and speech transformations imposed by different handsets. (Contrary to popular belief, handset manufacturers do not all apply the same acoustic coding to voice.) The variations we observed helped us tune the configuration parameters and the verification score, allowing interoperability among handsets without compromising security. We are also developing an adaptation algorithm that will automatically adapt the voiceprint to various handsets, improving the quality of verification even further.
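One simple way to look at data from an experiment like this is to break the false rejections down by enrollment handset and verification handset. The Python sketch below is an assumed analysis for illustration, not the procedure we actually used; the handset labels, scores and threshold are invented.

```python
from collections import defaultdict


def frr_by_handset_pair(attempts, threshold):
    """Group genuine attempts by (enroll handset, verify handset).

    `attempts` is an iterable of (enroll_handset, verify_handset, score)
    tuples. Returns a dict mapping each pair to its false rejection rate.
    """
    totals = defaultdict(int)
    rejects = defaultdict(int)
    for enroll_handset, verify_handset, score in attempts:
        pair = (enroll_handset, verify_handset)
        totals[pair] += 1
        if score < threshold:
            rejects[pair] += 1
    return {pair: rejects[pair] / totals[pair] for pair in totals}


# Invented data: same-handset attempts score higher than cross-handset ones.
sample_attempts = [
    ("model_A", "model_A", 0.90),
    ("model_A", "model_A", 0.80),
    ("model_A", "model_B", 0.40),
    ("model_A", "model_B", 0.70),
    ("model_B", "model_A", 0.30),
    ("model_B", "model_A", 0.45),
]
for pair, rate in sorted(frr_by_handset_pair(sample_attempts, 0.5).items()):
    print(pair, f"{rate:.0%}")
```

A table like this makes it easy to see whether rejections cluster on mismatched pairs, which is the pattern that would point to handset variability rather than to the speaker or the passphrase.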

Another factor that plays a role in the success of user authentication is the length of the passphrase. By running controlled experiments on passphrases of different durations, we determined that a passphrase with a minimum length of 1.5 seconds, at least 0.8 seconds of spoken input, and at least 5 syllables drastically increases verification performance. This study also led us to discover an entirely new parameter of Voice Biometrics, which we have labeled ‘Variance’. We can’t go into the details, but this concept has become our ‘secret sauce’ and is truly defining how to choose effective passphrases for verification. In fact, by analyzing thousands of speech samples collected on our platform, we were able to derive a proprietary mechanism for measuring Variance and use it to achieve a False Rejection Rate of less than 3% in lab conditions.
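The three minimums above are easy to express as a simple check. The Python sketch below is illustrative only: the class and function names are hypothetical, it assumes the duration, speech time and syllable count have already been measured by some other component, and it reads "spoken input" as the portion of the recording that contains actual speech. It does not cover Variance, which remains proprietary.

```python
from dataclasses import dataclass

# Minimums reported in the study described above.
MIN_TOTAL_SECONDS = 1.5
MIN_SPEECH_SECONDS = 0.8
MIN_SYLLABLES = 5


@dataclass
class PassphraseStats:
    """Measurements for one recorded passphrase (hypothetical structure)."""
    total_seconds: float   # length of the whole recording
    speech_seconds: float  # portion of the recording containing speech
    syllables: int         # number of syllables in the passphrase


def meets_minimums(stats: PassphraseStats) -> bool:
    """Return True if the passphrase satisfies all three minimums."""
    return (
        stats.total_seconds >= MIN_TOTAL_SECONDS
        and stats.speech_seconds >= MIN_SPEECH_SECONDS
        and stats.syllables >= MIN_SYLLABLES
    )


print(meets_minimums(PassphraseStats(2.1, 1.3, 7)))  # prints "True"
print(meets_minimums(PassphraseStats(1.2, 0.9, 6)))  # prints "False" (too short)
```

A check like this can only screen out passphrases that are clearly too short; it says nothing about which of the remaining candidates will verify best.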

Both the type of handset and the length of the passphrase are important factors in designing optimal passphrases for Speaker Verification systems. Based on our research findings, we are creating a carefully selected list of usable passphrases, and we are already noticing a perceptible improvement in performance. Going forward, we will focus on mathematically modeling how these parameters together affect performance, using multivariate analysis. As we develop new insights into this exciting technology, we’ll keep you posted through more blog posts and white papers.

About Uniphore: Uniphore Technologies Inc is the leader in multilingual speech-based software solutions. Uniphore’s solutions allow any machine to understand and respond to natural human speech, enabling people to use the most natural mode of communication, speech, to engage and instruct machines. Uniphore operates from its corporate headquarters at IIT Madras Research Park, Chennai, India, and has sales offices in the Middle East (Dubai, UAE) and in Manila, Philippines.

