A voice is more than a sequence of sounds coming out of your mouth. Among many other things, it can represent the way you want to express yourself. A voice can also be used as a tool to complement a musical composition, or to put a complex argument into words in a way that is meaningful to the intended audience.
Voices are inherently complex, and even a quick analysis usually yields enough information to pinpoint, in most cases accurately, critical details such as the speaker’s sex (when the speaker cannot be seen) or their emotional state.
This kind of information has clear potential forensic value. Because of its complexity and its close relationship with its owner, a voice provides a forensic audio examiner with a wealth of information, which in turn allows the examiner to build a profile for use in a voice comparison case.
Comparing voices is challenging, and it is imperative for the forensic audio examiner to understand how the different components of a voice relate to and interact with one another in order to build a model that supports a meaningful comparison.
Voices are quite difficult to discriminate forensically, especially when the audio recordings being compared were made under problematic conditions. Foreground or background noise, as well as other sound artifacts, can degrade the quality of the voice sample obtained.
Perhaps the most common task in forensic speaker identification is to examine one or more samples of an unknown voice and compare them with one or more samples of a known voice.
Why voices are challenging to discriminate forensically
As I mentioned earlier, the recordings compared during a voice examination can be problematic because of poor sound quality. Many other factors can also influence the outcome of a voice comparison: disguise; deliberate falsification that changes or obscures the vocal formants and other speech characteristics; distortion; variation between samples, which may require obtaining additional voice samples; and any other differences that affect the spectral and aural characteristics under examination.
Unknown voice samples must be compared with one or more samples of a known voice. For simplicity, let’s call the unknown voice the “offender” and the known voice the “suspect”. In a court case, the prosecution and the defence need to know whether the two voice samples came from the same person. A positive identification means the “suspect” is the “offender”; an elimination means the “suspect” is not the “offender”.
As you can see, achieving the highest possible degree of accuracy when examining and comparing voice samples is imperative in order to reach a sound conclusion about recorded evidence.
The above image displays an American male pronouncing the word “light”. However, along with the word “light” there is also substantial background noise, as well as a hum (the straight horizontal white line in the middle of the screen).
The screenshot below displays the same word “light” but with the background noise minimised and the hum eliminated.
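The post does not say which tool was used to eliminate the hum. A steady hum shows up as a horizontal line because its frequency does not change over time, and one common way to remove such a component is a narrow notch filter. The sketch below is only an illustration of that technique, not the author’s method. It assumes the hum is 60 Hz mains hum (the US mains frequency) plus its first few harmonics; in a real recording you would read the actual hum frequency off the spectrogram, since the line in the screenshot may sit elsewhere. The notch coefficients follow Robert Bristow-Johnson’s Audio EQ Cookbook formulas, a synthetic 1 kHz tone stands in for speech, and the function names (`notch_coefficients`, `biquad`, `remove_hum`, `tone_amplitude`) are hypothetical. Broadband background noise needs different techniques and is not addressed here.

```python
import math

import numpy as np


def notch_coefficients(f0, fs, q):
    """Biquad notch centred on f0 Hz (RBJ Audio EQ Cookbook), normalised so a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)  # larger q -> narrower notch
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Filter x sample by sample with one second-order IIR section (direct form I)."""
    y = np.empty(len(x), dtype=float)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        xn = float(xn)
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def remove_hum(x, fs, hum_hz=60.0, harmonics=3, q=10.0):
    """Hypothetical helper: cascade notches at hum_hz and its harmonics (assumed 60 Hz)."""
    y = np.asarray(x, dtype=float)
    for k in range(1, harmonics + 1):
        f0 = k * hum_hz
        if f0 >= fs / 2:  # no notch at or above the Nyquist frequency
            break
        b, a = notch_coefficients(f0, fs, q)
        y = biquad(y, b, a)
    return y


def tone_amplitude(x, fs, f):
    """Amplitude of the component at f Hz; exact when the window holds whole cycles."""
    t = np.arange(len(x)) / fs
    return 2.0 * abs(np.sum(x * np.exp(-2j * math.pi * f * t))) / len(x)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs                            # one second of audio
    voice_stand_in = np.sin(2 * math.pi * 1000 * t)   # synthetic tone, not real speech
    hum = 0.5 * np.sin(2 * math.pi * 60 * t)          # assumed 60 Hz mains hum
    cleaned = remove_hum(voice_stand_in + hum, fs)
    tail = cleaned[fs // 2:]                          # skip the filter's start-up transient
    print("60 Hz amplitude after filtering:", tone_amplitude(tail, fs, 60))
    print("1 kHz amplitude after filtering:", tone_amplitude(tail, fs, 1000))
```

The `q` parameter trades notch width against how quickly the filter settles: a higher value removes less of the neighbouring spectrum, which matters when the hum sits close to frequencies carrying speech information, but it also takes longer for the start-up transient to die away.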