Minimum of information divergence criterion for signals with tuning to speaker voice in automatic speech recognition

DOI:

https://doi.org/10.3103/S0735272720010045

Keywords:

random signal, digital signal processing, speech signal, automatic speech processing, voice techniques, noise immunity

Abstract

The problem of automatic speech recognition is considered at the basic, phonetic level of speech signal processing, with a focus on increasing noise immunity. To solve this problem, a criterion of minimum information divergence of signals is proposed, with tuning to the speaker's voice and automatic scaling of the speech template to the fine structure of the observed (current) speech frame. An example of its practical implementation is considered, and its efficiency characteristics are studied. Using the author's software, an experiment is carried out and qualitative estimates are obtained of the threshold signal gain achieved by applying the proposed criterion. It is shown that this gain can reach 10 dB or more under certain conditions. The results and conclusions are intended for the development and modernization of existing systems and techniques for automatic speech processing and recognition that must operate under intense noise.
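
The abstract does not state the criterion's formula, so the following is only a rough illustration under explicit assumptions, not the author's algorithm. It assumes that the "information divergence" is the Itakura-Saito divergence between power spectra, that "automatic scaling of the speech template" means multiplying the template by the gain g that minimizes that divergence, and that "tuning to the speaker's voice" means the templates were estimated from the same speaker. The names itakura_saito, scaled_divergence and recognize, as well as the toy spectra, are hypothetical. Minimizing the mean of p/(gq) - log(p/(gq)) - 1 over g gives g* = mean(p/q) and a minimum of log(mean(p/q)) - mean(log(p/q)).

```python
import math
from typing import Dict, List, Tuple


def itakura_saito(p: List[float], q: List[float]) -> float:
    """Mean Itakura-Saito divergence d(p || q) between two power spectra
    sampled on the same frequency grid (all values must be positive)."""
    return sum(pi / qi - math.log(pi / qi) - 1.0 for pi, qi in zip(p, q)) / len(p)


def scaled_divergence(p: List[float], q: List[float]) -> Tuple[float, float]:
    """Minimise d(p || g*q) over the template gain g > 0 (assumed scaling step).

    Setting the derivative with respect to g to zero gives g* = mean(p/q);
    the minimum, log(mean(p/q)) - mean(log(p/q)), is >= 0 by Jensen's inequality.
    """
    ratios = [pi / qi for pi, qi in zip(p, q)]
    g = sum(ratios) / len(ratios)
    d = math.log(g) - sum(math.log(r) for r in ratios) / len(ratios)
    return d, g


def recognize(frame: List[float], templates: Dict[str, List[float]]) -> str:
    """Hypothetical decision rule: pick the speaker-specific template whose
    gain-scaled divergence from the observed frame spectrum is smallest."""
    return min(templates, key=lambda name: scaled_divergence(frame, templates[name])[0])


if __name__ == "__main__":
    # Toy spectra (illustrative values only, not data from the paper).
    templates = {"a": [4.0, 2.0, 1.0, 0.5], "i": [1.0, 3.0, 0.5, 2.0]}
    frame = [8.0, 4.0, 2.0, 1.0]  # template "a" at twice its level
    print(recognize(frame, templates))  # a
```

In this toy case the unscaled divergence between the frame and template "a" is positive (about 0.31) purely because of the level difference, while the gain-scaled divergence is zero. This shows why scaling the template before comparison can make the decision insensitive to overall loudness.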


Published

2020-01-24

Section

Research Articles