Scale-invariant modification of COSH distance for measuring speech signal distortions in real-time mode
Keywords: digital signal processing, speech signal, spectral analysis, interference protection, distance measures, Itakura-Saito divergence, COSH distance, Itakura distance, Kullback-Leibler divergence
This study considers a new measure of distortion of a speaker's speech sounds that is invariant to the gain of the speech signal in a communication channel. The properties of the measure are investigated in comparison with its closest analogues, and several theoretical properties are proved. The new measure is shown to combine the advantages of the symmetric Itakura distance with respect to the noise immunity of automatic speech processing, on the one hand, and of the COSH distance with respect to sensitivity to speech signal distortions, on the other. An experiment was set up and conducted using the authors' own software. Estimates of the dependence of the new measure on the signal-to-noise ratio are presented, and this dependence is shown to be close to linear on a logarithmic scale. The results are intended for use in developing new systems and technologies, and upgrading existing ones, for digital signal processing and speech quality analysis in the presence of noise.
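The abstract does not state the formula of the new measure, so the following Python sketch only illustrates the gain-sensitivity problem it addresses. It uses the standard COSH distance (the symmetrized Itakura-Saito divergence) on sampled power spectra and, as an assumed stand-in rather than the authors' definition, a variant in which the COSH distance is minimized over an unknown channel gain. The function names and the sample spectra are hypothetical.

```python
import math


def itakura_saito(p, q):
    """Discrete Itakura-Saito divergence between two positive power spectra."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, q)) / len(p)


def cosh_distance(p, q):
    """COSH distance: mean of the two directed Itakura-Saito divergences."""
    return 0.5 * (itakura_saito(p, q) + itakura_saito(q, p))


def gain_optimized_cosh(p, q):
    """Illustrative gain-invariant variant (an assumption, not the paper's formula).

    The log terms cancel in the COSH distance, so
        cosh_distance(p, g*q) = 0.5 * (a / g + g * b) - 1,
    where a = mean(p/q) and b = mean(q/p). Minimizing over g > 0 gives
    g = sqrt(a / b) and the minimum value sqrt(a * b) - 1.
    """
    n = len(p)
    a = sum(x / y for x, y in zip(p, q)) / n  # mean of p/q
    b = sum(y / x for x, y in zip(p, q)) / n  # mean of q/p
    return math.sqrt(a * b) - 1.0


# Hypothetical power spectra sampled at four frequencies.
p = [1.0, 2.0, 4.0, 2.0]
q = [1.5, 1.5, 3.0, 2.5]
q_amplified = [10.0 * x for x in q]  # same spectral shape, channel gain x10

print("COSH, original gain:      ", cosh_distance(p, q))
print("COSH, gain x10:           ", cosh_distance(p, q_amplified))
print("gain-optimized, original: ", gain_optimized_cosh(p, q))
print("gain-optimized, gain x10: ", gain_optimized_cosh(p, q_amplified))
```

In this sketch the plain COSH distance grows when only the gain of one spectrum changes, while the gain-optimized value stays the same, which is the kind of invariance the abstract describes.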