Method for reduction of speech signal autoregression model for speech transmission systems on low-speed communication channels

V. V. Savchenko

doi:10.3103/S0735272721110030

Authors

V. V. Savchenko, Linguistic University of Nizhny Novgorod, Nizhny Novgorod, Russian Federation, ORCID: https://orcid.org/0000-0003-3045-3337

Keywords:

digital signal processing, speech signal, low-speed communication channel, digital spectral analysis, power spectral density, all-pole model, CELP algorithms

Abstract

This paper considers the problem of reducing the order p >> 1 of an autoregressive model (AR-model) of a speech signal under the criterion of minimum loss of useful information. The problem is formulated as an optimization problem in terms of discrete spectral modeling. The main difficulty in solving it is the need to rescale the AR-model parameters for the modeled signal at each step of the iterative computation. To overcome this difficulty, a measure of information divergence between signals in the frequency domain that has the property of scale invariance is proposed as the objective functional. On this basis, a new AR-model reduction method is developed in which the scaling operation is moved outside the iterative optimization procedure. The effectiveness of the proposed method is substantiated theoretically and studied experimentally. It is shown that the main component of the achieved effect is a gain in the accuracy of the reduced AR-model in the Kullback–Leibler information metric. The results are intended for researchers and developers of systems and technologies for digital speech transmission over low-speed communication channels.
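The abstract does not state the divergence measure or the iterative procedure explicitly, so the following is only an illustrative sketch under stated assumptions, not the author's method. It uses the gain-optimized Itakura-Saito divergence as a stand-in scale-invariant measure, and classical autocorrelation matching (Levinson-Durbin recursion) as a baseline way to obtain a reduced-order all-pole model. All function names, the test poles, and the grid size `n_fft` are hypothetical choices made for the example.

```python
import numpy as np

def ar_psd(a, n_fft=1024):
    """Unit-gain all-pole PSD 1/|A(e^{jw})|^2 on n_fft points of the unit circle.
    a = [1, a_1, ..., a_p] are the AR (prediction-error filter) coefficients."""
    return 1.0 / np.abs(np.fft.fft(a, n_fft)) ** 2

def scale_invariant_divergence(p, q):
    """Assumed stand-in measure: min over c > 0 of the Itakura-Saito divergence
    D(p || c*q). The optimum is c = mean(p/q), which gives
    log(mean(p/q)) - mean(log(p/q)); rescaling p or q leaves it unchanged."""
    ratio = p / q
    return np.log(np.mean(ratio)) - np.mean(np.log(ratio))

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for autocorrelation r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for step m.
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        # Order update: a_i <- a_i + k * a_{m-i}, and a_m <- k.
        a[1:m + 1] = a[1:m + 1] + k * np.concatenate((a[m - 1:0:-1], [1.0]))
        err *= 1.0 - k * k
    return a, err

def reduce_ar(a_high, order, n_fft=1024):
    """Baseline reduction (an assumption, not the paper's procedure):
    match the first `order` autocorrelation lags of the high-order model."""
    r = np.fft.ifft(ar_psd(a_high, n_fft)).real  # autocorrelation from the PSD
    return levinson_durbin(r, order)

# Hypothetical order-8 model built from four stable conjugate pole pairs.
half = np.array([0.95, 0.9, 0.85, 0.8]) * np.exp(1j * np.array([0.3, 0.9, 1.6, 2.4]))
a_high = np.real(np.poly(np.concatenate([half, half.conj()])))
p_high = ar_psd(a_high)

for order in (2, 4, 6, 8):
    a_low, gain = reduce_ar(a_high, order)
    d = scale_invariant_divergence(p_high, ar_psd(a_low))
    print(f"order {order}: divergence {d:.6f}")
```

Because the measure in this sketch does not depend on the gain of either spectrum, the reduced model can be compared with the original using unit-gain spectra, and the gain (here the optimum `c = mean(p/q)`) can be fixed once after the model coefficients are chosen. This mirrors, in a simplified way, the idea of keeping the scaling step outside the optimization loop.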
