Evaluation of potential efficiency of speech coding using different parameters of linear prediction

DOI:

https://doi.org/10.3103/S0735272720090010

Keywords:

linear prediction of speech, spectral envelope of speech, efficient coding, line spectrum representation, linear prediction coefficients, reflection coefficients, log area ratio, cepstral coefficients, line spectral pairs/projections, line spectral frequencies, vector quantization

Abstract

The paper presents estimates of the potential coding efficiency of the spectrum envelope waveform (SEW) of speech signals (SS) obtained with the linear prediction (LP) method for different sets of alternative equivalent parameters (AEP), including line spectral pairs/projections (LSP), line spectral frequencies (LSF), and alternative line spectral parameters of highest splitting (LSP-HS and LSF-HS). The estimates are derived with a proposed approach in which the LP method is applied to SS SEW coding with maximum frame overlap during analysis. The approach treats such a coding scheme as an approximation of an appropriate analog vector source in each AEP space, designs a vector codebook in each space stepwise with a gradual increase in its size, and employs an ideal vector quantization scheme with exhaustive search at each stage. Distortion versus rate relationships are calculated from the results of the analysis in each AEP space, and a generalized function is proposed for approximating these relationships. A technique is presented that makes it possible to estimate, in each AEP space, Shannon’s lower bound, the variance of the Gaussian equivalent source, the differential entropy, the redundancy, the values of the weighting constant in the generalized entropy formula, and other entropy characteristics of coding the equivalent sources (parameters of SS SEW). Efficiency indicators of real and potential coding of the corresponding AEP are proposed and calculated. Judged by the combination of the proposed efficiency indicators, the best results are demonstrated by the spaces of line spectral parameters of highest splitting (LSP-HS and LSF-HS).
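The stepwise codebook design with exhaustive search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so the code below assumes a Linde-Buzo-Gray style splitting scheme with a squared-error distortion measure, and it uses synthetic Gaussian vectors in place of real AEP vectors. The names `nearest`, `grow_codebook`, and `gaussian_distortion_rate` are hypothetical and chosen for illustration only.

```python
import numpy as np


def nearest(codebook, data):
    """Exhaustive search: closest codeword index and squared error per vector."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def grow_codebook(data, max_bits, iters=20, eps=1e-3):
    """Double the codebook size at each stage (assumed LBG-style splitting).

    Returns the final codebook and a list of (bits per vector,
    mean squared distortion per dimension) pairs, one per stage.
    """
    dim = data.shape[1]
    codebook = data.mean(axis=0, keepdims=True)  # stage 0: one codeword
    _, err = nearest(codebook, data)
    curve = [(0, err.mean() / dim)]
    for bits in range(1, max_bits + 1):
        # Split every codeword into two slightly perturbed copies.
        codebook = np.concatenate([codebook + eps, codebook - eps])
        for _ in range(iters):
            idx, _err = nearest(codebook, data)
            for j in range(len(codebook)):
                members = data[idx == j]
                if len(members):  # keep the old codeword if the cell is empty
                    codebook[j] = members.mean(axis=0)
        _, err = nearest(codebook, data)
        curve.append((bits, err.mean() / dim))
    return codebook, curve


def gaussian_distortion_rate(variance, bits_per_sample):
    """D(R) = variance * 2^(-2R) for a memoryless Gaussian source under MSE."""
    return variance * 2.0 ** (-2.0 * bits_per_sample)


# Synthetic stand-in for a set of parameter vectors (assumption, not speech data).
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 2))
codebook, curve = grow_codebook(data, max_bits=4)
for bits, dist in curve:
    ref = gaussian_distortion_rate(data.var(), bits / data.shape[1])
    print(f"{bits} bits/vector: VQ distortion {dist:.4f}, Gaussian D(R) {ref:.4f}")
```

Each stage yields one point of a distortion versus rate curve, which can then be compared with a Gaussian reference at the same rate. Applying the same loop to vectors from each parameter space would be one way to obtain per-space curves of the kind the abstract describes.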

Published

2020-09-21

Section

Research Articles