Voice activity detection algorithm using spectral-correlation and wavelet-packet transformation

O. Korniienko; E. A. Machusky

doi:10.3103/S0735272718050011

Authors

O. Korniienko National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine
E. A. Machusky National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine

DOI:

https://doi.org/10.3103/S0735272718050011

Keywords:

voice activity detection, correlation analysis, wavelet-packet analysis, critical band, wavelet-packet cepstral coefficients

Abstract

A voice activity detection algorithm based on a noise classification technique is developed. Spectral-correlation and wavelet-packet (WP) features of frames are proposed for voice activity estimation. Three WP trees are tested for effective representation of audio segments: a mel-scaled wavelet-packet tree, a Bark-scaled wavelet-packet tree, and an ERB-scaled (equivalent rectangular bandwidth) wavelet-packet tree. Using only two principal components of the WP features is sufficient to classify the environmental noise accurately. Designing the wavelet-packet tree according to the equivalent rectangular bandwidth concept for acoustic feature extraction increases the voice/silence segment classification accuracy by at least 4% compared with other classification-based voice activity detection algorithms under different noise types.
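The reduction of WP features to two principal components can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_to_two_components`, the 24-dimensional feature size, and the random placeholder data are assumptions made only for illustration, and the abstract does not specify how the features or the classifier are actually computed.

```python
import numpy as np


def project_to_two_components(features):
    """Project row-wise feature vectors onto their first two principal components.

    Hypothetical helper: `features` is an (n_frames, n_features) array of
    per-frame feature values (for example, WP-based features).
    """
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:2].T
    # Fraction of the total variance captured by each of the two components.
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return scores, explained


# Placeholder input: 200 frames x 24 feature values drawn at random.
# This is NOT real speech or noise data; it only exercises the function.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(200, 24))
demo_scores, demo_explained = project_to_two_components(demo_features)
print(demo_scores.shape)
```

In practice the two-dimensional scores would be passed to a noise classifier; the choice of classifier and of the features fed into the projection is left open here.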

