Robust Speech Recognition Using an Asymmetric Nonlinear Filter and Delta-Spectral Features
Subject areas: Electrical and Computer Engineering
Hassan Farsi 1 *, Samaneh Kouhi Moghaddam 2
1 - University of Birjand
2 - Payame Noor University, Mashhad Branch
Keywords: speech recognition, power-normalized cepstral coefficients, asymmetric nonlinear filter, delta-cepstral features
Abstract:
This paper proposes a noise-robust feature extraction algorithm. The algorithm extracts features using a nonlinear filter together with temporal masking, and it replaces delta-cepstral features with delta-spectral features, which improves speech recognition accuracy. Almost all current automatic speech recognition (ASR) systems use delta-cepstral and delta-delta features for speech feature extraction. The aim of this paper is to obtain robust features that yield more accurate recognition under a range of noise conditions. To this end, the method focuses on key characteristics of speech, in particular its non-stationary characteristics, that differ markedly from those of noise signals. Experimental results show that recognition accuracy improves over conventional MFCC and PLP features in the presence of several types of noise.
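The difference between delta-cepstral and delta-spectral features comes down to where the time derivative is taken. The sketch below is a minimal illustration and not the paper's algorithm: it uses the common regression formula for delta features and applies it once to cepstral trajectories and once to spectral (filter-bank power) trajectories. The helper names `delta` and `dct2`, the window width of 2, and the toy `powers` matrix are all assumptions made for illustration. The nonlinear filter and temporal masking stages are omitted because the abstract does not specify them.

```python
import math

def delta(frames, width=2):
    """Regression-based time derivative of a feature sequence.

    frames: list of equal-length feature vectors, one per time frame.
    Edge frames are repeated so the output has the same number of frames.
    """
    num_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = [0.0] * len(frames[t])
        for n in range(1, width + 1):
            ahead = frames[min(t + n, num_frames - 1)]
            behind = frames[max(t - n, 0)]
            for k in range(len(row)):
                row[k] += n * (ahead[k] - behind[k])
        out.append([v / denom for v in row])
    return out

def dct2(vector, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    size = len(vector)
    return [
        sum(v * math.cos(math.pi * q * (i + 0.5) / size)
            for i, v in enumerate(vector))
        for q in range(num_coeffs)
    ]

# Toy filter-bank powers (4 frames x 3 channels), invented for illustration.
powers = [[1.0, 2.0, 4.0], [2.0, 2.0, 3.0], [4.0, 3.0, 2.0], [8.0, 5.0, 1.0]]

# Delta-cepstral ordering: log -> DCT -> delta over time.
cepstra = [dct2([math.log(p) for p in frame], 2) for frame in powers]
delta_cepstral = delta(cepstra)

# Delta-spectral ordering: delta over time on the spectral powers -> DCT.
delta_spectral = [dct2(frame, 2) for frame in delta(powers)]
```

Because the logarithm sits between the two linear operations in the cepstral path, the two orderings generally produce different feature vectors. A complete front end would add normalization and nonlinearity stages that this sketch leaves out.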