Improvement of GMM Model Using PSK for Spoken Language Recognition Systems

Subject Areas: Electrical and Computer Engineering

F. Ghasemian ¹*, M. M. Homayounpour ²


Received: 2015-11-28; Accepted: 2015-11-28; Published: 2011-09-21

Keywords: language recognition, sequence kernel, PSK, support vector machine (SVM), Gaussian mixture model (GMM)

Abstract:

The Gaussian mixture model (GMM) is a simple and effective method for statistical modeling of the feature space and is widely used in spoken language recognition systems; its parameters are trained with the EM algorithm. In this paper, considering the weaknesses of GMMs, a new model named PAW-GMM is proposed. In this model, the weight of each GMM component is determined by that component's power to discriminate one language from the others. Because PAW-GMM takes the discriminative power of the components into account, it can increase the accuracy of language recognition systems. In addition, GMM-PSK-SVM, one of the best GMM-based models, suffers from high complexity, especially when the number of languages is large. UBM-PSK-SVM is therefore proposed, which achieves the same accuracy as GMM-PSK-SVM at lower complexity. Experiments on four languages of the OGI corpus show the effectiveness of the proposed techniques.
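
For orientation, the conventional GMM baseline that the abstract builds on (one GMM per language, trained with EM, with an utterance assigned to the language whose model gives the highest log-likelihood) can be sketched as follows. This is only an illustrative sketch under stated assumptions: it is not PAW-GMM, GMM-PSK-SVM, or UBM-PSK-SVM, whose details are not given here. The one-dimensional synthetic "features", the two toy languages "A" and "B", and all function names are hypothetical.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def component_logs(x, gmm):
    """Per-component log(weight * density) for one frame."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]

def train_gmm(frames, k=2, iters=30):
    """Fit a k-component 1-D GMM with EM; returns a list of (weight, mean, var)."""
    n = len(frames)
    ordered = sorted(frames)
    mean_all = sum(frames) / n
    var_all = sum((x - mean_all) ** 2 for x in frames) / n
    # Initialization (an assumption): means at data quantiles, shared variance.
    gmm = [(1.0 / k, ordered[(2 * j + 1) * n // (2 * k)], var_all) for j in range(k)]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(x, gmm)
            z = logsumexp(logs)
            resp.append([math.exp(l - z) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        updated = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10  # guard against empty components
            mean_j = sum(r[j] * x for r, x in zip(resp, frames)) / nj
            var_j = sum(r[j] * (x - mean_j) ** 2 for r, x in zip(resp, frames)) / nj
            updated.append((nj / (n + k * 1e-10), mean_j, max(var_j, 1e-3)))
        gmm = updated
    return gmm

def utterance_loglik(frames, gmm):
    """Total log-likelihood of an utterance (frames treated as independent)."""
    return sum(logsumexp(component_logs(x, gmm)) for x in frames)

def classify(frames, models):
    """Pick the language whose GMM scores the utterance highest."""
    return max(models, key=lambda lang: utterance_loglik(frames, models[lang]))

def sample(rng, centers, n):
    """Draw n synthetic frames from an equal-weight mixture with unit variance."""
    return [rng.gauss(rng.choice(centers), 1.0) for _ in range(n)]

# Toy data: two hypothetical languages with different feature distributions.
rng = random.Random(0)
CENTERS = {"A": [-2.0, 2.0], "B": [0.0, 5.0]}
models = {lang: train_gmm(sample(rng, c, 400)) for lang, c in CENTERS.items()}
test_a = sample(rng, CENTERS["A"], 50)
test_b = sample(rng, CENTERS["B"], 50)
print(classify(test_a, models), classify(test_b, models))
```

In this baseline the component weights come purely from EM, so they reflect how often a component is used rather than how well it separates languages. That is the property the abstract says PAW-GMM changes.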

