Blind Two-Channel Speech Source Separation Based on Localization
Subject Areas: Electrical and Computer Engineering
Hassan Alisufi 1, M. Khademi 2 *, Abbas Ebrahimi Moghadam 3
1 - Ferdowsi University of Mashhad
2 - Ferdowsi University of Mashhad
3 - Ferdowsi University of Mashhad
Keywords: angular spectrogram, generalized cross-correlation, blind speech separation
Abstract:
This paper presents a new method for blind two-channel speech source separation that requires no prior knowledge of the speech sources. The proposed method separates the sources by weighting the spectrum of the mixture signal according to the locations of the speech sources relative to the microphones. First, an angular spectrogram is formed using the generalized cross-correlation function, and the speech sources in the mixture signal are localized. Then, according to the source locations, the magnitude spectrum of the mixture signal is weighted. By multiplying the weighted spectrum by the values obtained from the angular spectrogram, a binary mask is constructed for each source; applying each binary mask to the magnitude spectrum of the mixture separates the speech sources. The method is evaluated on the SiSEC database using the measurement tools and criteria provided with that database. The results show that the proposed method is comparable to competing methods in terms of these criteria while having lower computational complexity.
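The localization step described in the abstract rests on the generalized cross-correlation function. The sketch below, which is illustrative and not the authors' implementation, estimates the time difference of arrival (TDOA) between the two channels using the common PHAT (phase transform) weighting; the function name `gcc_phat`, the interpolation factor, and the `max_tau` bound are assumptions introduced here for clarity.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the TDOA of `sig` relative to `ref` via GCC-PHAT.

    A positive result means `sig` lags (arrives later than) `ref`.
    Returns the delay in seconds and the cross-correlation curve.
    """
    n = len(sig) + len(ref)                      # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                       # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:                      # restrict search to physical delays
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # reorder so lags run from -max_shift .. +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift    # lag (in samples) of the peak
    return shift / float(interp * fs), cc
```

Scanning this correlation over a grid of candidate angles (or delays) for each time frame yields the angular spectrogram from which, in the proposed method, the per-source binary masks are then derived.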