Speech Coding Using Nonlinear Prediction Based on Volterra Series Expansion

Subject area: Electrical and Computer Engineering

1 - Shahid Beheshti University
2 - Shahid Beheshti University

Received: 1384/07/10   Accepted: 1385/03/12   Published: 1386/01/01

Keywords: Volterra series expansion, nonlinear prediction, backward adaptive prediction, forward adaptive prediction, speech coding, least squares, least mean squares

Abstract:

In recent years, growing attention has been paid to nonlinear prediction models and techniques in speech coding as a way to further reduce the bit rate and, hence, the bandwidth. Neural networks are usually used for this purpose, yielding up to 3 dB of additional reduction in the excitation signal energy. Nonlinear prediction can also be based on the Volterra series expansion, which, for simplicity, is usually truncated to its first- and second-order terms (quadratic prediction). Early studies showed that, compared with neural networks, Volterra filters yield a much larger reduction in excitation signal energy (6 to 10 dB). However, because of instability, this reduction cannot be turned into a lower bit rate or a higher signal-to-noise ratio. The instability arises in the decoder from computational errors (for example, those caused by quantizing the excitation signal) combined with the high sensitivity of the computations to these errors. In the original work presented here, instability in the codec is examined for both forward and backward prediction, using least-squares (LS) and least-mean-squares (LMS) minimization of the error signal, respectively. It is shown that stability is achieved at the cost of sacrificing most of the saving in excitation signal energy, so that the final reduction is often similar to that of neural networks. With forward prediction, after stabilization and despite a slight increase in computational complexity, adding the quadratic term is beneficial in 20 to 45% of frames. Accordingly, an algorithm is developed that applies nonlinear prediction only to these frames; it improves the final signal-to-noise ratio by up to 4 dB. Sequential backward nonlinear prediction, although far more convenient to implement, does not outperform linear prediction.
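The forward scheme summarized above can be illustrated with a minimal Python sketch, under stated assumptions. It uses the standard truncated (quadratic) Volterra predictor $\hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i) + \sum_{i=1}^{q}\sum_{j=i}^{q} b_{ij}\, x(n-i)\, x(n-j)$, fits each frame's coefficients by least squares, and keeps the quadratic term for a frame only when it lowers the residual (excitation) energy by a chosen margin. The orders `p` and `q`, the tiny `ridge` term, and the `min_gain_db` selection rule are hypothetical choices, not the authors' settings or their stabilization procedure; quantization of coefficients and excitation is omitted.

```python
import math
from itertools import combinations_with_replacement


def regressors(x, n, p, q):
    """Regressors for predicting x[n]: linear terms x[n-i], 1 <= i <= p, plus
    products x[n-i]*x[n-j], 1 <= i <= j <= q (q = 0 gives a linear predictor)."""
    lin = [x[n - i] for i in range(1, p + 1)]
    quad = [x[n - i] * x[n - j]
            for i, j in combinations_with_replacement(range(1, q + 1), 2)]
    return lin + quad


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[k]] for k, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * m
    for c in range(m - 1, -1, -1):
        w[c] = (M[c][m] - sum(M[c][k] * w[k] for k in range(c + 1, m))) / M[c][c]
    return w


def ls_fit(x, start, end, p, q, ridge=1e-6):
    """Forward (block) LS over the frame x[start:end]: solve (R + ridge*I) w = r.
    The tiny ridge only guards against a singular R (hypothetical, not the
    paper's stabilization). Requires start >= max(p, q)."""
    ns = range(start, end)
    rows = [regressors(x, n, p, q) for n in ns]
    m = len(rows[0])
    R = [[sum(v[a] * v[b] for v in rows) for b in range(m)] for a in range(m)]
    for a in range(m):
        R[a][a] += ridge
    r = [sum(v[a] * x[n] for v, n in zip(rows, ns)) for a in range(m)]
    return solve(R, r)


def residual_energy(x, start, end, p, q):
    """Energy of the prediction error (excitation) over the frame."""
    w = ls_fit(x, start, end, p, q)
    return sum((x[n] - sum(wi * ri for wi, ri in zip(w, regressors(x, n, p, q)))) ** 2
               for n in range(start, end))


def choose_predictor(x, start, end, p=2, q=2, min_gain_db=1.0):
    """Hypothetical per-frame rule: keep the quadratic term only if it reduces
    the residual energy by at least min_gain_db relative to the linear fit."""
    eps = 1e-12
    e_lin = residual_energy(x, start, end, p, 0)
    e_quad = residual_energy(x, start, end, p, q)
    gain_db = 10 * math.log10((e_lin + eps) / (e_quad + eps))
    return ("quadratic" if gain_db >= min_gain_db else "linear"), gain_db
```

For one frame, `choose_predictor(x, start, end)` returns the chosen predictor type and the gain in dB; `start` must be at least `max(p, q)` so that every past sample exists. A real codec would also have to weigh the side information needed to transmit the extra quadratic coefficients.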

English abstract:

In recent years, there has been growing interest in employing nonlinear predictive techniques and models in speech coding to further reduce the bit rate and, therefore, the channel bandwidth. Neural networks are usually used for this purpose, yielding up to 3 dB of additional reduction in the excitation signal energy. Nonlinear prediction can also be based on the Volterra series expansion, where, for simplicity, the expansion is usually limited to the first- and second-order terms (quadratic prediction). Early studies have shown that Volterra filters yield a much larger reduction in excitation signal energy (6 to 10 dB) than neural networks. However, because of instability, this reduction cannot be realized as a bit-rate reduction or a signal-to-noise ratio improvement. The instability arises in the decoder from computational errors (e.g., due to quantization of the excitation signal) and the high sensitivity of the algorithms to these errors. In the original work presented here, the instability of the codec is studied for both forward and backward prediction schemes, using the LS and LMS algorithms, respectively. It is shown that stability can be obtained at the cost of losing most of the savings in excitation signal energy, so that the final reduction is about the same as for neural networks. With forward prediction, after stabilization, including the quadratic term is beneficial for 20 to 45% of frames, despite a small increase in computational complexity. A scheme is therefore developed that performs nonlinear prediction only on these frames. This algorithm yields an improvement of up to 4 dB in the final signal-to-noise ratio. Sequential backward quadratic prediction, although much more attractive from an implementation standpoint, does not perform appreciably better than linear prediction.
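For the sequential backward scheme, a sketch along the following lines shows why no predictor coefficients need to be transmitted: the encoder and the decoder adapt identical quadratic predictors using only reconstructed samples and the quantized residual. This is a hypothetical illustration, not the paper's codec. It assumes a normalized LMS (NLMS) update in place of plain LMS, a uniform quantizer with an unbounded index range, and arbitrary values for `p`, `q`, `mu`, and `step`. It does not reproduce the stability results reported above.

```python
from itertools import combinations_with_replacement


class BackwardQuadraticPredictor:
    """Quadratic Volterra predictor adapted sample by sample with NLMS, using
    only reconstructed samples, so encoder and decoder can run identical copies."""

    def __init__(self, p=4, q=2, mu=0.5, eps=1e-6):
        self.p, self.mu, self.eps = p, mu, eps
        # index pairs (i, j), i <= j, for products hist[i] * hist[j]
        self.pairs = list(combinations_with_replacement(range(q), 2))
        self.w = [0.0] * (p + len(self.pairs))
        self.hist = [0.0] * max(p, q)  # hist[0] is the most recent reconstructed sample
        self.u = []

    def predict(self):
        """Build the regressor vector from the history and return the prediction."""
        lin = self.hist[:self.p]
        quad = [self.hist[i] * self.hist[j] for i, j in self.pairs]
        self.u = lin + quad
        return sum(wi * ui for wi, ui in zip(self.w, self.u))

    def update(self, err_q, x_rec):
        """NLMS step driven by the quantized error; must be called after predict()."""
        g = self.mu * err_q / (self.eps + sum(ui * ui for ui in self.u))
        self.w = [wi + g * ui for wi, ui in zip(self.w, self.u)]
        self.hist = [x_rec] + self.hist[:-1]


def encode(signal, step=0.05, **kw):
    """ADPCM-style loop: quantize the prediction error with a uniform quantizer."""
    model = BackwardQuadraticPredictor(**kw)
    indices, recon = [], []
    for x in signal:
        pred = model.predict()
        idx = round((x - pred) / step)  # integer quantizer index (unbounded range)
        x_rec = pred + idx * step       # the decoder computes the same value
        model.update(idx * step, x_rec)
        indices.append(idx)
        recon.append(x_rec)
    return indices, recon


def decode(indices, step=0.05, **kw):
    """Mirror of encode(): rebuilds the signal from the quantizer indices alone."""
    model = BackwardQuadraticPredictor(**kw)
    out = []
    for idx in indices:
        pred = model.predict()
        x_rec = pred + idx * step
        model.update(idx * step, x_rec)
        out.append(x_rec)
    return out
```

Because `decode` repeats exactly the same arithmetic as `encode`, it reproduces the encoder's reconstruction sample for sample, and each reconstructed sample lies within `step / 2` of the input. If the two sides ever performed different arithmetic, their predictors could drift apart.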

