Engineering Analysis and Recognition of Nigerian English: An Insight into Low Resource Languages

Authors

  • Sulyman Adewale Yusuf Amuda Electrical & Electronics Engineering Department, University of Ilorin, P.M.B.1515, Ilorin
  • Hynek Boril Center for Robust Speech Systems, University of Texas, Dallas
  • Abhijeet Sangwan Center for Robust Speech Systems, University of Texas, Dallas
  • Tunji S. Ibiyemi Electrical & Electronics Engineering Department, University of Ilorin, Ilorin
  • John H. L. Hansen Center for Robust Speech Systems, University of Texas, Dallas

DOI:

https://doi.org/10.14738/tmlai.24.334

Keywords:

Nigerian English, Limited Resource Language, Automatic Speech Recognition (ASR)

Abstract

This article presents a comparative analysis of Nigerian English (NE) and American English (AE). The study aims to highlight differences in speech parameters and how they influence speech processing and automatic speech recognition (ASR). The UILSpeech corpus of Nigerian-accented English, comprising isolated word recordings, read speech utterances, and video recordings, is used as a reference for Nigerian English. The corpus captures the linguistic diversity of Nigeria with data collected from native speakers of the Hausa, Igbo, and Yoruba languages. The UILSpeech corpus is intended to provide a unique opportunity for applying and extending speech processing techniques to a limited-resource language dialect. The acoustic-phonetic differences between AE and NE are studied in terms of pronunciation variations, vowel locations in the formant space, mean fundamental frequency, and phone model distances in the acoustic space, as well as through visual speech analysis of the speakers’ articulators. A strong impact of the AE–NE acoustic mismatch on ASR is observed. A combination of model adaptation and extension of the AE lexicon with newly established NE pronunciation variants is shown to substantially improve the performance of the AE-trained ASR system on the new NE task. This study is part of the pioneering efforts towards incorporating speech technology in Nigerian English and is intended to provide a development basis for other low-resource dialects and languages.

Author Biographies

Sulyman Adewale Yusuf Amuda, Electrical & Electronics Engineering Department, University of Ilorin, P.M.B.1515, Ilorin

Dr. S. A. Y. Amuda is a Fulbright Scholar, researcher, and lecturer in the Electrical and Electronics Engineering Department, University of Ilorin, Nigeria. He earned his Ph.D., M.Eng., and B.Eng. degrees in Electrical and Electronics Engineering. He was a visiting researcher at the Center for Robust Speech Systems, University of Texas, Dallas, for nine months in 2009–2010. He is a registered engineer in Nigeria (with COREN), a member of the Nigerian Society of Engineers (NSE), and a member of IEEE.

Hynek Boril, Center for Robust Speech Systems, University of Texas, Dallas

Hynek Boril received the M.S. degree in electrical engineering and the Ph.D. degree in electrical engineering and information technology from the Department of Electrical Engineering, Czech Technical University in Prague, in 2003 and 2008, respectively. In August 2007, he joined the Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, where he currently works as a Postdoctoral Research Associate.

Abhijeet Sangwan, Center for Robust Speech Systems, University of Texas, Dallas

Abhijeet Sangwan earned his Bachelor's degree in Electronics and Communication Engineering from Visveswaraiah Technological University (VTU), Bangalore, India, in 2002. He earned his Master's and Ph.D. degrees from Concordia University, Canada, and The University of Texas at Dallas, U.S.A., in 2006 and 2009, respectively. During 2002–2003, he worked for MindTree Consulting, where he designed and developed enterprise data warehouse systems for Unilever. He interned with the Human Language Technologies Group at IBM's T.J. Watson Research Center, Yorktown Heights, in 2008. Since 2009, he has been part of the Center for Robust Speech Systems (CRSS) at The University of Texas at Dallas, where he is a Research Associate. His research interests include Automatic Speech Recognition (ASR), Automatic Accent Assessment, and Language Identification Systems.

Tunji S. Ibiyemi, Electrical & Electronics Engineering Department, University of Ilorin, Ilorin

Prof. T. S. Ibiyemi obtained his M.Sc. and Ph.D. at the University of Bradford. He is a Chartered Engineer (UK) and a registered engineer with the Council for the Regulation of Engineering in Nigeria (COREN). He is presently the Vice Chancellor of Achievers University, Nigeria. He was formerly Head of Electrical Engineering, University of Ilorin; Provost, Colleges of Science and Technology, Covenant University; and a member of the advisory board of Nigeria’s National Space Research and Development Agency. He is a serving member of Nigeria’s National Universities Commission accreditation team. His research interest is in biometric signal processing for personal identification and forensic applications.

John H. L. Hansen, Center for Robust Speech Systems, University of Texas, Dallas

John H. L. Hansen (IEEE: S'81-M'82-SM'93-F'07) received the B.S.E.E. degree with highest honors from Rutgers University, New Brunswick, N.J., in 1982, and the M.S. and Ph.D. degrees in Electrical Engineering from the Georgia Institute of Technology, Atlanta, Georgia, in 1983 and 1988, respectively. He is presently Associate Dean for Research at the Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas. He established the Center for Robust Speech Systems (CRSS) at UT Dallas, which focuses on interdisciplinary research in speech processing, hearing sciences, and language technologies. He is a Fellow of the IEEE and has served as a technical consultant to industry and the U.S. government. He is currently serving as Past TC-Chair and Member of the IEEE Signal Processing Society. His research focuses on speech processing, hearing sciences, and language technologies.

References

U. Gut and J.-T. Milde, “The prosody of Nigerian English,” in SP-2002, 2002, pp. 367–370.

C. T. Hodge, “Yoruba: Basic course,” ED – 010 – 462 Report NDEA – VI – 375, US Foreign Service Institute, 1963.

A. A. Fakoya, Nigerian English: A Morpholectal Classification, Ph.D. thesis, Lagos State University, 2007.

S. Amuda, H. Boril, A. Sangwan, and J. H. L. Hansen, “Limited Resource Speech Recognition for Nigerian English,” in Proc. of IEEE ICASSP’10, pp. 5090–5093, 2010.

M. Jibril, Phonological Variation in Nigerian English, Ph.D. thesis, University of Lancaster, 1986.

T. T. Ajani, “Is There Indeed A ‘Nigerian English’?” Journal of Humanities & Social Sciences, 1(1), 2007.

T. Ufomata, “Setting Priorities in Teaching English Pronunciation in ESL Contexts,” seminar presentation as a British Academy Visiting Fellow at University College London, 1996.

A. Bamgbose, “Language in Contact: Yoruba and English in Nigeria”, Education and Development, 2(1), pp. 329-341, 1982.

W. D. Voiers, “Diagnostic Acceptability Measure for Speech Communication Systems,” in Proc. of IEEE ICASSP, vol. 2, pp. 204–207, 1977.

M. A. Kohler, “A Comparison of the New 2400 bps MELP Federal Standard with other Standard Coders,” in Proc. of IEEE ICASSP, 1997.

L. M. Arslan and J. H. L. Hansen, “Language Accent Classification in American English,” Speech Communication, vol. 18, pp. 353–367, 1996.

L. M. Arslan and J. H. L. Hansen, “A Study of Temporal Features and Frequency Characteristics in American English Foreign Accent,” Journal of the Acoustical Society of America, vol. 102(1), pp. 28–40, July 1997.

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, TIMIT Acoustic-Phonetic Continuous Speech Corpus, LDC93S1, 1993.

J.-L. Gauvain and Chin-Hui Lee, “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Transactions on Speech & Audio Processing, 2(2), pp. 291–298, 1994.

S. B. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), pp. 357–366, 1980.

K. Sjolander and J. Beskow, “WaveSurfer – An Open Source Speech Tool,” in Proc. of ICSLP’00, Beijing, China, 2000, vol. 4, pp. 464–467.

R. D. Kent and C. Read, The Acoustic Analysis of Speech, Whurr Publishers, San Diego, 1992.

J. Silva and S. Narayanan, “Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models,” IEEE Transactions on Audio, Speech, and Language Processing, 14(3), pp. 890–906, 2006.

J. H. L. Hansen, “Analysis and Compensation of Speech Under Stress and Noise for Environmental Robustness in Speech Recognition,” Speech Communication, 20(1-2), pp. 151–173, 1996.

J. H. L. Hansen, E. Ruzanski, H. Boril, J. Meyerhoff, “TEO-Based Speaker Stress Assessment Using Hybrid Classification and Tracking Schemes,” International Journal of Speech Technology, Springer, June 2012, DOI 10.1007/s10772-012-9165-1.

T. Hasan, H. Boril, A. Sangwan, J. H. L. Hansen, “Multi-Modal Highlight Generation for Sports Videos Using an Information-Theoretic Excitability Measure,” EURASIP Journal on Advances in Signal Processing, 2013:173, 2013.

H. Boril, J. H. L. Hansen, “Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments,” IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1379-1393, 2010.

H. Boril, Q. Zhang, A. Ziaei, J. H. L. Hansen, D. Xu, J. Gilkerson, J. A. Richards, Y. Zhang, X. Xu, H. Mao, L. Xiao, F. Jiang, “Automatic Assessment of Language Background in Toddlers Through Phonotactic and Pitch Pattern Modeling of Short Vocalizations,” accepted to Workshop on Child Computer Interaction (WOCCI), September, Singapore, 2014.

M. Mehrabani, H. Boril, J. H. L. Hansen, “Dialect Distance Assessment Method Based on Comparison of Pitch Pattern Statistical Models,” in Proc. of IEEE ICASSP'10, 5158-5161, Dallas, TX, 2010.

Link: http://www.avs4you.com (accessed on Aug 20, 2014).

Published

2014-08-28

How to Cite

Amuda, S. A. Y., Boril, H., Sangwan, A., Ibiyemi, T. S., & Hansen, J. H. L. (2014). Engineering Analysis and Recognition of Nigerian English: An Insight into Low Resource Languages. Transactions on Engineering and Computing Sciences, 2(4), 115–128. https://doi.org/10.14738/tmlai.24.334