Building An Automatic Speech Recognition System for Home Automation
DOI:
https://doi.org/10.14738/tmlai.54.3190
Keywords:
Speech recognition, acoustic model, language model, HMM, n-gram, domotics, Kaldi.
Abstract
This paper presents a study of automatic speech recognition (ASR) systems applied to home automation. To that end, the architecture of speech recognition systems was studied in detail. The objective is to select speech recognition software that can operate under distant-speech conditions and in a noisy environment. The proposed system uses the Kaldi ASR toolkit, which must communicate with any home automation system as an open platform communication (OPC) client developed in C++. The home automation system acts as the OPC server.