Sang, Haifeng, and Ge Hai. “A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description.” European Journal of Applied Sciences 7, no. 4 (September 8, 2019): 17-30. Accessed October 31, 2020. https://journals.scholarpublishing.org/index.php/AIVP/article/view/6717.