Sang, H., & Hai, G. (2019). A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. European Journal of Applied Sciences, 7(4), 17–30. https://doi.org/10.14738/aivp.74.6717