Sang H, Hai G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. EJAS [Internet]. 2019 Sep. 8 [cited 2026 Apr. 20];7(4):17-30. Available from: https://journals.scholarpublishing.org/index.php/AIVP/article/view/6717