1. Sang H, Hai G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. EJAS [Internet]. 2019 Sep 8 [cited 2024 Nov 22];7(4):17-30. Available from: https://journals.scholarpublishing.org/index.php/AIVP/article/view/6717