It depends on whether you want a one-off fix for this particular case or want to follow the standard. Ideally, follow the standard. Under the WCAG accessibility guidelines, a video should have a voice-over (audio description) that explains what is happening on screen, including during silent passages. If the video only complements text that is already on the page, it should be clearly labeled as such. I recommend looking at $1; section 1.2 explains this in more detail.