Feb 11 2013
 

I wanted to develop a formula that could suggest how to split a video in order to minimize its transcoding time when transcoding on several machines. For this I needed some numbers.

I found a movie on Vimeo that is longer than 1 hour and downloaded it. Below you can find its parameters (reported by ffprobe):

  • Duration: 01:08:34.00, bitrate: 2118 kb/s
  • Video: h264 (High) (avc1), 1280x720, 29.97 fps
  • Audio: aac (mp4a), 44100 Hz, stereo, 157 kb/s

I split the whole movie into segments of roughly 60 seconds, and then separately into segments of roughly 120 seconds. I then performed two operations on each set of segments: transcoding at the original picture size, and transcoding after downscaling to 853x480.
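The post does not say which tool did the splitting. Assuming it was ffmpeg (ffprobe is mentioned above), one way to cut a file into segments without re-encoding is the segment muxer with stream copy. The sketch below only builds the command line; the file names and the helper name `build_split_command` are hypothetical.

```python
def build_split_command(source, segment_seconds, output_pattern):
    """Build an ffmpeg argument list that splits `source` without re-encoding.

    Assumption: ffmpeg is the splitting tool. With stream copy the cuts can
    only land on keyframes, so segments are roughly, not exactly, the
    requested length.
    """
    return [
        "ffmpeg",
        "-i", source,
        "-map", "0",                # keep every stream of the input
        "-c", "copy",               # stream copy: no transcoding here
        "-f", "segment",            # ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",   # each segment starts at timestamp 0
        output_pattern,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_split_command("movie.mp4", 60, "segment_%03d.mp4")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually perform the split.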

I measured how long it takes to split and join videos and found that it hardly differs from a simple file copy: splitting a 1.1 GB movie took 67 seconds and copying the same movie took 68 seconds. This suggests that the bottleneck of splitting and joining is most probably the hard drive, so from now on I will treat both operations (split and join) as equivalent to a file copy.

When I split the movie I noticed that the file sizes of the segments differed quite significantly. This is natural, since the level of video compression depends on the complexity of the video: the more motion and drastic change there is in the picture, the more space it takes on disk. I wondered whether transcoding time depends on the file size of the segment, so I plotted these graphs:

[Figure: Transcoding times of video segments (1 min length)]

[Figure: Transcoding times of video segments (2 min length)]

From these graphs it is quite clear that transcoding time does not really depend on segment file size. The major factors are video length (s) and picture size (number of pixels). Of course there are other factors too, such as scene complexity and frame similarity, but I will consider only the first two.

Now that we have some understanding of what transcoding time depends on, we can try to build a formula that tells us the optimal number of segments for transcoding a movie in a distributed fashion. Let's assume that every worker node has the same computing capability; then we can visualise the transcoding timeline like this:

[Figure: Distributed transcoding timeline]

From here we can see that total transcoding time is:

 T_{total} = T_{split} + k \cdot T_{net} + T_{c} + T_{join}

Here:

  • T_{split} is the time it takes to split the video into segments,
  • T_{net} is the time that it takes to send one segment over the network,
  • T_{c} is the time it takes to transcode one segment,
  • T_{join} is the time it takes to join all transcoded segments back into a single video, and
  • k is the number of segments.

If we add more variables and/or constants:

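  • s_o the original video file size (B),
  • l the length of the video (seconds),
  • c_p disk copying cost (seconds per byte, i.e. the inverse of the copy speed in B/s),
  • c_{net} network transfer cost (seconds per byte, i.e. the inverse of the transfer speed in B/s),
  • s_t the transcoded file size (B),
  • c_c transcoding cost (seconds of processing per second of video),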

then we can expand the previous function like this:

 T_{split} = s_o \cdot c_p

 T_{net} = \frac{s_o}{k} \cdot c_{net}

 T_{c} = \frac{l}{k} \cdot c_c

 T_{join} = s_t \cdot c_p

 T_{total} = s_o c_p + k \frac{s_o}{k} c_{net} + \frac{l}{k} c_c + s_t c_p=

 =s_o c_p + s_o c_{net} + \frac{l}{k} c_c + s_t c_p
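To get a feel for the numbers, the model can be evaluated directly. The sketch below follows the formulas exactly as written, so c_p, c_{net} and c_c act as time per unit (the products only come out in seconds under that reading). The file size, duration and copy time come from the measurements above; the transcoded size, network cost and transcoding cost are made-up placeholders, not measured values.

```python
def total_time(k, s_o, s_t, l, c_p, c_net, c_c):
    """Total distributed transcoding time for k segments, per the model above.

    s_o, s_t : original / transcoded file size in bytes
    l        : video length in seconds
    c_p      : disk copy cost, seconds per byte
    c_net    : network transfer cost, seconds per byte
    c_c      : transcoding cost, seconds of work per second of video
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    t_split = s_o * c_p
    t_net = (s_o / k) * c_net   # one segment over the network
    t_c = (l / k) * c_c         # one segment transcoded
    t_join = s_t * c_p
    return t_split + k * t_net + t_c + t_join


if __name__ == "__main__":
    params = dict(
        s_o=1.1e9,          # 1.1 GB movie (from the post)
        l=4114,             # 01:08:34 in seconds (from the post)
        c_p=67 / 1.1e9,     # splitting 1.1 GB took 67 s (from the post)
        s_t=0.5e9,          # hypothetical transcoded size
        c_net=1e-8,         # hypothetical: about 100 MB/s network
        c_c=1.0,            # hypothetical: real-time transcoding
    )
    for k in (1, 2, 4, 8, 16, 32, 64):
        print(k, round(total_time(k, **params), 1))
```

Only the transcoding term changes with k in this model; the split, network and join terms form a fixed cost.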

Now let's take the derivative of this function with respect to k:

 \frac{\partial}{\partial k} T_{total} = -\frac{l \cdot c_c}{k^2}

Setting it to zero to look for a minimum gives

 -\frac{l \cdot c_c}{k^2} = 0

which has no finite solution: since both l and c_c are positive, the derivative is strictly negative for every k > 0. The function keeps decreasing as k grows, so it is not possible to get an optimal k value. In other words, the bigger k, the better.
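Two consequences follow directly from the expanded formula and are worth writing down. As k grows, the only k-dependent term vanishes, so the total time approaches a floor set by the serial steps:

 \lim_{k \to \infty} T_{total} = s_o c_p + s_o c_{net} + s_t c_p

and the gain from doubling the number of segments shrinks as k grows:

 T_{total}(k) - T_{total}(2k) = \frac{l \cdot c_c}{k} - \frac{l \cdot c_c}{2k} = \frac{l \cdot c_c}{2k}

So every doubling of k saves half as much time as the previous doubling did, at least within this simplified model.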

As visualised in the transcoding timeline (see above), the master does not perform any transcoding, so the whole video has to be transferred over the network. I also assumed that transcoding takes long enough that the time needed to transfer the transcoded video over the network does not influence the overall transcoding time.

Can we predict the file size of a transcoded video, assuming that it will almost always be encoded with H.264 and AAC? I plotted the data I collected from a couple of video transcoding sessions and derived linear functions that could be used for file size prediction:

[Figure: Video (H.264/AAC) file size dependency on video length]

 f_{360p}(x) = 4808.2x + 77.136

 f_{480p}(x) = 7252.3x + 52.495

 f_{720p}(x) = 16883x + 351.29

Here x is the length of a video in minutes. Of course, we can't predict the resulting size very accurately; however, a clear linear trend is visible in the graph, which means that we can guess what the transcoded video size would be on average.
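The three fitted lines are easy to wrap in a small helper. This is only a sketch: the coefficients are copied from the formulas above, the result is in whatever size unit the original plot used (the post does not state it), and the names `SIZE_FITS` and `predict_size` are made up for illustration.

```python
# (slope, intercept) of the fitted lines above; x is video length in minutes.
SIZE_FITS = {
    "360p": (4808.2, 77.136),
    "480p": (7252.3, 52.495),
    "720p": (16883.0, 351.29),
}


def predict_size(resolution, minutes):
    """Estimate the transcoded file size from the linear fit for a resolution.

    The return value is in the (unstated) unit of the original plot.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    slope, intercept = SIZE_FITS[resolution]
    return slope * minutes + intercept


if __name__ == "__main__":
    for name in SIZE_FITS:
        print(name, predict_size(name, 60))
```

Since these are averages from a few sessions, the estimate is best treated as a rough guess for planning rather than a guarantee.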

Final thoughts

The formula I was trying to find did not give me any reason to use it, since it only says that more segments are always better. So what should I do with it? I will have to measure the overhead of the master process. If it is not that big, it will probably be possible to put a worker process on the same machine, thus saving network transfer time. In that case the formula will look a bit more useful.

I will have to run some experiments later and compare the results with transcoding on a single machine and with the theoretical speed-up (Amdahl's law):

 Speed-up \le \frac{1}{F + \frac{1 - F}{N}}

Here F is the fraction of the calculation that must be executed serially and N is the number of processors.
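For comparing against measurements later, the bound is a one-liner. A minimal sketch, with the function name `amdahl_speedup` chosen here for illustration:

```python
def amdahl_speedup(serial_fraction, processors):
    """Upper bound on speed-up according to Amdahl's law.

    serial_fraction : F, the share of the work that must run serially (0..1)
    processors      : N, the number of processors (at least 1)
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    # Example values of F and N, chosen arbitrarily for illustration.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.1, n), 2))
```

In the transcoding setting, F would roughly correspond to the share of time spent on splitting, transferring and joining, which is an interpretation rather than something measured here.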
