Split and Concatenate Videos with FFmpeg: It's Trivial!

The main idea of my future amazing system is to split a video file into pieces, send them to workers to be re-encoded, and then concatenate the results back into a single video file. Which tool comes to mind for such a task? For me it's FFmpeg. It's an astonishing tool for decoding, encoding, resizing and otherwise manipulating video files. You can cut and concatenate videos as well! How cool is that?

The ffmpeg that comes with Ubuntu is actually avconv. Since I wanted the true version of FFmpeg, I first downloaded the source code:

git clone git://source.ffmpeg.org/ffmpeg.git ffmpeg

Then I installed a couple of dev packages in order to enable a few FFmpeg features:

sudo apt-get install yasm libfaac-dev libfaad-dev libx264-dev \
    libxvidcore-dev libmp3lame-dev libtheora-dev libopenjpeg-dev

Next, I enabled all these features, along with debug information, through the configure script:

./configure --enable-shared --enable-gpl --enable-nonfree \
    --enable-libfaac --enable-libx264 --enable-libxvid \
    --enable-libmp3lame --enable-libtheora --enable-libopenjpeg \
    --disable-stripping --enable-debug=3 --extra-cflags="-gstabs+" \
    --disable-optimizations

Finally, I ran the last step to build everything:

make

According to the work of Y. Sambe et al., "High-speed distributed video transcoding for multiple rates and formats", a good result can be achieved when you split the video between GOPs. This makes sense, since every GOP should start with an i-frame (a frame that contains all the picture information, not just differences between frames). But in video files the Decoding Time Stamp (DTS) and the Presentation Time Stamp (PTS) of a frame may differ, which can introduce problems. The authors state that this may lead to a situation where the i-frame is first in the GOP but is not the frame that is played first. They call such a GOP an Open-GOP. To me this seems a bit strange. I haven't confirmed it yet, but playing an i-frame after a b-frame doesn't make sense to me. The authors continue that, because of Open-GOPs and several other reasons, it is good to split videos so that every piece except the first has one additional GOP at the beginning (the last GOP of the previous piece). They ran some tests and showed that this has a slight effect on the resulting video quality after transcoding.
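
To see how an i-frame can be decoded first yet displayed later, here is a toy illustration in Python. The frame labels and timestamps are made up for the example (they come neither from the paper nor from a real file). The only point is that ordering the same frames by PTS instead of DTS puts two b-frames in front of the i-frame, which is the arrangement the authors call an Open-GOP.

```python
from collections import namedtuple

# Hypothetical frames for illustration only:
# frame kind, decoding timestamp, presentation timestamp.
Frame = namedtuple("Frame", "kind dts pts")

# One GOP listed in decode (DTS) order. The i-frame is decoded first,
# but the two b-frames that follow it carry earlier presentation timestamps.
gop = [
    Frame("I", 0, 3),
    Frame("B", 1, 1),
    Frame("B", 2, 2),
    Frame("P", 3, 6),
    Frame("B", 4, 4),
    Frame("B", 5, 5),
]


def decode_order(frames):
    """Frame kinds in the order the decoder receives them."""
    return "".join(f.kind for f in sorted(frames, key=lambda f: f.dts))


def display_order(frames):
    """Frame kinds in the order they are shown on screen."""
    return "".join(f.kind for f in sorted(frames, key=lambda f: f.pts))


print("decode order: ", decode_order(gop))   # IBBPBB
print("display order:", display_order(gop))  # BBIBBP
```

In such a GOP the leading b-frames may reference the previous GOP, so a cut placed right before the i-frame could leave them without their reference. That is one plausible reason for the one-GOP overlap the authors propose.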

For testing purposes, let's try splitting the video so that every piece contains only complete GOPs. For this, we need to know how big the GOP of the video is. There is a tool called ffprobe that shows various information about the streams in a video container, but to my disappointment it cannot show the GOP size. To make it show this information, I needed to add a single line of code to ffprobe.c:

static void show_stream(WriterContext *w, AVFormatContext *fmt_ctx, int stream_idx)
{
    ...
        case AVMEDIA_TYPE_VIDEO:
            print_int("width",        dec_ctx->width);
            print_int("height",       dec_ctx->height);
            print_int("has_b_frames", dec_ctx->has_b_frames);
            print_int("gop_size",     dec_ctx->gop_size); // A single line is all I need...

After recompiling and launching ffprobe, I learned the details of my video clip:

Duration: 00:01:00.00
Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x818, 1239 kb/s, 24 fps
Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s
 
[STREAM]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=High
codec_type=video
codec_time_base=1/48
codec_tag_string=avc1
width=1920
height=818
has_b_frames=0
gop_size=12
...
[/STREAM]

Good, so now I know that there should be exactly two i-frames per second (24 frames per second divided by a GOP size of 12). This means that it should be possible to split the video cleanly into pieces 2 seconds long. To test this little theory, I wrote a small Python script that generates an ffmpeg command for splitting the video:

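if __name__ == "__main__":
    # Build a single ffmpeg command with 30 outputs, each a 2-second stream copy.
    s = "ffmpeg -i video.mp4"
    for i in range(0, 60, 2):
        s += " -vcodec copy -acodec copy -ss 00:00:" + str(i).zfill(2)
        s += " -t 00:00:02 out" + str(i) + ".mp4"
    # Assumed ending (the listing breaks off after "if i"): print the command
    # so that it can be pasted into a shell.
    print(s)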
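
The same command can also be built as an argument list, which avoids shell quoting and turns the piece length into a parameter. This is only a sketch: the function name `build_split_command` and its parameters are hypothetical, while the ffmpeg options are the same ones used above (`-ss` and `-t` are given in plain seconds here, which ffmpeg also accepts).

```python
def build_split_command(source, total_seconds, piece_seconds, prefix="out"):
    """Return an ffmpeg argument list with one stream-copied output per piece."""
    args = ["ffmpeg", "-i", source]
    for start in range(0, total_seconds, piece_seconds):
        args += [
            "-vcodec", "copy", "-acodec", "copy",
            "-ss", str(start),          # start offset of this piece, in seconds
            "-t", str(piece_seconds),   # duration of this piece, in seconds
            "{}{}.mp4".format(prefix, start),
        ]
    return args


if __name__ == "__main__":
    cmd = build_split_command("video.mp4", 60, 2)
    print(" ".join(cmd))
    # To actually run it (needs ffmpeg and video.mp4 in the current directory):
    # import subprocess; subprocess.check_call(cmd)
```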

And also a script that concatenates these pieces back into a single video file:

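import os

if __name__ == "__main__":
    # Write the list file read by ffmpeg's concat demuxer: one "file '...'" line per piece.
    with open("list.tmp", "w") as f:
        for i in range(0, 60, 2):
            f.write("file 'out" + str(i) + ".mp4'\n")

    # Join the pieces without re-encoding, then remove the temporary list.
    os.system("ffmpeg -f concat -i list.tmp -c copy joined.mp4")
    os.remove("list.tmp")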
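
If the piece names are not known in advance, the list file can be generated from whatever names are passed in. A minimal sketch, assuming the `file '...'` line syntax used above. The helper name `concat_list_text` is hypothetical, and the quote escaping follows ffmpeg's general quoting rules as I understand them (a single quote inside a quoted string is written as `'\''`).

```python
def concat_list_text(paths):
    """Return the contents of a concat list file, one `file` line per path."""
    lines = []
    for path in paths:
        # Close the quoted string, emit an escaped quote, reopen the quoted string.
        escaped = path.replace("'", "'\\''")
        lines.append("file '{}'".format(escaped))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pieces = ["out{}.mp4".format(i) for i in range(0, 60, 2)]
    print(concat_list_text(pieces), end="")
```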

I've uploaded the resulting video to YouTube (you can also see the original video):

As you can see, this method of splitting and concatenating greatly reduces the quality at the split points (every 2 seconds). The audio quality does not degrade, but even so this level of quality is unacceptable for production use.

In fact, I expected every piece to be exactly 2 seconds long after the split. Instead it turned out like this:

for e in $(ls out*.mp4 | sort -V); do echo -n $e; ffprobe $e 2>&1 | grep Duration; done;
out0.mp4  Duration: 00:00:02.02, start: 0.000000, bitrate: 1250 kb/s
out2.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 1838 kb/s
out4.mp4  Duration: 00:00:02.00, start: 0.017007, bitrate: 1871 kb/s
out6.mp4  Duration: 00:00:02.00, start: 0.012993, bitrate: 1179 kb/s
out8.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 1719 kb/s
out10.mp4  Duration: 00:00:02.00, start: 0.008005, bitrate: 1217 kb/s
out12.mp4  Duration: 00:00:02.00, start: 0.005011, bitrate: 1336 kb/s
out14.mp4  Duration: 00:00:02.02, start: 0.000998, bitrate: 1329 kb/s
out16.mp4  Duration: 00:00:02.00, start: 0.020998, bitrate: 1366 kb/s
out18.mp4  Duration: 00:00:02.00, start: 0.019002, bitrate: 1421 kb/s
out20.mp4  Duration: 00:00:02.00, start: 0.016009, bitrate: 1136 kb/s
out22.mp4  Duration: 00:00:02.00, start: 0.011995, bitrate: 418 kb/s
out24.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 411 kb/s
out26.mp4  Duration: 00:00:02.00, start: 0.007007, bitrate: 486 kb/s
out28.mp4  Duration: 00:00:02.00, start: 0.002993, bitrate: 598 kb/s
out30.mp4  Duration: 00:00:02.02, start: 0.000000, bitrate: 649 kb/s
out32.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 776 kb/s
out34.mp4  Duration: 00:00:02.00, start: 0.018005, bitrate: 331 kb/s
out36.mp4  Duration: 00:00:02.00, start: 0.015011, bitrate: 322 kb/s
out38.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 281 kb/s
out40.mp4  Duration: 00:00:02.00, start: 0.008005, bitrate: 137 kb/s
out42.mp4  Duration: 00:00:02.00, start: 0.005011, bitrate: 196 kb/s
out44.mp4  Duration: 00:00:02.02, start: 0.000998, bitrate: 350 kb/s
out46.mp4  Duration: 00:00:02.00, start: 0.020998, bitrate: 455 kb/s
out48.mp4  Duration: 00:00:02.00, start: 0.019002, bitrate: 1176 kb/s
out50.mp4  Duration: 00:00:02.00, start: 0.016009, bitrate: 1230 kb/s
out52.mp4  Duration: 00:00:02.00, start: 0.011995, bitrate: 817 kb/s
out54.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 744 kb/s
out56.mp4  Duration: 00:00:02.00, start: 0.007007, bitrate: 729 kb/s
out58.mp4  Duration: 00:00:02.00, start: 0.002993, bitrate: 414 kb/s

This sounds fishy, doesn't it?
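
One way to put a number on this is to parse the `Duration` and `start` fields and add them up. The sketch below uses a handful of lines copied from the listing above. The regular expression and the helper name `parse_piece` are hypothetical and only cover this exact output format.

```python
import re

# Matches lines such as:
# "out2.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 1838 kb/s"
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+Duration: (?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+), "
    r"start: (?P<start>-?\d+\.\d+)"
)


def parse_piece(line):
    """Return (name, duration_seconds, start_seconds) for one summary line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised line: " + line)
    duration = int(m.group("h")) * 3600 + int(m.group("m")) * 60 + float(m.group("s"))
    return m.group("name"), duration, float(m.group("start"))


sample = """\
out0.mp4  Duration: 00:00:02.02, start: 0.000000, bitrate: 1250 kb/s
out2.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 1838 kb/s
out4.mp4  Duration: 00:00:02.00, start: 0.017007, bitrate: 1871 kb/s
out14.mp4  Duration: 00:00:02.02, start: 0.000998, bitrate: 1329 kb/s
"""

pieces = [parse_piece(line) for line in sample.splitlines()]
total = sum(duration for _, duration, _ in pieces)
nonzero_starts = [name for name, _, start in pieces if start > 0]

print("total duration of the sample pieces: %.2f s" % total)
print("pieces that do not start at 0:", nonzero_starts)
```

In the full listing, four pieces report 2.02 s and the other 26 report 2.00 s, so the pieces add up to 60.08 s rather than 60 s, and most of them do not start at zero.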

Björn recommended that I use the libavcodec library directly instead of the ffmpeg tool. This sounded like a solution, so I spent a couple of days reading the libavcodec code. But what I found was not very pleasing.

There is a function called av_seek_frame() (it is actually part of libavformat, the companion library to libavcodec). However, it is not very reliable. First, it seeks by timestamp, so you cannot simply name the frame number you want to jump to. Moreover, according to the blog post Picture Go Back, it is not possible to reliably jump to the frame you want:

I repeatedly tried to seek forward and backwards to different frames -- frame 5000, 10,000, and 15,000 in divx, avi, and other video formats. Each time, the resulting location is close, but not exact. FFmpeg thinks it knows the frame number after seeking, but usually it is off. Frankly, when I want to jump to frame 5000, I want to be at frame 5000 and not 5015, 4079, or some other nearby frame.

So I thought that maybe I could just scan the file without decoding it and check where the GOPs begin. I did not find any field that provides this information, but since every GOP should start with an i-frame, I could try to cut right before each i-frame. However, as far as I can tell I have to decode a frame to learn whether it's an i-frame, and I don't really want to do that. I also really don't want to develop my own tool, because it would not stand a chance against ffmpeg in terms of supported formats, even if I used libavcodec.
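
One thing that may be worth trying before giving up on this idea: demuxed packets carry a keyframe flag, and ffprobe can print it per packet without decoding the frames. The sketch below parses such output in Python. The command in the comment and the sample lines are assumptions for illustration (they are not output from my clip), the helper name `keyframe_times` is hypothetical, and a keyframe flag is not guaranteed to mark a clean cut point for every codec.

```python
def keyframe_times(ffprobe_csv):
    """Return the pts_time of every packet whose flags field contains 'K'."""
    times = []
    for line in ffprobe_csv.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 2 or fields[0] in ("", "N/A"):
            continue  # skip blank lines and packets without a timestamp
        pts_time, flags = fields[0], fields[1]
        if "K" in flags:
            times.append(float(pts_time))
    return times


# Made-up sample in the shape printed by a command like:
#   ffprobe -v error -select_streams v:0 \
#       -show_entries packet=pts_time,flags -of csv=p=0 video.mp4
sample = """\
0.000000,K_
0.041667,__
0.083333,__
0.500000,K_
0.541667,__
"""

print(keyframe_times(sample))  # [0.0, 0.5]
```

The gaps between consecutive keyframe times would also give the real GOP length, which is a way to double-check the gop_size value printed earlier.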

And my research continues... Now I'm thinking of looking into VLC to see whether it can cut accurately and, if so, whether it can be used as a library. Another option is to implement a new option in ffmpeg that copies the streams but splits the file cleanly, so that the pieces play back smoothly after being joined into a single video file.

Edit: I have to mention that I've found another way to split videos using ffmpeg:

ffmpeg -i movie.mp4 -f segment -c copy -segment_time 120 -map 0 out%03d.mp4

After splitting a video with this method and joining the pieces back together, artefacts still appear at the split points. Small time gaps appear between segments, and the total length increases as well. At least no frames are dropped, which suggests there is a slight bug somewhere. It may be a good idea to report this to the FFmpeg community and see what they think.

To visualise the behaviour, I performed a simple test and ended up with a video whose length is 00:00:08.05 according to ffprobe (which also reported STTS errors twice), although it actually contains around 1 minute of video. What I did then is:

ffmpeg -i joined.mp4 -c copy fixed.mp4

Then ffprobe reported a duration of 00:01:07.40 (and still reported the STTS error once). Here is the resulting video:

5 thoughts on “Split and Concatenate Videos with FFmpeg: It's Trivial!”

  1. Have you found a solution to your issue ? I'm trying to concat video files but i have some gaps like in your last video

  2. Try to demux audio and video, split video, transcode, concat video, and mux back the audio.

  3. I had two issues with ffmpeg, and I wondered if you found a solution:

    1) if I make the duration too short, the video doesn't copy, just the audio.
    2) I don't know how to say "from this time stamp to the end" I could just put the duration up to 99 hours, but that seems like an odd workaround.

    Thoughts?
