FFmpeg Video Editing – How to Cut Unwanted Parts and Join the Rest into One Video

ffmpeg, video, video editing

I have read about cutting and splicing using concat but this is not enough for me.

I would like to cut marked video parts and join them into a single video file without re-encoding. Is it possible to cut and merge (in memory) in one go?

There should be no theoretical limit on the number of video parts to cut or join. Audio should be in sync.

Video: H.264+AAC

Edit:

From the answers I learned that video re-encoding is still needed.

I would like to clarify that cutting video clips from the middle of a video file may be done in multiple steps. I assumed that processing the video in one go would improve performance by saving some I/O activity. However, that is not a goal in itself.

Best Answer

Your question, if I got it right, has four main parts:

  • How to cut a time-segment from a movie
  • How to merge several segments together
  • How to do the above without transcoding the movie, hence not losing any quality
  • How to do the above efficiently in terms of speed

Cutting a time segment

You can cut a segment by skipping to its starting point with the -ss option, then either setting the duration of the segment with -t, or setting the endpoint with -to:

# skip 30 seconds, then copy the next 60 seconds
ffmpeg -ss 30 -t 60 -i full-movie.mp4 segment.mp4

The ffmpeg documentation (section 5.4, Main Options) explains the difference between placing the -ss and -t options before the input file and placing them before the output file. Once you experiment with this, you may find that the difference matters in your case.

Important note: The above example causes the video to be transcoded. We'll discuss below the possibility of doing this with no transcoding.
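Since -t expects a duration while -to expects an end position, it can help to convert between the two when your cut list is written as start/end timestamps. Here is a minimal sketch; the function names to_seconds and duration_between are made up for illustration and are not part of ffmpeg:

```shell
#!/bin/sh
# Hypothetical helpers (not part of ffmpeg).

# Convert HH:MM:SS, MM:SS or plain seconds into seconds with millisecond precision.
to_seconds() {
  printf '%s\n' "$1" |
    awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.3f\n", s }'
}

# Print the duration (in seconds) between a start and an end timestamp.
duration_between() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.3f\n", b - a }'
}

# Usage (not run here): cut from 00:00:30 up to 00:01:30
#   ffmpeg -ss 00:00:30 -t "$(duration_between 00:00:30 00:01:30)" -i full-movie.mp4 segment.mp4
```

With the values from the example above, the computed duration is 60 seconds, so the command is equivalent to the -ss 30 -t 60 form.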

Merging Several Segments

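There are three primary methods to perform a merge of movies: two of them are best explained in this ffmpeg wiki article, and the third is the concat demuxer. If all your segments are identical in terms of a/v encoding spec and container format, you'll probably find the easiest method to be the concat: protocol. Note that this protocol simply joins the files byte by byte, so it works only with container formats that tolerate being glued together this way, such as MPEG-TS (discussed in the next section); with MP4 inputs ffmpeg will typically read only the first file:

# merge the segments
ffmpeg -i "concat:seg1.ts|seg2.ts|seg3.ts" final.mp4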

It is important to note that this merge, like the cut, transcodes the movie once again. We end up with a transcode of a transcode, which can degrade the video quality, so we really want to avoid transcoding as much as we can!
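For completeness, the concat demuxer mentioned above reads its inputs from a small text file with one file '...' line per segment. Here is a rough sketch of generating such a list; write_concat_list and list.txt are names I made up, and the sketch assumes the paths contain no single quotes:

```shell
#!/bin/sh
# Hypothetical helper: write a concat-demuxer list file.
# $1 is the list file to create, the remaining arguments are the segments in order.
# Assumes the segment paths contain no single quotes.
write_concat_list() {
  list=$1
  shift
  for f in "$@"; do
    printf "file '%s'\n" "$f"
  done > "$list"
}

# Usage (not run here):
#   write_concat_list list.txt seg1.mp4 seg2.mp4 seg3.mp4
#   ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp4
```

In the usage line, -c copy asks ffmpeg to copy the streams instead of re-encoding them, which can only work if all segments share the same codec parameters. The -safe 0 option is there so that absolute paths in the list are accepted.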

Cutting and Merging Without Transcoding

The examples in the previous sections require transcoding because each file and each resulting segment is a standalone movie, packed so as to provide everything you expect when you open a full movie, such as its duration and other metadata. But hey, you're not looking to play these as standalone movies; you want these segments to become part of a larger video stream. So instead of feeding ffmpeg boxed movies, you should re-package (re-mux) the input files into a real-time streaming container, dropping the header information you don't really need at this point. For H.264 the streaming container format is MPEG-TS, and here is how you re-mux your stream into it without transcoding:

# re-muxing the whole movie (see a better option in next example)
ffmpeg -i full-movie.mp4 -c:v copy -c:a copy -bsf:v h264_mp4toannexb full-movies-as-ts.ts

Well, since you're already at it, you might as well use this opportunity to cut just the segment you need:

# skip 30 seconds and re-mux a 60 seconds segment
ffmpeg -ss 30 -t 60 -i full-movie.mp4 -c:v copy -c:a copy -bsf:v h264_mp4toannexb segment.ts
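Since the question asks for an arbitrary number of parts, the same command can be wrapped in a loop. Below is a sketch under a few assumptions of mine: the cut list arrives on stdin as "START DURATION" pairs, the segments are named seg1.ts, seg2.ts and so on, and the names cut_segments and FFMPEG are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: cut several segments out of one movie without transcoding.
# Reads "START DURATION" pairs (seconds) from stdin, one pair per line.
# FFMPEG can be overridden, e.g. to point at a specific ffmpeg build.
FFMPEG=${FFMPEG:-ffmpeg}

cut_segments() {
  src=$1
  n=0
  while read -r start dur; do
    n=$((n + 1))
    # -nostdin keeps ffmpeg from swallowing the rest of the cut list on stdin
    "$FFMPEG" -nostdin -ss "$start" -t "$dur" -i "$src" \
      -c:v copy -c:a copy -bsf:v h264_mp4toannexb "seg$n.ts"
  done
}

# Usage (not run here):
#   printf '30 60\n300 45\n' | cut_segments full-movie.mp4
```

The -nostdin option matters here: without it ffmpeg may read from the same stdin as the loop and eat part of the cut list.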

When you merge the TS segments you can also re-mux back to an mp4 container:

# merge the segments and re-mux them as mp4
ffmpeg -i "concat:seg1.ts|seg2.ts|seg3.ts" -c:v copy -c:a copy -movflags empty_moov -flags global_header -bsf:v dump_extra edited-final.mp4
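When there are many segments, typing the concat: string by hand gets tedious. A small sketch that builds it from a list of file names follows; concat_url is a made-up name, and the sketch assumes at least one argument and no '|' characters in the file names:

```shell
#!/bin/sh
# Hypothetical helper: build a "concat:a|b|c" URL from the given files, in order.
# Assumes at least one argument and no '|' in the file names.
concat_url() {
  url="concat:$1"
  shift
  for f in "$@"; do
    url="$url|$f"
  done
  printf '%s\n' "$url"
}

# Usage (not run here):
#   ffmpeg -i "$(concat_url seg1.ts seg2.ts seg3.ts)" -c:v copy -c:a copy \
#     -movflags empty_moov -flags global_header -bsf:v dump_extra edited-final.mp4
```

If you pass a glob such as seg*.ts instead of explicit names, keep in mind that the shell sorts it lexicographically, so seg10.ts comes before seg2.ts.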

So now we have the whole thing done without ever transcoding the movie, preserving the original quality. But as always, there is a caveat...

CAVEAT: The cutting can be done only on a keyframe boundary. Explanation: H.264 organizes the compressed frames in groups (known as GOPs, groups of pictures), each beginning with a keyframe, which is a full yet compressed image of the first frame, followed by the deltas of the following frames, thereby decreasing the storage required for each group. For our purpose, each group is like a sealed zip of all the frames for that duration: it is either all or none. If you want just a piece of a group then you have to unzip it and zip it back, in other words, transcode it. So the above method is relevant only if you have a keyframe at each position where you want to cut the movie. For example, if you have a keyframe every 5 seconds, you can cut only at 5-second intervals.

So now the question is whether you can accept the limitation on cutting points, particularly as you probably have no idea where the keyframes in your movie are. That's the reason I suggested above reading in 5.4 Main Options about specifying -ss and -t before the input or before the output. If you specify them before the input then ffmpeg will find a nearby keyframe to perform your request, which will be "more-or-less" where you wanted to perform the cut. If you don't mind the imprecision of the cut then good, go for it.
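If you would rather know where the keyframes actually are, ffprobe can list them. Here is a sketch; the function names are made up, and on older ffprobe builds the frame field may be called pkt_pts_time rather than pts_time:

```shell
#!/bin/sh
# Hypothetical helpers for finding usable cut points.

# Print the timestamp (seconds) of every keyframe in the first video stream.
list_keyframes() {
  ffprobe -v error -select_streams v:0 -skip_frame nokey \
    -show_entries frame=pts_time -of csv=p=0 "$1"
}

# Read keyframe timestamps on stdin (one per line) and print the latest one
# that is at or before the wanted position $1. Prints nothing if there is none.
keyframe_at_or_before() {
  awk -F, -v t="$1" '
    $1 + 0 <= t + 0 && (best == "" || $1 + 0 > best + 0) { best = $1 }
    END { if (best != "") print best }'
}

# Usage (not run here):
#   list_keyframes full-movie.mp4 | keyframe_at_or_before 30
```

The printed timestamp is a position where a cut with -c:v copy can start cleanly, so you can decide up front whether it is close enough to the cut you wanted.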

But if you need the cut to be at that precise position, then you have no choice: you must decode the movie in order to pinpoint the frame you're looking for. Well, at least we have some good news: instead of transcoding and re-transcoding, you can make do with a single transcode, which somewhat improves the situation:

# skip PRECISELY 30 seconds and transcode a 60 seconds TS segment
ffmpeg -i full-movie.mp4 -ss 30 -t 60 -bsf:v h264_mp4toannexb segment.ts

Since the resulting segments will be in TS, there is no need for another transcode when merging them together.

Speed

The ffmpeg wiki article about merging explains how to run the whole process through pipes, thereby eliminating the need for intermediate files and speeding up the whole process. DON'T. It will take you longer. Doing it all in memory will take longer not because it runs longer, but because you will have no intermediate results while figuring out how to get the whole thing done, and you'll find yourself running the whole process again and again. So the pipes theory is good, but in your case you should start by working out and perfecting each step. You'll find that getting everything to work and produce a decent result requires some more tweaking and tuning. Once you have mastered the entire process and wish to do some scripting for automated mass editing, you can revisit the piping concept.

Hope the above helps.
