ffmpeg

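A problem that has bothered me for a long time

My desktop has an AMD 5800X. On both Linux and Windows, encoding video on the CPU with ffmpeg always produces artifacts. Encoding any video with libx264 and ffmpeg's default parameters is enough to reproduce it, yet the same video encodes cleanly on other machines.

The closest related thread I found is: Artifacts on 4k video encoded with ffmpeg placebo preset - VideoHelp Forum. There the artifacts only appear during playback, and the file itself decodes fine in software. In my case the artifacts are still present after exporting frames to PNG, so the encoder is at fault.

  • Tried the Docker build of ffmpeg; the problem persists.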

     alias ffmpegd='docker run -v $(pwd):$(pwd) -w $(pwd) --rm --name ffmpeg-run jrottenberg/ffmpeg '
    
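  • On Windows, HandBrake behaves even more strangely: green bars appear.
    • Screenshots taken in mpv while playing the video show the same green vertical bars.
  • When streaming with Sunshine, web page images sometimes turn into a complete mosaic.

All of this makes me wonder whether the CPU itself is faulty.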

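Basics

  • ffmpeg reads from any number of inputs (specified with -i) and writes to any number of outputs (specified by a plain output url).
  • Each input or output can contain any number of streams of different types: video/audio/subtitle/attachment/data.
  • The allowed number and types of streams depend on the container format. Choosing which streams from which inputs go into which outputs (stream selection) is either automatic or controlled with -map.
  • Options apply to the next specified file, and the same option can be repeated for different files.
  • Inputs and streams are referenced by 0-based indices. For example, the first input is 0, and 2:3 is the fourth stream of the third input file.
  • ffmpeg [global options] [input options] -i [input file] [output options] [output file]

Some simple examples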

Convert an input media file to a different format, by re-encoding media streams:

ffmpeg -i input.avi output.mp4

Set the video bitrate of the output file to 64 kbit/s:

ffmpeg -i input.avi -b:v 64k -bufsize 64k output.mp4

Force the frame rate of the output file to 24 fps:

ffmpeg -i input.avi -r 24 output.mp4

Force the frame rate of the input file (valid for raw formats only) to 1 fps and the frame rate of the output file to 24 fps:

ffmpeg -r 1 -i input.m2v -r 24 output.mp4

ffmpeg pipeline

  • decode -> filter -> encode
  • With stream copy (-c copy), decoding and encoding are both skipped. Encoded packets are then passed to the decoder (unless streamcopy is selected for the stream, see further for a description).

     _______              ______________
    |       |            |              |
    | input |  demuxer   | encoded data |   decoder
    | file  | ---------> | packets      | -----+
    |_______|            |______________|      |
                                               v
                                           _________
                                          |         |
                                          | decoded |
                                          | frames  |
                                          |_________|
     ________             ______________       |
    |        |           |              |      |
    | output | <-------- | encoded data | <----+
    | file   |   muxer   | packets      |   encoder
    |________|           |______________|
    

filter

simple filter

  • One input and one output, both of the same type
  • Set per stream with the -filter option; -vf and -af are the aliases for video and audio

     _________                        ______________
    |         |                      |              |
    | decoded |                      | encoded data |
    | frames  |\                   _ | packets      |
    |_________| \                  /||______________|
                 \   __________   /
      simple     _\||          | /  encoder
      filtergraph   | filtered |/
                    | frames   |
                    |__________|

Complex filter

Complex filter graphs are those which cannot be described as simply a linear processing chain applied to one stream.

  • Multiple inputs or outputs
  • Specified with -filter_complex
  • It is a global option

Example

overlay: overlay one video on top of another

Define a complex filtergraph, i.e. one with arbitrary number of inputs and/or outputs. The syntax is described in the “Filtergraph syntax” section of the ffmpeg-filters manual.

Input link labels must refer to input streams using the [file_index:stream_specifier] syntax (i.e. the same as -map uses). If stream_specifier matches multiple streams, the first one will be used. An unlabeled input will be connected to the first unused input stream of the matching type.

Output link labels are referred to with -map. Unlabeled outputs are added to the first output file.

ffmpeg -i video.mkv -i image.png -filter_complex '[0:v][1:v]overlay[out]' -map '[out]' out.mkv

stream selection

Automatic selection

  • Whether video/audio/subtitle streams are included depends on the type of the output file.
  • The selection rules are:
    • for video, it is the stream with the highest resolution,
    • for audio, it is the stream with the most channels,
    • for subtitles, it is the first subtitle stream found but there’s a caveat. The output format’s default subtitle encoder can be either text-based or image-based, and only a subtitle stream of the same type will be chosen.

The -map option

-map [-]input_file_id[:stream_specifier][?][,sync_file_id[:stream_specifier]] | [linklabel] (*output*)

  • stream_type[:additional_stream_specifier]
    • ’v’ or ’V’ for video, ’a’ for audio, ’s’ for subtitle, ’d’ for data, and ’t’ for attachments.
    • Without an index, it matches all streams of that type
  • m:key[:value]
    • Matches streams with the metadata tag key having the specified value. If value is not given, matches streams that contain the given tag with any value.

Designate one or more input streams as a source for the output file. Each input stream is identified by the input file index input_file_id and the input stream index input_stream_id within the input file. Both indices start at 0. If specified, sync_file_id:stream_specifier sets which input stream is used as a presentation sync reference.

Examples

For example, if you have two audio streams in the first input file, these streams are identified by "0:0" and "0:1". You can use -map to select which streams to place in an output file.

For example:

ffmpeg -i INPUT -map 0:1 out.wav

will map the input stream in INPUT identified by "0:1" to the (single) output stream in out.wav.

For example, to select the stream with index 2 from input file a.mov (specified by the identifier "0:2"), and stream with index 6 from input b.mov (specified by the identifier "1:6"), and copy them to the output file out.mov:

ffmpeg -i a.mov -i b.mov -c copy -map 0:2 -map 1:6 out.mov

To select all video and the third audio stream from an input file:

ffmpeg -i INPUT -map 0:v -map 0:a:2 OUTPUT

To map all the streams except the second audio, use negative mappings

ffmpeg -i INPUT -map 0 -map -0:a:1 OUTPUT

To map the video and audio streams from the first input, and using the trailing ?, ignore the audio mapping if no audio streams exist in the first input:

ffmpeg -i INPUT -map 0:v -map 0:a? OUTPUT

To pick the English audio stream:

ffmpeg -i INPUT -map 0:m:language:eng OUTPUT

Note that using this option disables the default mappings for this output file.

stream handling

Stream handling is independent of stream selection, with an exception for subtitles described below. Stream handling is set via the -codec option addressed to streams within a specific output file. In particular, codec options are applied by ffmpeg after the stream selection process and thus do not influence the latter. If no -codec option is specified for a stream type, ffmpeg will select the default encoder registered by the output file muxer.

  • Happens after stream selection.
  • -codec sets how each stream in the output is handled.
  • If no -codec is given, the default encoder for the output file format is chosen.

Options

Note: for many ffmpeg options, the position relative to the inputs and outputs completely changes the effect.

stream specifier

Some options are applied per-stream, e.g. bitrate or codec. Stream specifiers are used to precisely specify which stream(s) a given option belongs to. For example:

  • -codec:a:1 ac3 encodes the second audio stream with ac3
  • -codec:v copy copies the video stream (-codec copy or -codec: copy copies all streams)

main options

  • -c[:stream_specifier] codec (input/output,per-stream) Select an encoder (when used before an output file) or a decoder (when used before an input file) for one or more streams. codec is the name of a decoder/encoder or a special value copy (output only) to indicate that the stream is not to be re-encoded.

    ffmpeg -i INPUT -map 0 -c:v libx264 -c:a copy OUTPUT
    

    encodes all video streams with libx264 and copies all audio streams.

    When several options match the same stream, the last one wins.

    ffmpeg -i INPUT -map 0 -c copy -c:v:1 libx264 -c:a:137 libvorbis OUTPUT
    

    will copy all the streams except the second video, which will be encoded with libx264, and the 138th audio, which will be encoded with libvorbis.

  • -t duration (*input/output*) When used as an input option (before -i), limit the duration of data read from the input file. When used as an output option (before an output url), stop writing the output after its duration reaches duration.
  • -ss position (*input/output*) When used as an input option (before -i), seeks in this input file to position.
    • Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded
    • When doing stream copy or when -noaccurate_seek is used, it will be preserved. When used as an output option (before an output url), decodes but discards input until the timestamps reach position.
  • -to position (*input/output*) Stop writing the output or reading the input at position.
  • -filter[:stream_specifier] filtergraph (*output,per-stream*) Create the filtergraph specified by filtergraph and use it to filter the stream.

    filtergraph is a description of the filtergraph to apply to the stream, and must have a single input and a single output of the same type of the stream. In the filtergraph, the input is associated to the label in, and the output to the label out. See the ffmpeg-filters manual for more information about the filtergraph syntax.

  • Video options
    • -vn (input/output) As an input option, blocks all video streams of a file from being filtered or being automatically selected or mapped for any output. See -discard option to disable streams individually. As an output option, disables video recording i.e. automatic selection or mapping of any video stream. For full manual control see the -map option.
    • -vcodec codec (output)
    • -vf filtergraph (output) Create the filtergraph specified by filtergraph and use it to filter the stream.

quick start

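Reference: FFmpeg 视频处理入门教程 - 阮一峰的网络日志 (ruanyifeng.com)

Command format

$ ffmpeg \
[global options] \
[input file options] \
-i [input file] \
[output file options] \
[output file]

The command below converts an mp4 file to a webm file; both are container formats. The input mp4 has AAC audio and H.264 video; the output webm has VP9 video and Vorbis audio.

# -y is a global option; -c:a aac -c:v h264 are input options (decoders);
# -c:v libvpx-vp9 -c:a libvorbis are output options (encoders).
$ ffmpeg \
-y \
-c:a aac -c:v h264 \
-i input.mp4 \
-c:v libvpx-vp9 -c:a libvorbis \
output.webm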

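Common options

-c: select the codec
-c copy: copy streams without re-encoding (much faster)
-c:v: select the video codec
-c:a: select the audio codec
-i: specify an input file
-an: drop audio streams
-vn: drop video streams
-preset: trades encoding speed against compression efficiency; available values are ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow.

List supported encoders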

~ ffmpeg.exe -encoders |grep av1
 V..... libaom-av1           libaom AV1 (codec av1)
 V....D librav1e             librav1e AV1 (codec av1)
 V..... libsvtav1            SVT-AV1(Scalable Video Technology for AV1) encoder (codec av1)

~ ffmpeg.exe -encoders |grep HEVC
 V..... libx265              libx265 H.265 / HEVC (codec hevc)
 V....D hevc_amf             AMD AMF HEVC encoder (codec hevc)
 V....D hevc_mf              HEVC via MediaFoundation (codec hevc)

~ ffmpeg.exe -encoders |grep h264
V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... libx264rgb           libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB (codec h264)
 V..... libopenh264          OpenH264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V....D h264_amf             AMD AMF H.264 Encoder (codec h264)
 V....D h264_mf              H264 via MediaFoundation (codec h264)
 V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
 V..... h264_qsv             H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (Intel Quick Sync Video acceleration) (codec h264)
 V..... nvenc                NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc_h264           NVIDIA NVENC H.264 encoder (codec h264)

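Common commands

Lessons learned (pitfalls)

  • When cutting a clip, where -ss is placed changes the behavior; see the seeking section. For instance, it can make the seek very slow (because everything before the position has to be decoded).
  • Stream copy avoids transcoding, which is nice, but it causes all sorts of problems, such as imprecise seeks or video and audio lengths differing by a few hundred ms after a seek. When cutting video, re-encoding is the safest choice.
  • Choose the CRF carefully when transcoding; the default produces a fairly low bitrate. For h265 at 4K:
    • crf 26: 11 Mbps (Bilibili's h264 transcode leaves only about 10 Mbps, although it can actually reach up to 18 Mbps)
    • crf 20: 22281 kbps
    • crf 16: 34894 kbps

For 4K 60fps video, h265 at crf 20 can produce a bitrate high enough to make some clients stutter. It is best to test several CRF values and check playback. Keep the bitrate below 25 Mbps, averaging around 10 Mbps.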
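
To compare a test encode against the 25 Mbps ceiling, the average bitrate can be derived from the file size and duration. This is a minimal sketch: the function name avg_kbps and the sample numbers are hypothetical, and it assumes awk is installed.

```shell
# avg_kbps BYTES SECONDS -> average bitrate in kbit/s (1 kbit = 1000 bit)
avg_kbps() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes * 8 / secs / 1000 }'
}

# Hypothetical example: a 125,000,000-byte file lasting 100 seconds
avg_kbps 125000000 100   # prints 10000
```

For a real file, the size can come from `wc -c < file` and the duration from ffprobe's `format=duration` entry.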

x264 does not strictly enforce a maximum peak bitrate in CRF mode: video encoding - Using CRF and setting a maximum bitrate with x264 in FFmpeg - Super User

  • Either use two-pass mode
  • or switch to a newer codec such as AV1

Show file metadata

ffmpeg -i input.mp4 -hide_banner

Extract audio

Pick the output container to match the audio codec (e.g. m4a or mp3; mkv also works for storing it directly).

ffmpeg -i input.mp4 -vn -c:a copy output.aac

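Screenshots

ffmpeg -i input.mp4 -ss 00:01:24 -t 00:00:01 output_%03d.png

JPG and PNG

JPG is lossy and PNG is lossless. When writing JPG, the encoding quality can be set: -q:v 2 sets the JPG quality, typically between 1 and 5 (1 is the best; the scale goes up to 31).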

ffmpeg -ss 01:23:45 -i input -vframes 1 -q:v 2 output.jpg

Frame rate control (-r)

-r 1 extracts one frame per second

ffmpeg -i input.mp4 -r 1 output_%03d.png

Frame count limit (-frames:v)

-vframes 1 (an alias of -frames:v 1) captures a single frame

ffmpeg -ss 01:23:45 -i input -vframes 1 -q:v 2 output.jpg

Selecting frames (-vf select)

Extract one image every 10 frames

ffmpeg -i input.mp4 -vf "select=not(mod(n\,10))" -vsync vfr output_%03d.png

Post-processing with filters

ffmpeg -i input.mp4 -vf "crop=640:360:0:0" output_%03d.png  # crop to 640x360 starting at the top-left corner (centered when x:y are omitted)

ffmpeg -i input.mp4 -vf "scale=640:360" output_%03d.png   # scale the output images to 640x360

Streaming optimization

[译] 优化 MP4 视频以获得更快的流传输速度 - 掘金 (juejin.cn)

-movflags faststart -g 50 -sc_threshold 0

faststart

  • Equivalent to HandBrake's “Web Optimized” option
-movflags faststart

keyframe

ffmpeg - Checking keyframe interval? - Stack Overflow

ffprobe -loglevel error -select_streams v:0 -show_entries packet=pts_time,flags -of csv=print_section=0 input.mp4 | awk -F',' '/K/ {print $1}'
0.000000
2.933333
11.266667
14.233333
22.400000
30.733333
37.133333
43.266667
45.633333
53.966667
62.300000
ffmpeg -i YOUR_INPUT -c:v h264 -keyint_min 25 -g 50 -sc_threshold 0 OUTPUT.mp4
  • -keyint_min 25: sets the minimum interval between keyframes.
  • -g 50: "g" stands for "Group of Pictures"; it sets the maximum keyframe interval to 50 frames, or 2 seconds, assuming the input runs at 25 FPS.
  • -sc_threshold 0: "SC" stands for Scene Change. FFmpeg defaults to a keyframe interval of 250 frames and inserts keyframes at scene changes, i.e. when the picture content changes. This option makes sure no extra keyframes are added at scene changes.

A short keyframe interval matters because it gives the end user fast playback start and fast seeking (fast forward or rewind), especially with adaptive bitrate streaming, where the player switches to a higher or lower quality based on the available bandwidth. Also, a player cannot start playback on a P-frame or B-frame.
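
The keyframe timestamps printed by the ffprobe command can be turned into intervals, and a -g value can be derived from a target interval in seconds. This is a minimal sketch: the function names keyframe_gaps and gop_size are hypothetical helpers, and awk is assumed to be available.

```shell
# keyframe_gaps: read keyframe timestamps (seconds, one per line) on stdin
# and print the gap between each consecutive pair.
keyframe_gaps() {
    awk 'NR > 1 { printf "%.3f\n", $1 - prev } { prev = $1 }'
}

# gop_size FPS SECONDS -> value for -g giving one keyframe every SECONDS
gop_size() {
    awk -v fps="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", fps * secs }'
}

# The first three timestamps from the listing in this section
printf '0.000000\n2.933333\n11.266667\n' | keyframe_gaps   # prints 2.933 and 8.333
gop_size 25 2                                              # prints 50
```

Gaps of several seconds, such as the 8.333 s one here, indicate that the file was encoded with a long, scene-change-driven GOP.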

References

FFmpeg Formats Documentation: -fflags flags is a separate option

  • ‘genpts’: Generate missing PTS if DTS is present.

video - Does PTS have to start at 0? - Stack Overflow

  • The asker wants start_pts to carry actual wall-clock time

video - ffmpeg setpts apply uniform offset without re-encoding - Stack Overflow

  • Two streams whose end times are known, so they can be synchronized
  • The setpts filter can do it, but the goal is to avoid re-encoding

Cutting: seek position

Be sure to read Seeking – FFmpeg

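Keyframes: in video coding, keyframes are special frames in a sequence that are self-contained and do not depend on any other frame. A keyframe holds a complete picture, while other frames (such as P-frames and B-frames) are compressed using information from keyframes and/or other frames. Keyframes are important reference points and usually appear at scene changes or where motion begins. Because keyframes are self-contained, a player or decoder can start decoding at any keyframe without needing earlier frames, which enables fast seeking and random access.

PTS (Presentation Timestamp): a timestamp that says when a video or audio frame should be presented during playback. It determines when the frame is shown or played on the playback timeline.

GOP (Group of Pictures): a group of frames in a video sequence, consisting of one keyframe (I-frame) followed by predicted frames (P-frames) and bidirectionally predicted frames (B-frames). The GOP structure supports compression and decoding. A typical GOP might be "IBBPBBPBBPBB", where "I" is the keyframe, "B" a bidirectionally predicted frame, and "P" a predicted frame. The keyframe is always the first frame of the GOP.

Keyframes and GOPs: the keyframe is the start of a GOP; it opens a new coding group and is followed by the predicted and bidirectionally predicted frames that depend on it. This structure gives high compression ratios, because P- and B-frames are coded as differences against the keyframe instead of carrying a full picture. In a stream, keyframe placement and GOP settings strongly affect quality, compression efficiency, and seeking speed. Different applications need different keyframe intervals and GOP structures to balance compression efficiency against decoding performance.

How to inspect keyframes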

ffprobe -select_streams v -show_frames -show_entries frame=pict_type,pkt_pts_time bbb_sunflower_1080p_30fps_normal.mp4 > log

input seeking

-ss  position (input/output)

When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position.

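  • Put simply, it seeks to an I-frame.
  • When transcoding: When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded
  • When copying: When doing stream copy or when -noaccurate_seek is used, it will be preserved.
    • Since nothing is decoded or re-encoded, the cut has to start on an I-frame, so the actual start is the nearest I-frame before position, and the timestamps are then presumably reset to 0 (not verified).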
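
Putting the input-seeking rules together, a frame-accurate cut can combine -ss before -i with re-encoding. This is a minimal sketch: the function name clip, the FFMPEG override variable, and the choice of libx264 at crf 18 with aac audio are illustrative assumptions, not settings taken from these notes.

```shell
# clip IN START DURATION OUT
# Input seeking (-ss before -i) plus re-encoding, so that -accurate_seek
# (the default) can discard the frames between the seek point and START.
# FFMPEG can be overridden (e.g. FFMPEG=echo) to preview the arguments.
clip() {
    "${FFMPEG:-ffmpeg}" -ss "$2" -i "$1" -t "$3" \
        -c:v libx264 -crf 18 -c:a aac "$4"
}
```

For example, `clip src.mkv 00:01:00 10 out.mp4` would cut 10 seconds starting at the one-minute mark. Swapping the codec options for `-c copy` makes the cut much faster, but it then snaps to the preceding I-frame as described in the list.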

Experiment

ffmpeg -i bbb_sunflower_1080p_30fps_normal.mkv -t 10 ref/ref_%04d.png

f421d45d1b8b491dc36a564c01246ef8  i_ss_2-noacc.png
468a10a80c838526709d4d35abe7af3a  i_ss_2.png

(base) fyyuan@icarus4 ➜  test cat ref/log2|grep f421d45d1b8b491dc36a564c01246ef8
f421d45d1b8b491dc36a564c01246ef8  ref_0061.png
(base) fyyuan@icarus4 ➜  test cat ref/log2|grep 468a10a80c838526709d4d35abe7af3a
468a10a80c838526709d4d35abe7af3a  ref_0059.png

output seeking

When used as an output option (before an output url), decodes but discards input until the timestamps reach position.

  • Since the input is decoded, every frame's timestamp is known, so the cut lands on the frame at the requested position.
  • Here, the input will be decoded (and discarded) until it reaches the position given by -ss. This will be done very slowly, frame by frame.
  • As of FFmpeg 2.1, the main advantage is that when applying filters to the output stream, the timestamps aren't reset prior to filtering (i.e. when burning subtitles into a video, you don't need to modify the subtitle timestamps).
ffprobe -select_streams v -show_frames -show_entries frame=pict_type,pkt_pts_time src.mkv
[FRAME]
pkt_pts_time=0.613000
pict_type=I
[/FRAME]
[FRAME]
pkt_pts_time=0.630000
pict_type=B
[/FRAME]
...
[FRAME]
pkt_pts_time=1.130000
pict_type=B
[/FRAME]
[FRAME]
pkt_pts_time=1.147000
pict_type=I
[/FRAME]

Related options

-copyts

Do not process input timestamps, but keep their values without trying to sanitize them. In particular, do not remove the initial start time offset value. Note that, depending on the vsync option or on specific muxer processing (e.g. in case the format option avoid_negative_ts is enabled) the output timestamps may mismatch with the input timestamps even when this option is selected.

-start_at_zero

When used with copyts, shift input timestamps so they start at zero. This means that using e.g. -ss 50 will make output timestamps start at 50 seconds, regardless of what timestamp the input file started at.

-avoid_negative_ts integer (output)

Possible values: ‘make_non_negative’ Shift timestamps to make them non-negative. Also note that this affects only leading negative timestamps, and not non-monotonic negative timestamps.

‘make_zero’ Shift timestamps so that the first timestamp is 0.

‘auto (default)’ Enables shifting when required by the target format.

‘disabled’ Disables shifting of timestamp.

When shifting is enabled, all output timestamps are shifted by the same amount. Audio, video, and subtitles desynching and relative timestamp differences are preserved compared to how they would have been without shifting.

Transcoding

Pixel formats

Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
    General capabilities: dr1 delay hardware
    Threading capabilities: none
    Supported hardware devices: cuda cuda
    Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 rgb0 cuda

CRF vs. fixed bitrate and other modes

A change of ±6 should result in about half/double the file size. You should use CRF encoding primarily for offline file storage, in order to achieve the most optimal encodes.

For x264, sane values are between 18 and 28, and the default is 23. For x265, the default CRF is 28.
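
The ±6 rule can be used to guess a bitrate before running a long encode. This is a rough sketch only: the function name crf_scale is hypothetical, awk is assumed, and real results depend heavily on the content.

```shell
# crf_scale KBPS FROM_CRF TO_CRF -> rough bitrate estimate at TO_CRF,
# assuming the bitrate doubles for every 6 CRF steps down (rule of thumb).
crf_scale() {
    awk -v kbps="$1" -v from="$2" -v to="$3" \
        'BEGIN { printf "%.0f\n", kbps * 2 ^ ((from - to) / 6) }'
}

crf_scale 11000 26 20   # prints 22000
```

The x265 measurements recorded in these notes (about 11 Mbps at crf 26 and 22281 kbps at crf 20) are in line with this estimate.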

  • Constant Quality: ensure that every frame gets the number of bits it deserves to achieve a certain (perceptual) quality level
    • h264, h265: -crf
  • Average Bitrate (ABR): Target Bitrate mode, try to reach the specified bit rate on average
    • -b:v 2M
  • Two pass
    • Used to hit a target bitrate
    • Can still help in constant-quality mode even without a target bitrate (e.g. with libvpx-vp9)

CRF VBR CBR

vp9

vbr

  • 1-pass average bitrate
  • 2-pass average bitrate
    • Two-pass is the recommended encoding method for libvpx-vp9 as some quality-enhancing encoder features are only available in 2-pass mode.
  • Constant quality (Constant quantizer)
  • 2-pass constant quality
  • Constant bitrate

2-pass mode

Constant quality 2-pass is invoked by setting -b:v to zero and specifying a quality level using the -crf switch:

input=
output=
crf=30
# two-pass constant quality
ffmpeg -i "$input" -c:v libvpx-vp9 -b:v 0 -crf "$crf" -pass 1 -an -f null /dev/null && \
ffmpeg -i "$input" -c:v libvpx-vp9 -b:v 0 -crf "$crf" -pass 2 -c:a copy "$output"
# single-pass constant quality
ffmpeg -i input.mp4 -c:v libvpx-vp9 -crf 30 -b:v 0 output.webm

Very slow; a poor experience.

h264

Encode/H.264 – FFmpeg

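Two modes

  • CRF: when the file size does not matter
    • 17–28
    • default 23
    • a difference of 6 roughly doubles or halves the bitrate
  • 2pass: for cases with a file-size requirement, such as streaming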
ffmpeg -i input -c:v libx264 -preset medium -crf 23 -c:a copy output.mkv

preset

  • ultrafast
  • superfast
  • veryfast
  • faster
  • fast
  • medium – default preset
  • slow
  • slower
  • veryslow
  • placebo – ignore this as it is not useful (see FAQ)

tune

  • film – use for high quality movie content; lowers deblocking
  • animation – good for cartoons; uses higher deblocking and more reference frames
  • grain – preserves the grain structure in old, grainy film material
  • stillimage – good for slideshow-like content
  • fastdecode – allows faster decoding by disabling certain filters
  • zerolatency – good for fast encoding and low-latency streaming
  • psnr – ignore this as it is only used for codec development
  • ssim – ignore this as it is only used for codec development

profile: very old devices only support profiles below main; most devices support main and high

  • baseline
  • main
  • high
  • high10 (first 10 bit compatible profile)
  • high422 (supports yuv420p, yuv422p, yuv420p10le and yuv422p10le)
  • high444 (supports as above as well as yuv444p and yuv444p10le)

Constrained encoding (VBV / maximum bit rate)

You can use -crf or -b:v with a maximum bit rate by specifying both -maxrate and -bufsize:

ffmpeg -i input -c:v libx264 -crf 23 -maxrate 1M -bufsize 2M output.mp4

-bufsize is the "rate control buffer", so it will enforce your requested "average" (1 MBit/s in this case) across each 2 MBit worth of video.

This will effectively "target" -crf 23, but if the output were to exceed 1 MBit/s, the encoder would increase the CRF to prevent bitrate spikes. However, be aware that libx264 does not strictly control the maximum bit rate as you specified (the maximum bit rate may be well over 1M for the above file). To reach a perfect maximum bit rate, use two-pass.

h265

Encode/H.265 – FFmpeg

  • Choose a CRF. CRF affects the quality. The default is 28, and it should visually correspond to libx264 video at CRF 23, but result in about half the file size. CRF works just like in x264, so choose the highest value that provides an acceptable quality.
  • Choose a preset. The default is medium. The preset determines compression efficiency and therefore affects encoding speed. Valid presets are ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, and placebo. Use the slowest preset you have patience for. Ignore placebo as it provides insignificant returns for a significant increase in encoding time.
  • Choose a tune (optional). By default, this is disabled, and it is generally not required to set a tune option. x265 supports the following -tune options: psnr, ssim, grain, zerolatency, fastdecode. They are explained in the H.264 guide.

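Choosing CRF / bitrate

  • Choose the CRF carefully when transcoding 4K; the default produces a fairly low bitrate. For h265:
    • crf 26: 11 Mbps (Bilibili's h264 transcode leaves only about 10 Mbps, although it can actually reach up to 18 Mbps)
    • crf 20: 22281 kbps
    • crf 16: 34894 kbps

Measurements:

Direct transcode, keeping the original 59.94 fps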

ffmpeg -i INPUT -c:v libx265 -crf 26 -preset slow -c:a aac -b:a 320k OUTPUT

frame=31195 fps= 12 q=35.8 Lsize=  709168kB time=00:08:41.00 bitrate=11150.6kbits/s speed=0.195x
video:688379kB audio:20388kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 0.056575%
x265 [info]: frame I:    132, Avg QP:27.54  kb/s: 60301.51
x265 [info]: frame P:   7805, Avg QP:29.13  kb/s: 28855.09
x265 [info]: frame B:  23258, Avg QP:34.69  kb/s: 4505.13
x265 [info]: Weighted P-Frames: Y:1.7% UV:1.4%
x265 [info]: consecutive B-frames: 1.0% 2.5% 2.3% 91.0% 3.3%

encoded 31195 frames in 2677.46s (11.65 fps), 10833.60 kb/s, Avg QP:33.27
[aac @ 0x55d67964f4c0] Qavg: 174.559

Forcing the frame rate to 60 fps

ffmpeg -i INPUT -c:v libx265 -crf 26 -preset slow -r 60 -c:a aac -b:a 320k OUTPUT


frame=31195 fps=8.3 q=36.0 Lsize=  708843kB time=00:08:41.00 bitrate=11145.6kbits/s speed=0.139x
video:688054kB audio:20388kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 0.056600%
x265 [info]: frame I:    132, Avg QP:27.54  kb/s: 60361.57
x265 [info]: frame P:   7805, Avg QP:29.14  kb/s: 28870.71
x265 [info]: frame B:  23258, Avg QP:34.70  kb/s: 4507.21
x265 [info]: Weighted P-Frames: Y:1.7% UV:1.4%
x265 [info]: consecutive B-frames: 1.0% 2.5% 2.3% 91.0% 3.3%

encoded 31195 frames in 3751.69s (8.31 fps), 10839.31 kb/s, Avg QP:33.27
[aac @ 0x56079a852080] Qavg: 174.559

A 2-hour h264 movie of 6.x GB shrank to 1.6 GB after transcoding to h265

video:1601155kB audio:98562kB subtitle:0kB other streams:0kB global headers:6kB muxing overhead: 0.310183%
x265 [info]: frame I:   1140, Avg QP:23.90  kb/s: 9640.57
x265 [info]: frame P:  85553, Avg QP:26.19  kb/s: 3183.49
x265 [info]: frame B: 180658, Avg QP:32.30  kb/s: 606.13
x265 [info]: Weighted P-Frames: Y:1.0% UV:0.5%
x265 [info]: consecutive B-frames: 1.7% 1.5% 86.7% 6.7% 3.4%

encoded 267351 frames in 11214.90s (23.84 fps), 1469.42 kb/s, Avg QP:30.30

av1

Encode/AV1 – FFmpeg

AV1 can achieve about 30% higher compression efficiency than VP9, and about 50% higher efficiency than H.264.

Several encoders are available

  • libaom-av1
    • Unusably slow!
  • libsvtav1
    • Can saturate 128 cores!
    • SVT-AV1 (libsvtav1) is an encoder originally developed by Intel in collaboration with Netflix.
    • The encoder supports a wide range of speed-efficiency tradeoffs and scales fairly well across many CPU cores.

Constant Quality:

libaom-av1 has a constant quality (CQ) mode (like CRF in x264 and x265) which will ensure that every frame gets the number of bits it deserves to achieve a certain (perceptual) quality level, rather than encoding each frame to meet a bit rate target. This results in better overall quality. If you do not need to achieve a fixed target file size, this should be your method of choice.

ffmpeg -i input.mp4 -c:v libsvtav1 -crf 32 -preset 12 av1_test.mkv

Comparing SVT-AV1 Presets: Size, Quality, and Speed with CRF Variations - OTTVerse

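  • preset 1 is far too slow; 2 to 12 are the meaningful values
  • 2 -> 12: 124x faster
  • 6 -> 12: 6x faster
  • With CRF >= 38, VMAF stays >= 95% regardless of preset
  • 2 -> 12: file size grows by 20%
  • 6 is the closest to 2 in quality, while being 20x faster
    • so only [6, 12] is worth considering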

FPS vs crf, preset

VMAF vs CRF, preset

Preset  CRF  Filesize (MB)  Bitrate (kbps)  PSNR (dB)  SSIM       VMAF    Time (sec)  FPS
2       38   32             26290.6         34.593     0.955      98.84   826.6       0.6
12      38   39             32693.4         32.99      0.936      95.90   6.1         81

Preset  CRF  Filesize (MB)  Bitrate (kbps)  PSNR (dB)  SSIM       VMAF    Time (sec)  FPS
2       26   75             62757.8         39.283     (missing)  99.709  974.094     0.5
4       26   75             62446.2         38.727     0.977      99.679  147.694     3.4
6       26   75             63023.0         38.439     0.975      99.565  53.14       9.4
8       26   81             67342.4         37.965     0.973      99.506  20.237      25
10      26   86             71827.7         37.677     0.97       99.446  10.524      48
12      26   92             77138.9         36.816     0.965      99.289  8.052       62

Audio

Encode/HighQualityAudio – FFmpeg

  • Avoid transcoding from one lossy format to another; copy the stream directly when possible.
    • Transcoding from a lossy format like MP3, AAC, Vorbis, Opus, WMA, etc. to the same or different lossy format might degrade the audio quality even if the bitrate stays the same (or higher).
  • libopus > libvorbis >= libfdk_aac > libmp3lame >= eac3/ac3 > aac > libtwolame > vorbis > mp2 > wmav2/wmav1
  • 128, 384 k
fyyuan@icarus4   Media ffmpeg -vn -i src.mkv -c:a aac -b:a 256k audio.mp3
Stream mapping:
  Stream #0:1 -> #0:0 (aac_latm (native) -> aac (native))
Press [q] to stop, [?] for help
[mp3 @ 0x564c9fc61340] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:0 --
[aac @ 0x564c9fc60900] Qavg: nan
Conversion failed!
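
The log above fails because AAC audio was requested for an .mp3 output, and the mp3 muxer only accepts MP3 audio. One way to avoid the mismatch is to pick the encoder from the output extension. This is a minimal sketch: the function name audio_codec_for and the mapping are illustrative, and libmp3lame and libopus are only available if ffmpeg was built with them.

```shell
# audio_codec_for FILE -> an audio encoder name matching the file extension
audio_codec_for() {
    case "$1" in
        *.mp3)       echo libmp3lame ;;
        *.m4a|*.aac) echo aac ;;
        *.opus)      echo libopus ;;
        *)           echo copy ;;    # unknown container: keep the original audio
    esac
}

audio_codec_for audio.mp3   # prints libmp3lame
```

It could then be used as `ffmpeg -i src.mkv -vn -c:a "$(audio_codec_for audio.mp3)" -b:a 256k audio.mp3`.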

Changing the frame rate

ffmpeg - Best way to convert a 59.94fps video to 29.97? - Stack Overflow

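[FFmpeg-user] framerate conversion with sync audio (narkive.com): the poster wants to turn a 24 fps video into 25 fps purely by speeding it up, without changing any frame content. Since the video speed changes, the audio speed has to change by the same factor.

  • One reply says -r is not very reliable and suggests -vf setpts=PTS*0.8 instead
  • Another reply suggests testing whether the asetpts filter works

The resulting command:

ffmpeg -r 25 -i input_24fps.mov -af asetrate=50000,aresample=48000 -c:v prores -profile:v 3 -c:a pcm_s24le output_25fps_resampledaudio.mov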
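
The asetrate=50000 value in that command follows from speeding the audio up by the same 25/24 factor as the video, assuming the source audio is sampled at 48000 Hz (an assumption suggested by aresample=48000, not stated in the thread). This is a small sketch of the arithmetic with hypothetical helper names, assuming awk is available.

```shell
# asetrate_for SAMPLE_RATE SRC_FPS DST_FPS -> value for asetrate so that the
# audio is sped up by the same factor as the video
asetrate_for() {
    awk -v sr="$1" -v src="$2" -v dst="$3" 'BEGIN { printf "%.0f\n", sr * dst / src }'
}

# setpts_factor SRC_FPS DST_FPS -> multiplier for setpts=PTS*FACTOR
setpts_factor() {
    awk -v src="$1" -v dst="$2" 'BEGIN { printf "%.2f\n", src / dst }'
}

asetrate_for 48000 24 25   # prints 50000
setpts_factor 24 25        # prints 0.96
```

By the same reasoning, a setpts-based speed-up from 24 to 25 fps would use a factor of 24/25 = 0.96.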

What is video timescale, timebase, or timestamp in ffmpeg? - Stack Overflow

  • Every frame carries a timestamp, the presentation timestamp (PTS), which says when the frame should be displayed.
  • For the sake of precise resolution of these time values, a timebase is used, for example units of 1/75 s.
  • The timescale is the reciprocal of the timebase. FFmpeg shows the timescale as the tbn value in the readout of a stream.
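
The relationship between a raw timestamp and seconds is a single multiplication by the timebase. This is a minimal sketch: the function name pts_to_seconds is hypothetical, the sample values are made up, and awk is assumed.

```shell
# pts_to_seconds PTS NUM DEN -> PTS * NUM / DEN, i.e. the time in seconds
# for a timestamp expressed in a NUM/DEN timebase
pts_to_seconds() {
    awk -v pts="$1" -v num="$2" -v den="$3" 'BEGIN { printf "%.3f\n", pts * num / den }'
}

pts_to_seconds 150 1 75        # prints 2.000
pts_to_seconds 90000 1 90000   # prints 1.000
```

With the 1/75 timebase from the list, a PTS of 150 is displayed at 2 seconds.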

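Subtitles

PGS subtitles (image-based)

Pros

  • Never render incorrectly because of missing fonts (unlike ASS)
  • Support subtitle effects
  • Unlike hardsubs, the subtitle stream can be switched

Cons

  • Some older TVs may not support PGS subtitles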

自己动手给视频添加PGS/SUP字幕_软件应用_什么值得买 (smzdm.com)

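Someone also asked how to burn PGS subtitles into the video as hardsubs: 如何将SUP字幕压制到电影中,成为硬字幕 - 字幕 - 音轨 - 国语视界 (cnlang.org)

With ffmpeg, something like this seems to work:

ffmpeg -i c:\xxxx.xxx -filter_complex "[0:v][0:s]overlay[v]" -map "[v]" -map 0:a:0 output.mp4

I wanted to try doing the encode with HandBrake.

[PSG] Subtitles forced burn-in : r/handbrake (reddit.com). It turns out that with an mp4 container, PGS subtitles are always burned in: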

If you're wanting an mp4 container, PGS is not supported and you should remove the track from your preset. If you don't manually remove the track, it forces an automatic "burn-in" behavior. If you switch to MKV you can use it normally; this is one of the main reasons I personally use mkv

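It turned out that ticking Burn In is enough, and that Forced Only must not be ticked. For mp4, sup subtitles are burned in by default; the option cannot be changed and does not need to be.

Note

The options include Foreign Track, and Foreign Track then first track. The Chinese UI mistranslates "foreign audio track" as "external audio track". This burn-in behavior controls which subtitle gets burned in when there are several; choosing None burns in nothing. Whatever is chosen, the selection can still be changed manually. Only one subtitle can be burned in. There is also a subtitle loading behavior setting, where several languages can be chosen so that their subtitles are loaded preferentially.

Text subtitles

ass

SRT

Compared with the previous screenshot, why does an ASS/SSA subtitle offer fewer options than an SRT subtitle? Because the ASS/SSA file already specifies the font, style, color, and size for every line, and the software follows those settings as closely as it can. SRT carries no such information, so we have to choose the font style and size (icon 3), the screen alignment and the left/right and top/bottom margins (icon 4), the font color (icon 5), the outline color and width (icon 6), and the shadow color, width, and opacity (icon 7). For SRT these are global settings that all subtitle text follows by default.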

extract

  • subrip: SubRip subtitle
  • srt: SubRip subtitle with embedded timing
ffmpeg -ss 44 -i City.Hunter.2024.1080p.NF.WEB-DL.DUAL.DDP5.1.Atmos.H.264-FLUX.mkv -t 180 -c copy -map 0:v:0 -map 0:a:0 -map 0:s:36 city_hunter.mkv -y

Selecting by language

ffmpeg -ss 44 -i City.Hunter.2024.1080p.NF.WEB-DL.DUAL.DDP5.1.Atmos.H.264-FLUX.mkv -t 180 -c copy -map 0:v:0 -map 0:a:0 -map 0:m:language:chi city_hunter.mkv

burn to video

➜  Disk3 ffmpeg -i city_hunter_nosub.mkv -vf subtitles=city_hunter.chi.srt output.mp4
ffmpeg version N-113445-ge0da916b8f Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
  configuration: --prefix=/home/yfy/ffmpeg_build --pkg-config-flags=--static --extra-cflags='-I/home/yfy/ffmpeg_build/include -march=native' --extra-ldflags=-L/home/yfy/ffmpeg_build/lib --extra-libs='-lpthread -lm' --ld=g++ --bindir=/home/yfy/bin --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libsvtav1 --enable-libdav1d --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree
  libavutil      58. 36.101 / 58. 36.101
  libavcodec     60. 38.100 / 60. 38.100
  libavformat    60. 20.100 / 60. 20.100
  libavdevice    60.  4.100 / 60.  4.100
  libavfilter     9. 17.100 /  9. 17.100
  libswscale      7.  6.100 /  7.  6.100
  libswresample   4. 13.100 /  4. 13.100
  libpostproc    57.  4.100 / 57.  4.100
Input #0, matroska,webm, from 'city_hunter_nosub.mkv':
  Metadata:
    ENCODER         : Lavf60.20.100
  Duration: 00:03:02.04, start: 0.000000, bitrate: 5779 kb/s
  Stream #0:0(jpn): Video: h264 (Main), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 24 fps, 24 tbr, 1k tbn (default) (original)
      Metadata:
        BPS             : 4791086
        NUMBER_OF_FRAMES: 149832
        NUMBER_OF_BYTES : 3738844388
        _STATISTICS_WRITING_APP: mkvmerge v83.0 ('Circle Of Friends') 64-bit
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        DURATION        : 00:03:02.041000000
  Stream #0:1(jpn): Audio: eac3 (Dolby Digital Plus + Dolby Atmos), 48000 Hz, 5.1(side), fltp, 768 kb/s (default) (original)
      Metadata:
        BPS             : 768000
        NUMBER_OF_FRAMES: 195093
        NUMBER_OF_BYTES : 599325696
        _STATISTICS_WRITING_APP: mkvmerge v83.0 ('Circle Of Friends') 64-bit
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        DURATION        : 00:03:02.000000000
[Parsed_subtitles_0 @ 0x5f46b3ffbd40] libass API version: 0x1502000
[Parsed_subtitles_0 @ 0x5f46b3ffbd40] libass source: tarball: 0.15.2
[Parsed_subtitles_0 @ 0x5f46b3ffbd40] Shaper: FriBidi 1.0.8 (SIMPLE) HarfBuzz-ng 2.7.4 (COMPLEX)
[Parsed_subtitles_0 @ 0x5f46b3ffbd40] Using font provider fontconfig
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (eac3 (native) -> aac (native))
Press [q] to stop, [?] for help
[aac @ 0x5f46b4dfd040] Using a PCE to encode channel layout "5.1(side)"
[Parsed_subtitles_0 @ 0x7c8ac4003640] libass API version: 0x1502000
[Parsed_subtitles_0 @ 0x7c8ac4003640] libass source: tarball: 0.15.2
[Parsed_subtitles_0 @ 0x7c8ac4003640] Shaper: FriBidi 1.0.8 (SIMPLE) HarfBuzz-ng 2.7.4 (COMPLEX)
[Parsed_subtitles_0 @ 0x7c8ac4003640] Using font provider fontconfig
[libx264 @ 0x5f46b3ff2380] using SAR=1/1
[libx264 @ 0x5f46b3ff2380] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x5f46b3ff2380] profile High, level 4.0, 4:2:0, 8-bit
[libx264 @ 0x5f46b3ff2380] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=24 lookahead_threads=4 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=24 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
  Metadata:
    encoder         : Lavf60.20.100
  Stream #0:0(jpn): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 24 fps, 12288 tbn (default) (original)
      Metadata:
        BPS             : 4791086
        NUMBER_OF_FRAMES: 149832
        NUMBER_OF_BYTES : 3738844388
        _STATISTICS_WRITING_APP: mkvmerge v83.0 ('Circle Of Friends') 64-bit
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        DURATION        : 00:03:02.041000000
        encoder         : Lavc60.38.100 libx264
      Side data:
        cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1(jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1(side), fltp, 394 kb/s (default) (original)
      Metadata:
        BPS             : 768000
        NUMBER_OF_FRAMES: 195093
        NUMBER_OF_BYTES : 599325696
        _STATISTICS_WRITING_APP: mkvmerge v83.0 ('Circle Of Friends') 64-bit
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        DURATION        : 00:03:02.000000000
        encoder         : Lavc60.38.100 aac
[Parsed_subtitles_0 @ 0x7c8ac4003640] fontselect: (Arial, 400, 0) -> /usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf, 0, LiberationSans
[Parsed_subtitles_0 @ 0x7c8ac4003640] Glyph 0x6211 not found, selecting one more font for (Arial, 400, 0)
[Parsed_subtitles_0 @ 0x7c8ac4003640] fontselect: (Arial, 400, 0) -> /usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc, 0, NotoSansCJKjp-Regular
[out#0/mp4 @ 0x5f46b3fdb400] video:69496kB audio:8768kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.155742%
frame= 4369 fps=108 q=-1.0 Lsize=   78386kB time=00:03:01.95 bitrate=3529.1kbits/s speed=4.51x
[libx264 @ 0x5f46b3ff2380] frame I:77    Avg QP:17.81  size: 86251
[libx264 @ 0x5f46b3ff2380] frame P:1394  Avg QP:21.37  size: 28008
[libx264 @ 0x5f46b3ff2380] frame B:2898  Avg QP:23.29  size:  8792
[libx264 @ 0x5f46b3ff2380] consecutive B-frames:  6.2% 10.5% 16.8% 66.5%
[libx264 @ 0x5f46b3ff2380] mb I  I16..4: 30.6% 49.3% 20.1%
[libx264 @ 0x5f46b3ff2380] mb P  I16..4:  6.3% 14.1%  1.9%  P16..4: 34.2%  7.2%  3.7%  0.0%  0.0%    skip:32.7%
[libx264 @ 0x5f46b3ff2380] mb B  I16..4:  0.6%  1.3%  0.2%  B16..8: 33.5%  2.2%  0.3%  direct: 1.7%  skip:60.1%  L0:43.8% L1:53.7% BI: 2.5%
[libx264 @ 0x5f46b3ff2380] 8x8 transform intra:60.4% inter:83.4%
[libx264 @ 0x5f46b3ff2380] coded y,uvDC,uvAC intra: 41.2% 53.9% 15.2% inter: 8.2% 13.8% 1.3%
[libx264 @ 0x5f46b3ff2380] i16 v,h,dc,p: 26% 29%  6% 39%
[libx264 @ 0x5f46b3ff2380] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 20% 20%  5%  6%  6%  6%  6%  5%
[libx264 @ 0x5f46b3ff2380] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 21% 12%  6%  8%  7%  8%  6%  6%
[libx264 @ 0x5f46b3ff2380] i8c dc,h,v,p: 50% 24% 21%  5%
[libx264 @ 0x5f46b3ff2380] Weighted P-Frames: Y:0.6% UV:0.6%
[libx264 @ 0x5f46b3ff2380] ref P L0: 65.0% 13.1% 15.1%  6.8%  0.0%
[libx264 @ 0x5f46b3ff2380] ref B L0: 87.5% 10.4%  2.1%
[libx264 @ 0x5f46b3ff2380] ref B L1: 96.7%  3.3%
[libx264 @ 0x5f46b3ff2380] kb/s:3127.34
[aac @ 0x5f46b4dfd040] Qavg: 419.197

Filter

concat

Concatenate – FFmpeg

There are two methods within ffmpeg that can be used to concatenate files of the same type:

  1. the concat demuxer
  2. the concat protocol

stereo3d

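For some reason this filter misbehaves for me. It does convert top-bottom (tb) to side-by-side (sbs), but it cannot produce the half-resolution variants. It is also very slow: in theory there is almost nothing to compute, only pixels to rearrange, yet it is far slower than scale. Finally, the converted video stutters on playback, and I do not know why.

So for sbs -> half sbs I recommend using scale instead. As for tb -> sbs, I have not found a good way yet.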

FFmpeg Filters Documentation

  • sbsl: side by side parallel (left eye left, right eye right)
  • sbs2l: side by side parallel with half width resolution (left eye left, right eye right)
  • abl: above-below (left eye above, right eye below)
    • tbl: same
  • ab2l, tb2l: above-below with half height resolution (left eye above, right eye below)

How to convert Top-and-Bottom 3d video to side-by-side 3d video with FFmpeg - Stack Overflow

ffmpeg -i top-and-bottom.mov -vf stereo3d=abl:sbsl -c:a copy side-by-side.mov

  • al: alternating frames (left eye first, right eye second)
  • ar: alternating frames (right eye first, left eye second)

scale

# Scale the video in `input.mp4` to 1280x720
ffmpeg -i input.mp4 -vf "scale=1280:720" output.mp4

# Scale the width to 1280 pixels and keep the aspect ratio (use -2 instead of -1 if the encoder needs an even height)
ffmpeg -i input.mp4 -vf "scale=1280:-1" output.mp4

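# Combine `scale` with `pad`: keep the original aspect ratio while fitting inside the given width and height.
# The video is scaled to fit the target resolution and the borders are filled with black, so the output has exactly the requested size.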
ffmpeg -i input.mp4 -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2" output.mp4

pad

my_ffmpeg/padding at main · afrum/my_ffmpeg (github.com)

ffmpeg -i input.mp4 -vf "pad=WIDTH:HEIGHT:(ow-iw)/2:(oh-ih)/2" -c:a copy output.mp4
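  • WIDTH and HEIGHT are the target resolution.
  • (ow-iw)/2 is the x offset, i.e. the padding on the left; the padding on the right follows from the output width. (ow-iw)/2 centers the video horizontally.
  • (oh-ih)/2 is the y offset, i.e. the padding on top.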

Because the ow and iw variables are available, the offsets can always be written as (ow-iw)/2:(oh-ih)/2, which covers every centered padding case.
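The arithmetic behind scale with force_original_aspect_ratio=decrease followed by a centered pad can be sketched in Python. This is only a rough model and not ffmpeg's implementation: the helper name fit_and_pad is made up for illustration, and the rounding (integer floor) is an assumption, so ffmpeg may differ by a pixel on odd sizes.

```python
def fit_and_pad(iw, ih, ow, oh):
    """Return (scaled_w, scaled_h, pad_x, pad_y) for fitting iw x ih into ow x oh.

    Hypothetical helper: models scale=ow:oh:force_original_aspect_ratio=decrease
    followed by pad=ow:oh:(ow-iw)/2:(oh-ih)/2. Floor rounding is an assumption.
    """
    # Compare ow/iw with oh/ih by cross-multiplying, to stay in integers.
    if ow * ih <= oh * iw:
        # Width is the limiting dimension: fill the width, derive the height.
        sw, sh = ow, ih * ow // iw
    else:
        # Height is the limiting dimension: fill the height, derive the width.
        sw, sh = iw * oh // ih, oh
    # Centered padding: half of the leftover space goes to the left / top.
    return sw, sh, (ow - sw) // 2, (oh - sh) // 2


# A 4:3 source placed in a 1280x720 frame gets black bars left and right.
print(fit_and_pad(1440, 1080, 1280, 720))
```

For the 4:3 example the scaled picture is 960x720 with 160 pixels of padding on the left, which is what (ow-iw)/2 evaluates to inside the pad filter.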

GPU hardware acceleration

Test video:

References:

List the supported H.264 encoders and decoders

ffmpeg -codecs|grep 264

DEV.LS h264                 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m h264_qsv h264_cuvid ) (encoders: libx264 libx264rgb h264_nvenc h264_omx h264_qsv h264_v4l2m2m h264_vaapi )
ffmpeg -hwaccels

VAAPI test

VAAPI needs the following packages

apt install va-driver-all mesa-va-drivers

Basic test, which checks things like device permissions

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128

Encode with VAAPI acceleration

ffmpeg -y -vaapi_device /dev/dri/renderD128 -i Big_Buck_Bunny_1080_10s_30MB.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi output.mp4

ffmpeg -y -i Big_Buck_Bunny_1080_10s_30MB.mp4 -c:v libx264 output.mp4

If the device also supports hardware decoding

  • -hwaccel vaapi selects hardware decoding
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD128 -i input.mp4 -c:v h264_vaapi output.mp4
  • On the N5105's UHD iGPU, transcoding Big Buck Bunny (1080p, 30 fps, 25 Mb/s) runs at about 50 fps
  • On the CPU the same transcode only reaches 13 fps

vainfo

error: can't connect to X server!
libva info: VA-API version 1.7.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_7
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.7 (libva 2.6.0)
vainfo: Driver version: Mesa Gallium driver 21.2.6 for AMD Radeon (TM) R7 M340 (ICELAND, DRM 3.42.0, 5.15.0-86-generic, LLVM 12.0.0)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc

n5105

root@n5105-pve ➜  test vainfo
error: can't connect to X server!
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.1 ()
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSliceLP
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSliceLP
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointEncSliceLP
      VAProfileVP9Profile1            : VAEntrypointVLD
      VAProfileVP9Profile1            : VAEntrypointEncSliceLP
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointEncSliceLP
      VAProfileVP9Profile3            : VAEntrypointVLD
      VAProfileVP9Profile3            : VAEntrypointEncSliceLP
      VAProfileHEVCMain422_10         : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_10         : VAEntrypointVLD
      VAProfileHEVCMain444_10         : VAEntrypointEncSliceLP

nvidia NVENC

FFMpeg + NVENC,显卡加成让视频编码快而不糙 - 沧域的小站 (icyu.me)

-cq                <float>      E..V....... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)

-qp                <int>        E..V....... Constant quantization parameter rate control method (from -1 to 51) (default -1)
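# Encode. The nvenc encoders do not use -crf; constant quality is set with -cq in VBR mode, and -b:v 0 lifts the bitrate cap
ffmpeg -i input.mp4 -c:v hevc_nvenc -rc vbr -cq 23 -b:v 0 -c:a copy output.mp4
# Hardware decode + encode
ffmpeg -hwaccel cuda -i input.mp4 -c:v hevc_nvenc -rc vbr -cq 23 -b:v 0 -c:a copy output.mp4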
ffmpeg -re -i input.mp4 -c:v hevc_nvenc -preset slow -profile:v main10 -rc vbr_hq -b:v 6000k -maxrate:v 9000k -bufsize:v 12000k -c:a aac -b:a 128k -f mpegts "srt://hostname:port?streamid=stream1&mode=listener"

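The source is a 25 GB 4K H.264 file. With the settings below, a 30 s clip comes out at 6.3 MB with very heavy blocking. 6.3 MB over 30 s is about 1.7 Mb/s, which suggests the encoder is running at its low default bitrate because no bitrate or quality target is set.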

ffmpeg -hwaccel cuda -ss 4:10 -t 30 -i a.mp4 -c:v hevc_nvenc -preset slow -c:a copy a.h265.gpu.mkv -y

cq is slower than qp: 0.68 vs 1.47

Timestamps

DTS: decoding timestamp

[matroska @ 0x55a111e20440] Non-monotonous DTS in output stream 0:1; previous: 565, current: 564; changing to 565. This may result in incorrect timestamps in the output file.
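The warning above shows an out-of-order DTS (564 arriving after 565) being rewritten to 565. A minimal Python sketch of that kind of clamping follows. The name fix_dts and the strict flag are illustrative assumptions, not ffmpeg API; the non-strict case simply reproduces what the log line reports.

```python
def fix_dts(dts_values, strict=False):
    """Force a DTS sequence to be monotonic, as the warning describes.

    Assumption: with strict=False a repeated DTS is tolerated (the case in
    the log, where 564 becomes 565); with strict=True each DTS must grow.
    """
    fixed = []
    for dts in dts_values:
        if fixed:
            floor = fixed[-1] + (1 if strict else 0)
            if dts < floor:
                dts = floor  # corresponds to "changing to ..." in the warning
        fixed.append(dts)
    return fixed


print(fix_dts([563, 565, 564, 566]))
```

The rewritten packet keeps its payload but no longer carries its original timestamp, which is why the warning says the output timestamps may be incorrect.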

Options

-copyts

Do not process input timestamps, but keep their values without trying to sanitize them. In particular, do not remove the initial start time offset value.

Note that, depending on the vsync option or on specific muxer processing (e.g. in case the format option avoid_negative_ts is enabled) the output timestamps may mismatch with the input timestamps even when this option is selected.

-start_at_zero

When used with copyts, shift input timestamps so they start at zero.

This means that using e.g. -ss 50 will make output timestamps start at 50 seconds, regardless of what timestamp the input file started at.

-copytb mode

Specify how to set the encoder timebase when stream copying. mode is an integer numeric value, and can assume one of the following values:

1

Use the demuxer timebase.

The time base is copied to the output encoder from the corresponding input demuxer. This is sometimes required to avoid non monotonically increasing timestamps when copying video streams with variable frame rate.

0

Use the decoder timebase.

The time base is copied to the output encoder from the corresponding input decoder.

-1

Try to make the choice automatically, in order to generate a sane output.

Default value is -1.

By default ffmpeg resets output timestamps to start at 0

ffmpeg demux into audio and video resets PTS - Stack Overflow

  • Does ffmpeg reset the timestamps when it splits out the video?

Build a video whose start time is 1 h

ffmpeg -i bbb_sunflower_1080p_30fps_normal.mp4 -c copy -output_ts_offset 3600 test.mkv

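Inspecting the file confirms start_time = 3600

  • In mpv the progress bar still starts at 0
  • In the built-in Windows video player the progress bar does start at 1 h
  • The duration shown in the file manager has also grown by 1 h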
Input #0, matroska,webm, from 'test.mkv':
  Metadata:
    title           : Big Buck Bunny, Sunflower version
    GENRE           : Animation
    MAJOR_BRAND     : isom
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: isomavc1
    COMPOSER        : Sacha Goedegebure
    ARTIST          : Blender Foundation 2008, Janus Bager Kristensen 2013
    COMMENT         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    ENCODER         : Lavf58.76.100
  Duration: 01:10:34.60, start: 3600.000000, bitrate: 497 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Video Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 01:10:34.600000000
  Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 320 kb/s (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Audio Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 01:10:34.144000000
start_pts=3600067
start_time=3600.067000
start_pts=3600000
start_time=3600.000000
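The start_pts and start_time pairs above are the same instant in two units: start_pts counts ticks of the stream time base (1k tbn here, so one tick is a millisecond) and start_time is that count in seconds. A small Python sketch of the conversion, where pts_to_seconds is an illustrative name:

```python
from fractions import Fraction


def pts_to_seconds(pts, ticks_per_second):
    """Convert a PTS in time-base ticks to seconds, without rounding error."""
    return Fraction(pts, ticks_per_second)


# Values taken from the output above; both streams use a 1k time base.
first = pts_to_seconds(3600067, 1000)
second = pts_to_seconds(3600000, 1000)
print(float(first), float(second))
```

With a different time base the same PTS means a different time, so start_pts values from different containers should only be compared after converting to seconds.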

Copy it with ffmpeg

ffmpeg -i test.mkv -c copy test_copy.mkv

The start has indeed been reset to 0:

Input #0, matroska,webm, from 'test_copy.mkv':
  Metadata:
    title           : Big Buck Bunny, Sunflower version
    COMMENT         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    GENRE           : Animation
    MAJOR_BRAND     : isom
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: isomavc1
    COMPOSER        : Sacha Goedegebure
    ARTIST          : Blender Foundation 2008, Janus Bager Kristensen 2013
    ENCODER         : Lavf58.76.100
  Duration: 00:10:34.60, start: 0.000000, bitrate: 3321 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Video Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:10:34.600000000
  Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 320 kb/s (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Audio Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:10:34.144000000
start_pts=67
start_time=0.067000
start_pts=0
start_time=0.000000

Merging video and audio with ffmpeg goes wrong by default

Split the file into a video file and an audio file

ffmpeg -i bbb_sunflower_1080p_30fps_normal.mkv -vn -c copy a.m4a -an -c copy a.mkv

Trim and shift the audio (cut off the first 1.7 s and shift it right by 1.7 s, so it is still in sync relative to the base)

ffmpeg -ss 1.7 -itsoffset 1.7 -i a.m4a -c copy b.m4a
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'b.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 512
    compatible_brands: M4A isomiso2
    title           : Big Buck Bunny, Sunflower version
    artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
    composer        : Sacha Goedegebure
    encoder         : Lavf58.76.100
    comment         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    genre           : Animation
  Duration: 00:10:34.14, start: 1.696000, bitrate: 319 kb/s
  Stream #0:0(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side), fltp, 320 kb/s (default)
    Metadata:
      handler_name    : GPAC ISO Audio Handler
      vendor_id       : [0][0][0][0]
    Side data:
      audio service type: main
start_pts=81408
start_time=1.696000

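Merge the original video with the audio: ffmpeg -i a.mkv -i b.m4a -c copy merge.mkv

  • ffmpeg simply ignores the input timestamps, treats each input as starting at 0, and merges them. As a result the audio plays earlier than it should.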
Input #0, matroska,webm, from 'merge.mkv':
  Metadata:
    title           : Big Buck Bunny, Sunflower version
    ARTIST          : Blender Foundation 2008, Janus Bager Kristensen 2013
    COMMENT         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    GENRE           : Animation
    MAJOR_BRAND     : isom
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: isomavc1
    COMPOSER        : Sacha Goedegebure
    ENCODER         : Lavf58.76.100
  Duration: 00:10:34.53, start: 0.000000, bitrate: 3321 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Video Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:10:34.533000000
  Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 320 kb/s (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Audio Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:10:32.448000000
start_pts=0
start_time=0.000000
start_pts=0
start_time=0.000000

Correct merge: ffmpeg -i a.mkv -i b.m4a -copyts -c copy merge2.mkv

Input #0, matroska,webm, from 'merge2.mkv':
  Metadata:
    title           : Big Buck Bunny, Sunflower version
    ARTIST          : Blender Foundation 2008, Janus Bager Kristensen 2013
    COMMENT         : Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
    GENRE           : Animation
    MAJOR_BRAND     : isom
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: isomavc1
    COMPOSER        : Sacha Goedegebure
    ENCODER         : Lavf58.76.100
  Duration: 00:10:34.60, start: 0.067000, bitrate: 3320 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Video Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:10:34.600000000
  Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 320 kb/s (default)
    Metadata:
      HANDLER_NAME    : GPAC ISO Audio Handler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:10:34.144000000
start_pts=67
start_time=0.067000
start_pts=1696
start_time=1.696000
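The two merges can be compared with a simplified model. The assumption, taken from the observed outputs rather than from ffmpeg's source, is that without -copyts each input is shifted so that it begins at 0, while with -copyts the input values are kept. The function names are illustrative.

```python
def output_starts(input_starts_ms, copyts):
    """Start timestamp (ms) of each input after muxing, in a simplified model.

    Assumption: without -copyts every input is shifted to begin at 0;
    with -copyts the original values are kept.
    """
    if copyts:
        return list(input_starts_ms)
    return [0 for _ in input_starts_ms]


def audio_lead_ms(input_starts_ms, copyts):
    """How much earlier the audio plays than it should (video first, audio second)."""
    video_in, audio_in = input_starts_ms
    video_out, audio_out = output_starts(input_starts_ms, copyts)
    return (audio_in - video_in) - (audio_out - video_out)


# a.mkv starts at 67 ms, b.m4a at 1696 ms (values from the output above).
print(audio_lead_ms([67, 1696], copyts=False))
print(audio_lead_ms([67, 1696], copyts=True))
```

In this model the default merge loses the 1629 ms gap between the two inputs, and keeping the timestamps preserves it, matching the start_pts values of merge.mkv and merge2.mkv.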

Delaying one stream

Forcing a 0.613 s delay onto the audio puts the streams out of sync

(base) fyyuan@icarus4 ➜  reset_pts ffmpeg -i ../src.mkv -itsoffset 0.613 -i ../src.mkv -map 0:0 -map 1:1 -c copy -shortest output_video_adjusted.mkv

Input #0, matroska,webm, from 'output_video_adjusted.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.15, start: 0.613000, bitrate: 24908 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      DURATION        : 00:08:41.149000000
  Stream #0:1: Audio: aac_latm (LC) ([2][22][0][0] / 0x1602), 48000 Hz, 5.1, fltp (default)
    Metadata:
      DURATION        : 00:08:41.018000000
start_pts=613
start_time=0.613000
start_pts=613
start_time=0.613000

The Bilibili problem

Why does seeking to 0.638 s make the video start at 4.1 s?

The symptom is that the video is 4 s shorter, and in mpv playback starts directly at 4 s

Input #0, matroska,webm, from 'src.hevc.aac.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.18, start: 0.000000, bitrate: 11146 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      ENCODER         : Lavc58.134.100 libx265
      DURATION        : 00:08:41.175000000
  Stream #0:1: Audio: aac (LC), 48000 Hz, 5.1, fltp (default)
    Metadata:
      ENCODER         : Lavc58.134.100 aac
      DURATION        : 00:08:41.023000000
start_pts=638
start_time=0.638000
start_pts=0
start_time=0.000000
ffmpeg -i src.hevc.aac.mkv -ss 0.638 -c copy cut.hevc.aac.ss638.mkv
Input #0, matroska,webm, from 'cut.hevc.aac.ss638.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:40.54, start: 0.002000, bitrate: 10998 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      ENCODER         : Lavc58.134.100 libx265
      DURATION        : 00:08:40.536000000
  Stream #0:1: Audio: aac (LC), 48000 Hz, 5.1, fltp (default)
    Metadata:
      ENCODER         : Lavc58.134.100 aac
      DURATION        : 00:08:40.385000000
start_pts=4171
start_time=4.171000
start_pts=2
start_time=0.002000

I suspect the seek: it landed on the next keyframe, so everything before that keyframe was lost (even though ChatGPT told me that a seek goes to the previous keyframe)
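If that suspicion is right, a small seek can cost several seconds of video, because a stream-copied output can only begin on a keyframe. The Python sketch below only encodes this hypothesis; the keyframe positions are invented for illustration, and the function name is not an ffmpeg concept.

```python
import bisect


def first_kept_keyframe(keyframes_ms, seek_ms):
    """Return the first keyframe at or after seek_ms, or None if none is left.

    keyframes_ms must be sorted. This encodes the suspicion from the text,
    not confirmed ffmpeg behaviour.
    """
    i = bisect.bisect_left(keyframes_ms, seek_ms)
    return keyframes_ms[i] if i < len(keyframes_ms) else None


# Hypothetical keyframe positions (ms); the real file's GOP layout is unknown.
keyframes = [0, 4000, 8000, 12000]
print(first_kept_keyframe(keyframes, 638))
```

The audio has no such restriction, so it can start almost exactly at the seek point while the video waits for its next keyframe.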

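The real question seems to be why the video and audio durations of my own file differ so little, while after Bilibili's transcode they differ by 0.5 s

Per-part version: source file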

Input #0, matroska,webm, from 'p09.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.15, start: 0.000000, bitrate: 24909 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      DURATION        : 00:08:41.149000000
  Stream #0:1: Audio: aac_latm (LC) ([2][22][0][0] / 0x1602), 48000 Hz, 5.1, fltp (default)
    Metadata:
      DURATION        : 00:08:41.002000000
start_pts=613
start_time=0.613000
start_pts=0
start_time=0.000000

Downloaded from Bilibili

➜  yt-dlp ./yt-dlp https://www.bilibili.com/video/BV1dG411z7sc -F -I 9
[BiliBili] Extracting URL: https://www.bilibili.com/video/BV1dG411z7sc
[BiliBili] 1dG411z7sc: Downloading webpage
[BiliBili] BV1dG411z7sc: Extracting videos in anthology
[BiliBili] Downloading playlist BV1dG411z7sc - add --no-playlist to download just the video BV1dG411z7sc
[download] Downloading playlist: 【第73回NHK紅白歌合戦 BS4K】节选
[BiliBili] Playlist 【第73回NHK紅白歌合戦 BS4K】节选: Downloading 1 items
[download] Downloading item 1 of 1
[BiliBili] Extracting URL: https://www.bilibili.com/video/BV1dG411z7sc?p=9
[BiliBili] 1dG411z7sc: Downloading webpage
[BiliBili] BV1dG411z7sc: Extracting videos in anthology
[BiliBili] 410742370: Extracting chapters
[info] Available formats for BV1dG411z7sc_p9:
ID    EXT RESOLUTION FPS │   FILESIZE    TBR PROTO │ VCODEC         VBR ACODEC      ABR
───────────────────────────────────────────────────────────────────────────────────────
30216 m4a audio only     │ ≈  2.51MiB    39k https │ audio only         mp4a.40.5   39k
30232 m4a audio only     │ ≈  5.88MiB    92k https │ audio only         mp4a.40.2   92k
30280 m4a audio only     │ ≈  9.87MiB   155k https │ audio only         mp4a.40.2  155k
30016 mp4 640x360     30 │ ≈ 23.35MiB   367k https │ avc1.64001E   367k video only
30032 mp4 852x480     30 │ ≈ 52.44MiB   824k https │ avc1.64001F   824k video only
30064 mp4 1280x720    30 │ ≈116.21MiB  1827k https │ avc1.640028  1827k video only
30080 mp4 1920x1080   30 │ ≈173.36MiB  2725k https │ avc1.640032  2725k video only
30116 mp4 1920x1080   59 │ ≈346.03MiB  5439k https │ avc1.640032  5439k video only
30120 mp4 3840x2160   59 │ ≈  1.16GiB 18644k https │ avc1.640034 18644k video only
[download] Finished downloading playlist: 【第73回NHK紅白歌合戦 BS4K】节选

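(This is a yt-dlp bug: the merge needs -copyts, otherwise the input timestamps are dropped. However, when I merged manually the result still seemed out of sync, this time with the audio ahead.)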

Input #0, matroska,webm, from 'bili-p09.mkv':
  Metadata:
    DESCRIPTION     : Packed by Bilibili XCoder v2.0.2
    MAJOR_BRAND     : iso5
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: avc1iso5dsmsmsixdash
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.05, start: 0.000000, bitrate: 18779 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
    Metadata:
      HANDLER_NAME    : VideoHandler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:08:40.539000000
  Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp (default)
    Metadata:
      HANDLER_NAME    : SoundHandler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:08:41.045000000
start_pts=0
start_time=0.000000
start_pts=0
start_time=0.000000

Merged manually (isn't the 1.226 here exactly twice 0.613?) (and indeed, after delaying the audio by +600 ms it is well in sync)

Input #0, matroska,webm, from 'p9.mkv':
  Metadata:
    DESCRIPTION     : Packed by Bilibili XCoder v2.0.2
    MAJOR_BRAND     : iso5
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: avc1iso5dsmsmsixdash
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.77, start: 0.000000, bitrate: 18753 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
    Metadata:
      HANDLER_NAME    : VideoHandler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:08:41.765000000
  Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp (default)
    Metadata:
      HANDLER_NAME    : SoundHandler
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:08:41.045000000
start_pts=1226
start_time=1.226000
start_pts=0
start_time=0.000000

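The reason is that the duration here includes start_pts, so in the source file the video content is actually about 0.5 s shorter than the audio

Using setpts reproduces the larger duration gap. (I later found out that setpts is a filter and therefore forces a re-encode, which is slow. The input PTS offset can instead be set directly on the input video, and that approach is used for the fix further down.)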

 ffmpeg -i src.mkv -filter_complex "[0:v]setpts=PTS-STARTPTS[v];[0:a]asetpts=PTS-STARTPTS[a]" -map "[v]" -map "[a]" setpts.mkv
Input #0, matroska,webm, from 'setpts.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.01, start: 0.000000, bitrate: 20558 kb/s
  Stream #0:0: Video: h264 (High 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
    Metadata:
      ENCODER         : Lavc58.134.100 libx264
      DURATION        : 00:08:40.540000000
  Stream #0:1: Audio: vorbis, 48000 Hz, 5.1, fltp (default)
    Metadata:
      ENCODER         : Lavc58.134.100 libvorbis
      DURATION        : 00:08:41.006000000
start_pts=0
start_time=0.000000
start_pts=0
start_time=0.000000

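Guessing at the cause

The video stream of the file I uploaded to Bilibili has a first-frame PTS of 0.613 s, which is almost exactly the amount of audio/video desync seen on Bilibili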

Input #0, matroska,webm, from 'src.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.15, start: 0.000000, bitrate: 24909 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      DURATION        : 00:08:41.149000000
  Stream #0:1: Audio: aac_latm (LC) ([2][22][0][0] / 0x1602), 48000 Hz, 5.1, fltp (default)
    Metadata:
      DURATION        : 00:08:41.002000000
start_pts=613
start_time=0.613000
start_pts=0
start_time=0.000000

After many transcoding experiments I also found that transcoding does not reset this start_pts; it is preserved almost exactly

src.hevc.aac.60fps.mkv  start_pts=638
src.hevc.aac.mkv  start_pts=638
src.avc.mkv start_pts=638

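start_pts is the time at which the first frame of the video should be displayed. If I reset this PTS to 0, the video runs ahead of the audio (assuming audio and video were in sync with the original PTS).

When yt-dlp downloads a Bilibili video, the video and the audio are downloaded separately and merged into one file at the end. I looked at several videos downloaded with yt-dlp and start_pts was 0 in all of them. So my guesses were:

  • Bilibili resets the PTS to 0 when transcoding (applying a filter such as [0:v]setpts=PTS-STARTPTS[v] to every frame)
  • I later found that yt-dlp drops the original timestamps when merging and treats both streams as starting at start_pts=0. The video and audio files on Bilibili do carry an offset
  • But even a manual merge with -copyts is still out of sync, and this time the audio is ahead. UPDATE: yt-dlp does clear the timestamps. The problem here, though, is that Bilibili also seems to ignore the timestamps, which causes the desync

As for why the PTS was 613 rather than simply 0 when I first cut the 8-minute clip with -ss: I suspect that with stream copy the seek can only land on the first keyframe after the position (not on the keyframe before it, as ChatGPT told me)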

Misc

The original video turns out to have two audio streams

Input #0, mpegts, from '2022-12-31 第73回NHK 紅白歌合戦 NHK BS4K.m2ts':
  Duration: 04:25:01.42, start: 61.462067, bitrate: 26406 kb/s
  Program 101
  Stream #0:0[0x1011]: Video: hevc (Main 10) (HESE / 0x45534548), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 90k tbn, 59.94 tbc
  Stream #0:1[0x1100]: Audio: aac_latm (LC) ([17][0][0][0] / 0x0011), 48000 Hz, stereo, fltp
  Stream #0:2[0x1c00]: Data: bin_data ([6][0][0][0] / 0x0006)
  Stream #0:3[0x1c01]: Data: bin_data ([6][0][0][0] / 0x0006)
  Stream #0:4[0x1101]: Audio: aac_latm (LC) ([17][0][0][0] / 0x0011), 48000 Hz, stereo, fltp
Unsupported codec with id 100359 for input stream 2
Unsupported codec with id 100359 for input stream 3
start_pts=5577842
start_time=61.976022
start_pts=5531586
start_time=61.462067
start_pts=5531586
start_time=61.462067
start_pts=5531586
start_time=61.462067
start_pts=5589186
start_time=62.102067

YouTube is fine

How exactly does yt-dlp align the timestamps when it merges video and audio?

Input #0, mpegts, from 'youtube.f312.mp4':
  Duration: 00:08:41.15, start: 0.016678, bitrate: 5441 kb/s
  Program 1
  Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 90k tbn, 119.88 tbc
start_pts=1501
start_time=0.016678

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'youtube.f380.m4a':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2024-01-02T23:44:20.000000Z
  Duration: 00:08:41.15, start: 0.000000, bitrate: 384 kb/s
  Stream #0:0(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side), fltp, 384 kb/s (default)
    Metadata:
      creation_time   : 2024-01-02T23:44:20.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
    Side data:
      audio service type: main
start_pts=0
start_time=0.000000

Input #0, matroska,webm, from 'youtube.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.15, start: 0.000000, bitrate: 5661 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
    Metadata:
      DURATION        : 00:08:41.154000000
  Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s (default)
    Metadata:
      HANDLER_NAME    : ISO Media file produced by Google Inc.
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:08:41.152000000
start_pts=0
start_time=0.000000
start_pts=0
start_time=0.000000

Manual merge with -copyts

Input #0, matroska,webm, from 'youtube-hand-merge2.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:41.17, start: 0.000000, bitrate: 5661 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 119.88 tbc (default)
    Metadata:
      DURATION        : 00:08:41.171000000
  Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s (default)
    Metadata:
      HANDLER_NAME    : ISO Media file produced by Google Inc.
      VENDOR_ID       : [0][0][0][0]
      DURATION        : 00:08:41.152000000
start_pts=17
start_time=0.017000
start_pts=0
start_time=0.000000

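Setting input and output PTS

Trying cut + reset offset

  • The key point is that -itsoffset adds an offset to all streams of the input file it precedes
  • while -output_ts_offset adds a further offset on the output side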
ffmpeg -vn -ss 0.613 -itsoffset 0.613 -i ../src.mkv -an -i ../src.mkv -c copy -map 0:a -map 1:v -output_ts_offset -0.613 output_combined.mkv


Input #0, matroska,webm, from 'output_combined.mkv':
  Metadata:
    ENCODER         : Lavf58.76.100
  Duration: 00:08:40.59, start: 0.000000, bitrate: 24935 kb/s
  Stream #0:0: Audio: aac_latm (LC) ([2][22][0][0] / 0x1602), 48000 Hz, 5.1, fltp (default)
    Metadata:
      DURATION        : 00:08:40.438000000
  Stream #0:1: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/bt2020-10), 3840x2160 [SAR 1:1 DAR 16:9], 59.94 fps, 59.94 tbr, 1k tbn, 59.94 tbc (default)
    Metadata:
      DURATION        : 00:08:40.585000000
start_pts=55
start_time=0.055000
start_pts=0
start_time=0.000000

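The file duration is unreliable because it includes start_pts

  • This explains why the durations differ so much in the file downloaded from Bilibili, once start_pts has become 0
  • So for both the stream and the file, is the duration effectively the largest end timestamp?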
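Under the reading that each DURATION tag is an end timestamp, the real length of a stream is the tag minus its start_pts. The Python sketch below applies this to the p09.mkv values shown earlier. The helper names are illustrative, and the interpretation itself is the assumption being tested.

```python
def hms_to_ms(text):
    """Convert an HH:MM:SS.mmm string, as printed in the DURATION tags, to ms."""
    hours, minutes, seconds = text.split(":")
    return (int(hours) * 60 + int(minutes)) * 60_000 + round(float(seconds) * 1000)


def content_ms(duration_tag, start_pts_ms):
    """Length of the actual content, assuming DURATION is an end timestamp."""
    return hms_to_ms(duration_tag) - start_pts_ms


video = content_ms("00:08:41.149", 613)  # p09.mkv video stream
audio = content_ms("00:08:41.002", 0)    # p09.mkv audio stream
print(video, audio, audio - video)
```

The video comes out at 520536 ms, close to the 00:08:40.539 video duration reported for the Bilibili download, and the gap to the audio is roughly the 0.5 s noted above.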