Condenser makes it easy to extract speech audio from video files based on subtitle timings. Drag a video, or a folder of videos, onto Condenser and it creates an audio file for each video. The file contains only the audio that corresponds to subtitle lines (optionally with some padding), and the rest is discarded.
It is mainly intended for passive immersion in language learning, where you listen to the audio of something you have already watched. Omitting the audio outside of speech increases the amount of language you are exposed to per second, i.e., the audio’s “language density”. It also saves disk space.
To use it on other operating systems, you can get the source code and run it directly with Python.
How to use - Simple
Unpack the archive to a folder and create a shortcut to condenser.exe if you want.
Drag and drop a video (or audio file), or a folder with videos, onto the executable (or its shortcut).
If anything is ambiguous, a pop-up may ask you to make a choice. Answer accordingly.
When the processing is done, an audio file with the name “[video_name]_con.mp3” will be created next to the video.
How to use - Detailed
Unpack the archive to a folder and create a shortcut to condenser.exe if you want.
Single file:
Drag and drop a video (or audio file) onto the executable (or its shortcut), or double-click the executable and choose “Video” to select a file.
If the video has no embedded subtitles, it will look for a subtitle file of the same name in the same directory.
If it cannot find a suitable subtitle file either, it will ask you to select a subtitle file.
If there are multiple audio streams in the video file, it will ask you to pick one.
When the processing is done, an audio file with the name “[video_name]_con.mp3” will be created next to the video.
If an error occurs, the error message is written to a log.txt file in the executable’s directory.
Folder:
Drag and drop a folder with videos onto the executable (or its shortcut), or double-click the executable and choose “Folder” to select a folder.
All the video files in the folder must have the same set of audio and subtitle streams for this to work.
If the videos don’t have embedded subtitles, it will look for a subtitle file of the same name for each video file.
If it cannot find suitable subtitle files either, the program will exit (it won’t ask you to select a subtitle for every single video).
If there are multiple audio streams in the video files, it will ask you to pick one.
When the processing is done, a folder named “[folder_name]_con” will be created next to the input folder, containing an mp3 file named after each video.
If an error occurs, the error message is written to a log.txt file in the executable’s directory.
Config
You can change some settings in config.json:
“padding” is the amount of time added to the beginning and end of each subtitle period before extraction. The default is 500 ms, which works well. Too short a padding may slow down processing: the program merges overlapping periods before extracting audio, so less padding leaves more separate segments to extract. It may also leave too little time around each line to pick up its context, making the audio less comprehensible.
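The padding-and-merging step can be illustrated with a minimal sketch. It assumes each subtitle period is a (start, end) pair in milliseconds; the function name pad_and_merge is hypothetical and this is not Condenser’s actual code.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in milliseconds


def pad_and_merge(periods: List[Interval], padding_ms: float = 500) -> List[Interval]:
    """Widen each subtitle period by padding_ms on both sides, then merge overlaps."""
    # Pad every period, clamping the start at 0 so it never becomes negative.
    padded = sorted((max(0.0, start - padding_ms), end + padding_ms) for start, end in periods)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With the default 500 ms, two lines spanning 1000–2000 ms and 2500–4000 ms collapse into one segment, 500–4500 ms. With 0 ms of padding they stay two separate segments, which is why very short padding tends to mean more pieces to extract.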
“ask_when_multiple_srt” is false by default. This means that if a video file has multiple embedded subtitles, the program picks the default (first) one. This is normally not a problem, but some videos have an unusual subtitle track in the first position, such as “commentary” or “songs only”. In that case, set this option to true and the program will ask which subtitle to use.
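The choice this option controls can be sketched as below. This is a minimal illustration that assumes subtitle streams are represented by their titles and that the prompt is a callback returning an index. pick_subtitle_stream and ask are hypothetical names, not part of Condenser.

```python
from typing import Callable, List, Optional


def pick_subtitle_stream(
    streams: List[str],
    ask_when_multiple_srt: bool,
    ask: Callable[[List[str]], int],
) -> Optional[str]:
    """Return the subtitle stream to use, or None if the video has none."""
    if not streams:
        return None
    if len(streams) > 1 and ask_when_multiple_srt:
        # Let the user choose, e.g. to skip a "commentary" or "songs only" track.
        return streams[ask(streams)]
    # Default behaviour: take the first (default) embedded subtitle.
    return streams[0]
```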
“filtered_characters” is a set of characters to filter. If a subtitle line consists only of these characters, that line is ignored in the output.
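A line check of this kind might look like the following sketch. It assumes whitespace is ignored and that a line left empty counts as filtered. The function name is_filtered_line and the example character set “♪～” are hypothetical and do not reflect Condenser’s defaults.

```python
def is_filtered_line(text: str, filtered_characters: str) -> bool:
    """True if the line contains nothing but filtered characters (whitespace ignored)."""
    allowed = set(filtered_characters)
    # Assumption: spaces and line breaks do not count as content.
    stripped = "".join(text.split())
    # An empty (or whitespace-only) line also counts as filtered here.
    return all(ch in allowed for ch in stripped)
```

For example, with the characters “♪～”, a line such as “♪ ♪” would be dropped, while “♪ Hello ♪” would be kept.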
“filter_parentheses” is true by default, which means it ignores subtitle lines that are completely enclosed in parentheses (including brackets and curly braces).
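“Completely enclosed” can be checked by making sure the opening bracket is closed only at the very end of the line. Without this check, a line like “(a) b (c)” would be wrongly treated as one enclosed line. The sketch below assumes this interpretation. The function name and the full-width pairs “（）” and “【】” are assumptions loosely based on the v1.4.0 change log entry about Japanese parentheses.

```python
# Opening -> closing pairs; the full-width pairs are an assumption.
PAIRS = {"(": ")", "[": "]", "{": "}", "（": "）", "【": "】"}


def is_fully_parenthesized(text: str) -> bool:
    s = text.strip()
    if len(s) < 2 or s[0] not in PAIRS or s[-1] != PAIRS[s[0]]:
        return False
    opening, closing = s[0], PAIRS[s[0]]
    depth = 0
    for i, ch in enumerate(s):
        if ch == opening:
            depth += 1
        elif ch == closing:
            depth -= 1
            # The first bracket closed before the end: the line is not fully enclosed.
            if depth == 0 and i != len(s) - 1:
                return False
    return True
```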
“output_format” is “mp3” by default, but any output format supported by ffmpeg can be used. Notable options are “flac”, which is lossless, and “aac”, which generally gives higher audio quality than mp3 at the same file size. For a complete list, see ffmpeg’s documentation.
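One way to turn merged intervals into a single ffmpeg call is sketched below. It uses ffmpeg’s aselect filter to keep only the samples inside the intervals and asetpts to make the kept pieces play back to back. This is an assumed approach for illustration and may differ from how Condenser actually extracts audio. build_ffmpeg_args and its parameters are hypothetical. Note that aselect works on whole audio frames, so cut points are approximate at frame granularity.

```python
from typing import List, Tuple


def build_ffmpeg_args(
    ffmpeg: str,
    video_path: str,
    intervals_ms: List[Tuple[float, float]],
    output_stem: str,
    output_format: str = "mp3",
    audio_stream: int = 0,
) -> List[str]:
    # Keep only frames whose timestamp t (in seconds) lies inside one of the intervals.
    select = "+".join(f"between(t,{s / 1000:.3f},{e / 1000:.3f})" for s, e in intervals_ms)
    # Single quotes protect the commas inside between() from the filtergraph parser;
    # asetpts=N/SR/TB rebuilds contiguous timestamps for the kept audio.
    audio_filter = f"aselect='{select}',asetpts=N/SR/TB"
    return [
        ffmpeg, "-y",
        "-i", video_path,
        "-map", f"0:a:{audio_stream}",  # only the chosen audio stream
        "-af", audio_filter,
        # ffmpeg picks the output format from the file extension.
        f"{output_stem}.{output_format}",
    ]
```

The resulting list can be passed to subprocess.run without a shell, so no extra shell quoting is needed.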
“sub_suffix” is empty by default. If your external subtitle files have a suffix (e.g. “[video_name]_retimed.srt” or “[video_name]_en.srt”), you can set it here (e.g. “_retimed” or “_en”) so that the program can find them automatically.
“fixed_output_dir” is null by default. You can set it to a path string (e.g. “C:/Users/[user_name]/Condensed Audio”) to save output files in that directory instead of next to the input.
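A lookup for “[video_name][sub_suffix].<ext>” next to the video could work like the sketch below. The list of subtitle extensions and the function name find_external_subtitle are assumptions for illustration, not Condenser’s actual code.

```python
from pathlib import Path
from typing import Optional, Sequence

# Extensions to try, in order; this list is an assumption.
SUB_EXTENSIONS: Sequence[str] = (".srt", ".ass", ".ssa", ".vtt")


def find_external_subtitle(video: Path, sub_suffix: str = "") -> Optional[Path]:
    """Look for "[video_name][sub_suffix].<ext>" in the video's directory."""
    for ext in SUB_EXTENSIONS:
        candidate = video.with_name(f"{video.stem}{sub_suffix}{ext}")
        if candidate.is_file():
            return candidate
    return None
```

With sub_suffix set to “_en”, a video “ep01.mkv” would be matched with “ep01_en.srt” in the same folder.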
“fixed_output_dir_with_subfolders” is true by default. If you set “fixed_output_dir” and this option is true, the program will create a “_con” subfolder inside the fixed directory when the input is a folder. If it is false, the output files are saved directly in the fixed directory.
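The naming rules from the “How to use” section and the two fixed-output options can be combined into a single path function. This is a minimal sketch with a hypothetical name, output_path. It assumes the subfolder in the fixed directory uses the same “[folder_name]_con” name as in folder mode, and that files in folder mode keep the video’s name without a “_con” suffix.

```python
from pathlib import Path
from typing import Optional


def output_path(
    video: Path,
    output_format: str = "mp3",
    input_folder: Optional[Path] = None,
    fixed_output_dir: Optional[Path] = None,
    fixed_output_dir_with_subfolders: bool = True,
) -> Path:
    if input_folder is None:
        # Single file: "[video_name]_con.<ext>" next to the video, or in the fixed dir.
        target_dir = fixed_output_dir if fixed_output_dir is not None else video.parent
        return target_dir / f"{video.stem}_con.{output_format}"
    con_folder_name = f"{input_folder.name}_con"
    if fixed_output_dir is None:
        # Folder mode: "[folder_name]_con" next to the input folder.
        target_dir = input_folder.parent / con_folder_name
    elif fixed_output_dir_with_subfolders:
        target_dir = fixed_output_dir / con_folder_name
    else:
        target_dir = fixed_output_dir
    # Folder mode: each file keeps the video's name (assumed, no "_con" suffix).
    return target_dir / f"{video.stem}.{output_format}"
```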
“use_system_ffmpeg” is false by default. If true, the program will use the system’s ffmpeg instead of the one included in the package. If you set this to true, make sure that ffmpeg is in your system’s PATH.
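The PATH requirement can be checked with shutil.which from the standard library. In this sketch, resolve_ffmpeg is a hypothetical helper. The bundled executable’s file name and location (“ffmpeg.exe” inside a given folder) are assumptions about the package layout.

```python
import shutil
from pathlib import Path


def resolve_ffmpeg(use_system_ffmpeg: bool, bundled_dir: Path) -> str:
    if use_system_ffmpeg:
        found = shutil.which("ffmpeg")  # searches the directories listed in PATH
        if found is None:
            raise FileNotFoundError("use_system_ffmpeg is true, but ffmpeg was not found on PATH")
        return found
    # Bundled copy; the file name and location are assumptions.
    return str(bundled_dir / "ffmpeg.exe")
```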
“output_condensed_subtitles” is false by default. If true, the program will output condensed subtitles as a .srt or .lrc file with the same name as the output file.
“condensed_subtitles_format” is “srt” by default and can be either “srt” or “lrc”. It determines the format of the subtitles written by “output_condensed_subtitles” and has no effect if that option is false.
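Condensed subtitles need their timestamps moved onto the condensed audio’s timeline. Each line is shifted back by the audio that was cut before it. The sketch below shows this idea together with SRT (“hh:mm:ss,mmm”) and LRC (“[mm:ss.xx]”) time formatting. It assumes times are integer milliseconds and that each kept interval is a merged, padded period that fully contains its lines. condense_times, srt_time and lrc_time are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in milliseconds


def condense_times(lines: List[Interval], kept: List[Interval]) -> List[Interval]:
    """Map each line's original (start, end) onto the condensed audio's timeline."""
    result = []
    for start, end in lines:
        offset = 0  # total length of kept audio before the current interval
        for k_start, k_end in kept:
            if k_start <= start and end <= k_end:
                shift = offset - k_start
                result.append((start + shift, end + shift))
                break
            offset += k_end - k_start
        # Lines outside every kept interval (e.g. filtered ones) are skipped.
    return result


def srt_time(ms: int) -> str:
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def lrc_time(ms: int) -> str:
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"  # LRC uses hundredths
```

For kept intervals 500–4500 ms and 9500–11500 ms, a line at 10000–11000 ms starts 500 ms into the second interval. The first interval contributes 4000 ms, so the line lands at 4500–5500 ms in the condensed audio.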
Change log
v1.6.0
Fixed a bug with timestamps when using output_condensed_subtitles
Added LRC format condensed subtitle output option via condensed_subtitles_format (thanks to Ttixan)
v1.5.0
Added the option to set a fixed output directory (thanks to wawrzenczyka)
Made error logs more informative (thanks to artjomsR)
Added the option to use system’s ffmpeg instead of the included one
Added the option to output condensed subtitles
Updated the icon
Improved the tests and overall code quality
Updated Python version to 3.12
Updated ffmpeg version to the latest stable release
v1.4.0
Added the ability to control the output file format via config
Added a subtitle suffix option to config
Added Japanese parentheses to the list of filtered parentheses
Added some audio extensions to the input file extension list
v1.3.1
Fixed a bug with subtitles where some lines can be empty after filtering
v1.3.0
Fixed a bug with external subtitles when condensing a folder
Added the ability to filter certain characters and lines enclosed in parentheses in subtitles, set via config
v1.2.2
Fixed a bug with mp3 files as input
Fixed a bug with non-srt subtitle files when condensing a folder
v1.2.1
Improved error logging
Fixed a bug with non-srt subtitle files
v1.2
Changed to prioritize external subtitles over internal ones
v1.1
Changed the audio extraction stage to be much faster, especially for long videos