Re-adjusting out-of-sync subtitles.

Have you ever downloaded a .srt file only to find out that it is out of sync? Even a small delay can be intolerable. Back in the day, I used VLC to readjust the subtitles in real time. But since there wasn't a way (that I knew of) to save the changes, I decided to look elsewhere.

For a while I wondered how easy it would be to write a script for this simple task. After looking into it, I found, to my surprise, that the SRT file format is quite simple. It is composed of fragments that are formatted like this:

N
HH:MM:SS,mmm --> HH:MM:SS,mmm
Actual subtitle

It starts with a number N that identifies the fragment. Numbering starts at 1 and increments with each fragment. The second line holds two timestamps: the moment the subtitle appears on screen and the moment it disappears. They are precise down to the millisecond. The subtitle text follows (it can span more than one line), and an empty line closes the fragment.

All I had to do was play with the values of the second line to get it working. If the subtitles show up a second before the dialog, adding a second to both timestamps puts them back in sync. If they lag behind the dialog instead, the delay to add is negative.

Unfortunately, I lost the script that I made back then, which was written in PHP if I recall. Rewriting it, however, turned out to be trivial. The std.datetime module of Phobos had everything I needed. Well, almost.

The issue I ran into was parsing milliseconds. Out of the relevant structs of the std.datetime module, only SysTime supported these. Unfortunately, SysTime also takes the date into consideration, which doesn't make sense in the context of subtitle handling. In the end I used TimeOfDay and left milliseconds out of the arithmetic altogether: the delay is a whole number of seconds, and the millisecond part of each timestamp is copied over unchanged. Until I run into a situation where a delay of less than a second is annoying, I'll keep using the program as it is.
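For what it's worth, sub-second delays don't strictly need SysTime. A timestamp can also be treated as a Duration counted from 00:00:00, which keeps the milliseconds. The following is only a sketch of that idea, not what the script does: the helper names parseStamp, formatStamp and shiftStamp are made up for illustration, and clamping negative results to zero is an assumption about what one would want.

```d
import core.time : Duration, hours, minutes, seconds, msecs;
import std.format : format, formattedRead;

// Hypothetical helper: turns "HH:MM:SS,mmm" into an offset from 00:00:00.
Duration parseStamp(string stamp)
{
    int h, m, s, ms;
    formattedRead(stamp, "%d:%d:%d,%d", &h, &m, &s, &ms);
    return h.hours + m.minutes + s.seconds + ms.msecs;
}

// Hypothetical helper: formats an offset back into "HH:MM:SS,mmm".
string formatStamp(Duration d)
{
    // Assumption: a delay that would go below zero is clamped to 00:00:00,000.
    if (d < Duration.zero)
        d = Duration.zero;

    long h, m, s, ms;
    d.split!("hours", "minutes", "seconds", "msecs")(h, m, s, ms);
    return format("%02d:%02d:%02d,%03d", h, m, s, ms);
}

// Shifts a single timestamp by a positive or negative delay.
string shiftStamp(string stamp, Duration delay)
{
    return formatStamp(parseStamp(stamp) + delay);
}

void main()
{
    import std.stdio : writeln;

    writeln(shiftStamp("00:01:02,900", 250.msecs)); // prints 00:01:03,150
}
```

With something like this, the delay could be given in milliseconds instead of whole seconds.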

The heart of the program is the following loop. I keep calling it a program, but it really is just a glorified shell script, and I'm sure the same behavior can be replicated with a series of Unix commands. But I digress.

The loop copies lines from the fin stream to fout. These can be File objects or the stdin/stdout streams, whichever you prefer. When a line contains an arrow, the code extracts the HH:mm:ss portion of both timestamps, adds the given positive or negative delay to each, and writes the updated values back in place of the old ones, leaving the milliseconds intact. You'll notice that TimeOfDay does most of the heavy lifting when it comes to parsing: its static fromISOExtString method builds a new instance from an HH:mm:ss formatted string:

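// Uses TimeOfDay (std.datetime), lines (std.stdio), indexOf and strip (std.string), replace (std.array).
foreach(string line; fin.lines)
{
    // indexOf returns a signed ptrdiff_t, -1 when nothing is found
    ptrdiff_t arrow = line.indexOf("-->");
    if(arrow != -1)
    {
        // Keep only the HH:mm:ss part of the start timestamp
        string left = line[0 .. arrow].strip;
        ptrdiff_t comma = left.indexOf(',');
        if(comma != -1)
            left = left[0 .. comma];

        // Same for the end timestamp; "-->" is three characters long
        string right = line[arrow + 3 .. $].strip;
        comma = right.indexOf(',');
        if(comma != -1)
            right = right[0 .. comma];

        auto from = TimeOfDay.fromISOExtString(left);
        auto to = TimeOfDay.fromISOExtString(right);

        from += seconds.seconds;
        from += minutes.minutes;
        from += hours.hours;

        to += seconds.seconds;
        to += minutes.minutes;
        to += hours.hours;

        // Patch each side of the arrow separately, so that a shifted start time
        // that happens to equal the old end time is not replaced a second time
        line = line[0 .. arrow].replace(left, from.toISOExtString)
             ~ line[arrow .. $].replace(right, to.toISOExtString);
    }
    fout.write(line);
}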

The real MVP of this code is the Duration struct. It never appears by name in the loop, but every from += and to += line adds one. As the name suggests, a Duration represents a length of time: weeks, minutes, seconds, or even fractions of a microsecond. Months and years are not supported, since their length varies. Suppose you want to add a number of seconds to a given TimeOfDay object. It makes much more sense to reason in terms of durations than to add another TimeOfDay object where everything but the seconds is set, doesn't it? Duration also works with other structs like DateTime or SysTime.

A Duration can be created with the dur helper. It takes a compile-time argument that defines the unit and a normal argument that specifies the amount of said unit. For example: dur!"seconds"(3).

Duration dur(string units)(long length) @safe pure nothrow @nogc
    if(units == "weeks" ||
       units == "days" ||
       units == "hours" ||
       units == "minutes" ||
       units == "seconds" ||
       units == "msecs" ||
       units == "usecs" ||
       units == "hnsecs" ||
       units == "nsecs")
{
    return Duration(convert!(units, "hnsecs")(length));
}

The library also defines a few aliases to make it possible to write seconds(3) instead of the more verbose dur!"seconds"(3). Thanks to D's uniform function call syntax (UFCS), the call can also be written with the number first, as in 5.days or 13.weeks. There is no years alias, since dur doesn't accept that unit.

alias weeks   = dur!"weeks";   /// Ditto
alias days    = dur!"days";    /// Ditto
alias hours   = dur!"hours";   /// Ditto
...
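As a quick illustration, the three spellings below all build the same Duration. This is just a sketch to show the equivalence; the values are arbitrary.

```d
import core.time : Duration, dur, days, seconds;

void main()
{
    Duration a = dur!"seconds"(3); // explicit helper
    Duration b = seconds(3);       // alias
    Duration c = 3.seconds;        // alias called through UFCS

    assert(a == b && b == c);

    // Durations of different units can be mixed freely.
    assert(90.seconds == dur!"minutes"(1) + 30.seconds);

    // total converts a Duration back to a plain number of units.
    assert(5.days.total!"hours" == 120);
}
```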

TimeOfDay, much like its sister structs, overloads the + and - operators to accept a Duration. It will then return a TimeOfDay copy to which it adds or subtracts the given duration. Negative durations are also accepted, and adding -3.seconds to TimeOfDay will actually subtract three seconds from it as one would expect.
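Here is a small sketch of that arithmetic with made-up values. The last assertion shows something to keep in mind for subtitles: TimeOfDay wraps around midnight, so a negative delay larger than the timestamp itself lands near the end of the day rather than at zero.

```d
import core.time : minutes, seconds;
import std.datetime : TimeOfDay;

void main()
{
    auto t = TimeOfDay.fromISOExtString("00:01:02");

    // Adding and subtracting return a new TimeOfDay.
    assert(t + 3.seconds == TimeOfDay(0, 1, 5));
    assert(t - 3.seconds == TimeOfDay(0, 0, 59));

    // Adding a negative duration is the same as subtracting.
    assert(t + (-3).seconds == TimeOfDay(0, 0, 59));

    // The compound form modifies the object in place.
    t += 2.minutes;
    assert(t.toISOExtString == "00:03:02");

    // Going below midnight wraps around to the end of the day.
    assert(TimeOfDay(0, 0, 1) - 3.seconds == TimeOfDay(23, 59, 58));
}
```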

Once the logic is handled, the only part remaining is the logistics. I decided to use std.getopt since it exposes an easy-to-use API for handling command-line arguments, namely the getopt function. Not only can it parse out arguments, but it also lets you specify help messages for them:

auto result = getopt(args,
    "i|in", "Input file or - for stdin", &input,
    "o|out", "Output file or - for stdout", &output,
    "s|seconds", "Seconds to add", &seconds,
    "m|minutes", "Minutes to add", &minutes,
    "h|hours", "Hours to add", &hours,
);

The program can be called like this:

./srt-sync -s -3 < old.srt > new.srt

A nice summary of the arguments and their use can be displayed by passing in --help :

srt-sync, a simple subtitle synchronizing utility.
-i      --in Input file or - for stdin
-o     --out Output file or - for stdout
-s --seconds Seconds to add
-m --minutes Minutes to add
-h   --hours Hours to add
-h    --help This help information.

This summary was automatically generated from the description I set for each command-line argument in the getopt() call:

if(result.helpWanted)
{
    defaultGetoptPrinter("srt-sync, a simple subtitle synchronizing utility.", result.options);
    return;
}
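To show how these pieces might fit together, here is a trimmed-down sketch. It is not the actual source: the Options struct and the parseArgs helper are hypothetical, only three of the options are declared (hours is left out, which also sidesteps the question of -h being shared with the built-in help flag), and defaulting both paths to "-" is an assumption. The copy loop is a placeholder for the real timestamp handling.

```d
import std.getopt : defaultGetoptPrinter, getopt, GetoptResult;
import std.stdio : File, lines, stdin, stdout;

// Hypothetical container for the parsed command-line values.
struct Options
{
    string input = "-";  // assumption: "-" (stdin) unless -i is given
    string output = "-"; // assumption: "-" (stdout) unless -o is given
    int seconds;
}

// Hypothetical helper wrapping the getopt call.
GetoptResult parseArgs(ref string[] args, ref Options opts)
{
    return getopt(args,
        "i|in", "Input file or - for stdin", &opts.input,
        "o|out", "Output file or - for stdout", &opts.output,
        "s|seconds", "Seconds to add", &opts.seconds,
    );
}

void main(string[] args)
{
    Options opts;
    auto result = parseArgs(args, opts);
    if (result.helpWanted)
    {
        defaultGetoptPrinter("srt-sync, a simple subtitle synchronizing utility.", result.options);
        return;
    }

    // "-" selects the standard stream, anything else is opened as a file.
    File fin = opts.input == "-" ? stdin : File(opts.input, "r");
    File fout = opts.output == "-" ? stdout : File(opts.output, "w");

    // Placeholder: copy lines unchanged; the timestamp loop would go here.
    foreach (string line; lines(fin))
        fout.write(line);
}
```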

That's pretty much it. The code is available on GitHub. It has no external dependencies, so you can just compile it with the dmd srt_sync.d -of=srt_sync command.
