Posts for 2015 May

PSA: Use nautilus (GTK) in Kubuntu if Dolphin crashes

Post by Nico Brailovsky @ 2015-05-27

So, for some reason my (brand new) Kubuntu install is not very stable. KDE tends to crash at random whenever I send a large-ish file to the recycling bin. Dunno why; it looks like some threading + caching issue in Dolphin, and I couldn't find a solution anywhere on the web. Still, there are two workarounds:

  1. Don't use the recycling bin. This sort of breaks my workflow, so I prefer to:
  2. Use Nautilus.

You'll have to install some GTK packages, but that's a small price to pay to have KDE not crash every couple of minutes.


Vim tip: Stop escaping slashes

Post by Nico Brailovsky @ 2015-05-07

If you do a lot of search & replace in Vim, eventually you'll end up escaping a lot of slashes, for example whenever you have to replace a path. Isn't that annoying? After a couple of levels you end up with a horrible "\/path\/to\/foo\/bar" pattern. And if you miss one escaping backslash, good luck: it's usually easier to scrap the whole thing and start over.

Luckily, when you are using the 's'ubstitute command you can pick a different separator. Instead of typing "%s/\/foo\/bar\/baz\//foo\/bar\//g", you can simply type "%s#/foo/bar/baz/#foo/bar/#g". Vim treats whatever character follows the 's' as the delimiter, so '#' takes the place of '/' and you end up with a much more readable pattern.

Extra tip: this also works in sed.


C++: A jump table with a template device

Post by Nico Brailovsky @ 2015-05-05

A few articles ago we saw how gcc might need some help when mixing template instantiation (pure compile time data) with function calls (deducible compile time information, but not available to the template expander). Now we'll go one step further and combine all three types: pure compile time data, deducible compile time data and pure run time data (*). Just to annoy the compiler, and to see how well gcc is able to optimize the results.

Let's build a simple example, similar to what we used last time: an object that will determine the range of an integer and then invoke a callback with the closest range. Something like this could be used, for example, to allocate a buffer.

void boring(int x, func f) {
    if (x < 2) {
        f(2);
    } else if (x < 4) {
        f(4);
    } else if (x < 8) {
        f(8);
    } else if (x < 16) {
        // You get the idea...
    }
}
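For reference, here is a minimal, compilable sketch of the hand-written version. The original elides the later branches, so the upper bound of 16 is an assumption made only for this example, and the helpers `remember_size` and `size_for` are hypothetical names added to observe what the callback receives.

```cpp
#include <cassert>

typedef void (*func)(int);

// Hand-written range dispatch. Stopping at 16 is an assumption of this
// sketch; the original snippet leaves the remaining branches out.
void boring_sketch(int x, func f) {
    if (x < 2) {
        f(2);
    } else if (x < 4) {
        f(4);
    } else if (x < 8) {
        f(8);
    } else if (x < 16) {
        f(16);
    }
    // In this sketch, values of 16 and above trigger no callback.
}

// Hypothetical helpers: remember the value passed to the callback,
// or -1 if the callback was never invoked.
static int last_size = -1;

void remember_size(int size) {
    // Every value handed out by the chain is a power of two.
    assert(size > 0 && (size & (size - 1)) == 0);
    last_size = size;
}

int size_for(int x) {
    last_size = -1;
    boring_sketch(x, remember_size);
    return last_size;
}
```

With these definitions, size_for(3) yields 4 and size_for(8) yields 16, while size_for(16) yields -1 because the sketch stops at 16.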

Can we build a prettier template version of this code, without any overhead? Let's try:

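#include <cstddef>  // size_t

typedef void (*func)(int);
template <int My_Size>
struct Foo {
    void bar(size_t size, func callback) {
        if (size > My_Size) {
            callback(My_Size);
        } else {
            next_foo.bar(size, callback);
        }
    }
    // Each level holds the next, half-sized level: 512, 256, ..., 1, 0
    Foo<My_Size/2> next_foo;
};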
// Stop condition
template<> struct Foo<0> {
    void bar(size_t, func) { }
};
void wrapper(int x, func f) {
    Foo<512> jump_table;
    jump_table.bar(x, f);
}
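To see what this chain does at run time, here is a self-contained sketch. The `int My_Size` template parameter and the `Foo<My_Size / 2>` member are an assumed reconstruction, inferred from the assembly listings in this post (Foo<512> calls Foo<256>, which calls Foo<128>, down to Foo<0>). The helpers `record_bucket` and `bucket_for` are hypothetical names added only to observe the callback.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*func)(int);

// Assumed reconstruction: each level tests one threshold and owns the
// next, half-sized level (512, 256, ..., 1, 0).
template <int My_Size>
struct Foo {
    void bar(std::size_t size, func callback) {
        if (size > static_cast<std::size_t>(My_Size)) {
            callback(My_Size);
        } else {
            next_foo.bar(size, callback);
        }
    }
    Foo<My_Size / 2> next_foo;
};

// Stop condition: 1 / 2 == 0 in integer division, so the chain ends here.
template <>
struct Foo<0> {
    void bar(std::size_t, func) {}
};

void wrapper(int x, func f) {
    Foo<512> jump_table;
    jump_table.bar(x, f);
}

// Hypothetical helpers, not part of the original post: remember the value
// the callback was invoked with, or -1 if it was never invoked.
static int last_bucket = -1;

void record_bucket(int bucket) {
    // Every threshold in the chain is a power of two.
    assert(bucket > 0 && (bucket & (bucket - 1)) == 0);
    last_bucket = bucket;
}

int bucket_for(int x) {
    last_bucket = -1;
    wrapper(x, record_bucket);
    return last_bucket;
}
```

With these definitions, bucket_for(300) yields 256, bucket_for(512) also yields 256 because the comparison is strict, and bucket_for(1) yields -1 because no threshold lies below 1.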

And now, let's compile it with "g++ -fverbose-asm -S -O0 -c foo.cpp -o /dev/stdout | c++filt". You'll see something like this:

wrapper(int, void (*)(int)):
    call    Foo<512>::bar(unsigned long, void (*)(int))
Foo<512>::bar(unsigned long, void (*)(int)):
    cmpq    $512, %rsi    #, size
    jbe    .L4
    call    %rdx    # callback
    jmp    .L3
.L4:
    call    Foo<256>::bar(unsigned long, void (*)(int))    #
.L3:
    leave
Foo<256>::bar(unsigned long, void (*)(int)):
    cmpq    $256, %rsi    #, size
    jbe    .L4
    call    %rdx    # callback
    jmp    .L3
.L4:
    call    Foo<128>::bar(unsigned long, void (*)(int))    #
.L3:
    leave
# You get the idea, right?
Foo<0>::bar(unsigned long, void (*)(int)):
    # Stop condition, do nothing

That doesn't look too good, does it? No need to worry: we already learned that gcc needs help from the optimizer to handle template expansion and non-static function calls. Let's move to -O1:

wrapper(int, void (*)(int)):
.LFB14:
    cmpq    $512, %rdi    #, D.2974
    jbe    .L2    #,
    movl    $512, %edi    #,
    call    %rsi    # f
    jmp    .L1    #
.L2:
    cmpq    $256, %rdi    #, D.2974
    jbe    .L4    #,
    movl    $256, %edi    #,
    call    %rsi    # f
    jmp    .L1    #
# Again, it should be clear what's going on...
.L11:
    cmpq    $1, %rdi    #, D.2974
    .p2align 4,,2
    jbe    .L1    #,
    movl    $1, %edi    #,
    .p2align 4,,2
    call    %rsi    # f
.L1:

It's better than the -O0 output, but it doesn't look great either: gcc managed to inline all calls, but it stopped there. Let's move to -O2 and see what happens:


wrapper(int, void (*)(int)):
    movslq    %edi, %rdi    # x, D.2987
    cmpq    $512, %rdi    #, D.2987
    ja    .L13    #,
    cmpq    $256, %rdi    #, D.2987
    ja    .L14    #,
    [ .... ]
    cmpq    $2, %rdi    #, D.2987
    ja    .L21    #,
.L13:
    movl    $512, %edi    #,
    jmp    %rsi    # f
.L14:
    movl    $256, %edi    #,
    jmp    %rsi    # f
[ .... ]
.L21:
    movl    $2, %edi    #,
    jmp    %rsi    # f
.L1:
    rep
    ret
    .p2align 4,,10
    .p2align 3

Now, that looks much better. And we can now see that gcc generates the same code at -O2 for both versions of our code.

(*) Just for the sake of completion: