Posts for 2013 March Nico Brailovsky's thought repository

Posts for 2013 March

Awesome festival

Post by Nico Brailovsky @ 2013-03-28 | Permalink | Leave a comment

I already had an awesome talking makefile which commanded me to go and fix my program when it broke by using festival, a voice synthesizer for Linux. I didn't know there was also a plugin for gaim, "festival-gaim".

It's quite funny but I don't think it's a good idea to have your computer reading out loud your messages. And you should remember to log off when going to sleep.

Perhaps I'll try festival as (yet another) wake up method: with a cron I can probably get it to read the news out loud in the morning. I'm not sure yet if that'll wake me up or make me even sleepier, though.

C++ exceptions under the hood 8: two-phase handling

Post by Nico Brailovsky @ 2013-03-26 | Permalink | Leave a comment

We finished last chapter on the series about C++ exceptions by adding a personality function that _Unwind_ was able to call. It didn't do much but there it was. The ABI we have been implementing can now throw exceptions and the catch is already halfway implemented, but the personality function needed to properly choose the catch block (landing pad) is bit dumb so far. Let's start this new chapter by trying to understand what are the parameters that the personality function receives and next time we'll begin adding some real behavior to __gxx_personality_v0: when __gxx_personality_v0 is called we should say "yes, this stack frame can indeed handle this exception".

We already said we won't care for the version or the exceptionClass for our mini ABI. Let's ignore the context too, for now: we'll just handle every exception with the first stack frame above the function throwing; note this implies there must be a try/catch block on the function immediately above the throwing function, otherwise everything will break. This also implies the catch will ignore its exception specification, effectively turning it into a catch(...). How do we let _Unwind_ know we want to handle the current exception?

_Unwind_Reason_Code is the return value from the personality functions; this tells _Unwind_ whether we found a landing pad to handle the exception or not. Let's implement our personality function to return _URC_HANDLER_FOUND then, and see what happens:

alloc ex 1
__cxa_throw called
Personality function FTW
Personality function FTW
no one handled __cxa_throw, terminate!

See that? We told _Unwind_ we found a handler, and it called the personality function yet again! What is going on there?

Remember the action parameter? That's how _Unwind_ tells us what he is expecting, and that is because the exception catching is handled in two phases: lookup and cleanup (or _UA_SEARCH_PHASE and _UA_CLEANUP_PHASE). Let's go again over our exception throwing and catching recipe:

__cxa_throw/__cxa_allocate_exception will create an exception and forward it to a lower-level unwind library by calling _Unwind_RaiseException
Unwind will use CFI to know which functions are on the stack (ie to know how to start the stack unwinding)
Each function has have an LSDA (language specific data area) part, added into something called ".gcc_except_table"
Unwind will try to locate a landing pad for the exception:
- Unwind will call the personality function with the action _UA_SEARCH_PHASE and a context pointing to the current stack frame.
- The personality function will check if the current stack frame can handle the exception being thrown by analyzing the LSDA.
- If the exception can be handled it will return _URC_HANDLER_FOUND.
- If the exception can not be handled it will return _URC_CONTINUE_UNWIND and Unwind will then try the next stack frame.
If no landing pad was found, the default exception handler will be called (normally std::terminate).
If a landing pad was found:
- Unwind will iterate the stack again, calling the personality function with the action _UA_CLEANUP_PHASE.
- The personality function will check if it can handle the current exception again:
- If this frame can't handle the exception it will then run a cleanup function described by the LSDA and tell Unwind to continue with the next frame (this is actually a very important step: the cleanup function will run the destructor of all the objects allocated in this stack frame!)
- If this frame can handle the exception, don't run any cleanup code: tell Unwind we want to resume execution on this landing pad.

There are two important bits of information to note here: 1. Running a two-phase exception handling procedure means that in case no handler was found then the default exception handler can get the original exception's stack trace (if we were to unwind the stack as we go it would get no stack trace, or we would need to keep a copy of it somehow!). 2. Running a _UA_CLEANUP_PHASE and calling a second time each frame, even though we already know the frame that will handle the exception, is also really important: the personality function will take this chance to run all the destructors for objects built on this scope. It is what makes RAII an exception safe idiom!

Now that we understand how the catch lookup phase works we can continue our personality function implementation. The next time.

An awareness test

Post by Nico Brailovsky @ 2013-03-21 | Permalink | Leave a comment

Rant ahread: this is an old video, but it remains very instructive. http://www.youtube.com/watch?feature=player_embedded&v=Ahg6qcgoay4.

It's amazing how easy it is to loose sight of a dancing bear (that sentence does make sense after viewing the video!). I wonder if that's why I dislike Windows. Too many menus, whereas I have a single interface for pretty much anything with my CLI.

C++ exceptions under the hood 7: a nice personality

Post by Nico Brailovsky @ 2013-03-19 | Permalink | 1 comments | Leave a comment

On our journey to learn about exceptions we have learned so far how a throw is done, that something called "call frame information" helps a library called Unwind to do the stack unwinding, and that the compiler writes something called LSDA, language specific data area, to know which exceptions can a method handle. And we know by now that a lot of magic is done on the personality function; we've never seen it in action though. Let's recap in a bit more of detail about how an exception will be thrown and caught (or, more precisely, how we know so far it will be thrown caught):

The compiler will translate our throw statement into a pair of __cxa_allocate_exception/__cxa_throw
__cxa_allocate_exception will create the exception in memory
__cxa_throw will initialize a bunch of stuff and forward this exception to a lower-level unwind library by calling _Unwind_RaiseException
Unwind will use CFI to know which functions are on the stack (ie to know how to start the stack unwinding)
Each function will have an LSDA (language specific data area) part, added into something called ".gcc_except_table"
Unwind will invoke the personality function with the current stack frame and the LSDA; this function should reply to unwind whether this stack can handle the exception or not

Knowing this, it's about time we implement our own personality function. Our ABI used to print this when an exception was thrown:

alloc ex 1
__cxa_throw called
no one handled __cxa_throw, terminate!

Let's go back to our mycppabi and let's add something like this (link to full mycppabi.cpp file):

void __gxx_personality_v0()
{
    printf("Personality function FTW\n");
}

Note: You can download the full sourcecode for this project in my github repo.

And sure enough, when we run it we should see our personality function being called. We know we're on the right track and now we have an idea of what we want for our personality function; let's start using the proper definition for this function:

_Unwind_Reason_Code __gxx_personality_v0 (
                     int version, _Unwind_Action actions, uint64_t exceptionClass,
                     _Unwind_Exception unwind_exception, _Unwind_Context context);

If we put that into our mycppabi.cpp file we get:

#include 
#include 
#include 
#include 
namespace __cxxabiv1 {
    struct __class_type_info {
        virtual void foo() {}
    } ti;
}
#define EXCEPTION_BUFF_SIZE 255
char exception_buff[EXCEPTION_BUFF_SIZE];
extern "C" {
void __cxa_allocate_exception(size_t thrown_size)
{
    printf("alloc ex %i\n", thrown_size);
    if (thrown_size > EXCEPTION_BUFF_SIZE) printf("Exception too big");
    return &exception_buff;
}
void __cxa_free_exception(void thrown_exception);
#include 
typedef void (unexpected_handler)(void);
typedef void (terminate_handler)(void);
struct __cxa_exception {
    std::type_info *    exceptionType;
    void (exceptionDestructor) (void );
    unexpected_handler  unexpectedHandler;
    terminate_handler   terminateHandler;
    __cxa_exception *   nextException;
    int         handlerCount;
    int         handlerSwitchValue;
    const char *        actionRecord;
    const char *        languageSpecificData;
    void *          catchTemp;
    void *          adjustedPtr;
    _Unwind_Exception   unwindHeader;
};
void __cxa_throw(void thrown_exception, struct type_info tinfo, void (dest)(void))
{
    printf("__cxa_throw called\n");
    __cxa_exception header = ((__cxa_exception ) thrown_exception - 1);
    _Unwind_RaiseException(&header->unwindHeader);
    // __cxa_throw never returns
    printf("no one handled __cxa_throw, terminate!\n");
    exit(0);
}
void __cxa_begin_catch()
{
    printf("begin FTW\n");
}
void __cxa_end_catch()
{
    printf("end FTW\n");
}
_Unwind_Reason_Code __gxx_personality_v0 (
                     int version, _Unwind_Action actions, uint64_t exceptionClass,
                     _Unwind_Exception unwind_exception, _Unwind_Context context)
{
    printf("Personality function FTW!\n");
}
}

Code @ my github repo.

Let's compile and link everything, then run it and start by analyzing each param to this function with some help of gdb:

Breakpoint 1, __gxx_personality_v0 (version=1, actions=1, exceptionClass=134514792, unwind_exception=0x804a060, context=0xbffff0f0)

The version and the exceptionClass are related to language/ABI/compiler toolchain/native or non-native exception, etc. We don't need to worry about it for our mini ABI, we'll just handle all the exceptions.
Actions: this is what _Unwind_ uses to tell the personality function what it should do (more on that later)
unwind_exception: the exception allocated by __cxa_allocate_exception (kind of... there's a lot of pointer arithmetic going on but that pointer can be used to access our original exception anyway)
context: this holds all the information regarding the current stack frame, for example the language specific data area (LSDA). This is what we will be using to detect whether this stack can handle the thrown exception (and also to detect whether we need to run any destructors)

So there we have it, a working (well, linkeable) personality function. Doesn't do much, though, so next time we'll start adding some real behavior and try to make it handle an exception.

Vim tip: no swap

Post by Nico Brailovsky @ 2013-03-14 | Permalink | Leave a comment

Swap files in Vim can be helpful as a very dumb lock mechanism, just so you're sure no one is changing the same file as you are in the same server. They can also serve as a very crude back up system, in case your system crashes. Alas, git seems suited to cover much better both functionalities, as a not so dumb developer synchronization mechanism and as a code backup tool in case everything crashes (you don't really host your git on localhost, do you?).

If you commit often swapfiles can be an annoyance more than a useful feature, and disabling then can save you a lot of "Swap file already found" messages. Just add "set noswapfile" to your .vimrc and forget about them.

C++ exceptions under the hood 6: gcc_except_table and the personality function

Post by Nico Brailovsky @ 2013-03-12 | Permalink | Leave a comment

We learned last time that, just as a throw statement is translated into a pair of __cxa_allocate_exception/throw calls, a catch block is translated into a pair of __cxa_begin/end_catch calls, plus something called CFI (call frame information) to find the landing pads, the points on a function where an exception can be handled.

What we don't yet know is how does _Unwind_* know where the landing pads are. When an exception is thrown there are a bunch of functions in the stack; all the CFI stuff will let Unwind know which functions these are but it's also necessary to know which landing pads each function provides so we can call each one and check if it wants to handle the exception (and we're ignoring functions with multiple try/catch blocks!).

To know where the landing pads are, something called gcc_except_table is used. This can be found (with a bunch of CFI stuff) after the function's end:

.LFE1:
    .globl  __gxx_personality_v0
    .section    .gcc_except_table,"a",@progbits
    [...]
.LLSDACSE1:
    .long   _ZTI14Fake_Exception

The section .gcc_except_table is where all information to locate a landing pad is stored, and we'll see more about it once we get to analyzing the personality function; for now, we'll just say that LSDA means language specific data area and it's the place where the personality function will check if there are any landing pads for a function (it is also used to run the destructors when unwinding the stack).

To wrap it up: for every function where at least a catch is found, the compiler will translate this statement into a pair of __cxa_begin_catch/__cxa_end_catch calls and then the personality function, which will be called by __cxa_throw, will read the gcc_except_table for every method in the stack, to find something call LSDA. The personality function will then check in the LSDA whether a catch can handle an exception and if there is any cleanup code to run (this is what triggers the destructors when needed).

We can also draw an interesting conclusion here: if we use the nothrow specifier (or the empty throw specifier) then the compiler can omit the gcc_except_table for this method. The way gcc implements exceptions, that won't have a great impact on performance but it will indeed reduce code size. What's the catch? If an exception is thrown when nothrow was specified the LSDA won't be there and the personality function won't know what to do. When the personality function doesn't know what to do it will invoke the default exception handler, meaning that in most cases throwing from a nothrow method will end up calling std::terminate.

Now that we have an idea of what the personality function does, can we implement one? We'll see how next time.

Hive speedup trick

Post by Nico Brailovsky @ 2013-03-07 | Permalink | Leave a comment

I've been playing around with Hive on top of Hadoop using AWS lately, but until recently I only knew about optimizing your query for better data-crunching throughput. Turns out you can also parallelize the subqueries execution, but this feature is not enabled by default.

Try running an explain on your query: if you have many root stages without dependencies then run this magic command: "set hive.exec.parallel=true", and then try your query again. If everything worked out fine you should be running multiple stages in parallel. use hive.exec.parallel.thread.number to control exactly how many stages to run in parallel.

C++ exceptions under the hood 5: magic around __cxa_begin_catch and __cxa_end_catch

Post by Nico Brailovsky @ 2013-03-05 | Permalink | Leave a comment

After learning how exceptions are thrown we are now on our way to learn how they are caught. Last time we added to our example application a bunch of try/catch statements to see what they did, and sure enough we got a bunch of linker errors, just like we did when we were trying to find out what does the throw statement do. This is what the linker says when trying to process throw.o:

Note: You can download the full sourcecode for this project in my github repo.

> g++ -c -o throw.o -O0 -ggdb throw.cpp
> gcc main.o throw.o mycppabi.o -O0 -ggdb -o app
throw.o: In function 'try_but_dont_catch()':
throw.cpp:12: undefined reference to '__cxa_begin_catch'
throw.cpp:12: undefined reference to '__cxa_end_catch'
throw.o: In function 'catchit()':
throw.cpp:20: undefined reference to '__cxa_begin_catch'
throw.cpp:20: undefined reference to '__cxa_end_catch'
throw.o:(.eh_frame+0x47): undefined reference to '__gxx_personality_v0'
collect2: ld returned 1 exit status

And our theory, of course, is that a catch statement is translated by the compiler into a pair of __cxa_begin_catch/end_catch calls into libstdc++, plus something new called the personality function of which we know nothing yet.

Let's begin by checking if our theory about __cxa_begin_catch and __cxa_end_catch holds. Let's compile throw.cpp with -S and analyze the assembly. There is a lot to see but if I strip it to the bare minimum this is what I get:

_Z5raisev:
    call    __cxa_allocate_exception
    call    __cxa_throw

So far so good: the same old definition we got for raise(), just throw an exception.

_Z18try_but_dont_catchv:
    .cfi_startproc
    .cfi_personality 0,__gxx_personality_v0
    .cfi_lsda 0,.LLSDA1

The definition for try_but_dont_catch(), mangled by the compiler. There is something new, though: a reference to __gxx_personality_v0 and to something else called LSDA. These are seemingly innocent declarations but they are actually quite important:

The linker will use these according to a CFI specification; CFI stands for call frame information, and here there is a full spec for it. It will be used, mostly, to unwind the stack.
LSDA on the other hand means language specific data area, and it will be used by the personality function to know which exceptions can be handled by this function

We'll be talking a lot more about CFI and LSDA in the next articles; don't forget about them, but for now let's move on:

    [...]
    call    _Z5raisev
    jmp .L8

Another easy one: just call "raise", and then jump to L8; L8 will return normally from this function. If raise didn't execute properly then the execution (somehow, we don't know how yet!) shouldn't resume in the next instruction but in the exception handlers (which in ABI-speak are called landing pads. More on that later).

    cmpl    $1, %edx
    je  .L5
.LEHB1:
    call    _Unwind_Resume
.LEHE1:
.L5:
    call    __cxa_begin_catch
    call    __cxa_end_catch

This is quite difficult to follow but it's actually quite straight forward. Here most of the magic will happen: first we check if this is an exception we can handle, if we can't then we say so by calling _Unwind_Resume, if it is then we call __cxa_begin_catch and __cxa_end_catch; after calling these functions the execution should resume normally and thus L8 will be executed (that is, L8 is right below our catch block):

.L8:
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc

Just a normal return from our function... with some CFI stuff on it.

So this is it for exception catching, although we don't know yet how __cxa_begin/end_catch work, we have an idea that these pair forms what's called a landing pad, a place in the function to handle the raised exception. What we don't know yet is how the landing pads are found. _Unwind_ must somehow go through all the calls in the stack, check if any call (stack frame, to be precise) has a valid try block with a landing pad that can catch the exception, and then resume the execution there.

This is no small feat, and we'll see how that works next time.