Tuesday, February 2, 2010

Awktimization

If you're doing a lot of heavy text crunching with Awk, you've probably found that it can get quite slow very quickly. It always seems to happen whether you program well or not. Unfortunately for us, Awk isn't a very high-efficiency tool. But that's not going to stop us. Not today.

You see, the GNU implementation of Awk, gawk, has a built-in profiler that you can use to analyze and optimize your Awk scripts. Its beautiful simplicity will cast light on those sinister bottlenecks destroying your performance. To use it, simply call pgawk from the command line on any machine with GNU Awk installed. (Alternatively, you can pass the --profile option to gawk, which is the only route on newer releases that no longer ship a separate pgawk binary.) The runtime will execute your script and write the profiler results to a file, named awkprof.out by default. This file contains a listing of your program with the execution count of each statement in the left margin.

Optimizing the now-elucidated bottlenecks in your program should give you a considerable performance boost. However, if your program still isn't zippy enough after you've improved its runtimes, you can put a little more spring in its step by compiling it to ANSI C.

That's right, the open-source Awka project takes plain old Awk code and transforms it into a nitroglycerin-breathing, high-performance rocket car written in C. Nested loops, arrays, and user functions benefit the most from the conversion. The final product will typically see performance gains of around 50%, but depending on the code, you could see 300-400% speed boosts.

So if you find that your Awk text crunching just isn't quite up to par, maybe it's time to consider grinding that puppy through a profiler. And if that still isn't enough, give it the Total Language Makeover with a conversion to C. And if you have a problem that's still taking millennia to compute, well, maybe you should contemplate asking a simpler question. :)

1 comment:

  1. Also, a debugger and byte-code interpreter have been announced for gawk: http://groups.google.com/group/comp.lang.awk/browse_thread/thread/c5ffd6341e781f50#
