Thursday, December 25, 2014

Hooking up Kafka to Storm with the KafkaSpout in Storm 0.9.3

I have recently published a code example for using the Kafka Spout that is now part of Storm 0.9.3. After wasting a day fighting with incompatible logging frameworks between Kafka and Storm, I found an example project in Scala that had the necessary exclusions to get the uber-jar to run in Storm cluster mode. I've converted that example to Java and Maven, rather than Scala and sbt, and considerably simplified the examples so you can focus on just one thing: getting Kafka hooked up to Storm using the KafkaSpout.

The code repo is here: https://github.com/quux00/kstorm

Sunday, October 19, 2014

Updated microbenchmarks for Java String tokenization

I've been focusing a lot on Java performance lately and ran across a Programmers StackExchange post asking the most efficient way to tokenize a string in Java. It struck a nerve with me because the most common way to do this is to use String#split. But the split method takes as its tokenization delimiter a string that gets compiled into a regular expression. Which seems nice when you need it, but likely to be rather inefficient if all you need is to split on a character, such as a space. Then I looked up the old StringTokenizer class which I used to use years and years ago. And its Javadoc has this interesting tidbit:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

Strangely, however, the accepted answer for the StackExchange question links to a blog post that shows benchmarks claiming that StringTokenizer is faster than String#split. So why shouldn't I use StringTokenizer if I don't need a regex to split a string into fields?


/* ---[ Someone is wrong on the internet ]--- */

Someone is wrong on the internet

Well, that blog post needs to be updated for two reasons:

  1. it used Java 6, so we need to see whether things are different in Java 7 and 8
  2. microbenchmarks are hard to do correctly and the author used questionable means (for example - there is not much warm up time and his timings are awfully short)

The difficulty of doing accurate microbenchmarks has been greatly assuaged by the creation of the JMH benchmark tool. If you haven't adopted it yet, put it at the top your technical TODO list. The documentation is excellent and it comes with many useful samples.


/* ---[ Splitting a Java string into tokens ]--- */

Of the techniques used in the original post, I am interested in four candidates:

  1. tokenize the string yourself with String#indexOf
  2. use String#split
  3. use the almost deprecated StringTokenizer
  4. use Guava's Splitter

Here are the reported outcomes of the earlier measurements for these 4 options:

IndexOf: 50ms
StringTokenizer: 89ms
GuavaSplit: 109ms
Split: 366ms

I reran the author's exact code on my system using Java 7 and got the following:

IndexOf: 52ms
StringTokenizer: 104ms
GuavaSplit: 119ms
Split: 145ms

So using his benchmark technique, I got the same ordering, though the ratios have changed a bit. In Java 7, split is only 3 times slower than index, rather than 7 times slower.


/* ---[ What does JMH say? ]--- */

I mildly rewrote these benchmarks using JMH. I also made longer strings to tokenize, rather than the two token string used by the blog post author, as that seemed a more realistic data set. And I used three strings, instead of one:

String[] lines = new String[]{
    "0000 12 34 5 666 77 88888 99",
    "aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz",
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod"
};

The JMH based benchmark code is listed below. The results are rather interesting. In the original benchmarks raw time is measure, so lower is better. With JMH, I measured throughput (the default metric) in operations/second, so higher is better:

With Java 7 (1.7.0_60-b19) the results were:

m.JavaStringSplits.indexOf           thrpt    10  840.147 ± 28.553 ops/ms
m.JavaStringSplits.javaSplit         thrpt    10  694.870 ±  3.858 ops/ms
m.JavaStringSplits.stringTokenizer   thrpt    10  473.812 ±  8.422 ops/ms
m.JavaStringSplits.guavaSplit        thrpt    10  392.759 ±  8.707 ops/ms

And with Java 8 (1.8.0_20-ea):

m.JavaStringSplits.indexOf           thrpt    10  827.136 ± 30.831 ops/ms
m.JavaStringSplits.javaSplit         thrpt    10  700.809 ±  7.215 ops/ms
m.JavaStringSplits.stringTokenizer   thrpt    10  480.788 ± 16.793 ops/ms
m.JavaStringSplits.guavaSplit        thrpt    10  386.398 ±  5.323 ops/ms

With the presumably more robust and accurate JMH model of measuring microbenchmarks, String#split moved up the ranks and handily outperforms both StringTokenizer and Guava Splitter.

My hand-coded splitter using indexOf was about 18% "faster" (higher throughput) than Java#split, so we do see some overhead in Java#split, but for most scenarios I conclude that the Javadoc for StringTokenizer is correct: use String#split for most scenarios unless you know that 1) you don't need regular expressions to tokenize and 2) string splitting is your bottleneck.

A follow-up set of tests that could be done would be on very large strings. In his Letter the Caller Choose Mechanical Sympathy post from a few years ago, Martin Thompson advocated devising an API where you pass in the datastructure to be populated during the tokenization, rather than relying one to be created in the underlying tokenization method and passed back to you. He states (but does not demonstrate):

In the Java world this idiom becomes critical to performance when large arrays are involved.


/* ---[ Find the flaw ]--- */

One of my favorite interview questions is to put a piece of code in front of someone and say "find the flaws". And Aleksey Shipilëv, the author of JMH, states:

Your [JMH] benchmarks should be peer-reviewed.

Mine have not been, so I throw it out to the internet to find the flaws in my quick analysis.


/* ---[ The code and setup ]--- */


Benchmarks were performed on:

Ubuntu (Xubuntu) 14.04 LTS with 3.13.0-37 Linux kernel with 8 Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz cores.

Monday, March 17, 2014

Rust namespaces by example


/* ---[ Cheat sheet for Rust namespaces ]--- */

Rust namespaces can be a little mind-bending. This brief blog post is meant to provide instruction by example on setting up an application or library of Rust code that uses multiple files in a hierarchical namespace.

Though the current docs-project-structure guide on the Rust wiki is pretty sparse, you should start there. Then read the section on Crates and Modules in the tutorial.

I used those references plus an examination of how files and namespaces in the libstd code tree are structured to come up with this example.

In this simple example, I want to have an application where things are namespace to the top level module abc. I want to have a couple of files (namespaces) under abc, some additional directories (namespaces) below abc, such as abc::xyz and have modules in the abc::xyz namespace. Furthermore, I want all those namespaces to be able to refer to each - both down the chain and up the chain.

Here is a simple example that illustrates how to do it. I am using Rust-0.10pre (built 16-Mar-2014).

First, I have a project I called "space-manatee", under which I have a src directory and then my code hierarchy starts:

quux00:~/rustlang/space-manatee/src$ tree
.
├── abc
│   ├── mod.rs
│   ├── thing.rs
│   ├── wibble.rs
│   └── xyz
│       ├── mod.rs
│       └── waldo.rs
└── main.rs

2 directories, 6 files

To provide a namespace foo in Rust, you can either create a file called foo.rs or a dir/file combo of foo/mod.rs. The content of my abc/mod.rs is:

quux00:~/rustlang/space-manatee/src$ cat abc/mod.rs 
pub mod thing;
pub mod wibble;
pub mod xyz;

All this module does is export other modules in the same directory. It could have additional code in it - functions and data structures, but I elected not to do that.



xyz is a directory, and since I created the xyz/mod.rs dir/file combo, it is a namespace that can be used and exported.

Let's look into the other files:

quux00:~/rustlang/space-manatee/src$ cat abc/thing.rs 
extern crate time;

use time::Tm;

pub struct Thing1 {
    name: ~str,
    when: time::Tm,
}

pub fn new_thing1(name: ~str) -> ~Thing1 {
    ~Thing1{name: name, when: time::now()}
}

thing.rs pulls in the rustlang time crate and then defines a struct and constructor for it. It doesn't reference any other space-manatee code.

quux00:~/rustlang/space-manatee/src$ cat abc/wibble.rs 
use abc::thing;
use abc::thing::Thing1;

pub struct Wibble {
    mything: ~Thing1
}

pub fn make_wibble() -> ~Wibble {
    ~Wibble{mything: thing::new_thing1(~"cthulu")}
}

wibble.rs, however, does reference other space-manatee projects, so it uses the fully qualified namespace from the top of the hierarchy, but it does not have to explicitly "import" anything. It can find the thing namespace without a mod declaration because thing.rs is in the same directory.



OK, let's look into the xyz directory now.

quux00:~/rustlang/space-manatee/src$ cat abc/xyz/mod.rs 
pub mod waldo;

That just exports the waldo namespace in the same directory. What's in waldo?

quux00:~/rustlang/space-manatee/src$ cat abc/xyz/waldo.rs 
use abc::wibble::Wibble;

pub struct Waldo {
    magic_number: int,
    w: ~Wibble
}

The Waldo struct references the Wibble struct that is higher than it in the hierarchy. Notice there is no "import" via a mod statement - apparently going up the hierarchy requires no import.

So that's the supporting cast. Let's see how the main.rs program uses them:

quux00:~/rustlang/space-manatee/src$ cat main.rs 
extern crate time;

use abc::{thing,wibble};
use abc::thing::Thing1;
use abc::wibble::Wibble;
use abc::xyz::waldo::Waldo;

pub mod abc;

fn main() {
    let tg: ~Thing1 = thing::new_thing1(~"fred");
    println!("{}", tg.name);

    let wb: ~Wibble = wibble::make_wibble();
    println!("{}", wb.mything.name);

    let wdo = Waldo{magic_number: 42,
                    w: wibble::make_wibble()};
    println!("{:?}", wdo);
}

The only mod "import" main.rs had to do is of the abc namespace - which is in the same directory as main.rs. In fact, that is all you can import. If you try mod abc::thing the compiler will tell you that you aren't doing it right.

By importing abc, you are importing abc/mod.rs. Go back up and look at what abc/mod.rs does - it imports other modules, which in turn import other modules, so they all end up being imported into main.rs as addressable entities.



Once all those import references are set up, nothing special has to be done to compile and run it:

quux00:~/rustlang/space-manatee/src$ rustc main.rs
quux00:~/rustlang/space-manatee/src$ ./main
fred
cthulu
abc::xyz::waldo::Waldo{
  magic_number: 42,
  w: ~abc::wibble::Wibble{
    mything: ~abc::thing::Thing1{
      name: ~"cthulu",
      when: time::Tm{tm_sec: 21i32,
        tm_min: 26i32, tm_hour: 21i32, tm_mday: 17i32, tm_mon: 2i32,
        tm_year: 114i32, tm_wday: 1i32, tm_yday: 75i32, tm_isdst: 1i32,
        tm_gmtoff: -14400i32, tm_zone: ~"EDT", tm_nsec: 663891679i32
      }
    }
  }
}

(I hand formatted the Waldo output for easier reading.)

Sunday, March 16, 2014

Select over multiple Rust Channels


/* ---[ Channels in Rust ]--- */

The channel paradigm for CSP-based concurrency has received a lot of attention lately since it is the foundational concurrency paradigm in Go and Clojure has embraced it with core.async. It turns out that Rust, the new language from Mozilla, also fully embraces channel-based message passing concurrency.

Both Go and Clojure's core.async have a select operation that allows your code to wait on multiple channels and respond to the first one that is ready. This is based, at least conceptually, on the Unix select system call that monitors multiple file descriptors.

Rust also has a select operation. And it has a select! macro to make using it easier. Here's an example:

use std::io::Timer;

fn use_select_macro() {
    let (ch, pt): (Sender<~str>, Receiver<~str>) = channel();

    let mut timer = Timer::new().unwrap();
    let timeout = timer.oneshot(1000);
    select! (
        s = pt.recv() => println!("{:s}", s),
        () = timeout.recv() => println!("timed out!")
    );
}

Channels and Ports are now called Senders and Receivers in Rust. As with select in Go, if the Receiver called pt has a message come in before the 1 second timer goes off, its code block will execute. Otherwise, the timer's Receiver will be read from and its code block executed, printing "timed out".

Note that the select! macro uses parens, like a function call, not curly braces like a code block.

The select! macro is currently labelled experimental, since it has some limitations. One I hit this week is that it will fail (as in, not compile) if you embed the Receiver in a struct:


fn does_not_compile() {
    let (ch, pt): (Sender<~str>, Receiver<~str>) = channel();
    let a = A{c: ch, p: pt};

    let mut timer = Timer::new().unwrap();
    let timeout = timer.oneshot(1000);
    select! (
        s = a.p.recv() => println!("{:s}", s), 
        () = timeout.recv() => println!("time out!")
    );
}

This fails with error: no rules expected the token '.'. I've filed an issue for it here: https://github.com/mozilla/rust/issues/12902#issuecomment-37714663


/* ---[ Using the Rust Channel Select API ]--- */

The workaround is to use the Select API directly. Here's how you do it:


use std::comm::Select;
use std::io::Timer;

fn select_from_struct() {
    let (ch, pt): (Sender<~str>, Receiver<~str>) = channel();    
    let mut timer = Timer::new().unwrap();
    let timeout = timer.oneshot(1000);

    let a = A{c: ch, p: pt};

    let sel = Select::new();
    let mut pt = sel.handle(&a.p);
    let mut timeout = sel.handle(&timeout);
    unsafe { pt.add(); timeout.add(); }
    let ret = sel.wait();

    if ret == pt.id() {
        let s = pt.recv();
        println!("ss: {:?}", s);
    } else if ret == timeout.id() {
        let () = timeout.recv();
        println!("TIMEOUT!!");
    }
}

It's a little more code, but fairly straightforward to follow. You wrap your Receivers in a select Handle and them add them add them to the Receiver set via the add method (which must be wrapped in an unsafe block). Each handle gets an id so you can discover which one returned first.

Finally you wait. When the wait returns you check the return id and execute the appropriate block of code, which starts by call recv on the Receiver to get the incoming value (if any).

Sunday, January 26, 2014

Debugging Rust with gdb

[Update]: This blog post has been updated to Rust 0.11 as of mid-April 2014. The previous version was for Rust 0.9.

This blog entry outlines the current state of my understanding of how to use the gdb debugger for Rust programs. As of this writing, I'm using:

$ rustc -v
rustc 0.11-pre-nightly (e332287 2014-04-16 00:56:30 -0700)
host: x86_64-unknown-linux-gnu

$ gdb --version
GNU gdb (GDB) 7.6.1-ubuntu

A caveat before we begin: I'm not an expert in gdb and I'm (now less of) a newbie at Rust. I'm cataloging my findings for both my own future reference and to help anyone who needs to see an example in action. I'd welcome feedback on tips for better ways to do this.


/* ---[ GDB ]--- */

I won't give a gdb tutorial. There are lots of good ones on the web, such as these:


/* ---[ The setup ]--- */

The example I'll use is modified from an example from cmr in this Rust issue: https://github.com/mozilla/rust/issues/10350. I've chosen this one because it shows jumping into both a regular (named) function and an anonymous closure.

The codebase consists of two files, both in the same directory:

quux.rs:

pub fn quux00(x: || -> int) -> int {
    println!("DEBUG 123");
    x()
}

and bar.rs

extern crate quux;

fn main() {
    let mut y = 2;
    {
        let x = || {
            7 + y
        };
        let retval = quux::quux00(x);
        println!("retval: {:?}", retval);
    }
    y = 5;
    println!("y     : {:?}", y);
}

To compile them with debugging info included use the -g switch (this changed since version 0.9):

$ rustc -g --crate-type lib quux.rs
$ rustc -g -L . bar.rs

Start the bar program in gdb and let's set some breakpoints:

$ gdb bar
(gdb) break bar::main
Breakpoint 1 at 0x4042c0: file bar.rs, line 3.
(gdb) rbreak quux00
Breakpoint 2 at 0x43e760: file quux.rs, line 1.
int quux::quux00(int ())(int ());
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004042c0 in bar::main at bar.rs:3
2       breakpoint     keep y   0x000000000043e760 in quux::quux00 at quux.rs:1

For the first breakpoint, I know the full path, so I just use break.

Sometimes correct full paths in Rust can be tricky, especially for functions that are parametized, so it can be easier to just use rbreak, which takes a regular expression. All functions that match the regex will have a breakpoint set.

The second breakpoint does have the word "quux00" in it, but it's been mangled. There's a way to change that in Rust, but let's move on for the moment.


/* ---[ Aside: rbreak ]--- */

In my playing around so far, I've been unable to use break to set a breakpoint for a function not in the file with the main method, so rbreak has been very useful.

The rbreak command is pretty powerful. According to the documentation, if you want to set a breakpoint for all functions, rbreak . will do the trick. You don't want to do this for a rust application, because there are hundreds of functions that get compiled in.

Instead, you'll want to limit the scope of the regex search by limiting the search to a single file with:

(gdb) rbreak bar.rs:.

But again I've only gotten this to work for the "main" file. If I type rbreak quux.rs:. it doesn't know what I'm talking about. Something for future research.


/* ---[ Let's debug ]--- */

So now we've got two breakpoints set, as indicated by the output from info breakpoints.

Let's start the debugging session:

(gdb) run
Starting program: /home/midpeter444/lang/rust/sandbox/debug/./bar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
... deleted a bunch of info about threads starting up ...
Breakpoint 1, bar::main () at bar.rs:3
3    fn main() {
(gdb) n
4        let mut y = 2;
(gdb) n
9            let retval = quux::quux00(x);
(gdb) list
4        let mut y = 2;
5        {
6            let x = || {
7                7 + y
8            };
9            let retval = quux::quux00(x);
10            println!("retval: {:?}", retval);
11        }
12        y = 5;
13        println!("y     : {:?}", y);
(gdb) p y
$1 = 2
(gdb) p x
$2 = {int ()} 0x7fffffffd4f8

Interesting that it seemed to skip lines 5-8 when I single stepped with n. But the x value was captured as an unnamed function, as you can see on the last line of the readout.

Now we are on line 8 (confirmed with the frame command below), so let's continue - our breakpoint on the quux00 function should now be tripped:

(gdb) frame
#0  bar::main () at bar.rs:9
9            let retval = quux::quux00(x);
(gdb) c
Continuing.

Breakpoint 2, quux::quux00 (x={int ()} 0x7fffffffd500) at quux.rs:1
1    pub fn quux00(x: || -> int) -> int {

Yes, it was tripped. Let's look around and single-step through it:

(gdb) frame
#0  quux::quux00 (x={int ()} 0x7fffffffd500) at quux.rs:1
1    pub fn quux00(x: || -> int) -> int {
(gdb) list
1    pub fn quux00(x: || -> int) -> int {
2        println!("DEBUG 123");
3        x()
4    }
(gdb) s
2        println!("DEBUG 123");
(gdb) p x
$4 = {int ()} 0x7fffffffd390

OK, we're inside the quux00 method. We stepped over the first instruction (the println) and inspected the x param, which is our anonymous Rust closure. Let's continue by stepping into the closure and see if that works:

(gdb) n
DEBUG 123
3        x()
(gdb) s
fn1356 () at bar.rs:6
6            let x = || {
(gdb) n
7                7 + y
(gdb) p y
$5 = 2
(gdb) n
bar::main () at bar.rs:10
10            println!("retval: {:?}", retval);

Excellent. That worked and we even had line numbers. Now we are back in the outer main fn. BTW, note that the "anonymous" closure has a name: fn1356 - remember that name, we'll come back to it later.

It's an easy walk to the finish line from here:

(gdb) list
5        {
6            let x = || {
7                7 + y
8            };
9            let retval = quux::quux00(x);
10            println!("retval: {:?}", retval);
11        }
12        y = 5;
13        println!("y     : {:?}", y);
14    }
(gdb) p retval
$3 = 9
(gdb) n
2    
(gdb) p y
$4 = 2
(gdb) c
Continuing.
retval: 9
y     : 5
[Inferior 1 (process 7007) exited normally]


/* ---[ Round 2: Set breakpoints on all methods in main file ]--- */

Let's start over and set breakpoints on all the functions in the bar.rs file:

$ gdb bar
(gdb) rbreak bar.rs:.
Breakpoint 1 at 0x4042c0: file bar.rs, line 3.
static void bar::main();
Breakpoint 2 at 0x404520: file bar.rs, line 6.
static int fn1356()();

Aah, there's the name again: fn1356. So that's another way to set a breakpoint on your closures. If we redo the session, we'll see that the breakpoint now gets tripped in the closure as it's being executed (from within the quux00 method):

(gdb) r
Starting program: /home/midpeter444/lang/rust/sandbox/debug/bar 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, bar::main () at bar.rs:3
warning: Source file is more recent than executable.
3    fn main() {
(gdb) c
Continuing.
DEBUG 123

Breakpoint 2, fn1356 () at bar.rs:6
6            let x = || {
(gdb) frame
#0  fn1356 () at bar.rs:6
6            let x = || {
(gdb) n
7                7 + y
(gdb) p y
$1 = 2


/* ---[ Round 3: Demangle function names ]--- */

In the Rust 0.9 version of this post I showed how rust mangled the function names, but that seems to have gone away. You can still explicitly specify not to mangle function names like so:

#[no_mangle]
pub fn quux00(x: || -> int) -> int {
    println("DEBUG 123");
    x()
}

I haven't tried anything really complicated with Rust in gdb yet, but hopefully these examples serve to get you going.

Telephonic Whispers in Rust, revisited


/* ---[ Hitting the big time ]--- */

My previous blog entry was a bit of a trifle: it showed an implementation of an toy CSP (Communicating Sequential Process) application in Rust, comparing it to an example implementation in Go by Rob Pike. I discovered, post-hoc, that it hit the Rust subreddit and Hacker News. There was some good discussion there and a few complaints.


/* ---[ Clarifications ]--- */

First, the complaints. Of course this example was tuned for how Go works: the example was created by Rob Pike. The point of his example, and my blog post, was that goroutines (Rust tasks) are lightweight (green threads) that are multiplexed onto native OS threads. The example is meant to show that you can spawn up tens of thousands to hundreds of thousands of goroutines (Rust tasks) pretty rapidly. That's not something you can do in Java, for example: I've tried on my Linux system and it cried uncle somewhere between 20,000 and 30,000 threads and hosed my system.

The Rust guide on Tasks says:

Rust can create hundreds of thousands of concurrent tasks
on a typical 32-bit system

So remembering the Pike example, I thought I'd try it with Rust. My primary intent was just to show that I figured out how to implement it in Rust and that it worked. When you are learning Rust, getting something to compile is an achievement worth celebrating. The Rust compiler is wonderfully strict - enforcing all its rules in order to provide deterministic, safe memory management with static analysis rather than requiring a managed runtime. (But it can provide a managed runtime with Garbage Collection if you want that model, which is one of things that makes Rust so intriguing.)

The simple timings in my earlier blog post were included only because the numbers were interestingly different. The reasons for that difference was the most interesting part of the community discussion and I certainly learned a few things I didn't know about Rust, so, despite the complaints, I'm glad I included it. There wasn't much point in doing formal extensive benchmarks, as some people suggested, because, as the other posters pointed out, the example is trifling and not something you would typically do.

Second, I called the thing "Chinese Whispers" only because that's what the esteemed Rob Pike called it and I was clearly doing a port of his work in the presentation I cited. The name makes no sense to me, but since there were charges of racism and suggestions of calling it "Broken Telephone", I'll now call it "Telephonic Whispers", since I'm not exactly sure what is "broken" about chaining a series of channels together and passing a message from one end to the other. Seems connected to me.

Third, I updated the blog with a few improvements. I shouldn't have described channels as "first class" in Rust, since that label seems to be reserved for things are a part of the core of a language. Rust channels are not built into the language in the way they are in Go. My point was that while these are in the std library and not in the core language, they are a central aspect of Rust's approach to concurrency. They are just as at hand out of the box as Go's channels are, unlike a language like Java where you have to construct them yourself from other concurrency constructs, such as a SynchronousQueue or a TransferQueue.


/* ---[ The interesting follow up: segmented stacks ]--- */

Patrick Walton profiled the Rust code and found that most of the time was spent allocating stacks. Go currently (I'm using version 1.1.2) uses something called "segmented stacks" rather than larger contiguous stacks. In the segmented stack model, you create small stacks for a thread/task/routine and when it needs more space you create another segment and keep track of them in some data structure, such as a linked list.

Since most of the work of the "Telephonic Whispers" example is spent creating a chain of 100,001 channels, it makes sense that systems that build larger stacks for each routine/task will take longer.

An advantage of a segmented stack model is that it allows you to quickly spawn up tasks and if they never need a lot of stack space, less memory is wasted. If the task needs more later, then it gets more on demand.

Rust used to have segmented stacks, but they were removed late in 2013, as announced here by Brian Anderson: https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html. The Rust team found that the apparent advantages of compact memory caused performance problems and had compatibility issues with integrating with foreign code (typically C) and with LLVM libc optimizations.

And now, as pointed out in the "Chinese Whispers" community discussions, it looks like Go will be moving away from segmented stacks too. The primary issue they cite is what Brian Anderson above calls "stack thrashing": when you have heavy looping and that code is continually hitting a stack boundary, causing stack allocations to happen very frequently, thus significantly affecting performance.


/* ---[ Valid consideration: Number of native threads ]--- */

Another important point made in the "whispers" community discussion was about the underlying threading defaults: in Go version 1, only a single native OS thread is created even if you spawn multiple goroutines. To change that you have to manually set it via the runtime.GOMAXPROCS(int) function. So in the Pike example, only one thread is being used.

Rust v0.9 uses multiple threads by default and has to do synchronization between threads. So that might account for some of the slow down between the Go and Rust versions. But Walton's profiling showed that most of the time is the stack allocation, so this consideration is probably secondary.

I did a small experiment on this front by setting GOMAXPROCS to 8 (the number of CPUs on my system):

Then I did a small simple benchmark where I looped 50 times running the Go "whispers" code either with the default GOMAXPROCS setting or GOMAXPROCS=NumCPU (8). I ran each three times and saw consistent results. Here are two representative runs:

GOMAXPROCS = default:

$ time ./whispers
100001

real    0m5.569s
user    0m5.403s
sys     0m0.168s

GOMAXPROCS = NumCPU (8)

$ time ./whispers
100001

real    0m10.801s
user    0m20.413s
sys     0m6.007s

Using more CPUs causes the Go version to run about twice as slow. This would add support to the notion that this is part of the timing difference between Go and Rust for this example is due to the default number of threads that are spun up.


/* ---[ Final thoughts ]--- */

To conclude, I want to state something that is obvious by now and was also discussed the community discussion: if you go to the effort of spawning up lots of tasks in Rust or goroutines in Go, design your application in such a way that they do some amount of significant work. Spawning more tasks or routines can have overhead that you may not need to pay. As strncat says in the Reddit discussion, Rust does not have true coroutines. That doesn't mean you can't use tasks that way, but you'll need to be judicious about how you use them (and how many).

One example I'd like to implement in Rust is another port of some Rob Pike Go code: Lexical scanning and parsing use Goroutines: https://www.youtube.com/watch?v=HxaD_trXwRE

Sunday, January 19, 2014

Telephonic Whispers in Rust


/* ---[ CSP: an idea whose time has come ]--- */

Of late, the Go language has popularized the CSP (Communicating Sequential Processes) style of programming. Rob Pike gives a great talk on Go's implementation of CSP with Go channels and goroutines in this presentation: Google I/O 2012 - Go Concurrency Patterns.

After watching that presentation, I myself got inspired and wrote a Go-inspired CSP library in Clojure. Later, Rich Hickey and team came and built an even better version, core.async, that has gotten a lot of acclaim.

Recently, I've been learning the Rust language and was intrigued and happy to see that the CSP model is also embraced front-and-center by the Rust team. Like Go, the core primitive for message passing is a channel and Rust has lightweight routines (green threads), called tasks in Rust.

One of the first things I did when implementing a CSP library in Go was to port Pike's Go examples from this Google I/O talk to Clojure. Here I do that in Rust for the "Chinese Whispers" example.


/* ---[ Telephonic Whispers ]--- */

Pike shows this example in his 2012 Heroku presentation to show off the idea that you can spawn up a lot of lightweight goroutines and have it run efficiently. Since Rust tasks are lightweight, I wanted to test whether Rust would behave similarly.

In the Chinese Whispers example, a daisy-chain of go routines are formed and they communicate unidirectionally along a series of channels. Go routine A signals to B who signals to C, etc. The message passed along the way is an integer than is incremented on each hand off.

With the Go example, you can run a daisy-chain of 100,000 go routines very efficiently:

$ time ./chinese-whispers 
100001

real    0m0.322s
user    0m0.215s
sys     0m0.105s

Here's my code to do the same in Rust (v 0.9):

A key difference is that in the Rust model a single channel actually comes in two parts: the port end, from which you receive, and a channel end, onto which you send data. The Go model has a single entity, the channel, which can be used for both sending and receiving, though you can create write-only or read-only channel handles.

This separation makes the Rust code a little more intricate and verbose. I also need additional variables, such as the temp variables on lines 16 and 17, since Rust pointers can only be owned by one entity at a time and sending them off into a task (via spawn) means that you've permanently transferred ownership of that reference.

Here is the running speed of the Rust version on my system:

$ time bin/chinese-whispers 
100001

real    0m4.234s
user    0m16.942s
sys     0m3.331s

So Rust can do it just fine, but it is not as efficient as Go. The Go version is about 13 times faster, based on these timed runs on my system.




[Update]: This blog post was discussed on Hacker News and Reddit soon after I published it. I've written a follow-up post addressing some of the complaints and summarizing some of the really interesting points in those community discussions: http://thornydev.blogspot.com/2014/01/telephonic-whispers-in-rust-revisited.html

Wednesday, January 1, 2014

Debugging Go (golang) programs with gdb


/* ---[ Update: July 2015 ]--- */

At this point, I think the Go community has given up trying to make gdb work with Go programs. It is pretty painful, so I don't recommend it. Recently, I've tried both delve and godebug, two contributions from the Go community.. I had better luck with godebug. In fact, it performed perfectly for a recent issue I was having and was a joy to work with.



/* ---[ Debugging Go ]--- */

At the time of this writing, the only debugger I know of for the Go language is the FSF's gdb debugger. gdb can be used to debug programs written in Go and compiled with gccgo or 6g compilers.

At present, I'm using Go version 1.1.2 (on Xubuntu Linux). Do not upgrade to version 1.2 if you want to be able to use gdb. The 1.2 release introduced changes that breaks single stepping through code in gdb: http://code.google.com/p/go/issues/detail?id=6776.

As a side note, I find this situation pretty disappointing. It says that the Go developers are not including gdb compatibility tests in their testing of Go. That really isn't acceptable in my opinion if gdb is the only debugger tool available. Happily, the last entry in the comments from one of the Go maintainers/developers is "If possible, fix this for 1.2.1."


/* ---[ GDB ]--- */

I won't give a gdb tutorial. There are lots of good ones on the web, such as these:


/* ---[ Using GDB with Go ]--- */

I've written a unit test, using Go's testing library, for the code I'm debugging. Here's the code under test:

//
// Read in the username and password properties from the CONFIG_FILE file
// Returns error if CONFIG_FILE cannot be found or opened
//
func readDatabaseProperties() (uname, passw string, err error) {
    propsStr, err := ioutil.ReadFile(CONFIG_FILE)
    if err != nil {
        return
    }

    for _, line := range strings.Split(string(propsStr), "\n") {
        prop := strings.Split(line, "=")
        if len(prop) == 2 {
            switch prop[0] {
            case "username":
                uname = prop[1]
            case "password":
                passw = prop[1]
            }
        }
    }
    return
}

And here is the unit test for it:

func TestReadDatabaseProps(t *testing.T) {
    uname, passw, err := readDatabaseProperties()
    if err != nil {
        t.Errorf(fmt.Sprintf("%v", err))
    }
    if len(uname) == 0 {
        t.Errorf("uname is empty")
    }
    if len(passw) == 0 {
        t.Errorf("passw is empty")
    }
}

The CONFIG_FILE on my system has:

username = midpeter444
password = jiffylube

The unit test is currently failing:

$ go test -test.run="TestReadDatabaseProps"
--- FAIL: TestReadDatabaseProps (0.00 seconds)
    indexer_test.go:150: uname is empty
    indexer_test.go:153: passw is empty
FAIL
exit status 1
FAIL    fslocate    0.023s

So let's check it out in gdb. First, compile the go code with the following flags:

$ go test -c -gcflags '-N -l'

The -c flag causes the go compiler to generate an executable in the current directory called xxx.test, where xxx is the name of the directory your code is in. In my case, it generated fslocate.test.

Next ensure you have the GOROOT environment variable properly set and start gdb.

$ gdb fslocate.test -d $GOROOT

After doing this, if you see an error about /usr/local/go/src/pkg/runtime/runtime-gdb.py, see my side note #1 at the bottom of this posting.

The way we just fired up gdb will give you the command line prompt only. If you want to use it that way, I recommend using the frame command in order to keep track of where you are in the code as you single-step through it.

However, I like using the -tui option to gdb which will split the screen and give you a visual display of the code as you step through it.

$ gdb -tui fslocate.test -d $GOROOT

You get a screen like this. Don't worry about the assembly code - that will switch to your Go code once you get underway.

To proceed, I'm going to set a breakpoint at the start of the test. To do this you'll need to specify the path to the function you want to break on. The code is actually in package "main", but it is in the "fslocate" directory, so I set the breakpoint like this:

(gdb) b "fslocate.TestReadDatabaseProps"
Breakpoint 1 at 0x43c730: file /home/midpeter444/lang/go/projects/src/
fslocate/indexer_test.go, line 144.

If you get a message like:

Function "fslocate.TestReadDatabaseProps" not defined.
Make breakpoint pending on future shared library load? (y or n) 

You didn't get the path right. See side note #2 for some help.

Now that we have the breakpoint set, run the program to the breakpoint by typing run or r:

(gdb) r

Now you'll see that the TUI window jumps to the current line of code and highlights it:

Next advance line by line with n (or next) to step over to the next line where the function under test is called. Once there I step into it (s or step) and step over the next few lines until I'm here:

Now let's inspect some of our variables with print or p:

(gdb) p propsStr
$2 = {array = 0xc2000b3240 "username = midpeter444\npassword = jiffylube\n",
      len = 44, cap = 556}
(gdb) p err
$3 = {tab = 0x0, data = 0x0}
(gdb) p line
$4 = 0xc20009b7b0 "username = midpeter444"

The function ioutil.ReadFile returns a byte slice and error. Inspecting them shows the string value of the byte slice, its length and capactiy and that the error is nil (as indicated by its data value being 0x0).

So the file read worked. Then we read in the first line as a string and it looks good. Then I called:

prop := strings.Split(line, "=") 

to split the line into a string slice. Inspecting this slice shows us the contents of the slice "struct", but not the contents of the underlying slice array:

(gdb) p prop
$5 = {array = 0xc2000b1260, len = 2, cap = 2}

To peek into the array in gdb, we can use standard array indexing:

(gdb) p prop.array[0]
$20 = 0xc20009b7b0 "username "
(gdb) p prop.array[1]
$21 = 0xc20009b7ba " midpeter444"

or we can use the dereferencing operator * and @N to look at N contiguous portions of the array. The inspection so far told us that there are two entries in the array, so this is how to see all of the array entries in one command:

(gdb) p *prop.array@2
$22 = {0xc20009b7b0 "username ", 0xc20009b7ba " midpeter444"}

And with that I can see the defect in my code: I need to trim the string before comparing to the cases in my switch statement.


/* ---[ Looking into Go structs ]--- */

You can also use dot notation to look into Go structs, using dereferencing where you have pointers rather than values.

For example here's a snippet of code from my fslocate program:

fse := fsentry.E{"/var/log/hive/foo", "f", false}

// query that new entry
dbChan <- dbTask{QUERY, fse, replyChan}
reply = <-replyChan

fse is a struct of type fsentry.E:

type E struct {
    Path       string // full path for file or dir
    Typ        string // DIR_TYPE or FILE_TYPE
    IsTopLevel bool   // true = specified in the user's config/index file
}

reply is a struct of type dbReply

type dbReply struct {
    fsentries []fsentry.E
    err       error
}

dbChan is a go channel (that takes dbTasks, another struct).

For that snippet of code I can inspect the contents in gdb:

(gdb) p fse
$1 = {Path = 0x637690 "/var/log/hive/foo", Typ = 0x61fd20 "f",
      IsTopLevel = false}

(gdb) p *dbChan
$3 = {qcount = 1, dataqsiz = 10, elemsize = 56, closed = 0 '\000',
      elemalign = 8 '\b', elemalg = 0x5646e0, sendx = 1, recvx = 0,
      recvq = {first = 0x0, last = 0x0}, sendq = {first = 0x0,
      last = 0x0}, runtime.lock = {key = 0}}

I had to use * on dbChan since channels in go are pointers (references).

The reply struct is a little more tricky:

(gdb) p reply          
$9 = {fsentries = {array = 0xc20007e640, len = 1, cap = 4},
      err = {tab = 0x0, data = 0x0}}          

Here we see that it has two entries: fsentries and err. The error is null. fsentries is an array of length one, which using the techniques we walked through earlier can be inspected:

(gdb) p reply.fsentries.array[0]
$11 = {Path = 0xc2000cbfe0 "/var/log/hive/foo", Typ = 0xc2000005d8 "f",
       IsTopLevel = false}


/* ---[ Inspecting "slices" of Go arrays ]--- */

Suppose you have a large Go slice and you want to see elements 20 through 24 only. How would you do that? You can use the indexing operator and the @N operator together. Here's a slice of length three, where I look at the last two elements in the last command:

(gdb) p configDirs
$6 = {array = 0xc2000f8210, len = 3, cap = 3}

(gdb) p *configDirs.array@3
$7 = {0x618dc0 "/d/e", 0x627170 "/new/not/in/db", 0x626fd0 "/a/b/c/foo/bar"}

(gdb) p configDirs.array[1]@2
$8 = {0x627170 "/new/not/in/db", 0x626fd0 "/a/b/c/foo/bar"}

Hopefully, reading through a gdb tutorial and this post are enough to get you through your Go debugging sessions.


Side note 0: Key bindings with -tui

One tricky/annoying aspect of running with the TUI is that the arrow keys navigate around the TUI screen rather than on your command line. So to go back and forward in history (usually the up and down arrows) use the bash bindings Ctrl-p and Ctrl-n respectively. To go left and and right on the command line use the bash/emacs bindings Ctrl-f and Ctrl-b respectively.



Side note 1: python error when starting gdb

On Ubuntu, I get an error after gdb starts up stating:

File "/usr/local/go/src/pkg/runtime/runtime-gdb.py", line 358
print s, ptr['goid'], "%8s" % sts[long((ptr['status']))], blk.function

I've been able to ignore this and still use gdb with Go code just fine.

According to this thread, this is a python version issue used to build gdb and isn't an issue with the Go distribution. It may be specific to Ubuntu-flavored Linux distros. This posting in that thread says you can fix it with:

sudo 2to3 -w /usr/local/go/src/pkg/runtime/runtime-gdb.py

But when I did that I now get errors when I try to look inside a string, struct or array:

(gdb) p propsStr
$1 =  []uint8Python Exception <class 'TypeError'> 'gdb.Value' object cannot
      be interpreted as an integer:

So I recommend that you back up the /usr/local/go/src/pkg/runtime/runtime-gdb.py file before trying it in case you get the same error I did.



Side note 2: Function "fslocate.TestReadDatabaseProps" not defined.

If you get this type of error message when you set a breakpoint on a function:

Function "fslocate.TestReadDatabaseProps" not defined.    
Make breakpoint pending on future shared library load? (y or n) 

Then you either got the path wrong or the notation to the path wrong, so type "n" and try again. First check your spelling.

If that doesn't fix it, you might have a nested path. For example in the fslocate project, I have a stringset subdirectory:

midpeter444 ~/lang/go/projects/src/fslocate
$ tree stringset/
stringset/
├── stringset.go
└── stringset_test.go

In my case the fslocate directory is in my $GOPATH/src directory and the test file is in the fslocate/stringset directory. If I were to go into that directory and compile the test for gdb,
the breakpoint path would be:

(gdb) b "fslocate/stringset.TestReadDatabaseProps"

Notice that you uses slashes for package separators and dots to indicate a function in a package.