TracePoint and Array size

05 Apr 2017

I started this post in October 2013 (!) but never finished it. But it seems like something other folks might find interesting, so, here goes.

I was fiddling with TracePoint and I could see an event for Array#select:

$ ruby -e "TracePoint.trace(:c_call,:call){|t| p t.method_id};[].select{|x|x}"
:select

Incidentally, did you know that calling Array#select without a block returns an Enumerator?

irb(main):001:0> [1,2].select
=> #<Enumerator: [1, 2]:select>
irb(main):002:0> [1,2].select.each {|x| puts x } ; nil
1
2

I didn't. Anyhow, a TracePoint event doesn't get fired for #size or #length, but one does for #count:

$ ruby -e "TracePoint.trace(:c_call,:call){|t| p t.method_id};[].length;[].size;[].count"
:count

And send(:length) triggers one:

$ ruby -e "TracePoint.trace(:c_call, :call){|tp| p tp.method_id} ; [].send(:length)"
:length
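To compare these side by side, here's a small script version of the one-liners above. It's only a sketch: traced_method_ids is a made-up helper name, not part of TracePoint, and it simply collects the method ids of whatever :c_call and :call events fire while the block runs.

```ruby
# Collect the method id of every :c_call/:call event fired while the block runs.
# traced_method_ids is a hypothetical helper, not a TracePoint API.
def traced_method_ids
  ids = []
  tracer = TracePoint.new(:c_call, :call) { |tp| ids << tp.method_id }
  # The block form of #enable turns tracing off again when the block returns.
  tracer.enable { yield }
  ids
end

p(traced_method_ids { [].size })
p(traced_method_ids { [].length })
p(traced_method_ids { [].count })
p(traced_method_ids { [].send(:length) })
```

Going by the one-liners above, the first two should come back empty and the last two shouldn't, though the exact contents may vary by Ruby version.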

Seems weird, since #length is a C function for which #size is an alias:

static VALUE
rb_ary_length(VALUE ary)
{
    long len = RARRAY_LEN(ary);
    return LONG2NUM(len);
}

/* ... and later on ... */

rb_define_method(rb_cArray, "length", rb_ary_length, 0);
rb_define_alias(rb_cArray,  "size", "length");

So why doesn't a plain old #size or #length invocation result in a TracePoint event? I asked Pat Shaughnessy about it on the twitters and he pointed out that it's because size is implemented - wait for it - as a YARV instruction! Sure enough, in insns.def, there's opt_size (and an opt_length right alongside it). If I add a printf("heyo\n"); to the part of that definition that handles arrays (the branch guarded by RBASIC_CLASS(recv) == rb_cArray) and run ruby -e "[].size", there appears a flurry of heyos.
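One way to poke at that array check from Ruby, without recompiling anything: RBASIC_CLASS(recv) == rb_cArray compares the receiver's class exactly, so a reasonable guess (an assumption here, not something verified against the VM source) is that an instance of an Array subclass misses the fast path and goes through ordinary dispatch, events and all. MyArray and traced_method_ids below are made-up names for the sketch.

```ruby
# Hypothetical helper: collect method ids of :c_call/:call events in the block.
def traced_method_ids
  ids = []
  TracePoint.new(:c_call, :call) { |tp| ids << tp.method_id }.enable { yield }
  ids
end

# Hypothetical subclass; its instances are Arrays, but their class is not
# exactly Array, which is what the check in opt_size looks at.
class MyArray < Array; end

plain    = []
subclass = MyArray.new

p(traced_method_ids { plain.size })     # plain Array receiver
p(traced_method_ids { subclass.size })  # Array subclass receiver
```

If the guess holds, the second call reports an event and the first doesn't. Since #size is an alias, the reported method_id may well be :length rather than :size.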

Kashap Kondamudi does a fine job of explaining the Ruby execution process so I won't repeat all that here. The short version, though, is that disassembling a size method invocation shows that it's using a specialized opt_size instruction:

$ ruby -rpp -e "pp RubyVM::InstructionSequence.new('[].size').disasm"
"== disasm: #<ISeq:<compiled>@<compiled>>================================\n" +
"0000 trace            1                                               (   1)\n" +
"0002 newarray         0\n" +
"0004 opt_size         <callinfo!mid:size, argc:0, ARGS_SIMPLE>, <callcache>\n" +
"0007 leave            \n"

whereas disassembling a select invocation shows that it's using a more general instruction:

$ ruby -rpp -e "pp RubyVM::InstructionSequence.new('[].select').disasm"
"== disasm: #<ISeq:<compiled>@<compiled>>================================\n" +
"0000 trace            1                                               (   1)\n" +
"0002 newarray         0\n" +
"0004 opt_send_without_block <callinfo!mid:select, argc:0, ARGS_SIMPLE>, <callcache>\n" +
"0007 leave            \n"

and the implementation of opt_send_without_block presumably also fires TracePoint events. I poked around a bit and it seems that instruction uses the CALL_METHOD macro, which looks up the appropriate method and invokes it. That's not a great explanation because I didn't follow the flow all the way through; maybe someone on the internet will fill in the missing pieces for me here.
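The two disassemblies above can also be compared from a script rather than by eyeball. This is a rough sketch that just scrapes the text of #disasm, so it depends on the output format shown above; dispatch_instruction is a made-up helper name.

```ruby
# Return the name of the YARV instruction that carries the call to `mid`
# in the compiled `source`, or nil if no such call site shows up.
# dispatch_instruction is a hypothetical helper that scrapes #disasm text.
def dispatch_instruction(source, mid)
  disasm = RubyVM::InstructionSequence.new(source).disasm
  pattern = /mid:#{Regexp.escape(mid.to_s)}\b/
  line = disasm.each_line.find { |l| l.match?(pattern) }
  # Each instruction line looks like "0004 opt_size <callinfo!mid:size, ...>",
  # so the second whitespace-separated field is the instruction name.
  line && line.split[1]
end

puts dispatch_instruction('[].size', :size)
puts dispatch_instruction('[].select', :select)
```

On a Ruby matching the disassembly above, this should name the specialized instruction for size and the general send instruction for select. Instruction names can change between Ruby versions, so treat the output as a snapshot rather than a contract.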

Anyhow, the takeaway is that Pat is a smart fella and everyone should buy a second copy of Ruby Under a Microscope, and also that when you're digging around in stuff like this, don't forget about the YARV instruction definitions!