Ruby static code analysis

25 Sep 2013

Update: Thanks to Rein Henrichs for pointing out this example only works on Array, not Enumerable, fixed. Bozhidar Batsov mentioned another static analysis tool, rubocop which uses Peter Zotov's "parser" gem to get an AST. Rubocop has really nice output, definitely worth a look.

There have been a bunch of attempts to do static code analysis on Ruby programs. Some have gotten pretty impressive results. I'm thinking stuff like Michael Edgar's laser, or ruby-lint by Yorick Peterse, or Xavier Shay's cane.

One big advantage of those tools is that you can point them at pretty much any Ruby project and get some useful results. And you don't have to do a lot of PATH setup or whatever; you just give them a glob and away you go. I ran Laser on a project and it found a bunch of miscellaneous issues (1). Same ease of use goes for cane; I pointed it at a directory and it found some code that violated its built-in complexity metrics.

I think a harder challenge for Ruby static analysis tools is finding stuff like this:

def foo(bar)
  # could be simplified to use some_array.detect {|x| x > 1}
  bar.select {|x| x > 1}.first
end

That example could be found with grep -rn "select.*first" (although that would also return false positives), but I'm not aware of a static code analysis tool that calls it out. Or what if the result of the select gets stored for a bit, even just in a local variable:

def foo(bar)
  baz = bar.select {|x| x > 1}
  # ...
  # some other code that does nothing with baz
  # ...
  blurg = baz.first
end

Or especially if bar is an instance of something other than an Array; now we can't do that simplification but it's hard to tell that and our grep from earlier still incorrectly flags this:

class Bar
  def select
    # kind of nonsense, but you get the idea
    yield [1,2,3]
  end
end
Bar.new.select {|x| x > 1 }.first

Those first few example are challenging because they involve data flow analysis. The last example is hard because there'd be a false positive if the tool didn't know the type of the message receiver.

I know it's not a fair comparison, but take a look at Findbug's bug patterns. "Don't use removeAll to clear a collection", "Class is not derived from an Exception, even though it is named as such", "Vacuous call to collections", "Load of known null value", and so forth. Some of the other items there are irrelevant to Ruby programmers (unnecessary casts and whatnot), but there's a lot of good stuff too. Of course, for Findbugs to do all that it needs to be pointed at the project's class files where the types are statically defined, it needs to be able to load all the dependent jar files, etc. In other words, it uses a lot of information that Ruby only has at runtime. More specifically, that's information that Ruby has at the time the methods are being invoked.

In the grand scheme of things, this "hey don't do select followed by first" sort of thing isn't a big deal. They just aren't tremendously compelling. But still, they'd be nice to catch.

(1) Laser is especially impressive as it can fix some of the problems that it finds. For example, it'll go through the code and replace double-quoted strings that aren't using interpolation with single quoted strings.