String method use and misuse

10 Feb 2015

Consider this little code snippet:

[1,2,3].select {|x| x > 1 }.first

This should be replaced by [1,2,3].detect {|x| x > 1 }; that's what pippi is for. Pippi also finds a bunch of variants - select followed by any?, etc.

But after writing a couple rules I ran out of ideas on method sequences to detect. There's an infinite number of method sequence possibilities, of course, but most of them are nonsense, and pippi is slow enough as it is without filling it with checks that probably won't find anything.

So I wanted to figure out which sequences of method calls were happening. This isn't looking for just any method sequence, though, it's focusing on method invocations on objects where that invocation returns that same type. So Array#select is interesting, because it returns another Array, but Array#first is not because the array could be holding a variety of types. After a bit of flailing around I came up with a gizmo to find such sequences and ran it (e.g., USE_PIPPI=true PIPPI_CHECKSET=research bundle exec rake test:units | grep occurred) on some code that was lying around, producing:

select followed by each occurred 34 times
select followed by map occurred 20 times
select followed by size occurred 13 times
select followed by empty? occurred 10 times
select followed by any? occurred 7 times
select followed by first occurred 7 times
select followed by sort_by occurred 4 times
select followed by reject occurred 3 times
select followed by select occurred 3 times
select followed by count occurred 3 times
select followed by last occurred 3 times
select followed by detect occurred 2 times
select followed by length occurred 2 times
select followed by group_by occurred 2 times
select followed by none? occurred 2 times
select followed by to_a occurred 1 times
select followed by each_with_index occurred 1 times
select followed by slice occurred 1 times
select followed by inject occurred 1 times
select followed by all? occurred 1 times
select followed by collect occurred 1 times
select followed by map! occurred 1 times
select followed by join occurred 1 times

The grep is there because the utility prints out the location of each hit so you can go take a look at it, and I wanted to filter those out here. I'm printing out a frequency map, but that's not to say that low numbers on the histogram aren't be interesting; if a Pippi rule only finds one hit in a 10KLOC codebase it's still a good check if it actually found something to fix. But while we're accumulating data we may as well get a feel for frequency.

There's some good stuff there - for example, that select followed by all? should probably be stamped out - but we've already got a bunch of checks around Array#select. How about String#strip?

strip followed by empty? occurred 15 times
strip followed by downcase occurred 10 times
strip followed by split occurred 7 times
strip followed by strip occurred 4 times
strip followed by gsub occurred 4 times
strip followed by encode occurred 3 times
strip followed by length occurred 3 times
strip followed by encoding occurred 3 times
strip followed by force_encoding occurred 3 times
strip followed by index occurred 3 times
strip followed by count occurred 2 times
strip followed by size occurred 2 times
strip followed by ascii_only? occurred 2 times
strip followed by unpack occurred 1 times
strip followed by strip! occurred 1 times
strip followed by gsub! occurred 1 times
strip followed by match occurred 1 times
strip followed by tr occurred 1 times
strip followed by dump occurred 1 times
strip followed by b occurred 1 times
strip followed by delete occurred 1 times
strip followed by encode! occurred 1 times

Definitely some good fodder for rules there. Why would you ever call strip on a String that you got as a result of calling String#strip? Maybe strip followed by empty? could be replaced with a regex check like if foo.match(/^\s*$/). And you can imagine some other possibilities when seeing that list.

To repeat the warning in the pippi README, not all of these issues are fixable. A method might call strip on a String and return it, and the caller might then invoke strip on that same object because a different code path would bring an un-strip'd String to that same location. I don't think there's a good way around that short of a refinement or some such that would tell pippi to ignore that hit, and I'm not sure that'd be worth the trouble.

Anyhow, should be good check potentials here. We'll see!