Suppose you came across some code like this:
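Something along these lines, say. This is only an illustrative sketch; the teams variable and the team names are hypothetical:

```ruby
# Hypothetical example: an Array used as an unordered collection of teams.
teams = ["Yankees", "Red Sox", "Orioles"]

# Add a team only if it isn't already present (linear scan of the Array).
teams << "Blue Jays" unless teams.include?("Blue Jays")

# Membership check (another linear scan).
puts "Go Yankees!" if teams.include?("Yankees")
```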
You'd probably find it odd that it was using an Array for the teams; order doesn't seem to be important. All the operations it uses are also supported by Set, which is much more efficient for membership checks if it does what's needed. So this could be replaced with:
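A minimal sketch of the Set version, assuming the code only ever adds teams and checks membership (again, the teams variable and team names are hypothetical):

```ruby
require "set"

# Hypothetical example: the same collection held in a Set.
teams = Set.new(["Yankees", "Red Sox", "Orioles"])

# Adding is idempotent, so no include? guard is needed.
teams << "Blue Jays"
teams << "Yankees" # already present; the Set is unchanged

# Membership check is a hash lookup rather than a scan.
puts "Go Yankees!" if teams.include?("Yankees")
```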
Now we're using (average-case) constant-time rather than linear-time operations. This doesn't matter here, but if we had a million items in the collection it would make a big difference.
Here's another example of using an Array when another type would work better:
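As a rough illustration, assuming each team is a small record with a city and a name (the Team struct and its data are hypothetical):

```ruby
# Hypothetical example: teams stored in an Array, best team first.
Team = Struct.new(:city, :name)

teams = [
  Team.new("New York", "Yankees"),
  Team.new("Boston", "Red Sox"),
  Team.new("Baltimore", "Orioles")
]

# Finding a team by city means scanning the whole Array with select.
boston = teams.select { |team| team.city == "Boston" }.first
puts boston.name
```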
In this case order is important because clearly the Yankees are the best team of the bunch, ever, in the world. So a Set wouldn't work. But we can avoid scanning the array with select by using a Hash with the city name as the key:
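A sketch of what that might look like, with hypothetical cities and team names. It leans on the fact that Ruby hashes (1.9 and later) preserve insertion order, so the best-first ordering survives:

```ruby
# Hypothetical example: teams keyed by city, inserted best team first.
teams = {
  "New York"  => "Yankees",
  "Boston"    => "Red Sox",
  "Baltimore" => "Orioles"
}

# Lookup by city is a single hash access instead of a scan.
puts teams["Boston"]

# Insertion order is preserved, so the first entry is still the best team.
puts teams.values.first
```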
We've replaced another linear time operation with a constant time operation.
This is more than a performance enhancement, though. It also communicates something to the next developer who looks at this code. If we use a Set, we're communicating that order doesn't matter, and that we'll never expect to be able to index into this collection. If we use a Hash, we're communicating that we expect lookups to happen a particular way.
So these are worthwhile changes. But in a largish codebase - or even a small one - how do we find these refactoring / performance opportunities? In the case of "replace Array with Set", for a given Array, we need to ensure that we're not using any methods that aren't also available on Set. So if we have an Array that's using any of these methods, we can't swap it out:
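One way to compute such a list rather than maintain it by hand is to diff the two classes' instance methods. This is just a sketch, and the exact output depends on the Ruby version; the array_only variable name is mine:

```ruby
require "set"

# Methods an Array responds to that a Set does not.
# Any use of these rules out a straight Array-to-Set swap.
array_only = Array.instance_methods - Set.instance_methods

puts array_only.sort.inspect
```

Methods like at and pop show up in the result, while include? and each don't, since Set supports them too.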
It's not quite that simple though. Consider this code:
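For instance, something like the following hypothetical snippet, where the team names and their ordering are assumptions for illustration:

```ruby
# Hypothetical example: the Array is silently ordered best to worst.
teams = ["Yankees", "Red Sox", "Orioles"]

# Set also responds to first and each, so neither call
# hints that the code depends on the ordering.
puts "Best team: #{teams.first}"
teams.each { |team| puts team }
```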
That usage of an Array has an implicit order - best to worst - but we can't tell that by looking at the methods invoked on the object. There aren't any giveaway usages of at or whatever.
For a real-world example of this I poked around the sass gem (v3.4.13) that I'm using in a project. I looked at a bunch of different array usages, but for most of them it seemed like they were relying on the array being an ordered sequence, not just a collection. The one instance that seemed like it might be a candidate was the watched_paths array, since only unique directories are desirable there (per the remove_redundant_directories method) and order doesn't appear to be important. But even if so, what are we talking about here - maybe a dozen or so directories? So not a big savings.
When I first started thinking about this I considered writing a runtime utility to watch usage of Array instances and suggest replacing with other types. It'd be something like pippi except focused on type usage, not method call sequences. I'm still noodling on it, but as things stand I think it would result in too many false positives.
Maybe it would be more useful on gems than Rails apps? I feel like there's not a lot of upside there, though. When I think about the gems I use day to day - nokogiri, etc - most of the time I'm using small collections where iterating over a few dozen items just isn't a big deal. Or I'm using some API wrapper client where 99% of the time is spent waiting for data on a socket. The code might be clearer if a specific type were used, but for the most part, no big deal.
In conclusion, I don't think there's a useful tool to be written here. But comments welcome on the twitters.