Exercism Strain: Collaborative implementation and benchmarking
This post covers an incremental implementation of the Exercism “strain” exercise that I worked on collaboratively with one of my colleagues at Ackama, Daniel Jauch.
The Strain exercise brief is to implement keep
and discard
functions within an Elixir module
named Strain
, without using the Enum.filter
module, for example:
iex> Strain.keep([1, 2, 3], fn e -> e < 10 end)
[1, 2, 3]
iex> Strain.keep([1, 2, 3], fn e -> e < 3 end)
[1, 2]
iex> Strain.discard([1, 2, 3], fn e -> e < 3 end)
[3]
First implementation: Use other Enum methods, just not filter
Our first stab at this was to assume that since we couldn’t use Enum/filter/2
, there were Enum
methods that could be used to get an equivalent result. Here is what we came up with:
defmodule Strain do
def keep_enum(list, fun) do
list
|> Enum.map(&(fun.(&1) && &1))
|> Enum.reject(&(!&1))
end
end
This worked, with all tests passing, so we submitted that to Exercism. Yay, done!
But then we took a look at some of the community solutions, and realised that our initial foray into Enum
was a
little misguided, since there was an Elixir feature we completely didn’t know about we could use instead.
Second implementation: Comprehensions
Elixir Comprehensions are a bit of
syntactic sugar over Enumerables that allow for some handy brevity when processing Enumerable data.
Comprehensions support a generator, and a filter. The generator is the first “argument” to the
comprehension and is required. The filter is an optional second argument, and can be used to exclude
non-matching values from the list. Is all this starting to sound exactly like how keep
and
discard
should work? We thought so as well!
defmodule Strain do
def keep(list, fun), do: for x <- list, fun.(x), do: x
end
Much, much simpler, and even more readable in my opinion, once we understood what the two arguments
passed to the comprehension did. The implementation of discard
is exactly the same, just the
inverse result from the check function - !fn.(x)
instead of fun.(x)
.
So, as an Exercism exercise, we were happy with this - from a 5 line function down to a single line.
Then we started wondering - if Enum.filter
is what we would normally use for this, and we have
come up with two different implementations of the exercise - what is the speed difference between
them? Only one way to find out!
Third implementation: Creating a mix project
To perform some benchmarking, we found a Hex package called benchee that looked like it would suit our needs nicely. Before we could add this dependency though, we needed to package our current code into a Mix project!
We started from the Exercism exercise folder:
ls *
strain.exs
strain_test.exs
README.md
We made a new Mix project in this folder:
mix new strain_bench
* creating README.md
* creating .formatter.exs
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/strain_bench.ex
* creating test
* creating test/test_helper.exs
* creating test/strain_bench_test.exs
Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:
cd strain_bench
mix test
Run "mix help" for more commands.
And then we copied our exercise files into the project:
cp strain.exs lib/strain.exs
cp strain_test.exs test/strain_test.exs
We needed to make a few changes to our files:
- We needed to remove or comment out the first three lines of the test, since this contained some Exercism-specific helper code that tried to load the main Elixir file from a particular location
- We needed to rename our
strain.exs
in thelib
folder tostrain.ex
so that Mix would compile this file and it could be tested.
With these two steps done, we could check our tests still passed:
mix test
..............
Finished in 0.1 seconds
1 doctest, 13 tests, 0 failures
Great! Time to benchmark.
Final implementation: Benchmarking
First, we needed to add that dependency. Easily done - the benchee page on
Hex.pm has a dependency block to add to our mix.exs
# mix.exs
defp deps do
[{:benchee, "~> 0.9", only: :dev}]
end
Then we download the dependencies: mix deps.get
After that, we followed a Elixir School guide on using Benchee. First step - create our benchmark.
# benchmark.exs
list = %w(apple zebra banana zombies cherimoya zelot)
checker = &String.starts_with?(&1, "z")
Benchee.run(%{
"strain.comprehension" => fn -> Strain.keep(list, checker) end,
"strain.enum" => fn -> Strain.keep_enum(list, checker) end,
"enum.filter" => fn -> Enum.filter(list, checker) end
})
We had three implementations we wished to test:
strain.comprehension
- our one line comprehension function.strain.enum
- our Enum function (we renamed this tokeep_enum
so we could benchmark it from the same module)enum.filter
- the normalfilter
function built into theEnum
module.
To run these benchmarks, we simply needed to mix run benchmark.exs
, which benchmarks each function
in the map a number of times and then reports results. During the benchmarking, we saw some warnings
explaining that our functions were a bit too fast and our benchmarking results might have
deviation across runs. We “fixed” this by copy-pasting our list until it was larger, but we figured
the warning could mostly be ignored since we were really just interested in a rough measurement.
Here’s what we found.
Results and Summary
Environment:
Operating System: macOS"
CPU Information: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
Number of Available Cores: 4
Available memory: 8 GB
Elixir 1.7.3
Erlang 21.0.6
Test 1:
Comparison:
enum.filter 16.71 K
strain.comprehension 13.76 K - 1.21x slower
strain.enum 11.53 K - 1.45x slower
Test 2:
Comparison:
enum.filter 16.03 K
strain.comprehension 10.77 K - 1.49x slower
strain.enum 4.79 K - 3.34x slower
Test 3:
Comparison:
enum.filter 15.70 K
strain.comprehension 13.43 K - 1.17x slower
strain.enum 9.57 K - 1.64x slower
Unsurprisingly, Enum.filter/2
is fastest - this makes sense, since Enum functions are widely used
and I expect they have been heavily optimised.
Also unsurprisingly, our Enum implementation of keep
was the slowest. We expected this, since we
were doing two complete iterations of the list - once for the map
, and once for the reject
. This
function also had a lot more deviation in speed over multiple benchmarking runs - probably just
because there’s more going on in that function to get delayed.
Our comprehension sat very nicely, slightly slower than Enum.filter
, but not by a huge margin - on
average, 1.3 times slower, but still consistently faster than keep_enum
. I would expect that the
benchmarking time on Enum.filter
and keep
(with comprehension) would scale pretty linearly as
well, wheras the keep_enum
implementation would be a bit more unpredictable due to it’s additional
complexity.
Overall, we came to the conclusion that Enum.filter
is the obvious winner when filtering a list.
Not terribly surprising to us! We did find that a list comprehension is still worth considering
though, since it’s an extremely capable way of performing map/reduce type operations in a succint
way. We also wrapped up the exercise with a real appreciation for how we maybe shouldn’t jump
straight into Enum
every time just because it’s mentioned in the exercise scenario.