How (and Why) to Use Ruby Enumerators
Ruby supports several ways of iterating, including loops and enumerators. Of the two, enumerators are usually the better choice: they are simpler to use and avoid several pitfalls that come with plain loops. Before we dive in, let’s take a quick look at loops.
Loops can be utilized to achieve almost anything with an array of data, but more often than not, they are not very easy to work with. Here’s how you would print the data stored in an array in Ruby using loops:
data = ['foo', 'bar', 'baz']
for elem in data
  puts elem
end
This has a serious issue. A for loop does not create a new scope, so if you had already used elem somewhere else in your code before the loop, the loop overwrites that variable and leaves it holding the last value iterated. This can cause a lot of silly bugs, so you need to keep this caveat in mind whenever you iterate with loops. Now let’s look at another way of iterating over arrays: blocks.
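The leak is easy to reproduce. The sketch below runs the same iteration once with for and once with each, and reports what the outer variable holds afterwards. The method names are made up for this illustration.

```ruby
# for does not open a new scope, so the loop variable overwrites
# any local variable that already has the same name.
def leak_with_for
  elem = 'original'
  for elem in ['foo', 'bar', 'baz']
    # nothing to do here; we only care about what happens to elem
  end
  elem
end

# A block parameter is local to the block, so the outer variable survives.
def no_leak_with_each
  elem = 'original'
  ['foo', 'bar', 'baz'].each { |elem| elem.upcase }
  elem
end

p leak_with_for      #=> "baz"
p no_leak_with_each  #=> "original"
```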
We’ve already covered Blocks in Ruby in great depth in another post, so here we’ll just see how one can implement the above code using blocks and the each iterator:
data = ['foo', 'bar', 'baz']
data.each { |elem| puts elem }
Much simpler, isn’t it? This simplicity is what Enumerators bring to the table. This is why they should always be your go-to whenever you’re looking to go through an array. But, is just one example enough? And is each the only available method? Certainly not! Read along, as we dive deeper into Ruby Enumerators!
Feel free to use these links to navigate around the blog:
- What is an Enumerator in Ruby
- Ruby’s Enumerable Module
- Iterator vs. Enumerator
- How to implement a Ruby Enumerator
- Using the Enumerable Module
- Using Blocks
- Using a Blockless Method Call
- Converting Non-enumerables into Enumerables
What is an Enumerator in Ruby?
Now that we’ve gotten a sneak-peek at what enumerators encompass, let’s break the term down once. Enumeration, in essence, refers to the process of traversing over a set of elements. Also, an entity is said to be enumerable if it contains a set of items and knows a way to traverse over them. So, any element that contains a bunch of elements within itself, and defines methods that can help traverse over the elements is enumerable and can be traversed using enumerators.
Enumerator, specifically, is a class in Ruby that allows both types of iterations – external and internal. Internal iteration refers to the form of iteration which is controlled by the class in question, while external iteration means that the environment or the client controls the way iteration is performed.
The enumerator is a basic implementation of the iterator design pattern – so it allows the environment to access the list of items in a class without exposing other details (such as its implementation).
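To make the two styles concrete, here is a minimal sketch. The helper names upcase_all and pull_two_then_rest are assumptions for this example; next, peek, and loop are standard Ruby.

```ruby
# Internal iteration: the collection drives the loop and calls our block.
def upcase_all(items)
  result = []
  items.each { |item| result << item.upcase }
  result
end

# External iteration: the caller holds an Enumerator and pulls values
# one at a time with next (peek looks ahead without advancing).
def pull_two_then_rest(items)
  enum = items.each          # no block given, so we get an Enumerator
  first = enum.next
  upcoming = enum.peek
  second = enum.next
  rest = []
  # Kernel#loop rescues StopIteration, so it ends cleanly at the end.
  loop { rest << enum.next }
  [first, upcoming, second, rest]
end

colors = %w[red green blue]
p upcase_all(colors)          #=> ["RED", "GREEN", "BLUE"]
p pull_two_then_rest(colors)  #=> ["red", "green", "green", ["blue"]]
```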
There are multiple ways enumeration happens in Ruby. As you already saw, built-in array methods such as each allow for handy traversal of elements, but when you’re trying to make a custom class enumerable, Ruby provides a module for exactly that.
Ruby’s Enumerable Module
Ruby offers a module called Enumerable, which makes classes enumerable. A class that includes it gains methods like include?, count, map, select, and uniq, among others. Many of the methods you associate with arrays or hashes are not written for those classes specifically; they are mixed in from this module.
The methods in the Enumerable module focus on traversal, searching, sorting, and so on, and they become available in every class that includes it. Important Ruby classes such as Array, Hash, and Range therefore rely on this module for much of their functionality. We’ll look at examples of using Enumerable a little later in the post.
The foundation of the Enumerable module is each. Enumerable does not define each itself; it expects the including class to provide it. each is called on the collection and takes a block, then runs the block once for every element, passing the current element as the block’s parameter. So when you’re making a class enumerable, you must define the each method for it. This makes sense because every other Enumerable method works by going through the elements one by one and doing something with them.
As mentioned above, there are a bunch of pre-existing methods in the Enumerable module. These methods can be broken down into several types.
Iteration
Apart from the popular each method for traversing every element in the collection, the module also provides two more methods:
- any? is a method that returns true if the block passed to it is true for at least one element.
- On the other hand, all? returns true only if the block passed to it is true for every element.
These methods come in handy when analyzing the collection as a whole, and evaluating grouped conditions, such as the presence of a certain element.
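A short sketch of these checks in action. The threshold_report helper and the sample readings are invented for illustration; any?, all?, and none? are standard Enumerable methods.

```ruby
# Summarizes a list of readings against a limit.
def threshold_report(readings, limit)
  {
    any_above: readings.any? { |r| r > limit },   # at least one element passes
    all_above: readings.all? { |r| r > limit },   # every element passes
    none_above: readings.none? { |r| r > limit }  # no element passes
  }
end

report = threshold_report([18, 21, 25, 30], 20)
p report[:any_above]   #=> true  (21, 25 and 30 qualify)
p report[:all_above]   #=> false (18 does not)
p report[:none_above]  #=> false

# Edge case: on an empty collection any? is false while all? is true.
empty = threshold_report([], 0)
p empty[:any_above]    #=> false
p empty[:all_above]    #=> true
```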
Another important method that traverses the entire collection is cycle. The catch with cycle is that it keeps on iterating through the collection endlessly. Let’s compare each and cycle to understand the difference between them.
Say, we have a collection of integers:
list = [1, 4, 7]
Let’s try iterating over it via each:
each_elem = list.each
each_elem.next
#=> 1
each_elem.next
#=> 4
each_elem.next
#=> 7
each_elem.next
#=> StopIteration: iteration reached an end
Now let’s do the same using cycle:
each_elem = list.cycle
each_elem.next
#=> 1
each_elem.next
#=> 4
each_elem.next
#=> 7
each_elem.next
#=> 1
each_elem.next
#=> 4
each_elem.next
#=> 7
As can be seen, each stops after iterating through the last element of the list, while cycle keeps iterating through the list over and over again, restarting from the first element after the last one.
Alternatively, if you’re looking to traverse the collection in the opposite direction, i.e. starting from the end and iterating through to the beginning, you can use the reverse_each method. Here’s how that works:
each_elem = list.reverse_each
each_elem.next
#=> 7
each_elem.next
#=> 4
each_elem.next
#=> 1
each_elem.next
#=> StopIteration: iteration reached an end
But what if you need to cycle endlessly in the reverse direction? Unfortunately, there is no reverse_cycle method to achieve it. However, you can reverse the list before cycling through it, like so:
each_elem = list.reverse.cycle
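Since an endless enumerator cannot be printed in full, a bounded call such as first is one way to inspect it. The sketch below wraps the idea in a hypothetical helper, reverse_cycle_first, and also shows that cycle accepts an optional repeat count.

```ruby
# Returns n values, repeating the collection from last to first.
def reverse_cycle_first(collection, n)
  # reverse_each walks the collection backwards; cycle repeats that walk.
  collection.reverse_each.cycle.first(n)
end

list = [1, 4, 7]
p reverse_cycle_first(list, 7)  #=> [7, 4, 1, 7, 4, 1, 7]

# With a repeat count, cycle produces a finite enumerator.
p list.reverse.cycle(2).to_a    #=> [7, 4, 1, 7, 4, 1]
```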
Searching
Several methods are available for searching or filtering your collections (or arrays): the ‘ect’ family of methods (select, detect, reject), along with find and find_all.
list = [1, 2, 3, 4, 5, 6]
# Select: Filters the collection based on the condition via a block
list.select { |elem| elem % 2 == 0 }
# => [2, 4, 6]
# Detect: Finds and returns the first element which satisfies the condition passed via a block
list.detect { |elem| elem % 2 == 0 && elem % 3 == 0 }
# => 6
# Reject: Finds and returns all elements which do not satisfy the given condition
list.reject{ |elem| elem % 2 == 1 }
# => [2, 4, 6]
# find is an alias for detect, while find_all is an alias for select.
Inspired by the Linux grep tool, Enumerable also houses a grep method, which is a pretty advanced asset for searching through arrays.
num_list = (0..100)
str_list = ["foo", "bar", "baz"]
# Find numbers within a certain range
num_list.grep(95..100)
# => [95, 96, 97, 98, 99, 100]
# Find strings starting with f
str_list.grep(/^f/)
# => ["foo"]
# Find strings ending with z
str_list.grep(/z$/)
# => ["baz"]
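Two related features are worth a quick sketch: grep can take a block that transforms each match, and grep_v (available since Ruby 2.3) returns the elements that do not match. The shout_matches helper is a made-up name for this example.

```ruby
# Returns the matching words, upcased by the block passed to grep.
def shout_matches(words, pattern)
  words.grep(pattern) { |word| word.upcase }
end

str_list = %w[foo bar baz]

p shout_matches(str_list, /^b/)  #=> ["BAR", "BAZ"]

# grep_v inverts the match and keeps everything that does NOT fit.
p str_list.grep_v(/^b/)          #=> ["foo"]
```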
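grep decides what matches by using the threequals operator (===), also known as case equality. It is not part of Enumerable; classes such as Class, Range, and Regexp each define their own version. Instead of establishing exact equality, it checks whether the value on the right belongs to the category described by the left-hand side. To understand better, here are some examples: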
String === 'lorem ipsum'
#=> true
Range === (1..15)
#=> true
Array === %w(seven two)
#=> true
/car/ === 'cartrip'
#=> true
Threequals is a handy tool for practical queries when exact similarities are unknown, and only a general sense is being used to target and identify objects.
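The same operator powers case/when, and it is what lets grep filter by class or range as well as by regular expression. A small sketch follows; describe is a hypothetical helper written for this example.

```ruby
# Each when clause is tested as pattern === value, top to bottom.
def describe(value)
  case value
  when Integer    then 'an integer'                    # Class#===
  when 1.0..10.0  then 'a float between 1 and 10'      # Range#===
  when /\Acar/    then 'a string starting with "car"'  # Regexp#===
  when String     then 'some other string'
  else                 'something else'
  end
end

p describe(7)          #=> "an integer"
p describe(5.5)        #=> "a float between 1 and 10"
p describe('cartrip')  #=> "a string starting with \"car\""
p describe('bus')      #=> "some other string"

# grep applies === to every element, so a class works as a filter too.
p [1, 'two', :three, 4.0].grep(Numeric)  #=> [1, 4.0]
```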
Sorting
Sorting is an important operation when handling arrays of data, and you want to spend as little effort as possible getting a list into a particular order. Enumerable packs a few methods to help you do so.
If you have a homogeneous collection, sort should work just fine.
[1, 5, 8, 2, 6, 4, 9].sort
# => [1, 2, 4, 5, 6, 8, 9]
sort_by lets you pass in a block that computes the key each element is sorted by. For example, if you were to sort a list of integers and numeric strings:
[1, 5, "8", 2, 6, "4", 9].sort_by {|a| a.to_i }
# => [1, 2, "4", 5, 6, "8", 9]
In the above example, the block passed into sort_by allows us to convert strings into integers before comparing them for sorting.
What if your collection consists of elements Ruby does not know how to compare, such as instances of a custom class? In that case, you’ll also want to specify their comparison logic, as the generic comparison operators will not make any sense.
For this, the <=> operator needs to be defined on the class. This is how you can do it:
class Person
  attr_accessor :age
  def initialize(age)
    @age = age
  end
  def <=>(other)
    @age <=> other.age
  end
end
p1 = Person.new(28)
p2 = Person.new(20)
[p1, p2].sort
# => [#<Person:0x00000000de0328 @age=20>,
#     #<Person:0x00000000def890 @age=28>]
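It is important to note that defining <=> does not give you the other comparison operators. If you try to compare two objects directly, as shown below, it will result in a NoMethodError.
p1 > p2
This happens because <=> is not a substitute for other operators, like >. You will have to define those operators separately, or mix in Ruby’s Comparable module, which derives them from <=>.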
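One way to get those operators without writing each by hand is Ruby’s Comparable module, which derives <, <=, ==, >=, >, and between? from <=>. A minimal sketch, reusing the Person class from above with Comparable mixed in:

```ruby
class Person
  include Comparable  # supplies <, <=, ==, >=, > and between? via <=>

  attr_reader :age

  def initialize(age)
    @age = age
  end

  # The single method Comparable needs from us.
  def <=>(other)
    age <=> other.age
  end
end

older = Person.new(28)
younger = Person.new(20)

p older > younger                           #=> true
p younger.between?(Person.new(18), older)   #=> true
p [older, younger].min.age                  #=> 20
```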
Reduction
A very powerful method provided by the Enumerable module is reduce. It allows you to carry out an operation on each element of the collection, and retain only the single result from each intermediate operation. This results in an aggregated result of the entire collection. It comes in very handy when calculating certain cumulative properties of a collection, such as the sum. Here’s how you can calculate the sum of all elements in an array:
[1, 4, 7, 0].reduce(:+)
#=> 12
reduce uses an accumulator behind the scenes to calculate the result. Here’s what the expanded, accumulator-based version of the code above looks like:
[1, 4, 7, 0].reduce(0) { |accumulator, current| accumulator + current }
#=> 12
The expanded accumulator can get confusing at times. Here’s what a generic reduce call looks like:
collection.reduce(accumulator_initial) { |accumulator, current| result }
accumulator_initial is an optional argument that decides the initial values of accumulator and current. If accumulator_initial is omitted, accumulator starts as the first element, while current starts from the second element of the collection. If accumulator_initial is, say, x, accumulator starts as x, while current starts from the first element of the collection.
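To see both starting behaviours side by side, the sketch below records every (accumulator, current) pair that reduce passes to its block. trace_reduce is a hypothetical helper written only for this demonstration.

```ruby
# Runs a summing reduce and records each (accumulator, current) pair.
# The initial value is optional, mirroring reduce itself.
def trace_reduce(collection, *initial)
  steps = []
  result = collection.reduce(*initial) do |accumulator, current|
    steps << [accumulator, current]
    accumulator + current
  end
  [result, steps]
end

# Without an initial value: accumulator starts as the first element.
p trace_reduce([1, 4, 7])      #=> [12, [[1, 4], [5, 7]]]

# With an initial value of 10: current starts from the first element.
p trace_reduce([1, 4, 7], 10)  #=> [22, [[10, 1], [11, 4], [15, 7]]]
```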
Iterator vs. Enumerator
Since we’ve been mentioning iterators and enumerators a lot, let’s now understand the two independently. As hinted earlier, an iterator is just a concept of objects being able to expose methods that help clients to iterate over them without accessing their lists directly. On the other hand, an enumerator is a concrete implementation of the very same concept in Ruby. It is an actual class that exists in Ruby and helps define custom classes as iterators.
Ruby Enumerators add a few extra features over the conventional iterators concept, including chained calls, custom iteration, and more! Let’s understand these in detail now.
Why Use It?
If you’re a Ruby developer, the Ruby Enumerator is a must-have weapon in your arsenal. It is not just due to its simplicity, but also due to the customizability that it provides to developers. Here are a few key highlights of the Ruby Enumerator:
Chaining Method Calls
With enumerators, chaining calls is a breeze. Suppose you had to print a certain range of numbers in cycles, up to a certain count. Here’s what an unchained, naive solution would look like:
# 5.times without a block returns an enumerator over 0 to 4
raw = 5.times
# cycle returns another enumerator that repeats those values endlessly
cycled = raw.cycle
# Printing the first, say, 10 elements from the cycling enumerator
p cycled.first(10)
# => [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Seems tedious, doesn’t it? Here’s how it would look with chained method calls:
p 5.times.cycle.first(10)
While this might seem obvious from the first example, it only works because enumerable methods such as times and cycle return an enumerator when called without a block, so the next call in the chain has something to build on. That behaviour adds a lot to the simplicity of the language.
Infinite Lists & Lazy
Dealing with infinitely long lists is tricky. Let’s take the example of the Fibonacci series. Suppose you want the first five even numbers in it; here’s what your code might look like:
# Assuming fib is an Enumerator that yields the Fibonacci series endlessly, starting from 0
p fib.select(&:even?).first(5)
You might expect the result to be [0, 2, 8, 34, 144]. But this call never returns. Taking elements straight from the enumerator with fib.first(10) works fine, because first stops pulling values as soon as it has enough. select, however, is eager: it tries to process the entire sequence and build the complete result before first gets a chance to slice anything off, and an infinite sequence never ends.
This problem is very easy to solve with enumerators. See the following snippet for a better understanding:
# Adding a lazy call to the chain makes every later step evaluate on demand:
p fib.lazy.select(&:even?).first(5)
# => [0, 2, 8, 34, 144]
That’s it! That is all you need to overcome the infinite sequence issue. With a lazy call in the chain, Ruby enumerators defer the evaluation of elements until they are actually needed. This comes in very handy when you don’t know the size of a sequence, or want only a few elements from a large list that passes through a filter.
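For a version you can run as-is, here is a self-contained sketch. The fibonacci method is an assumption made for this example (the post does not define its fib at this point); it returns an endless Enumerator, and the lazy chain pulls only as many values as the final first call needs.

```ruby
# Returns an Enumerator that yields 0, 1, 1, 2, 3, 5, ... without end.
def fibonacci
  Enumerator.new do |yielder|
    a, b = 0, 1
    loop do
      yielder << a
      a, b = b, a + b
    end
  end
end

# lazy turns the chain into an Enumerator::Lazy: select and map are only
# applied to as many elements as first(4) ends up requesting.
squares_of_multiples_of_three =
  fibonacci.lazy
           .select { |n| (n % 3).zero? }
           .map { |n| n * n }
           .first(4)

p squares_of_multiples_of_three  #=> [0, 9, 441, 20736]
```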
Custom Iteration
At times, you might want to arrange an array in an order different from the usual, ascending one. You might want to skip certain elements while allowing a client to access your array. You might want to define separate filters for different situations to allow customized and restricted access to the client, depending upon the requirements. Ruby Enumerators can handle all of this very easily due to the high level of customizability that it offers. Let’s take a look at an example:
# The common list, to be iterated upon
$list = [1, 3, 5, 7, 9, 11]

# Assuming isNotPrime(num) returns false when num is a prime number and true otherwise
def prime_entries
  Enumerator.new do |yielder|
    # Walking the list with each ends the enumerator when the list runs out
    $list.each do |elem|
      yielder << elem unless isNotPrime(elem)
    end
  end
end

p prime_entries.take(4)
# => [3, 5, 7, 11]

def alternate_entries
  Enumerator.new do |yielder|
    $list.each_with_index do |elem, index|
      # Keep only the entries at even indexes (0, 2, 4, ...)
      yielder << elem if index.even?
    end
  end
end

p alternate_entries.take(3)
# => [1, 5, 9]
This opens up a whole range of possibilities, including controlled access to certain elements in an array, concealing the iteration logic from the client to reduce the chances of tampering, and much more!
Now that you know how useful Ruby Enumerators are, let’s try to build some for ourselves.
How to Implement a Ruby Enumerator
While we’ve already seen how each helps iterate normally over a Ruby array, it might not always fit our use case. At times, a custom set of elements might be required, with its pattern of generation. This is where custom Ruby enumerators come in, and we’ve already seen in the previous example how useful they can be. But, just like most things in programming, there is more than one way to create enumerators. Let’s discuss them one-by-one.
Using the Enumerable Module
Beginning with the very basics: Ruby offers an Enumerable module which, when mixed into a custom class, gives that class the full set of enumeration methods. Let’s take a look at a Fibonacci sequence enumerator implemented using the Enumerable module:
class Fibonacci
  include Enumerable # including the module

  attr_accessor :cap

  def initialize(cap)
    @cap = cap
  end

  # Yields the first cap Fibonacci numbers to the given block
  def each
    first = 1
    second = 1
    third = 2
    curr = 0
    while curr < cap
      curr += 1
      yield first
      first = second
      second = third
      third = first + second
    end
  end
end

p Fibonacci.new(10).to_a
# => [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
As you can see above, we didn’t have to define the to_a method, it came out-of-the-box with the Enumerable module. All we had to do was to define the each method.
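One refinement worth considering: Ruby’s built-in collections return an Enumerator when each is called without a block, whereas an each that simply calls yield raises a LocalJumpError in that case. The sketch below shows the common to_enum idiom on a hypothetical Countdown class, which is not part of the original example.

```ruby
class Countdown
  include Enumerable

  def initialize(start)
    @start = start
  end

  def each
    # Without a block, hand back an Enumerator instead of failing at yield.
    return to_enum(:each) unless block_given?

    @start.downto(1) { |n| yield n }
    self
  end
end

launch = Countdown.new(3)

p launch.to_a                 #=> [3, 2, 1]
p launch.map { |n| n * 10 }   #=> [30, 20, 10]
p launch.each_slice(2).to_a   #=> [[3, 2], [1]]

# The block-less call now supports external iteration as well.
enum = launch.each
p enum.next                   #=> 3
```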
Now, let’s look at another interesting thing offered by the module:
class Contact
  include Enumerable
  attr_accessor :emails

  # Enumerable builds sort, map, and the rest on top of each
  def each(&block)
    emails.each(&block)
  end
end
friend_1 = Contact.new
friend_1.emails = ["school@email.com", "work@email.com", "uni@email.com"]
friend_1.sort
# => ["school@email.com", "uni@email.com", "work@email.com"]
Once again, we didn’t have to write sort ourselves: Contact only defines each, and sort came from the Enumerable module. This is how the module can be included to make a class enumerable.
Using Blocks
While including the Enumerable module has long been the standard approach, Ruby 1.9 brought another addition that makes this process very lightweight. The Enumerator class uses the block syntax to quickly declare compact enumerators. Here’s what the Fibonacci enumerator from the previous example would look like when built using the Enumerator class:
fib = Enumerator.new do |yielder|
  first = 1
  second = 1
  third = 2
  curr = 1
  loop do
    if curr < 3
      yielder << 1
    else
      yielder << third
      first = second
      second = third
      third = first + second
    end
    curr += 1
  end
end
puts fib.first(10).inspect
#=> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
This is how simple it can get with the Enumerator API! Although it does not make an existing class enumerable the way the module does, it still serves as a great tool to scaffold out quick, independent enumerators.
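If you are on Ruby 2.7 or newer (an assumption; older versions lack this method), Enumerator.produce offers an even shorter way to declare such sequences. It takes an initial value and a block that computes the next value from the previous one. A sketch of the same series, with fibonacci_pairs as a made-up helper name:

```ruby
# Each step maps the pair [a, b] to [b, a + b]; the sequence never ends.
def fibonacci_pairs
  Enumerator.produce([1, 1]) { |first, second| [second, first + second] }
end

# take stops after 10 pairs; the first item of each pair is the series.
p fibonacci_pairs.take(10).map(&:first)
#=> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```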
Using a Blockless Method Call
As we’ve already seen earlier, several enumerable methods that take a block will return an enumerator when called without a block argument. These block-less calls help create convenient enumerators, which are exactly like the native enumerators for those methods, but easier to move around. Here’s an example to make it clearer for you:
list = [4, 7, 1, 3, 5]
enum = list.select
enum.each { |n| n % 3 == 0 }
#=> [3]
enum.each { |n| n % 2 == 0 }
#=> [4]
As can be seen from the example above, a block-less ‘select’ call on list converts it into an enumerator, stored in enum above. Now, this enumerator can be used again and again to filter out values based on conditions passed in dynamically to the corresponding each calls.
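Block-less calls also combine well with Enumerator#with_index, which hands the block a running index alongside each element. A short sketch; numbered and every_other are hypothetical helpers for this example.

```ruby
# map without a block returns an Enumerator; with_index(1) supplies a
# counter starting at 1, and the block's result is what map collects.
def numbered(items)
  items.map.with_index(1) { |item, position| "#{position}. #{item}" }
end

# The same trick with select keeps the elements at even indexes.
def every_other(items)
  items.select.with_index { |_item, index| index.even? }
end

p numbered(%w[foo bar baz])    #=> ["1. foo", "2. bar", "3. baz"]
p every_other([4, 7, 1, 3, 5]) #=> [4, 1, 5]
```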
Converting Non-Enumerables into Enumerables
Until now, we have been looking at creating enumerators from scratch, either with the Enumerable module, the Enumerator class, or by the Enumerable methods that return enumerators. Building upon the “methods that return enumerators” theme, here is another way of creating them.
Some methods available across various classes in Ruby return enumerators as well when called without a block, such as the times, upto, and downto methods of the Integer class, or the each_char, each_byte, each_line, and gsub methods of the String class. Here’s an instance of such a conversion using the gsub method of the String class:
enum = "Happy hacking".gsub(/\b\w+\b/)
enum.next #=> "Happy"
enum.next #=> "hacking"
Similarly, other methods can be used to return enumerators that are easy to use and store. While this does not leave much room for customization, it still is a very handy fix when it comes to iterating through generic arrays and strings – which is a frequent requirement in the development process.
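A few more of these conversions, sketched with the methods named above. char_positions is a hypothetical helper; everything it calls is standard Ruby.

```ruby
# each_char without a block returns an Enumerator over the characters,
# which can then be paired with indexes.
def char_positions(text)
  text.each_char.with_index.to_a
end

p char_positions('abc')                #=> [["a", 0], ["b", 1], ["c", 2]]

# Integer#upto and Integer#downto behave the same way.
p 1.upto(4).select(&:even?)            #=> [2, 4]
p 3.downto(1).map { |n| n * n }        #=> [9, 4, 1]

# each_line keeps the trailing newline, so chomp strips it here.
p "a\nb\n".each_line.map(&:chomp)      #=> ["a", "b"]
```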
Conclusion
In this post, we learned about the concept of iterators and Ruby’s implementation of them using enumerators. We looked at the conventional way of creating enumerators using the Enumerable module, the multitude of methods it supports out of the box, as well as different ways to integrate it in a custom class. We also covered an array of reasons to prefer enumerators over other iteration methods in Ruby. Finally, we went through a couple of different ways in which enumerators can be implemented in Ruby, understanding each of them with an example.
Enumerators are present everywhere throughout the Ruby language. Core classes such as Integer and String also use enumerators to carry out repetitive tasks easily. With the addition of the Enumerator class in Ruby 1.9, enumerators have become all the more important. Considering how deeply rooted they are in the Ruby ecosystem, it is fair to say that a solid understanding of them is crucial for an up-and-coming Ruby developer.