Birds of a Fiber: A look at Falcon, a modern asynchronous web server for Ruby
The GitHub README describes Falcon as "...a multi-process, multi-fiber rack-compatible HTTP server... Each request is executed within a lightweight fiber and can block on up-stream requests without stalling the entire server process."
The gist: Falcon aims to increase the throughput of web applications by using Ruby’s Fibers to keep serving requests while other requests are waiting on IO (ActiveRecord queries, network requests, file reads/writes, etc.).
What’s a Fiber?
Most of us are familiar with Threads. A Ruby process can have multiple Threads, which are scheduled and executed by the Ruby VM. A Fiber can be thought of as a lighter-weight Thread, except that the Ruby VM doesn’t schedule Fibers: the fibers themselves must coordinate when each one runs. That’s the job of Falcon.
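To make "the fibers themselves must coordinate" concrete, here is a minimal plain-Ruby sketch (no Falcon involved): two fibers that only switch when one of them explicitly calls Fiber.yield. The small while loop stands in for the scheduler role that Falcon’s reactor plays; it is an illustration, not Falcon’s actual scheduling code.

```ruby
require "fiber" # provides Fiber#alive? on Ruby versions before 3.1

log = []

# Each fiber records one step, then voluntarily hands control back to its caller.
ping = Fiber.new do
  3.times do |i|
    log << "ping #{i}"
    Fiber.yield
  end
end

pong = Fiber.new do
  3.times do |i|
    log << "pong #{i}"
    Fiber.yield
  end
end

# A toy round-robin "scheduler": resume each fiber in turn until both finish.
while ping.alive? || pong.alive?
  ping.resume if ping.alive?
  pong.resume if pong.alive?
end

puts log.join(", ")
# prints: ping 0, pong 0, ping 1, pong 1, ping 2, pong 2
```

Nothing here is preemptive: if ping never called Fiber.yield, pong would not get to run until ping had finished.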
Let’s Get Ready to Fly!
Time for Falcon to spread its wings and show us what it’s got! We’ll test Falcon with a simple Rails 5 app on Ruby 2.5, running in production mode.
We need some way to simulate ActiveRecord queries, network IO, and calls into native C extension code: all typical things an average Rails application does.
For all endpoints, a sleep_time parameter in the URL designates how long to sleep. Note that the units differ: pg_sleep takes seconds, while the Go server used for the network IO test sleeps in milliseconds.
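The controllers below call a sleep_time helper whose definition isn’t shown. Here is a hypothetical sketch of what it might do, assuming the route captures the value as params[:sleep_time]: parse it strictly as a non-negative integer. That matters most for slow_sql, which interpolates the value straight into its SQL string. The method name parse_sleep_time and the 60_000 cap are made-up choices for illustration.

```ruby
# Hypothetical stand-in for the controllers' `sleep_time` helper (its real
# definition isn't shown). In a controller it might be called as
# parse_sleep_time(params[:sleep_time]).
def parse_sleep_time(raw, max: 60_000)
  value = Integer(raw.to_s, 10) # strict parse: "1000" => 1000, "1; --" raises
  value.clamp(0, max)           # reject negatives and absurdly long sleeps
rescue ArgumentError
  0                             # anything unparseable means "don't sleep"
end

parse_sleep_time("1000")  # => 1000
parse_sleep_time("-5")    # => 0
parse_sleep_time("1; --") # => 0
parse_sleep_time(nil)     # => 0
```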
ActiveRecord Queries
We’ll use PostgreSQL’s pg_sleep function to simulate slow SQL queries:
class AverageController < ApplicationController
  def slow_sql
    ActiveRecord::Base.connection.execute("select * from pg_sleep(#{sleep_time})")
  end
end
Network IO
We’ll use Net::HTTP to fetch a URL to simulate remote API calls:
require "net/http"

class AverageController < ApplicationController
  def remote_io
    uri = URI.parse("http://localhost:8080/#{sleep_time}")
    Net::HTTP.get_response(uri)
  end
end
The HTTP server listening on the remote end is written in Go. It sleeps for sleep_time milliseconds, then returns a 200 status and a minimal body stating how long it slept:
// sleepy_http.go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strconv"
	"time"
)

// handler sleeps for the number of milliseconds given in the URL path
// (e.g. GET /1000 sleeps for one second), then reports how long it slept.
func handler(w http.ResponseWriter, r *http.Request) {
	t, _ := strconv.Atoi(r.URL.Path[1:]) // unparseable input falls back to 0
	time.Sleep(time.Duration(t) * time.Millisecond)
	fmt.Fprintf(w, "Slept for %d milliseconds", t)
}

func main() {
	fmt.Println("Listening on port 8080")
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Native Extension/C Calls
I wrote a small native C extension that simply sleeps for the specified time before returning to Ruby. One method sleeps while holding the GVL (Ruby’s Global VM Lock); the other releases the GVL before sleeping:
class AverageController < ApplicationController
  def c_work_with_gvl
    CFoo::MyClass.do_work(sleep_time)
  end

  def c_work_without_gvl
    CFoo::MyClass.do_work_without_gvl(sleep_time)
  end
end
Test Flight
Falcon can run in either forking or threaded mode. In forking mode, each forked worker runs a single thread. In both modes, many fibers run within each thread, with one new fiber created per request. We’ll use forking mode in our tests with a concurrency of 5 (5 forked workers with one thread each, so 5 threads in total, but no limit on the number of fibers).
We’ll use siege to make concurrent requests against our endpoints, with two options:
siege -c <number of concurrent users> -r <repetitions per user>
So -c 50 -r1 simulates 50 users that each make one request at the same time.
Slow SQL
First up, slow_sql with each SQL request taking 1 second:
$ siege -c 50 -r1 'http://localhost/slow_sql/1'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 10.08 secs
Transaction rate: 4.96 trans/sec
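Wait a second: if Falcon is able to serve requests while we’re waiting for the SQL to return, we should be seeing about 1 second of elapsed time. Instead, 10 seconds is exactly what we’d expect if only 5 queries (one per thread) ran at a time: 50 requests / 5 at a time × 1 second each = 10 seconds.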
Remote IO
OK, we’ll come back to the SQL test. What about network IO?
$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 11.09 secs
Transaction rate: 4.51 trans/sec
Same results as our SQL test. Again, we should be seeing about 1 second of elapsed time.
What gives?
It turns out I forgot to mention a critical characteristic of fibers: they are cooperatively scheduled among themselves and not preemptible by the Ruby VM. For another fiber to run, the current fiber must explicitly yield so Falcon can switch to it.
In short: for Falcon to achieve its concurrency, you need to use libraries that are ‘async aware’, meaning they yield to Falcon’s async reactor while they wait on IO.
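What does "async aware" look like in practice? Below is a rough, hypothetical sketch in plain Ruby; it is not how Falcon or the async gem are implemented, just the same idea in miniature. Its read helper never blocks the thread: when no data is available, it parks the current fiber, and a tiny event loop uses IO.select to resume that fiber once the IO becomes readable. TinyReactor, spawn, read, and run are made-up names.

```ruby
require "fiber" # provides Fiber.current on Ruby versions before 3.1

# Hypothetical toy reactor, for illustration only.
class TinyReactor
  def initialize
    @waiting = {} # IO => the fiber parked until that IO becomes readable
  end

  # Start a "request" in its own fiber; it runs until its first yield point.
  def spawn(&block)
    Fiber.new(&block).resume
  end

  # An "async aware" read: instead of blocking the thread, it yields the fiber.
  def read(io, maxlen)
    loop do
      data = io.read_nonblock(maxlen, exception: false)
      return data unless data == :wait_readable
      @waiting[io] = Fiber.current
      Fiber.yield # let other fibers (and the event loop) run meanwhile
    end
  end

  # Event loop: wait until some parked IO is readable, then resume its fiber.
  def run
    until @waiting.empty?
      readable, = IO.select(@waiting.keys)
      readable.each { |io| @waiting.delete(io).resume }
    end
  end
end

reactor = TinyReactor.new
r1, w1 = IO.pipe
r2, w2 = IO.pipe
results = []

# Both "requests" start, find no data yet, and park instead of blocking.
reactor.spawn { results << reactor.read(r1, 100) }
reactor.spawn { results << reactor.read(r2, 100) }

# The main fiber is still free to run, so it can supply the data both are waiting on.
w2.write("second pipe")
w1.write("first pipe")
reactor.run

puts results.sort.inspect
```

A library that calls a plain blocking read instead never reaches a yield point, so every other fiber on that thread has to wait until the read returns.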
Async Aware
Fortunately, the author of Falcon has also created async libraries for common things like Postgres and HTTP. Let’s use those and see how much they improve concurrency!
Slow SQL
All we need to do is swap the pg gem for the async-postgres gem in our Gemfile, with no other code changes:
# gem 'pg'
gem 'async-postgres'
And the results?
$ siege -c 50 -r1 'http://localhost/slow_sql/1'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 1.07 secs
Transaction rate: 46.73 trans/sec
That’s more like it! All 50 requests were being served concurrently.
Remote IO
After adding async-http, our remote_io endpoint looks like this:
class AverageController < ApplicationController
  def remote_io
    endpoint = Async::HTTP::URLEndpoint.parse("http://localhost:8080")
    client = Async::HTTP::Client.new(endpoint)
    client.get("/#{sleep_time}")
  end
end
The results:
$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 1.08 secs
Transaction rate: 46.30 trans/sec
Awesome, right!?
So if we just replace some libraries with async-aware ones, we should get at least the same concurrency as Puma with the same number of threads, if not better, right?
Well, Not Quite
So far we’ve only tested endpoints backed by async-aware libraries that play nicely with Falcon. What happens when we throw in an endpoint whose work isn’t Falcon async-friendly?
For this test we’ll hit an async endpoint as we did before, but at the same time we’ll also hit the non-async c_work_without_gvl endpoint with a 10 second duration and only 5 requests (the same as the total number of Falcon threads):
$ siege -c 5 -r1 'http://localhost/c_work_without_gvl/10000'
Transactions: 5 hits
Availability: 100.00 %
Elapsed time: 10.02 secs
Transaction rate: 0.50 trans/sec
Ok, no surprise there. What about the async endpoints that should only take 1 second?
$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 10.35 secs
Transaction rate: 4.83 trans/sec
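Uh oh. Our 5 requests that triggered non-async work ended up blocking all of our async endpoints for 10 seconds! With each of Falcon’s 5 threads stuck inside a C call, none of the fibers on those threads could be resumed until the call returned. Releasing the GVL doesn’t help here: it lets other threads run, but not other fibers on the blocked thread.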
Falcon Fibers vs Puma Threads
Puma is the default web server for Rails 5. It’s a threaded web server, meaning each Puma process usually runs multiple threads to handle requests.
One big difference between threads and fibers is that threads are preemptible by the Ruby VM. The VM can suspend a thread at any time, run another thread, and switch back and forth based on which threads are waiting on IO and on elapsed time (so that each thread gets a fair share of time to run when more than one thread wants to run at once).
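A quick way to see the practical effect, as a plain-Ruby sketch rather than Puma itself: three threads that each do a 0.2 second blocking sleep finish in roughly 0.2 seconds total, because sleep releases the GVL and the VM runs the other threads in the meantime.

```ruby
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Three "requests", each blocking for 0.2 s, on separate threads.
threads = 3.times.map do |i|
  Thread.new do
    sleep 0.2 # releases the GVL, so the other threads run meanwhile
    i
  end
end
finished = threads.map(&:value) # Thread#value waits for each thread's result

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("%d requests took %.2f s", finished.size, elapsed)
```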
Fibers are not preemptible by the Ruby VM; they must coordinate among themselves about which fiber runs and when. Since your application’s code won’t contain fiber yield points, switching happens only at the boundaries of Falcon’s async libraries. In our case (and almost every other), that means only during network IO, file IO, and SQL queries (themselves a form of network IO).
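Here is the fiber side of that, again as a plain-Ruby sketch rather than Falcon: three fibers on one thread that each do a 0.2 second plain sleep run strictly one after another, taking about 0.6 seconds, because nothing in them ever yields. This assumes no Fiber scheduler has been installed (the default; Fiber.set_scheduler only exists on Ruby 3.0+).

```ruby
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Three "requests" in fibers on one thread. A plain `sleep` has no yield point,
# so each fiber blocks the whole thread until its sleep finishes.
fibers = 3.times.map do |i|
  Fiber.new do
    sleep 0.2
    i
  end
end
finished = fibers.map(&:resume) # each resume runs its fiber to completion

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("%d requests took %.2f s", finished.size, elapsed)
```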
So What’s the Lesson, What’s the Takeaway?
The biggest lesson here is that when Falcon accepts a request, it immediately handles it in a new fiber within an existing thread. All fibers within that thread can end up blocking each other if any of them does meaningful work that isn’t completely async aware.
Unfortunately, Falcon is not magic and likely won’t provide better concurrency or performance without substantial code changes in your app, and even then you are likely to encounter unhappy surprises. Unless you really understand the trade-offs and implications of Falcon’s design, you may end up battling a Dragon instead of flying a Falcon. Chances are you’re better off sticking with a web server like Puma.
Bonus Questions
- Want to read about how Ruby might improve its concurrency performance in the future?
- How does Falcon limit the number of Fibers it serves at one time?
- Would Puma with five threads and one worker also block in the ‘Well, Not Quite’ scenario?
- If Puma were configured with enough threads to handle all concurrent connections in these same scenarios, would it perform better, worse, or the same as Falcon?
  - What’s the overhead difference in CPU/Memory vs Falcon?
- Does Falcon’s async reactor remind you of something you’ve seen before?
  - How does it compare to EventMachine or Celluloid?
- How do Thread local variables behave in Fibers? Are they also Fiber local?
- Do the chances of having deadlocks or race conditions increase when using Fibers vs Threads?
Updated version of an article first published on November 14th, 2018.