The GitHub README describes Falcon as "… *a multi-process, multi-fiber rack-compatible HTTP server … Each request is executed within a lightweight fiber and can block on up-stream requests without stalling the entire server process.*"
The gist: Falcon aims to increase the throughput of web applications by using Ruby’s Fibers to continue serving requests while other requests are waiting on IO (ActiveRecord queries, network requests, file reads/writes, etc.).
What’s a Fiber?
Most of us are familiar with Threads. A Ruby process can have multiple Threads which are coordinated and executed by the Ruby VM. A Fiber can be thought of as a more lightweight Thread, but the Ruby VM doesn’t handle Fiber scheduling; the fibers themselves must coordinate to schedule execution. That’s Falcon’s job.
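The cooperative hand-off described above is easiest to see in plain Ruby. Here’s a minimal sketch (not from the article): a fiber runs only when resumed, and gives control back only when it explicitly yields.

```ruby
order = []

fiber = Fiber.new do
  order << "fiber: step 1"
  Fiber.yield              # explicitly hand control back to the caller
  order << "fiber: step 2"
end

fiber.resume               # runs the fiber until it hits Fiber.yield
order << "caller: fiber is paused"
fiber.resume               # continues the fiber after the yield point
# order is ["fiber: step 1", "caller: fiber is paused", "fiber: step 2"]
```

Nothing here happens automatically: if the fiber never calls `Fiber.yield`, the caller never gets control back until the fiber finishes. Keep that in mind for later.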
Let’s Get Ready to Fly!
Time for Falcon to spread its wings and show us what it’s got! We’ll test Falcon with a simple Rails 5 app on Ruby 2.5, running in production mode.
We need some way to simulate ActiveRecord queries, network IO, and calls to native extension C code – all typical things average Rails applications do.
For all endpoints, we accept a sleep_time parameter in the URL, designating how long to sleep.
ActiveRecord Queries
We’ll use PostgreSQL’s pg_sleep function to simulate slow SQL queries:
class AverageController < ApplicationController
  def slow_sql
    ActiveRecord::Base.connection.execute("select * from pg_sleep(#{sleep_time})")
  end
end
Network IO
We’ll use Net::HTTP to fetch a URL to simulate remote API calls:
class AverageController < ApplicationController
  def remote_io
    uri = URI.parse("http://localhost:8080/#{sleep_time}")
    Net::HTTP.get_response(uri)
  end
end
The HTTP server listening on the remote end is written in Go. It sleeps for sleep_time milliseconds before returning a 200 status and a minimal body stating how long it slept:
// sleepy_http.go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strconv"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	t, _ := strconv.Atoi(r.URL.Path[1:])
	time.Sleep(time.Duration(t) * time.Millisecond)
	fmt.Fprintf(w, "Slept for %d milliseconds", t)
}

func main() {
	fmt.Println("Listening on port 8080")
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Native Extension/C Calls
I wrote a small C native extension module that simply sleeps for the specified time in C before returning to Ruby. There’s one sleep that holds the GVL and one that releases it:
class AverageController < ApplicationController
  def cworkwith_gvl
    CFoo::MyClass.do_work(sleep_time)
  end

  def cworkwithout_gvl
    CFoo::MyClass.do_work_without_gvl(sleep_time)
  end
end
Test Flight
Falcon can be used in either forking or threaded mode. In forking mode, a single thread is created per forked worker. In both modes, many fibers run within each thread, with one fiber created for each new request. We’ll use forking mode in our tests with a concurrency of 5 (5 total threads across 5 forks, but no limit on the number of fibers).
We’ll use siege to make concurrent requests against our endpoints, using two siege options: -c sets the number of concurrent users and -r sets the number of repetitions per user.

siege -c <concurrency> -r <repetitions> <url>
Slow SQL
First up, slow_sql with each SQL request taking 1 second:
$ siege -c 50 -r1 'http://localhost/slow_sql/1'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 10.08 secs
Transaction rate: 4.96 trans/sec
Wait a second – if Falcon is able to serve requests while we’re waiting for the SQL to return, we should be seeing about 1 second of elapsed time.
Remote IO
Ok, we’ll come back to the SQL test. What about network IO?
$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 11.09 secs
Transaction rate: 4.51 trans/sec
Same results as our SQL test: roughly 10 seconds when, again, we should be seeing about 1 second of elapsed time.
What gives?
It turns out I forgot to mention a critical characteristic of fibers – they are cooperatively scheduled amongst themselves and not preemptible by the Ruby VM. That means in order for a fiber to run, another fiber must explicitly yield so Falcon can switch the running fiber.
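We can sketch this failure mode (this example is mine, not from the article) with hand-scheduled fibers: a fiber that makes a blocking call has no yield point inside it, so nothing else in the thread can run until the call returns.

```ruby
log = []
started = Time.now

blocking = Fiber.new do
  sleep(0.2)            # blocking call with no yield point inside it
  log << :blocking_done
end

quick = Fiber.new do
  log << :quick_done    # trivial work that could have run immediately
end

blocking.resume         # the whole thread stalls here for 0.2 seconds
quick.resume            # the quick fiber only runs after the slow one finishes
elapsed = Time.now - started
# log is [:blocking_done, :quick_done]; elapsed is at least 0.2 seconds
```

This is exactly what happens inside Falcon when a request runs code with no async yield points: every other fiber in that thread waits.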
In short: in order for Falcon to achieve its concurrency, you need to use libraries that are made to be ‘async aware’ of Falcon’s async reactor.
Async Aware
Fortunately, the author of Falcon has also created some async libraries for common things like Postgres and HTTP. Let’s use those to see how that improves concurrency!
Slow SQL
All we need to do is use the async-postgres gem in place of our pg gem; no other code changes are needed:

# gem 'pg'
gem 'async-postgres'
And the results?
$ siege -c 50 -r1 'http://localhost/slow_sql/1'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 1.07 secs
Transaction rate: 46.73 trans/sec
That’s more like it! All 50 requests were being served concurrently.
Remote IO
After adding async-http, our remote_io endpoint now looks like this:
class AverageController < ApplicationController
  def remote_io
    endpoint = Async::HTTP::URLEndpoint.parse("http://localhost:8080")
    client = Async::HTTP::Client.new(endpoint)
    client.get("/#{sleep_time}")
  end
end
The results:
$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 1.08 secs
Transaction rate: 46.30 trans/sec
Awesome, right!?
So if we just replace some libraries with async aware libraries, we should get at least the same, if not better, concurrency than with Puma using the same number of threads, right?
Well, Not Quite
So far we’ve tested the endpoints that have async aware libraries that play nice with Falcon. What happens when we throw in an endpoint that does work that is not Falcon async-friendly?
For this test we’ll hit the async slow_sql endpoint as we did before, but we’ll also hit the non-async cworkwithout_gvl endpoint at the same time, with a 10 second duration and only 5 requests (the same as the total number of Falcon threads):
$ siege -c 5 -r1 'http://localhost/cworkwithout_gvl/10000'
Transactions: 5 hits
Availability: 100.00 %
Elapsed time: 10.02 secs
Transaction rate: 0.50 trans/sec
Ok, no surprise there. What about the async endpoints that should only take 1 second?
$ siege -c 50 -r1 'http://localhost/remote_io/1000'
Transactions: 50 hits
Availability: 100.00 %
Elapsed time: 10.35 secs
Transaction rate: 4.83 trans/sec
Uh oh. Our 5 requests that triggered non-async work ended up blocking all of our async endpoints for 10 seconds!
Falcon Fibers vs Puma Threads
Puma is the default web server for Rails 5. It’s a threaded web server, meaning each Puma process usually has multiple threads to handle requests.
One big difference between threads and fibers is that threads are preemptible by the Ruby VM. The VM can suspend a running thread at any time and switch to another, based both on which threads are waiting on IO and on a timer (so each thread gets a fair share of execution time when more than one wants to run).
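A quick sketch (again mine, not the article’s) of what VM-managed scheduling buys you: two threads that each wait half a second overlap their waits, so total elapsed time is about 0.5 seconds rather than 1 second.

```ruby
started = Time.now

# Two threads that each block for 0.5s. The VM suspends a thread that is
# sleeping (or waiting on IO) and runs the other, so the waits overlap.
threads = 2.times.map { Thread.new { sleep(0.5) } }
threads.each(&:join)

elapsed = Time.now - started
# elapsed is roughly 0.5 seconds, not 1.0
```

Replace `Thread.new` with non-yielding fibers and the two waits would run back to back instead: no preemption, no overlap.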
Fibers are not preemptible by the Ruby VM; they must coordinate among themselves about which fiber runs and when. Since your application’s code has no fiber yield points of its own, switching occurs only at the boundaries of Falcon’s async libraries. In our case (and almost all others), that means during network IO, file IO, and SQL queries (themselves a form of network IO).
So What’s the Lesson, What’s the Takeaway?
The biggest lesson here is that when a request is accepted by Falcon, it is immediately handled in a new fiber within an existing thread. All fibers within that thread can end up blocking each other if any of them does meaningful work that is not completely async aware.
Unfortunately, Falcon is not magic and likely will not provide better concurrency or performance without substantial code changes in your app. Even then, you are likely to encounter unhappy surprises. Unless you really understand the trade-offs and implications of Falcon’s design, instead of flying a Falcon you may end up battling a Dragon. Chances are you’re better off sticking with a web server like Puma.
Bonus Questions
- Want to read about how Ruby might improve its concurrency performance in the future?
- How does Falcon limit the number of Fibers it serves at one time?
- Would Puma with five threads and one worker also block in the ‘Well, Not Quite’ scenario?
- If Puma was configured with enough threads to handle all concurrent connections in these same scenarios, would it perform better/worse/the same as Falcon? What’s the overhead difference in CPU/memory vs Falcon?
- Does Falcon’s async reactor remind you of something you’ve seen before? How does it compare to EventMachine or Celluloid?
- How do Thread-local variables behave in Fibers? Are they also Fiber-local?
- Do the chances of having deadlocks or race conditions increase when using Fibers vs Threads?
Updated version of an article first published on November 14th, 2018.