How to use Mint, an awesome HTTP library for Elixir - Part 01
What is Mint?
Mint is a shiny new Elixir package which allows you to make HTTP requests using the HTTP/1 and HTTP/2 protocols. It can transparently handle ALPN (Application-Layer Protocol Negotiation), which essentially means that it can figure out on its own whether a server speaks HTTP/2 or HTTP/1. It also comes with an optional dependency on the castore package, which provides the CA certificates used to verify the SSL certificates of the servers that you connect to.
Mint is a low-level library, which means you'll need some elbow grease to set it up properly. Once it is set up, though, you'll be able to squeeze every bit of performance out of your HTTP client. Good use cases include apps composed of HTTP microservices and apps that constantly talk to external services, e.g. Amazon DynamoDB, S3, GitHub, etc. A fair warning though: Mint might not be the best choice if all you need is a couple of HTTP calls once in a while.
The "Awesome Toolbox"!
The best way to learn a library is by using it, so let us build an application which uses the GitHub API.
An Awesome List is a curated collection of stuff (blog posts, videos, packages, libraries, tools) nicely categorized and linked for a specific target, e.g. https://github.com/sindresorhus/awesome is an awesome list of awesome lists (How meta :) ) and https://github.com/h4cc/awesome-elixir is an awesome list of Elixir/Hex packages.
The problem with these lists is that they lack a lot of relevant information, such as the number of stars of a GitHub repository or the number of downloads of a package, so a user often has to click through the links to figure out which of the available options is popular, active, has the most downloads, etc. We'll build an "Awesome Toolbox" which enriches these awesome lists with relevant information to make decision-making easier for users.
MVP scope for v0.1
For our MVP let us just build out the bare minimum which can add value to our users. So, let us build out an app which can take a GitHub repository URL and give us a README annotated with extra information (for now just the GitHub star count).
AwesomeToolbox.annotate_readme("https://github.com/droptheplot/awesome-phoenix")
# => Should annotate the README with star count for each of the repositories
# listed in this awesome list.
So, a README file containing the following line should be transformed as below:
* [Hexpm](https://github.com/hexpm/hexpm) - API server and website for Hex.
# => should be transformed to
* [Hexpm](https://github.com/hexpm/hexpm) [528 ⭐] - API server and website for Hex.
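The transformation above can be sketched as a small pure function. This is only an illustration: the module name AnnotateSketch and the function annotate_line/2 are hypothetical and not part of the Awesome Toolbox repository, and the star count is passed in by hand instead of being fetched from GitHub.

```elixir
defmodule AnnotateSketch do
  # Matches a Markdown link that points at a GitHub repository,
  # e.g. [Hexpm](https://github.com/hexpm/hexpm)
  defp repo_link, do: ~r{\[[^\]]+\]\(https://github\.com/[^/\s)]+/[^/\s)]+\)}

  # Inserts " [<stars> ⭐]" right after the first repository link.
  # Lines without such a link are returned unchanged.
  def annotate_line(line, stars) do
    Regex.replace(repo_link(), line, fn link -> "#{link} [#{stars} ⭐]" end, global: false)
  end
end
```

Calling AnnotateSketch.annotate_line/2 with the Hexpm line and 528 produces the annotated line shown above, while a line without a GitHub link comes back untouched.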
Talking to GitHub
To be able to pull this off we need to first get the README contents from GitHub, parse each line and annotate it with GitHub star information (if the line contains a GitHub repository link).
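To decide whether a line contains a GitHub repository link, something like the following sketch could be used. RepoLinkSketch and repo_full_name/1 are hypothetical names chosen for illustration, and the regular expression is deliberately naive (for instance, it would keep a trailing period that directly follows a bare URL).

```elixir
defmodule RepoLinkSketch do
  # Returns {:ok, "owner/repo"} for the first GitHub repository link
  # found in the line, or :error when the line has none.
  def repo_full_name(line) do
    case Regex.run(~r{https://github\.com/([\w.-]+)/([\w.-]+)}, line) do
      [_url, owner, repo] -> {:ok, "#{owner}/#{repo}"}
      nil -> :error
    end
  end
end
```

The "owner/repo" string is the form the GitHub API paths used later in this post expect.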
Alright, before we get too far ahead of ourselves, let us start out with a simple endpoint. GitHub exposes a zen endpoint at https://api.github.com/zen which returns a zen message.
$ curl https://api.github.com/zen
Speak like a human.
This would be very easy if we were using a higher-level library like HTTPoison; it would actually be a single line: HTTPoison.get "https://api.github.com/zen". However, we are NOT using HTTPoison. Mint works at a lower level than HTTPoison and gives us much more control (which we'll use in the next post). For now, let us create the simplest and most idiotic implementation possible.
# open a new HTTP connection to api.github.com and get a handle to the connection struct
{:ok, conn} = Mint.HTTP.connect(_scheme = :https, _host = "api.github.com", _port = 443)

# make a GET request to the `/zen` path using the above connection without any special headers
{:ok, conn, request_ref} = Mint.HTTP.request(conn, _method = "GET", _path = "/zen", _headers = [])

# receive and parse the response
receive do
  message ->
    # send received message to `Mint` to be parsed
    {:ok, conn, responses} = Mint.HTTP.stream(conn, message)

    for response <- responses do
      case response do
        {:status, ^request_ref, status_code} ->
          IO.puts("> Response status code #{status_code}")

        {:headers, ^request_ref, headers} ->
          IO.puts("> Response headers: #{inspect(headers)}")

        {:data, ^request_ref, data} ->
          IO.puts("> Response body")
          IO.puts(data)

        {:done, ^request_ref} ->
          IO.puts("> Response fully received")
      end
    end

    Mint.HTTP.close(conn)
end
This prints the following:
> Response status code 200
> Response headers: [{"server", "GitHub.com"}, ...]
> Response body
Approachable is better than simple.
> Response fully received
Awesome! We can now make HTTP requests to GitHub using Mint! Let us unpack what is going on here:
Setting up an HTTP connection
{:ok, conn} = Mint.HTTP.connect(:https, "api.github.com", 443)
The first line sets up a TCP+SSL connection. You might ask: why the heck do I need to set up a connection just to send an HTTP request? Can't I just send a request without all this ceremony? Well, you do need a TCP connection to send an HTTP request. However, most libraries hide this detail from you when they expose an API like HTTPoison.get or HTTParty.get, and this robs you of performance optimization opportunities. Mint gives you complete control over this so that you can tune the heck out of it.
{:ok, conn, request_ref} = Mint.HTTP.request(conn, "GET", "/zen", [])
Alright, once we have a TCP connection we can send a request to the server, and this is precisely what the second line does. It sends a GET request to the /zen path of the GitHub API; the last argument contains the request headers, which are empty here. When a request is sent via Mint, it gives you back an updated connection and a unique ref which identifies the responses for this specific request.
receive do
  message ->
    # ...
end
Mint opens the TCP connection in active mode with the calling process as the owner of the socket. What this means is that any response from the server is sent to your process as a message. This is why we try to receive a message in the next line. Now, to decode this message we would have to implement an HTTP-compliant parser to make sense of the responses. However, Mint has got our back on this one: it provides a function called Mint.HTTP.stream which takes such a message and gives you back an updated connection and a list of mint-http-responses. The responses returned have a shape which matches one of the following, in the specified order:
# a mint-http-response that returns the HTTP status code, you'll get exactly one of these per request
{:status, ^request_ref, status_code}
# a mint-http-response that returns the HTTP response headers, you'll get exactly one of these per request
{:headers, ^request_ref, headers}
# a mint-http-response that returns a chunk of the response body/data, you'll get one of these
# per chunk, so if your response body is large you'll get many of these
{:data, ^request_ref, data}
# a mint-http-response that signals that the response is complete and there is nothing
# more to read on the connection for this request, you'll get exactly one of these per request
{:done, ^request_ref}
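To make these shapes concrete, here is a minimal sketch that folds a list of such tuples into a single map. It works on plain data and does not call Mint at all; ResponseShapeSketch and collect/2 are hypothetical names used only for illustration, and responses that belong to a different request ref are simply skipped.

```elixir
defmodule ResponseShapeSketch do
  # Folds Mint-style response tuples for one request into a map.
  def collect(responses, request_ref) do
    initial = %{status: nil, headers: [], body: [], done: false}

    result =
      Enum.reduce(responses, initial, fn
        {:status, ^request_ref, status}, acc -> %{acc | status: status}
        {:headers, ^request_ref, headers}, acc -> %{acc | headers: acc.headers ++ headers}
        # body chunks are collected as iodata and flattened once at the end
        {:data, ^request_ref, chunk}, acc -> %{acc | body: [acc.body, chunk]}
        {:done, ^request_ref}, acc -> %{acc | done: true}
        # anything for another request ref is ignored
        _other, acc -> acc
      end)

    %{result | body: IO.iodata_to_binary(result.body)}
  end
end
```

Pinning the ref with ^request_ref in every clause is what keeps responses for different requests apart when they share one connection.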
Mint.HTTP.close(conn)
Finally, we need to close the TCP connection that was set up in step 1.
Whew, that was a lot of code for making a super simple GET request, and you might be wondering if this was worth all the trouble (and rightfully so). Hopefully, I can convince you that the improved performance is worth every line of extra code.
To help you follow along, I've set up a GitHub repository which has one commit for each stage of our progression; you can use this as a reference. Check out https://github.com/minhajuddin/awesome_toolbox/commit/164a274e68a4baedd262f855f93e1b501336c81b to see the commit which adds the AwesomeToolbox.Github.zen function.
Get the README of a GitHub repo
The GitHub API exposes a separate endpoint to get the README of a repository at /repos/:owner/:repo/readme. We just need to tweak the previous code to make it use this path instead of the zen path.
def readme(repo_full_name) do
  {:ok, conn} = Mint.HTTP.connect(:https, "api.github.com", 443)

  {:ok, conn, request_ref} =
    Mint.HTTP.request(conn, "GET", "/repos/#{repo_full_name}/readme", [
      {"content-type", "application/json"}
    ])

  receive do
    message ->
      {:ok, conn, mint_responses} = Mint.HTTP.stream(conn, message)

      # keep only the mint-data-responses for this request and collect their chunks
      body =
        mint_responses
        |> Enum.filter(fn
          {:data, ^request_ref, _} -> true
          _ -> false
        end)
        |> Enum.map(fn {_, _, data} -> data end)

      # decode the JSON response body
      json = Jason.decode!(body)

      # decode the actual README, which GitHub returns as a base64 encoded string
      readme = Base.decode64!(json["content"], ignore: :whitespace)

      Mint.HTTP.close(conn)
      readme
  end
end
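The ignore: :whitespace option matters because the Base64 text in the content field is wrapped with newlines, which is presumably why the code above passes it. The sketch below imitates such a payload with a made-up README string; ReadmeContentSketch, wrap/2 and decode/1 are hypothetical helpers, and the wrap width of 8 is arbitrary.

```elixir
defmodule ReadmeContentSketch do
  # Mimics how the "content" field arrives: Base64 text wrapped with newlines.
  def wrap(text, width \\ 8) do
    text
    |> Base.encode64()
    |> String.codepoints()
    |> Enum.chunk_every(width)
    |> Enum.map_join("\n", &Enum.join/1)
  end

  # Decodes wrapped Base64, skipping the newlines between the chunks.
  def decode(content), do: Base.decode64!(content, ignore: :whitespace)
end
```

Without the option, Base.decode64/1 returns :error for the same input because a newline is not a valid Base64 character.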
However, this first version of readme doesn't work consistently, because for large READMEs the response is split over multiple messages. So, we'll have to keep receiving messages till we run into a :done mint-response. To incorporate this into our code, let us add a helper function called recv_response which keeps receiving messages till we get a :done.
def readme(repo_full_name) do
  {:ok, conn} = Mint.HTTP.connect(:https, "api.github.com", 443)

  {:ok, conn, request_ref} =
    Mint.HTTP.request(conn, "GET", "/repos/#{repo_full_name}/readme", [
      {"content-type", "application/json"}
    ])

  # we now rely on the recv_response function to receive as many messages as
  # needed and process them till we have a full response body
  {:ok, conn, body} = recv_response(conn, request_ref)

  json = Jason.decode!(body)
  readme = Base.decode64!(json["content"], ignore: :whitespace)

  Mint.HTTP.close(conn)
  readme
end

# receive and parse the response till we get a :done mint response
defp recv_response(conn, request_ref, body \\ []) do
  {conn, body, status} =
    receive do
      message ->
        # send received message to `Mint` to be parsed
        {:ok, conn, mint_responses} = Mint.HTTP.stream(conn, message)

        # reduce all the mint responses returning a partial body and status
        {body, status} =
          Enum.reduce(mint_responses, {body, :incomplete}, fn mint_response, {body, _status} ->
            case mint_response do
              # the :status mint-response doesn't add anything to the body and receiving this
              # doesn't signify the end of the response, let's ignore this for now.
              {:status, ^request_ref, _status_code} ->
                {body, :incomplete}

              # the :headers mint-response doesn't add anything to the body and receiving this
              # doesn't signify the end of the response, let's ignore this for now.
              {:headers, ^request_ref, _headers} ->
                {body, :incomplete}

              # the :data mint-response returns a partial body, let us prepend this
              # to our accumulator (it is reversed at the end), this still doesn't
              # signify the end of our response, so let's continue
              {:data, ^request_ref, data} ->
                {[data | body], :incomplete}

              # the :done mint-response signifies the end of the response
              {:done, ^request_ref} ->
                {Enum.reverse(body), :complete}
            end
          end)

        {conn, body, status}
    end

  # if the status is complete we can return the body which was accumulated till now
  if status == :complete do
    {:ok, conn, body}
    # else we make a tail recursive call to get more messages
  else
    recv_response(conn, request_ref, body)
  end
end
The bulk of the work is being done by the recv_response function, which recursively calls itself till it gets a full response.
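The same receive-until-done idea can be tried without a network connection. In the sketch below the process sends itself hand-made {:batch, ref, responses} messages as a stand-in for what Mint.HTTP.stream would return; that message shape, the RecvLoopSketch module, and the one second timeout are all assumptions made for illustration (the original recv_response has no timeout and waits forever).

```elixir
defmodule RecvLoopSketch do
  # Keeps receiving {:batch, ref, responses} messages until a :done
  # response shows up, then returns the accumulated body as a binary.
  def recv(ref, acc \\ []) do
    receive do
      {:batch, ^ref, responses} ->
        case consume(responses, ref, acc) do
          {:complete, acc} -> {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}
          {:incomplete, acc} -> recv(ref, acc)
        end
    after
      # give up instead of blocking forever when nothing arrives
      1_000 -> {:error, :timeout}
    end
  end

  # Prepends each data chunk to the accumulator and flags completion on :done.
  defp consume(responses, ref, acc) do
    Enum.reduce(responses, {:incomplete, acc}, fn
      {:data, ^ref, chunk}, {status, acc} -> {status, [chunk | acc]}
      {:done, ^ref}, {_status, acc} -> {:complete, acc}
      _other, state -> state
    end)
  end
end
```

Chunks are prepended and reversed once at the end, which mirrors what recv_response does with its body accumulator.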
The commit for this step can be found at: https://github.com/minhajuddin/awesome_toolbox/commit/299a0790aa2123d4a2c298621c5cdceb0a6c2dc7
We have already covered a lot of ground in this post and I don't want to make it unreadable, so we'll finish our MVP in the next post.
However, before you leave I have a puzzler for you:
Do you create a new database connection every single time that you want to authenticate a user, load a product or make a database query?
user = Repo.get_by(User, email: "danny@k.com")
messages = Repo.get_by(Message, user_id: user.id)
While executing the above code from a Phoenix controller:
- How many database connections are created and discarded? Why?
- Did we create 2 new database connections, run the 2 queries to fetch a user and messages and then discard those connections? Why?
- Why don't we treat HTTP connections in the same way as our database connections?
I'll leave you with some quick stats on the performance impact of connection reuse:
A naive benchmark
I wanted to run a naive benchmark on my local computer to see how re-using connections impacts performance (we'll do a detailed benchmark at the end of the last post, with a lot more variations and realistic situations).
Connection strategy | Requests per second |
---|---|
1 new connection for every request (HTTParty) | 1.17K |
The same connection for all requests | 13.21K |
The same connection for all requests with HTTP pipelining | 27.66K |
I'll see you in the next post!