Forecasting Web Traffic with Scout and Prophet
Forecasting traffic to your web app is important for capacity planning, but generating a seasonally accurate model of your traffic is pretty daunting.
If you under-forecast:
- Your app servers may become oversaturated, and requests will start backing up in a queue.
- If many requests are database-heavy, your database load may increase, slowing down many requests.
- Many times there are subtle signals of these events, like brief backups in your load balancer during peak times, that slowly worsen over time.
- More alert noise and frustrated customers.
If you over-forecast: you could spend a lot more 💰.
Looking to get out ahead of your growth the easy way? You're in luck. I've created a shared Google Colab notebook that creates a seasonal forecast of your traffic in less than a minute. Just save a copy of the notebook, enter your Scout API token, and boom: a seasonal forecast of your traffic:
Under the hood, the notebook uses standard Python data science libraries (like Pandas), Facebook Prophet (a forecasting procedure), and data collected by Scout (exposed via the Scout API) to train a Prophet model.
Read on for more on how this seasonal traffic forecast works.
Dataset
We need some traffic data! Ideally, this data should be restricted to just the requests hitting our app servers (and not the cache).
Via the Scout API, we can easily gather metrics like throughput, response time, and error rates.
Assuming you've already signed up for Scout, create an API key within the settings area of the Scout UI:
Tools
I'll be using Google Colab and Python in this tutorial. All of the libraries mentioned are free and open-source.
Let's get started!
Exporting data from Scout
The code below exports four weeks of throughput data from Scout into the raw_ts array.
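from datetime import datetime, timedelta

import requests

# HEADERS must already hold the request headers that carry your Scout API token.
url = "https://scoutapm.com/api/v0/apps/APP_ID/metrics/throughput" # Replace `APP_ID` with your app id.
today = datetime.now()
# The API has a max timeframe of 2 weeks but 30 days of data is stored. We'll
# fetch 4 weeks of data by making two requests, oldest window first.
end_times = [today - timedelta(days=14), today]  # avoid shadowing the built-in `range`
raw_ts = []
for end_time in end_times:
    params = {
        "from": (end_time - timedelta(days=14)).isoformat(),
        "to": end_time.isoformat()
    }
    r = requests.get(url, params=params, headers=HEADERS)
    r.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error body
    res = r.json()
    raw_ts += res["results"]["series"]["throughput"]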
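The two-request pattern above generalizes to any span the API retains. Here is a minimal sketch, using only the standard library, of how the consecutive 14-day windows line up. `fetch_windows` is a hypothetical helper name, not part of the Scout API or the notebook:

```python
from datetime import datetime, timedelta

def fetch_windows(end, total_days=28, max_days=14):
    """Split the `total_days` before `end` into consecutive (from, to) pairs,
    oldest first, each at most `max_days` long."""
    windows = []
    start = end - timedelta(days=total_days)
    while start < end:
        stop = min(start + timedelta(days=max_days), end)
        windows.append((start, stop))
        start = stop
    return windows

# A fixed end time keeps the example reproducible.
windows = fetch_windows(datetime(2019, 3, 1, 12, 0))
for window_from, window_to in windows:
    print(window_from.isoformat(), "->", window_to.isoformat())
```

Each pair could be passed as the "from" and "to" parameters of one request. Note that neighboring windows share a boundary instant.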
Example raw_ts data:
[['2019-02-02T10:00:00Z', 64.79166666666667],
['2019-02-02T12:00:00Z', 222.11666666666667],
['2019-02-02T14:00:00Z', 232.7],
['2019-02-02T16:00:00Z', 223.35],
['2019-02-02T18:00:00Z', 224.99166666666667],
...
Create a DataFrame
Let's create a Pandas DataFrame with the raw data. This lets us do all sorts of manipulations and visualizations of the data.
import pandas as pd

df = pd.DataFrame(raw_ts, columns=["time", "throughput"])
df.time = pd.to_datetime(df.time)
df = df.set_index("time")
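Because the two request windows meet at the same instant, the API may return the boundary timestamp in both responses (an assumption; I haven't verified this against the Scout API). A short sketch, on made-up sample rows, of sorting and de-duplicating the index before modeling:

```python
import pandas as pd

# Made-up sample rows; one timestamp appears twice on purpose.
sample_ts = [
    ["2019-02-16T10:00:00Z", 210.5],
    ["2019-02-16T12:00:00Z", 222.1],
    ["2019-02-02T10:00:00Z", 64.8],
    ["2019-02-16T10:00:00Z", 210.5],
]

sample_df = pd.DataFrame(sample_ts, columns=["time", "throughput"])
sample_df["time"] = pd.to_datetime(sample_df["time"])
sample_df = sample_df.set_index("time").sort_index()

# Keep the first row for each timestamp and drop any repeats.
sample_df = sample_df[~sample_df.index.duplicated(keep="first")]
print(sample_df)
```

The same two lines (sort_index, then the duplicated mask) would apply unchanged to the real df.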
Plot the throughput
Let's get a feel for the data by plotting it within our notebook:
import matplotlib.pyplot as plt

plt.figure(figsize=(14,8))
plt.plot(df.index.values, df.throughput.values)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Throughput', fontsize=12)
...which generates:
We can see five distinct traffic spikes followed by two smaller spikes. This smells like the weekly traffic pattern of a business app!
We can also see two significant drops in traffic. These might be from deploys and shouldn't be used in our forecast. We'll remove these - Prophet does just fine with gaps in data:
df = df[df.throughput > 100]
Time Series Modeling with Prophet
Now that we have our outlier-free data, it's time to model the traffic with Prophet. Straight from the project homepage:
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
That sounds a lot like web traffic patterns, doesn't it? We'll use Prophet to model our web traffic and forecast it into the future.
To start, I'll initialize a Prophet model and fit it with our historical data. Note how this pattern is very similar to ML algorithms in Scikit-Learn. Prophet requires DataFrame columns to be named in a standard format, so I handle that too:
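from prophet import Prophet  # the package was named fbprophet before version 1.0
model = Prophet()
# Prophet reads a "ds" column rather than the index, and rejects timezone-aware timestamps.
df["time_for_prophet"] = df.index.tz_localize(None)
model.fit(df.rename(columns={"time_for_prophet": "ds","throughput": "y"}))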
Making a forecast
Now I'll forecast two weeks of traffic:
df_forecast = model.make_future_dataframe(periods=14, freq="D")
forecast = model.predict(df_forecast)
Prophet generates the forecast! We can see it isn't confident enough to show the daily spikes in traffic from our historical data set but still shows the distinct weekday and weekend traffic pattern. We can also see a bit of an increase in traffic in the 2nd week.
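The frame returned by model.predict includes the columns ds, yhat (the point forecast), and yhat_lower and yhat_upper (the uncertainty interval). For capacity planning, the upper bound is often the more useful figure. Here is a rough sketch on a stand-in frame with made-up numbers. `REQUESTS_PER_SERVER` and `servers_needed` are hypothetical names, and the per-server capacity is an assumption you would need to measure for your own app:

```python
import math

import pandas as pd

# Stand-in for the frame returned by model.predict(); the values are made up.
forecast_sample = pd.DataFrame({
    "ds": pd.date_range("2019-03-02", periods=4, freq="D"),
    "yhat": [120.0, 115.0, 230.0, 240.0],
    "yhat_lower": [90.0, 85.0, 200.0, 205.0],
    "yhat_upper": [150.0, 140.0, 260.0, 275.0],
})

REQUESTS_PER_SERVER = 60.0  # assumed throughput one app server can sustain

def servers_needed(throughput, per_server=REQUESTS_PER_SERVER):
    """Round up so that capacity always covers the forecast throughput."""
    return math.ceil(throughput / per_server)

# Find the row with the highest upper bound and size for it.
peak = forecast_sample.loc[forecast_sample["yhat_upper"].idxmax()]
print(peak["ds"].date(), servers_needed(peak["yhat_upper"]))
```

Sizing against yhat_upper rather than yhat is a conservative choice: it trades some extra spend for headroom when the forecast runs low.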
Trend and Seasonality
Prophet exposes the model's forecast components so we can look at the overall trend and seasonality:
model.plot_components(forecast);
What have we learned from this forecast?
During the day, our traffic volume is substantially lower between 00:00 and 08:00 UTC. During the week, our traffic is substantially lower on the weekends. We could scale down the number of application servers for a third of every day and on the weekends. That's a substantial portion of time where we could be saving money on infrastructure.
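To put a number on that, here is a quick back-of-the-envelope calculation. It assumes the low-traffic windows are exactly 00:00 to 08:00 UTC on weekdays plus the whole of Saturday and Sunday, which simplifies what the component plots show:

```python
HOURS_PER_DAY = 24
QUIET_HOURS_PER_DAY = 8  # 00:00 to 08:00 UTC
WEEKDAYS = 5
WEEKEND_DAYS = 2

# Weekdays are quiet overnight only; weekend days are quiet all day.
quiet_hours = WEEKDAYS * QUIET_HOURS_PER_DAY + WEEKEND_DAYS * HOURS_PER_DAY
week_hours = (WEEKDAYS + WEEKEND_DAYS) * HOURS_PER_DAY
quiet_share = quiet_hours / week_hours

print(f"{quiet_hours} of {week_hours} hours ({quiet_share:.0%}) are low-traffic")
```

Under those assumptions, roughly half of the week could run on a reduced server count.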
Your turn!
Python is an excellent choice for analyzing web performance patterns thanks to its wonderful ecosystem of data science tools. With Prophet and the Scout
Want to skip all of the data science and just forecast your traffic? Save a copy of this shared Google Colab notebook, insert your Scout API token, and run the notebook! That's it!