Databox's Forecasting Methodology

Databox provides a powerful feature for predicting future outcomes of key metrics. Developed by Databox's Data Science team, this service uses the advanced Facebook Prophet model to generate forecasts by analyzing historical data and incorporating various influencing factors.

Given that Databox's metric data is time-based, forecasting involves predicting how the sequence of observations will evolve over time. Accurate forecasting is essential for organizations involved in capacity planning and goal setting, as it enables efficient resource allocation and performance measurement against established targets.

The Forecasting Model

Prophet

Databox's Forecasting tool uses Prophet, a decomposable time series regression model developed by Facebook's Core Data Science team. The model has three main components: trend, seasonality, and holidays. These components are combined additively, together with an error term ε(t) for variation the model does not capture, as described in the following equation:

y(t) = g(t) + s(t) + h(t) + ε(t)

g(t) is a piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting change points from the data.
s(t) is a combination of:
- A yearly seasonal component modeled using a Fourier series. It can be auto-detected if the appropriate setting is enabled.
- A weekly seasonal component using dummy variables.
h(t) is a user-provided list of important holidays.

Note: User-provided lists of important holidays, represented by h(t) in the equation, are not currently included in Databox's forecasting model.

The key concept in Prophet is to accurately fit the trend component by using a flexible regression model to help improve the precision and accuracy of the forecast. This approach allows for greater modeling flexibility, easier model fitting, and better handling of missing data or outliers than traditional time series models.

With Prophet, Databox is able to model nonlinear saturating growth by specifying a maximum value (the carrying capacity) and, where needed, a minimum value (the floor) that the metric can reach. For forecasting problems that don't exhibit saturating growth, a piecewise constant rate of growth, which provides a simpler and more economical model, can be used.
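To make the idea of saturating growth concrete, here is a minimal sketch of a logistic trend bounded by a maximum (`cap`) and a minimum (`floor`). The function name, parameter names, and sample values are assumptions chosen for illustration. The sketch omits Prophet's changepoints and is not Databox's implementation.

```python
import math

def logistic_trend(t, cap, floor=0.0, k=1.0, m=0.0):
    """Saturating growth curve bounded by `floor` and `cap`.

    t: time in any numeric unit
    k: growth rate (assumed constant here; Prophet lets it change over time)
    m: time offset at which the curve is halfway between floor and cap
    """
    if cap <= floor:
        raise ValueError("cap must be greater than floor")
    return floor + (cap - floor) / (1.0 + math.exp(-k * (t - m)))

# Hypothetical metric that saturates between 100 and 1,000.
for t in (-10, 0, 10):
    print(t, round(logistic_trend(t, cap=1000, floor=100, k=0.5), 1))
```

The curve starts near the floor, grows fastest around `t = m`, and flattens as it approaches the cap.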

Prophet's ability to perform well on a diverse range of forecasting problems and its efficient prediction generation process make it an ideal choice for large-scale forecasting projects.

Note: Although Databox utilizes the Prophet model developed by Facebook's team, the forecast feature in Databox is independently developed and operated. All data required for forecasting is processed solely by Databox and is neither sent to nor shared with any third parties.

Data Collection

To generate forecasts, the service uses a metric data configuration ID to retrieve the necessary data. The source of this data varies based on the storage solution employed. The required data is obtained directly from the current data warehouse, ensuring that the model-building and benchmarking processes utilize the same database as the production systems. This integration includes data from both Analytics and Benchmark Groups products.

Data Preparation

To build a time series forecasting model, it is essential to have a variable indexed by a specific date. Before fitting the model, the input data is adjusted to the format required by the model fitting function, enhancing performance compared to using the raw data.
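As an illustration of this step, the open-source Prophet library expects its input as two columns: `ds` (the date) and `y` (the value). The sketch below reshapes raw observations into that layout using only the Python standard library. The helper name and sample data are hypothetical, and Databox's internal preparation steps may differ.

```python
from datetime import date

def to_model_rows(observations):
    """Convert (ISO date string, value) pairs into date-sorted rows.

    Each row is keyed by `ds` (date) and `y` (value), mirroring the
    column names Prophet expects. Missing values are kept as None.
    """
    rows = [
        {"ds": date.fromisoformat(day), "y": None if value is None else float(value)}
        for day, value in observations
    ]
    rows.sort(key=lambda row: row["ds"])
    return rows

# Hypothetical raw metric data: out of order, with one missing value.
raw = [("2024-01-03", 5), ("2024-01-01", "7"), ("2024-01-02", None)]
for row in to_model_rows(raw):
    print(row["ds"].isoformat(), row["y"])
```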

A key factor in this process is handling anomalies. Since anomalies are inherently unpredictable, they can skew forecasts and inflate the estimated variance. To mitigate this, the prediction intervals are intentionally widened to account for the impact of these anomalies.

Trend Modelling

Our forecasting model uses a piecewise linear approach to capture changes in the underlying trend of the time series. This method involves analyzing how the data changes over time and identifying points where the trend shifts. The model then fits a linear regression to each segment, resulting in a more precise forecast.
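The following sketch shows how a piecewise linear trend can be evaluated once the change points and the slope adjustments at each one are known. It follows the general form used in Prophet, where each slope change comes with an offset correction so that the segments connect without a jump. The function name and the example values are assumptions. Detecting the change points from data is not shown.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k: base growth rate, m: base offset
    changepoints: times s_j at which the slope changes
    deltas: slope adjustment delta_j applied from s_j onward
    Each active change point also shifts the offset by -s_j * delta_j,
    which keeps neighbouring segments joined at s_j.
    """
    rate, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta
            offset -= s * delta
    return rate * t + offset

# Hypothetical trend: slope 2, then 1 after t=10, then 4 after t=20.
for t in (5, 10, 20, 25):
    print(t, piecewise_linear_trend(t, k=2, m=10, changepoints=[10, 20], deltas=[-1, 3]))
```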

Seasonality Modelling

The model uses a Fourier series to capture seasonal patterns in the data. This involves identifying the main seasonal periods in the data (e.g., weekly, monthly, yearly) and fitting a sum of sinusoidal functions to each one. This lets the model predict more accurately how the data will change over time.
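As a rough illustration, a seasonal pattern with period P can be written as a weighted sum of sine and cosine terms. The sketch below builds those terms and combines them with a set of coefficients. The function names and coefficient values are made up for the example. In a real model the coefficients are fitted from the data, and the number of terms Databox uses is not stated here.

```python
import math

def fourier_features(t, period, order):
    """Return [sin(2*pi*n*t/P), cos(2*pi*n*t/P)] for n = 1..order."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features

def seasonal_value(t, period, coefficients):
    """Weighted sum of Fourier terms; expects two coefficients per order."""
    if len(coefficients) % 2 != 0:
        raise ValueError("coefficients must come in (sin, cos) pairs")
    features = fourier_features(t, period, len(coefficients) // 2)
    return sum(c * f for c, f in zip(coefficients, features))

# Hypothetical weekly pattern (period of 7 days) with two Fourier orders.
weekly_coefficients = [1.5, -0.5, 0.3, 0.2]
for day in range(7):
    print(day, round(seasonal_value(day, 7, weekly_coefficients), 3))
```

Because every term repeats after one period, the resulting pattern repeats exactly every 7 days in this example.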

For daily granularity, weekly seasonality is activated when there are at least two weeks of data points. Yearly seasonality is enabled based on the following thresholds for each granularity:

Daily: 729 days
Weekly: 722 days
Monthly: 699 days
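The thresholds above can be expressed as a small lookup. This sketch makes three assumptions that the text does not spell out: "two weeks of data points" means 14 days of history, each threshold is an inclusive minimum, and weekly seasonality stays off for non-daily granularities. The names are hypothetical.

```python
# Minimum days of history before yearly seasonality is enabled,
# taken from the thresholds listed above.
YEARLY_MIN_DAYS = {"daily": 729, "weekly": 722, "monthly": 699}

# Assumption: "two weeks of data points" at daily granularity = 14 days.
WEEKLY_MIN_DAYS = 14

def seasonality_flags(granularity, history_days):
    """Return which seasonal components would be enabled."""
    if granularity not in YEARLY_MIN_DAYS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return {
        "weekly": granularity == "daily" and history_days >= WEEKLY_MIN_DAYS,
        "yearly": history_days >= YEARLY_MIN_DAYS[granularity],
    }

print(seasonality_flags("daily", 30))
print(seasonality_flags("monthly", 700))
```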

Forecast

Once the model is trained, it can forecast future values of the time series. The model provides a point forecast for each future time point, along with upper and lower bounds for the uncertainty intervals. This helps you understand potential changes in the data over time and gauge the model's confidence in its predictions.

The model uses a Bayesian framework to generate uncertainty intervals around its point forecasts. This technique allows the model to estimate the level of certainty about its predictions. The uncertainty intervals indicate the potential variation in the data and the model's confidence in its forecasts.

The main forecast line represents the point forecast, offering the best estimate of future values based on historical data. The upper and lower bounds of the forecast represent an 80% uncertainty interval, which shows the range within which the true future value is expected to fall with 80% probability.

The width of the interval depends on how far ahead the forecast extends. As the forecast horizon increases, the uncertainty around the forecast typically increases as well, resulting in wider intervals.
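One common way to obtain such bounds is to simulate many possible future values for a given date and take the 10th and 90th percentiles, which together cover the central 80%. The sketch below shows that calculation under the assumption that the simulated values are already available. The function names and sample data are hypothetical, and this is not a description of Databox's internal code.

```python
def percentile(values, q):
    """Percentile with linear interpolation, for q between 0 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(ordered) - 1)
    fraction = position - lower_index
    return ordered[lower_index] + (ordered[upper_index] - ordered[lower_index]) * fraction

def uncertainty_interval(samples, width=0.80):
    """Return (lower, upper) bounds covering the central `width` of samples."""
    tail = (1.0 - width) / 2.0 * 100.0
    return percentile(samples, tail), percentile(samples, 100.0 - tail)

# Hypothetical simulated values for one future date.
simulated = [96, 101, 99, 104, 110, 93, 100, 107, 98, 102, 105]
lower, upper = uncertainty_interval(simulated)
print(round(lower, 2), round(upper, 2))
```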

Factors affecting forecasting accuracy

While Databox's forecasting model is highly accurate, several factors can affect its performance. The most important are:

Historical data: Forecasts are predictions based on past data. The more historical data available, the better the model can identify trends and patterns to predict future performance.
Data quality: The accuracy of forecasts relies on the quality of historical data. Incomplete, inconsistent, or erroneous data can significantly affect forecast accuracy.
Data anomalies: Unusual spikes or dips, caused by one-time events such as server outages or promotional campaigns, can impact forecast accuracy.
Seasonality: Metrics often experience seasonal fluctuations, such as decreased sales during holidays or end-of-quarter peaks. The model can account for these seasonal trends if they are evident in the historical data.
Holidays: Holidays can significantly impact metric behavior. For instance, e-commerce sites may see increased activity during holidays, while B2B sites might experience a decrease. The model can incorporate these patterns into the forecast calculation when generating a forecast.
