
Databox Forecasting Methodology

Explore how Databox forecasts future performance using historical data through an ensemble of Prophet and Bayesian regression models.

Databox can forecast future values of key metrics. The forecasting service was developed by Databox's Data Science team. It generates forecasts by analyzing historical data and incorporating factors that influence each metric.

Given that Databox's metric data is time-based, forecasting involves predicting how the sequence of observations will evolve over time. Accurate forecasting is essential for organizations involved in capacity planning and goal setting, as it enables efficient resource allocation and performance measurement against established targets.

Data collection

To generate a forecast, the service uses the metric's data configuration ID to retrieve the necessary data. The exact source depends on the storage solution in use. The data is read directly from the current data warehouse, so model building and benchmarking use the same database as the production systems. This includes data from both the Analytics and Benchmark Groups products.

Data preparation

A time series forecasting model requires each observation to be indexed by a date. Before the model is fitted, the input data is converted into the format that the model-fitting function expects, which improves performance compared with fitting the raw data directly.
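As an illustration of this step, the sketch below turns raw (date, value) records into a gap-free daily series. It assumes that values sharing a date should be summed and that missing days should be kept as explicit gaps. The helper name `to_daily_series` is hypothetical. The `ds`/`y` keys mirror the column names the open-source Prophet library uses for dates and values. This is not Databox's actual preparation code.

```python
from datetime import date, timedelta


def to_daily_series(records):
    """Aggregate (date, value) records into a gap-free, date-indexed daily series.

    Hypothetical helper: values sharing a date are summed, and days with no
    data are kept with y=None so later steps can treat them as missing.
    """
    totals = {}
    for day, value in records:
        totals[day] = totals.get(day, 0.0) + value
    if not totals:
        return []
    series = []
    current, end = min(totals), max(totals)
    while current <= end:
        # "ds"/"y" mirror the column names Prophet expects for dates and values.
        series.append({"ds": current, "y": totals.get(current)})
        current += timedelta(days=1)
    return series


raw = [(date(2024, 1, 3), 5.0), (date(2024, 1, 1), 2.0), (date(2024, 1, 1), 1.0)]
for row in to_daily_series(raw):
    print(row)
```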

Handling anomalies is a key part of this process. Because anomalies are inherently unpredictable, they can skew forecasts and inflate the estimated variance. To account for their impact, the prediction intervals are intentionally widened.
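The article does not say how Databox detects anomalies. As a stand-in, the sketch below uses a common robust technique, the modified z-score based on the median absolute deviation (MAD). The MAD is not inflated by the outliers it is looking for, unlike the standard deviation. The function name and the 3.5 cutoff (a conventional choice for this score) are assumptions for illustration only.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Flag points whose modified z-score exceeds `threshold`.

    Swapped-in technique (not Databox's documented method): the median
    absolute deviation (MAD) stays robust when a few points are extreme.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All typical points are identical; no scale to compare against.
        return [False] * len(values)
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


values = [10, 11, 9, 10, 12, 10, 95, 11]
print(flag_anomalies(values))
```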

The forecasting model

Prophet

The Prophet forecasting model, developed by Facebook's Core Data Science team, is the backbone of Databox’s forecasting system. The model has three main components, trend, seasonality, and holidays, which are combined additively:

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

Here ε_t is an error term for changes the model does not capture, and:

  • g(t) is a piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting change points from the data.
  • s(t) is a combination of:
    • A yearly seasonal component modeled using a Fourier series, which can be auto-detected if the appropriate setting is enabled.
    • A weekly seasonal component using dummy variables.
  • h(t) is a user-provided list of important holidays.

Note: The holiday component h(t), which relies on user-provided lists of important holidays, is not currently included in Databox's forecasting model.
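To make the additive structure concrete, the minimal sketch below evaluates the deterministic part of the model, g(t) + s(t) + h(t), from component functions. In line with the note above, the holiday term defaults to zero. The trend and seasonal functions are made-up examples, not fitted Databox components, and the helper name `additive_model` is hypothetical.

```python
import math


def additive_model(t, trend, seasonality, holidays=None):
    """Evaluate y(t) = g(t) + s(t) + h(t) for the deterministic part of the model.

    `holidays` defaults to None, giving h(t) = 0, since user-provided holiday
    lists are not part of Databox's model.
    """
    h = holidays(t) if holidays is not None else 0.0
    return trend(t) + seasonality(t) + h


# Hypothetical components: linear growth of 2 units/day plus a 7-day cycle.
def g(t):
    return 100.0 + 2.0 * t


def s(t):
    return 10.0 * math.sin(2.0 * math.pi * t / 7.0)


print([round(additive_model(t, g, s), 2) for t in range(7)])
```

Because the components simply add, each one can be inspected or switched off independently. This is one reason Prophet's output is easy to decompose into trend and seasonal parts.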

The key idea in Prophet is to fit the trend component accurately with a flexible regression model, which improves the accuracy of the forecast. Compared with traditional time series models, this approach offers greater modeling flexibility, easier model fitting, and better handling of missing data and outliers.

With Prophet, Databox can model nonlinear, saturating growth by specifying a minimum (floor) and maximum (capacity) value that the metric can reach. For forecasting problems that do not exhibit saturating growth, a piecewise constant growth rate provides a simpler, more parsimonious model.
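A saturating trend can be sketched as a logistic curve that rises from a floor toward a capacity. In the open-source Prophet library, this corresponds to `Prophet(growth="logistic")` with `cap` and `floor` columns in the input data. The stand-alone version below is simplified: it holds the growth rate `k` and midpoint `m` constant, whereas Prophet lets them change at changepoints. The parameter values are illustrative only.

```python
import math


def logistic_trend(t, cap, floor, k, m):
    """Saturating growth between `floor` and `cap`.

    Simplified: growth rate k and midpoint m are constant (no changepoints).
    """
    return floor + (cap - floor) / (1.0 + math.exp(-k * (t - m)))


for t in (0, 25, 50, 75, 100):
    print(t, round(logistic_trend(t, cap=1000.0, floor=100.0, k=0.1, m=50.0), 1))
```

The curve passes through the halfway point between floor and cap at t = m and flattens toward both limits. A linear trend, by contrast, would keep growing without bound.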

Prophet's ability to perform well on a diverse range of forecasting problems and its efficient prediction generation process make it an ideal choice for large-scale forecasting projects.

Note: Although Databox utilizes the Prophet model developed by Facebook's team, the forecast feature in Databox is independently developed and operated. All data required for forecasting is processed solely by Databox and is neither sent to nor shared with any third parties.

Trend modelling

The Prophet model uses a piecewise linear approach to capture changes in the underlying trend of the time series. It identifies changepoints where the growth rate shifts. It then fits a single continuous trend whose slope can change at each changepoint, rather than independent regressions per segment. A sparse prior on the size of these slope changes keeps the trend from overreacting to noise, which helps produce a more reliable forecast.
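A minimal sketch of evaluating a piecewise linear trend in the form described in the Prophet paper is shown below. The trend has a base rate `k`, an offset `m`, and a rate change `deltas[j]` at each changepoint `changepoints[j]`. Here the changepoints and rate changes are supplied by hand for illustration. Prophet itself estimates them from the data.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Prophet-style piecewise linear trend evaluated at time t.

    k: initial growth rate; m: initial offset;
    deltas[j]: change in growth rate from changepoints[j] onward.
    The offset adjustment -s_j * delta_j keeps the trend continuous.
    """
    rate, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            rate += delta_j
            offset += -s_j * delta_j
    return rate * t + offset


# Hypothetical trend: slope 1 until t=10, then slope 3 (delta = +2).
print([piecewise_linear_trend(t, 1.0, 0.0, [10.0], [2.0]) for t in (0, 5, 10, 15, 20)])
```

The offset correction is what makes this one connected line with a bend at each changepoint, rather than disconnected segments.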

Seasonality modelling

The Prophet model uses a Fourier series to capture seasonal patterns in the data. This involves identifying the main seasonal periods in the data (e.g., weekly, monthly, yearly) and fitting a sum of sine and cosine terms to each one. As a result, the model can predict how the data will rise and fall over each cycle.
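The sketch below builds Fourier features for a seasonal period P and combines them with coefficients into s(t) = Σ [a_n cos(2πnt/P) + b_n sin(2πnt/P)]. In the open-source Prophet library, the default Fourier orders are 10 for yearly (period 365.25 days) and 3 for weekly (period 7 days). The coefficients here are made up, whereas Prophet fits them during training.

```python
import math


def fourier_features(t, period, order):
    """Return [cos(2*pi*n*t/P), sin(2*pi*n*t/P)] for n = 1..order."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.extend([math.cos(angle), math.sin(angle)])
    return features


def seasonal_component(t, period, coefficients):
    """s(t) as a weighted sum of Fourier features; coefficients = [a1, b1, a2, b2, ...]."""
    order = len(coefficients) // 2
    return sum(c * f for c, f in zip(coefficients, fourier_features(t, period, order)))


# Hypothetical weekly pattern (period 7, order 3).
coeffs = [1.0, 0.5, -0.3, 0.2, 0.1, 0.0]
print([round(seasonal_component(day, 7.0, coeffs), 3) for day in range(7)])
```

A higher order allows sharper seasonal shapes but needs more data to fit without overfitting.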

For daily granularity, weekly seasonality is enabled when at least two weeks of data points are available. Yearly seasonality is enabled once the history reaches the following minimum length for each granularity:
  • Daily: 729 days
  • Weekly: 722 days
  • Monthly: 699 days
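The activation rules listed above can be expressed as a small lookup. The sketch assumes that the thresholds are inclusive (at least N days) and that history length is measured in days for every granularity. It also assumes that the weekly rule applies only to daily data. The function and constant names are hypothetical.

```python
# Minimum history (in days) for yearly seasonality, per the thresholds above.
YEARLY_MIN_DAYS = {"daily": 729, "weekly": 722, "monthly": 699}
WEEKLY_MIN_DAYS_DAILY = 14  # "at least two weeks" of daily data points


def enabled_seasonalities(granularity, history_days):
    """Return the seasonal components to activate (hypothetical helper).

    Assumes inclusive thresholds and that weekly seasonality is only
    considered for daily granularity.
    """
    enabled = []
    if granularity == "daily" and history_days >= WEEKLY_MIN_DAYS_DAILY:
        enabled.append("weekly")
    if history_days >= YEARLY_MIN_DAYS[granularity]:
        enabled.append("yearly")
    return enabled


print(enabled_seasonalities("daily", 400))
print(enabled_seasonalities("monthly", 730))
```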

Forecasting methodologies

Databox supports several forecasting methodologies, each designed to fit different data patterns and use cases.

Databox Forecasting Methodology

Databox's signature forecasting methodology combines Prophet and Bayesian regression into an ensemble model:

  • Prophet captures long-term trends, seasonality, and general time-series structure.
  • Bayesian regression integrates impacting metrics with user-defined priors to refine the forecast.

The ensemble produces a balanced forecast that leverages both historical trends and domain-specific inputs.
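The article does not specify the form of the Bayesian regression or how the two models are combined. The sketch below rests on several simplifying assumptions, all hypothetical:

  • There is a single impacting metric x with a linear effect y = βx + noise.
  • A normal prior on β stands in for the user-defined prior.
  • The noise variance is known, which gives a closed-form posterior.
  • A fixed weight blends the regression forecast with a Prophet point forecast.

```python
def bayesian_slope_posterior(xs, ys, prior_mean, prior_sd, noise_sd):
    """Posterior mean and sd of beta in y = beta * x + noise.

    Conjugate normal prior beta ~ N(prior_mean, prior_sd^2) with known noise
    variance, so the posterior is also normal and has a closed form.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = sum(x * x for x in xs) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    weighted_evidence = sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2
    post_mean = (prior_mean * prior_precision + weighted_evidence) / post_precision
    return post_mean, post_precision ** -0.5


def ensemble(prophet_point, regression_point, weight=0.5):
    """Weighted blend of two point forecasts (hypothetical fixed weighting)."""
    return weight * prophet_point + (1.0 - weight) * regression_point


# Hypothetical history of an impacting metric (xs) and the target metric (ys).
beta, beta_sd = bayesian_slope_posterior([1.0, 2.0, 3.0], [2.1, 3.9, 6.0],
                                         prior_mean=1.5, prior_sd=1.0, noise_sd=0.5)
regression_point = beta * 4.0  # impacting metric's planned value next period
print(round(beta, 3), round(beta_sd, 3), round(ensemble(9.0, regression_point), 3))
```

A tight prior (small `prior_sd`) pulls β toward the user's expectation. A loose prior lets the historical data dominate.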

Simple statistical methods

  • Simple exponential smoothing: Applies exponentially decaying weights to past observations to generate a smoothed forecast.
  • Linear change: Projects future values based on a user-defined linear trend.
  • Moving average: Generates forecasts as the mean of a rolling window of past observations.
  • Simple mean/median: Uses the historical mean or median as the forecast — useful as a baseline or when planning for metrics expected to stay within historical ranges.
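The four simple methods can be sketched in a few lines each. Several details are assumptions not stated in the article. The forecasts are flat over the horizon. "Linear change" is taken as a constant absolute change per period; a percentage change would be another reasonable reading. The moving average uses only the last observed window rather than rolling forward on its own forecasts. All function names are hypothetical.

```python
from statistics import mean, median


def simple_exponential_smoothing(history, alpha, horizon):
    """Flat forecast at the final smoothed level; alpha in (0, 1]."""
    level = history[0]
    for value in history[1:]:
        # Newer observations get weight alpha; older ones decay by (1 - alpha) per step.
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def linear_change(history, change_per_period, horizon):
    """Extend the last observation by a user-defined absolute change per period."""
    last = history[-1]
    return [last + change_per_period * h for h in range(1, horizon + 1)]


def moving_average(history, window, horizon):
    """Flat forecast at the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon


def historical_baseline(history, horizon, use_median=False):
    """Flat forecast at the historical mean or median."""
    return [(median if use_median else mean)(history)] * horizon


history = [10, 12, 11, 13, 14]
print(simple_exponential_smoothing(history, alpha=0.5, horizon=3))
print(linear_change(history, change_per_period=2, horizon=3))
print(moving_average(history, window=3, horizon=3))
print(historical_baseline(history, horizon=3, use_median=True))
```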

Forecast

Once the model is trained, it can forecast future values of the time series. The Databox Forecasting Methodology and the Linear Regression methodology use a Bayesian framework to generate uncertainty intervals around their point forecasts.

The uncertainty intervals indicate the potential variation in the data and the model's confidence in its forecasts.

The main forecast line represents the point forecast, the best estimate of future values based on historical data. The upper and lower bounds represent an 80% uncertainty interval: the range within which the true future value is expected to fall with 80% probability.
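One common way to obtain an 80% interval is to take the 10th and 90th percentiles of simulated or posterior draws for a future period. The open-source Prophet library also uses 0.80 as its default `interval_width`. The sketch below applies this to synthetic normal draws. Using the median of the draws as the point forecast is an assumption, since an implementation might instead report the model's mean prediction.

```python
import random
from statistics import median, quantiles


def interval_80(samples):
    """Return (lower, point, upper): the 10th, 50th and 90th percentiles of draws."""
    deciles = quantiles(samples, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentile
    return deciles[0], median(samples), deciles[-1]


random.seed(42)
# Hypothetical forecast draws for one future period.
draws = [random.gauss(120.0, 15.0) for _ in range(10_000)]
lower, point, upper = interval_80(draws)
print(f"point={point:.1f}, 80% interval=[{lower:.1f}, {upper:.1f}]")
```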

The width of the interval depends on how far ahead the forecast extends. The further into the future the model forecasts, the more uncertain it is, so the intervals typically widen as the forecast horizon grows.
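For intuition about why intervals widen, the sketch below assumes that forecast errors accumulate like a random walk, so the standard deviation grows with the square root of the horizon. This is an illustrative model only. Prophet, for example, widens its intervals by simulating possible future trend changes, which need not follow a square-root rule. The one-step standard deviation used here is made up.

```python
import math
from statistics import NormalDist

Z_80 = NormalDist().inv_cdf(0.9)  # about 1.2816: 10% probability in each tail


def interval_width(one_step_sd, horizon):
    """80% interval width if errors accumulate like a random walk (sd grows with sqrt(h))."""
    return 2 * Z_80 * one_step_sd * math.sqrt(horizon)


for h in (1, 4, 12):
    print(h, round(interval_width(10.0, h), 1))
```

Under this assumption, forecasting four times as far ahead doubles the interval width.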

Factors affecting forecasting accuracy

Several factors can affect the accuracy of Databox's forecasts. The most important are:

  • Historical data: Forecasts are predictions based on past data. The more historical data available, the better the model can identify trends and patterns to predict future performance.
  • Data quality: The accuracy of forecasts relies on the quality of historical data. Incomplete, inconsistent, or erroneous data can significantly affect forecast accuracy.
  • Data anomalies: Unusual spikes or dips, caused by one-time events such as server outages or promotional campaigns, can impact forecast accuracy.
  • Seasonality: Metrics often experience seasonal fluctuations, such as decreased sales during holidays or end-of-quarter peaks. The model can account for these seasonal trends if they are evident in the historical data.
  • Holidays: Holidays can significantly impact metric behavior. For instance, e-commerce sites may see increased activity during holidays, while B2B sites might experience a decrease. Because user-provided holiday lists are not currently part of the model, holiday effects are reflected in the forecast mainly to the extent that they recur as patterns in the historical data.
