# Migrate from legacy SQL metrics to datasets

## Why migrate from legacy SQL metrics

Legacy SQL metric builders are being deprecated, along with all custom metrics created using them for the following integrations:

- Amazon Redshift
- Google BigQuery
- Microsoft Azure SQL
- Microsoft SQL Server
- MySQL
- PostgreSQL
- Snowflake


Going forward, SQL metrics will be created using [datasets](/understanding-datasets), which provide a more scalable and flexible way to work with your SQL data.

With legacy SQL metrics, each metric runs its own query on every sync. If you have 200 metrics, that means 200 separate queries hitting your database every time data refreshes. This increases database load, slows down syncs, and makes ongoing maintenance more complex.

With datasets, syncs are tied to datasets instead of individual metrics. One dataset query can power dozens or even hundreds of metrics. This significantly reduces database load and simplifies long-term management.

By migrating to datasets, you can:

- Store up to 20x more data
- Reuse the same dataset across multiple metrics
- Analyze data across multiple dimensions
- Apply filters and drill-downs directly in visualizations
- Create custom calculated columns
- Merge datasets from multiple data sources
- Export raw data when needed
- Use AI-powered insights with Genie and [Databox MCP](/query-and-analyze-your-data-using-ai-tools-and-databox-mcp)
- Improve refresh performance and maintainability


In-app migration tools will be available soon to help identify legacy data sources and replace legacy SQL metrics. You may still choose to migrate manually based on your timeline or requirements, but the tooling will provide a more streamlined and reliable migration experience once available.

## Key considerations for migration

### What's changing

Previously, each metric had its own SQL query that ran independently on every sync. Now, SQL queries are defined at the Dataset level, creating a data preparation step between your database and your metrics.

One dataset query powers multiple metrics, and a single sync refreshes all metrics built on that dataset. Dimension filtering now happens at the metric level instead of in SQL WHERE clauses.

### What stays the same

Your existing database connection, account content, and historical data remain unchanged. The key difference is consolidating many per-metric queries into fewer, broader dataset queries.

## Manual migration steps

The biggest benefit of datasets comes from consolidating queries.

Instead of migrating each metric 1:1, look for patterns where multiple metrics query the same table with different filters.

### Step 1: Identify legacy SQL metrics

1. Go to [Metrics > Custom Metrics](https://app.databox.com/metrics/custom-metrics).
2. Filter by your SQL data source (PostgreSQL, MySQL, Google BigQuery, etc.).
3. Look for metrics marked with a ⚠️ warning icon.
4. Open each metric and **copy the SQL query** for reference.


### Step 2: Identify consolidation patterns

Look for metrics that:

- Query the same table or set of tables
- Use the same or very similar JOIN structure
- Share the same base dataset but differ in filters, date granularity, aggregations, or calculated logic


Common patterns include:

| Pattern | Example | Consolidation approach |
|  --- | --- | --- |
| Same query, different filter values | `WHERE location_id = 101` vs `WHERE location_id = 102` | Remove the hardcoded filter. Include `location_id` in `SELECT`. Apply filtering at the metric level. |
| Separate queries for different aggregations | One metric uses `SUM(revenue)`, another uses `COUNT(order_id)` from the same table | Include all required measures in the same dataset query. Create separate metrics using different value columns or aggregations. |
| Separate queries for calculated variations | One metric calculates `SUM(revenue)`, another calculates `SUM(revenue) - SUM(cost)` | Return base columns (such as `revenue` and `cost`) in the dataset. Create calculated metrics or [calculated columns](/add-a-calculated-column-to-a-dataset) instead of separate SQL queries. |
| Separate queries for different date grains | One query groups by `DATE(order_date)`, another by `DATE_TRUNC('month', order_date)` | Return data at the lowest useful grain (for example daily). Handle aggregation at the metric or visualization level. |


### Step 3: Create consolidated datasets

1. Go to [Data Manager > Datasets](https://app.databox.com/data-manager/datasets).
2. Click **+ New Dataset**.
3. Select your SQL data source and click **Continue**.
4. Select **Write SQL**.
5. Enter your consolidated SQL query.
6. Click **Preview data**to confirm the structure and formatting look correct and the data appears as expected.
7. Click **Continue**.
8. Select update frequency and click **Finish**.


### Step 4: Recreate metrics from the datasets

For each legacy metric:

1. Open the [dataset view](/view-a-dataset).
2. Click **+ Create new metric** on the right-hand panel.
3. Select the appropriate **measure**, **date**,and **dimension** columns.
4. Reapply any logic that previously lived in the SQL query, such as:
  1. Adding filters that were previously in the WHERE clause
  2. Choosing the correct aggregation
5. Click **Preview data** and compare the new metric with the legacy metric to ensure the numbers align.
6. Click **Save**.
7. Replace the old metric wherever it's used across your content to ensure everything continues working with the new dataset-based metric.


## Consolidation example

### Before: Separate queries per location


```sql
SELECT order_date::DATE AS "Date", SUM(total_amount) AS "Revenue"  
FROM orders  
WHERE location_id = 101  
GROUP BY order_date::DATE  
ORDER BY "Date" DESC;
```

Same structure repeated for each location.

### After: One dataset query


```sql
SELECT  
    o.order_date::DATE AS "Date",  
    l.location_id,  
    l.location_name,  
    SUM(o.total_amount) AS "Revenue"  
FROM orders o  
JOIN locations l ON o.location_id = l.location_id  
GROUP BY o.order_date::DATE, l.location_id, l.location_name  
ORDER BY "Date" DESC;
```

Now you can filter by `location_id` or `location_name` at the metric level.

Result: One query replaces many. New locations appear automatically without modifying SQL queries.

## Best practices

When creating consolidated datasets, focus on flexibility, clarity, and long-term maintainability.

### Structure your query for flexibility

Include all columns that may be used for filtering, grouping, comparison, or future reporting in the `SELECT` statement.

Instead of hardcoding values in a `WHERE` clause or excluding supporting fields, return those columns in your dataset so they become reusable dimensions. This makes the dataset more flexible and reduces the need to modify the SQL query later.

Keep datasets focused

Avoid combining many unrelated tables into one large query. Overly broad datasets are harder to maintain and can slow down performance.

Instead, group related metrics into focused datasets, such as:

- Sales and transactions
- Budget and goals
- Customer activity
- Employee performance


### Avoid unnecessary TOP or LIMIT clauses

Legacy metrics often included `TOP` or `LIMIT` clauses as safeguards.

Datasets support significantly larger volumes of data, so artificial limits are usually unnecessary. Remove them unless they are required for performance or business logic.

No. You can migrate metrics at your own pace before the deprecation takes effect.

Group related metrics and migrate them together as a consolidated dataset. This makes it easier to validate results and ensure everything works correctly before moving on to the next group.

No. Each dataset connects to a single data source. Metrics that pull from different databases must be created as separate datasets.

If you need to analyze data across multiple sources, you can [merge datasets](/merge-datasets) after they are created.

Datasets support a 20x larger data size limit compared to legacy SQL metrics. In most cases, removing a WHERE clause filter and adding the column as a dimension will still fit within the limit.

If you reach the size limit, you can:

- Add a date range filter in your query (for example, limit data to the last two years)
- Split the query into multiple focused datasets instead of one broad dataset
- Reduce granularity where appropriate, such as aggregating older data weekly instead of daily


Not all metrics should be combined into a single dataset. If metrics rely on different tables or fundamentally different JOIN structures, keep them as separate datasets.

Even partial consolidation delivers significant benefits. For example, reducing 100 individual queries to 10 dataset queries dramatically lowers database load and simplifies long-term maintenance.

 

Ask Genie
Get instant answers or help with your data using the in-app AI assistant.

Talk to an expert
For customers: Get help with your setup, strategy, or making the most of Databox.

Book a demo
New to Databox? See how it works and get guidance on getting started.

Send an email
Reach out to support for help with your account, data, or technical issues.