
Migrate from legacy SQL metrics to datasets

Transition SQL-based custom metrics to datasets to reduce database load, consolidate queries, and unlock more flexible reporting.

Why migrate from legacy SQL metrics

Legacy SQL metric builders are being deprecated, along with all custom metrics created using them for the following integrations:

  • Amazon Redshift
  • Google BigQuery
  • Microsoft Azure SQL
  • Microsoft SQL Server
  • MySQL
  • PostgreSQL
  • Snowflake 

Going forward, SQL metrics will be created using datasets, which provide a more scalable and flexible way to work with your SQL data.

With legacy SQL metrics, each metric runs its own query on every sync. If you have 200 metrics, that means 200 separate queries hitting your database every time data refreshes. This increases database load, slows down syncs, and makes ongoing maintenance more complex.

With datasets, syncs are tied to datasets instead of individual metrics. One dataset query can power dozens or even hundreds of metrics. This significantly reduces database load and simplifies long-term management.

By migrating to datasets, you can:

  • Store up to 20x more data
  • Reuse the same dataset across multiple metrics
  • Analyze data across multiple dimensions
  • Apply filters and drill-downs directly in visualizations
  • Create custom calculated columns
  • Merge datasets from multiple data sources
  • Export raw data when needed
  • Use AI-powered insights with Genie and Databox MCP
  • Improve refresh performance and maintainability

Note: In-app migration tools will be available soon to help identify legacy data sources and replace legacy SQL metrics. You may still choose to migrate manually based on your timeline or requirements, but the tooling will provide a more streamlined and reliable migration experience once available.

Key considerations for migration

What's changing

Previously, each metric had its own SQL query that ran independently on every sync. Now, SQL queries are defined at the Dataset level, creating a data preparation step between your database and your metrics.

One dataset query powers multiple metrics, and a single sync refreshes all metrics built on that dataset. Dimension filtering now happens at the metric level instead of in SQL WHERE clauses.

What stays the same

Your existing database connection, account content, and historical data remain unchanged. The key difference is consolidating many per-metric queries into fewer, broader dataset queries.

Manual migration steps

The biggest benefit of datasets comes from consolidating queries.

Instead of migrating each metric 1:1, look for patterns where multiple metrics query the same table with different filters.

Step 1: Identify legacy SQL metrics

  1. Go to Metrics > Custom Metrics.
  2. Filter by your SQL data source (PostgreSQL, MySQL, Google BigQuery, etc.).
  3. Look for metrics marked with a ⚠️ warning icon.
  4. Open each metric and copy the SQL query for reference. 

Step 2: Identify consolidation patterns

Look for metrics that:

  • Query the same table or set of tables
  • Use the same or very similar JOIN structure
  • Share the same base dataset but differ in filters, date granularity, aggregations, or calculated logic

Common patterns include:

  • Same query, different filter values
    Example: WHERE location_id = 101 vs. WHERE location_id = 102
    Approach: Remove the hardcoded filter, include location_id in the SELECT, and apply filtering at the metric level.
  • Separate queries for different aggregations
    Example: One metric uses SUM(revenue), another uses COUNT(order_id) from the same table.
    Approach: Include all required measures in the same dataset query, then create separate metrics using different value columns or aggregations.
  • Separate queries for calculated variations
    Example: One metric calculates SUM(revenue), another calculates SUM(revenue) - SUM(cost).
    Approach: Return the base columns (such as revenue and cost) in the dataset, then create calculated metrics or calculated columns instead of separate SQL queries.
  • Separate queries for different date grains
    Example: One query groups by DATE(order_date), another by DATE_TRUNC('month', order_date).
    Approach: Return data at the lowest useful grain (for example, daily) and handle aggregation at the metric or visualization level.
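As an illustration, the "different aggregations" and "calculated variations" patterns above can often share a single dataset query. The sketch below uses PostgreSQL syntax and the orders table from the example later in this article; the cost column is assumed for illustration.

```sql
-- One dataset query returning the base measures several metrics need.
-- "Revenue", "Cost", and "Orders" become separate metrics in Databox,
-- and a margin metric can be derived from Revenue and Cost as a
-- calculated metric, without a separate SQL query.
SELECT
    order_date::DATE   AS "Date",
    SUM(total_amount)  AS "Revenue",
    SUM(cost)          AS "Cost",    -- assumed column, for illustration
    COUNT(order_id)    AS "Orders"
FROM orders
GROUP BY order_date::DATE
ORDER BY "Date" DESC;
```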

Step 3: Create consolidated datasets

  1. Go to Data Manager > Datasets.
  2. Click + New Dataset.
  3. Select your SQL data source and click Continue.
  4. Select Write SQL.
  5. Enter your consolidated SQL query.
  6. Click Preview data to confirm the structure and formatting look correct and the data appears as expected.
  7. Click Continue.
  8. Select update frequency and click Finish.

Step 4: Recreate metrics from the datasets

For each legacy metric:

  1. Open the dataset view.
  2. Click + Create new metric on the right-hand panel.
  3. Select the appropriate measure, date, and dimension columns.
  4. Reapply any logic that previously lived in the SQL query, such as:
    1. Adding filters that were previously in the WHERE clause
    2. Choosing the correct aggregation
  5. Click Preview data and compare the new metric with the legacy metric to ensure the numbers align.
  6. Click Save.
  7. Replace the old metric wherever it’s used across your content to ensure everything continues working with the new dataset-based metric.

Consolidation example

Before: Separate queries per location

SELECT order_date::DATE AS "Date", SUM(total_amount) AS "Revenue"
FROM orders
WHERE location_id = 101
GROUP BY order_date::DATE
ORDER BY "Date" DESC;

Same structure repeated for each location.

After: One dataset query

SELECT
    o.order_date::DATE AS "Date",
    l.location_id,
    l.location_name,
    SUM(o.total_amount) AS "Revenue"
FROM orders o
JOIN locations l ON o.location_id = l.location_id
GROUP BY o.order_date::DATE, l.location_id, l.location_name
ORDER BY "Date" DESC;

Now you can filter by location_id or location_name at the metric level.

Result: One query replaces many. New locations appear automatically without modifying SQL queries.

Best practices

When creating consolidated datasets, focus on flexibility, clarity, and long-term maintainability.

Structure your query for flexibility

Include all columns that may be used for filtering, grouping, comparison, or future reporting in the SELECT statement.

Instead of hardcoding values in a WHERE clause or excluding supporting fields, return those columns in your dataset so they become reusable dimensions. This makes the dataset more flexible and reduces the need to modify the SQL query later.
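For example, rather than hardcoding WHERE status = 'completed', return the column itself so each metric can filter on it. This sketch uses PostgreSQL syntax; the status and sales_channel columns are illustrative.

```sql
-- Return likely filter/grouping columns as dimensions instead of
-- hardcoding them in a WHERE clause.
SELECT
    order_date::DATE  AS "Date",
    status,                          -- assumed column; becomes a filterable dimension
    sales_channel,                   -- assumed column
    SUM(total_amount) AS "Revenue"
FROM orders
GROUP BY order_date::DATE, status, sales_channel
ORDER BY "Date" DESC;
```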

Keep datasets focused

Avoid combining many unrelated tables into one large query. Overly broad datasets are harder to maintain and can slow down performance.

Instead, group related metrics into focused datasets, such as:

  • Sales and transactions
  • Budget and goals
  • Customer activity
  • Employee performance

Avoid unnecessary TOP or LIMIT clauses

Legacy metrics often included TOP or LIMIT clauses as safeguards.

Datasets support significantly larger volumes of data, so artificial limits are usually unnecessary. Remove them unless they are required for performance or business logic.
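For instance, a legacy query that ended with a safeguard such as LIMIT 100 can usually return its full history once moved into a dataset (sketch in PostgreSQL syntax, reusing the orders table from the example below):

```sql
SELECT order_date::DATE AS "Date", SUM(total_amount) AS "Revenue"
FROM orders
GROUP BY order_date::DATE
ORDER BY "Date" DESC;
-- The legacy version ended with "LIMIT 100" as a safeguard; datasets
-- can typically hold the full result, so the cap has been dropped.
```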

Frequently Asked Questions

Do I have to migrate everything at once?

No. You can migrate metrics at your own pace before the deprecation takes effect.

Group related metrics and migrate them together as a consolidated dataset. This makes it easier to validate results and ensure everything works correctly before moving on to the next group.

Should I consolidate metrics that use different databases?

No. Each dataset connects to a single data source. Metrics that pull from different databases must be created as separate datasets.

If you need to analyze data across multiple sources, you can merge datasets after they are created.

What if my consolidated dataset returns too much data?

Datasets support a 20x larger data size limit compared to legacy SQL metrics. In most cases, removing a WHERE clause filter and adding the column as a dimension will still fit within the limit. 

If you reach the size limit, you can:

  • Add a date range filter in your query (for example, limit data to the last two years)
  • Split the query into multiple focused datasets instead of one broad dataset
  • Reduce granularity where appropriate, such as aggregating older data weekly instead of daily
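The first option can be as simple as a date-range guard in the dataset query. A sketch in PostgreSQL syntax, reusing the orders table from the consolidation example above; the two-year cutoff is illustrative:

```sql
-- Keep the dataset within the size limit by returning only recent history.
SELECT
    order_date::DATE  AS "Date",
    location_id,
    SUM(total_amount) AS "Revenue"
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '2 years'  -- illustrative cutoff
GROUP BY order_date::DATE, location_id
ORDER BY "Date" DESC;
```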

What if my queries are too different to consolidate?

Not all metrics should be combined into a single dataset. If metrics rely on different tables or fundamentally different JOIN structures, keep them as separate datasets.

Even partial consolidation delivers significant benefits. For example, reducing 100 individual queries to 10 dataset queries dramatically lowers database load and simplifies long-term maintenance.

Still need help?

Visit our community, send us an email, or start a chat in Databox.