Merge datasets

Availability

Users, Editors, and Admins

All accounts

Feature exclusive to select subscription plans

Merging datasets helps you combine related information from multiple sources into a single, unified dataset. By merging datasets, businesses can get a complete view of their performance, simplify reporting, improve data accuracy, and unlock deeper insights. With flexible join options, you can control how data is matched and merged to suit your business needs.

Merge datasets

Before merging, make sure you have already created the datasets with all the columns you want to include, as you won't be able to add or modify columns during the merge process.

To start, navigate to Data Manager > Datasets and click + Merge Datasets. This will open the merge flow, where you can follow these steps:

Click Select dataset, choose the first dataset you want to merge, and click Continue.
Select the columns you want to include from this dataset, then review the dataset preview.
Click Dataset above the preview table to add another dataset.
Select the second dataset, choose the Join operator (Left outer, Inner, Right outer, or Full outer), and define the Join conditions by matching columns from each dataset. The Output preview at the bottom shows the number of rows that match based on the selected join operator and conditions.
Click Select to confirm the join, then choose which columns from the second dataset you want to include.
Click Continue to review the preview of the merged dataset.

You can repeat steps 3–6 to merge additional datasets, up to a maximum of 20. Note that the final merged dataset can include up to 150 columns in total.

Note: Merging datasets does not alter the original datasets. A new, separate merged dataset is created.

When merging datasets, Databox processes them from left to right in the list of datasets. This means:

The left side of each join refers to either the first dataset or the result of all previous merges.
The right side is the new dataset being merged at that step.

Here’s a simple example:

Dataset A ➔ (merge with) ➔ Dataset B ➔ (merge with) ➔ Dataset C

First, A is merged with B.
Then, the result of (A + B) is merged with C.

Join operators

When merging datasets, you can choose between four join operators depending on how you want to match and combine data:

Join operator	Rows from left (first) dataset	Rows from right (second) dataset	Result
Left outer	All rows from the left dataset	Matching rows from the right dataset	Includes all left dataset rows, and matching rows from the right dataset. Non-matching rows from the right are excluded.
Inner	Only matching rows	Only matching rows	Includes only rows that have a match in both datasets.
Right outer	Matching rows from the left dataset	All rows from the right dataset	Includes all right dataset rows, and matching rows from the left dataset. Non-matching rows from the left are excluded.
Full outer	All rows from the left dataset	All rows from the right dataset	Includes all rows from both datasets. Non-matching rows will have null (empty) values where there’s no match.

Join conditions

When merging datasets, join conditions define how the datasets are matched together.

A join condition links a column from the left dataset to a column from the right dataset, based on matching values.

For example, if both datasets have a column called ID, you can create a join condition where ID from the left matches ID from the right. Only rows where the selected column values match will be joined together, depending on the join operator.

You can set one or more join conditions for each merge step. If you add multiple conditions, all conditions must be true for rows to match (similar to using "AND" logic in SQL).

Best practices

Use columns with unique or clearly matching values whenever possible (such as IDs, emails, or names).
Avoid using columns with a lot of null or empty values, as this may lead to missing matches or unexpected results.
Make sure the data types of the columns you join on are compatible (e.g., both columns should be text, or both should be numbers).

Examples

Let's say you have two datasets:

Dataset A

Region ID	Name	Sales
1	Alice	100
2	Bob	150
3	Charlie	200

Dataset B

Region ID	Region
2	East
3	West
4	South

Before merging, you define a join condition that matches the Region ID column from the first dataset with the Region ID column from the second dataset. This ensures that rows with the same Region ID value are matched together during the merge.

Depending on the join operator you select (Left outer, Inner, Right outer, or Full outer), the resulting merged dataset will look different. See the examples for each join operator below.

Left outer

Includes all rows from Dataset A, with matching Region values from Dataset B. Rows without a match in B have null values for Region.

Region ID	Name	Sales	Region
1	Alice	100	`null`
2	Bob	150	East
3	Charlie	200	West

Inner

Includes only rows where the Region ID exists in both datasets.

Region ID	Name	Sales	Region
2	Bob	150	East
3	Charlie	200	West

Right outer

Includes all rows from Dataset B, with matching Name and Sales values from Dataset A. Rows without a match in A have null values for Name and Sales.

Region ID	Name	Sales	Region
2	Bob	150	East
3	Charlie	200	West
4	`null`	`null`	South

Full outer

Includes all rows from both datasets. Missing values are shown as null.

Region ID	Name	Sales	Region
1	Alice	100	`null`
2	Bob	150	East
3	Charlie	200	West
4	`null`	`null`	South

Frequently Asked Questions

Can I merge datasets from different integrations?

Yes, you can merge datasets from different integrations as long as they have compatible join columns (e.g., matching IDs, names, or other common fields).

Can I use a merged dataset as an input in another merge?

Yes. Once you create and save a merged dataset, it becomes available alongside your other datasets. You can select it as an input when setting up a new merge, allowing you to build more complex datasets in multiple steps if needed.

What happens if two datasets have columns with the same name?

If there are duplicate column names between datasets, Databox will automatically rename the columns during the merge to avoid conflicts (e.g., "Sales" and "Sales (2)").

Will merging datasets change my original datasets?

No. Merging datasets creates a new, separate dataset. Your original datasets remain unchanged.

Still need help?

Visit our community, send us an email, or start a chat in Databox.

Merge datasets

Join multiple datasets together by matching columns to create a single, merged dataset without altering the original data.

Merge datasets

Join operators

Join conditions

Best practices

Examples

Left outer

Inner

Right outer

Full outer

Frequently Asked Questions

Can I merge datasets from different integrations?

Can I use a merged dataset as an input in another merge?

What happens if two datasets have columns with the same name?

Will merging datasets change my original datasets?

Still need help?