Skip to content
  • There are no suggestions because the search field is empty.

Merge datasets

Join multiple datasets together by matching columns to create a single, merged dataset without altering the original data.


Availability

profile  Users, Editors, and Admins

box  All accounts

lock  Feature exclusive to select subscription plans



Merging datasets helps you combine related information from multiple sources into a single, unified dataset. By merging datasets, businesses can get a complete view of their performance, simplify reporting, improve data accuracy, and unlock deeper insights. With flexible join options, you can control how data is matched and merged to suit your business needs.

Merge datasets

Before merging, make sure you have already created the datasets with all the columns you want to include, as you won't be able to add or modify columns during the merge process.

To start, navigate to Data Manager > Datasets and click + Merge Datasets. This will open the merge flow, where you can follow these steps:
  1. Click Select dataset, choose the first dataset you want to merge, and click Continue.
  2. Select the columns you want to include from this dataset, then review the dataset preview.
  3. Click circle-plus Dataset above the preview table to add another dataset.
  4. Select the second dataset, choose the Join operator (Left outer, Inner, Right outer, or Full outer), and define the Join conditions by matching columns from each dataset. The Output preview at the bottom shows the number of rows that match based on the selected join operator and conditions.
  5. Click Select to confirm the join, then choose which columns from the second dataset you want to include.
  6. Click Continue to review the preview of the merged dataset.

You can repeat steps 3–6 to merge additional datasets, up to a maximum of 20. Note that the final merged dataset can include up to 150 columns in total.

pinNote: Merging datasets does not alter the original datasets. A new, separate merged dataset is created.

When merging datasets, Databox processes them from left to right in the list of datasets. This means:

  • The left side of each join refers to either the first dataset or the result of all previous merges.
  • The right side is the new dataset being merged at that step.

Here’s a simple example:

Dataset A ➔ (merge with) ➔ Dataset B ➔ (merge with) ➔ Dataset C

  • First, A is merged with B.
  • Then, the result of (A + B) is merged with C.

Join operators

When merging datasets, you can choose between four join operators depending on how you want to match and combine data:

Join operator Rows from left (first) dataset Rows from right (second) dataset Result
Left outer All rows from the left dataset Matching rows from the right dataset Includes all left dataset rows, and matching rows from the right dataset. Non-matching rows from the right are excluded.
Inner Only matching rows Only matching rows Includes only rows that have a match in both datasets.
Right outer Matching rows from the left dataset All rows from the right dataset Includes all right dataset rows, and matching rows from the left dataset. Non-matching rows from the left are excluded.
Full outer All rows from the left dataset All rows from the right dataset Includes all rows from both datasets. Non-matching rows will have null (empty) values where there’s no match.

Join conditions

When merging datasets, join conditions define how the datasets are matched together.

A join condition links a column from the left dataset to a column from the right dataset, based on matching values.

For example, if both datasets have a column called ID, you can create a join condition where ID from the left matches ID from the right. Only rows where the selected column values match will be joined together, depending on the join operator.

You can set one or more join conditions for each merge step. If you add multiple conditions, all conditions must be true for rows to match (similar to using "AND" logic in SQL).

Best practices

  • Use columns with unique or clearly matching values whenever possible (such as IDs, emails, or names).
  • Avoid using columns with a lot of null or empty values, as this may lead to missing matches or unexpected results.
  • Make sure the data types of the columns you join on are compatible (e.g., both columns should be text, or both should be numbers).

Examples

Let's say you have two datasets:

Dataset A

Region ID Name Sales
1 Alice 100
2 Bob 150
3 Charlie 200

Dataset B

Region ID Region
2 East
3 West
4 South

Before merging, you define a join condition that matches the Region ID column from the first dataset with the Region ID column from the second dataset. This ensures that rows with the same Region ID value are matched together during the merge.

Depending on the join operator you select (Left outer, Inner, Right outer, or Full outer), the resulting merged dataset will look different. See the examples for each join operator below.

Left outer

Includes all rows from Dataset A, with matching Region values from Dataset B. Rows without a match in B have null values for Region.

Region ID Name Sales Region
1 Alice 100 null
2 Bob 150 East
3 Charlie 200 West

Inner

Includes only rows where the Region ID exists in both datasets.

Region ID Name Sales Region
2 Bob 150 East
3 Charlie 200 West

Right outer

Includes all rows from Dataset B, with matching Name and Sales values from Dataset A. Rows without a match in A have null values for Name and Sales.

Region ID Name Sales Region
2 Bob 150 East
3 Charlie 200 West
4 null null South

Full outer

Includes all rows from both datasets. Missing values are shown as null.

Region ID Name Sales Region
1 Alice 100 null
2 Bob 150 East
3 Charlie 200 West
4 null null South

Frequently Asked Questions

Can I merge datasets from different integrations?

Yes, you can merge datasets from different integrations as long as they have compatible join columns (e.g., matching IDs, names, or other common fields).

Can I use a merged dataset as an input in another merge?

Yes. Once you create and save a merged dataset, it becomes available alongside your other datasets. You can select it as an input when setting up a new merge, allowing you to build more complex datasets in multiple steps if needed.

What happens if two datasets have columns with the same name?

If there are duplicate column names between datasets, Databox will automatically rename the columns during the merge to avoid conflicts (e.g., "Sales" and "Sales (2)").

Will merging datasets change my original datasets?

No. Merging datasets creates a new, separate dataset. Your original datasets remain unchanged.

Still need help?

Visit our community, send us an email, or start a chat in Databox.