Merge datasets
Join multiple datasets together by matching columns to create a single, merged dataset without altering the original data.
Availability
Users, Editors, and Admins
All accounts
Feature exclusive to select subscription plans
Merging datasets helps you combine related information from multiple sources into a single, unified dataset. By merging datasets, businesses can get a complete view of their performance, simplify reporting, improve data accuracy, and unlock deeper insights. With flexible join options, you can control how data is matched and merged to suit your business needs.
Merge datasets
Before merging, make sure you have already created the datasets with all the columns you want to include, as you won't be able to add or modify columns during the merge process.
To start, navigate to Data Manager > Datasets and click + Merge Datasets. This will open the merge flow, where you can follow these steps:- Click Select dataset, choose the first dataset you want to merge, and click Continue.
- Select the columns you want to include from this dataset, then review the dataset preview.
- Click
Dataset above the preview table to add another dataset.
- Select the second dataset, choose the Join operator (Left outer, Inner, Right outer, or Full outer), and define the Join conditions by matching columns from each dataset. The Output preview at the bottom shows the number of rows that match based on the selected join operator and conditions.
- Click Select to confirm the join, then choose which columns from the second dataset you want to include.
- Click Continue to review the preview of the merged dataset.
You can repeat steps 3–6 to merge additional datasets, up to a maximum of 20. Note that the final merged dataset can include up to 150 columns in total.
Note: Merging datasets does not alter the original datasets. A new, separate merged dataset is created.
When merging datasets, Databox processes them from left to right in the list of datasets. This means:
- The left side of each join refers to either the first dataset or the result of all previous merges.
- The right side is the new dataset being merged at that step.
Here’s a simple example:
Dataset A ➔ (merge with) ➔ Dataset B ➔ (merge with) ➔ Dataset C
- First, A is merged with B.
- Then, the result of (A + B) is merged with C.
Join operators
When merging datasets, you can choose between four join operators depending on how you want to match and combine data:
Join operator | Rows from left (first) dataset | Rows from right (second) dataset | Result |
---|---|---|---|
![]() |
All rows from the left dataset | Matching rows from the right dataset | Includes all left dataset rows, and matching rows from the right dataset. Non-matching rows from the right are excluded. |
![]() |
Only matching rows | Only matching rows | Includes only rows that have a match in both datasets. |
![]() |
Matching rows from the left dataset | All rows from the right dataset | Includes all right dataset rows, and matching rows from the left dataset. Non-matching rows from the left are excluded. |
![]() |
All rows from the left dataset | All rows from the right dataset | Includes all rows from both datasets. Non-matching rows will have null (empty) values where there’s no match. |
Join conditions
When merging datasets, join conditions define how the datasets are matched together.
A join condition links a column from the left dataset to a column from the right dataset, based on matching values.
For example, if both datasets have a column called ID
, you can create a join condition where ID
from the left matches ID
from the right. Only rows where the selected column values match will be joined together, depending on the join operator.
You can set one or more join conditions for each merge step. If you add multiple conditions, all conditions must be true for rows to match (similar to using "AND" logic in SQL).
Best practices
- Use columns with unique or clearly matching values whenever possible (such as IDs, emails, or names).
- Avoid using columns with a lot of
null
or empty values, as this may lead to missing matches or unexpected results. - Make sure the data types of the columns you join on are compatible (e.g., both columns should be text, or both should be numbers).
Examples
Let's say you have two datasets:
Dataset A
|
Dataset B
|
Before merging, you define a join condition that matches the Region ID
column from the first dataset with the Region ID
column from the second dataset. This ensures that rows with the same Region ID
value are matched together during the merge.
Depending on the join operator you select (Left outer, Inner, Right outer, or Full outer), the resulting merged dataset will look different. See the examples for each join operator below.
Left outer
Includes all rows from Dataset A, with matching Region values from Dataset B. Rows without a match in B have null
values for Region.
Region ID | Name | Sales | Region |
---|---|---|---|
1 | Alice | 100 | null |
2 | Bob | 150 | East |
3 | Charlie | 200 | West |
Inner
Includes only rows where the Region ID exists in both datasets.
Region ID | Name | Sales | Region |
---|---|---|---|
2 | Bob | 150 | East |
3 | Charlie | 200 | West |
Right outer
Includes all rows from Dataset B, with matching Name and Sales values from Dataset A. Rows without a match in A have null
values for Name and Sales.
Region ID | Name | Sales | Region |
---|---|---|---|
2 | Bob | 150 | East |
3 | Charlie | 200 | West |
4 | null |
null |
South |
Full outer
Includes all rows from both datasets. Missing values are shown as null
.
Region ID | Name | Sales | Region |
---|---|---|---|
1 | Alice | 100 | null |
2 | Bob | 150 | East |
3 | Charlie | 200 | West |
4 | null |
null |
South |
Frequently Asked Questions
Can I merge datasets from different integrations?
Yes, you can merge datasets from different integrations as long as they have compatible join columns (e.g., matching IDs, names, or other common fields).
Can I use a merged dataset as an input in another merge?
Yes. Once you create and save a merged dataset, it becomes available alongside your other datasets. You can select it as an input when setting up a new merge, allowing you to build more complex datasets in multiple steps if needed.
What happens if two datasets have columns with the same name?
If there are duplicate column names between datasets, Databox will automatically rename the columns during the merge to avoid conflicts (e.g., "Sales" and "Sales (2)").
Will merging datasets change my original datasets?
No. Merging datasets creates a new, separate dataset. Your original datasets remain unchanged.
Still need help?
Visit our community, send us an email, or start a chat in Databox.