# Merge datasets

Merging datasets helps you combine related information from multiple sources into a single, unified dataset. By merging datasets, businesses can get a complete view of their performance, simplify reporting, improve data accuracy, and unlock deeper insights. With flexible join options, you can control how data is matched and merged to suit your business needs.

## Merge datasets

Before merging, make sure you have already created the datasets with all the columns you want to include, as you won't be able to add or modify columns during the merge process.

To start, navigate to **Data Manager > Datasets** and click **+ Merge Datasets**. This will open the merge flow, where you can follow these steps:

1. Click **Select dataset**, choose the first dataset you want to merge, and click **Continue**.
2. Select the columns you want to include from this dataset, then review the dataset preview.
3. Click ![circle-plus](/assets/merge-datasets_1.eb2b11537d300991fe3304a0410d18d77813e64ed42a09a0da2669436a962667.2fc30de7.svg) **Dataset** above the preview table to add another dataset.
4. Select the second dataset, choose the **Join operator** (Left outer, Inner, Right outer, or Full outer), and define the **Join conditions** by matching columns from each dataset. The **Output preview** at the bottom shows the number of rows that match based on the selected join operator and conditions.
5. Click **Select** to confirm the join, then choose which columns from the second dataset you want to include.
6. Click **Continue** to review the preview of the merged dataset.


You can repeat steps 3–6 to merge additional datasets, up to a maximum of **20**. Note that the final merged dataset can include up to **150 columns** in total.

Merging datasets does not alter the original datasets. A new, separate merged dataset is created.

When merging datasets, Databox processes them from **left to right** in the list of datasets. This means:

- The **left** side of each join refers to either the first dataset or the **result of all previous merges**.
- The **right** side is the new dataset being merged at that step.


Here's a simple example:

`Dataset A ➔ (merge with) ➔ Dataset B ➔ (merge with) ➔ Dataset C`

- First, A is merged with B.
- Then, the result of (A + B) is merged with C.


## Join operators

When merging datasets, you can choose between four join operators depending on how you want to match and combine data:

| Join operator | Rows from left (first) dataset | Rows from right (second) dataset | Result |
|  --- | --- | --- | --- |
| ![left-outer](/assets/merge-datasets_2.70a775e81ae97083f088afd8fd27f33b3b317c7625655accaf0371d81bbd498f.2fc30de7.svg) **Left outer** | All rows from the left dataset | Matching rows from the right dataset | Includes all left dataset rows, and matching rows from the right dataset. Non-matching rows from the right are excluded. |
| ![inner](/assets/merge-datasets_3.5d19839f99f9f7caa5c93ff50512f2107905ad6107e0fe575703bf6fb0d3a490.2fc30de7.svg) **Inner** | Only matching rows | Only matching rows | Includes only rows that have a match in both datasets. |
| ![right-outer](/assets/merge-datasets_4.263b5f89d0958a8869abd99c98630b41e7996d392af28f3dbeb4c72dcec08e76.2fc30de7.svg) **Right outer** | Matching rows from the left dataset | All rows from the right dataset | Includes all right dataset rows, and matching rows from the left dataset. Non-matching rows from the left are excluded. |
| ![full-outer](/assets/merge-datasets_5.61316279400fd99419fc80f60653ac797db6d1d128ea4e7b1d6ba70ee680a93b.2fc30de7.svg) **Full outer** | All rows from the left dataset | All rows from the right dataset | Includes all rows from both datasets. Non-matching rows will have null (empty) values where there's no match. |


## Join conditions

When merging datasets, **join conditions** define how the datasets are matched together.

A join condition links a column from the left dataset to a column from the right dataset, based on matching values.

For example, if both datasets have a column called `ID`, you can create a join condition where `ID` from the left matches `ID` from the right. Only rows where the selected column values match will be joined together, depending on the join operator.

You can set one or more join conditions for each merge step. If you add multiple conditions, **all** conditions must be true for rows to match (similar to using "AND" logic in SQL).

join-conditions
### Best practices

- Use columns with unique or clearly matching values whenever possible (such as IDs, emails, or names).
- Avoid using columns with a lot of `null` or empty values, as this may lead to missing matches or unexpected results.
- Make sure the data types of the columns you join on are compatible (e.g., both columns should be text, or both should be numbers).


## Examples

Let's say you have two datasets:

Dataset A
| Region ID | Name | Sales |
|  --- | --- | --- |
| 1 | Alice | 100 |
| 2 | Bob | 150 |
| 3 | Charlie | 200 |


Dataset B
| Region ID | Region |
|  --- | --- |
| 2 | East |
| 3 | West |
| 4 | South |


Before merging, you define a **join condition** that matches the `Region ID` column from the first dataset with the `Region ID` column from the second dataset. This ensures that rows with the same `Region ID` value are matched together during the merge.

Depending on the **join operator** you select (Left outer, Inner, Right outer, or Full outer), the resulting merged dataset will look different. See the examples for each join operator below.

### Left outer

Includes all rows from Dataset A, with matching Region values from Dataset B. Rows without a match in B have `null` values for Region.

| Region ID | Name | Sales | Region |
|  --- | --- | --- | --- |
| 1 | Alice | 100 | `null` |
| 2 | Bob | 150 | East |
| 3 | Charlie | 200 | West |


### Inner

Includes only rows where the Region ID exists in both datasets.

| Region ID | Name | Sales | Region |
|  --- | --- | --- | --- |
| 2 | Bob | 150 | East |
| 3 | Charlie | 200 | West |


### Right outer

Includes all rows from Dataset B, with matching Name and Sales values from Dataset A. Rows without a match in A have `null` values for Name and Sales.

| Region ID | Name | Sales | Region |
|  --- | --- | --- | --- |
| 2 | Bob | 150 | East |
| 3 | Charlie | 200 | West |
| 4 | `null` | `null` | South |


### Full outer

Includes all rows from both datasets. Missing values are shown as `null`.

| Region ID | Name | Sales | Region |
|  --- | --- | --- | --- |
| 1 | Alice | 100 | `null` |
| 2 | Bob | 150 | East |
| 3 | Charlie | 200 | West |
| 4 | `null` | `null` | South |


Yes, you can merge datasets from different integrations as long as they have compatible join columns (e.g., matching IDs, names, or other common fields).

Yes. Once you create and save a merged dataset, it becomes available alongside your other datasets. You can select it as an input when setting up a new merge, allowing you to build more complex datasets in multiple steps if needed.

If there are duplicate column names between datasets, Databox will automatically rename the columns during the merge to avoid conflicts (e.g., "Sales" and "Sales (2)").

No. Merging datasets creates a new, separate dataset. Your original datasets remain unchanged.

 
Ask Genie
Get instant answers or help with your data using the in-app AI assistant.

Talk to an expert
For customers: Get help with your setup, strategy, or making the most of Databox.

Book a demo
New to Databox? See how it works and get guidance on getting started.

Send an email
Reach out to support for help with your account, data, or technical issues.