# Integrate Amazon Redshift with Databox

Amazon Redshift is a fully managed cloud data warehouse from AWS designed for large-scale analytics workloads. It uses columnar storage, parallel query execution, and automatic compression to deliver fast query performance over structured data at scale. Redshift integrates natively with the broader AWS ecosystem and supports standard SQL, making it a common choice for centralizing and querying business data from multiple sources. Connecting Amazon Redshift to Databox lets you pull data directly from your warehouse, build custom metrics using SQL queries, and visualize results alongside data from your other connected tools.

## Connection

If you've already established a connection, you can [reuse](/add-a-data-source) it to add new data sources to your Databox account.

### Step 1: Create a read-only Redshift user for Databox

Databox only reads data from your warehouse — it never writes to it. Create a dedicated Redshift user with `SELECT`-only privileges on the schemas and tables you want to expose. You can run these commands using a SQL client connected to your Redshift cluster, or via the **Query editor** in the AWS Management Console.


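```sql
-- Create a dedicated login for Databox
CREATE USER databox PASSWORD 'your_secure_password';
GRANT CONNECT ON DATABASE your_database TO databox;
-- Allow access to the schema and read access to its existing tables
GRANT USAGE ON SCHEMA public TO databox;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO databox;
-- Extend read access to tables created later by the user running this statement
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO databox;
```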

Replace `your_database` with the name of the database you want to connect, and `your_secure_password` with a strong password. If you need to expose tables in additional schemas, repeat the `GRANT USAGE` and `GRANT SELECT` statements for each schema.

Avoid the following special characters in your Redshift password, as they can cause encoding issues when establishing the connection: `` ` ``, `'`, `"`, `/`, `\`, and spaces.
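As a sketch of repeating the grants for another schema, the statements below reuse the `databox` user from this step. The schema name `analytics` is hypothetical; substitute the schema you want to expose.

```sql
-- "analytics" is a hypothetical schema name used only for illustration
GRANT USAGE ON SCHEMA analytics TO databox;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO databox;
-- Optional: cover tables created later in this schema by the user running the statement
ALTER DEFAULT PRIVILEGES IN SCHEMA analytics GRANT SELECT ON TABLES TO databox;
```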

### Step 2: Enable public accessibility on your Redshift cluster

By default, Redshift clusters are not accessible from outside their VPC. To allow Databox to reach your cluster, it must be configured as publicly accessible.

1. In the [AWS Management Console](https://console.aws.amazon.com/redshift/), go to **Clusters** and select your cluster.
2. Click **Actions > Modify**.
3. Under **Network and security**, set **Publicly accessible** to **Yes**.
4. Click **Save changes**.


For the cluster to be reachable after you enable public access, its VPC must have an internet gateway attached. If connectivity still fails, verify that the route table associated with the cluster's subnet contains a route to `0.0.0.0/0` via that internet gateway.

### Step 3: Add the Databox IP to your VPC security group

Redshift controls network access through VPC security groups. Add an inbound rule that permits TCP traffic on port `5439` from the Databox IP address.

1. In the [AWS Management Console](https://console.aws.amazon.com/redshift/), go to **Clusters** and select your cluster.
2. Under the **Properties** tab, click the link to the **VPC security group** associated with the cluster.
3. Select the security group, then click **Edit inbound rules**.
4. Click **Add rule** and fill in:
  - **Type**: Redshift
  - **Protocol**: TCP
  - **Port range**: 5439
  - **Source**: Custom — enter `52.4.198.118/32`
5. Click **Save rules**.


### Step 4: Enter your Redshift connection details in Databox

1. In Databox, go to **Data Sources > + New connection**.
2. Search for **Amazon Redshift** and click **Connect**.
3. Fill in the connection form:
  - **Data source name** — a label for this connection in Databox.
  - **Host** — your Redshift cluster endpoint, found on the cluster's **General information** tab in the AWS Console under **Endpoint** (e.g., `my-cluster.abc123.us-east-1.redshift.amazonaws.com`). Do not include the port number that may appear at the end of the endpoint string.
  - **Port** — the port your Redshift cluster listens on. The default is `5439`.
  - **Username** — the Redshift username created in [Step 1](#step-1-create-a-read-only-redshift-user-for-databox).
  - **Password** — the password for that user.
  - **Database name** (optional) — the specific database to connect to. Leave blank to connect at the server level.
  - **Timezone** — the time zone used to interpret date values in query results. Defaults to `Etc/UTC`.
4. Toggle **Use SSL/TLS** to enable encrypted connections.
5. Click **Connect**.


## Datasets

The Amazon Redshift integration supports the creation of [datasets](/understanding-datasets), which allow you to define and shape the specific data you want to use for reporting in Databox. Datasets make it easier to focus on the most relevant information, enabling you to filter, visualize, and analyze metrics across projects, teams, and clients without writing complex queries each time.

### Steps to create a dataset

1. **Select a table**: Pick the appropriate schema in the connected database, then choose the table or view you want to use.
2. **Select columns**: Browse and select the specific columns (fields) from your tables or views to include in your dataset. These columns define the structure and content of your dataset.


### Optional: Write SQL

For more advanced use cases, you can write a **custom SQL query** instead of selecting columns manually. This allows you to:

- Join multiple tables
- Apply filters and aggregations
- Format or transform data before importing it into Databox


Your query must return a valid tabular result to be used as a dataset.
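As an illustration of the capabilities listed above, a custom query might look like the sketch below. The `public.orders` and `public.customers` tables and all of their columns are hypothetical, so adapt the names to your own schema.

```sql
-- Hypothetical tables and columns, for illustration only
SELECT
    DATE_TRUNC('day', o.created_at) AS order_date,  -- transform: bucket timestamps by day
    c.country,
    COUNT(*)            AS order_count,             -- aggregation
    SUM(o.total_amount) AS revenue
FROM public.orders AS o
JOIN public.customers AS c                          -- join a second table
    ON c.id = o.customer_id
WHERE o.status = 'completed'                        -- filter
GROUP BY 1, 2
ORDER BY 1, 2;
```

The result is a single table with one row per day and country, which is the tabular shape a dataset needs.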

### Additional resources

- [Amazon Redshift documentation](https://docs.aws.amazon.com/redshift/) — Official AWS docs for Amazon Redshift, covering cluster creation, connectivity, user management, security, backups, and query optimization.
- [Amazon Redshift database developer guide](https://docs.aws.amazon.com/redshift/latest/dg/welcome.html) — SQL reference, user and group management, and schema design guidance for Redshift databases.


## Resources

For comprehensive details on metrics, data availability, templates, specifications, usage guidelines, and other key information, refer to the resources listed below.

Databox connects to Redshift using a standard database username and password. IAM authentication is not currently supported.

If Databox cannot connect to your cluster, check the following in order:

1. **Publicly accessible** is set to **Yes** on the cluster (under **Actions > Modify > Network and security**).
2. The **VPC security group** has an inbound rule allowing TCP on port `5439` from `52.4.198.118/32`.
3. The route table for the cluster's subnet has a `0.0.0.0/0` route that targets an internet gateway.
4. The Redshift user was created and has `CONNECT` and `SELECT` privileges on the target database and schema.
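For the last check, one possible approach is to inspect privileges with Redshift's `HAS_SCHEMA_PRIVILEGE` and `HAS_TABLE_PRIVILEGE` functions. This is a minimal sketch that assumes the `databox` user and `public` schema from Step 1; `your_table` is a placeholder for a table you expect Databox to read.

```sql
-- Confirm the user exists
SELECT usename FROM pg_user WHERE usename = 'databox';

-- Each query returns true when the privilege is granted
SELECT HAS_SCHEMA_PRIVILEGE('databox', 'public', 'usage') AS has_schema_usage;
SELECT HAS_TABLE_PRIVILEGE('databox', 'public.your_table', 'select') AS can_select;
```

If either privilege query returns false, rerun the matching `GRANT` statement from Step 1.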


To find your cluster endpoint, go to **Redshift > Clusters** in the AWS Management Console, select your cluster, and open the **General information** tab. The endpoint is listed under **Endpoint** and follows the format `cluster-name.xxxxxxxxxxxx.region.redshift.amazonaws.com:5439`. Enter only the hostname part (without `:5439`) in the **Host** field in Databox, and enter `5439` separately in the **Port** field.

 
