# Add a calculated column to a dataset

Calculated columns let you create new values in your dataset by applying custom logic or transformations to your existing data. This is especially useful when you need to:

- Normalize or reformat values for comparison or visualization
- Combine multiple columns into a single result
- Apply conditional logic (e.g., flagging results based on thresholds)
- Perform mathematical operations that aren't available by default


Instead of modifying your source data, calculated columns give you a flexible way to enrich it — right inside the dataset editor.

## Add a calculated column to a dataset

1. Navigate to the **Datasets** page and find the dataset you want to update.
2. Click the **down arrow** (![chevron-down](/assets/chevron-down.dfd0e90adc6eb576be32262463090aae7c7f5408ed7975e3795f69775e36d174.419bb737.svg)) next to the dataset name and select **Edit data**.
3. In the dataset editing mode, click the **+ bar** at the far right of the table to open the formula input field above the dataset.
4. Enter your formula using supported functions, operators, column references, and values.
5. Click ![play](/assets/play-fill.35a7138bd1b337b171af4171a9e930d7589c283ae4e083fa01002f46c98bcebe.419bb737.svg) **Run** to preview how the new column will look across all rows.
6. If everything looks good, click **Save** in the top right corner to apply your changes.


add-a-calculated-column-to-a-dataset
### Helpful shortcuts

To speed up formula building and reduce the chance of errors, use the following keyboard shortcuts:

- **Ctrl + Space** — Opens a list of all supported functions you can use in formulas.
- **$** — Displays a dropdown of available columns you can insert as parameters.


These shortcuts make it easier to build accurate formulas without needing to memorize every function or column name.

### Formula validation and supported inputs

The formula field only accepts specific input types and supports formulas up to **10,000 characters**. If your formula contains unsupported elements or syntax errors, an **error message will be displayed** and you'll need to correct the formula before you can run or save it.

Supported inputs include:

1. **Functions**
2. **Operators**
3. **Numeric values*** (e.g., `10`, `3.14`)
4. **Text values**, enclosed in **single quotes** (e.g., `'Completed'`)
5. **Column references**, using the **$** symbol (e.g., `$Status`), and double quotes (`$"Project-Name"`) if the column name includes spaces or special characters


** In dataset columns, logical values are automatically cast as numbers — `true` becomes `1`, and `false` becomes `0`*

Any input outside of these types will be flagged as invalid.

Certain characters — such as parentheses, commas, or symbols — are reserved and may cause issues when parsing or validating your formula. To avoid errors, make sure your **column names don’t include special characters**, or **rename columns** before referencing them in a formula.

To help you get started, all supported [operators](#operators) and [functions](#functions) are organized by category below.

## Operators

Operators perform basic operations on values or expressions. You can think of them as the "glue" that connects, compares, or transforms values within formulas.

Below is a quick overview of the different types of operators. Click a category below to jump to its full explanation.

arithmetic-operators
**Arithmetic Operators**

comparison-operators
**Comparison Operators**

logical-and-conditional-operators
**Logical and Conditional Operators**

### Arithmetic Operators

These are basic math symbols used to calculate values.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *If formulas are recipes, numeric operators are the math utensils you use to prepare your ingredients.*

| Operator | Description | Example | Result |
|  --- | --- | --- | --- |
| `+` | Addition | `2 + 3` | `5` |
| `-` | Subtraction | `2 - 3` | `-1` |
| `*` | Multiplication | `2 * 3` | `6` |
| `/` | Division | `5 / 2` | `2.5` |
| `//` | Integer Division (quotient) | `5 // 2` | `2` |
| `%` | Modulo (remainder) | `5 % 4` | `1` |
| `^` | Exponent | `3 ^ 4` | `81` |
| `!` | Factorial | `4!` | `24` |


### Comparison Operators

They evaluate the relationship between two values and return `true` or `false`. They are commonly used in formulas to compare numbers, text, or column values.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *You can think of comparison operators like yes/no questions you ask about your data to evaluate relationships between values.*

| Operator | Description | Example | Result |
|  --- | --- | --- | --- |
| < | Less than | `5 < 10` | `true` |
| > 
 | Greater than | `5 > 10` | `false` |
| <= | Less than or equal to | `5 <= 5` | `true` |
| `>=` | Greater than or equal to | `5 >= 6` | `false` |
| = | Equal | `5 = 5` | `true` |
| != | Not equal | `5 != 3` | `true` |


### Logical and Conditional Operators

These operators evaluate conditions and return true/false or determine outcomes based on logic.

- **Logical operators** like `AND`, `OR`, and `NOT` help you build complex conditions by combining simpler ones.
- **Conditional operators** like `CASE` or `IS NULL` allow you to return specific values depending on which conditions are met.


![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *You can think of logical operators as the building blocks of a test, and conditional expressions as the final decision maker.*

| Operator | Description |
|  --- | --- |
| `AND` | Returns `true` if **both** conditions are true. Example: `(5 > 3) AND (2 < 4)` → `true` |
| `CASE WHEN ... THEN ... [ELSE ...] END` | Returns the result corresponding to the first true condition. If no conditions are true, returns the `ELSE` result (if provided). Example: `CASE WHEN 5 > 10 THEN 'High' WHEN 5 > 3 THEN 'Medium' ELSE 'Low' END` → `'Medium'` |
| `IS NOT NULL` | Returns `true` if the value is **not** null. Example: `'abc' IS NOT NULL` → `true` |
| `IS NULL` | Returns `true` if the value is null. Example: `NULL IS NULL` → `true` |
| `NOT` | Returns `true` if the condition is false, and `false` if the condition is true. Example: `NOT(5 > 3)` → `false` |
| `OR` | Returns `true` if **either** condition is true. Example: `(5 > 3) OR (2 > 4)` → `true` |


## Functions

Functions perform more advanced, often reusable operations. They take one or more inputs (called arguments) and return a calculated result.

Below is a categorized overview of available functions. Click a category below to jump to its detailed section.

aggregate-functions
**Aggregate Functions**

date-and-time-functions
**Date and Time Functions**

numeric-functions
**Numeric Functions**

text-functions
**Text Functions**

utility-functions
**Utility Functions**

window-functions
**Window Functions**

### Aggregate Functions

These perform calculations across multiple rows in a dataset, such as summing values or finding an average.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *You can think of aggregate functions as "summary tools" — they look at all your data and give you a big-picture number.*

| **Syntax** | **Description** |
|  --- | --- |
| `AVG(expression)` | Calculates the average (arithmetic mean) of all non-null values in `expression` across the dataset. Example: `AVG([5, 10, 15])` → `10` |
| `COUNT(expression)` | Counts the number of non-null values in `expression` across all rows. If `expression` is omitted, counts all rows. Example: `COUNT(['A', 'B', null])` → `2` |
| `COUNTDISTINCT(expression)` | Counts the number of unique (distinct) non-null values in `expression` across all rows. Example: `COUNTDISTINCT(['A', 'B', 'A'])` → `2` |
| `MAX(expression)` | Returns the maximum value from `expression` across all rows. Example: `MAX([10, 20, 5])` → `20` |
| `MEDIAN(expression)` | Returns the median value from `expression` across all rows. Example: `MEDIAN([1, 3, 5])` → `3` |
| `MIN(expression)` | Returns the minimum value from `expression` across all rows. Example: `MIN([10, 20, 5])` → `5` |
| `STDEV(expression)` | Returns the sample standard deviation of values in `expression` across all rows. Example: `STDEV([4, 6, 8])` → `2` |
| `STDEVP(expression)` | Returns the population standard deviation of values in `expression` across all rows. Example: `STDEVP([4, 6, 8])` → `1.63` *(approx.)* |
| `SUM(expression)` | Returns the sum of all values in `expression` across the dataset. Example: `SUM([5, 10, 15])` → `30` |
| `VARIANCE(expression)` | Returns the sample variance of values in `expression` across all rows. Example: `VARIANCE([4, 6, 8])` → `4` |
| `VARIANCEP(expression)` | Returns the population variance of values in `expression` across all rows. Example: `VARIANCEP([4, 6, 8])` → `2.67` *(approx.)* |


### Date and Time Functions

Used to extract parts of a date, perform date math, or return the current time.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *These are your "calendar helpers" — they help you understand when things happen or calculate time-based differences.*

| **Syntax** | **Description** |
|  --- | --- |
| `DATEADD(date, number, interval)` | Returns a new date by adding the specified `number` of `interval` units (e.g., 'day', 'month') to the given `date`. Example: `DATEADD('2025-04-01', 7, 'day')` → `'2025-04-08'` |
| `DATEDIFF(start_date, end_date, unit)` | Returns the number of `unit` intervals (e.g., 'day', 'month') between `start_date` and `end_date`. Example: `DATEDIFF('2025-01-01', '2025-01-10', 'day')` → `9` |
| `DATEFORMAT(date, format)` | Returns a text string representing the specified `date` in the given [format](https://duckdb.org/docs/stable/sql/functions/dateformat.html#format-specifiers). Example: `DATEFORMAT('2025-04-15', '%b %-d, %Y')` → `Apr 15, 2025` |
| `DAY(date)` | Returns the day of the month (1–31) from the given `date`. Example: `DAY('2025-04-15')` → `15` |
| `HOUR(date_time)` | Returns the hour (0–23) from the given `date_time`. Example: `HOUR('2025-04-15T14:30:00Z')` → `14` |
| `MINUTE(date_time)` | Returns the minute (0–59) from the given `date_time`. Example: `MINUTE('2025-04-15T14:30:00Z')` → `30` |
| `MONTH(date)` | Returns the month (1–12) from the given `date`. Example: `MONTH('2025-04-15')` → `4` |
| `NOW()` | Returns the current date and time in Coordinated Universal Time (UTC). Example: `NOW()` → `'2025-04-23T10:42:00Z'` |
| `QUARTER(date)` | Returns the quarter (1–4) of the year for the given `date`. Example: `QUARTER('2025-04-15')` → `2` |
| `SECOND(date_time)` | Returns the seconds (0–59) from the given `date_time`. Example: `SECOND('2025-04-15T14:30:45Z')` → `45` |
| `TODAY()` | Returns the current date with time set to `00:00:00` in Coordinated Universal Time (UTC). Example: `TODAY()` → `'2025-04-23'` |
| `WEEK(date)` | Returns the ISO week number (1–53) for the given `date`. Example: `WEEK('2025-01-01')` → `1` |
| `WEEKDAY(date)` | Returns the day of the week (0 = Sunday, 6 = Saturday) for the given `date`. Example: `WEEKDAY('2025-04-21')` → `1` |
| `YEAR(date)` | Returns the year from the given `date`. Example: `YEAR('2025-04-15')` → `2025` |


### Numeric Functions

These apply mathematical formulas to one or more values.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *Numeric functions are like a calculator inside your dataset — they help you transform and analyze numerical values.*

| **Syntax** | **Description** |
|  --- | --- |
| `ABS(number)` | Returns the absolute (non-negative) value of `number`. Example: `ABS(-5)` → `5` |
| `CEILING(number)` | Rounds `number` up to the nearest integer. Example: `CEILING(4.2)` → `5` |
| `FLOOR(number)` | Rounds `number` down to the nearest integer. Example: `FLOOR(4.8)` → `4` |
| `MOD(dividend, divisor)` | Returns the remainder after dividing `dividend` by `divisor`. Example: `MOD(10, 3)` → `1` |
| `POWER(number, exponent)` | Returns `number` raised to the power of `exponent`. Example: `POWER(2, 3)` → `8` |
| `SQRT(number)` | Returns the square root of `number`. Example: `SQRT(9)` → `3` |


### Text Functions

Used to manipulate, transform, or extract information from text strings.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *These are your "text editors" — perfect for cleaning up or formatting names, tags, and other string data.*

| **Syntax** | **Description** |
|  --- | --- |
| `CONCAT(value1, value2, ...)` | Joins multiple values into a single string. Example: `CONCAT('Data', 'box')` → `Databox` |
| `CONTAINS(string, substring)` | Returns `1` if the `string` contains the specified `substring`; otherwise, returns `0`. Example: `CONTAINS('dashboard', 'board')` → `1` |
| `ENDSWITH(string, substring)` | Returns `1` if the `string` ends with `substring`; otherwise, returns `0`. Example: `ENDSWITH('metrics.csv', '.csv')` → `1` |
| `FIND(string, substring)` | Returns the index of the first occurrence of `substring` in `string`. Returns `0` if not found. Example: `FIND('dashboard', 'board')` → `5` |
| `JSONEXTRACT(json_string, path)` | Returns the fragment located at the specified [JSONPath](https://jsonpath.com/) in the `json_string`. Example: `JSONEXTRACT('{"user":{"id":123,"name":"Alice"}}', '$.user.id')` → `123` |
| `LEFT(string, num_chars)` | Returns the first `num_chars` characters from the `string`. Example: `LEFT('Databox', 4)` → `Data` |
| `LENGTH(string)` | Returns the number of characters in the `string`. Example: `LENGTH('Databox')` → `7` |
| `LOWER(string)` | Converts all characters in the `string` to lowercase. Example: `LOWER('Databox')` → `databox` |
| `REGEXEXTRACT(string, pattern, [group])` | Returns the specified match `group` (0-indexed) from the `string`, based on the regular expression `pattern`. If `group` is omitted, it defaults to `0`. Example: `REGEXEXTRACT('databox.com', '([a-z]+)\.com')` → `databox` |
| `REGEXREPLACE(string, pattern, new_substring)` | Returns `string` with all text matching the regular expression `pattern` replaced by `new_substring`. Example: `REGEXREPLACE('abc123def', '([0-9]+)', 'X')` → `abcXdef` |
| `REPLACE(string, old_substring, new_substring)` | Replaces all occurrences of `old_substring` in the `string` with `new_substring`. Example: `REPLACE('2025-05-01', '-', '/')` → `2025/05/01` |
| `RIGHT(string, num_chars)` | Returns the last `num_chars` characters from the `string`. Example: `RIGHT('Databox', 3)` → `box` |
| `SPLITROWS(string, separator)` | Returns one row for each value in the `string`, split by the specified `separator`. Example: `SPLITROWS('a;b;c', ';')` → `['a', 'b', 'c']` |
| `STARTSWITH(string, substring)` | Returns `1` if the `string` starts with `substring`; otherwise, returns `0`. Example: `STARTSWITH('dashboard', 'dash')` → `1` |
| `SUBSTRING(string, start, [length])` | Extracts a substring from the `string` starting at the `start` index, with an optional `length`. Example: `SUBSTRING('Databox', 5, 3)` → `box` |
| `TRIM(string)` | Removes whitespace from the beginning and end of the `string`. Example: `TRIM(' Databox ')` → `Databox` |
| `UPPER(string)` | Converts all characters in the `string` to uppercase. Example: `UPPER('Databox')` → `DATABOX` |


### Utility Functions

Versatile functions for comparisons, conditionals, and handling nulls or arrays.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *Utility functions are the "everyday helpers" — they let you apply simple if-then logic, set fallbacks for nulls, compare values to find minimums or maximums, or expand arrays into individual rows.*

| **Syntax** | **Description** |
|  --- | --- |
| `COALESCE(value1, value2, ...)` | Returns the first non-null value from the list. If all values are `NULL`, the result is `NULL`. Example: `COALESCE(NULL, 'default', NULL, 'other')` → `default` |
| `GREATEST(value1, value2, ...)` | Returns the largest value from a list of numbers, strings, or other comparable values. `NULL` values are ignored unless all values are `NULL`, in which case the result is `NULL`. Example: `GREATEST(10, 5, 20)` → `20` |
| `IF(condition, true_value, false_value)` | Returns `true_value` if `condition` is true; otherwise, returns `false_value`. Example: `IF(5 > 3, 'Yes', 'No')` → `Yes` |
| `IFNULL(expression, value_if_null)` | Returns `expression` if it is not `NULL`; otherwise, returns `value_if_null`. Example: `IFNULL(null, 'N/A')` → `N/A` |
| `LEAST(value1, value2, ...)` | Returns the smallest value from a list of numbers, text values, or other comparable values. `NULL` values are ignored unless all values are `NULL`, in which case the result is `NULL`. Example: `LEAST(10, 5, 20)` → `5` |
| `UNNEST(array_expression)` | Returns each element of `array_expression`as a separate row. Often paired with `JSONEXTRACT` to expand extracted arrays into individual rows. Example: `UNNEST('["tech","science","art"]')` → `['tech','science','art']` |


### Window Functions

Functions for row numbering, ranking, and accessing values across partitions.

![bulb](/assets/bulb.15e2e40f464cc9812981ecfefac6f4df74e56779ba2070871cdde55422d09b97.419bb737.svg) *Window functions are the "look across rows" tools — they let you assign sequence numbers, compare values within groups, or pull data from preceding or following rows.*

| **Syntax** | **Description** |
|  --- | --- |
| `FIRSTVALUE(column_expression, order_columns, [partition_columns])` | Returns the first value of the specified column or expression, based on the defined sort order. If `partition_columns` columns are provided, returns the first value within each group. Sort direction defaults to ascending (ASC) if not specified. Example: `FIRSTVALUE($customer_email, '$order_date, $total_amount DESC', '$store_id, $customer_id')` → Returns the first email address associated with each customer in each store, based on order date (earliest first) and total amount (highest first). |
| `LAG(column_expression, order_columns, [partition_columns], [offset], [default_value])` | Returns a value from a previous row in the sorted data. If `partition_columns` are provided, looks back within each group. If `offset` is not specified, returns the value from the previous row. If no value exists (for example, it's the first row in the group), returns `NULL` or the `default_value`. Sort direction defaults to ascending (ASC) if not specified. Example: `LAG($balance, '$transaction_date', '$account_id', 1, 0)`→ Returns the previous balance for each account, based on transaction date. If it's the first transaction, returns `0` |
| `LASTVALUE(column_expression, order_columns, [partition_columns])` | Returns the last value of the specified column or expression, based on the defined sort order. If `partition_columns` are provided, returns the last value within each group. Sort direction defaults to ascending (ASC) if not specified. Example: `LASTVALUE($customer_email, '$order_date, $total_amount DESC', '$store_id, $customer_id')`→ Returns the last email address associated with each customer in each store, based on order date (earliest first) and total amount (highest first). |
| `LEAD(column_expression, order_columns, [partition_columns], [offset], [default_value])` | Returns the value of a column or expression from a later row in the sorted result. If `partition_columns` are provided, looks ahead within each group. If no `offset` is specified, the function defaults to the next row. If no value exists (e.g. it's the last row), returns `NULL` or the specified `default_value`. Sort direction defaults to ascending (ASC) if not specified. Example: `LEAD($order_date, '$order_date ASC', '$customer_id', 1, NOW())` → Returns the date of the next order for each customer. If no future order exists, returns the current date and time. |
| `ROWNUMBER([order_columns], [partition_columns])` | Assigns a sequential number starting from 1 to each row. If `order_columns` are provided, rows are numbered based on the specified sort order. If `partition_columns` are provided, numbering restarts within each group. Sort direction defaults to ascending (ASC) if not specified. Example: `ROWNUMBER('$order_date, $total_amount DESC', '$store_id, $customer_id')` → Sorts orders by date (ascending) and total amount (descending), and assigns a row number for each customer in each store, starting from 1. |


Yes. You can include as many column references as needed by using the **`$`** symbol before each column name (e.g., `$Revenue - $Cost`).

Absolutely. Once saved, calculated columns behave like any other column in your dataset and can be used in metric creation.

Yes. The calculated column is dynamic — it will recalculate automatically whenever the underlying dataset is refreshed.

 

Ask Genie
Get instant answers or help with your data using the in-app AI assistant.

Talk to an expert
For customers: Get help with your setup, strategy, or making the most of Databox.

Book a demo
New to Databox? See how it works and get guidance on getting started.

Send an email
Reach out to support for help with your account, data, or technical issues.