In data engineering, there is a process called ELT (Extract, Load, Transform): engineers extract raw data, load it into a system, and transform it into readable, useful data for later use. Around the Transform step, two testing tools come up again and again: dbt Tests and Great Expectations. Dbt Tests evaluate whether the transformation was successful, while Great Expectations measures whether the data is of high enough quality for proper use.
Imagine data as water. A data pipeline is like a pumping system, filter and all: it brings data from a source to your analytics database. And just like water, you can't consume it unfiltered. Raw data needs to be properly prepared before analysis; it undergoes a transformation. A discussion about Great Expectations vs dbt is all about data quality after the Transform stage, just like how water tastes after filtration.
In this article, we will explore a bit more about dbt (Data Build Tool) and how it compares with Great Expectations, especially for those with zero experience with data engineering.
What is dbt and how does it work?
Dbt (or Data Build Tool) is a tool that transforms raw data into clean, organized data ready for data analysts and data scientists.
Let's say you extracted raw data coming from many sources (a sheet, an app, a sales page, etc.). You can't use raw data in an analysis or a report: it may be messy, truncated, incomplete, or stored in different formats.
Dbt organizes that data. For example: if a column header in a certain sheet doesn't have a proper name, dbt gives it one. If some records are duplicated, dbt removes the redundant copies. If you don't need certain fields (like height and weight for a group of people), dbt drops them.
Here is how it works:
- Data Modelling
A developer writes instructions in SQL that tell dbt how the data should be transformed (see the example sketch at the end of this section). For instance, the developer can:
- Get information from many different sheets
- For example: bringing together information from clients with their purchases.
- Clean data
- For example: correcting discrepancies, typos, and standardizing data formats
- Generate new columns in a sheet with the given information
- For example: calculating age using birthdays
- Summarize data
- For example: computing total sales per month or total sales of each product
- Data Testing
Dbt lets you check that your data transformation is correct through tests and version control, showing you whether the resulting data is reliable. Some examples of tests: verifying that a customer ID is never empty, or that all sales values are positive (you can't sell a negative number).
- Data Documentation
Dbt generates documentation about every data transformation and data model, making it easier to understand what each sheet or field actually contains.
If you have many other steps in your transform stage, dbt helps you organize and execute them in the order you determine.
In short, dbt lets data engineers build, test, and deploy data transformation flows as reliably and collaboratively as possible. It brings the good practices of software development (such as version control, tests, and documentation) to data transformation.
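Example: here is roughly what the modelling steps above do, sketched in Python with pandas rather than dbt's SQL, purely as an illustration. The file and column names (clients.csv, purchases.csv, birthday, amount) are made up for the example.

import pandas as pd

# Hypothetical raw inputs; in dbt these would be source tables in the warehouse.
clients = pd.read_csv("clients.csv")      # client_id, name, birthday
purchases = pd.read_csv("purchases.csv")  # client_id, amount, purchase_date

# 1. Bring together clients and their purchases (a join).
df = purchases.merge(clients, on="client_id", how="left")

# 2. Clean: standardize name formatting and drop duplicated rows.
df["name"] = df["name"].str.strip().str.title()
df = df.drop_duplicates()

# 3. Generate a new column: age calculated from the birthday.
df["birthday"] = pd.to_datetime(df["birthday"])
df["age"] = (pd.Timestamp.today() - df["birthday"]).dt.days // 365

# 4. Summarize: total sales per month.
df["month"] = pd.to_datetime(df["purchase_date"]).dt.to_period("M")
monthly_sales = df.groupby("month")["amount"].sum().reset_index()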
What is Great Expectations?
Great Expectations (GX) is an open‑source Python framework that treats data validation like unit tests for your tables, files, or streams.
Think of it as a quality inspector for your data, or as your tongue that tastes whether the water is drinkable after filtration. GX doesn't transform data as dbt does; it verifies it, checking whether the data meets your "expectations" (sorry for the pun). In other words, Great Expectations is a great tool for data quality assurance and validation.
Here is how it works.
- Defining how data should be
Like dbt, Great Expectations can work against SQL databases, but you define your checks mainly in Python (a sketch follows this list). You may ask GX to:
- Check if the e-mail column has only valid e-mail addresses.
- Verify if the age column has valid numbers between 0 and 120.
- Confirm if the country column has only valid countries.
- Set a rule, for example, “no column should have more than 5% empty values”.
- Data Validation
After analyzing your data (either after the dbt transform or on raw data), Great Expectations generates a report. These reports tell you whether your data is valid, documenting which data passes and where data is wrong, missing, or unclear.
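As a rough sketch, here is how the four rules listed above might look, written in the same class-based style as the official example below. This assumes a recent GX 1.x release, where expectation classes live in great_expectations.expectations; the column names (email, age, country, phone) are hypothetical:

import great_expectations.expectations as gxe

# Only valid email addresses in the "email" column (simple regex check).
valid_emails = gxe.ExpectColumnValuesToMatchRegex(
    column="email",
    regex=r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
)

# Ages must be numbers between 0 and 120.
valid_ages = gxe.ExpectColumnValuesToBeBetween(
    column="age", min_value=0, max_value=120
)

# The "country" column may only contain values from an approved set.
valid_countries = gxe.ExpectColumnValuesToBeInSet(
    column="country", value_set=["Brazil", "Portugal", "United States"]
)

# "No more than 5% empty values": mostly=0.95 means at least 95% must be non-null.
few_nulls = gxe.ExpectColumnValuesToNotBeNull(column="phone", mostly=0.95)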
Sample Expectation: ExpectColumnMaxToBeBetween
Directly from Great Expectations’ website.
Sample data:

|   | test | test2 |
| --- | --- | --- |
| 0 | 1 | 1 |
| 1 | 1.3 | 7 |
| 2 | .8 | 2.5 |
| 3 | 2 | 3 |
Passing Case in Python:
ExpectColumnMaxToBeBetween(
column="test",
min_value=1,
max_value=3
)
Returns
{
"exception_info": {
"raised_exception": false,
"exception_traceback": null,
"exception_message": null
},
"result": {
"observed_value": 2.0
},
"meta": {},
"success": true
}
Failing Case in Python (with strict_max=True, the observed maximum must be strictly less than max_value, so the observed 7.0 fails):
ExpectColumnMaxToBeBetween(
column="test2",
min_value=1,
max_value=7,
strict_min=False,
strict_max=True
)
Returns
{
"exception_info": {
"raised_exception": false,
"exception_traceback": null,
"exception_message": null
},
"result": {
"observed_value": 7.0
},
"meta": {},
"success": false
}
Great Expectations vs dbt: Side-by-Side Comparison
The difference between dbt and Great Expectations is that dbt's main purpose is to transform raw data into better, readable data, while Great Expectations checks the quality of any kind of data, raw or transformed. Here is a cheat sheet highlighting the main differences between Great Expectations and dbt in terms of their goals, how they work, the main result you get, and when each is the best fit.
|  | dbt (data build tool) | Great Expectations |
| --- | --- | --- |
| Main Goal | Transformation and modeling of data (the "T" in ELT) | Quality and validation of data |
| How It Works | Defines how data should be shaped | Defines expectations for how data should look |
| Main Result | Transformed sheets, ready to use | A data quality report |
| When to Use | To clean, organize, bring together, and prepare raw data | To verify that data (raw or transformed) is correct and reliable |
Why Add Great Expectations When dbt Already Has Tests?
Can you skip Great Expectations and simply use dbt Tests to get the same verification? Here is the kicker: no, you can't. Great Expectations and dbt Tests are completely different in use and scope, and the reality is that dbt Tests vs Great Expectations is not an either-or decision but a complementary, full-coverage strategy. Dbt Tests check the transformation of data, while Great Expectations measures data quality. Running both tools together seals gaps that a single SQL-only layer can't reach, giving you end-to-end data quality you can trust.
Dbt is all about transforming data: filtering the water, to come back to the water analogy. Is filtered water always good to drink? If heavy metals like lead or copper taint the water, no filter can help you. In the same vein, you need Great Expectations to know whether the transformed data is usable.
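To make the combination concrete, here is a minimal sketch of a script that first filters the water and then tastes it: it runs the dbt build, then validates the transformed output with Great Expectations. For brevity it uses the legacy pandas-based API (ge.from_pandas, from GX 0.x); the exported file and column names are made up:

import subprocess

import pandas as pd
import great_expectations as ge  # legacy 0.x pandas API, for brevity

# Step 1: filter the water -- run the dbt transformations (and dbt tests).
subprocess.run(["dbt", "build"], check=True)

# Step 2: taste the water -- validate the transformed output with GX.
# (Reading a hypothetical exported table; in practice GX would connect
# to the warehouse directly.)
orders = ge.from_pandas(pd.read_csv("exports/orders.csv"))

result = orders.expect_column_values_to_be_between(
    "sales_value", min_value=0  # you can't sell a negative amount
)
if not result["success"]:
    raise ValueError("Transformed data failed quality checks")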
dbt Tests Examples
- Not Null – checks if a column has no missing or empty values.
- Unique – verifies if all values in a column are distinct (no duplicates).
- Accepted Values – determines if values in a column belong to an expected set (for instance, if a “status” column contains only “active” or “inactive”).
- Relationships – checks that relationships between tables are consistent (useful when you are merging different data sources into the same sheet, like all the data about a single client).
Again, these tests determine if the data transformation was successful, not if the data has quality.
What can Great Expectations do that dbt Tests can't?
Let's say we want to verify an age column, guaranteeing there are no negative ages in our data. This is easy with dbt Tests, using plain SQL. In dbt, you would save a query like this as a singular test, and the test fails if it returns any rows:
select *
from your_table
where age < 0
But what if you want something more, beyond what SQL comfortably expresses?
Imagine you have a dataset with user profiles, including an age column and a birthdate column. You want to make sure the data makes sense logically—that the age column matches the birthdate.
“Verify that each user’s age matches the difference between today’s date and their birthdate (±1 year).”
In other words, if today is May 2025, a person born in 2000 should have an age of around 25. Meanwhile, a person born in 1990 should have an age of around 35, etc.
This is a "cross-column" validation with a calculated date range. Could you write it purely as a dbt SQL test? Certainly, it's doable, but it gets complicated and hard to maintain.
However, Great Expectations makes it super easy, barely an inconvenience.
In Great Expectations, you can easily write a custom expectation in Python, and it is just as easily reusable across datasets. GX handles this kind of logic gracefully because it allows flexible, conditional, Python-based checks, something dbt Tests, which are purely SQL-based, don't easily support.
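Here is a minimal sketch of that check, in the same GX 1.x class-based style used earlier. The idea is to derive the age implied by the birthdate as a helper column (age_delta is our own invention for this example, not a GX feature) and apply an ordinary between-expectation to it:

import pandas as pd
import great_expectations.expectations as gxe

# Hypothetical user profiles with both an age and a birthdate column.
users = pd.read_csv("users.csv")

# Age implied by the birthdate (rough year count, good enough for ±1 year).
birthdate = pd.to_datetime(users["birthdate"])
implied_age = (pd.Timestamp.today() - birthdate).dt.days // 365

# Cross-column rule: the stated age must match the implied age within ±1 year.
users["age_delta"] = users["age"] - implied_age

age_matches_birthdate = gxe.ExpectColumnValuesToBeBetween(
    column="age_delta", min_value=-1, max_value=1
)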
Great Expectations vs dbt Tests differences
After covering dbt Tests, here's a quick comparison. dbt Tests verify that your SQL models run exactly as intended, ensuring transformations meet your specified criteria, whereas Great Expectations evaluates overall data quality, determining whether a dataset is fit for analysis or needs further cleansing. In short, dbt Tests measure the reliability of the data transformation, answering the question "Was I able to transform the data according to my desired parameters?". Great Expectations tests data quality, answering the question "Will this data be good enough for data analysis and data science, or should I transform it again?".
|  | dbt Tests (in dbt Core) | Great Expectations (GX) |
| --- | --- | --- |
| Primary Purpose | Transformation reliability | Data quality assurance |
| Role in Pipeline | During transformation | Before, during, and after transformation |
| How Tests Are Defined | Declaratively in SQL or YAML within a dbt project. Use simple test names for common checks (e.g. unique, not_null) or write custom SQL for specialized tests. | As "expectations" in code or config. Many expectations ship built in (e.g. expect_column_values_to_not_be_null with parameters), so you configure rather than code from scratch. Custom expectations can be written in Python for new rules. |
| Built-in Checks | Four core tests (not_null, unique, accepted_values, relationships); additional checks require custom tests or community packages. | Dozens of built-in expectation types (nulls, ranges, patterns, cross-field comparisons, etc.) available by default. Highly extensible with custom Python expectations for any special logic. |
| Ease of Use | Easy if you know SQL. | Requires understanding the GX framework. Some Python or scripting knowledge helps for custom expectations. Initial setup (configuring data connections, writing expectation suites) takes some upfront effort. |
| Flexibility | Limited: it works on data in SQL databases and only on what you can express in SQL queries. | Highly flexible: it works with multiple data backends (SQL, files, big data, etc.) and can validate nearly any condition or rule (e.g., conditional logic, multi-table comparisons). Use GX for complex validation scenarios that go beyond basic checks. |
| Output & Reporting | Test results are shown in console/log output as a pass/fail status per test. No automatic rich report (you see which tests failed and can inspect failing rows manually if needed). | Generates Data Docs: detailed HTML reports of all expectations and results that are actually readable! 😱 It can also integrate with alerting (e.g., email or Slack notifications) to inform the team. |
| Strengths | Simple integration into the transformation stage; version-controlled with the transformation code; leverages existing SQL skills; catches issues at the point of data creation; low overhead for basic quality checks. | Comprehensive data quality coverage and documentation of data requirements; applicable to many types of data and pipelines; strong community with a growing library of expectations; promotes a shared understanding of data quality across teams. |
Conclusion
In summary, dbt tests and Great Expectations are not mutually exclusive – they serve complementary roles. Using both can give you data reliability in every layer of your data pipeline.
In fact, combining dbt’s testing with Great Expectations’ validation provides a robust way to enforce data quality. You will find issues early in the process, dramatically reducing the chances of “bad data” making it to your analytics. The result is a pipeline that not only delivers data efficiently, but also delivers data you can trust for making decisions.
Frequently Asked Questions
Can dbt Tests replace Great Expectations?
Short answer: only if your checks are simple and live in SQL. See the section "What can Great Expectations do that dbt Tests can't?" for the complex scenarios where GX shines.
Do dbt and Great Expectations do the same job?
No. dbt builds data; GX validates it. They complement each other.
Can I run Great Expectations together with dbt?
Yes: trigger a GX Python script in a dbt run-operation or via an orchestrator like Airflow.
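For the orchestrator route, a minimal Airflow sketch could look like the following; the DAG id, schedule, and validation callable are made up for illustration, and the schedule argument assumes Airflow 2.4+:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def validate_with_gx():
    # Placeholder: run your Great Expectations suite or checkpoint here.
    pass

with DAG(
    dag_id="dbt_then_gx",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
):
    transform = BashOperator(task_id="dbt_build", bash_command="dbt build")
    validate = PythonOperator(task_id="gx_validate", python_callable=validate_with_gx)

    # Validate only after the dbt transformation succeeds.
    transform >> validate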