- read

Testable BigQuery SQL

Tufin 30

We are currently developing a solution that uses Google’s BigQuery for data analytics.

As part of this solution which is written in Golang, we need to perform some complex SQL queries (hundreds of lines each). These queries implement business-logic which is fundamental to our solution, and, as an integral part of our code, and they undergo a standard development cycle of experimentation and on-going enhancement.

As the queries evolved, we started running into bugs which were time-consuming to debug and fix, and we realized it is impossible to continue without proper testing, just like any other part of the code.

We posted this question about Testable SQL for BigQuery on reddit were you can see some of the solutions that people proposed, but we decided to publish our own solution (diversity is good, right?).

So we created Espresso, a framework for writing testable BigQuery queries. The idea is to compose complex queries from simpler ones and to be able to write unit tests for each component and for their combinations. The tests can be run against data-as-code (data which is specified as part of the code and controlled by git) or data from the database.

We compose SQL queries by injecting them into each other, for example:

WITH base AS (
SELECT
"orange" AS fruit
UNION ALL
SELECT
"apple"
)
SELECT fruit FROM base

Can be written as:

{{ define "fruit" }}WITH base AS (
{{ .Base }}
)
SELECT fruit FROM base
{{ end }}

Where the sub-query Base is:

{{ define "base" }}SELECT
"orange" AS fruit
UNION ALL
SELECT
"apple"
{{ end }}

Note: we are using Golang template notation to compose the queries.

And now we write a query definition using YAML:

Name: fruit
Requires:
- Base
Tests:
Test1:
Args:
- Name: Base
Source: base
Result:
Source: fruit_result

This tells espresso to load the fruit query and inject the base query into it. The query is then run and the result is compared to the expected result: fruit_result, which, in this simplified case, is identical to Base.

The tests can be embedded into Golang standard unit tests:

//go:embed queries/fruitvar templates embed.FSfunc TestEspressoShot_Embed(t *testing.T) { queryValues, resultValues, err := shot.NewShotWithClient(env.GetGCPProjectID(), "", templates).RunTest("fruit", "Test1", []bigquery.QueryParameter{}, &map[string]bigquery.Value{}) require.NoError(t, err)
require.ElementsMatch(t, queryValues, resultValues)
}

Or run from the command-line:

./espresso -dir="./shot/queries/fruit/" -query="fruit" -test="Test1"

We released espresso as an open-source project and we look forward to your feedback and contributions.