Understanding Iceberg Support in the Databricks Ecosystem

Sean Lynch
30 September 2024

In many ways, Spark is responsible for the creation of and excitement around Iceberg. As a result, Spark has some of the best support in the Iceberg ecosystem. And given that Databricks is largely built on and around Spark, it’s not surprising to find that it has great support for working with Iceberg and data lakehouse architectures generally.

Unity Catalog, now a core part of any Databricks deployment, provides a lot of the governance capabilities of the platform, but it can also now act natively as an Iceberg catalog. All you need to do is get Iceberg tables into your Databricks workspace. Enter UniForm.

By default in Databricks, data is stored in the Delta Lake table format, a competitor to Iceberg. That means your Databricks workspace today is filled with Delta Lake-formatted tables. UniForm can automatically generate the Iceberg metadata alongside Delta Lake’s, so you don’t need to convert or create Iceberg-specific tables within Databricks. You just need to opt tables into UniForm by setting the following two table properties. At the moment, this needs to be set on a per-table basis:


'delta.enableIcebergCompatV2' = 'true'
'delta.universalFormat.enabledFormats' = 'iceberg'
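For example, in a Databricks notebook you could enable UniForm on an existing table with something like the following (the table name is a placeholder for your own):

# Enable UniForm on an existing Delta table from a Databricks notebook.
# my_catalog.my_schema.my_table is a placeholder for your own table.
spark.sql("""
  ALTER TABLE my_catalog.my_schema.my_table SET TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg'
  )
""")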

UniForm compatibility isn’t perfect yet. For example, it doesn’t work if your existing table has deletion vectors enabled, so enabling it for your particular table may require a few additional steps.
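As a rough sketch, assuming deletion vectors are the blocker, those extra steps usually amount to turning them off and rewriting the files that still use them before enabling UniForm (verify the exact procedure against the Databricks docs for your runtime version):

# Rough sketch, assuming deletion vectors are the blocker. The table name is a
# placeholder, and the exact procedure may differ by Databricks runtime version.
spark.sql("ALTER TABLE my_catalog.my_schema.my_table "
          "SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false')")
spark.sql("REORG TABLE my_catalog.my_schema.my_table APPLY (PURGE)")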

UniForm / Unity Catalog doesn’t seem to support Iceberg views yet. You can set these table properties on a view, but SHOW VIEWS returns a “Catalog unity does not support views” error in Spark.

Once you’ve set that up on your tables, you can access them directly using the Iceberg REST catalog API for Unity Catalog (and they launched OAuth support in the time it took me to write this article).
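A quick way to confirm the endpoint is reachable is to hit the REST catalog’s config route with a personal access token. Here’s a minimal sketch; the workspace host, token, and catalog name are placeholders:

# Minimal sketch: hit the Iceberg REST catalog config endpoint exposed by
# Unity Catalog. <workspace-host>, <pat>, and <catalog-name> are placeholders.
import requests

resp = requests.get(
    "https://<workspace-host>/api/2.1/unity-catalog/iceberg/v1/config",
    params={"warehouse": "<catalog-name>"},  # the Unity Catalog catalog acts as the warehouse
    headers={"Authorization": "Bearer <pat>"},
)
print(resp.status_code, resp.json())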

To connect in Spark, create a new REST catalog connection. Databricks doesn’t support the Iceberg-Access-Delegation header (yet; it looks like it’s coming), so you’ll also need to provide credentials for the underlying object storage (in this case, S3):


"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
"spark.sql.catalog.unity": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.unity.type": "rest",
"spark.sql.catalog.unity.uri": "/api/2.1/unity-catalog/iceberg",
"spark.sql.catalog.unity.token":"",

# Providing S3 credentials in line, each object store will have different credentials
"spark.sql.catalog.unity.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
"spark.sql.catalog.unity.s3.access-key-id": "",
"spark.sql.catalog.unity.s3.secret-access-key": "",
"spark.driver.extraJavaOptions": "-Daws.region=us-east-1"

Once you’re connected, your Databricks schemas and tables will be mounted under the unity catalog in Spark. Querying is straightforward. Note that writing is not supported at this time.

spark.sql('SELECT * FROM unity.databricks_catalog.my_table;').show()

One interesting quirk: querying nested namespaces (which a default Databricks Unity Catalog will have) requires a bit of fancy escaping around the namespaces (the names between the top-level Spark catalog and the table/view):

spark.sql('SELECT * FROM unity.`catalog.schema`.my_table;').show()

It turns out that the Iceberg REST catalog uses the unit separator character to delimit nested namespaces. Databricks’ API does not like this and throws an error, likely for very fundamental reasons. Escaping the nested namespaces with backticks sends the . character to the Databricks API raw instead. It shows that the standard is early but moving really quickly; expect a fix for this issue to roll out soon.
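For the curious, here is a small illustration of the encoding the spec calls for (the catalog and schema names are just examples). The backtick trick effectively collapses the two levels into a single level containing a literal dot, which is what Databricks’ API accepts today:

# Illustration: the Iceberg REST spec joins nested namespace levels with the
# unit separator (0x1F), which appears as %1F once URL-encoded.
from urllib.parse import quote

levels = ["databricks_catalog", "my_schema"]
print(quote("\x1f".join(levels), safe=""))  # databricks_catalog%1Fmy_schema
print(quote(".".join(levels), safe=""))     # databricks_catalog.my_schema (single level, via backticks)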

What about the Tabular acquisition?

If you don’t follow tech industry news, you may not have seen Databricks’ monster acquisition of Tabular, the Iceberg data catalog company, back in June 2024.

It’s early days, and it remains to be seen how Tabular’s catalog and table format expertise will be applied to Databricks’ similar offerings. Tabular itself is no longer accepting new signups, so unless you were already a user, Tabular is probably not an option you need to worry about.

But don’t ignore it completely. Tabular still hosts some of the best Iceberg documentation on the internet today. We’ll keep a close eye on how the Databricks and Tabular teams work together.