TL;DR: use this Snowflake Quickstart to explore Iceberg tables: https://quickstarts.snowflake.com/guide/getting_started_iceberg_tables
Let’s try to understand these Iceberg tables, as developments around data-lake table formats can take some effort to follow. For me, this means learning by doing.
All links are included below for further reading. Some one-liners from those resources:
Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.
TLDR: The Iceberg format is a logical table with underlying data stored in columnar formats on cloud object storage. When working with Iceberg tables, you want to ensure best practices such as choosing the right partitioning scheme, compacting small files, managing data retention, and managing schema evolution.
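To make those best practices concrete, here is a minimal Spark SQL sketch (Iceberg is engine-agnostic, so Spark is just one of the engines listed above). It assumes a hypothetical Iceberg catalog named `prod` and a table `db.events`, with Iceberg’s Spark SQL extensions enabled — none of these names come from the quickstart:

```sql
-- Hidden partitioning: Iceberg derives the day partition from event_ts,
-- so queries don't need a separate partition column.
CREATE TABLE prod.db.events (
    event_id  BIGINT,
    event_ts  TIMESTAMP,
    payload   STRING
) USING iceberg
PARTITIONED BY (days(event_ts));

-- Schema evolution is a metadata-only operation.
ALTER TABLE prod.db.events ADD COLUMN country STRING;

-- Compact small files with Iceberg's rewrite_data_files maintenance procedure.
CALL prod.system.rewrite_data_files(table => 'db.events');

-- Data retention: expire old snapshots so their data files can be cleaned up.
CALL prod.system.expire_snapshots(
    table => 'db.events',
    older_than => TIMESTAMP '2024-01-01 00:00:00'
);
```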
Second, now that we understand what Iceberg is, let’s work with some Iceberg tables. I am using the Snowflake quickstart here.
We need to do the AWS setup for the external volume: an S3 bucket plus an IAM role and policy that Snowflake can assume. For now, I just create these AWS resources manually. Keep in mind that the S3 bucket and the Snowflake account must be in the same region.
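On the Snowflake side, the external volume then points at that bucket and role. A minimal sketch, with placeholder bucket, account, and role names that you’d replace with your own (the quickstart uses its own names):

```sql
-- External volume over the S3 bucket; the placeholders in the ARN must
-- match the IAM role created during the AWS setup.
CREATE OR REPLACE EXTERNAL VOLUME iceberg_lab_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'iceberg-lab-s3'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<your-bucket>/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<aws-account-id>:role/<snowflake-access-role>'
    )
  );

-- Shows the IAM user and external ID that Snowflake will use; these go
-- into the trust policy of the IAM role above.
DESC EXTERNAL VOLUME iceberg_lab_vol;
```

Once the trust relationship is in place, a Snowflake-managed Iceberg table can be created on top of the volume (again with hypothetical names):

```sql
CREATE OR REPLACE ICEBERG TABLE customer_iceberg (
  c_custkey INT,
  c_name    STRING
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_lab_vol'
  BASE_LOCATION = 'customer_iceberg';
```

The table’s data and metadata land as Parquet and Iceberg metadata files under the BASE_LOCATION prefix in the bucket — exactly the “logical table over columnar files on object storage” idea from the TLDR above.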