# Flink + Iceberg Local Development Setup

## Objectives
- Configure PyFlink for local development with Iceberg integration
- Establish a connection from the local environment to the containerized Iceberg REST catalog and S3-compatible storage
- Validate cross-environment connectivity between local Flink and containerized services
- Demonstrate local development workflow for Flink-Iceberg applications
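
The connectivity objective can be checked before any Flink code runs. The following is a minimal sketch, assuming the REST catalog listens on `localhost:8181` and the S3-compatible endpoint on `localhost:9000` (the addresses used in the catalog definition in this notebook). `is_port_open` and `check_services` are hypothetical helpers, not part of PyFlink or Iceberg, and a successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


# Assumed endpoints, taken from the catalog definition used in this notebook.
SERVICES = {
    "iceberg-rest": ("localhost", 8181),
    "s3-endpoint": ("localhost", 9000),
}


def check_services(services=SERVICES) -> dict:
    """Map each service name to whether its TCP port is reachable."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```

Running this first separates container or port-mapping problems from Flink or Iceberg configuration problems.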

In [1]:
import os
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

# Set AWS environment variables for local development
os.environ['AWS_REGION'] = 'us-east-1'
os.environ['AWS_ACCESS_KEY_ID'] = 'admin'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'password'

print("✅ AWS environment variables set for local development")

✅ AWS environment variables set for local development


In [2]:
env = StreamExecutionEnvironment.get_execution_environment()
table_env = StreamTableEnvironment.create(stream_execution_environment=env) # type: ignore

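
The notebook does not show how the Iceberg connector JARs reach Flink's classpath. If they are not already in PyFlink's `lib` directory, one option is Flink's `pipeline.jars` setting, which takes a semicolon-separated list of `file://` URLs. The sketch below assumes a hypothetical local `jars/` directory holding the required JARs; `build_pipeline_jars` is an illustrative helper, not a PyFlink API.

```python
from pathlib import Path


def build_pipeline_jars(jar_dir: str) -> str:
    """Return a semicolon-separated list of file:// URLs for every .jar in jar_dir."""
    jars = sorted(Path(jar_dir).resolve().glob("*.jar"))
    return ";".join(jar.as_uri() for jar in jars)


# Hypothetical usage (assumes a recent PyFlink where TableConfig.set is available
# and that a local "jars" directory exists):
# table_env.get_config().set("pipeline.jars", build_pipeline_jars("jars"))
```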


In [3]:
table_env.execute_sql("SHOW CATALOGS").print()

+-----------------+
|    catalog name |
+-----------------+
| default_catalog |
+-----------------+
1 row in set


In [4]:
catalog_sql = """
CREATE CATALOG IF NOT EXISTS iceberg_catalog WITH (
    'type' = 'iceberg',
    'catalog-type'='rest',
    'uri' = 'http://localhost:8181',
    'warehouse' = 's3://warehouse/',
    'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
    's3.endpoint' = 'http://localhost:9000',
    's3.region' = 'us-east-1',
    's3.access-key-id' = 'admin',
    's3.secret-access-key' = 'password',
    's3.path-style-access' = 'true'
)
"""

table_env.execute_sql(catalog_sql).print()


OK
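
The catalog statement above hardcodes endpoints and credentials that also appear in the environment variables set earlier. One way to keep them in a single place is to render the DDL from a property mapping. This is a sketch only: `build_catalog_sql` is a hypothetical helper, and the `ICEBERG_REST_URI` and `S3_ENDPOINT` variable names are assumptions that do not appear in the original notebook.

```python
import os


def build_catalog_sql(name: str, properties: dict) -> str:
    """Render a Flink CREATE CATALOG statement from a property mapping."""

    def quote(value) -> str:
        # Escape single quotes the way SQL string literals expect.
        return str(value).replace("'", "''")

    props = ",\n".join(
        f"    '{quote(key)}' = '{quote(value)}'" for key, value in properties.items()
    )
    return f"CREATE CATALOG IF NOT EXISTS {name} WITH (\n{props}\n)"


# Defaults mirror the values used in this notebook; the first two environment
# variable names are hypothetical.
catalog_properties = {
    "type": "iceberg",
    "catalog-type": "rest",
    "uri": os.environ.get("ICEBERG_REST_URI", "http://localhost:8181"),
    "warehouse": "s3://warehouse/",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "s3.endpoint": os.environ.get("S3_ENDPOINT", "http://localhost:9000"),
    "s3.region": os.environ.get("AWS_REGION", "us-east-1"),
    "s3.access-key-id": os.environ.get("AWS_ACCESS_KEY_ID", "admin"),
    "s3.secret-access-key": os.environ.get("AWS_SECRET_ACCESS_KEY", "password"),
    "s3.path-style-access": "true",
}

catalog_sql = build_catalog_sql("iceberg_catalog", catalog_properties)
# table_env.execute_sql(catalog_sql).print()
```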


In [5]:
table_env.execute_sql("SHOW CATALOGS").print()

+-----------------+
|    catalog name |
+-----------------+
| default_catalog |
| iceberg_catalog |
+-----------------+
2 rows in set


In [6]:
# Switch to the Iceberg catalog
table_env.use_catalog("iceberg_catalog")
print("✅ Using iceberg_catalog")

# List available databases/namespaces
print("\nAvailable namespaces:")
table_env.execute_sql("SHOW DATABASES").print()


✅ Using iceberg_catalog

Available namespaces:
+---------------+
| database name |
+---------------+
|  play_iceberg |
+---------------+
1 row in set


In [7]:
# Switch to play_iceberg namespace and query users table
table_env.use_database("play_iceberg")
print("✅ Using play_iceberg namespace")

# List tables in current namespace
print("\nAvailable tables:")
table_env.execute_sql("SHOW TABLES").print()

# Query users table
print("\nQuerying users table:")
table_env.execute_sql("SELECT * FROM users").print()


✅ Using play_iceberg namespace

Available tables:
+------------+
| table name |
+------------+
|      users |
+------------+
1 row in set

Querying users table:
2025-07-02 16:27:41,551 INFO  org.apache.hadoop.io.compress.CodecPool                      [] - Got brand-new decompressor [.zstd]
+----+----------------------+--------------------------------+--------------------------------+-----------+--------------+---------------+-------------+----------------------------+
| op |              user_id |                       username |                          email | is_active | created_year | created_month | created_day |                 updated_at |
+----+----------------------+--------------------------------+--------------------------------+-----------+--------------+---------------+-------------+----------------------------+
| +I |                    1 |                       john_doe |           john.doe@example.com |      TRUE |         2025 |             7 |           2 | 2025-07-0