![iceberg-logo](https://www.apache.org/logos/res/iceberg/iceberg.png)

### [Docker, Spark, and Iceberg: The Fastest Way to Try Iceberg!](https://tabular.io/blog/docker-spark-and-iceberg/)

In [None]:
from pyiceberg import __version__

__version__

# Write support

This notebook demonstrates writing to Iceberg tables using PyIceberg. First, connect to the [catalog](https://iceberg.apache.org/concepts/catalog/#iceberg-catalogs), which tracks the tables and points to each table's current metadata.

In [None]:
from pyiceberg.catalog import load_catalog

catalog = load_catalog('default')

# Create an Iceberg table

Next, create the Iceberg table from an explicit Iceberg `Schema`. Every column is declared as required, and `id` is registered as the identifier field.

In [None]:
table_name = "default.commits"

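from pyiceberg.exceptions import NoSuchTableError

try:
    # Drop the table if a previous run left it behind
    catalog.drop_table(table_name)
except NoSuchTableError:
    # Nothing to clean up on a first run
    pass

from pyiceberg.schema import Schema
from pyiceberg.types import LongType, NestedField, StringType

schema = Schema(
    NestedField(1, "id", LongType(), required=True),
    NestedField(2, "name", StringType(), required=True),
    NestedField(3, "state", StringType(), required=True),
    NestedField(4, "additions", LongType(), required=True),
    NestedField(5, "deletes", LongType(), required=True),
    # Field 1 ("id") identifies a row
    identifier_field_ids=[1],
)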

table = catalog.create_table(table_name, schema=schema)

table

# Loading data using Arrow

Create an example PyArrow table that mimics data from the GitHub API.

In [None]:
import pyarrow as pa

from pyiceberg.io.pyarrow import schema_to_pyarrow

pa_schema = schema_to_pyarrow(schema)

df = pa.Table.from_pylist(
    [
        {"id": 123, "name": "Fix bug", "state": "Open", "additions": 22, "deletes": 10},
        {"id": 234, "name": "Add VariantType", "state": "Open", "additions": 29123, "deletes": 302},
        {"id": 345, "name": "Add commit retries", "state": "Open", "additions": 22, "deletes": 10},
    ],
    schema=pa_schema
)

df
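
The schema above declares every column as required and marks `id` as the identifier field. The Iceberg spec leaves uniqueness of identifier fields to the writer rather than enforcing it, so it can be worth checking the rows before they are written. Below is a minimal plain-Python sketch of such a check; `check_rows` and `REQUIRED_COLUMNS` are hypothetical helpers for illustration, not part of PyIceberg or PyArrow.

```python
from collections import Counter

# Column names copied from the schema above; all five are declared required.
REQUIRED_COLUMNS = ("id", "name", "state", "additions", "deletes")


def check_rows(rows, key="id"):
    """Return a list of problems found in `rows` (hypothetical helper)."""
    problems = []
    # Required columns must be present and non-null in every row.
    for index, row in enumerate(rows):
        for column in REQUIRED_COLUMNS:
            if row.get(column) is None:
                problems.append(f"row {index}: required column '{column}' is missing or null")
    # The identifier column should not repeat within one batch.
    counts = Counter(row.get(key) for row in rows)
    for value, count in counts.items():
        if value is not None and count > 1:
            problems.append(f"{key}={value} appears {count} times")
    return problems


rows = [
    {"id": 123, "name": "Fix bug", "state": "Open", "additions": 22, "deletes": 10},
    {"id": 234, "name": "Add VariantType", "state": "Open", "additions": 29123, "deletes": 302},
    {"id": 345, "name": "Add commit retries", "state": "Open", "additions": 22, "deletes": 10},
]

problems = check_rows(rows)
print(problems or "rows look consistent with the schema")
```

The check only looks at a single batch; detecting a clash with rows already in the table would need a scan of the existing data.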

# Write the data

Let's append the data to the table:

In [None]:
table.append(df)

assert len(table.scan().to_arrow()) == len(df)

table.scan().to_pandas()

In [None]:
table.inspect.snapshots().to_pandas()

# Add more data

In [None]:
table.append(pa.Table.from_pylist(
    [
        {"id": 456, "name": "Add NanosecondTimestamps", "state": "Merged", "additions": 2392, "deletes": 8},
        {"id": 567, "name": "Add documentation around filters", "state": "Open", "additions": 7543, "deletes": 3},
    ],
    schema=pa_schema
))

table.scan().to_pandas()

In [None]:
table.inspect.snapshots().to_pandas()
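
Each `append` commits a new snapshot whose parent is the previous one, which is why the snapshot listing grows by one row per write. The following is a toy plain-Python model of that bookkeeping, not PyIceberg's implementation: `Snapshot` and `SnapshotLog` are hypothetical names, and the snapshot ids are simple counters rather than real Iceberg snapshot ids.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]  # None for the first snapshot
    operation: str
    added_records: int


@dataclass
class SnapshotLog:
    """Toy stand-in for a table's snapshot history (hypothetical)."""

    snapshots: List[Snapshot] = field(default_factory=list)

    def append(self, rows):
        # A new snapshot always points back at the current one.
        parent = self.snapshots[-1].snapshot_id if self.snapshots else None
        snapshot = Snapshot(len(self.snapshots) + 1, parent, "append", len(rows))
        self.snapshots.append(snapshot)
        return snapshot

    def total_records(self):
        # Valid only because this toy log supports appends alone.
        return sum(s.added_records for s in self.snapshots)


log = SnapshotLog()
log.append([{"id": 123}, {"id": 234}, {"id": 345}])  # first write: three rows
log.append([{"id": 456}, {"id": 567}])               # second write: two rows

for snapshot in log.snapshots:
    print(snapshot)
```

Earlier snapshots are never modified in this model; a write only adds a new entry that references its parent.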

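# Upsert new data

`upsert` matches incoming rows to existing rows on the table's identifier field (`id` here). Rows whose values are unchanged are skipped, rows that differ are updated, and rows with new keys are inserted.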

In [None]:
table.upsert(pa.Table.from_pylist(
    [
        # Nothing changes: No-op
        {"id": 456, "name": "Add NanosecondTimestamps", "state": "Merged", "additions": 2392, "deletes": 8},

        # Updated: state, additions, and deletes change
        {"id": 567, "name": "Add documentation around filters", "state": "Merged", "additions": 9238, "deletes": 22},
    ],
    schema=pa_schema
))

table.scan().to_pandas()
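
To make the comments in the upsert cell concrete, here is a plain-Python model of upsert semantics keyed on `id`. It is a sketch of the behaviour described in this notebook (unchanged rows are skipped, changed rows are replaced, unknown keys are inserted), not PyIceberg's implementation; `upsert_rows` is a hypothetical helper.

```python
def upsert_rows(existing, incoming, key="id"):
    """Merge `incoming` into `existing` by `key` (hypothetical helper).

    Returns (merged_rows, rows_updated, rows_inserted).
    """
    merged = {row[key]: row for row in existing}
    updated = inserted = 0
    for row in incoming:
        current = merged.get(row[key])
        if current is None:
            inserted += 1   # key not seen before: insert
        elif current == row:
            continue        # identical row: no-op
        else:
            updated += 1    # same key, different values: update
        merged[row[key]] = row
    return list(merged.values()), updated, inserted


existing = [
    {"id": 123, "name": "Fix bug", "state": "Open", "additions": 22, "deletes": 10},
    {"id": 234, "name": "Add VariantType", "state": "Open", "additions": 29123, "deletes": 302},
    {"id": 345, "name": "Add commit retries", "state": "Open", "additions": 22, "deletes": 10},
    {"id": 456, "name": "Add NanosecondTimestamps", "state": "Merged", "additions": 2392, "deletes": 8},
    {"id": 567, "name": "Add documentation around filters", "state": "Open", "additions": 7543, "deletes": 3},
]
incoming = [
    {"id": 456, "name": "Add NanosecondTimestamps", "state": "Merged", "additions": 2392, "deletes": 8},
    {"id": 567, "name": "Add documentation around filters", "state": "Merged", "additions": 9238, "deletes": 22},
]

merged, updated, inserted = upsert_rows(existing, incoming)
print(f"updated={updated} inserted={inserted} total={len(merged)}")
```

The model keeps the row count at five because both incoming keys already exist; only the row with `id` 567 changes.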