
Python CLI

PyIceberg comes with a CLI that is available after installing the pyiceberg package.

You can pass the catalog connection details using the --uri and --credential arguments, but it is recommended to set up a ~/.pyiceberg.yaml config as described in the Catalog section.
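As a minimal sketch, assuming a REST catalog that only needs a uri and a credential, the config file can be generated from Python. The helper names, the catalog name and the values below are hypothetical placeholders; the Catalog section remains the authority on which keys each catalog type supports.

```python
import json
from pathlib import Path
from typing import Dict


def render_pyiceberg_yaml(catalogs: Dict[str, Dict[str, str]]) -> str:
    """Render a minimal .pyiceberg.yaml: one block per catalog name (hypothetical helper)."""
    lines = ["catalog:"]
    for name, props in catalogs.items():
        lines.append(f"  {name}:")
        for key, value in props.items():
            # JSON-encoded strings are valid YAML double-quoted scalars,
            # so values containing ':' or '#' stay intact.
            lines.append(f"    {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


def write_config(path: Path, catalogs: Dict[str, Dict[str, str]]) -> None:
    """Write the rendered config to `path`, overwriting any existing file."""
    path.write_text(render_pyiceberg_yaml(catalogs))


# Placeholder values; uncomment to overwrite your real config:
# write_config(Path.home() / ".pyiceberg.yaml",
#              {"default": {"uri": "http://localhost:8181", "credential": "id:secret"}})
print(render_pyiceberg_yaml({"default": {"uri": "http://localhost:8181"}}))
```

Quoting every value keeps the sketch independent of a YAML library; if PyYAML is already available, yaml.safe_dump on a {"catalog": {...}} dict would be the more idiomatic choice.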

  pyiceberg --help
Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...

Options:
  --catalog TEXT
  --verbose BOOLEAN
  --output [text|json]
  --ugi TEXT
  --uri TEXT
  --credential TEXT
  --help                Show this message and exit.

Commands:
  create      Operation to create a namespace.
  describe    Describe a namespace or a table.
  drop        Operations to drop a namespace or table.
  files       List all the files of the table.
  list        List tables or namespaces.
  list-refs   List all the refs in the provided table.
  location    Return the location of the table.
  properties  Properties on tables/namespaces.
  rename      Rename a table.
  schema      Get the schema of the table.
  spec        Return the partition spec of the table.
  uuid        Return the UUID of the table.
  version     Print pyiceberg version.
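When calling the CLI from scripts, the global options listed above go before the command name (as in pyiceberg --output json describe nyc.taxis further down). The sketch below builds such an argument list for subprocess; pyiceberg_argv is a hypothetical helper, not part of the pyiceberg package, and it only uses the flags shown in the help output.

```python
import shlex
from typing import List, Optional, Sequence


def pyiceberg_argv(
    command: Sequence[str],
    catalog: Optional[str] = None,
    output: Optional[str] = None,
    uri: Optional[str] = None,
    credential: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the pyiceberg CLI (hypothetical helper).

    Global options are placed before the command and its arguments.
    """
    if output not in (None, "text", "json"):
        raise ValueError(f"unsupported --output value: {output!r}")
    argv = ["pyiceberg"]
    # Only emit the options that were actually provided.
    for flag, value in (
        ("--catalog", catalog),
        ("--output", output),
        ("--uri", uri),
        ("--credential", credential),
    ):
        if value is not None:
            argv += [flag, value]
    argv += list(command)
    return argv


# Prints: pyiceberg --catalog rest list nyc
print(shlex.join(pyiceberg_argv(["list", "nyc"], catalog="rest")))

# Running it requires pyiceberg on PATH and a reachable catalog, e.g.:
# subprocess.run(pyiceberg_argv(["list"]), check=True)
```

Passing a list to subprocess avoids shell quoting issues. Note that a --credential given on the command line can end up in shell history or process listings, which is one more reason to prefer the config file.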

This example assumes that you have a default catalog set. If you want to load another catalog, for example the REST catalog from the example above, you need to pass --catalog rest.

  pyiceberg list
default
nyc
  pyiceberg list nyc
nyc.taxis
  pyiceberg describe nyc.taxis
Table format version  1
Metadata location     file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json
Table UUID            6cdfda33-bfa3-48a7-a09e-7abb462e3460
Last Updated          1661783158061
Partition spec        []
Sort order            []
Current schema        Schema, id=0
├── 1: VendorID: optional long
├── 2: tpep_pickup_datetime: optional timestamptz
├── 3: tpep_dropoff_datetime: optional timestamptz
├── 4: passenger_count: optional double
├── 5: trip_distance: optional double
├── 6: RatecodeID: optional double
├── 7: store_and_fwd_flag: optional string
├── 8: PULocationID: optional long
├── 9: DOLocationID: optional long
├── 10: payment_type: optional long
├── 11: fare_amount: optional double
├── 12: extra: optional double
├── 13: mta_tax: optional double
├── 14: tip_amount: optional double
├── 15: tolls_amount: optional double
├── 16: improvement_surcharge: optional double
├── 17: total_amount: optional double
├── 18: congestion_surcharge: optional double
└── 19: airport_fee: optional double
Current snapshot      Operation.APPEND: id=5937117119577207079, schema_id=0
Snapshots             Snapshots
└── Snapshot 5937117119577207079, schema 0: file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro
Properties            owner                 root
                      write.format.default  parquet
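The Last Updated value is a Unix timestamp in milliseconds (the JSON output below names the same field last-updated-ms). A small sketch for turning it into a readable UTC time, using only the standard library; from_epoch_millis is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Convert an epoch-milliseconds value to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids the float rounding of ms / 1000.
    return EPOCH + timedelta(milliseconds=ms)


print(from_epoch_millis(1661783158061).isoformat())
# → 2022-08-29T14:25:58.061000+00:00
```

The same conversion applies to the timestamp-ms values of snapshots and snapshot-log entries.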

The same information can also be output as JSON for automation:

  pyiceberg --output json describe nyc.taxis | jq
{
  "identifier": [
    "nyc",
    "taxis"
  ],
  "metadata_location": "file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json",
  "metadata": {
    "location": "file:/.../nyc.db/taxis",
    "table-uuid": "6cdfda33-bfa3-48a7-a09e-7abb462e3460",
    "last-updated-ms": 1661783158061,
    "last-column-id": 19,
    "schemas": [
      {
        "type": "struct",
        "fields": [
          {
            "id": 1,
            "name": "VendorID",
            "type": "long",
            "required": false
          },
...
          {
            "id": 19,
            "name": "airport_fee",
            "type": "double",
            "required": false
          }
        ],
        "schema-id": 0,
        "identifier-field-ids": []
      }
    ],
    "current-schema-id": 0,
    "partition-specs": [
      {
        "spec-id": 0,
        "fields": []
      }
    ],
    "default-spec-id": 0,
    "last-partition-id": 999,
    "properties": {
      "owner": "root",
      "write.format.default": "parquet"
    },
    "current-snapshot-id": 5937117119577207000,
    "snapshots": [
      {
        "snapshot-id": 5937117119577207000,
        "timestamp-ms": 1661783158061,
        "manifest-list": "file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro",
        "summary": {
          "operation": "append",
          "spark.app.id": "local-1661783139151",
          "added-data-files": "1",
          "added-records": "2979431",
          "added-files-size": "46600777",
          "changed-partition-count": "1",
          "total-records": "2979431",
          "total-files-size": "46600777",
          "total-data-files": "1",
          "total-delete-files": "0",
          "total-position-deletes": "0",
          "total-equality-deletes": "0"
        },
        "schema-id": 0
      }
    ],
    "snapshot-log": [
      {
        "snapshot-id": "5937117119577207079",
        "timestamp-ms": 1661783158061
      }
    ],
    "metadata-log": [],
    "sort-orders": [
      {
        "order-id": 0,
        "fields": []
      }
    ],
    "default-sort-order-id": 0,
    "refs": {
      "main": {
        "snapshot-id": 5937117119577207000,
        "type": "branch"
      }
    },
    "format-version": 1,
    "schema": {
      "type": "struct",
      "fields": [
        {
          "id": 1,
          "name": "VendorID",
          "type": "long",
          "required": false
        },
...
        {
          "id": 19,
          "name": "airport_fee",
          "type": "double",
          "required": false
        }
      ],
      "schema-id": 0,
      "identifier-field-ids": []
    },
    "partition-spec": []
  }
}
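A minimal sketch of consuming this JSON from Python instead of jq. SAMPLE is a hand-trimmed stand-in that keeps only the keys used below, with the snapshot ID taken from the text output above; summarize is a hypothetical helper. In the jq output the numeric snapshot IDs appear rounded (compare current-snapshot-id with the string in snapshot-log), which is consistent with jq storing numbers as 64-bit floats; Python's json module parses integers exactly.

```python
import json
from typing import Any, Dict, List, Tuple

# Trimmed, hand-written sample in the shape of
# `pyiceberg --output json describe nyc.taxis`.
SAMPLE = """
{
  "identifier": ["nyc", "taxis"],
  "metadata": {
    "current-snapshot-id": 5937117119577207079,
    "current-schema-id": 0,
    "schemas": [
      {"schema-id": 0, "type": "struct", "fields": [
        {"id": 1, "name": "VendorID", "type": "long", "required": false},
        {"id": 19, "name": "airport_fee", "type": "double", "required": false}
      ]}
    ],
    "properties": {"owner": "root", "write.format.default": "parquet"}
  }
}
"""


def summarize(doc: Dict[str, Any]) -> Tuple[str, int, List[Tuple[int, str, str]]]:
    """Return the table name, current snapshot ID and (id, name, type) of each field."""
    meta = doc["metadata"]
    name = ".".join(doc["identifier"])
    # Select the schema whose schema-id matches current-schema-id.
    schema = next(s for s in meta["schemas"] if s["schema-id"] == meta["current-schema-id"])
    fields = [(f["id"], f["name"], f["type"]) for f in schema["fields"]]
    return name, meta["current-snapshot-id"], fields


print(summarize(json.loads(SAMPLE)))

# Against a live catalog (requires pyiceberg installed), something like:
# out = subprocess.run(["pyiceberg", "--output", "json", "describe", "nyc.taxis"],
#                      capture_output=True, text=True, check=True).stdout
# summarize(json.loads(out))
```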

You can also add, update or remove properties on tables or namespaces:

  pyiceberg properties set table nyc.taxis write.metadata.delete-after-commit.enabled true
Set write.metadata.delete-after-commit.enabled=true on nyc.taxis

  pyiceberg properties get table nyc.taxis
write.metadata.delete-after-commit.enabled  true

  pyiceberg properties remove table nyc.taxis write.metadata.delete-after-commit.enabled
Property write.metadata.delete-after-commit.enabled removed from nyc.taxis

  pyiceberg properties get table nyc.taxis write.metadata.delete-after-commit.enabled
Could not find property write.metadata.delete-after-commit.enabled on nyc.taxis
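For scripts that read the text output of pyiceberg properties get table, a sketch of parsing the key/value lines into a dict. It assumes one property per line with the key first, as in the sample output above, and that keys contain no spaces; parse_properties is a hypothetical helper. A message such as the "Could not find property" line above is not a key/value pair, so a script should detect it separately rather than pass it through this parser.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse `properties get` text output into a {key: value} dict (hypothetical helper)."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The key ends at the first space; the rest (minus padding) is the value.
        key, _, value = line.partition(" ")
        props[key] = value.strip()
    return props


print(parse_properties("write.metadata.delete-after-commit.enabled  true\n"))
# → {'write.metadata.delete-after-commit.enabled': 'true'}
```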