API overview

batchcorder exposes two public classes. Both implement the Arrow PyCapsule Interface (__arrow_c_stream__ and __arrow_c_schema__), so they are interoperable with PyArrow, DuckDB, Polars, DataFusion, and any other Arrow-compatible library without explicit conversion.

StreamCache

The primary entry point. Wraps an Arrow stream source and manages the hybrid memory + disk cache.

Method / property        Description
__init__(...)            Create a StreamCache from any __arrow_c_stream__ source. Accepts optional max_readers to enable bounded-memory eviction.
reader(from_start=True)  Return a new independent reader handle.
ingest_all()             Eagerly consume the upstream source into the cache.
schema                   Arrow schema of the stream.
ingested_count           Number of batches pulled from upstream so far.
upstream_exhausted       True once the upstream source is fully consumed.
__iter__()               Create a reader starting at batch 0 and return it.
__arrow_c_stream__()     Create a reader starting at batch 0 (for PyCapsule consumers).
__arrow_c_schema__()     Expose the schema for PyCapsule consumers.

Full API documentation: batchcorder.StreamCache.
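
To make the replay behaviour concrete, here is a minimal pure-Python sketch of the idea behind the table above: a single-pass source is pulled lazily into a cache, and each reader walks the cached batches from batch 0 at its own pace. ToyStreamCache is a hypothetical stand-in written for illustration only. It is not batchcorder's implementation, it keeps everything in memory (no disk tier, no max_readers eviction), and it assumes batches are pulled from upstream only when a reader needs them.

```python
from typing import Iterable, Iterator, List


class ToyStreamCache:
    """Toy in-memory stand-in for a replayable stream cache (illustration only)."""

    def __init__(self, source: Iterable) -> None:
        self._upstream: Iterator = iter(source)  # single-pass upstream
        self._batches: List = []                 # every batch pulled so far
        self.upstream_exhausted = False

    @property
    def ingested_count(self) -> int:
        return len(self._batches)

    def _pull_one(self) -> bool:
        """Pull one batch from upstream into the cache; False once upstream is done."""
        if self.upstream_exhausted:
            return False
        try:
            self._batches.append(next(self._upstream))
            return True
        except StopIteration:
            self.upstream_exhausted = True
            return False

    def ingest_all(self) -> None:
        """Eagerly drain the upstream source into the cache."""
        while self._pull_one():
            pass

    def reader(self) -> Iterator:
        """Return an independent reader that starts at batch 0."""
        position = 0
        while True:
            # Only touch upstream when this reader has caught up with the cache.
            if position >= len(self._batches) and not self._pull_one():
                return
            yield self._batches[position]
            position += 1

    def __iter__(self) -> Iterator:
        return self.reader()


cache = ToyStreamCache(iter(["batch-0", "batch-1", "batch-2"]))
first = cache.reader()
second = cache.reader()

head = next(first)                      # pulls one batch from upstream
count_after_one = cache.ingested_count  # one batch cached so far
replayed = list(second)                 # replays batch 0, then drains upstream
rest = list(first)                      # served entirely from the cache
```

In this toy model the second reader still sees batch 0 even though the first reader already consumed it from upstream, which is the property that lets one stream feed several consumers.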

StreamCacheReader

A single-use iterator handle. Maintains its own position in the batch sequence. Obtain one via StreamCache.reader(); do not construct one directly.

Method / property     Description
__iter__()            Return self (readers are their own iterators).
__next__()            Return the next pyarrow.RecordBatch.
schema                Arrow schema of batches produced by this reader.
closed                True if the reader has been consumed.
__arrow_c_stream__()  Consume the reader (for PyCapsule consumers).
__arrow_c_schema__()  Expose the schema for PyCapsule consumers.

Full API documentation: batchcorder.StreamCacheReader.
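
The single-use contract described above can be sketched with a plain Python iterator. ToyReader is a hypothetical class written for illustration, not the library's reader. It iterates over a fixed list rather than a cache, and it assumes the closed flag flips once the last batch has been returned.

```python
class ToyReader:
    """Toy single-use reader over a list of batches (illustration only)."""

    def __init__(self, batches):
        self._batches = batches
        self._position = 0   # this reader's own position in the sequence
        self.closed = False

    def __iter__(self):
        return self  # a reader is its own iterator

    def __next__(self):
        if self._position >= len(self._batches):
            self.closed = True  # assumption: marked closed once exhausted
            raise StopIteration
        batch = self._batches[self._position]
        self._position += 1
        return batch


reader = ToyReader(["batch-0", "batch-1"])
same_object = iter(reader) is reader  # __iter__ hands back the reader itself
first_pass = list(reader)             # consumes every batch
second_pass = list(reader)            # nothing left: the reader does not rewind
```

Because a consumed reader does not rewind, a second pass over the data needs a fresh handle from StreamCache.reader() rather than a second loop over the same reader.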

Compatibility

Both classes accept and produce data through the Arrow PyCapsule Interface, so they can be passed directly to Arrow-aware libraries, for example:

import pyarrow as pa
import duckdb

ds = ...   # a StreamCache

# PyArrow
table = pa.table(ds)
reader = pa.RecordBatchReader.from_stream(ds)

# DuckDB (register as a named relation)
duckdb.table("ds")

# Any library that supports __arrow_c_stream__