API overview

batchcorder exposes two public classes. Both implement the Arrow PyCapsule Interface (`__arrow_c_stream__` and `__arrow_c_schema__`), so they interoperate with PyArrow, DuckDB, Polars, DataFusion, and any other Arrow-compatible library without explicit conversion.

`StreamCache`

The primary entry point. It wraps an Arrow stream source and manages a hybrid in-memory and on-disk cache of its batches.

| Method / property | Description |
| --- | --- |
| `__init__(...)` | Create a `StreamCache` from any `__arrow_c_stream__` source. Accepts an optional `max_readers` argument to enable bounded-memory eviction. |
| `reader(from_start=True)` | Return a new independent reader handle. |
| `ingest_all()` | Eagerly consume the upstream source into the cache. |
| `schema` | Arrow schema of the stream. |
| `ingested_count` | Number of batches pulled from upstream so far. |
| `upstream_exhausted` | `True` once the upstream source is fully consumed. |
| `__iter__()` | Create a reader starting at batch 0 and return it. |
| `__arrow_c_stream__()` | Create a reader starting at batch 0 (for PyCapsule consumers). |
| `__arrow_c_schema__()` | Expose the schema for PyCapsule consumers. |

Full API documentation: `batchcorder.StreamCache`.
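To make the caching semantics in the table concrete, here is a hypothetical pure-Python model of a replayable cache over a one-shot source. `ToyStreamCache` and `ToyReader` are illustrative names, not part of batchcorder. The model keeps every batch in a Python list and ignores disk spilling, `max_readers` eviction, schemas, and the `from_start` argument. It illustrates the idea behind `ingested_count`, `upstream_exhausted`, and independent readers: in this model each batch is pulled from upstream at most once and then served from the cache to every other reader.

```python
from typing import Any, Iterable, List


class ToyStreamCache:
    """Hypothetical in-memory model of a replayable cache over a one-shot source.

    Not batchcorder's implementation: no disk tier, no eviction, no Arrow types.
    """

    def __init__(self, source: Iterable[Any]) -> None:
        self._upstream = iter(source)   # consumed at most once
        self._batches: List[Any] = []   # every batch pulled so far, in order
        self.upstream_exhausted = False

    @property
    def ingested_count(self) -> int:
        # Number of batches pulled from upstream so far.
        return len(self._batches)

    def _pull_one(self) -> bool:
        """Append one upstream batch to the cache; return False once upstream is done."""
        if self.upstream_exhausted:
            return False
        try:
            self._batches.append(next(self._upstream))
        except StopIteration:
            self.upstream_exhausted = True
            return False
        return True

    def ingest_all(self) -> None:
        # Eagerly drain the upstream source into the cache.
        while self._pull_one():
            pass

    def reader(self) -> "ToyReader":
        # Every call returns a fresh cursor positioned at batch 0.
        return ToyReader(self)

    def __iter__(self) -> "ToyReader":
        return self.reader()


class ToyReader:
    """Independent cursor: each reader tracks its own position in the shared cache."""

    def __init__(self, cache: ToyStreamCache) -> None:
        self._cache = cache
        self._pos = 0

    def __iter__(self) -> "ToyReader":
        return self

    def __next__(self) -> Any:
        # Serve from the cache when another reader already pulled this batch;
        # otherwise pull exactly one more batch from upstream.
        if self._pos >= self._cache.ingested_count and not self._cache._pull_one():
            raise StopIteration
        batch = self._cache._batches[self._pos]
        self._pos += 1
        return batch


if __name__ == "__main__":
    cache = ToyStreamCache(f"batch{i}" for i in range(3))  # a one-shot generator
    a, b = cache.reader(), cache.reader()
    print(next(a), cache.ingested_count)      # batch0 1  (pulled from upstream)
    print(next(b), cache.ingested_count)      # batch0 1  (served from the cache)
    print(list(a), cache.upstream_exhausted)  # ['batch1', 'batch2'] True
    print(list(b))                            # ['batch1', 'batch2']
```

The second reader never touches the generator, which is why a source that can only be iterated once can still feed several consumers.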

`StreamCacheReader`

A single-use iterator handle that maintains its own position in the batch sequence. Obtain one via `StreamCache.reader()`; do not construct it directly.

| Method / property | Description |
| --- | --- |
| `__iter__()` | Return `self` (readers are their own iterators). |
| `__next__()` | Return the next `pyarrow.RecordBatch`. |
| `schema` | Arrow schema of batches produced by this reader. |
| `closed` | `True` if the reader has been consumed. |
| `__arrow_c_stream__()` | Consume the reader (for PyCapsule consumers). |
| `__arrow_c_schema__()` | Expose the schema for PyCapsule consumers. |

Full API documentation: `batchcorder.StreamCacheReader`.
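Because a reader is its own iterator, it behaves like a Python generator. A second loop over the same handle picks up where the first one stopped, and once a handle is exhausted it stays exhausted. The hypothetical `ToySingleUseReader` below models only that behavior over a plain list. It treats exhaustion as the point where `closed` becomes `True`, and it does not model `__arrow_c_stream__`, which according to the table also consumes the reader.

```python
from typing import Any, Sequence


class ToySingleUseReader:
    """Hypothetical model of a single-use reader handle (not batchcorder code)."""

    def __init__(self, batches: Sequence[Any]) -> None:
        self._batches = batches
        self._pos = 0        # this handle's own position in the batch sequence
        self.closed = False  # becomes True once the handle is exhausted

    def __iter__(self) -> "ToySingleUseReader":
        # Readers are their own iterators: iterating again resumes, never restarts.
        return self

    def __next__(self) -> Any:
        if self.closed or self._pos >= len(self._batches):
            self.closed = True
            raise StopIteration
        batch = self._batches[self._pos]
        self._pos += 1
        return batch


if __name__ == "__main__":
    r = ToySingleUseReader(["b0", "b1", "b2"])
    print(iter(r) is r)        # True
    print(next(r))             # b0
    print(list(r))             # ['b1', 'b2']  (resumes after b0)
    print(list(r), r.closed)   # [] True
```

To read the data again, request a new handle from the `StreamCache` instead of reusing an exhausted reader.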

Compatibility

Because both classes expose their data through the Arrow PyCapsule Interface, they can be passed directly to any compatible consumer, for example:

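```python
import duckdb
import pyarrow as pa

ds = ...  # a StreamCache (construction elided)

# PyArrow: each call obtains a fresh reader starting at batch 0
table = pa.table(ds)
reader = pa.RecordBatchReader.from_stream(ds)

# DuckDB: the replacement scan resolves the Python variable `ds` by name
rel = duckdb.sql("SELECT * FROM ds")

# Any other library that accepts __arrow_c_stream__ objects works the same way
```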
