Beta: introduced in Geneva 0.11.0.

Standard UDFs produce exactly one output value per input row. Scalar UDTFs enable 1:N row expansion: each source row can produce zero, one, or many output rows. The results are stored as a materialized view that supports incremental refresh in the same way as other materialized views.
| Source Table | Derived Table | Expansion |
|---|---|---|
| 1 video row | N clip rows | Video segmentation |
| 1 document row | N chunk rows | Text chunking |
| 1 image row | N tile rows | Image tiling |

Defining a Scalar UDTF

Use the @scalar_udtf decorator on a function that yields output rows. Geneva infers the output schema from the return type annotation. Input parameters are bound to source columns by name: a parameter named video_path binds to the source column video_path, exactly as with standard UDFs.
A scalar UDTF can yield zero rows for a source row. The source row is still marked as processed and will not be retried on the next refresh.
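A minimal sketch of the kind of function the decorator wraps, assuming a videos source table with video_path and duration columns and fixed 10-second clips. The Clip NamedTuple, the segment_video name, and the clip length are hypothetical, and whether Geneva's schema inference accepts a NamedTuple as the yielded type is an assumption to check against the API reference. The decorator is left as a comment so the sketch runs without Geneva installed.

```python
from typing import Iterator, NamedTuple


class Clip(NamedTuple):
    # Hypothetical output row type; Geneva derives the real schema
    # from the function's return type annotation.
    clip_start: float
    clip_end: float


CLIP_SECONDS = 10.0  # assumed fixed clip length for this sketch


# @scalar_udtf  # Geneva decorator; omitted so this runs standalone
def segment_video(video_path: str, duration: float) -> Iterator[Clip]:
    """Yield one Clip per 10-second window of the source video.

    video_path and duration bind to source columns of the same name.
    video_path is unused here; a real UDTF would read the file.
    A non-positive duration yields zero rows, which is allowed.
    """
    start = 0.0
    while start < duration:
        end = min(start + CLIP_SECONDS, duration)
        yield Clip(clip_start=start, clip_end=end)
        start = end
```

For a 25-second video this yields three clips, the last one covering 20.0 to 25.0; a zero-length video yields none, and that source row still counts as processed.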

List return pattern

If you prefer to build the full list in memory rather than yield rows one at a time, you can return a list instead of an Iterator.
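The same hypothetical segmentation, sketched in the list-returning style. As before, Clip and the 10-second window are illustrative assumptions and the decorator is commented out.

```python
from typing import List, NamedTuple


class Clip(NamedTuple):
    # Hypothetical output row type, as in the generator version.
    clip_start: float
    clip_end: float


CLIP_SECONDS = 10.0  # assumed fixed clip length


# @scalar_udtf  # Geneva decorator; omitted so this runs standalone
def segment_video_list(video_path: str, duration: float) -> List[Clip]:
    """Same windows as the generator version, built fully in memory."""
    clips: List[Clip] = []
    start = 0.0
    while start < duration:
        end = min(start + CLIP_SECONDS, duration)
        clips.append(Clip(start, end))
        start = end
    return clips  # an empty list means zero child rows
```

A generator can hand rows off as they are produced, while the list form holds every child row of a source row at once; the list form is simplest when each source row expands into a modest number of rows.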

Batched scalar UDTF

For vectorized processing, use batch=True. The function receives Arrow arrays and returns a RecordBatch of expanded rows. Because a pa.RecordBatch return annotation says nothing about the columns it contains, Geneva cannot infer the output schema from it, so you must supply output_schema explicitly.

Creating a Scalar UDTF View

Scalar UDTF views are created with the create_scalar_udtf_view API. Its query parameter controls which source columns are inherited: columns listed in .select() are carried into every child row automatically.
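To make the inheritance rule concrete, here is a plain-Python model of what materializing such a view amounts to, using the two videos rows shown under Inherited Columns. It does not call create_scalar_udtf_view, and materialize_udtf_view and fixed_clips are hypothetical names. One further assumption: this model lets the UDTF read any source column whether or not it is selected; whether Geneva requires UDTF inputs to appear in .select() is not stated on this page.

```python
import inspect
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def fixed_clips(duration: float) -> Iterator[Row]:
    """Hypothetical UDTF: one child row per 10-second window."""
    start = 0.0
    while start < duration:
        end = min(start + 10.0, duration)
        yield {"clip_start": start, "clip_end": end}
        start = end


def materialize_udtf_view(
    source_rows: Iterable[Row],
    selected: List[str],
    udtf: Callable[..., Iterable[Row]],
) -> List[Row]:
    """Conceptual model of a scalar UDTF view, not Geneva's implementation.

    The UDTF's parameters are bound to source columns by name, and the
    `selected` parent columns are copied into every child row it yields.
    """
    params = list(inspect.signature(udtf).parameters)
    children: List[Row] = []
    for parent in source_rows:
        inherited = {col: parent[col] for col in selected}
        for generated in udtf(**{p: parent[p] for p in params}):
            children.append({**inherited, **generated})
    return children


videos = [
    {"video_path": "/v/a.mp4", "duration": 120.0, "metadata": {"fps": 30}},
    {"video_path": "/v/b.mp4", "duration": 60.0, "metadata": {"fps": 24}},
]
# Analogous to selecting video_path and metadata (but not duration) in the query.
clips = materialize_udtf_view(videos, ["video_path", "metadata"], fixed_clips)
```

Every child row carries video_path and metadata from its parent, while duration, which was not selected, does not appear in the child rows.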

Inherited Columns

Child rows automatically include their parent's columns, so no manual join is required. Which parent columns appear in the child table is determined by the query's .select():

videos table (source)

| video_path | duration | metadata |
|---|---|---|
| /v/a.mp4 | 120.0 | {fps: 30} |
| /v/b.mp4 | 60.0 | {fps: 24} |

clips table (derived, 1:N)

| video_path | metadata | clip_start | clip_end | clip_bytes |
|---|---|---|---|---|
| /v/a.mp4 | {fps: 30} | 0.0 | 10.0 | b"\x00\x1a…" |
| /v/a.mp4 | {fps: 30} | 10.0 | 20.0 | b"\x00\x2b…" |
| /v/a.mp4 | {fps: 30} | 20.0 | 30.0 | b"\x00\x3c…" |
| /v/b.mp4 | {fps: 24} | 0.0 | 10.0 | b"\x00\x4d…" |
| /v/b.mp4 | {fps: 24} | 10.0 | 20.0 | b"\x00\x5e…" |
The first three rows come from the /v/a.mp4 source row, the last two from /v/b.mp4. Inherited columns (video_path, metadata) are carried over automatically; clip_start, clip_end, and clip_bytes are generated by the UDTF.

Adding Computed Columns After Creation

Because scalar UDTF views are materialized views, you can add UDF-computed columns to the child table and backfill them. This enables a powerful pattern: expand source rows with a scalar UDTF, then enrich the expanded rows with standard UDFs.
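A plain-Python sketch of the enrichment step: a standard 1:1 function computed once per child row. clip_length and the backfill helper are hypothetical; backfill only models the effect of backfilling a new column and is not a Geneva call. It binds arguments by parameter name, mirroring how UDF parameters bind to columns.

```python
import inspect
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


# A standard (1:1) UDF would be declared here; decorator omitted so this runs standalone.
def clip_length(clip_start: float, clip_end: float) -> float:
    """One output value per child row."""
    return clip_end - clip_start


def backfill(rows: List[Row], column: str, fn: Callable[..., Any]) -> None:
    """Hypothetical helper: compute `column` for every existing row in place."""
    params = list(inspect.signature(fn).parameters)
    for row in rows:
        row[column] = fn(**{p: row[p] for p in params})


clips = [
    {"video_path": "/v/b.mp4", "clip_start": 0.0, "clip_end": 10.0},
    {"video_path": "/v/b.mp4", "clip_start": 10.0, "clip_end": 20.0},
    {"video_path": "/v/b.mp4", "clip_start": 20.0, "clip_end": 25.0},
]
backfill(clips, "clip_length", clip_length)
```

The expansion happens once in the UDTF view, and the per-row enrichment stays an ordinary UDF over the derived table.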

Incremental Refresh

Scalar UDTFs support incremental refresh, just like standard materialized views:
  • New source rows: the UDTF runs on them and their child rows are inserted.
  • Deleted source rows: child rows linked to the deleted parent are cascade-deleted.
  • Updated source rows: the old child rows are deleted, the UDTF re-runs on the updated row, and the new child rows are inserted.
When a refresh follows the addition of new source rows, only those new rows are processed; clips produced by earlier refreshes are left untouched.
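The refresh rules above can be modeled in a few lines. This is a conceptual sketch, not Geneva's implementation: source rows are identified by a hypothetical key column, a row counts as updated when its contents differ from the version last processed, and UdtfViewModel and clips_for are illustrative names.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class UdtfViewModel:
    """Conceptual model of scalar UDTF incremental refresh."""

    def __init__(self, udtf: Callable[[Row], Iterable[Row]]) -> None:
        self.udtf = udtf
        self.processed: Dict[Any, Row] = {}       # key -> parent row as last processed
        self.children: Dict[Any, List[Row]] = {}  # key -> child rows it produced
        self.udtf_calls = 0

    def refresh(self, source: List[Row]) -> None:
        current = {row["key"]: row for row in source}
        # Deleted parents: cascade-delete their child rows.
        for key in list(self.processed):
            if key not in current:
                del self.processed[key]
                del self.children[key]
        for key, row in current.items():
            if self.processed.get(key) == row:
                continue  # unchanged: existing child rows are untouched
            # New or updated parent: (re)run the UDTF and replace its children.
            self.udtf_calls += 1
            self.children[key] = list(self.udtf(row))
            self.processed[key] = dict(row)  # zero children still counts as processed

    def rows(self) -> List[Row]:
        return [child for kids in self.children.values() for child in kids]


def clips_for(row: Row) -> List[Row]:
    """Hypothetical UDTF: one child per full 10 seconds of duration."""
    return [{"key": row["key"], "clip": i} for i in range(int(row["duration"] // 10))]


view = UdtfViewModel(clips_for)
a30, b20, c5 = ({"key": "a", "duration": 30.0}, {"key": "b", "duration": 20.0},
                {"key": "c", "duration": 5.0})
view.refresh([a30, b20])       # 2 UDTF calls, 5 child rows
view.refresh([a30, b20, c5])   # only c runs; it yields zero child rows
view.refresh([a30, b20, c5])   # nothing to do; c is not retried
view.refresh([{"key": "a", "duration": 10.0}, c5])  # b cascade-deleted, a re-run
```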

Chaining UDTF Views

Scalar UDTF views are standard materialized views, so they can serve as the source for further views.
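A conceptual sketch of chaining two 1:N expansions in plain Python: videos expand into clips, and each clip expands further into one sample timestamp per second. The clips and keyframes functions and the expand helper are hypothetical; in Geneva the second view would take the first view as its source, but that call is not shown here.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def clips(video: Row) -> Iterator[Row]:
    """First-level UDTF (hypothetical): 10-second clips inheriting video_path."""
    start = 0.0
    while start < video["duration"]:
        end = min(start + 10.0, video["duration"])
        yield {"video_path": video["video_path"], "clip_start": start, "clip_end": end}
        start = end


def keyframes(clip: Row) -> Iterator[Row]:
    """Second-level UDTF (hypothetical): one timestamp per second of a clip."""
    t = clip["clip_start"]
    while t < clip["clip_end"]:
        yield {**clip, "frame_ts": t}  # every clip column is inherited
        t += 1.0


def expand(rows: List[Row], udtf: Callable[[Row], Iterable[Row]]) -> List[Row]:
    """Apply a 1:N function to every row and flatten the results."""
    return [child for row in rows for child in udtf(row)]


videos = [{"video_path": "/v/c.mp4", "duration": 12.5}]
clip_rows = expand(videos, clips)          # first view: videos -> clips
frame_rows = expand(clip_rows, keyframes)  # chained view: clips -> frames
```

Columns inherited at the first level (here video_path) flow through to the second level without any join.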

Full Example: Document Chunking
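A minimal sketch of a document-chunking UDTF, assuming a source table with a text column. The window size, overlap, TextChunk type, and chunk_document name are hypothetical, and the decorator is commented out so the sketch runs standalone. The settings are module constants rather than parameters because parameters bind to source columns by name.

```python
from typing import Iterator, NamedTuple


class TextChunk(NamedTuple):
    # Hypothetical output row type for the derived chunks table.
    chunk_index: int
    chunk_text: str


CHUNK_WORDS = 200   # assumed window size, in words
OVERLAP_WORDS = 50  # assumed overlap between consecutive chunks


# @scalar_udtf  # Geneva decorator; omitted so this runs standalone
def chunk_document(text: str) -> Iterator[TextChunk]:
    """Split one document into overlapping word windows (1 row -> N rows)."""
    words = text.split()
    step = CHUNK_WORDS - OVERLAP_WORDS
    for index, start in enumerate(range(0, len(words), step)):
        yield TextChunk(index, " ".join(words[start:start + CHUNK_WORDS]))
        if start + CHUNK_WORDS >= len(words):
            break  # this window already reached the end of the document
```

A 400-word document yields three chunks starting at words 0, 150, and 300, and an empty document yields zero rows. The resulting view could then be enriched with standard UDFs (for example, embeddings) and kept current with incremental refresh, as described above.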

For a comparison of all three function types (UDFs, Scalar UDTFs, Batch UDTFs), see Understanding Transforms.