- PyArrow schemas for explicit schema control
- `LanceModel` for Pydantic-based validation
Create a table with data
Initialize a LanceDB connection and create a table. Depending on the SDK, LanceDB can ingest arrays of records, Arrow tables or record batches, and Arrow batch iterators or readers. Let's take a look at some of the common patterns.
From list of objects

You can provide a list of objects to create a table. The Python and TypeScript SDKs support lists/arrays of dictionaries, while the Rust SDK supports lists of structs.
From a custom schema

You can define a custom Arrow schema for the table. This is useful when you want more control over the column types and metadata.
From an Arrow Table

You can also create LanceDB tables directly from Arrow tables. Rust uses an Arrow RecordBatchReader for the same Arrow-native ingest flow.
From a Pandas DataFrame
Python Only

Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or provide a PyArrow Table directly.
The vector column needs to be a Vector (defined as `pyarrow.FixedSizeList`) type.
From a Polars DataFrame

Python Only

LanceDB supports Polars, a modern, fast DataFrame library written in Rust. Just like with Pandas, the Polars integration is enabled by PyArrow under the hood. A deeper integration between LanceDB Tables and Polars DataFrames is on the way.
From Pydantic Models

Python Only

When you create an empty table without data, you must specify the table schema. LanceDB supports creating tables by specifying a PyArrow schema or a specialized Pydantic model called `LanceModel`.
For example, the following Content model specifies a table with 5 columns:
movie_id, vector, genres, title, and imdb_id. When you create a table, you can
pass the class as the value of the schema parameter to create_table.
The vector column is a Vector type, which is a specialized Pydantic type that
can be configured with the vector dimensions. It is also important to note that
LanceDB only understands subclasses of lancedb.pydantic.LanceModel
(which itself derives from pydantic.BaseModel).
Nested schemas
Sometimes your data model may contain nested objects. For example, you may want to store the document string and the document source name as a nested Document object. This can be used as the type of a LanceDB table column, which creates a struct column called "document" with two subfields called "content" and "source".
Validators

Because `LanceModel` inherits from Pydantic's `BaseModel`, you can combine them with Pydantic's
field validators. The example
below shows how to add a validator to ensure that only valid timezone-aware datetime objects are used
for a created_at field.
When you run this code, it should raise a ValidationError.
From Batch Iterators
For bulk ingestion on large datasets, prefer batching instead of adding one row at a time. Python and Rust can create a table directly from Arrow batch iterators or readers. In TypeScript, the practical pattern today is to create an empty table and append Arrow batches in chunks. Use this pattern when:

- Your source data already arrives in Arrow batches, readers, datasets, or streams.
- Materializing the entire ingest as one giant in-memory list or array would be too expensive.
- You want to control chunk size explicitly during ingestion.
Write with Concurrency
For Python users who want to speed up bulk ingest jobs, it is usually better to write from Arrow-native sources that already produce batches, such as readers, datasets, or scanners, instead of first materializing everything as one large Python list. This is most useful when you are writing large amounts of data from an existing Arrow pipeline or another batch-oriented source. The current codebase also contains a lower-level ingest mechanism for describing a batch source together with extra metadata such as row counts and retry behavior. However, that path is not accepted by the released Python `create_table(...)` and `add(...)` workflow in lancedb==0.30.0, so we are not showing it as a docs example yet.
In Rust, the same lower-level ingest mechanism is available, but the common batch-reader example above is usually the better starting point unless you specifically need to define your own batch source or provide size and retry hints. In TypeScript, this lower-level mechanism is not exposed publicly, so chunked Arrow batch writes remain the recommended pattern.
Create empty table
You can create an empty table for scenarios where you want to add data to the table later. An example would be when you want to collect data from a stream or external file and then add it to the table in batches. An empty table can be initialized via an Arrow schema. Alternatively, you can also use Pydantic to specify the schema for the empty table. Note that we do not directly import `pydantic` but instead use `lancedb.pydantic.LanceModel`, which is a subclass of `pydantic.BaseModel` that has been extended to support LanceDB-specific types like `Vector`.
Once the empty table has been created, you can append to it or modify its contents,
as explained in the updating and modifying tables section.
Open an existing table
You can open an existing table by passing its name to the `open_table` / `openTable` method.
If you forget the name of your table, you can always get a listing of all table names.
Drop a table
Use the `drop_table()` method on the database to remove a table.
This permanently removes the table and is not recoverable, unlike deleting rows.
By default, if the table does not exist an exception is raised. To suppress this,
you can pass in ignore_missing=True.