Table

table provides a DataTable / DataFrame structure similar to pandas and xarray in Python and to the Apache Arrow Table, using n-dimensional tensor columns aligned by a common outermost row dimension.

Data in the table is accessed by first getting the Column tensor (typically by name), and then using the tensor.RowMajor methods to access data within that tensor in a row-wise manner.
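The idea of row-wise access to a tensor column can be sketched with the standard library alone. This is not the table or tensor API: the column type and its cellSize, numRows, and row methods are hypothetical names invented for illustration, assuming each column stores its rows contiguously with the row as the outermost dimension.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a tensor column: a flat slice of
// values plus the shape of one cell. Rows are the outermost dimension.
type column struct {
	cellShape []int     // shape of each row's cell; empty for scalars
	values    []float32 // all rows stored contiguously, row by row
}

// cellSize returns the number of values in one row's cell.
func (c *column) cellSize() int {
	n := 1
	for _, d := range c.cellShape {
		n *= d
	}
	return n
}

// numRows returns the number of rows held by the column.
func (c *column) numRows() int {
	return len(c.values) / c.cellSize()
}

// row returns the values of the cell at the given row, without copying.
func (c *column) row(r int) []float32 {
	sz := c.cellSize()
	return c.values[r*sz : (r+1)*sz]
}

func main() {
	// Columns are looked up by name, then read one row at a time.
	cols := map[string]*column{
		"Input": {
			cellShape: []int{1, 4},
			values:    []float32{1, 0, 0, 0, 0, 1, 0, 0},
		},
	}
	input := cols["Input"]
	for r := 0; r < input.numRows(); r++ {
		fmt.Println(r, input.row(r))
	}
}
```

Because every row occupies a fixed-size slice of the flat storage, reading a row is just an offset computation, whatever the number of cell dimensions.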

Sorting and filtering

The Column method creates a tensor.Rows for the underlying column values, with a list of indexes used for row-level access. This enables efficient sorting and filtering by row, because only the indexes need to be updated, not the underlying data values. The indexes are maintained on the table, which provides an indexed view onto the underlying data values stored in a separate table.Columns structure. Thus, multiple different table views can share the same underlying column data.
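A minimal sketch of the indexed-view idea, using only the standard library. The columns and view types and their methods are hypothetical and are not the table package's API; they only illustrate how sorting and filtering can touch the index list while the stored values stay in place.

```go
package main

import (
	"fmt"
	"sort"
)

// columns is a hypothetical stand-in for shared column storage.
type columns struct {
	name  []string
	score []float64
}

// view is an indexed view onto shared columns: indexes[i] is the
// storage row shown at view row i.
type view struct {
	cols    *columns
	indexes []int
}

// newView returns a view showing every row in storage order.
func newView(c *columns) *view {
	idx := make([]int, len(c.name))
	for i := range idx {
		idx[i] = i
	}
	return &view{cols: c, indexes: idx}
}

// sortByScore reorders only the indexes; the stored values do not move.
func (v *view) sortByScore() {
	sort.SliceStable(v.indexes, func(i, j int) bool {
		return v.cols.score[v.indexes[i]] < v.cols.score[v.indexes[j]]
	})
}

// filter keeps only the indexes whose storage row satisfies keep.
func (v *view) filter(keep func(row int) bool) {
	kept := v.indexes[:0]
	for _, r := range v.indexes {
		if keep(r) {
			kept = append(kept, r)
		}
	}
	v.indexes = kept
}

// names returns the name column as seen through the view.
func (v *view) names() []string {
	out := make([]string, len(v.indexes))
	for i, r := range v.indexes {
		out[i] = v.cols.name[r]
	}
	return out
}

func main() {
	data := &columns{
		name:  []string{"a", "b", "c", "d"},
		score: []float64{3, 1, 4, 2},
	}
	sorted := newView(data)
	sorted.sortByScore()

	high := newView(data)
	high.filter(func(r int) bool { return data.score[r] >= 3 })

	fmt.Println(sorted.names()) // [b d a c]
	fmt.Println(high.names())   // [a c]
	fmt.Println(data.name)      // [a b c d]: storage is unchanged
}
```

Both views point at the same storage, so creating another view costs one index slice rather than a copy of the data.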

CSV / TSV file format

Tables can be saved to and loaded from CSV (comma-separated values) or TSV (tab-separated values) files. See the next section for the special formatting of header strings in these files, which records the type and tensor cell shape of each column.

Type and Tensor Headers

To capture the type and shape of the columns, we support the following header formatting. We weren’t able to find any other widely supported standard (please let us know if there is one that we’ve missed!)

Here is the mapping of special header prefix characters to standard types:

Columns that have tensor cell shapes (not just scalars) are marked as such: the first such column has a suffix indicating the shape of the cells in this column, e.g., <2:5,4> indicates a 2D cell with Y=5, X=4. Each individual column is then indexed as [ndims:x,y..], e.g., the first would be [2:0,0], then [2:0,1], etc.
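The header naming for tensor cells can be sketched as follows. The cellHeaders and joinInts functions are hypothetical helpers, not part of the table package. The sketch assumes that indexes advance with the last dimension fastest (as the [2:0,0], [2:0,1] sequence suggests) and that the shape suffix is appended after the index on the first header. The type prefix character is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinInts formats ints as a comma-separated list, e.g. "0,1".
func joinInts(v []int) string {
	s := make([]string, len(v))
	for i, x := range v {
		s[i] = strconv.Itoa(x)
	}
	return strings.Join(s, ",")
}

// cellHeaders returns one header per value in a tensor cell of the given
// shape. Assumptions: indexes advance with the last dimension fastest,
// and the shape suffix follows the index on the first header only.
func cellHeaders(name string, shape []int) []string {
	nd := len(shape)
	total := 1
	for _, d := range shape {
		total *= d
	}
	headers := make([]string, 0, total)
	idx := make([]int, nd)
	for n := 0; n < total; n++ {
		h := fmt.Sprintf("%s[%d:%s]", name, nd, joinInts(idx))
		if n == 0 {
			h += fmt.Sprintf("<%d:%s>", nd, joinInts(shape))
		}
		headers = append(headers, h)
		// Advance the index like an odometer, last dimension first.
		for d := nd - 1; d >= 0; d-- {
			idx[d]++
			if idx[d] < shape[d] {
				break
			}
			idx[d] = 0
		}
	}
	return headers
}

func main() {
	for _, h := range cellHeaders("Input", []int{1, 4}) {
		fmt.Println(h)
	}
}
```

For a 1x4 cell this produces four headers, with only the first carrying the <2:1,4> shape suffix.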

Example

Here’s a TSV file for a scalar String column (Name), a 2D 1x4 tensor float32 column (Input), and a 2D 1x2 tensor float32 column (Output).
