table provides a DataTable / DataFrame structure similar to pandas and xarray in Python, and to the Apache Arrow Table, using n-dimensional tensor columns aligned by a common outermost row dimension.
Data in the table is accessed by first getting the Column tensor (typically by name), and then using the tensor.RowMajor methods to access data within that tensor in a row-wise manner.
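To make the row-wise access pattern concrete, here is a minimal standard-library sketch. It does not use the table or tensor packages: the `column` type and its `floatRow` / `setFloatRow` methods are hypothetical stand-ins, only loosely modeled on the idea of row-major access described above, and the real method names and signatures may differ.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a tensor column (not the real API).
// Values are stored flat in row-major order; the outermost dimension is the row.
type column struct {
	name     string
	cellSize int       // number of values per row (1 for a scalar column)
	values   []float64 // len(values) == rows * cellSize
}

// newColumn allocates a zero-filled column with the given rows and cell size.
func newColumn(name string, rows, cellSize int) *column {
	return &column{name: name, cellSize: cellSize, values: make([]float64, rows*cellSize)}
}

// rows returns the size of the outermost (row) dimension.
func (c *column) rows() int { return len(c.values) / c.cellSize }

// floatRow returns the value at the given row and flat index within that row's cell.
func (c *column) floatRow(row, cell int) float64 {
	return c.values[row*c.cellSize+cell]
}

// setFloatRow sets the value at the given row and flat index within that row's cell.
func (c *column) setFloatRow(val float64, row, cell int) {
	c.values[row*c.cellSize+cell] = val
}

func main() {
	in := newColumn("Input", 3, 4) // 3 rows, each holding a 4-value cell
	for r := 0; r < in.rows(); r++ {
		for i := 0; i < in.cellSize; i++ {
			in.setFloatRow(float64(r*10+i), r, i)
		}
	}
	fmt.Println(in.floatRow(2, 1)) // value stored at row 2, cell index 1
}
```

The point of the sketch is that the caller addresses data as (row, index within cell), and the column translates that into an offset in flat storage.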
Sorting and filtering
The Column method creates a tensor.Rows for the underlying column values, with a list of indexes used for row-level access. This enables efficient sorting and filtering by row, because only these indexes need to be updated, not the underlying data values. The indexes are maintained on the table, which provides an indexed view onto the underlying data values, which are stored in a separate table.Columns structure. Thus, there can be multiple different table views onto the same underlying columns data.
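The indexed-view idea can be sketched with the standard library alone. The `columns` and `view` types below are hypothetical illustrations, not the package's own table.Columns or tensor.Rows types; they only show how sorting and filtering can touch an index list while the shared data stays put.

```go
package main

import (
	"fmt"
	"sort"
)

// columns is a hypothetical stand-in for the shared underlying data.
type columns struct {
	names []string
	score []float64
}

// view is a hypothetical indexed view: sorting and filtering change only
// indexes, never the data in cols.
type view struct {
	cols    *columns
	indexes []int
}

// newView returns a view that initially lists every row in storage order.
func newView(cols *columns) *view {
	idx := make([]int, len(cols.names))
	for i := range idx {
		idx[i] = i
	}
	return &view{cols: cols, indexes: idx}
}

// sortByScore reorders the indexes so that scores are ascending.
func (v *view) sortByScore() {
	sort.SliceStable(v.indexes, func(a, b int) bool {
		return v.cols.score[v.indexes[a]] < v.cols.score[v.indexes[b]]
	})
}

// filter keeps only the indexes of rows for which keep returns true.
func (v *view) filter(keep func(row int) bool) {
	kept := v.indexes[:0]
	for _, ri := range v.indexes {
		if keep(ri) {
			kept = append(kept, ri)
		}
	}
	v.indexes = kept
}

// name returns the name at position i of the view (not of the raw storage).
func (v *view) name(i int) string { return v.cols.names[v.indexes[i]] }

func main() {
	cols := &columns{names: []string{"c", "a", "b"}, score: []float64{3, 1, 2}}

	sorted := newView(cols)
	sorted.sortByScore()

	filtered := newView(cols)
	filtered.filter(func(row int) bool { return cols.score[row] >= 2 })

	fmt.Println(sorted.indexes, filtered.indexes) // two views, one data set
	fmt.Println(cols.names)                       // underlying order is unchanged
}
```

Both views point at the same `columns` value, so the cost of a sort or filter is proportional to the number of indexes rather than to the size of the cell data.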
CSV / TSV file format
Tables can be saved to and loaded from CSV (comma separated values) or TSV (tab separated values) files. See the next section for special formatting of header strings in these files to record the type and tensor cell shapes.
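For readers unfamiliar with the two formats, the following sketch shows that CSV and TSV differ only in the delimiter, using Go's standard encoding/csv package. It is not the table package's own save or load code; `writeDelimited` and `readDelimited` are hypothetical helper names, and the records are made-up sample data.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// writeDelimited encodes records using the given delimiter
// (',' for CSV, '\t' for TSV) and returns the encoded text.
func writeDelimited(records [][]string, delim rune) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = delim
	if err := w.WriteAll(records); err != nil { // WriteAll also flushes
		return "", err
	}
	return buf.String(), nil
}

// readDelimited decodes text that was written with the given delimiter.
func readDelimited(data string, delim rune) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = delim
	return r.ReadAll()
}

func main() {
	// Sample data only; the first record plays the role of the header row.
	records := [][]string{{"Name", "Score"}, {"alpha", "1.5"}, {"beta", "2.25"}}

	tsv, err := writeDelimited(records, '\t')
	if err != nil {
		panic(err)
	}
	fmt.Print(tsv)

	back, err := readDelimited(tsv, '\t')
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back), back[2][0])
}
```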
Type and Tensor Headers
To capture the type and shape of the columns, we support the following header formatting. We weren’t able to find any other widely supported standard (please let us know if there is one that we’ve missed!).
Here is the mapping of special header prefix characters to standard types:
Columns that have tensor cell shapes (not just scalars) are marked as such, with the first such column having a suffix indicating the shape of the cells in this column, e.g., <2:5,4> indicates a 2D cell Y=5,X=4. Each individual column is then indexed as [ndims:x,y..], e.g., the first would be [2:0,0], then [2:0,1], etc.
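As an illustration of this naming scheme, the sketch below generates the per-value headers for one tensor column. `cellHeaders` and `joinInts` are hypothetical helpers, not part of the package. The sketch assumes that the last index varies fastest, as in the [2:0,0], [2:0,1] sequence above, and that the shape suffix directly follows the index on the first header. It leaves out the type prefix character.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinInts formats a slice of ints as a comma-separated list, e.g. "5,4".
func joinInts(vals []int) string {
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = strconv.Itoa(v)
	}
	return strings.Join(parts, ",")
}

// cellHeaders returns one header per value in a cell of the given shape.
// Each header carries an [ndims:index] marker; only the first one also
// carries the <ndims:shape> suffix. A scalar column (empty shape) keeps
// its plain name.
func cellHeaders(name string, shape []int) []string {
	nd := len(shape)
	if nd == 0 {
		return []string{name}
	}
	n := 1
	for _, s := range shape {
		n *= s
	}
	heads := make([]string, 0, n)
	idx := make([]int, nd)
	for flat := 0; flat < n; flat++ {
		// Convert the flat offset to a multi-dimensional index,
		// with the last dimension varying fastest.
		rem := flat
		for d := nd - 1; d >= 0; d-- {
			idx[d] = rem % shape[d]
			rem /= shape[d]
		}
		h := fmt.Sprintf("%s[%d:%s]", name, nd, joinInts(idx))
		if flat == 0 {
			h += fmt.Sprintf("<%d:%s>", nd, joinInts(shape))
		}
		heads = append(heads, h)
	}
	return heads
}

func main() {
	for _, h := range cellHeaders("Input", []int{1, 4}) {
		fmt.Println(h)
	}
}
```

For a 1x4 cell this yields four headers, the first of which also records the cell shape so that a reader can reassemble the following columns into one tensor column.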
Example
Here’s a TSV file for a scalar String column (Name), a 2D 1x4 tensor float32 column (Input), and a 2D 1x2 float32 Output column.