Function Repository Resource:

TableSet

Source Notebook

Create a lightweight, named column store of tabular data that is compatible with Query, Select and other SQL-like functions

Contributed by: Edmund B Robinson

ResourceFunction["TableSet"][names, data]

creates a data structure with columns named names and tabular data data.

Details

Tabular data is the most common data structure seen in data analysis. This form of data does not require much of the functionality needed to manage hierarchical data that is in Dataset. ResourceFunction["TableSet"] fills the gap between accessing a lightweight List with indices and a memory-hungry Dataset with named columns.
ResourceFunction["TableSet"] takes a list of strings for the column names.
ResourceFunction["TableSet"] takes any expression that satisfies MatrixQ for data.
The length of names must equal the number of columns in data.
ResourceFunction["TableSet"][]["prop"] can take the following string properties as "prop":
"ColumnNames"list of the column names
"Data"list of the data
"RowCount"number of rows
ResourceFunction["TableSet"] is compatible with the following functions:
Deletedelete a row or multiple rows
DeleteDuplicatesdelete duplicate rows
Insertadd a row at a specified position
JoinAcrossjoin two ResourceFunction["TableSet"] objects by "Left", "Inner" and "Right" specifications
Partreturn parts of the rows and columns
Queryperform a query using the column names as named slots
Selectpick out rows according to a criterion
TableFormdisplay a ResourceFunction["TableSet"] arranged in an array of rectangular cells
Unionreturn a sorted list of all the distinct elements that appear in a ResourceFunction["TableSet"] object
For JoinAcross, "Outer" join and the KeyCollisionFunction option are not implemented.
For Query, use named slots to reference columns.
Since ResourceFunction["TableSet"] is optimised for tabular data, it has a smaller memory footprint than Dataset, which is optimized for hierarchical data. When working with large tabular data, a significant decrease in memory usage can be gained by using ResourceFunction["TableSet"] as compared to Dataset.

Examples

Basic Examples (5) 

Create a TableSet:

In[1]:=
table = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@12, 3, 1]]
Out[1]=
Image

View the number of rows:

In[2]:=
table["RowCount"]
Out[2]=
Image

Insert a row:

In[3]:=
table = Insert[table, Range@3, 4]
Out[3]=
Image

View the updated number of rows:

In[4]:=
table["RowCount"]
Out[4]=
Image

View the TableSet in TableForm:

In[5]:=
TableForm[table]
Out[5]=
Image

Scope (10) 

TableSet has a summary form:

In[6]:=
table = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@12, 3, 1]]
Out[6]=
Image

Return the column names:

In[7]:=
table["ColumnNames"]
Out[7]=
Image

Return the row count:

In[8]:=
table["RowCount"]
Out[8]=
Image

Return the data in the TableSet:

In[9]:=
table["Data"]
Out[9]=
Image

Delete a row or rows from a TableSet:

In[10]:=
deleteTable = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@15, 3, 1]]
Out[10]=
Image

TableSet rows before deleting a row:

In[11]:=
deleteTable["RowCount"]
Out[11]=
Image

Delete row 3:

In[12]:=
deleteTable = Delete[deleteTable, 3]
Out[12]=
Image

Row count after deleting row 3:

In[13]:=
deleteTable["RowCount"]
Out[13]=
Image

Delete row 7 and the last row:

In[14]:=
deleteTable = Delete[deleteTable, {{7}, {-1}}]
Out[14]=
Image

Row count after deleting row 7 and the last row:

In[15]:=
deleteTable["RowCount"]
Out[15]=
Image

Delete duplicate rows:

In[16]:=
deleteTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[Flatten[Range /@ {4, 6}], 2]]
Out[16]=
Image

The first two data rows are identical to the next two rows:

In[17]:=
TableForm[deleteTable]
Out[17]=
Image

Delete duplicate rows:

In[18]:=
deleteTable = DeleteDuplicates[deleteTable];
TableForm[deleteTable]
Out[13]=
Image

Add a row at a specified position:

In[19]:=
insertTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[Flatten[Range@6], 2]]
Out[19]=
Image

Insert a row at position 3:

In[20]:=
insertTable = Insert[insertTable, Range@2, 3];
TableForm[insertTable]
Out[18]=
Image

Insert rows into position 3 and the last position:

In[21]:=
insertTable = Insert[insertTable, Range@2, {{3}, {-1}}];
TableForm[insertTable]
Out[14]=
Image

For JoinAcross, "Outer" and KeyCollisionFunction are not implemented:

In[22]:=
table1 = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@10, 3, 1]];
table2 = ResourceFunction["TableSet"][{"B", "C", "D"}, Partition[Range[6, 15], 3, 1]];
Grid[{TableForm /@ {table1, table2}}
 , Spacings -> {3, Automatic}
 ]
Out[22]=
Image

Evaluate "Left", "Inner" and "Right" joins:

In[23]:=
Function[{jspec},
   Labeled[
    TableForm@JoinAcross[table1, table2, "B", jspec],
    jspec, Top
    ]] /@ {"Left", "Inner", "Right"} //
 Grid[{#}, Alignment -> {Center, Top}, Spacings -> {3, Automatic}] &
Out[23]=
Image

Return parts of the rows and columns:

In[24]:=
partTable = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@10, 3, 1]]
Out[24]=
Image

Return the first two rows:

In[25]:=
partTable[[;; 2]]
Out[25]=
Image

Return every other row:

In[26]:=
partTable[[;; ;; 2]]
Out[26]=
Image

Return the last column:

In[27]:=
partTable[[All, -1]]
Out[27]=
Image

Use named slots to reference TableSet columns in Query:

In[28]:=
queryTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[RandomInteger[{1, 100}, 10^6], 2]]
Out[28]=
Image

Query a Histogram of the sum of the columns:

In[29]:=
Query[Histogram, #A + #B &]@queryTable
Out[29]=
Image

Add columns based on calculations on existing columns:

In[30]:=
queryTable = Query[All, <|#, "Total" -> #A + #B, "Difference" -> Differences@{#A, #B}|> &]@queryTable;
queryTable["ColumnNames"]
Out[27]=
Image

Pick out rows according to a criterion:

In[31]:=
selectTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[RandomInteger[{1, 100}, 10^6], 2]]
Out[31]=
Image

Select rows with column A > 4 and B < 12:

In[32]:=
resultTable = Select[selectTable, #A > 4 && #B < 12 &]
Out[32]=
Image

Compare the row count of the original and result table:

In[33]:=
selectTable["RowCount"]
Out[33]=
Image
In[34]:=
resultTable["RowCount"]
Out[34]=
Image

Display the TableSet arranged in an array of rectangular cells:

In[35]:=
displayTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[Flatten[Range@6], 2]]
Out[35]=
Image
In[36]:=
TableForm[displayTable]
Out[36]=
Image

Create TableSets with example data:

In[37]:=
table1 = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@10, 3, 1]];
table2 = ResourceFunction["TableSet"][{"B", "C", "D"}, Partition[Range[6, 15], 3, 1]];

View TableSetobjects in TableForm:

In[38]:=
Grid[{TableForm /@ {table1, table2}}
 , Spacings -> {3, Automatic}
 ]
Out[38]=
Image

Union gives the union of the rows maintaining the column names from the first table:

In[39]:=
TableForm@Union[table1, table2]
Out[39]=
Image

Properties and Relations (4) 

When working with large tabular data, a significant decrease in memory usage can be gained by using TableSet as compared to Dataset. For example, with a two-column table of integer values, the ByteCount difference quickly widens with the record count as the graph tooltips and the following table demonstrate:

In[40]:=
aBytes = {};
dBytes = {};
tsBytes = {};
obs = Power[2, Range@24];

Create the data:

In[41]:=
Module[
  {data, a}
  , Scan[
   Function[n
    , data = Partition[RandomReal[{1, 100}, n], 2]
    ; a = AssociationThread[{"A", "B"}, #] & /@ data
    ; AppendTo[aBytes, ByteCount[a]]
    ; AppendTo[dBytes, ByteCount[Dataset@a]]
    ; AppendTo[tsBytes, ByteCount[ResourceFunction["TableSet"][{"A", "B"}, data]]];
    ]
   , obs
   ]
  ];

Plot it:

In[42]:=
ListPlot[
 Tooltip[{##}, Grid@{{"Obs", ":", QuantityMagnitude@#1}, {"MB", ":", N@#2}}] & @@@ Transpose@{Quantity[obs, "Unities"], #} & /@
  UnitConvert[ Quantity[{aBytes, dBytes, tsBytes}, "Bytes"], "Megabytes"]
 , TargetUnits -> {"Mega", Automatic}
 , PlotStyle -> {Directive[Thickness[.02], Opacity[.4]], Directive[Red, Thickness[.005]], Automatic}
 , PlotLabel -> "ByteCount Comparison"
 , Joined -> True
 , FrameLabel -> Automatic
 , PlotLegends -> {Association, Dataset, ResourceFunction[
   "TableSet"]}
 , Mesh -> All
 , MeshStyle -> Purple
 , PlotRange -> Full
 , PlotTheme -> "Detailed"
 , ImageSize -> Large
 ]
Out[42]=
Image

See the data in a table:

In[43]:=
TableForm[
 MapAt[UnitConvert[N@Quantity[#, "Bytes"], "Megabytes"] &, {2 ;;, All}]@{obs, aBytes, dBytes, tsBytes}[[All, -11 ;; -6]]
 , TableHeadings -> {{"Records", Association, Dataset, ResourceFunction["TableSet"]}, None}
 , TableAlignments -> {Decimal, Automatic}
 ]
Out[43]=
Image

Possible Issues (2) 

A Query to add columns returns a TableSet:

In[44]:=
queryTable = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[RandomInteger[10, 15], 3, 1]]
Out[44]=
Image
In[45]:=
Query[All, <|#, "ABTotal" -> #A + #B|> &]@queryTable
Out[45]=
Image

However, a general query does not return a TableSet:

In[46]:=
Query[GroupBy[#A - #B &]]@queryTable
Out[46]=
Image

Publisher

Edmund B Robinson

Version History

  • 1.0.0 – 10 August 2021

Related Resources

License Information