Function Repository Resource:

RecordsSummary

Summarize datasets, lists or associations that can be transformed into full two-dimensional arrays

Contributed by: Anton Antonov

ResourceFunction["RecordsSummary"][data]

summarizes the argument data.

ResourceFunction["RecordsSummary"][data,cols]

summarizes data using the specified column names cols.

Details and Options

ResourceFunction["RecordsSummary"] works on datasets of 2D tables of atomic objects, full 2D arrays (matrices), lists of atomic objects, and associations whose values are vectors or full 2D arrays. The data can be given as Dataset, Association or Tabular expressions.
Missing values are summarized separately.
The number of summarized categorical values shown in the summary can be changed with the option setting "MaxTallies" -> _Integer.
ResourceFunction["RecordsSummary"] threads over a list of rules or an association with the option setting Thread -> True.
For 2D full arrays by default ResourceFunction["RecordsSummary"] automatically names the columns.
For 2D full arrays, lists and associations, a second argument can be provided specifying the column names.
By default the summarized columns are numbered.
Automatic numbering can be prevented with the option setting "NumberedColumns" -> False.
If v is a vector (that is, has only one dimension) then ResourceFunction["RecordsSummary"][v] is equivalent to ResourceFunction["RecordsSummary"][List/@v].
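
The equivalence between a vector and its one-column form can be checked with built-in functions alone. The following is a minimal sketch; the names v and m are only illustrative:

```wolfram
(* A plain vector has a single dimension *)
v = {3.2, -1.5, 7.0, 0.4};

(* Wrapping each element in a list gives a one-column matrix *)
m = List /@ v;

Dimensions[v]  (* {4} *)
Dimensions[m]  (* {4, 1} *)
MatrixQ[m]     (* True *)
```

Under the equivalence stated above, summarizing v and summarizing m should therefore describe the same single column.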

Examples

Basic Examples (5) 

Summarize a vector of numbers:

In[1]:=
ResourceFunction["RecordsSummary"][RandomReal[{-10, 10}, 100]]
Out[1]=
Image

Summarize a matrix of strings and specify the column names:

In[2]:=
sarr = Transpose[{RandomChoice[CharacterRange["A", "Z"], 20], RandomWord["CommonWords", 20]}];
ResourceFunction[
 "RecordsSummary"][sarr, {"random letter", "random word"}]
Out[3]=
Image

Summarize a vector of numbers with missing values:

In[4]:=
ResourceFunction["RecordsSummary"][
 RandomSample[Join[RandomReal[{-10, 10}, 100], Table[Missing[], 4]]]]
Out[4]=
Image

Summarize a full 2D array with numerical and categorical columns (numbers, strings, and symbols):

In[5]:=
Block[{n = 200},
  arr = Flatten /@ Transpose[{RandomReal[{-10, 10}, {n, 2}], MapAt[ToLowerCase, RandomChoice[CharacterRange["A", "Z"], {n, 2}], {All, 2}], RandomChoice[{E, I, \[CapitalGamma]}, n]}]
  ];
In[6]:=
ResourceFunction["RecordsSummary"][arr]
Out[6]=
Image

Summarize a dataset:

In[7]:=
ResourceFunction["RecordsSummary"][Dataset[arr]]
Out[7]=
Image

Summarize a dataset with column names:

In[8]:=
ds = Dataset[arr][All, AssociationThread[{"num1", "num2", "char1", "char2", "symb"}, #] &];
In[9]:=
ResourceFunction["RecordsSummary"][ds]
Out[9]=
Image

Summarize a Tabular object:

In[10]:=
tbl = Tabular[ds]
Out[10]=
Image
In[11]:=
ResourceFunction["RecordsSummary"][tbl]
Out[11]=
Image

Summarize an association of vectors:

In[12]:=
asc = AssociationThread[
   Range[10] -> Table[Append[RandomReal[1, 2], RandomWord[]], 10]];
In[13]:=
ResourceFunction["RecordsSummary"][asc, Thread -> True]
Out[13]=
Image

Scope (4) 

Define a dataset:

In[14]:=
Block[{n = 200},
  arr = Flatten /@ Transpose[{RandomReal[{-10, 10}, {n, 2}], MapAt[ToLowerCase, RandomChoice[CharacterRange["A", "Z"], {n, 2}], {All, 2}], RandomChoice[{E, I, \[CapitalGamma]}, n]}]
  ];
In[15]:=
ds = Dataset[arr][All, AssociationThread[{"num1", "num2", "char1", "char2", "symb"}, #] &];

A larger number of categorical values can be seen using the option "MaxTallies":

In[16]:=
ResourceFunction["RecordsSummary"][ds, "MaxTallies" -> 12]
Out[16]=
Image

The function works with missing values and summarizes them separately from the rest of the values in a column:

In[17]:=
ResourceFunction["RecordsSummary"][
 ds[All, {"num2" -> (If[# > 2, Missing[], #] &), "char1" -> (If[ToCharacterCode[#][[1]] > 76, Missing[], #] &)}]]
Out[17]=
Image
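
To make the idea of a separate summary for missing values concrete, here is a small sketch that uses only built-in functions. It is an illustration of the concept, not a description of how the resource function is implemented; the names col, nMissing and present are hypothetical:

```wolfram
(* A numeric column with two missing entries *)
col = {1.5, Missing[], -2., 4.25, Missing[], 0.};

(* Count the missing entries, then drop them before computing statistics *)
nMissing = Count[col, _Missing];
present = DeleteMissing[col];

<|"Min" -> Min[present], "Mean" -> Mean[present],
  "Max" -> Max[present], "Missing" -> nMissing|>
(* <|"Min" -> -2., "Mean" -> 0.9375, "Max" -> 4.25, "Missing" -> 2|> *)
```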

Here we make a list of date objects with missing values:

In[18]:=
dateObjs = RandomSample[
   Join[DateObject /@ DateRange[{2011, 1, 1}, {2019, 12, 31}, Quantity[3, "Months"]], Table[Missing[], {4}]]];

Here is the summary of the list of date objects, with a specified column name:

In[19]:=
ResourceFunction["RecordsSummary"][dateObjs, "date object"]
Out[19]=
Image

Here we make an association of random images:

In[20]:=
asc = AssociationThread[
   Range[40] -> RandomChoice[Table[RandomImage[4, {40, 12}], 5], 40]];

This summarizes the list of rules in the association:

In[21]:=
ResourceFunction["RecordsSummary"][asc]
Out[21]=
Image

We can summarize the association's keys and values separately using the option setting Thread -> True:

In[22]:=
ResourceFunction["RecordsSummary"][asc, Thread -> True]
Out[22]=
Image

A dataset does not have to have named columns:

In[23]:=
Block[{n = 200},
  arr = Flatten /@ Transpose[{RandomReal[{-10, 10}, {n, 2}], MapAt[ToLowerCase, RandomChoice[CharacterRange["A", "Z"], {n, 2}], {All, 2}], RandomChoice[{E, I, \[CapitalGamma]}, n]}]
  ];
In[24]:=
ds = Dataset[arr][All, AssociationThread[{"num1", "num2", "char1", "char2", "symb"}, #] &];
In[25]:=
ResourceFunction["RecordsSummary"][ds[Values]]
Out[25]=
Image

Options (7) 

MaxTallies (1) 

With the option "MaxTallies" we specify how many of the summarized items we want to see for each column (variable):

In[26]:=
Block[{n = 200},
  arr = Flatten /@ Transpose[{RandomReal[{-10, 10}, {n, 2}], MapAt[ToLowerCase, RandomChoice[CharacterRange["A", "Z"], {n, 2}], {All, 2}], RandomChoice[{E, I, \[CapitalGamma]}, n]}]
  ];
In[27]:=
ds = Dataset[arr][All, AssociationThread[{"num1", "num2", "char1", "char2", "symb"}, #] &];
In[28]:=
ResourceFunction["RecordsSummary"][ds, "MaxTallies" -> 12]
Out[28]=
Image
In[29]:=
ResourceFunction["RecordsSummary"][ds, "MaxTallies" -> 1]
Out[29]=
Image
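
The effect of limiting the number of tallies can be imitated on a single categorical column with built-in functions. This is only a sketch under the assumption that the most frequent values are the ones shown; maxTallies is a hypothetical stand-in for the option value:

```wolfram
(* A small categorical column *)
col = {"a", "b", "a", "c", "a", "b", "d"};

(* Hypothetical stand-in for the "MaxTallies" setting *)
maxTallies = 2;

(* Count each distinct value and order the counts from largest to smallest *)
tallies = ReverseSortBy[Tally[col], Last];

(* Keep at most maxTallies of the most frequent values *)
Take[tallies, UpTo[maxTallies]]
(* {{"a", 3}, {"b", 2}} *)
```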

NumberedColumns (2) 

By default the summarized columns (variables) are automatically numbered:

In[30]:=
Block[{n = 200},
  arr = Flatten /@ Transpose[{RandomReal[{-10, 10}, {n, 2}], MapAt[ToLowerCase, RandomChoice[CharacterRange["A", "Z"], {n, 2}], {All, 2}], RandomChoice[{E, I, \[CapitalGamma]}, n]}]
  ];
In[31]:=
ds = Dataset[arr][All, AssociationThread[{"num1", "num2", "char1", "char2", "symb"}, #] &];
In[32]:=
ResourceFunction["RecordsSummary"][ds]
Out[32]=
Image

With the option "NumberedColumns" the automatic numbering can be prevented:

In[33]:=
ResourceFunction["RecordsSummary"][ds, "NumberedColumns" -> False]
Out[33]=
Image

Thread (4) 

The option Thread specifies whether the summarization should be "threaded" when the data to be summarized is an association or a list of rules.

Here is an association of 3D points:

In[34]:=
asc = AssociationThread[Range[40] -> RandomReal[10, {40, 3}]];
Short[asc]
Out[34]=
Image

Summarizing without threading:

In[35]:=
ResourceFunction["RecordsSummary"][asc, Thread -> False]
Out[35]=
Image

Summarizing with threading:

In[36]:=
ResourceFunction["RecordsSummary"][asc, Thread -> True]
Out[36]=
Image

Optionally column names can be added:

In[37]:=
ResourceFunction["RecordsSummary"][asc, "Key" -> {"X", "Y", "Z"}, "NumberedColumns" -> False, Thread -> True]
Out[37]=
Image

Applications (3) 

Summarize Classify-ready data (2) 

Here we summarize the Titanic data:

In[38]:=
ResourceFunction["RecordsSummary"][
 ExampleData[{"MachineLearning", "Titanic"}, "Data"],
 ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"],
 Thread -> True]
Out[38]=
Image

Here we summarize the Mushroom data:

In[39]:=
ResourceFunction["RecordsSummary"][
 ExampleData[{"MachineLearning", "Mushroom"}, "Data"],
 ExampleData[{"MachineLearning", "Mushroom"}, "VariableDescriptions"],
 Thread -> True]
Out[39]=
Image

Summaries browser (1) 

If we have a set of datasets we can easily build an interactive interface that allows browsing of dataset summaries:

In[40]:=
dataNames = ExampleData["Statistics"];
Manipulate[
 Column[{
   Grid[{{"Dataset name:", name},
     {"Dimensions:", ExampleData[name, "Dimensions"]}},
    Alignment -> Left
    ],
   Multicolumn[
    ResourceFunction["RecordsSummary"][ExampleData[name], ExampleData[name, "ColumnDescriptions"]], 4, Alignment -> Top]
   }],
 {{name, dataNames[[29]], "Dataset name"}, dataNames, ControlType -> PopupMenu}]
Out[40]=
Image

Possible Issues (7) 

It is expected that the first argument of RecordsSummary is an object that can be converted to a full array of atomic objects:

In[41]:=
dataset = Dataset[{
   <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
   <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>,
   <|"a" -> 3, "b" -> "z", "c" -> {3}|>,
   <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>,
   <|"a" -> 5, "b" -> "y", "c" -> {5, 6, 7}|>,
   <|"a" -> 6, "b" -> "z", "c" -> {}|>}]
Out[41]=
Image

This fails because dataset cannot be converted to a full 2D array:

In[42]:=
ResourceFunction["RecordsSummary"][dataset]
Image
Out[42]=
Image

A workaround is to wrap the values of the columns that are not vectors in HoldForm:

In[43]:=
ResourceFunction["RecordsSummary"][dataset[All, {"c" -> HoldForm}]]
Out[43]=
Image
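
Why wrapping helps can be seen on plain lists with built-in functions. The sketch below assumes that the relevant property is whether the rows form a matrix of non-list entries; the names rows and held are illustrative:

```wolfram
(* Rows whose third entry is a list of varying length *)
rows = {{1, "x", {1}}, {2, "y", {2, 3}}, {3, "z", {}}};

(* Not a matrix: some entries are themselves lists *)
MatrixQ[rows]  (* False *)

(* Wrap the third entry of every row so that it no longer has head List *)
held = MapAt[HoldForm, rows, {All, 3}];

MatrixQ[held]     (* True *)
Dimensions[held]  (* {3, 3} *)
```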

If numerical columns have Quantity values, those columns are treated as categorical:

In[44]:=
ResourceFunction["RecordsSummary"][
 ExampleData[{"Dataset", "Planets"}][[All, {1, 2}]]]
Out[44]=
Image

A summary of numerical values can be obtained by using QuantityMagnitude:

In[45]:=
ResourceFunction["RecordsSummary"][
 ExampleData[{"Dataset", "Planets"}][[All, {1, 2}]][
  All, {"Mass" -> QuantityMagnitude, "Radius" -> QuantityMagnitude}]]
Out[45]=
Image
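
A Quantity expression is not an explicit number, which is consistent with such columns being treated as categorical. The following sketch uses made-up values; it also shows the two-argument form of QuantityMagnitude, which converts to a common unit before stripping it and can be useful when a column mixes units:

```wolfram
(* Made-up masses given in different units *)
masses = {Quantity[2.5, "Kilograms"], Quantity[800, "Grams"]};

(* Quantity expressions are not explicit numbers *)
NumberQ /@ masses  (* {False, False} *)

(* Stripping the units keeps each magnitude in its own unit *)
QuantityMagnitude /@ masses  (* {2.5, 800} *)

(* Converting to one unit first makes the magnitudes comparable *)
QuantityMagnitude[#, "Kilograms"] & /@ masses
```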

For associations whose values do not form a full array, using the option setting Thread -> True produces a failure:

In[46]:=
ResourceFunction["RecordsSummary"][<|1 -> Range[2], 2 -> Range[3]|>, Thread -> True]
Image
Out[46]=
Image

This works though:

In[47]:=
ResourceFunction["RecordsSummary"][<|1 -> Range[2], 2 -> Range[3]|>, Thread -> False]
Out[47]=
Image
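
One possible guard is to test the association's values before choosing the Thread setting. This is a sketch only; threadSettingFor is a hypothetical helper and not part of the resource function:

```wolfram
ragged = <|1 -> Range[2], 2 -> Range[3]|>;
regular = <|1 -> Range[3], 2 -> Range[3]|>;

(* The values form a matrix only when all vectors have the same length *)
MatrixQ[Values[ragged]]   (* False *)
MatrixQ[Values[regular]]  (* True *)

(* Hypothetical helper: thread only when the values form a matrix *)
threadSettingFor[asc_Association] := MatrixQ[Values[asc]];

ResourceFunction["RecordsSummary"][ragged, Thread -> threadSettingFor[ragged]]
```

For the ragged association the helper gives False, so the call falls back to the unthreaded summary.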

Neat Examples (1) 

Summarize subsets of Titanic data that correspond to each passenger class:

In[48]:=
titanic = ExampleData[{"Dataset", "Titanic"}];
In[49]:=
ColumnForm@
 Normal@Map[Grid[{ResourceFunction["RecordsSummary"][Dataset[#]]}] &, Normal[titanic[GroupBy["class"]]]]
Out[49]=
Image

Publisher

Anton Antonov

Version History

  • 1.1.0 – 13 August 2025
  • 1.0.0 – 02 October 2019

Author Notes

This function, RecordsSummary, corresponds to R's fundamental function summary.
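
For a numeric vector, R's summary reports the minimum, first quartile, median, mean, third quartile and maximum. A rough built-in counterpart of those six statistics can be sketched as follows. This is an illustration and not the implementation of RecordsSummary, and the quartile values may differ from R's because quantile conventions vary between systems:

```wolfram
x = {2, 4, 4, 5, 7, 9, 10, 12};

(* Quartiles returns the first quartile, the median and the third quartile *)
{q1, med, q3} = Quartiles[x];

<|"Min" -> Min[x], "1st Qu" -> q1, "Median" -> med,
  "Mean" -> Mean[x], "3rd Qu" -> q3, "Max" -> Max[x]|>
```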