Making tables in R with table1 – Notes from a data witch

I’ll never be good enough
You make me wanna die
And everything you love will burn up in the light
And every time I look inside your eyes
You make me wanna die
- The Pretty Reckless

It’s no secret that my health hasn’t been so great these last few months. Nothing life-threatening, I hasten to add, but severe enough that I’ve spent a depressing amount of 2024 in bed, and not in the fun way. I’m fortunate enough to have a remote job, and the workload this year hasn’t been as demanding as it was last year. I’ve been able to manage, yes,¹ but it has been rough. I’ve necessarily been focusing what little energy I’ve had on my kids and on my day to day work. I’ve had no bandwidth at all to write, or make art, or learn new things. A sorry state of affairs, and one that sucks much of the joy out of life.

Happily, things have started to turn around in recent weeks. I’ve had a little more energy, I’ve been able to work from my desk rather than my bed, and while the artistic impulse hasn’t come back yet I’ve started to write once more. I told myself that I’d start with something fairly simple for my first attempt at writing – Danielle, perhaps you could write up a few notes about a package you use at work? Nothing complicated. Just a little something on the table1 package² by Benjamin Rich, perhaps? Nice and simple, short and sweet. Won’t take very long at all will it my dear?

Yeah, right.

As my health recovers I’ve been listening to The Pretty Reckless a lot, and discovering that Taylor Momsen is so much more awesome than I realised

Getting started

The table1 package is one of those “niche” packages that is designed to solve exactly one problem, and solve that problem well: it is designed to produce tables of descriptive statistics of the sort that typically appear as “Table 1” in an academic paper (hence the name). It’s not a general purpose tool for table construction like gt or flextable, and compared to those packages it has a number of limitations. However, because the scope of the package is narrower, it’s able to solve the specific problem that it is designed for in an extremely efficient manner. It’s used a lot in my workplace and while I was a little skeptical at first I’ve come to love it.

So let’s get this party started shall we? First, I’ll need to load a few packages in order to make this post even remotely legible. Besides the table1 package itself, I’ll load the palmerpenguins package so that I have a data set I can play with, and dplyr for any data wrangling I need to do later on:

library(palmerpenguins)
library(table1)
library(dplyr)

The palmerpenguins data set that I’ll be using in this post is one I’ve used many times before, and it’s nicely documented on the package website. Suffice it to say, the data set contains a collection of measurements from three penguin species, and the data set looks like this:

penguins

# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct>  <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750 male    2007
 2 Adelie  Torgersen           39.5          17.4               186        3800 female  2007
 3 Adelie  Torgersen           40.3          18                 195        3250 female  2007
 4 Adelie  Torgersen           NA            NA                  NA          NA <NA>    2007
 5 Adelie  Torgersen           36.7          19.3               193        3450 female  2007
 6 Adelie  Torgersen           39.3          20.6               190        3650 male    2007
 7 Adelie  Torgersen           38.9          17.8               181        3625 female  2007
 8 Adelie  Torgersen           39.2          19.6               195        4675 male    2007
 9 Adelie  Torgersen           34.1          18.1               193        3475 <NA>    2007
10 Adelie  Torgersen           42            20.2               190        4250 <NA>    2007
# ℹ 334 more rows

To illustrate the basic usage of the table1 package, I’ll create a table that provides descriptive statistics for the bill_length_mm and island variables, computed separately by each species represented in the data set. We can do this with very little difficulty by passing a one-sided formula to the table1() function. The formula we want looks like this:

~ island + bill_length_mm | species

On the left we have the two variables that contain the measurements we want to describe (bill_length_mm and island), and on the right we have the stratification variable that supplies the grouping (species). When calling the table1() function, all we have to do is pass this formula and the data frame itself:³ ⁴

table1(~ island + bill_length_mm | species, penguins)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
island
Biscoe	44 (28.9%)	0 (0%)	124 (100%)	168 (48.8%)
Dream	56 (36.8%)	68 (100%)	0 (0%)	124 (36.0%)
Torgersen	52 (34.2%)	0 (0%)	0 (0%)	52 (15.1%)
bill_length_mm
Mean (SD)	38.8 (2.66)	48.8 (3.34)	47.5 (3.08)	43.9 (5.46)
Median [Min, Max]	38.8 [32.1, 46.0]	49.6 [40.9, 58.0]	47.3 [40.9, 59.6]	44.5 [32.1, 59.6]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)

The output here is a table showing frequency counts for the discrete variable (island) and some standard summary statistics for the continuous variable (bill_length_mm). This table isn’t perfect but it’s surprisingly good given how little effort I had to put in when creating it, and I suspect this ease-of-use factor is the main reason why this package gets used so much in my workplace. Like everything in life though, the devil is in the details, and if you want to make the most of the package it’s helpful to dive into those details to get a good sense of what the package can (and cannot) do.
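
If the formula syntax feels a bit magical, it may help to remember that a formula is an ordinary R object that you can pull apart yourself. The sketch below uses only base R and shows how R parses the formula from earlier; I'm not claiming this is how table1 processes it internally, only that the `|` operator sits at the top of the expression with the summarised variables on one side and the strata on the other.

```r
# A formula is an ordinary R object, so it can be inspected without table1
f <- ~ island + bill_length_mm | species

# For a one-sided formula, the second element is everything after the tilde
rhs <- f[[2]]

# The top-level operator is `|`: summarised variables on its left, strata on its right
top_operator <- as.character(rhs[[1]])
summarised <- all.vars(rhs[[2]])
strata <- all.vars(rhs[[3]])

print(top_operator)
print(summarised)
print(strata)
```

Because `+` binds more tightly than `|` in R, there is no need to wrap the left hand side in parentheses.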

Applying labels

The table I produced above immediately illustrates the first problem a data analyst has to grapple with when using the table1 package: variable labels. In most respects this “off the shelf” table is pretty good: it’s almost good enough to use. But there’s one big eyesore: the raw variable names island and bill_length_mm appear as row labels in the output. These are both excellent variable names for programming, but they’re not very nice when exposed in a table. To fix this, we can use the label() function supplied by the table1 package to associate each of these variables with a pretty, human-readable label:⁵

label(penguins$island) <- "Island"
label(penguins$bill_length_mm) <- "Bill Length (mm)"

table1(~ island + bill_length_mm | species, penguins)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Island
Biscoe	44 (28.9%)	0 (0%)	124 (100%)	168 (48.8%)
Dream	56 (36.8%)	68 (100%)	0 (0%)	124 (36.0%)
Torgersen	52 (34.2%)	0 (0%)	0 (0%)	52 (15.1%)
Bill Length (mm)
Mean (SD)	38.8 (2.66)	48.8 (3.34)	47.5 (3.08)	43.9 (5.46)
Median [Min, Max]	38.8 [32.1, 46.0]	49.6 [40.9, 58.0]	47.3 [40.9, 59.6]	44.5 [32.1, 59.6]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)

To understand what’s really going on here, it’s helpful to recognise that the label() function is purely a convenience function. All it’s really doing is setting the “label” metadata attribute for the relevant object. If you really wanted to, you could do exactly the same thing in base R via the attr() function:

attr(penguins$bill_depth_mm, "label") <- "Bill Depth (mm)"

table1(~ bill_depth_mm | species, penguins)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Bill Depth (mm)
Mean (SD)	18.3 (1.22)	18.4 (1.14)	15.0 (0.981)	17.2 (1.97)
Median [Min, Max]	18.4 [15.5, 21.5]	18.5 [16.4, 20.8]	15.0 [13.1, 17.3]	17.3 [13.1, 21.5]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)

That being said, I am slowly coming to like the setLabel() convenience function that table1 provides rather than using label() or attr(). The setLabel() function has the nice property that it returns the labelled object itself and so it plays very nicely with a dplyr workflow. If you have a data frame with several variables that need to be labelled, you can use mutate() and setLabel() to apply all the labels in one step, like this:

penguins <- penguins |> 
  mutate(
    flipper_length_mm = setLabel(flipper_length_mm, "Flipper Length (mm)"),
    body_mass_g = setLabel(body_mass_g, "Body Mass (g)"),
    sex = setLabel(sex, "Sex"),
    year = setLabel(year, "Year")
  )

table1(~ flipper_length_mm + body_mass_g + sex + year | species, penguins)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
Mean (SD)	190 (6.54)	196 (7.13)	217 (6.48)	201 (14.1)
Median [Min, Max]	190 [172, 210]	196 [178, 212]	216 [203, 231]	197 [172, 231]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Mean (SD)	3700 (459)	3730 (384)	5080 (504)	4200 (802)
Median [Min, Max]	3700 [2850, 4780]	3700 [2700, 4800]	5000 [3950, 6300]	4050 [2700, 6300]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Sex
female	73 (48.0%)	34 (50.0%)	58 (46.8%)	165 (48.0%)
male	73 (48.0%)	34 (50.0%)	61 (49.2%)	168 (48.8%)
Missing	6 (3.9%)	0 (0%)	5 (4.0%)	11 (3.2%)
Year
Mean (SD)	2010 (0.822)	2010 (0.863)	2010 (0.792)	2010 (0.818)
Median [Min, Max]	2010 [2010, 2010]	2010 [2010, 2010]	2010 [2010, 2010]	2010 [2010, 2010]
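
To see why a function that returns the labelled object is pipeline-friendly, here's a minimal base R sketch. The set_label() helper is a hypothetical stand-in I've written for illustration, not the table1 implementation; it assumes only that a label lives in the "label" attribute, as described above.

```r
# Hypothetical stand-in for a setLabel()-style helper (not the table1 source):
# set the "label" attribute and return the modified object
set_label <- function(x, value) {
  attr(x, "label") <- value
  x
}

toy <- data.frame(mass = c(3750, 3800, 3250))

# Because the helper returns its input, it can be used on the right hand
# side of an assignment (or inside mutate()) without a separate step
toy$mass <- set_label(toy$mass, "Body Mass (g)")

print(attr(toy$mass, "label"))
```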

Customising cell content

Somebody mixed my medicine
Somebody’s in my head again
Well, I’ll drink what you leak and I’ll smoke what you sigh
See across the room with a look in your eye
–The Pretty Reckless

One thing I really like about the table1 package is that it supplies very sensible defaults for tables of descriptive statistics: continuous variables are summarised not only via means and standard deviations, but you also get the medians, ranges, and missing data summaries. Categorical variables are summarised with counts and percentages, and again you get a missing data summary. Very often this is exactly the summary you want, and no customisation at all is required.

Inevitably, though, every data analyst comes across a situation that requires a different collection of summary statistics. At that point, you need to dive a little deeper and understand the syntax table1 uses to modify the summaries that it produces.

Using abbreviated codes

The table1 package has a very practical and flexible mechanism for customising the descriptive statistics that it produces, but one that needs a bit of unpacking to understand. If you really want to do so, you can write an entire “rendering” function from scratch that affords very fine grained control over the output (more on that later!) but most of the time you don’t actually want to go to all that effort. In most situations, all you really want to do is swap out one widely-used descriptive statistic for a different widely-used descriptive statistic. It would be no fun for the analyst if they had to write an entire rendering function from scratch just to switch from reporting arithmetic means to reporting geometric means. To that end, table1 provides a compact syntax using “abbreviated codes” that covers a lot of common use cases.

As a concrete example, let’s consider the task I described above: reporting geometric means and standard deviations. This is a very common task in pharmacometrics because a lot of observed data are approximately log-normal in distribution, and in my everyday work I find I have to do this a lot. Luckily for me, the table1 package recognises the strings "GMEAN" and "GSD" as abbreviated codes, and internally will replace them with function calls that compute the geometric mean and geometric standard deviation. To define a custom render that produces these two statistics, all I have to do is define a named vector like this one:

render_geometric <- c(
  "Geometric mean" = "GMEAN", 
  "Geometric SD" = "GSD"
)

In this compressed syntax, the names define the row labels that will be printed in the output table (e.g., "Geometric mean" becomes a row label), and the values are interpreted using the abbreviated code (e.g., the "GMEAN" string is replaced by the value of the geometric mean). To apply this custom render only to the continuous variables in the summary table, all I have to do is include render.continuous = render_geometric in the call to table1():

table1(
  x = ~ flipper_length_mm + body_mass_g + sex + year | species, 
  data = penguins, 
  render.continuous = render_geometric
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
Geometric mean	190	196	217	200
Geometric SD	1.04	1.04	1.03	1.07
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Geometric mean	3670	3710	5050	4130
Geometric SD	1.13	1.11	1.11	1.21
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Sex
female	73 (48.0%)	34 (50.0%)	58 (46.8%)	165 (48.0%)
male	73 (48.0%)	34 (50.0%)	61 (49.2%)	168 (48.8%)
Missing	6 (3.9%)	0 (0%)	5 (4.0%)	11 (3.2%)
Year
Geometric mean	2010	2010	2010	2010
Geometric SD	1.00	1.00	1.00	1.00

As you can see from the output, the custom render has been applied to the numeric variables flipper_length_mm and body_mass_g (and also to year, which is stored as an integer and is therefore treated as continuous) but not to the categorical variable sex, just as you’d expect given that the argument I specified is called render.continuous. However, there are two features that might be a little surprising:

The missing data summary for the continuous variables is unaffected
If you look at the documentation for table1() you’ll notice it has a render argument but not a render.continuous argument

I’ll unpack both of those things later in the blog post, but I wanted to mention them now because these things confused me a little when I first started using table1. For now, let’s just accept that it works and move on.
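
In case the geometric statistics are unfamiliar, here's a small base R sketch using the usual definitions: the geometric mean is the exponentiated mean of the logged data, and the geometric SD is the exponentiated SD of the logged data. I'm assuming these are the definitions behind the "GMEAN" and "GSD" codes; the toy vector is made up for illustration.

```r
# Toy positive-valued data, evenly spaced on the log scale
x <- c(1, 10, 100)

# Geometric mean: exponentiate the mean of the logs
geometric_mean <- exp(mean(log(x), na.rm = TRUE))

# Geometric SD: exponentiate the SD of the logs (a unitless multiplicative factor)
geometric_sd <- exp(sd(log(x), na.rm = TRUE))

print(c(arithmetic = mean(x), geometric = geometric_mean))
print(geometric_sd)
```

Note that the geometric SD is a ratio rather than a quantity in the original units, which is why the values in the table above hover just over 1.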

The table1 package comes equipped with quite a few of these abbreviated codes, which makes life considerably easier. For instance if we needed to compute the 10th, 50th, and 90th percentiles of each continuous variable, we could use the "q10", "q50", and "q90" keywords, like so:

table1(
  x = ~ flipper_length_mm + body_mass_g + sex | species, 
  data = penguins, 
  render.continuous = c(
    "10th percentile" = "q10", 
    "50th percentile" = "q50",
    "90th percentile" = "q90"
  )
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
10th percentile	181	187	209	185
50th percentile	190	196	216	197
90th percentile	198	205	228	221
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
10th percentile	3150	3300	4400	3300
50th percentile	3700	3700	5000	4050
90th percentile	4300	4200	5700	5400
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Sex
female	73 (48.0%)	34 (50.0%)	58 (46.8%)	165 (48.0%)
male	73 (48.0%)	34 (50.0%)	61 (49.2%)	168 (48.8%)
Missing	6 (3.9%)	0 (0%)	5 (4.0%)	11 (3.2%)

Very handy.

The quote from “My medicine” at the start of this section isn’t accidental. The abbreviated code syntax in table1 is a powerful tool, but when I started reading table1 code without understanding the keyword matching involved it did feel a little like “somebody’s in my head again”, substituting code where a string should be.

Supported aliases

The natural question you might have as the user of the package is, of course, what abbreviated codes does the table1 package understand? As documented here in the package vignette, you can find a complete listing by playing around with the stats.default() function.⁶ So let’s do that. For continuous variables, this is the list of supported aliases:

continuous <- 1:10
names(stats.default(continuous))

 [1] "N"      "NMISS"  "SUM"    "MEAN"   "SD"     "CV"     "GMEAN"  "GSD"    "GCV"   
[10] "MEDIAN" "MIN"    "MAX"    "q01"    "q02.5"  "q05"    "q10"    "q25"    "q50"   
[19] "q75"    "q90"    "q95"    "q97.5"  "q99"    "Q1"     "Q2"     "Q3"     "IQR"   
[28] "T1"     "T2"

Here’s what each of these means:⁷

"N", "NMISS": these compute the number of non-missing observations and number of missing observations respectively
"SUM", "MEAN", "SD", "MEDIAN", "MIN", "MAX": these all correspond to the functions of the same name, e.g., "SUM" produces a call to sum(), with missing values removed
"CV": the coefficient of variation, i.e., 100 times the standard deviation divided by the absolute value of the mean
"GMEAN", "GSD", "GCV": the geometric mean, geometric standard deviation, and geometric coefficient of variation
"q01", "q02.5", "q05", "q10", "q25", "q50", "q75", "q90", "q95", "q97.5", "q99": these are understood to refer to specific quantiles, e.g., "q25" is translated as a function call that computes the 25th percentile
"Q1", "Q2", "Q3": these are used to compute quartiles (25th, 50th, and 75th percentiles)
"T1", "T2": these are used to compute tertiles (33rd and 67th percentiles)
"IQR": this computes the interquartile range

In all cases except for "NMISS", the relevant statistics are computed after removing missing values. Turning now to categorical variables, we can again use the stats.default() function to find the supported abbreviated codes:⁸

categorical <- c("a", "b", "c")
names(stats.default(categorical)[[1]])

[1] "FREQ"    "PCT"     "PCTnoNA" "NMISS"

The interpretation of these is as follows:

"FREQ": the frequency count for a category
"PCT": the percent relative frequency, with missing values included in the denominator
"PCTnoNA": the percent relative frequency, after missing values are removed
"NMISS": the number of missing values, as before

The nice thing about these abbreviated codes is that they cover a surprisingly wide variety of use cases. More often than not I’ve found that the descriptive statistics I need can be specified using this mechanism. From the analyst perspective this is great: you really don’t want to waste time writing more code than you have to, so if you can specify your table of descriptive statistics without bothering to write a function, you’re doing well.
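
A couple of these definitions are easy to mix up, so here's a base R sketch of the ones I find I have to double check: the coefficient of variation, and the difference between "PCT" and "PCTnoNA". This follows the verbal definitions given above rather than the package source, and the toy data are invented for the example.

```r
# Coefficient of variation: 100 * SD / |mean|, with missing values removed
x <- c(2, 4, 4, NA, 4, 5, 5, 7, 9)
cv <- 100 * sd(x, na.rm = TRUE) / abs(mean(x, na.rm = TRUE))

# A categorical variable with one missing value
sex <- factor(c("female", "male", "male", NA, "female", "male"))
freq <- table(sex)  # counts per category; NA is not counted by default

# "PCT"-style percentage: missing values stay in the denominator
pct <- 100 * freq / length(sex)

# "PCTnoNA"-style percentage: missing values are dropped from the denominator
pct_no_na <- 100 * freq / sum(!is.na(sex))

print(cv)
print(pct)
print(pct_no_na)
```

The two percentages only differ when the variable has missing values, which is exactly when it matters which one ends up in your table.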

Writing render functions

I am strong, love is evil
It’s a version of perversion that is only for the lucky people
Take your time and do with me what you will
I won’t mind, you know I’m ill, you know I’m ill
So hit me like a man
And love me like a woman
– The Pretty Reckless

Probably no surprise to anyone who knows me that “Hit me like a man” is my favourite Pretty Reckless song. But also appropriate to how I feel about the render function syntax. The functionality is powerful once you know how to use it, but it’s also simple and lovely

Alas life is not always kind to us, and it’s not uncommon to run into situations where your table of descriptive statistics requires the computation of something that doesn’t have an abbreviated code in table1. When that happens, the only recourse is for the user to write a rendering function that takes the data as input and returns a vector of strings to be printed into the table. As an example, suppose you have a need to report Winsorised summary statistics for your continuous variables:

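render_winsorized <- function(x, cutoff = .05, ...) {
  # lower and upper cutoffs, ignoring missing values
  lo <- quantile(x, cutoff, na.rm = TRUE)
  hi <- quantile(x, 1 - cutoff, na.rm = TRUE)
  # clamp values beyond the cutoffs (missing values are left as they are)
  x[x < lo] <- lo
  x[x > hi] <- hi
  # the leading "" is the empty cell printed beside the variable label
  strs <- c(
    "",
    "Winsorized mean" = sprintf("%1.2f", mean(x, na.rm = TRUE)),
    "Winsorized SD" = sprintf("%1.2f", sd(x, na.rm = TRUE))
  )
  return(strs)
}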

Notice that this render_winsorized() function returns a named vector of strings that follows the same convention that we followed with the simpler render_geometric example earlier: the names of the output string become the row labels, and the values are printed into the table itself. Along similar lines, we can define a rendering function to be applied to the categorical variables in the data. Here’s a very simple one that reports only the absolute frequencies for each category:

render_counts <- function(x, ...) c("", table(stringr::str_to_title(x)))

Having defined our render functions, we produce the desired table by passing render_winsorized() as the handler for continuous variables and render_counts() as the handler for categorical variables:

table1(
  x = ~ flipper_length_mm + body_mass_g + sex | species, 
  data = penguins, 
  render.continuous = render_winsorized,
  render.categorical = render_counts
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
Winsorized mean	189.91	195.99	217.23	200.85
Winsorized SD	5.75	6.43	6.38	13.52
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Winsorized mean	3696.52	3738.68	5074.80	4200.80
Winsorized SD	432.98	336.76	471.74	765.82
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Sex
Female	73	34	58	165
Male	73	34	61	168
Missing	6 (3.9%)	0 (0%)	5 (4.0%)	11 (3.2%)
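
When a render function misbehaves, I find it easiest to call it directly on a small vector before handing it to table1(). The sketch below does that with a hypothetical render_trimmed() function (my own example, not something supplied by the package) that follows the same convention as the functions above: an empty first element for the variable label row, then named strings for each statistic.

```r
# Hypothetical render function reporting a trimmed mean; the leading ""
# fills the cell beside the variable label, as in the examples above
render_trimmed <- function(x, trim = 0.2, ...) {
  c(
    "",
    "Trimmed mean" = sprintf("%1.2f", mean(x, trim = trim, na.rm = TRUE))
  )
}

# Call it on a toy vector to inspect the names (row labels) and values (cells)
out <- render_trimmed(c(1, 2, 3, 4, 100, NA))
print(names(out))
print(unname(out))
```

If the names and strings look right here, any remaining problems are more likely to be in the table1() call than in the render function itself.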

Our table is mostly done, but we still don’t have a method for adjusting how the missing data summaries are produced. To do that we need to define one more rendering function and pass it as the render.missing argument:

render_missing <- function(x, ...) c("Missing" = sum(is.na(x)))

table1(
  x = ~ flipper_length_mm + body_mass_g + sex | species, 
  data = penguins, 
  render.continuous = render_winsorized,
  render.categorical = render_counts,
  render.missing = render_missing
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
Winsorized mean	189.91	195.99	217.23	200.85
Winsorized SD	5.75	6.43	6.38	13.52
Missing	1	0	1	2
Body Mass (g)
Winsorized mean	3696.52	3738.68	5074.80	4200.80
Winsorized SD	432.98	336.76	471.74	765.82
Missing	1	0	1	2
Sex
Female	73	34	58	165
Male	73	34	61	168
Missing	6	0	5	11

Unpacking render functions

There’s still a bit of a mystery here, because the table1() function doesn’t have arguments render.continuous, render.categorical, or render.missing: instead, it has a render argument. What’s actually going on here is that the default value for render is the render.default() function exported by table1, and render.default() accepts render.continuous, render.categorical, or render.missing as arguments. In other words, what’s happening in the code above is that my custom functions end up being passed to render.default() via the dots.
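
The forwarding mechanism is easier to see in miniature. Here's a toy sketch in base R: toy_table() and toy_render_default() are hypothetical stand-ins for table1() and render.default(), written only to show how an argument the outer function doesn't know about can travel through the dots to the inner one.

```r
# Hypothetical stand-in for render.default(): it has its own
# render.continuous argument with a default handler
toy_render_default <- function(x, render.continuous = function(x, ...) "default", ...) {
  render.continuous(x, ...)
}

# Hypothetical stand-in for table1(): it knows only about `render`,
# and forwards anything else to the render function via the dots
toy_table <- function(x, render = toy_render_default, ...) {
  render(x, ...)
}

# No extra arguments: the default handler is used
result_default <- toy_table(1:3)

# render.continuous is not an argument of toy_table(), so it is captured
# by the dots and matched by name inside toy_render_default()
result_custom <- toy_table(1:3, render.continuous = function(x, ...) "custom")

print(result_default)
print(result_custom)
```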

There’s nothing to prevent you from bypassing this whole process by writing your own render function that handles all the input variables. For example, here’s a very simple rendering function that counts the number of non-missing observations, and prints it in the same row as the variable name:

render_n <- function(x, ...) sum(!is.na(x))

table1(
  x = ~ flipper_length_mm + body_mass_g + sex | species, 
  data = penguins, 
  render = render_n
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)	151	68	123	342
Body Mass (g)	151	68	123	342
Sex	146	68	119	333

I have to confess it took me waaaaay too long to realise that I could do this in table1. True, I don’t often have a need to bypass the render.default() function, but there are definitely times when that’s a handy little bit of functionality. Sigh. Sometimes I’m quite dense.

Table annotations

Follow me down to the river
Drink while the water is clean
Follow me down to the river tonight
I’ll be down here on my knees
–The Pretty Reckless

Okay I’ll admit it. This lyric isn’t really connected to the text. I mean, the video clip is a collection of annotations showing the lyrics, I guess. But honestly I just like the song

Time to switch gears a little. In the previous section I talked about how to customise the statistics that are reported in the table cells. Implicit in this discussion is the fact that a custom render function allows you to customise the row labels associated with each statistic, in exactly the same way that variable labels allow you to customise the variable descriptions that appear in the leftmost column of the table. Taken together, these two mechanisms (render functions and variable labels) give the user a lot of control over what appears in the leftmost column of the table. But what about the header row? How do we customise that in table1?

Strata column labels

To start with, let’s consider the columns that are associated with a particular stratum. In the penguins tables I’ve been creating, the strata are defined by the species variable, so there are three columns that are each associated with a specific stratum. By default table1() will add a description for each such column in the header row that contains the category name (e.g., “Gentoo”) and the number of observations that belong to this category. But perhaps we don’t want those sample size numbers? Maybe all we want is the category name. To customise how each stratum is labelled in the header row, the table1() function has an argument called render.strat that takes a function as its value. The strata rendering function takes three arguments: label is the value in the data that defines that category (e.g., "Gentoo"), n is the number of observations that have been assigned to the category, and transpose is a logical variable indicating whether the table is transposed (more on that later). The output of the function is a string that specifies (as HTML) what should appear in the header row. To illustrate the idea, here’s a very simple stratum rendering function that only prints the category label:

render_strat <- function(label, n, transpose = FALSE) {
  sprintf("<span class='stratlabel'>%s</span>", label)
}

The only subtlety to this render_strat() function is that it outputs some HTML that wraps the label in an HTML span tag and assigns it to a class that we can (and will) use later on to create some fancy styling using CSS. But I’m getting ahead of myself a little. For now, it’s enough to note that render_strat() creates a very simple label that just prints out the category label. Here it is in action:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  render.strat = render_strat
)

	Adelie	Chinstrap	Gentoo	Overall
Flipper Length (mm)
Mean (SD)	190 (6.54)	196 (7.13)	217 (6.48)	201 (14.1)
Median [Min, Max]	190 [172, 210]	196 [178, 212]	216 [203, 231]	197 [172, 231]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Mean (SD)	3700 (459)	3730 (384)	5080 (504)	4200 (802)
Median [Min, Max]	3700 [2850, 4780]	3700 [2700, 4800]	5000 [3950, 6300]	4050 [2700, 6300]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)

Indeed it performs as expected: the sample size annotations are gone and we now have very minimalistic labels for each of the three penguin species. So let’s move along.
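
Since a stratum rendering function is just a function that returns a string, you can try variations without building a table at all. Here's a hypothetical alternative (again my own example, with the same three arguments described above) that keeps the sample size but puts it on a second line in a different format. Whether the resulting HTML looks nice in your table will depend on your styling, so treat it as a sketch.

```r
# Hypothetical stratum label: category name, line break, then "n = ..."
render_strat_n <- function(label, n, transpose = FALSE) {
  sprintf("<span class='stratlabel'>%s<br>n = %s</span>", label, n)
}

# Calling it directly shows the HTML string that would go in the header row
header <- render_strat_n("Gentoo", 124)
print(header)
```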

Other column labels

There are two other columns that typically appear in a table1 output object: on the left we have a column that contains the row labels, and on the right we have a column that contains the descriptive statistics for the “overall” data set where we collapse across all strata. By default, the row label column on the left has no label and the aggregated column on the right uses the label “Overall”. Both of these are customisable in the call to table1(), using the overall and rowlabelhead arguments. An example of this is illustrated below:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  overall = "All Species",
  rowlabelhead = "Measurement"
)

Measurement	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	All Species (N=344)
Flipper Length (mm)
Mean (SD)	190 (6.54)	196 (7.13)	217 (6.48)	201 (14.1)
Median [Min, Max]	190 [172, 210]	196 [178, 212]	216 [203, 231]	197 [172, 231]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Mean (SD)	3700 (459)	3730 (384)	5080 (504)	4200 (802)
Median [Min, Max]	3700 [2850, 4780]	3700 [2700, 4800]	5000 [3950, 6300]	4050 [2700, 6300]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)

Footnotes and captions

Oh lord, heaven knows, we belong way down below
Oh lord, tell us so, we belong way down below
–The Pretty Reckless

In addition to allowing you to customise the header row, the table1 package also supports the addition of captions and footnotes. Shockingly, it turns out you can specify these with table1() by using the caption and footnote arguments. Both of these take a single string as their input, and you can use HTML tags here. For example, here’s a footnote that acknowledges the two packages I’ve relied on most in this post, specifying the package names in boldface:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  footnote = "Created using <b>table1</b> and <b>palmerpenguins</b>"
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
Mean (SD)	190 (6.54)	196 (7.13)	217 (6.48)	201 (14.1)
Median [Min, Max]	190 [172, 210]	196 [178, 212]	216 [203, 231]	197 [172, 231]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Mean (SD)	3700 (459)	3730 (384)	5080 (504)	4200 (802)
Median [Min, Max]	3700 [2850, 4780]	3700 [2700, 4800]	5000 [3950, 6300]	4050 [2700, 6300]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Created using table1 and palmerpenguins

Specifying captions is very similar. Here’s a simple example:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  caption = "Flipper length and body mass by species among the Palmer penguins"
)

Flipper length and body mass by species among the Palmer penguins
	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)
Flipper Length (mm)
Mean (SD)	190 (6.54)	196 (7.13)	217 (6.48)	201 (14.1)
Median [Min, Max]	190 [172, 210]	196 [178, 212]	216 [203, 231]	197 [172, 231]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)
Body Mass (g)
Mean (SD)	3700 (459)	3730 (384)	5080 (504)	4200 (802)
Median [Min, Max]	3700 [2850, 4780]	3700 [2700, 4800]	5000 [3950, 6300]	4050 [2700, 6300]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)	2 (0.6%)

One day, just for one brief moment, I would like to look this cool. It’s never going to happen of course, but a girl can dream

Table structure

So far we’ve discussed how to control what statistics are computed in the table, and in the process I’ve also talked about how to specify the row and column labels that are associated with the various statistics. I’ve also talked about other marginalia associated with a table, which in table1 is really just the footnote and caption. What I haven’t talked about yet is how to control the structure of the table that gets produced. For the most part, this structure is controlled by the formula that you use to specify the table. When I specify a table like this

~ flipper_length_mm + body_mass_g | species

I will generally get a table that has one column for each unique value of species (the strata), and a block of rows associated with each of the variables (flipper_length_mm and body_mass_g) that supply the relevant descriptive statistics. By default we also get one additional “overall” column that collapses the strata and reports descriptive statistics for the entire data set. Most of the time this is exactly the structure we want, but not always. To that end, table1 allows you to customise the structure in various ways.

Removing “overall”

The simplest way to modify the table structure is to remove the “overall” column that collapses the strata. We can do this using the overall argument. In the last section I showed how to use this argument to change the label associated with this column by passing a string, but if instead we set overall = FALSE, that column will be removed entirely:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  overall = FALSE
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)
Flipper Length (mm)
Mean (SD)	190 (6.54)	196 (7.13)	217 (6.48)
Median [Min, Max]	190 [172, 210]	196 [178, 212]	216 [203, 231]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)
Body Mass (g)
Mean (SD)	3700 (459)	3730 (384)	5080 (504)
Median [Min, Max]	3700 [2850, 4780]	3700 [2700, 4800]	5000 [3950, 6300]
Missing	1 (0.7%)	0 (0%)	1 (0.8%)

Nested stratifications

Up to this point in the post I’ve used only a single variable to define the strata in the table. At the start of the post I mentioned that you can drop the stratification entirely simply by specifying a one-sided formula like ~ flipper_length_mm + body_mass_g that omits the | separator and the stratification variable that would follow it. But that’s rarely of interest to us in real world data analysis. The thing we’re more likely to want is a nested stratification, where we define strata based on all unique combinations of two variables. For example, let’s suppose I wanted to compute descriptive statistics for each species of penguin, but for each species the statistics should be computed separately for each island. Happily, the table1 package supports this kind of two-level stratification. All I have to do is write species * island on the right hand side of the separator like so:

table1(
  x = ~ flipper_length_mm + body_mass_g | species * island,
  data = penguins,
  overall = FALSE
)

	Adelie			Chinstrap	Gentoo
	Biscoe (N=44)	Dream (N=56)	Torgersen (N=52)	Dream (N=68)	Biscoe (N=124)
Flipper Length (mm)
Mean (SD)	189 (6.73)	190 (6.59)	191 (6.23)	196 (7.13)	217 (6.48)
Median [Min, Max]	190 [172, 203]	190 [178, 208]	191 [176, 210]	196 [178, 212]	216 [203, 231]
Missing	0 (0%)	0 (0%)	1 (1.9%)	0 (0%)	1 (0.8%)
Body Mass (g)
Mean (SD)	3710 (488)	3690 (455)	3710 (445)	3730 (384)	5080 (504)
Median [Min, Max]	3750 [2850, 4780]	3580 [2900, 4650]	3700 [2900, 4700]	3700 [2700, 4800]	5000 [3950, 6300]
Missing	0 (0%)	0 (0%)	1 (1.9%)	0 (0%)	1 (0.8%)

As it happens, only the Adelie penguins appear on all three islands: the Chinstrap penguins appear only on Dream Island, and the Gentoo penguins appear only on Biscoe Island. So the table here contains three columns for the Adelie penguins, and only one each for the Chinstrap and Gentoo penguins.
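If you'd like to know in advance which combinations will end up as columns, a quick cross-tabulation does the job. Here's a minimal sketch that rebuilds the cross-tab from the stratum sizes reported in the table above. The `counts` data frame is something I've typed in by hand for illustration; with the real data I'd expect `xtabs(~ species + island, data = penguins)` to give the same answer.

```r
# Stratum sizes copied from the N= values in the nested table above
counts <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
  island  = c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
  n       = c(44, 56, 52, 68, 124)
)

# Cross-tabulate; combinations that never occur show up as zeros
combos <- xtabs(n ~ species + island, data = counts)
combos

# Count the non-empty cells, i.e. the number of strata columns to expect
sum(combos > 0)
```

The zero cells in `combos` are the species/island pairs that have no column in the nested table.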

There are some limitations to this functionality. You can’t stratify by more than two variables, and the stratification variables cannot contain any missing values. Even so, the functionality is pretty handy, and it is sensitive to the order in which you specify the two stratification variables. If I write island * species rather than species * island, I get this table instead:

table1(
  x = ~ flipper_length_mm + body_mass_g | island * species,
  data = penguins,
  overall = FALSE
)

	Biscoe		Dream		Torgersen
	Adelie (N=44)	Gentoo (N=124)	Adelie (N=56)	Chinstrap (N=68)	Adelie (N=52)
Flipper Length (mm)
Mean (SD)	189 (6.73)	217 (6.48)	190 (6.59)	196 (7.13)	191 (6.23)
Median [Min, Max]	190 [172, 203]	216 [203, 231]	190 [178, 208]	196 [178, 212]	191 [176, 210]
Missing	0 (0%)	1 (0.8%)	0 (0%)	0 (0%)	1 (1.9%)
Body Mass (g)
Mean (SD)	3710 (488)	5080 (504)	3690 (455)	3730 (384)	3710 (445)
Median [Min, Max]	3750 [2850, 4780]	5000 [3950, 6300]	3580 [2900, 4650]	3700 [2700, 4800]	3700 [2900, 4700]
Missing	0 (0%)	1 (0.8%)	0 (0%)	0 (0%)	1 (1.9%)

I’ve found this functionality useful many times in my everyday life.
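The restriction on missing values in stratification variables is the one that tends to bite in practice. Here's a rough sketch of two ways to prepare the data beforehand, using a tiny made-up data frame (`toy`, `toy_complete` and `toy_explicit` are names I've invented for this illustration, not anything from the package):

```r
# Toy data, made up for illustration: one penguin has no recorded sex
toy <- data.frame(
  sex         = c("Male", "Female", NA, "Female"),
  body_mass_g = c(3750, 3800, 3475, 3250)
)

# Option 1: drop the rows where the stratification variable is missing
toy_complete <- toy[!is.na(toy$sex), ]

# Option 2: keep those rows, but recode NA as an explicit category
toy_explicit <- toy
toy_explicit$sex <- factor(
  ifelse(is.na(toy$sex), "Unknown", toy$sex),
  levels = c("Female", "Male", "Unknown")
)

# Tabulate the recoded variable to confirm nothing is missing any more
table(toy_explicit$sex)
```

Which option makes sense depends on the analysis. Dropping rows changes the sample sizes reported in the table, whereas the explicit category adds an extra column.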

Adding extra columns

Another kind of “structural” customisation that table1 allows is adding new columns. A common use case for this functionality is to add a column that reports a p-value associated with a particular row. For example, suppose I want my table to run a one-way ANOVA for each of the continuous variables in the table, to test whether the group means differ across the strata. To handle something like this we’ll again need to write a custom render function that accepts data from all groups – as a list of vectors – and returns a string that should be printed into the relevant cell in the new “p-value” column. This render_p_value() function will do this for me:

render_p_value <- function(x, ...) {
  dat <- bind_rows(
    purrr::map(x, ~ data.frame(value = .)), 
    .id = "group"
  )
  mod <- aov(value ~ group, dat)
  p <- summary(mod)[[1]][1, 5]
  return(scales::label_pvalue()(p))
}

I don’t want to dive into the details of what this function is doing, but if you’re familiar with the standard interface for linear models in R it should look very familiar. If not, here’s the gist: the first part of the code rearranges the list of vectors input into a data frame format, the second part estimates parameters for the model, runs the usual F-test, and extracts the p-value from the output. Finally, it returns the p-value as a prettily-formatted string.
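If the tidyverse helpers obscure what's going on, here's a rough base R sketch of the same three steps, run on a small made-up list of vectors. The `groups` list and its values are invented purely for illustration, and `format.pval()` is standing in for `scales::label_pvalue()`, so the formatting may differ slightly from the real thing.

```r
# Made-up input: a named list of numeric vectors, one per stratum
groups <- list(
  a = c(1.1, 2.0, 1.6, NA),
  b = c(3.2, 2.9, 3.8),
  c = c(5.1, 4.7, 5.5)
)

# Step 1: stack the vectors into a long data frame with a group label
dat <- data.frame(
  value = unlist(groups, use.names = FALSE),
  group = factor(rep(names(groups), times = lengths(groups)))
)

# Step 2: fit the one-way ANOVA (rows with missing values are dropped)
mod <- aov(value ~ group, data = dat)

# Step 3: pull the p-value for the group effect out of the summary table
p <- summary(mod)[[1]][1, "Pr(>F)"]

# Format it as a string, collapsing very small values to a "<" threshold
format.pval(p, digits = 3, eps = 0.001)
```

Indexing by the column name `"Pr(>F)"` picks out the same cell as the `[1, 5]` used in render_p_value(), since the p-value is the fifth column of the ANOVA summary table.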

The render_p_value() function is the one we’ll use to render our new column, but for the purposes of this example I’ll also define a custom renderer for the contents of the strata columns as well, so that the only thing it does is report the mean value for the group. You don’t have to do this, I’m only doing it because I want my table to be as simple as possible.

render_mean <- function(x, ...) sprintf("%1.1f", mean(x, na.rm = TRUE))

Now that we have our rendering functions, I can specify one or more additional columns in my table by passing a named list of functions to the extra.col argument (the names are used to specify the column labels). In my case I’m only adding a single extra column with the p-value so my list has only a single function, but it’s not too hard to imagine scenarios where I’d want to add more than one (e.g., maybe I want to report the degrees of freedom associated with my F-test). Anyway, here’s the code:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  render = render_mean,
  extra.col = list("p-value" = render_p_value)
)

	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Overall (N=344)	p-value
Flipper Length (mm)	190.0	195.8	217.2	200.9	<0.001
Body Mass (g)	3700.7	3733.1	5076.0	4201.8	<0.001

Very nice.
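For the "more than one extra column" scenario, here's a sketch of what a degrees-of-freedom renderer might look like. To be clear, `render_df()` is a hypothetical function I'm making up for this illustration. I'm assuming it receives the same list-of-vectors input that render_p_value() does, and I've only checked it on toy data rather than inside a real table1() call.

```r
# Hypothetical companion to render_p_value(): reports the F-test degrees of freedom
render_df <- function(x, ...) {
  # Stack the per-stratum vectors into one long data frame
  dat <- data.frame(
    value = unlist(x, use.names = FALSE),
    group = factor(rep(seq_along(x), times = lengths(x)))
  )
  mod <- aov(value ~ group, data = dat)
  # First row is the group effect, second row is the residuals
  df <- summary(mod)[[1]][, "Df"]
  sprintf("%d, %d", as.integer(df[1]), as.integer(df[2]))
}

# Quick check on made-up data: 3 groups, 9 non-missing observations
toy_groups <- list(c(1.1, 2.0, 1.6, NA), c(3.2, 2.9, 3.8), c(5.1, 4.7, 5.5))
render_df(toy_groups)
```

In principle it would then sit alongside the p-value renderer, with something like extra.col = list("p-value" = render_p_value, "df" = render_df).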

Transposing tables

Let me open up the discussion with
I’m not impressed with any motherfucking word I say
See I lied that I cried when he came inside
And now I’m burning a highway to Hades
–The Pretty Reckless

Before I dive a little deeper and talk about table structures that go beyond what you can do with the formula interface to table1() there’s one more thing I should talk about. I don’t actually want to talk about this topic because it makes me cry a little bit every time I encounter it, but I’ll be good and try anyway.

That topic is transposing a table.

Normally when you specify a table with a formula, the stratification is used to create the columns, and the variable list is used to define the rows. You can flip this if you like by setting transpose = TRUE when calling table1(), but in my experience this is a bit messy and often requires a lot of tinkering with your render functions to make the results look good. To that end, here’s a simple rendering function that I’ll use in this example. All it does is compute the mean and standard deviation for a continuous variable:

render_mean_sd <- function(x, ...) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  sprintf("%1.1f (%1.1f)", m, s)
}

With the help of this render_mean_sd() function and the simple render_strat() function I defined earlier, here’s an example of a transposed table in which the stratification variable (species) defines the rows, and the four variables on the left are used to define columns:

table1(
  x = ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g | species,
  data = penguins,
  transpose = TRUE,
  rowlabelhead = "Species",
  render = render_mean_sd,
  render.strat = render_strat
)

Species	Bill Length (mm)	Bill Depth (mm)	Flipper Length (mm)	Body Mass (g)
Adelie	38.8 (2.7)	18.3 (1.2)	190.0 (6.5)	3700.7 (458.6)
Chinstrap	48.8 (3.3)	18.4 (1.1)	195.8 (7.1)	3733.1 (384.3)
Gentoo	47.5 (3.1)	15.0 (1.0)	217.2 (6.5)	5076.0 (504.1)
Overall	43.9 (5.5)	17.2 (2.0)	200.9 (14.1)	4201.8 (802.0)

It works, and there are definitely use cases for this. But the functionality is limited. It only works for a single stratification level (i.e., you can’t do species * island in this context), and it only looks pretty because my render_mean_sd() function doesn’t contain any labels. Things get messy pretty fast when working with transposed tables, and in all honesty I’ve never been willing to use this functionality in a client project.

Arbitrary stratification

Why’d you bring a shotgun to the party?
–The Pretty Reckless

Up to this point in the post every table I’ve created with table1() has used a formula to specify the basic structure of the output. In real life this is almost always what you want to do, because this “formula interface” is pretty flexible and extremely easy to work with. I’d even go so far as to say that this formula interface is one of the most appealing aspects of the table1 package. However, in point of fact the formula interface that everyone uses is actually a bit of sweet syntactic sugar laid atop a lower-level interface. Very occasionally you encounter a situation where the formula interface isn’t expressive enough to do what you want, and when that happens you have to “bring a shotgun to the party” and use the low-level interface, which takes a list of data frames (one per stratum) as input.

To motivate the discussion, I’ll give an example of something that I’ve occasionally wanted to do but isn’t possible with the formula interface. Earlier in this post I showed you how to create a nested stratification where we pass two stratification variables in the formula, and the table contains one stratum for every unique combination of the two variables. Sometimes, though, you want to create a table that stratifies by two variables but only shows the marginal stratifications. For example, I might want a table that includes descriptive statistics for each species of penguin, and next to that my table would have descriptive statistics for each sex. In this situation I’m not interested in the cross-tabulation. That is, I don’t care about which species a male penguin belongs to, and I don’t care about the sex of the various Adelie penguins either. I’m interested in species and sex completely independently of each other. The formula interface doesn’t support this kind of stratification, so I’m going to have to do it manually.

Let’s see how this is done. First, I’m going to make a few tweaks to the data that aren’t really very important, but will make my table a little nicer. Specifically, I’ll convert the sex variable to title case so that I get nice labels later, and I’ll convert year to a factor so that table1() treats it as a categorical variable:

penguins <- penguins |>
  mutate(
    sex = stringr::str_to_title(sex),
    year = factor(year)
  )

Now let’s move along to the important step. I’m going to create a new variable called penguins_strata that is a list of data frames.

penguins_strata <- c(
    split(penguins, ~species), 
    split(penguins, ~sex),
    list("All" = penguins)
  )

penguins_strata

$Adelie
# A tibble: 152 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex   
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <chr> 
 1 Adelie  Torgersen           39.1          18.7               181        3750 Male  
 2 Adelie  Torgersen           39.5          17.4               186        3800 Female
 3 Adelie  Torgersen           40.3          18                 195        3250 Female
 4 Adelie  Torgersen           NA            NA                  NA          NA <NA>  
 5 Adelie  Torgersen           36.7          19.3               193        3450 Female
 6 Adelie  Torgersen           39.3          20.6               190        3650 Male  
 7 Adelie  Torgersen           38.9          17.8               181        3625 Female
 8 Adelie  Torgersen           39.2          19.6               195        4675 Male  
 9 Adelie  Torgersen           34.1          18.1               193        3475 <NA>  
10 Adelie  Torgersen           42            20.2               190        4250 <NA>  
# ℹ 142 more rows
# ℹ 1 more variable: year <fct>

$Chinstrap
# A tibble: 68 × 8
   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex   
   <fct>     <fct>           <dbl>         <dbl>             <int>       <int> <chr> 
 1 Chinstrap Dream            46.5          17.9               192        3500 Female
 2 Chinstrap Dream            50            19.5               196        3900 Male  
 3 Chinstrap Dream            51.3          19.2               193        3650 Male  
 4 Chinstrap Dream            45.4          18.7               188        3525 Female
 5 Chinstrap Dream            52.7          19.8               197        3725 Male  
 6 Chinstrap Dream            45.2          17.8               198        3950 Female
 7 Chinstrap Dream            46.1          18.2               178        3250 Female
 8 Chinstrap Dream            51.3          18.2               197        3750 Male  
 9 Chinstrap Dream            46            18.9               195        4150 Female
10 Chinstrap Dream            51.3          19.9               198        3700 Male  
# ℹ 58 more rows
# ℹ 1 more variable: year <fct>

$Gentoo
# A tibble: 124 × 8
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex   
   <fct>   <fct>           <dbl>         <dbl>             <int>       <int> <chr> 
 1 Gentoo  Biscoe           46.1          13.2               211        4500 Female
 2 Gentoo  Biscoe           50            16.3               230        5700 Male  
 3 Gentoo  Biscoe           48.7          14.1               210        4450 Female
 4 Gentoo  Biscoe           50            15.2               218        5700 Male  
 5 Gentoo  Biscoe           47.6          14.5               215        5400 Male  
 6 Gentoo  Biscoe           46.5          13.5               210        4550 Female
 7 Gentoo  Biscoe           45.4          14.6               211        4800 Female
 8 Gentoo  Biscoe           46.7          15.3               219        5200 Male  
 9 Gentoo  Biscoe           43.3          13.4               209        4400 Female
10 Gentoo  Biscoe           46.8          15.4               215        5150 Male  
# ℹ 114 more rows
# ℹ 1 more variable: year <fct>

$Female
# A tibble: 165 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex   
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <chr> 
 1 Adelie  Torgersen           39.5          17.4               186        3800 Female
 2 Adelie  Torgersen           40.3          18                 195        3250 Female
 3 Adelie  Torgersen           36.7          19.3               193        3450 Female
 4 Adelie  Torgersen           38.9          17.8               181        3625 Female
 5 Adelie  Torgersen           41.1          17.6               182        3200 Female
 6 Adelie  Torgersen           36.6          17.8               185        3700 Female
 7 Adelie  Torgersen           38.7          19                 195        3450 Female
 8 Adelie  Torgersen           34.4          18.4               184        3325 Female
 9 Adelie  Biscoe              37.8          18.3               174        3400 Female
10 Adelie  Biscoe              35.9          19.2               189        3800 Female
# ℹ 155 more rows
# ℹ 1 more variable: year <fct>

$Male
# A tibble: 168 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex  
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <chr>
 1 Adelie  Torgersen           39.1          18.7               181        3750 Male 
 2 Adelie  Torgersen           39.3          20.6               190        3650 Male 
 3 Adelie  Torgersen           39.2          19.6               195        4675 Male 
 4 Adelie  Torgersen           38.6          21.2               191        3800 Male 
 5 Adelie  Torgersen           34.6          21.1               198        4400 Male 
 6 Adelie  Torgersen           42.5          20.7               197        4500 Male 
 7 Adelie  Torgersen           46            21.5               194        4200 Male 
 8 Adelie  Biscoe              37.7          18.7               180        3600 Male 
 9 Adelie  Biscoe              38.2          18.1               185        3950 Male 
10 Adelie  Biscoe              38.8          17.2               180        3800 Male 
# ℹ 158 more rows
# ℹ 1 more variable: year <fct>

$All
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex   
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <chr> 
 1 Adelie  Torgersen           39.1          18.7               181        3750 Male  
 2 Adelie  Torgersen           39.5          17.4               186        3800 Female
 3 Adelie  Torgersen           40.3          18                 195        3250 Female
 4 Adelie  Torgersen           NA            NA                  NA          NA <NA>  
 5 Adelie  Torgersen           36.7          19.3               193        3450 Female
 6 Adelie  Torgersen           39.3          20.6               190        3650 Male  
 7 Adelie  Torgersen           38.9          17.8               181        3625 Female
 8 Adelie  Torgersen           39.2          19.6               195        4675 Male  
 9 Adelie  Torgersen           34.1          18.1               193        3475 <NA>  
10 Adelie  Torgersen           42            20.2               190        4250 <NA>  
# ℹ 334 more rows
# ℹ 1 more variable: year <fct>

As you can see from this lengthy output, the penguins_strata variable is a list of six data frames. There are three data frames corresponding to each of the three species, two data frames corresponding to each unique sex category, and a final data frame that contains the entire data set. Later on when I construct the table, the table1() function will render each of these six data frames into a single column with the appropriate descriptive statistics.
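One thing worth noticing here is that split() silently drops rows where the grouping variable is missing, which is why the Female and Male data frames (165 + 168 = 333 rows) don't add up to the 344 rows in the full data set. Here's a toy sketch of that behaviour, along with a more compact way to inspect a list of strata than printing the whole thing. The `toy` data frame is made up for illustration, and I've used `split(toy, toy$species)` rather than the formula form just to keep the sketch independent of the R version.

```r
# Made-up data: one row has a missing value for sex
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  sex     = c("Male", "Female", NA, "Female", "Male")
)

# Same construction as penguins_strata: two marginal splits plus everything
toy_strata <- c(
  split(toy, toy$species),
  split(toy, toy$sex),
  list("All" = toy)
)

# Row counts per stratum; the row with missing sex only appears in "All"
# and in its species stratum
n_rows <- vapply(toy_strata, nrow, integer(1))
n_rows
```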

The second input we need is a “labels” list that does two jobs:

Notice that so far I’ve defined my strata, but haven’t specified the variables. The labels list has to do this. It specifies which variables should be used when computing descriptive statistics. For the purposes of this example, I would like to tabulate the number of penguins (in each stratum) found on each island and the number of penguins observed in each year.
In my head, the thing I want to do here is stratify the data separately by each species and by each sex. But the penguins_strata object doesn’t say anything about this. It’s just a flat list of six data frames. It doesn’t “know” that three of these data frames refer to different species and two of them refer to different sexes. For that matter, it doesn’t know that the last data frame isn’t associated with a grouping variable either. So we’ll also need to specify a list of groups that supplies the names of these grouping variables.

In other words, we need something like this:

penguins_labels <- list(
    variables = list(
      island = "Island", # names denote variables, values supply labels
      year = "Year"
    ), 
    groups = list("Species", "Sex", "") # this is a list of labels only
  )

penguin_groups <- c(3, 2, 1) # first three data frames are group 1, etc

At this point we have:

penguins_strata, a variable that contains all the data organised into a list with one data frame per stratum;
penguins_labels, a list that specifies the variables for which descriptive statistics are requested and the labels that should be assigned to variables and strata groups; and
penguin_groups, a vector that specifies how the strata columns should be grouped
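To make the relationship between these three pieces a bit more concrete, here's a small sketch that lines up each stratum with the group label it falls under. The object names below are ones I've made up for the illustration, and the two sanity checks reflect my understanding of how the spans are meant to fit together rather than anything the package documentation promises.

```r
# Names of the six strata, in the order they appear in the list
strata_names <- c("Adelie", "Chinstrap", "Gentoo", "Female", "Male", "All")

# Group labels and the number of strata columns each one spans
group_labels <- c("Species", "Sex", "")
group_spans  <- c(3, 2, 1)

# Sanity checks: spans should cover every stratum, one label per span
stopifnot(sum(group_spans) == length(strata_names))
stopifnot(length(group_labels) == length(group_spans))

# Expand the labels so there is one per stratum column
layout <- data.frame(
  stratum = strata_names,
  group   = rep(group_labels, times = group_spans)
)
layout
```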

It’s quite a bit of setup work, but having done all the hard parts during the setup, the call to table1() is now very simple:

table1(penguins_strata, penguins_labels, groupspan = penguin_groups)

	Species			Sex
	Adelie (N=152)	Chinstrap (N=68)	Gentoo (N=124)	Female (N=165)	Male (N=168)	All (N=344)
Island
Biscoe	44 (28.9%)	0 (0%)	124 (100%)	80 (48.5%)	83 (49.4%)	168 (48.8%)
Dream	56 (36.8%)	68 (100%)	0 (0%)	61 (37.0%)	62 (36.9%)	124 (36.0%)
Torgersen	52 (34.2%)	0 (0%)	0 (0%)	24 (14.5%)	23 (13.7%)	52 (15.1%)
Year
2007	50 (32.9%)	26 (38.2%)	34 (27.4%)	51 (30.9%)	52 (31.0%)	110 (32.0%)
2008	50 (32.9%)	18 (26.5%)	46 (37.1%)	56 (33.9%)	57 (33.9%)	114 (33.1%)
2009	52 (34.2%)	24 (35.3%)	44 (35.5%)	58 (35.2%)	59 (35.1%)	120 (34.9%)

And there it is. A table with two marginal stratifications and an overall column that provides descriptive statistics for two categorical variables.

Styling tables

The last topic I want to cover in this post is the visual styling of tables produced by table1(). In a moment I’ll show you what a table1 object looks like under the hood, but the short version is that the table structure is specified using HTML, and the visual styling is performed with the help of CSS. Most of the time when you’re using the table1 package you don’t really have to think too much about this, because the package comes with a collection of built-in styles that are usually good enough for a data analyst to use, but sometimes you want to go a little further. So we should talk a little about styling.

Built-in styles

Let’s start with the built-in styles that come for free with the table1 package. As described in the package vignette, if you’re not particularly interested in writing your own CSS code you have these options available to you:

zebra: alternating shaded and unshaded rows
grid: show all grid lines
shade: shade the header in gray
times: use a serif font
center: center all columns

Each of these is associated with a CSS class that has the Rtable1- prefix, e.g., the zebra style corresponds to the Rtable1-zebra CSS class. You can have one of these classes applied to your table using the topclass argument. For instance, here’s a “zebra” style table:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  topclass = "Rtable1-zebra",
  render = render_mean,
  render.strat = render_strat,
  footnote = "Source: palmerpenguins"
)

	Adelie	Chinstrap	Gentoo	Overall
Flipper Length (mm)	190.0	195.8	217.2	200.9
Body Mass (g)	3700.7	3733.1	5076.0	4201.8
Source: palmerpenguins

Because these built-in styles are all CSS classes, you can apply more than one to your table. For example, if I want a table with zebra-style stripes, a shaded header bar, and text in Times New Roman font, I can specify topclass = "Rtable1-zebra Rtable1-shade Rtable1-times" and get the desired result:

table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  topclass = "Rtable1-zebra Rtable1-shade Rtable1-times",
  render = render_mean,
  render.strat = render_strat,
  footnote = "Source: palmerpenguins"
)

	Adelie	Chinstrap	Gentoo	Overall
Flipper Length (mm)	190.0	195.8	217.2	200.9
Body Mass (g)	3700.7	3733.1	5076.0	4201.8
Source: palmerpenguins
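If typing out the Rtable1- prefix over and over gets tedious, the class string is easy enough to build programmatically. Here's a minimal sketch of a helper for that. Note that `rtable1_class()` is a hypothetical function of my own invention and not part of the table1 package.

```r
# Hypothetical helper: turn short style names into a topclass string
rtable1_class <- function(styles) {
  # The built-in style names listed above
  known <- c("zebra", "grid", "shade", "times", "center")
  stopifnot(all(styles %in% known))
  # Prefix each name and join with spaces, as CSS class lists require
  paste0("Rtable1-", styles, collapse = " ")
}

rtable1_class(c("zebra", "shade", "times"))
```

The result can be passed straight to the topclass argument in place of the hand-written string.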

Using custom CSS

Signed with the devil
Signed with the devil
Signed with the devil, oh
–The Pretty Reckless

In everyday data analysis work the built-in style classes that come with the table1 package are good enough to create pretty outputs. But sometimes they are not. A client or a journal might want a table to be formatted in a very specific style, and at that point you’re going to have to write your own CSS code. I have a love/hate relationship with CSS. It’s such a powerful tool for styling HTML objects, but somehow it never feels natural to me and I feel like I’m making a pact with dark powers every time I use it. Unfortunately I’m at the point in the post where I have to deal with demonic forces. Let’s just hope we all come through this unscathed yeah?

To help with this discussion I’ll start by creating a table, but instead of printing it to the output, I’ll assign it to a variable called tbl. As you can see from the code below, I’ve specified topclass = "mytable" so that I can write some CSS that will be applied only to this table (or, I suppose, any other table that has CSS class mytable, but I’m only going to make one):

tbl <- table1(
  x = ~ flipper_length_mm + body_mass_g | species,
  data = penguins,
  topclass = "mytable",
  render = render_mean,
  render.strat = render_strat,
  footnote = "Source: palmerpenguins"
)

Next, if we want to write some CSS that will target this table, it helps a great deal to be able to see the actual HTML associated with the tbl object. Here it is:

cat(as.character(tbl))

<table class="mytable">
<thead>
<tr>
<th class='rowlabel firstrow lastrow'></th>
<th class='firstrow lastrow'><span class='stratlabel'>Adelie</span></th>
<th class='firstrow lastrow'><span class='stratlabel'>Chinstrap</span></th>
<th class='firstrow lastrow'><span class='stratlabel'>Gentoo</span></th>
<th class='firstrow lastrow'><span class='stratlabel'>Overall</span></th>
</tr>
<tfoot><tr><td colspan="5" class="Rtable1-footnote"><p>Source: palmerpenguins</p>
</td></tr></tfoot>
</thead>
<tbody>
<tr>
<td class='rowlabel firstrow lastrow'>Flipper Length (mm)</td>
<td class='firstrow lastrow'>190.0</td>
<td class='firstrow lastrow'>195.8</td>
<td class='firstrow lastrow'>217.2</td>
<td class='firstrow lastrow'>200.9</td>
</tr>
<tr>
<td class='rowlabel firstrow lastrow'>Body Mass (g)</td>
<td class='firstrow lastrow'>3700.7</td>
<td class='firstrow lastrow'>3733.1</td>
<td class='firstrow lastrow'>5076.0</td>
<td class='firstrow lastrow'>4201.8</td>
</tr>
</tbody>
</table>

This output reveals the CSS class names associated with the specific components of the table. So, let’s suppose that the client has indicated that the table footnote needs to be in italics and – for reasons known but to god – the header text needs to be shown in hot pink. Thanks to the blood magic of CSS nesting, I can write a little snippet of CSS that specifies that for any table of CSS class mytable, the footnote should be in italics and the stratification labels should be shown in hot pink:

.mytable {
  .Rtable1-footnote {
    font-style: italic;
  }
  .stratlabel {
    color: hotpink
  }
}

Under the hood, I have saved this exact CSS snippet to a tiny stylesheet that is imported within this post, so when I print out the tbl object I get the desired result:

tbl

	Adelie	Chinstrap	Gentoo	Overall
Flipper Length (mm)	190.0	195.8	217.2	200.9
Body Mass (g)	3700.7	3733.1	5076.0	4201.8
Source: palmerpenguins
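One caveat: CSS nesting is a fairly recent addition to the language, and older browsers may not understand it. If that's a concern, the same rules can be written without nesting using ordinary descendant selectors. Since I'm in R anyway, here's a sketch that writes an un-nested version of the stylesheet to a file. The file name and the use of tempdir() are placeholders I've picked for illustration, not how the stylesheet for this post is actually managed.

```r
# Flat (un-nested) equivalent of the nested CSS rules for the mytable class
css <- c(
  ".mytable .Rtable1-footnote {",
  "  font-style: italic;",
  "}",
  ".mytable .stratlabel {",
  "  color: hotpink;",
  "}"
)

# Write the rules to a stylesheet; the location here is only a placeholder
css_file <- file.path(tempdir(), "mytable.css")
writeLines(css, css_file)

# Read it back to confirm the contents
readLines(css_file)
```

In a Quarto document, a stylesheet like this can then be attached through the css option in the YAML header.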

Another day, another encounter with CSS that I have survived. I will take the victory.

Epilogue

For the ways that I hurt when I’m hiking up my skirt
For the man that I hate I’m going to hell
–The Pretty Reckless

There’s a lot I’m not saying in this post. There’s a lot of hidden detail in the table1 package, and additional tricks that you can deploy to make it work to your advantage. But a post has to end somewhere and besides, if you’ve hit the point where the tools I’ve talked about in this post can’t solve your specific problem you’re probably at the point where table1 is the wrong fit.

It’s never wise to try to use force.

Footnotes

It always strikes me as a bitter failure of public policy that when someone falls sick, their first thought is always something along the lines of “can I still work?” Very few people actually love their jobs so much that they want to work through a serious illness, but the fear that the company will discard you the moment something bad happens is built into our society at a low level. If you’re not dead you work. Because capitalism.
The package source code is on GitHub, and the package vignette provides a lot of useful detail that you can’t necessarily find by browsing the help files.
The output of a call to table1() has S3 class “table1”, and internally specifies an HTML table (more on that later). When printed in a Quarto or R Markdown document like this one, in the normal course of events the table1:::knit_print.table1() method is called, in which case the table1 object is coerced to a data frame and the end result looks the same as a data frame would look when knitr::kable() is called. However, this is slightly different to how the table looks if you call it interactively in an R session where the S3 method called is table1:::print.table1(). Because I want the output in this post to look as close as possible to the typical output when calling table1() in a regular R session, I’ve set results = "asis" for all my code chunks in this document, thereby ending up with tables that look the same as the ones you see interactively in the R session.
The stratification variable (i.e. species) isn’t actually necessary to create a table, and if you wanted you could produce a table using a formula like ~ island + bill_length_mm. In practice, however, I’ve found that I never do this: almost every table I’ve created in real life has a stratification variable.
The table1 package also supports units as a separate piece of metadata via the units() function, but I have to admit I never really use that one.
If you’re a foolish person like I am you can also dig into the source code to find the answer, because why would I be smart and read the package vignette before reading the source code?
If you want a more precise answer, you can use a command like parse.abbrev.render.code("GMEAN") to return the actual function that is executed whenever a "GMEAN" is computed during the table rendering process.
If you’re curious as to why I’m extracting the first element of the output in this code, try playing around with stats.default() and looking at the differences between how the output is structured for continuous versus categorical inputs.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{navarro2024,
  author = {Navarro, Danielle},
  title = {Making Tables in {R} with Table1},
  date = {2024-06-21},
  url = {https://blog.djnavarro.net/posts/2024-06-21_table1/},
  langid = {en}
}

For attribution, please cite this work as:

Navarro, Danielle. 2024. “Making Tables in R with Table1.” June 21, 2024. https://blog.djnavarro.net/posts/2024-06-21_table1/.