May 9, 2025
Non-Standard Evaluation (NSE) is not just a curiosity. It’s a core part of modern R programming, especially for package developers who want to offer user-friendly interfaces like dplyr::filter(mpg > 20).
To use NSE effectively, you need more than quote() or substitute(). You need tools that work inside functions, play well with environments, and support user-friendly interfaces. That’s where base tools like get() and modern tools from {rlang} come in.
Let’s start with something simple but deceptive: using get() to look up values in a different environment.
get()
get() is a base R function that retrieves the value bound to a name from an environment. It’s similar to env$name, but more flexible: by default, it also searches the environment’s parents.
env <- rlang::env(x = 100)
get("x", envir = env)
[1] 100
This looks simple. But it only works if you already know the name as a string.
If you try:
You’ll get an error: get() doesn’t evaluate expressions; it only looks up a single variable name. This limitation is why get() is only useful in very narrow NSE contexts.
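A minimal sketch of that failure, assuming the same env as above: get() treats "x + 1" as the literal name of a variable, so the lookup fails. tryCatch() is used only to capture the error message instead of stopping.

```r
env <- rlang::env(x = 100)

# "x" is a valid name bound in env
get("x", envir = env)

# "x + 1" is treated as a variable literally named `x + 1`, which doesn't exist
msg <- tryCatch(
  get("x + 1", envir = env),
  error = function(e) conditionMessage(e)
)
msg
```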
get() Is Not Enough
Suppose you write a generic logging function like this:
It works:
But now try:
This fails — because "x + 1" is not a symbol. It’s a string that represents an expression, and get() can’t parse or evaluate it.
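Here is a minimal sketch of such a logger. log_value_get() is a hypothetical name used for illustration, not the post’s original function; it works for a plain name and fails for "x + 1".

```r
# Hypothetical logger built on get(): it can only look up names
log_value_get <- function(name, env = parent.frame()) {
  val <- get(name, envir = env)
  cat("Value of", name, "is", val, "\n")
  invisible(val)
}

x <- 5
log_value_get("x")   # prints: Value of x is 5

# "x + 1" is not a variable name, so get() raises an error
err <- tryCatch(log_value_get("x + 1"), error = function(e) e)
conditionMessage(err)
```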
eval() Instead of get()
To evaluate expressions, we need eval():
This works because eval() knows how to process structured expressions (not just names) and can recursively resolve variables.
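A minimal sketch with the same kind of environment: eval() accepts the whole call x + 1, looks up x in env, and computes the result.

```r
env <- rlang::env(x = 100)

# quote() produces a language object (a call), not a string
expr <- quote(x + 1)

# eval() resolves x inside env and evaluates the whole call
eval(expr, envir = env)   # 101
```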
Now wrap this in a function:
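log_eval_expr <- function(expr, env = parent.frame()) {
  # expr is already a language object, so evaluate it directly in env
  val <- eval(expr, envir = env)
  # deparse() turns the expression back into text for the log message
  cat("Expression", deparse(expr), "evaluated to", val, "\n")
}
x <- 5
log_eval_expr(quote(x + 1))
Expression x + 1 evaluated to 6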
Takeaway: Use get() when you have a name, use eval() when you have a language object (i.e., an expression).
substitute()
You may wonder why we are not using substitute() in this function, as we learned in the previous post. Good question. You need substitute() when you’re writing a function that receives user-typed code (unevaluated) and you want to capture the expression itself, before R evaluates it.
Example:
Here, expr is just a placeholder. Without substitute(), accessing expr forces R to evaluate the argument, so the function only sees the result and can’t recover the original code.
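A minimal sketch of the difference, using two hypothetical helpers: show_code() captures what the caller typed, while show_value() only ever sees the evaluated result.

```r
# substitute() captures the code the caller typed, unevaluated
show_code <- function(expr) {
  deparse(substitute(expr))
}

# Without substitute(), touching expr forces its evaluation
show_value <- function(expr) {
  expr
}

a <- 2
b <- 3
show_code(a + b)    # "a + b"
show_value(a + b)   # 5
```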
If you’re writing a function where expr is already an expression (e.g., something like expr <- quote(a + b) or expr <- rlang::expr(a + b)), then you don’t want substitute() — because the expression is already captured.
For example:
Here, code is already a proper unevaluated call object because it has been captured with quote(). So you don’t need eval(substitute(code)): substitute() would capture the expression the caller typed for the argument (for example, the symbol code), not the call stored inside it, so evaluating that would just give back the quoted call instead of its result. This is similar to what happens when you use quote() inside a function.
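A minimal sketch of that pitfall, using two hypothetical functions: run_code() evaluates an already-quoted call, while run_code_sub() adds an unnecessary substitute() and gets the quoted call back instead of its value.

```r
# Evaluates a call object that was already captured with quote()
run_code <- function(code, env = parent.frame()) {
  eval(code, envir = env)
}

# Adds substitute(): captures the caller's argument expression (the symbol my_code)
run_code_sub <- function(code, env = parent.frame()) {
  eval(substitute(code), envir = env)
}

a <- 2
b <- 3
my_code <- quote(a + b)

run_code(my_code)       # 5
run_code_sub(my_code)   # a + b: the quoted call comes back, not its value
```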
get() and assign() for Controlled Evaluation
In some cases, we don’t just want to evaluate expressions or capture what the user typed; we want to manipulate variables by name: retrieve their values from a particular environment, or assign new ones dynamically. That’s exactly what get() and assign() let us do.
These tools operate on variable names as strings, which makes them incredibly flexible — and also risky if not used carefully. In this section, we’ll break down both functions and show how they interact with environments and evaluation.
get() – Look Up a Variable by Name
The base R function get() retrieves the value of a variable, given its name as a string, and optionally, an environment in which to look.
It’s equivalent to just writing x, but you can control where to search:
If x is not found in that environment, get() walks up the chain of parent environments (inherits = TRUE is the default), just like R’s normal variable resolution.
get_and_print <- function(name, env = parent.frame()) {
  # exists() also searches parent environments by default
  if (!exists(name, envir = env)) stop("Boo! it is not here")
  val <- get(name, envir = env)
  print(val)
}
env2 <- rlang::env(y = 2)
x <- 10
get_and_print("x") # 10
[1] 10
[1] 100
[1] 10
Error in get_and_print("zz", env = env2): Boo! it is not here
Suppose you want to log both the name of the variable the user passed and its value, without evaluating the entire expression. This is the same idea we saw in the previous post with eval(), but instead of an expression, we have the name of a variable.
log_value <- function(varname) {
name <- substitute(varname) # Capture the name (unevaluated)
var_str <- deparse(name) # Deparse to string
# Lookup the value
value <- get(var_str, envir = parent.frame())
cli::cli_inform("Variable {.code {var_str}} has value: {value}")
}
score <- 88
log_value(score)
Variable `score` has value: 88
This is cleaner and safer than passing varname directly to get(), which only accepts an object name (given as a character string or a symbol). We need substitute() and deparse() because otherwise R would evaluate varname to its value (88) before we could inspect its name.
assign() – Create or Modify a Variable by Name
Now suppose you want to set a variable dynamically. assign() does the opposite of get(): it takes a name as a string and binds a value to it.
You can also specify the environment where the variable should be created or updated:
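A minimal sketch of both uses, plus a loop that builds names programmatically (z, e, and var_1 to var_3 are illustrative names):

```r
# Create (or overwrite) a variable by name in the current environment
assign("z", 42)
z

# Create the variable inside a specific environment instead
e <- new.env()
assign("z", 100, envir = e)
get("z", envir = e)   # 100
z                     # still 42 in the current environment

# Names can be built from strings at run time
for (i in 1:3) assign(paste0("var_", i), i^2, envir = e)
get("var_3", envir = e)   # 9
```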
This is useful in programmatic pipelines, custom data transformations, or internal helpers where variable names are passed as arguments.
This gives you total control over naming, assignment, and lookup — useful for package internals, simulations, or even domain-specific languages (DSLs).
- get() and assign() allow for string-based variable manipulation.
- Combined with substitute(), they let you safely bridge symbolic expressions and string-based evaluation.
{rlang} Equivalents: env_get(), env_poke(), and Friends
If you’re writing packages or advanced tools, it’s usually better to avoid base R’s get() and assign() in favor of {rlang}’s environment manipulation functions.
rlang::env_get(): Safer Alternative to get()
You can also provide a default value if the variable is missing:
Error in `rlang::env_get()`:
! Can't find `y` in environment.
[1] NA
[1] "only in global"
By default, rlang::env_get() does not walk up the parent environments, unlike get(). If you want it to, set inherit = TRUE.
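A minimal sketch of the default and inherit options, using two hypothetical environments where inner_env’s parent is outer_env (rlang::env() takes one unnamed argument as the parent):

```r
outer_env <- rlang::env(y = "only in outer")
inner_env <- rlang::env(outer_env, x = 10)   # parent is outer_env

rlang::env_get(inner_env, "x")                  # 10
rlang::env_get(inner_env, "z", default = NA)    # NA instead of an error
rlang::env_get(inner_env, "y", inherit = TRUE)  # "only in outer"

# Without inherit = TRUE (or a default), the lookup stops at inner_env
res <- tryCatch(rlang::env_get(inner_env, "y"), error = function(e) e)
inherits(res, "error")   # TRUE
```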
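get_and_print_rlang <- function(name,
                                env = parent.frame(),
                                inherit = FALSE) {
  # Note: exists() searches parent environments by default,
  # so this check can pass even when env_get() below fails
  if (!exists(name, envir = env)) stop("Boo! it is not here")
  # env_get() only looks in parent environments if inherit = TRUE
  val <- rlang::env_get(env = env,
                        nm = name,
                        inherit = inherit)
  print(val)
}
env2 <- rlang::env(y = 2)
x <- 10
get_and_print_rlang("x") # 10
[1] 10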
[1] 100
[1] 10
Error in `rlang::env_get()`:
! Can't find `x` in environment.
[1] 10
rlang::env_poke(): Replacement for assign()
env_poke() sets a binding in an environment.
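A minimal sketch of env_poke() next to the equivalent assign() call, using a fresh illustrative environment e:

```r
e <- rlang::env()

# Create a binding z = 100 inside e
rlang::env_poke(e, "z", 100)
rlang::env_get(e, "z")   # 100

# Poking again overwrites the existing binding
rlang::env_poke(e, "z", 200)
rlang::env_get(e, "z")   # 200

# The equivalent base R call
assign("z", 300, envir = e)
rlang::env_get(e, "z")   # 300
```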
For simple code, rlang::env_poke() and assign() are practically the same. Yet, the former is clearer and more explicit.
env_poke() tells you exactly what it’s doing: modifying the environment e by assigning a value to the name "z". By contrast, assign("z", 100, envir = e) looks more like a string operation than an environment manipulation.
{rlang} treats environments as first-class mutable data structures, much like lists. So, env_poke(e, "z", 100) reads like “poke the value 100 into environment e at key "z"”, making the analogy to lists and dictionaries clearer:
- {rlang} pairs env_poke() with other tools like env_get(), env_has(), and env_names(). Together they form a coherent vocabulary.
- assign() has quirks (e.g., it doesn’t behave well with missing arguments or nested scoping) and lacks symmetry with get() in some edge cases.
- env_poke() only modifies the environment you pass it (unless you opt in with inherit = TRUE). assign() can end up modifying variables in another environment, such as the global one, if you’re not careful with the envir argument.
- {rlang}’s environment tools never touch parent environments unless you explicitly ask them to.
- env_poke() reads cleanly because the first argument is the environment.
env_has() and env_unbind()
Want to check whether a variable exists, or remove one? {rlang} has a function for each.
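A minimal sketch of both functions on an illustrative environment e:

```r
e <- rlang::env(z = 100)

# env_has() is vectorized over names and returns a named logical vector
rlang::env_has(e, c("z", "w"))   # z = TRUE, w = FALSE

# env_unbind() removes the binding
rlang::env_unbind(e, "z")
rlang::env_has(e, "z")           # z = FALSE
```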
Why {rlang} Instead of Base R?
| Task | Base R | {rlang} | Benefit |
|---|---|---|---|
| Get value | get("x", envir = e) | env_get(e, "x") | No surprises, inherits only if you want |
| Set value | assign("x", val, envir = e) | env_poke(e, "x", val) | Cleaner and safer |
| Has binding? | exists("x", envir = e) | env_has(e, "x") | Vectorized and clear |
| Remove binding | rm("x", envir = e) | env_unbind(e, "x") | Safer removal |
When writing user-friendly R packages — especially those involving modeling, plotting, or data manipulation — it’s not enough to evaluate expressions in a custom environment. Often, you want to evaluate expressions as if columns of a data frame were variables, the way dplyr::filter() and ggplot2::aes() do it.
A data mask is an environment that makes the columns of a data frame behave like variables. In practice, it means users can write expressions like mpg > 25 instead of df$mpg > 25, and your function will still understand what they meant.
The data mask lets you evaluate those expressions in a way that prioritizes the columns of the data frame while still allowing access to other objects from parent environments.
This is where data-masked evaluation comes in, and {rlang} provides the perfect tool for this: rlang::eval_tidy().
But first, we need to understand how {rlang}’s tools for capturing and evaluating code differ from base R’s.
enquo() vs. substitute(): Tidy Capture vs. Base Capture
In base R, you use substitute(expr) to capture the unevaluated expression passed to a function. This gives you access to what the user typed, not the result of evaluation:
However, substitute() has no built-in way to capture quosures — expressions plus their environment. This is where rlang::enquo() comes in.
<quosure>
expr: ^x + 1
env: global
- substitute() gives you a raw expression.
- rlang::enquo() gives you a quosure: an expression plus the environment where it was typed.
This matters for tidy evaluation, where functions need to know not just what was written, but also where to evaluate it, especially if variables can be found in different environments.
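A minimal sketch contrasting the two capture styles; capture_base() and capture_tidy() are hypothetical helpers. Nothing is evaluated, so x does not need to exist.

```r
# Base capture: only the expression
capture_base <- function(arg) substitute(arg)

# Tidy capture: the expression plus the environment it came from
capture_tidy <- function(arg) rlang::enquo(arg)

capture_base(x + 1)       # x + 1

q <- capture_tidy(x + 1)
rlang::quo_get_expr(q)    # x + 1
rlang::quo_get_env(q)     # the environment where x + 1 was typed
```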
eval_tidy() vs. eval(): Masked vs. Regular Evaluation
In base R, eval(expr, envir) simply evaluates expr using the environment envir, with no special treatment. But it can fail if the expression requires variables from both the data and the calling environment.
In contrast, rlang::eval_tidy(expr, data, env) evaluates expr in a data mask: a special layered environment where names are looked up first in data (e.g. column names or list elements) and then in env.
This allows expressions to blend data variables and contextual variables naturally, just like in dplyr, ggplot2, or purrr.
data <- list(x = 1:5)
env <- rlang::env(threshold = 3)
expr <- quote(x > threshold)
# This works: x from data, threshold from env
right <- rlang::eval_tidy(expr, data = data, env = env)
# This fails: base R's eval() only sees data, not env,
# and threshold is not in the global environment
eval(expr, envir = data)
Error in eval(expr, envir = data): object 'threshold' not found
# Yet, if we had a threshold in global, eval() would silently
# use it and give the wrong result.
threshold <- 2
wrong <- eval(expr, envir = data)
all.equal(right, wrong)
[1] "1 element mismatch"
# If a variable exists in data, it's used first. Only if it's not
# in data does eval_tidy() look in env
data <- list(x = 1:5,
             threshold = 4)
rlang::eval_tidy(expr, data = data, env = env)
[1] FALSE FALSE FALSE FALSE TRUE
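Here, eval() can’t see threshold at first, because it’s neither in data nor in the global environment; eval() never looks in env at all. Once a global threshold exists, eval() silently picks it up. eval_tidy(), in contrast, builds a layered environment where both x (from data) and threshold (from env) are visible. Notice that if threshold is in data, it is taken from there, not from env.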
eval_tidy() Makes It Easy
Let’s say we want to write a filtering function that behaves like dplyr::filter(), letting users refer to columns and outside variables naturally.
Now this works:
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
mpg cyl disp hp drat wt qsec vs am gear carb
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
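As a minimal sketch of such a function, assuming it captures the argument with rlang::enquo() and evaluates it with rlang::eval_tidy() (my_filter_tidy() is a hypothetical name), the calls below return the same rows as the two outputs above; the second call assumes threshold <- 200.

```r
# Hypothetical sketch of a dplyr::filter()-like helper
my_filter_tidy <- function(data, expr) {
  # Capture the expression together with the environment it was typed in
  expr <- rlang::enquo(expr)
  # Columns of data are found first; other names come from the quosure's env
  rows <- rlang::eval_tidy(expr, data = data)
  data[rows, ]
}

threshold <- 200
my_filter_tidy(mtcars, mpg > 25)         # 6 cars
my_filter_tidy(mtcars, hp > threshold)   # 7 cars
```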
Inside eval_tidy(), {rlang} creates a data mask where:
- mpg is found in data, not in the global environment.
- threshold is found in the parent frame (where the function was called).
Why Not Just Use eval()?
You might wonder: does eval(expr, envir = data) already do what we need?
Sometimes, yes — for simple expressions that only reference columns, it works fine:
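my_filter_eval <- function(data, expr) {
  # Capture the unevaluated expression typed by the user
  expr <- substitute(expr)
  # Evaluate it with the columns of data available as variables
  rows <- eval(expr, envir = data)
  data[rows, ]
}
my_filter_eval(mtcars, mpg > 25)
mpg cyl disp hp drat wt qsec vs am gear carb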
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
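This also works, but only because eval() turns data into a temporary environment whose enclosure is the frame that called it (here, the body of my_filter_eval()). Lookup then continues from that frame to the global environment, which is why threshold is still visible.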
So what’s the issue?
eval() and eval_tidy() Behave Differently: Data Masking in Action
So far, we’ve seen that eval() works fine when evaluating simple expressions in a user-provided environment. But what happens when multiple environments are involved, e.g. the global environment, the function environment, and the data environment?
Let’s see an example where the same symbol exists in three different scopes:
Now we define a function where fruit is redefined again:
Let’s call it with and without data:
When data is provided, eval_tidy() finds fruit in the data mask and returns "banana".
When no data is provided, it looks in the quosure’s environment, that is the env of expr, which is the global environment — so it returns "apple".
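A minimal sketch of this setup, assuming fruit <- "apple" in the global environment, a hypothetical data list fruit_data holding "banana", and a hypothetical with_data_tidy() that defines its own local fruit <- "avocado":

```r
fruit <- "apple"                       # global scope
fruit_data <- list(fruit = "banana")   # data scope

with_data_tidy <- function(expr, data = NULL) {
  fruit <- "avocado"                   # function scope: never seen below
  # The quosure remembers that `fruit` was typed in the caller's environment
  rlang::eval_tidy(rlang::enquo(expr), data = data)
}

with_data_tidy(fruit, data = fruit_data)   # "banana" (data mask wins)
with_data_tidy(fruit)                      # "apple" (quosure environment)
```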
Now try the same with eval():
This is what happens under the hood.
Here’s the difference:
substitute(expr) captures the symbol fruit.
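eval(expr, envir = data) turns data (if it is a list) into a temporary environment whose enclosure is the frame that called eval(), i.e. the body of with_data_eval(). When data is NULL, eval() treats it as an empty list, so lookup goes straight to that frame.
So when data is NULL, the lookup falls back to the local fruit <- "avocado", not the global "apple".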
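For comparison, a sketch of the eval() version under the same assumptions; the body of with_data_eval() is a guess at the shape described above, not the post’s original code.

```r
fruit <- "apple"
fruit_data <- list(fruit = "banana")

with_data_eval <- function(expr, data = NULL) {
  fruit <- "avocado"
  expr <- substitute(expr)
  # enclos defaults to parent.frame(), i.e. this function's own frame
  eval(expr, envir = data)
}

with_data_eval(fruit, data = fruit_data)   # "banana"
with_data_eval(fruit)                      # "avocado", not "apple"
```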
eval() is frame-dependent and unpredictable when multiple layers of scoping are involved.
eval_tidy() uses a clean, layered lookup system:
- .data (the data mask)
- .env (the quosure’s environment)
This guarantees consistent and predictable behavior for user-written expressions.
get() and Friends
So far, we’ve worked with static or captured expressions. But what if you need to look up a variable by name, programmatically? This is where get(), as.name(), and their {rlang} equivalents come in.
get() – Retrieve an Object by Name
The base R function get() retrieves the value of a variable by string name, from a specified environment.
This returns 100 because "x" is looked up starting from the environment get() is called from (here, the global environment), which is get()’s default envir.
You can change the lookup environment explicitly:
This kind of lookup is useful when you’re writing generic tools that receive the name of a variable and need to fetch its value dynamically.
as.name() – Build a Symbol from a String
Sometimes you want to construct an expression from variable names provided as strings. as.name() turns a string into a symbol, the building block of an R expression. Backticks, as in `var`, only let you write a symbol literally; they can’t turn a string stored in a variable into a symbol.
x
var
[1] 101
[1] "x"
quote(var) captures the symbol var literally — it doesn’t evaluate var to get “x”. That’s why we need as.name(var).
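A minimal sketch of the difference, assuming x <- 100 and var <- "x":

```r
x <- 100
var <- "x"

quote(var)               # the symbol var (not x)
as.name(var)             # the symbol x

eval(as.name(var)) + 1   # 101: evaluates the symbol x
eval(quote(var))         # "x": evaluates the symbol var
```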
This is how you build dynamic expressions:
hp > 200
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[25] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
This evaluates hp > 200 inside mtcars.
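One base R way to build such an expression (not necessarily the post’s original approach) is call(), which assembles a call from a function name and its arguments:

```r
var <- "hp"
threshold <- 200

# call() builds a call from a function name plus its arguments
expr <- call(">", as.name(var), threshold)
expr                  # hp > 200

rows <- eval(expr, envir = mtcars)
sum(rows)             # 7 cars have hp > 200
```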
{rlang}: sym() and parse_expr()
While base R tools like as.name() and parse() work well for simple metaprogramming, {rlang} provides cleaner, safer, and more composable alternatives: sym() and parse_expr().
sym() – Safer Alternative to as.name()
rlang::sym() turns a character string into a symbol, just like as.name(), but is designed to integrate seamlessly with {rlang}’s metaprogramming toolkit.
Like as.name(), this converts the string "x" into the symbol x, which can be evaluated in the usual way.
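A minimal sketch showing that rlang::sym() and as.name() produce the same symbol:

```r
x <- 5
s <- rlang::sym("x")

s                             # the symbol x
identical(s, as.name("x"))    # TRUE
eval(s)                       # 5
```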
You can also use it inside substitute() or bquote():
threshold <- 200
var <- "hp"
expr <- substitute(v > t, list(v = rlang::sym(var),
t = threshold))
expr
hp > 200
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[25] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
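The same expression can be built with bquote(), where .() marks the parts of the template to replace. A minimal sketch:

```r
threshold <- 200
var <- "hp"

# .() inserts the symbol and the value into the template
expr <- bquote(.(rlang::sym(var)) > .(threshold))
expr                               # hp > 200
sum(eval(expr, envir = mtcars))    # 7
```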
At this point, sym() is just a drop-in replacement for as.name(), but when we move to building more advanced programmatic expressions, its advantages will become more evident.
parse_expr() – Cleaner Alternative to parse(text = ...)
In base R, parse() turns a string into an expression vector (a list-like container of calls), which is slightly awkward:
[1] 1
hp > 200
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[25] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
rlang::parse_expr() returns a single expression rather than an expression vector, so it’s cleaner and more consistent:
hp > 200
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[25] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
You get the same result, with fewer surprises and better compatibility with {rlang} tools down the road.
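A minimal side-by-side sketch: with parse() you extract the first element of the expression vector yourself, while parse_expr() hands you the call directly.

```r
txt <- "hp > 200"

# Base R: parse() returns an expression vector
parsed <- parse(text = txt, keep.source = FALSE)
is.expression(parsed)   # TRUE
length(parsed)          # 1
base_expr <- parsed[[1]]

# rlang: parse_expr() returns the call itself
tidy_expr <- rlang::parse_expr(txt)

identical(base_expr, tidy_expr)        # TRUE
sum(eval(tidy_expr, envir = mtcars))   # 7
```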
!! and expr() – A Better Way to Inject Variables into Expressions
In base R, if you want to create an expression like hp > 200 programmatically, you typically use substitute() with a list of replacement values.
Suppose you want to write a function that filters a data frame based on a variable and a cutoff — but both are provided as arguments:
filter_by_name_base <- function(data, varname, cutoff) {
var_sym <- as.name(varname)
expr <- substitute(v > t, list(v = var_sym, t = cutoff))
rows <- eval(expr, envir = data)
data[rows, ]
}
filter_by_name_base(mtcars, "hp", 200)
mpg cyl disp hp drat wt qsec vs am gear carb
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
This works, but the substitute()/list() bookkeeping becomes harder to read and maintain as the expression grows more complex.
{rlang} Version of the filter_by_name()
rlang::expr() lets you build expressions just like writing them by hand. And !! (called bang-bang) lets you inject programmatic values directly into those expressions.
filter_tidy <- function(data, varname, cutoff) {
var_sym <- rlang::sym(varname)
expr <- rlang::expr(!!var_sym > !!cutoff)
rows <- rlang::eval_tidy(expr, data)
data[rows, ]
}
filter_tidy(mtcars, "hp", 200)
mpg cyl disp hp drat wt qsec vs am gear carb
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Both functions return the same result, but the {rlang} version is cleaner, composable, and plays well with the rest of the tidyverse metaprogramming tools:
- No substitute() and list() bookkeeping: you write the expression as you would by hand and inject values with !!.
!!! – Splicing Multiple Arguments into an Expression
If !! lets you inject a single value or symbol into an expression, then !!! lets you inject a list of values, as if you’d written them out one by one.
This is called unquote-splicing, and it’s especially useful when building calls with a variable number of arguments.
In base R, if you want to programmatically build mean(x, na.rm = TRUE), you’d do something like this:
mean(x = x, na.rm = TRUE)
[1] 1.5
It works — but it’s awkward and hard to read.
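One way to do this in base R (a sketch, not necessarily the post’s original code) is to prepend the function name to the argument list and convert it with as.call():

```r
x <- c(1, 2, NA)
args <- list(x = quote(x), na.rm = TRUE)

# Prepend the function name, then turn the list into a call
call_base <- as.call(c(as.name("mean"), args))
call_base         # mean(x = x, na.rm = TRUE)
eval(call_base)   # 1.5
```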
{rlang} with !!!
With rlang, this becomes more readable:
mean(x = x, na.rm = TRUE)
[1] 1.5
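A minimal sketch of the {rlang} version, assuming the same x and a named argument list:

```r
x <- c(1, 2, NA)
args <- list(x = rlang::sym("x"), na.rm = TRUE)

# !!! splices each list element in as a separate (named) argument
call_tidy <- rlang::expr(mean(!!!args))
call_tidy         # mean(x = x, na.rm = TRUE)
eval(call_tidy)   # 1.5
```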
!!! splices the contents of the list into the call to mean().
Non-Standard Evaluation gives you tremendous power, but with power comes the responsibility to handle user input safely. NSE functions often delay evaluation, manipulate environments, or work with variable names as symbols. That means things can fail in subtle ways.
In this section, we’ll learn how to write robust, user-friendly functions.
Suppose you write a dynamic filter like this:
bad_filter <- function(data, var, cutoff) {
expr <- rlang::expr(!!rlang::sym(var) > !!cutoff)
rows <- rlang::eval_tidy(expr, data)
data[rows, ]
}
bad_filter(mtcars, "mpg", 25) # Works
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Error: object 'not_a_column' not found
This is hard for users to debug: the message says an object wasn’t found, but not that not_a_column is missing from the columns of mtcars.
Let’s check that var exists in the data:
safe_filter <- function(data, var, cutoff) {
if (!var %in% names(data)) {
rlang::abort(
message = glue::glue("Variable '{var}' not found in data."),
class = "invalid_column"
)
}
expr <- rlang::expr(!!rlang::sym(var) > !!cutoff)
rows <- rlang::eval_tidy(expr, data)
data[rows, ]
}
safe_filter(mtcars, "gear", 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
Error in `safe_filter()`:
! Variable 'not_a_column' not found in data.
rlang::abort() gives you structured errors with a custom class, which can be caught or logged. We covered this in a previous post.
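A minimal sketch of catching that structured error by its class. The function is repeated (with paste0() instead of glue::glue(), so it only needs rlang) to keep the block self-contained.

```r
safe_filter <- function(data, var, cutoff) {
  if (!var %in% names(data)) {
    rlang::abort(
      message = paste0("Variable '", var, "' not found in data."),
      class = "invalid_column"
    )
  }
  rows <- rlang::eval_tidy(rlang::expr(!!rlang::sym(var) > !!cutoff), data)
  data[rows, ]
}

# Handle only the invalid_column class; any other error still propagates
result <- tryCatch(
  safe_filter(mtcars, "not_a_column", 25),
  invalid_column = function(cnd) {
    message("Caught: ", conditionMessage(cnd))
    NULL
  }
)
```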
Sometimes you want your function to fall back to a default variable if none is provided.
default_filter <- function(data, var = NULL, cutoff = 0) {
var <- if (is.null(var)) {
"mpg" # fallback to "mpg" if NULL
} else {
var
}
if (!var %in% names(data)) {
rlang::abort(
message = glue::glue("Variable '{var}' not found in data."),
class = "invalid_column"
)
}
expr <- rlang::expr(!!rlang::sym(var) > !!cutoff)
rows <- rlang::eval_tidy(expr, data)
data[rows, ]
}
default_filter(mtcars, "gear", 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
Error in `default_filter()`:
! Variable 'not_a_column' not found in data.
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Wrap the evaluation in tryCatch() and throw helpful messages:
robust_filter <- function(data, var, cutoff) {
expr <- rlang::expr(!!rlang::sym(var) > !!cutoff)
tryCatch(
{
rows <- rlang::eval_tidy(expr, data)
data[rows, ]
},
error = function(e) {
rlang::abort(
glue::glue("Filtering failed for column '{var}':
{e$message}"),
class = "filter_error"
)
}
)
}
robust_filter(mtcars, "gear", 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
Error in `value[[3L]]()`:
! Filtering failed for column 'not_a_column':
object 'not_a_column' not found
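Notice the error header points at `value[[3L]]()`, the handler inside tryCatch(), rather than robust_filter(). One possible refinement, sketched with a hypothetical robust_filter2() and assuming rlang 1.0 or later (where abort() has call and parent arguments), is to capture the function’s own environment before entering tryCatch(), pass it as call, and keep the original error as the parent cause.

```r
robust_filter2 <- function(data, var, cutoff) {
  # Capture this function's execution environment before entering tryCatch()
  fn_env <- rlang::current_env()
  expr <- rlang::expr(!!rlang::sym(var) > !!cutoff)
  tryCatch(
    {
      rows <- rlang::eval_tidy(expr, data)
      data[rows, ]
    },
    error = function(e) {
      rlang::abort(
        paste0("Filtering failed for column '", var, "'."),
        parent = e,            # report the original error as the cause
        class = "filter_error",
        call = fn_env          # attribute the error to robust_filter2()
      )
    }
  )
}

cnd <- rlang::catch_cnd(robust_filter2(mtcars, "not_a_column", 1))
class(cnd)
```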
When writing user-facing functions, it’s often helpful to accept both:
- var = mpg (unquoted, NSE)
- var = "mpg" (quoted, SE)
This makes your function more intuitive and versatile, just like library("dplyr") and library(dplyr) both work.
To support this, we’ll combine rlang::enquo() (to capture NSE input) with logic to convert strings to symbols when needed.
smart_filter <- function(data, var, cutoff) {
# Capture the input
quo <- rlang::enquo(var)
expr <- rlang::quo_get_expr(quo)
# Determine symbol from input
if (rlang::is_symbol(expr)) {
var_sym <- expr
} else if (rlang::is_string(expr)) {
var_sym <- rlang::sym(expr)
} else {
rlang::abort("`var` must be a column name (unquoted) or a string.")
}
# Check if column exists
if (!as.character(var_sym) %in% names(data)) {
rlang::abort(glue::glue("Column '{var_sym}' not found in data."))
}
# Build and evaluate the expression
expr <- rlang::expr(!!var_sym > !!cutoff)
rows <- rlang::eval_tidy(expr, data)
data[rows, ]
}
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
All of these return the same filtered data frame.
This function allows the user to pass the column name as a bare symbol (e.g. mpg) or as a string (e.g. "mpg"), just like many tidyverse functions do.
How does it work?
rlang::enquo(var) captures the input as a quosure — preserving both the expression and its environment.
rlang::quo_get_expr() retrieves the raw expression typed by the user.
We test:
- If it’s a symbol (e.g. mpg), we use it directly.
- If it’s a string (e.g. "mpg"), we convert it to a symbol using rlang::sym().
- Otherwise, we abort with an informative error.
We then build an expression with rlang::expr(!!var_sym > !!cutoff) and evaluate it in the data context using rlang::eval_tidy().
This gives you the best of both worlds: intuitive NSE behavior and robust programmatic support.
When writing metaprogramming-heavy functions, especially those using tidy evaluation, defensive coding is essential. This means writing functions that check what kind of input they received and fail early with clear, contextual messages.
Below we explore key techniques to bulletproof your NSE functions.
Programming is about manipulating values.
Metaprogramming is about manipulating the code that manipulates values.
Use rlang::is_symbol(), is_call(), is_quosure(), etc., to check exactly what kind of object you’re working with.
safe_summary <- function(data = NULL, expr) {
expr <- rlang::enquo(expr)
if (!rlang::is_call(rlang::get_expr(expr))) {
rlang::abort("`expr` must be a function call like `mean(x)` or `sum(x)`.")
}
result <- rlang::eval_tidy(expr, data = data)
result
}
# Fails early with a clean message
safe_summary(mtcars, mpg)
Error in `safe_summary()`:
! `expr` must be a function call like `mean(x)` or `sum(x)`.
[1] 20.09062
.env and .data Pronouns for Clarity
If your function uses both external variables and a data mask, ambiguity can arise. The solution is to use pronouns to avoid variable name collisions:
safe_mean <- function(data, var) {
  var <- rlang::enquo(var)
  # .data[[ ]] expects a column name as a string, so convert the captured symbol
  rlang::eval_tidy(rlang::expr(mean(.data[[!!rlang::as_name(var)]])), data = data)
}
safe_mean(mtcars, mpg)
[1] 20.09062
This ensures you’re only using the data column, even if the same name exists in .env.
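A minimal sketch of a name collision, assuming a column x and an outside variable x: without pronouns the column wins, while .data$x and .env$x make each source explicit.

```r
df <- data.frame(x = 1:3)
x <- 100   # an outside variable with the same name as a column

# Without pronouns, x resolves to the column (the data mask wins)
rlang::eval_tidy(rlang::quo(x + 1), data = df)              # 2 3 4

# With pronouns, each x states where it comes from
rlang::eval_tidy(rlang::quo(.data$x + .env$x), data = df)   # 101 102 103
```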
rlang::abort()
You can make your error messages contextual by using:
- caller_env() to find where the error occurred
- caller_call() to name the offending function
Imagine you’re writing a helper function, but you want the error message to point back to the user-facing function that called it, not the helper itself.
Now try calling my_summarize() incorrectly:
Notice that the error is pointing to my_summarize(), not to validate_var() even though the error is triggered there. This is useful if you need to point the user to the function they called — not the internal one that failed.
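A minimal sketch of the pattern, assuming rlang 1.0 or later; validate_var() and my_summarize() are hypothetical stand-ins for the functions named above. The helper takes call = rlang::caller_env() and forwards it to abort(), so the error is attributed to my_summarize().

```r
# Internal helper: reports errors on behalf of whoever called it
validate_var <- function(data, var, call = rlang::caller_env()) {
  if (!var %in% names(data)) {
    rlang::abort(
      paste0("Column '", var, "' not found in data."),
      call = call
    )
  }
  invisible(var)
}

# User-facing function
my_summarize <- function(data, var) {
  validate_var(data, var)
  mean(data[[var]])
}

my_summarize(mtcars, "mpg")
try(my_summarize(mtcars, "not_a_column"))   # error names my_summarize()
```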
caller_env() to Evaluate in the User’s Environment
Imagine you’re writing a helper function that receives an expression (not a string), and you want to evaluate it in the environment of the user-facing function:
# Helper that evaluates an expression in the caller's environment
resolve_expr <- function(expr_quo) {
expr_raw <- rlang::get_expr(expr_quo)
caller_env <- rlang::caller_env()
val <- rlang::eval_tidy(expr_raw, env = caller_env)
cli::cli_inform("Resolved expression {.code {deparse(expr_raw)}} to {.val {val}}")
cli::cli_inform("Evaluated in environment: {.emph {rlang::env_label(caller_env)}}")
val
}
# Now use it in a user-facing wrapper:
my_resolver <- function(var) {
var_quo <- rlang::enquo(var)
x <- 3
resolve_expr(var_quo)
}
Try it out: