Why is "cmd | xargs" to join lines of cmd with spaces wrong

Question

We often see code such as:

cmd | xargs

or even:

cmd2 $(cmd | xargs)

in an attempt to join the lines of cmd's output with spaces (in the second case, so as to construct a command line for cmd2), or to trim whitespace.

It is however wrong in the general case. Why? What can I do instead?

It never actually occurred to me to use xargs this way. Still seems like a useful FAQ, though.
– Karl Knechtel, Commented 2 days ago

Jeff Schaller · Accepted Answer · 2025-12-07 15:10:29Z

xargs (cross args) is the tool that takes words on stdin and runs a command (echo by default) with those words as arguments, running that command as many times as necessary. It's not a tool designed to do text processing.

Using cmd | xargs to join lines of cmd's output with spaces is wrong for several reasons:

1. xargs doesn't split its input on lines¹; it uses a complex parsing similar to the syntax of the Mashey shell from the mid-70s, with which it is contemporary.
2. echo is a clunky and very non-portable tool that can't be used for arbitrary data.
3. echo may be run several times, meaning not all the words will be joined with spaces.
4. The execve() system call and the xargs utility have a number of limitations.
5. cmd2 $(cmd | xargs) adds a few more layers of wrongness.

1. xargs' splitting

xargs was first introduced in PWB Unix in the mid-70s, where the Mashey shell (also called the PWB shell) was also developed.

So it parses its input similarly to how the Mashey shell would tokenise its syntax back then.

In particular, it handles quoting and backslash escaping the same way as the Mashey shell, which is different from the syntax of any modern shell.

As an example, an input such as:

  "A\" B\ C 'D\' E\
F _ G

is split into:

A\ (backslash not special within double quotes)
B C
D\
E<newline>F (backslash being the only way to escape a newline)

And with some xargs implementations, also:

_
G

That is because _ is the default end-of-file string in the original implementation, and still is in some implementations today, as allowed by POSIX.

And the result is passed as separate arguments to echo which prints them space separated.

So it should be obvious from the above that it can't join arbitrary lines.

And if used for whitespace trimming, it will also mangle its input if it contains quotes, backslashes or possibly _, or return an error if there are unmatched ' or " characters.

Example:

$ xargs <<< "  It's wrong"
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
$ xargs <<< "  It's still wrong,    isn't   it? "
Its still wrong,    isnt it?

(note the apostrophes are gone, the spacing between wrong, and isn't is preserved as single-quoted, but not the one between isn't and it).
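To look at the tokenisation directly, one option is to have xargs run printf instead of the default echo, so that each resulting argument is shown between angle brackets. This is only a sketch with made-up sample input. The output noted in the comments is what the quoting rules described above give; an implementation that treats _ or other input specially may differ.

```shell
# Two input lines containing apostrophes, a backslash and double quotes.
# xargs runs printf once here, and printf reuses its format for every argument.
out=$(printf '%s\n' "It's wrong,    isn't   it?" 'a\ b "c  d"' |
  xargs printf '<%s>\n')
printf '%s\n' "$out"
# <Its wrong,    isnt>
# <it?>
# <a b>
# <c  d>
```

Two lines went in and four arguments came out, none of them matching an original line.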

2. `echo` can't be used for arbitrary data.

xargs runs echo by default.

That's generally the standalone echo executable² which it runs in a separate process, which may behave differently from the echo builtin of your shell.

Why it can't be used to output arbitrary data is explained at great length at Why is printf better than echo?. Some examples:

$ xargs <<< -Eene
$

On my system, echo (here /bin/echo which xargs finds through a search in the directories of $PATH) is not UNIX compliant by default in that it accepts options. That -Eene line of input ends up taken as an option by echo, and the -n one tells it to not output a newline which explains why I get no output at all, not even an empty line.

That particular implementation of echo can be told to be UNIX compliant by passing a POSIXLY_CORRECT variable in the environment (with any value):

$ POSIXLY_CORRECT=please xargs <<< -Eene
-Eene

But a UNIX-compliant echo is also required to process backslash sequences:

$ POSIXLY_CORRECT=please xargs <<< ' c:\\users\\chris '
c:\users

(I had to double the \s to work around xargs treating them as an escape operator, as per point 1 above).

Here echo is passed a c:\users\chris argument, but for a standard echo, \c is the instruction to stop outputting (\c being one of many escapes that it supports, the list of which varies with the implementation³).

Your mileage may vary depending on which echo implementation xargs finds on your system.
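For comparison, here is a minimal sketch of printing the same problematic strings with printf and a fixed '%s\n' format, which is the usual replacement suggested for echo. The helper name say is hypothetical and only there for illustration.

```shell
# say is a hypothetical helper: it prints each of its arguments verbatim,
# one per line. The data is never used as the format or as an option.
say() {
  printf '%s\n' "$@"
}

say '-Eene'            # prints -Eene (not taken as options)
say 'c:\users\chris'   # prints c:\users\chris (\c and \u are left alone)
```

Only the format string is interpreted by printf; what is passed for %s is output as is.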

3, 4. `xargs` may run the command several times

On most systems, the execve() system call that is used to execute a command with an array of arguments and another array of environment variables has a limit on the cumulative size of those arrays.

And that's where xargs can be useful in its normal usage: xargs cmd will run cmd as many times as necessary to process the list of words on input, working around that execve() system call limitation.

That means, however, that using it to join lines fails as soon as there are more than a few words to join. And where the list is split depends on the xargs implementation, the size of the environment, the OS and its version, and, on modern versions of Linux, also on the stack size limit.

For example on Ubuntu 24.04 here:

$ seq 100000 | gnu/xargs | wc -l
5
$ seq 100000 | busybox xargs | wc -l
20
$ seq 100000 | toybox xargs | wc -l
8
$ seq 100000 | ast-open/xargs | wc -l
1
$ seq 100000 | heirloom/xargs | wc -l
13

And that gives different values after export VAR{1..1000}=$(printf %01900d) or ulimit -s 1024.
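The same effect can be reproduced on a small scale with the standard -n option, which caps the number of arguments per invocation. This is just an illustration with an artificially tiny limit, not what any system does by default.

```shell
# With at most 4 arguments per invocation, the default echo command is run
# three times for ten words, so the "joined" output spans three lines.
out=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | xargs -n 4)
printf '%s\n' "$out"
# 1 2 3 4
# 5 6 7 8
# 9 10
```

With real limits the split merely happens later, at a point that is hard to predict.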

Command line arguments are NUL-delimited strings so cannot contain NUL bytes.

On Linux, a single command line argument cannot be larger than 128KiB.

$ printf %0200000d | xargs
xargs: argument line too long

xargs expects text on input (and is meant to honour the locale to determine whether a character is classified as blank when deciding whether to treat it as a separator). So it may choke if some lines of input are larger than LINE_MAX bytes (which POSIX allows to be as low as 2048), contain NULs (which it can't pass to echo anyway, as seen above) or cannot be decoded as text as per the locale's encoding (LC_ALL=C would help in that last case).

5. the case of `cmd2 $(cmd | xargs)`

We often see this type of usage by people coming from the Microsoft world. On Microsoft OSes, you execute commands by passing them a single string which the application interprets, usually by splitting it on blanks, often interpreting some form of quoting, sometimes doing globbing themselves.

But on Unix-like systems, executed commands are directly passed a list of arguments (the one discussed above for the execve() system call).

It's the job of the shell to take a command line, which effectively is code in the syntax of the shell language, and invoke commands with an array of arguments resulting from the parsing of that shell code / command line.

So doing the concatenation with space here is doing things backward, and more like what you'd do (unreliably) on Microsoft OSes.

What you want here is cmd2 to be passed the lines of cmd's output as separate arguments. As it happens $(cmd) gets the output of cmd and also splits it (and btw that has nothing to do with the parsing of the shell language into whitespace separated tokens), not on spaces, but on characters of $IFS, and by default, $IFS happens to contain both space and newline.

So, here using xargs in a broken attempt to join the lines with spaces is pointless given that $(...) will split on newline just as well as on spaces by default.

Here, rather, you'd want to make sure $IFS contains only the newline character, and you'd want to disable globbing which is the other side effect of leaving $(...) unquoted (except in zsh):

IFS=$'\n'
set -o noglob
cmd2 $(cmd)

Or use dedicated operators of your shell to get the lines of the output of cmd such as

zsh
```
cmd2 ${(f)"$(cmd)"}
```

bash

readarray -t lines < <(cmd)
cmd2 "${lines[@]}"
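As a quick check of the portable IFS and noglob variant, here is a sketch in which cmd and cmd2 are hypothetical stand-in functions rather than real commands. The changes are confined to a command substitution so they don't leak into the rest of the script, and IFS is set with a literal newline so the sketch also runs in shells without $'...'.

```shell
# Hypothetical stand-ins: cmd prints three lines (one containing spaces, one
# being a glob pattern), and cmd2 shows each argument it receives.
cmd() { printf '%s\n' 'first line' '*' 'third  line'; }
cmd2() { printf '<%s>\n' "$@"; }

out=$(
  # IFS is set to a single literal newline (same value as $'\n').
  IFS='
'
  set -o noglob   # prevent * from being expanded to file names
  cmd2 $(cmd)     # unquoted on purpose: split on newline only
)
printf '%s\n' "$out"
# <first line>
# <*>
# <third  line>
```

Note that with this kind of splitting, empty lines of output are discarded because newline is an IFS whitespace character.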

More correct ways to join lines with spaces

That's the typical job for the self-pasting mode of the paste utility:

cmd | LC_ALL=C paste -s -d ' ' -

(note the - which signifies stdin and is required by POSIX, though some implementations allow you to omit it).

That will work regardless of the lengths of the lines on input, and thanks to LC_ALL=C even if the input is not valid text in the locale. Some paste implementations will still choke on NUL bytes in input though.

Note that it always produces one line of output even if the input is empty (except with busybox paste).
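A short sketch of that paste command on input that xargs would mangle or reject (the sample lines are made up for illustration):

```shell
# Lines with an apostrophe, double quotes, backslashes and a leading dash
# are joined as is, since paste does no quote or escape processing.
joined=$(printf '%s\n' "It's" 'a "test"' 'c:\users\chris' '-n' |
  LC_ALL=C paste -s -d ' ' -)
printf '%s\n' "$joined"
# It's a "test" c:\users\chris -n
```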

Or use perl:

cmd | perl -lpe '$\ = eof ? "\n" : " "'

(this has no limitation other than that each input line has to fit in memory).

In any case, once you've joined all those lines of input into one long line of output, if that line is greater than LINE_MAX bytes, you may find that text utilities fail to process it correctly.

More correct ways to trim leading and trailing whitespace of all lines

That's covered at How do I trim leading and trailing whitespace from each line of some output?

Reliably and considering only ASCII whitespace:

cmd | perl -lpe 's/^\s*|\s*$//g'

¹ Though the GNU implementation of xargs has -d '\n' to split on newline only and not handle any escaping/quoting, as well as -0 to split on NUL characters. The latter has been copied by many other xargs implementations and has been standardised by POSIX, but AFAIK, to this day -d is still specific to GNU xargs.

² busybox' xargs, depending on version and build option, can actually invoke its own echo applet without forking any process and executing a standalone echo executable in it. In current versions at least, it still splits the list of args into several invocations. ast-open's xargs implements the echo internally (with no option nor backslash processing).

³ In particular, that \u above is treated the same as \u0 or \u0000 in some and outputs a NUL character (U+0000).

I'm not sure what the exact specification of “join lines with spaces” is, but it seems close to the simple tr '\n' ' ', apart from the final newline (which doesn't matter if it's in a command substitution anyway). As a bonus, tr -s '\n' ' ' removes blank entries, if that's desired.
– Gilles 'SO- stop being evil', Commented 2 days ago
@Joshua Aside from the GNU implementation, which other one accepts -d?
– Fravadona, Commented yesterday
@Gilles'SO-stopbeingevil', tr '\n' ' ' leaves a space at the end whether in command substitution or not. If that command substitution is unquoted and in list context, then that trailing space would be ignored, but using tr in that case would be pointless as both \n and space are in the default value of $IFS used to do that splitting.
– Stéphane Chazelas, Commented yesterday
