Awk Command in Linux with Examples


Awk is a powerful text-processing language designed for extracting, transforming, and reporting data. It is one of the most versatile command-line tools available on Linux and Unix systems, capable of handling everything from simple column extraction to complex data analysis.

Unlike most procedural programming languages, awk is data-driven. You define a set of rules consisting of patterns and actions, and awk applies them to each line of input automatically. This guide covers awk syntax, patterns, actions, built-in variables, string functions, arrays, and practical examples.

For a quick reference, see the Awk cheatsheet.

How awk Works

There are several different implementations of awk. We will use the GNU implementation of awk, which is called gawk. On many Linux systems, awk points to gawk, while on others it may point to a different implementation such as mawk.

Records and Fields

Awk processes textual data files and streams. The input data is divided into records and fields. Awk operates on one record at a time until the end of the input is reached. Records are separated by a character called the record separator. The default record separator is the newline character, meaning each line in the text data is a record. A new record separator can be set using the RS variable.

Records consist of fields that are separated by the field separator. By default, fields are separated by whitespace, including one or more tab, space, and newline characters.

The fields in each record are referenced by the dollar sign ($) followed by the field number beginning with 1. The first field is represented with $1, the second with $2, and so on. The last field can also be referenced with $NF, because the built-in variable NF holds the number of fields in the current record. The entire record can be referenced with $0.

Here is a visual representation showing how to reference records and fields:

txt
tmpfs      788M  1.8M  786M   1% /run/lock
/dev/sda1  234G  191G   31G  87% /
|-------|  |--|  |--|   |--| |-| |--------|
   $1       $2    $3     $4   $5  $6 ($NF) --> fields
|-----------------------------------------|
                    $0                     --> record
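
As a quick check of the references shown above, here is a minimal sketch that feeds one line of the sample to awk and prints the field count, the first field, the last field, and the whole record. The shell variable names line and fields_out are only illustrative.

```shell
# One record taken from the sample above.
line='/dev/sda1  234G  191G   31G  87% /'

# NF is the number of fields, so $NF is the last field; $0 is the whole record.
fields_out=$(printf '%s\n' "$line" | awk '{ print NF; print $1; print $NF; print $0 }')
printf '%s\n' "$fields_out"
```

The record is printed unchanged by $0, including its original spacing, because no field was modified.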

Awk Program

To process text with awk, you write a program that tells the command what to do. The program consists of a series of rules and optional user-defined functions. Each rule contains a pattern and action pair. Rules are separated by newlines or semicolons (;). A typical awk program looks like this:

txt
pattern { action }
pattern { action }
...

When awk processes data, if the pattern matches the record, it performs the specified action on that record. When a rule has no pattern, every input record is matched. When a rule has no action, it defaults to printing the entire record.
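
The two defaults described above can be sketched with a few lines of made-up input. The data and the variable names pattern_only and action_only are assumptions used only for illustration.

```shell
# Pattern only: there is no action, so each matching record is printed in full.
pattern_only=$(printf 'alpha 1\nbeta 20\ngamma 3\n' | awk '$2 > 2')

# Action only: there is no pattern, so the action runs for every record.
action_only=$(printf 'alpha 1\nbeta 20\ngamma 3\n' | awk '{ print $1 }')

printf '%s\n' "$pattern_only" "$action_only"
```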

An awk action is enclosed in braces ({}) and consists of statements. Each statement specifies the operation to be performed. Multiple statements are separated by newlines or semicolons (;).

When writing awk programs, everything after the hash mark (#) and until the end of the line is considered a comment. Long lines can be broken into multiple lines using the continuation character, backslash (\).
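
Here is a small sketch of a multi-line program that uses comments and a backslash continuation. The input lines and the variable name comment_out are made up for the example.

```shell
comment_out=$(printf 'a 1\nb 2\nc 3\n' | awk '
# Rule 1: no pattern, so this runs for every record.
{ print "seen:", $1 }

# Rule 2: a long statement continued on the next line with a backslash.
$2 >= 2 { print "large:", \
                $1 }   # trailing comments are allowed too
')
printf '%s\n' "$comment_out"
```

Both rules are tested against each record in the order they are written, so a record that matches the second rule produces two lines of output.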

Executing Awk Programs

An awk program can be run in several ways. If the program is short and simple, it can be passed directly to the awk interpreter on the command line:

sh
awk 'program' input-file...

When running the program on the command line, it should be enclosed in single quotes ('') so the shell does not interpret the program.

If the program is large and complex, it is best to put it in a file and use the -f option to pass the file to the awk command:

sh
awk -f program-file input-file...

In the examples below, we will use a file named “teams.txt” that looks like the one below:

txt
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585

Awk Syntax

The general syntax of the awk command is:

txt
awk [OPTIONS] 'pattern { action }' file

The most commonly used options are:

  • -F — Sets the input field separator (same as setting FS inside the program).
  • -v var=value — Assigns a value to a variable before the program begins executing.
  • -f program-file — Reads the awk program from a file instead of the command line.

Awk can also read input from standard input (stdin) when no file is specified. This makes it useful in pipelines:

sh
command | awk '{ print $1 }'

Awk Patterns

Patterns control whether the associated action is executed. Awk supports several types of patterns, including regular expressions, relational expressions, ranges, and special patterns.

When the rule has no pattern, each input record is matched. Here is an example of a rule containing only an action:

Terminal
awk '{ print $3 }' teams.txt

The program prints the third field of each record:

output
60
58
51
49
48

Regular Expression Patterns

A regular expression or regex is a pattern that matches a set of strings. Awk regular expression patterns are enclosed in slashes (//):

txt
/regex pattern/ { action }

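The most basic example is a literal character or string matching. Because an unescaped dot in a regex matches any single character, escape it with a backslash to match a literal dot. To display the first field of each record that contains “0.5”, run the following command:

Terminal
awk '/0\.5/ { print $1 }' teams.txt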
output
Celtics
Pacers

The pattern can be any type of extended regular expression. Here is an example that prints the first field if the record starts with two or more digits:

Terminal
awk '/^[0-9][0-9]/ { print $1 }' teams.txt
output
76ers

Relational Expression Patterns

Relational expression patterns are generally used to match the content of a specific field or variable.

By default, regular expression patterns are matched against the entire record. To match a regex against a single field, specify the field and use the match operator (~) followed by the pattern.

For example, to print the first field of each record whose second field contains “ia”:

Terminal
awk '$2 ~ /ia/ { print $1 }' teams.txt
output
76ers
Pacers

To match fields that do not contain a given pattern, use the !~ operator:

Terminal
awk '$2 !~ /ia/ { print $1 }' teams.txt
output
Bucks
Raptors
Celtics

You can compare strings or numbers for relationships such as greater than, less than, or equal. The following command prints the first field of all records whose third field is greater than 50:

Terminal
awk '$3 > 50 { print $1 }' teams.txt
output
Bucks
Raptors
76ers

Range Patterns

Range patterns consist of two patterns separated by a comma:

txt
pattern1, pattern2

A range pattern matches every record from one that matches the first pattern through the next record that matches the second pattern, inclusive.

Here is an example that prints the first field of all records starting from the record including “Raptors” until the record including “Celtics”:

Terminal
awk '/Raptors/,/Celtics/ { print $1 }' teams.txt
output
Raptors
76ers
Celtics

The patterns can also be relational expressions. The command below prints all records starting from the one whose fourth field is equal to 31 until the one whose fourth field is equal to 33:

Terminal
awk '$4 == 31, $4 == 33 { print $0 }' teams.txt
output
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598

Range patterns cannot be combined with other pattern expressions.

Special Expression Patterns

Awk includes the following special patterns:

  • BEGIN — Used to perform actions before records are processed.
  • END — Used to perform actions after records are processed.

The BEGIN pattern is commonly used to initialize variables, and the END pattern to report results computed from the records, such as totals.

The following example prints “Start Processing.”, then prints the third field of each record, and finally “End Processing.”:

Terminal
awk 'BEGIN { print "Start Processing." }; { print $3 }; END { print "End Processing." }' teams.txt
output
Start Processing.
60
58
51
49
48
End Processing.

If a program has only a BEGIN pattern, its actions are executed and no input is read. If a program has only an END pattern, all input is read before the rule's actions are performed.

The GNU version of awk also includes two more special patterns, BEGINFILE and ENDFILE, which allow you to perform actions when processing files.

Combining Patterns

Awk allows you to combine two or more patterns using the logical AND operator (&&) and logical OR operator (||).

Here is an example that uses the && operator to print the first field of those records whose third field is greater than 50 and the fourth field is less than 30:

Terminal
awk '$3 > 50 && $4 < 30 { print $1 }' teams.txt
output
Bucks
Raptors

Awk Actions

Awk actions are enclosed in braces ({}) and executed when the pattern matches. An action can have zero or more statements. Multiple statements are executed in the order they appear and must be separated by newlines or semicolons (;).

Several types of action statements are supported in awk:

  • Expressions, such as variable assignment, arithmetic operators, increment, and decrement operators.
  • Control statements, used to control the flow of the program (if, for, while, switch, and more).
  • Output statements, such as print and printf.
  • Compound statements, to group other statements.
  • Input statements, to control the processing of the input.
  • Deletion statements, to remove array elements.
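
As a rough illustration of two of the statement types listed above, the sketch below uses next (an input statement) to skip comment and blank lines, and a compound statement to group several statements in one action. The input and the variable name next_out are made up for the example.

```shell
next_out=$(printf '# comment\nalpha 1\n\nbeta 2\n' | awk '
/^#/ || NF == 0 { next }   # input statement: skip comment and blank lines
{
    # compound statement: several statements grouped by braces
    count++
    print count ": " $1
}')
printf '%s\n' "$next_out"
```

Because next stops processing the current record immediately, the second rule never sees the skipped lines.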

The print Statement

The print statement is the most commonly used awk statement. It prints text, records, fields, and variables.

When printing multiple items, you need to separate them with commas. Here is an example:

Terminal
awk '{ print $1, $3, $5 }' teams.txt

The printed items are separated by single spaces:

output
Bucks 60 0.732
Raptors 58 0.707
76ers 51 0.622
Celtics 49 0.598
Pacers 48 0.585

If you do not use commas, there will be no space between the items:

Terminal
awk '{ print $1 $3 $5 }' teams.txt

The printed items are concatenated:

output
Bucks600.732
Raptors580.707
76ers510.622
Celtics490.598
Pacers480.585

When print is used without an argument, it defaults to print $0, printing the current record.

To print custom text, you must quote the text with double-quote characters:

Terminal
awk '{ print "The first field:", $1}' teams.txt
output
The first field: Bucks
The first field: Raptors
The first field: 76ers
The first field: Celtics
The first field: Pacers

You can also print special characters such as newline:

Terminal
awk 'BEGIN { print "First line\nSecond line\nThird line" }'
output
First line
Second line
Third line

The printf Statement

The printf statement gives you more control over the output format. Unlike print, printf does not automatically add a newline after each output. Here is an example that inserts line numbers:

Terminal
awk '{ printf "%3d. %s\n", NR, $0 }' teams.txt
output
  1. Bucks Milwaukee    60 22 0.732
  2. Raptors Toronto    58 24 0.707
  3. 76ers Philadelphia 51 31 0.622
  4. Celtics Boston     49 33 0.598
  5. Pacers Indiana     48 34 0.585

Common printf format specifiers include %s for strings, %d for integers, %f for floating-point numbers, and %x for hexadecimal.
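
Here is a sketch that combines these specifiers with width and precision modifiers. It prints the first field left-aligned in 8 columns, the third field as a 3-column integer, the fifth field scaled by 100 with one decimal place, and the third field again in hexadecimal. The vertical bars are only there to make the column widths visible, and the variable names are illustrative.

```shell
teams='Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585'

# %-8s pads on the right, %3d pads on the left, %5.1f keeps one decimal place.
printf_out=$(printf '%s\n' "$teams" |
    awk '{ printf "%-8s|%3d|%5.1f|%x\n", $1, $3, $5 * 100, $3 }')
printf '%s\n' "$printf_out"
```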

Control Statements

Awk supports standard control flow statements. The if/else statement lets you execute different actions based on conditions:

Terminal
awk '{ if ($3 > 50) print $1, "- strong"; else print $1, "- average" }' teams.txt
output
Bucks - strong
Raptors - strong
76ers - strong
Celtics - average
Pacers - average

The for loop is useful for iterating over a range of values:

Terminal
awk 'BEGIN { for (i = 1; i <= 5; i++) print "Square of", i, "is", i*i }'
output
Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25

The while loop works similarly. Here is the same example using while:

Terminal
awk 'BEGIN { i = 1; while (i <= 5) { print "Square of", i, "is", i*i; i++ } }'
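
Loops are often combined with NF to visit every field of a record. The following sketch, using made-up input, rebuilds each line with its fields in reverse order. The variable name rev_out is illustrative.

```shell
rev_out=$(printf 'one two three\nfour five\n' | awk '{
    line = $NF                        # start with the last field
    for (i = NF - 1; i >= 1; i--)     # walk back towards the first field
        line = line " " $i
    print line
}')
printf '%s\n' "$rev_out"
```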

Summing a Column

The following command calculates the sum of the values stored in the third field across all lines:

Terminal
awk '{ sum += $3 } END { printf "%d\n", sum }' teams.txt
output
266

Running Programs from Files

When writing longer programs, you should create a separate program file:

prg.awk
BEGIN {
  for (i = 1; i <= 5; i++) {
    print "Square of", i, "is", i*i
  }
}

Run the program by passing the file name to the awk interpreter:

Terminal
awk -f prg.awk

You can also run an awk program as an executable by using the shebang directive and setting the awk interpreter:

prg.awk
#!/usr/bin/awk -f
BEGIN {
  for (i = 1; i <= 5; i++) {
    print "Square of", i, "is", i*i
  }
}

Save the file and make it executable:

Terminal
chmod +x prg.awk

You can now run the program by entering:

Terminal
./prg.awk

Built-in Variables

Awk has a number of built-in variables that contain useful information and allow you to control how the program is processed. Below are the most common built-in variables:

  • NF — The number of fields in the current record.
  • NR — The number of the current record (line number across all files).
  • FNR — The number of the current record in the current file. Unlike NR, FNR resets to 1 for each new file.
  • FILENAME — The name of the input file currently being processed.
  • FS — The input field separator (default: whitespace).
  • RS — The input record separator (default: newline).
  • OFS — The output field separator (default: space).
  • ORS — The output record separator (default: newline).
  • OFMT — The output format for numbers (default: "%.6g").

Here is an example showing how to print the file name and the number of lines (records):

Terminal
awk 'END { print "File", FILENAME, "contains", NR, "lines." }' teams.txt
output
File teams.txt contains 5 lines.
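
The difference between NR and FNR only shows up when awk reads more than one file. The sketch below creates two small throwaway files in a temporary directory (the file names first.txt and second.txt and the variable names are made up for the example) and prints FILENAME, NR, and FNR for every record.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\ne\n' > "$tmpdir/second.txt"

# NR keeps counting across files, while FNR restarts at 1 for each file.
nr_out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
printf '%s\n' "$nr_out"

rm -rf "$tmpdir"
```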

The OFS variable controls which string is placed between items when you separate them with a comma in print. In the following example, we set it to a tab:

Terminal
awk 'BEGIN { OFS = "\t" } { print $1, $3, $5 }' teams.txt
output
Bucks	60	0.732
Raptors	58	0.707
76ers	51	0.622
Celtics	49	0.598
Pacers	48	0.585

Variables in awk can be set anywhere in the program. To initialize a variable before any input is read, set it in a BEGIN rule.

Changing the Field and Record Separator

The default value of the field separator is any number of space or tab characters. It can be changed by setting the FS variable.

For example, to set the field separator to .:

Terminal
awk 'BEGIN { FS = "." } { print $1 }' teams.txt
output
Bucks Milwaukee    60 22 0
Raptors Toronto    58 24 0
76ers Philadelphia 51 31 0
Celtics Boston     49 33 0
Pacers Indiana     48 34 0

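The field separator can also be more than one character. In that case awk treats the value as a regular expression rather than a literal string, so a character such as . must be escaped or placed in brackets to be matched literally. For example, to split on two consecutive dots:

Terminal
awk 'BEGIN { FS = "[.][.]" } { print $1 }' teams.txt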

When running awk one-liners on the command line, you can use the -F option to change the field separator:

Terminal
awk -F "." '{ print $1 }' teams.txt
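
A bracket expression is one way to split on several different delimiters at once. In this sketch the sample line and the variable name fs_out are made up for the example.

```shell
# The -F value is longer than one character, so awk uses it as a regex:
# a colon, a comma, or a semicolon each end a field.
fs_out=$(printf 'alice:30,engineer;remote\n' | awk -F '[:,;]' '{ print NF, $2, $4 }')
printf '%s\n' "$fs_out"
```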

By default, the record separator is a newline character and can be changed using the RS variable.

Here is an example showing how to change the record separator to .:

Terminal
awk 'BEGIN { RS = "." } { print $1 }' teams.txt

Each record now ends at a dot instead of a newline. Because a newline counts as whitespace under the default field separator, the first field of every record after the first is the number that follows the dot:

output
Bucks
732
707
622
598
585
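
Another common setting is an empty RS, which switches awk to paragraph mode: records are separated by blank lines. The sketch below pairs it with a newline field separator so that each line of a paragraph becomes one field. The input and the variable name para_out are made up for the example.

```shell
para_out=$(printf 'Name: Ann\nCity: Oslo\n\nName: Bob\nCity: Rome\n' |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " / " $2 }')
printf '%s\n' "$para_out"
```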

String Functions

Awk includes a rich set of built-in string functions for text manipulation.

length()

The length() function returns the number of characters in a string. When called without an argument, it returns the length of the current record:

Terminal
awk '{ print $1, length($1) }' teams.txt
output
Bucks 5
Raptors 7
76ers 5
Celtics 7
Pacers 6

substr()

The substr(string, start, length) function extracts a substring. The start position is 1-based. If length is omitted, it returns everything from start to the end of the string:

Terminal
awk '{ print substr($1, 1, 3) }' teams.txt
output
Buc
Rap
76e
Cel
Pac

split()

The split(string, array, separator) function splits a string into an array and returns the number of elements. In the following example, we split a colon-separated string:

Terminal
echo "one:two:three" | awk '{ n = split($0, a, ":"); for (i = 1; i <= n; i++) print a[i] }'
output
one
two
three

sub() and gsub()

The sub(regex, replacement, target) function replaces the first match of a regex in the target string. The gsub() function replaces all matches:

Terminal
echo "hello world hello" | awk '{ gsub(/hello/, "hi"); print }'
output
hi world hi

If the target is omitted, $0 (the entire record) is used.
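
Both functions change the target in place and return the number of substitutions they made. Here is a short sketch of the return value and of an explicit target. The variable name sub_out is illustrative.

```shell
sub_out=$(echo "hello world hello" | awk '{
    n = gsub(/hello/, "hi")      # target omitted: edits $0 and returns the match count
    sub(/world/, "there", $2)    # target given: only the second field is edited
    print n, $0
}')
printf '%s\n' "$sub_out"
```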

tolower() and toupper()

These functions convert strings to lowercase or uppercase:

Terminal
awk '{ print toupper($1) }' teams.txt
output
BUCKS
RAPTORS
76ERS
CELTICS
PACERS

index() and match()

The index(string, target) function returns the position of the first occurrence of target in string, or 0 if not found. The match(string, regex) function does the same but with a regular expression:

Terminal
awk '{ print $1, index($1, "er") }' teams.txt
output
Bucks 0
Raptors 0
76ers 3
Celtics 0
Pacers 4
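
In addition to returning the position, match() sets the built-in variables RSTART and RLENGTH, which can be passed to substr() to extract the matched text. The following is a minimal sketch with made-up input, and the variable name match_out is illustrative.

```shell
match_out=$(printf 'order-1234-x\nno digits here\n' | awk '{
    if (match($0, /[0-9]+/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else
        print "no match"
}')
printf '%s\n' "$match_out"
```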

Arrays

Awk supports associative arrays, which use strings as indices (keys). Arrays do not need to be declared before use.

Creating and Accessing Arrays

You can assign values to array elements using array[key] = value:

Terminal
awk 'BEGIN { fruits["apple"] = 5; fruits["banana"] = 3; print fruits["apple"], fruits["banana"] }'
output
5 3

Iterating Over Arrays

Use the for (key in array) syntax to iterate over all elements in an array:

Terminal
awk '{ teams[$1] = $3 } END { for (name in teams) print name, teams[name] }' teams.txt

This stores each team name and their wins count, then prints them all at the end. Note that the order of elements in awk associative arrays is not guaranteed.
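
When a stable order matters, one simple option is to pipe the result through sort. The sketch below also shows arrays used for grouping: it adds up the second field per key. The input data and the variable name sum_out are made up for the example.

```shell
sum_out=$(printf 'east 10\nwest 5\neast 7\nwest 1\nnorth 4\n' |
    awk '{ total[$1] += $2 } END { for (region in total) print region, total[region] }' |
    sort)
printf '%s\n' "$sum_out"
```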

Deleting Array Elements

Use the delete statement to remove an element from an array:

Terminal
awk 'BEGIN { a["x"] = 1; a["y"] = 2; delete a["x"]; for (k in a) print k, a[k] }'
output
y 2
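
To check whether a key exists, use the in operator. Unlike referencing a[key] directly, the test does not create the element as a side effect. Here is a minimal sketch, where the variable name in_out is illustrative.

```shell
in_out=$(awk 'BEGIN {
    a["x"] = 1
    if ("x" in a) print "x is set"
    if (!("y" in a)) print "y is not set"
    n = 0
    for (k in a) n++     # still one element: the test did not create a["y"]
    print n
}')
printf '%s\n' "$in_out"
```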

Word Frequency Counter

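Associative arrays are commonly used for counting. The following example counts the frequency of each word in a file. Every word in teams.txt occurs exactly once, so all counts are 1, and the order of the tied lines depends on your sort implementation and locale:

Terminal
awk '{ for (i = 1; i <= NF; i++) words[$i]++ } END { for (w in words) print words[w], w }' teams.txt | sort -rn
output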
1 76ers
1 Celtics
1 Raptors
1 Bucks
1 Pacers
1 Milwaukee
1 Toronto
1 Philadelphia
1 Boston
1 Indiana
1 60
1 58
1 51
1 49
1 48
1 22
1 24
1 31
1 33
1 34
1 0.732
1 0.707
1 0.622
1 0.598
1 0.585

Practical Examples

This section demonstrates common real-world uses of awk.

Processing /etc/passwd

The /etc/passwd file uses colons as field separators. To list all usernames and their shells:

Terminal
awk -F: '{ print $1, $7 }' /etc/passwd

To find users with /bin/bash as their shell:

Terminal
awk -F: '$7 == "/bin/bash" { print $1 }' /etc/passwd

Parsing Log Files

To extract IP addresses from an Apache or Nginx access log (where the IP is the first field):

Terminal
awk '{ print $1 }' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

This pipeline extracts the IPs, counts occurrences with uniq, sorts by frequency, and shows the top 10.
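
The counting step can also be done inside awk with an associative array. Here is a sketch that uses a few made-up log lines instead of a real access log. The addresses, the log format, and the variable names log and ip_out are assumptions for illustration only.

```shell
log='203.0.113.5 - - "GET / HTTP/1.1" 200
198.51.100.7 - - "GET /about HTTP/1.1" 200
203.0.113.5 - - "GET /contact HTTP/1.1" 404'

# Count requests per IP in an array, then sort by the count.
ip_out=$(printf '%s\n' "$log" |
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
    sort -rn)
printf '%s\n' "$ip_out"
```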

Piped Commands

Awk is frequently used in pipelines to extract specific columns from command output. To show the name and CPU usage of running processes:

Terminal
ps aux | awk '{ print $11, $3 }' | head -10

To display filesystem usage with only the mount point and used percentage:

Terminal
df -h | awk 'NR > 1 { print $6, $5 }'

The NR > 1 pattern skips the header line.

Summing a Column

To calculate the total size of all files listed by ls -l:

Terminal
ls -l | awk 'NR > 1 { sum += $5 } END { print "Total bytes:", sum }'

Counting Records

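To count the number of lines that match a pattern:

Terminal
awk '/error/ { count++ } END { print count + 0 }' /var/log/syslog

Adding 0 forces a numeric result, so the command prints 0 instead of an empty line when nothing matches.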

Removing Duplicate Lines

To remove duplicate lines while preserving order:

Terminal
awk '!seen[$0]++' file.txt

This works by using the current line as an array key. The first time a line is seen, seen[$0] is 0 (false), so !seen[$0] is true and the line is printed. On subsequent occurrences, the value is non-zero, so the line is skipped.
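
The same idiom works with any expression as the key. For example, this sketch keeps only the first line seen for each value of the first field. The input and the variable name dedup_out are made up for the example.

```shell
dedup_out=$(printf 'ann 1\nbob 2\nann 3\ncat 4\nbob 5\n' | awk '!seen[$1]++')
printf '%s\n' "$dedup_out"
```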

CSV Processing

To process a simple CSV file, set the field separator to a comma:

Terminal
awk -F, '{ print $1, $3 }' data.csv

For CSV files with quoted fields that may contain commas, use the FPAT variable (available in gawk 4.0+):

Terminal
awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]*\")" } { print $1, $3 }' data.csv

Using Shell Variables in Awk Programs

If you are using the awk command in shell scripts, you will often need to pass a shell variable to the awk program. One option is to enclose the program with double instead of single quotes and substitute the variable in the program. However, this approach makes your awk program more complex as you must escape the awk variables.

The recommended way to use shell variables in awk programs is to assign the shell variable to an awk variable using the -v option. Here is an example:

Terminal
num=51
awk -v n="$num" 'BEGIN {print n}'
output
51

You can also pass multiple variables:

Terminal
min=50
max=60
awk -v low="$min" -v high="$max" '$3 >= low && $3 <= high { print $1, $3 }' teams.txt
output
Bucks 60
Raptors 58
76ers 51
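
Keep in mind that awk processes escape sequences such as \t and \n in values assigned with -v. If the value must be passed through untouched, one alternative is to export it and read it from the built-in ENVIRON array. The sketch below assumes a made-up environment variable named AWK_DEMO_VALUE, and env_out is an illustrative name too.

```shell
# The value contains a backslash followed by t, not a tab character.
export AWK_DEMO_VALUE='a\tb'

# ENVIRON gives the program the value exactly as the shell exported it.
env_out=$(awk 'BEGIN { print ENVIRON["AWK_DEMO_VALUE"] }')
printf '%s\n' "$env_out"
```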

Quick Reference

Task                    Command
Print a specific field  awk '{ print $2 }' file
Print multiple fields   awk '{ print $1, $3 }' file
Filter by pattern       awk '/pattern/ { print }' file
Filter by field value   awk '$3 > 50 { print }' file
Set field separator     awk -F: '{ print $1 }' file
Use BEGIN/END           awk 'BEGIN { print "Header" } { print } END { print "Footer" }' file
Sum a column            awk '{ sum += $3 } END { print sum }' file
Count matching lines    awk '/pattern/ { c++ } END { print c }' file
Remove duplicates       awk '!seen[$0]++' file
Print line numbers      awk '{ print NR, $0 }' file
Replace text            awk '{ gsub(/old/, "new"); print }' file
Pass shell variable     awk -v x="$var" '{ print x, $1 }' file
Run program from file   awk -f script.awk file
Print last field        awk '{ print $NF }' file

FAQ

What is the difference between awk and sed?
Awk is a full programming language designed for field-based text processing. It excels at working with structured, columnar data. Sed is a stream editor optimized for line-by-line text transformations such as find-and-replace. Use awk when you need to extract or compute values from specific columns, and sed when you need simple text substitutions.

How do I print a specific column in awk?
Use the dollar sign followed by the column number. For example, awk '{ print $2 }' file prints the second column. You can print multiple columns by separating them with commas: awk '{ print $1, $3 }' file.

What is the difference between awk and gawk?
Gawk (GNU awk) is the GNU implementation of the awk language. It is fully compatible with the POSIX awk specification and adds extensions such as BEGINFILE/ENDFILE patterns, the FPAT variable for field-based parsing, and network I/O. Depending on the distribution, awk may point to gawk or to another implementation.

How do I use awk with CSV files?
For simple CSV files without quoted fields, set the field separator to a comma with -F,. For CSV files with quoted fields that may contain commas, use gawk’s FPAT variable: awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]*\")" } { print $1 }' file.csv. For complex CSV processing, consider dedicated tools like csvtool or miller.

Conclusion

Awk is one of the most powerful tools available for text processing on the command line. It handles everything from simple column extraction to complex data analysis with its pattern-action programming model. For comprehensive documentation, see the official Gawk manual.


About the authors

Dejan Panovski

Dejan Panovski is the founder of Linuxize, an RHCSA-certified Linux system administrator and DevOps engineer based in Skopje, Macedonia. Author of 800+ Linux tutorials with 20+ years of experience turning complex Linux tasks into clear, reliable guides.