Awk Command in Linux with Examples

Awk is a powerful text-processing language designed for extracting, transforming, and reporting data. It is one of the most versatile command-line tools available on Linux and Unix systems, capable of handling everything from simple column extraction to complex data analysis.
Unlike most procedural programming languages, awk is data-driven. You define a set of rules consisting of patterns and actions, and awk applies them to each line of input automatically. This guide covers awk syntax, patterns, actions, built-in variables, string functions, arrays, and practical examples.
For a quick reference, see the Awk cheatsheet.
How awk Works
There are several different implementations of awk. We will use the GNU implementation of awk, which is called gawk. On many Linux systems, awk points to gawk, while on others it may point to a different implementation such as mawk.
Records and Fields
Awk processes textual data files and streams. The input data is divided into records and fields. Awk operates on one record at a time until the end of the input is reached. Records are separated by a character called the record separator. The default record separator is the newline character, meaning each line in the text data is a record. A new record separator can be set using the RS variable.
Records consist of fields that are separated by the field separator. By default, fields are separated by whitespace, including one or more tab, space, and newline characters.
The fields in each record are referenced by the dollar sign ($) followed by the field number beginning with 1. The first field is represented with $1, the second with $2, and so on. The last field can also be referenced with the special variable $NF. The entire record can be referenced with $0.
Here is a visual representation showing how to reference records and fields:
tmpfs 788M 1.8M 786M 1% /run/lock
/dev/sda1 234G 191G 31G 87% /
|-------| |--| |--| |--| |-| |--------|
$1 $2 $3 $4 $5 $6 ($NF) --> fields
|-----------------------------------------|
$0 --> record
Awk Program
To process text with awk, you write a program that tells the command what to do. The program consists of a series of rules and optional user-defined functions. Each rule contains a pattern and action pair. Rules are separated by newlines or semicolons (;). A typical awk program looks like this:
pattern { action }
pattern { action }
...
When awk processes data, if the pattern matches the record, it performs the specified action on that record. When a rule has no pattern, every input record is matched. When a rule has no action, it defaults to printing the entire record.
An awk action is enclosed in braces ({}) and consists of statements. Each statement specifies the operation to be performed. Multiple statements are separated by newlines or semicolons (;).
When writing awk programs, everything after the hash mark (#) and until the end of the line is considered a comment. Long lines can be broken into multiple lines using the continuation character, backslash (\).
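For example, the one-liner below (run on a single echoed record, just for illustration) contains a comment and uses a backslash to split a statement across two lines:

```shell
# "#" starts a comment that runs to the end of the line; the trailing
# backslash continues the statement on the next line.
echo "Bucks Milwaukee 60 22 0.732" | awk '
# print the team name and its win count
{ print $1, \
        $3 }'
```

The command prints "Bucks 60".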
Executing Awk Programs
An awk program can be run in several ways. If the program is short and simple, it can be passed directly to the awk interpreter on the command line:
awk 'program' input-file...
When running the program on the command line, it should be enclosed in single quotes ('') so the shell does not interpret the program.
If the program is large and complex, it is best to put it in a file and use the -f option to pass the file to the awk command:
awk -f program-file input-file...
In the examples below, we will use a file named “teams.txt” that looks like the one below:
Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585
Awk Syntax
The general syntax of the awk command is:
awk [OPTIONS] 'pattern { action }' file
The most commonly used options are:
-F — Sets the input field separator (same as setting FS inside the program).
-v var=value — Assigns a value to a variable before the program begins executing.
-f program-file — Reads the awk program from a file instead of the command line.
Awk can also read input from standard input (stdin) when no file is specified. This makes it useful in pipelines:
command | awk '{ print $1 }'
Awk Patterns
Patterns control whether the associated action is executed. Awk supports several types of patterns, including regular expressions, relational expressions, ranges, and special patterns.
When the rule has no pattern, each input record is matched. Here is an example of a rule containing only an action:
awk '{ print $3 }' teams.txt
The program prints the third field of each record:
60
58
51
49
48
Regular Expression Patterns
A regular expression or regex is a pattern that matches a set of strings. Awk regular expression patterns are enclosed in slashes (//):
/regex pattern/ { action }
The most basic example is a literal character or string matching. To display the first field of each record that contains “0.5”, run the following command:
awk '/0.5/ { print $1 }' teams.txt
Celtics
Pacers
Note that the dot is a regex metacharacter matching any single character; to match a literal dot, escape it as /0\.5/. The pattern can be any type of extended regular expression. Here is an example that prints the first field if the record starts with two or more digits:
awk '/^[0-9][0-9]/ { print $1 }' teams.txt
76ers
Relational Expression Patterns
Relational expression patterns are generally used to match the content of a specific field or variable.
By default, regular expression patterns are matched against the entire record. To match a regex against a field, specify the field and use the “contain” comparison operator (~) against the pattern.
For example, to print the first field of each record whose second field contains “ia”:
awk '$2 ~ /ia/ { print $1 }' teams.txt
76ers
Pacers
To match fields that do not contain a given pattern, use the !~ operator:
awk '$2 !~ /ia/ { print $1 }' teams.txt
Bucks
Raptors
Celtics
You can compare strings or numbers for relationships such as greater than, less than, or equal. The following command prints the first field of all records whose third field is greater than 50:
awk '$3 > 50 { print $1 }' teams.txt
Bucks
Raptors
76ers
Range Patterns
Range patterns consist of two patterns separated by a comma:
pattern1, pattern2
The range begins with a record that matches the first pattern and ends with the record that matches the second pattern; all records in between are matched as well.
Here is an example that prints the first field of all records starting from the record including “Raptors” until the record including “Celtics”:
awk '/Raptors/,/Celtics/ { print $1 }' teams.txt
Raptors
76ers
Celtics
The patterns can also be relational expressions. The command below prints all records starting from the one whose fourth field is equal to 31 until the one whose fourth field is equal to 33:
awk '$4 == 31, $4 == 33 { print $0 }' teams.txt
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Range patterns cannot be combined with other pattern expressions.
Special Expression Patterns
Awk includes the following special patterns:
BEGIN — Used to perform actions before records are processed.
END — Used to perform actions after records are processed.
The BEGIN pattern is commonly used to set variables, and the END pattern to process data from the records such as calculations.
The following example prints “Start Processing.”, then prints the third field of each record, and finally “End Processing.”:
awk 'BEGIN { print "Start Processing." }; { print $3 }; END { print "End Processing." }' teams.txt
Start Processing.
60
58
51
49
48
End Processing.
If a program has only a BEGIN pattern, its actions are executed and the input is not processed. If a program has only an END pattern, the input is processed before the rule actions are performed.
The GNU version of awk also includes two more special patterns, BEGINFILE and ENDFILE, which allow you to perform actions when processing files.
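As a sketch of these gawk-only patterns, the commands below create two throwaway files (f1.txt and f2.txt are made-up names) and announce each file as it is opened and closed:

```shell
# BEGINFILE runs before the first record of each input file and ENDFILE
# after its last record; plain BEGIN/END fire only once for the whole run.
printf 'a\nb\n' > f1.txt
printf 'c\n' > f2.txt

gawk 'BEGINFILE { print "Opening", FILENAME }
      ENDFILE   { print "Closing", FILENAME }' f1.txt f2.txt

rm -f f1.txt f2.txt
```

This prints an Opening/Closing pair for each file, which is useful for per-file setup and cleanup when processing many files in one run.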
Combining Patterns
Awk allows you to combine two or more patterns using the logical AND operator (&&) and logical OR operator (||).
Here is an example that uses the && operator to print the first field of those records whose third field is greater than 50 and the fourth field is less than 30:
awk '$3 > 50 && $4 < 30 { print $1 }' teams.txt
Bucks
Raptors
Awk Actions
Awk actions are enclosed in braces ({}) and executed when the pattern matches. An action can have zero or more statements. Multiple statements are executed in the order they appear and must be separated by newlines or semicolons (;).
Several types of action statements are supported in awk:
- Expressions, such as variable assignment, arithmetic operators, and the increment and decrement operators.
- Control statements, used to control the flow of the program (if, for, while, switch, and more).
- Output statements, such as print and printf.
- Compound statements, to group other statements.
- Input statements, to control the processing of the input.
- Deletion statements, to remove array elements.
The print Statement
The print statement is the most commonly used awk statement. It prints text, records, fields, and variables.
When printing multiple items, you need to separate them with commas. Here is an example:
awk '{ print $1, $3, $5 }' teams.txt
The printed items are separated by single spaces:
Bucks 60 0.732
Raptors 58 0.707
76ers 51 0.622
Celtics 49 0.598
Pacers 48 0.585
If you do not use commas, there will be no space between the items:
awk '{ print $1 $3 $5 }' teams.txt
The printed items are concatenated:
Bucks600.732
Raptors580.707
76ers510.622
Celtics490.598
Pacers480.585
When print is used without an argument, it defaults to print $0, printing the current record.
To print custom text, you must quote the text with double-quote characters:
awk '{ print "The first field:", $1 }' teams.txt
The first field: Bucks
The first field: Raptors
The first field: 76ers
The first field: Celtics
The first field: Pacers
You can also print special characters such as newline:
awk 'BEGIN { print "First line\nSecond line\nThird line" }'
First line
Second line
Third line
The printf Statement
The printf statement gives you more control over the output format. Unlike print, printf does not automatically add a newline after each output. Here is an example that inserts line numbers:
awk '{ printf "%3d. %s\n", NR, $0 }' teams.txt
  1. Bucks Milwaukee 60 22 0.732
  2. Raptors Toronto 58 24 0.707
  3. 76ers Philadelphia 51 31 0.622
  4. Celtics Boston 49 33 0.598
  5. Pacers Indiana 48 34 0.585
Common printf format specifiers include %s for strings, %d for integers, %f for floating-point numbers, and %x for hexadecimal.
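A quick sketch showing these specifiers side by side (the values are arbitrary):

```shell
# %-8s left-justifies the string in an 8-character column, %d prints an
# integer, %.2f rounds a float to two decimals, and %x prints hexadecimal.
awk 'BEGIN { printf "%-8s|%d|%.2f|%x\n", "Bucks", 60, 0.732, 255 }'
```

Here 0.732 is rounded to 0.73 and 255 is printed as ff.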
Control Statements
Awk supports standard control flow statements. The if/else statement lets you execute different actions based on conditions:
awk '{ if ($3 > 50) print $1, "- strong"; else print $1, "- average" }' teams.txt
Bucks - strong
Raptors - strong
76ers - strong
Celtics - average
Pacers - average
The for loop is useful for iterating over a range of values:
awk 'BEGIN { for (i = 1; i <= 5; i++) print "Square of", i, "is", i*i }'
Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25
The while loop works similarly. Here is the same example using while:
awk 'BEGIN { i = 1; while (i <= 5) { print "Square of", i, "is", i*i; i++ } }'
Summing a Column
The following command calculates the sum of the values stored in the third field across all lines:
awk '{ sum += $3 } END { printf "%d\n", sum }' teams.txt
266
Running Programs from Files
When writing longer programs, you should create a separate program file:
BEGIN {
for (i = 1; i <= 5; i++) {
print "Square of", i, "is", i*i
}
}
Run the program by passing the file name to the awk interpreter:
awk -f prg.awk
You can also run an awk program as an executable by using the shebang directive and setting the awk interpreter:
#!/usr/bin/awk -f
BEGIN {
for (i = 1; i <= 5; i++) {
print "Square of", i, "is", i*i
}
}
Save the file and make it executable:
chmod +x prg.awk
You can now run the program by entering:
./prg.awk
Built-in Variables
Awk has a number of built-in variables that contain useful information and allow you to control how the program is processed. Below are the most common built-in variables:
NF — The number of fields in the current record.
NR — The number of the current record (line number across all files).
FNR — The number of the current record in the current file. Unlike NR, FNR resets to 1 for each new file.
FILENAME — The name of the input file currently being processed.
FS — The input field separator (default: whitespace).
RS — The input record separator (default: newline).
OFS — The output field separator (default: space).
ORS — The output record separator (default: newline).
OFMT — The output format for numbers (default: "%.6g").
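The difference between NR and FNR is easiest to see with two input files. The sketch below creates two throwaway files (one.txt and two.txt are made-up names) and prints both counters:

```shell
# NR keeps counting across files, while FNR restarts at 1 for each file.
printf 'a\nb\n' > one.txt
printf 'c\n' > two.txt

awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' one.txt two.txt

rm -f one.txt two.txt
```

The third line of output reads "two.txt NR=3 FNR=1", showing that FNR has reset for the second file while NR has not.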
Here is an example showing how to print the file name and the number of lines (records):
awk 'END { print "File", FILENAME, "contains", NR, "lines." }' teams.txt
File teams.txt contains 5 lines.
The OFS variable controls what character is placed between fields when you use a comma in print. In the following example, we set it to a tab:
awk 'BEGIN { OFS = "\t" } { print $1, $3, $5 }' teams.txt
Bucks 60 0.732
Raptors 58 0.707
76ers 51 0.622
Celtics 49 0.598
Pacers 48 0.585
Variables in awk can be set at any line in the program. To define a variable for the entire program, put it in a BEGIN pattern.
Changing the Field and Record Separator
The default value of the field separator is any number of space or tab characters. It can be changed by setting the FS variable.
For example, to set the field separator to .:
awk 'BEGIN { FS = "." } { print $1 }' teams.txt
Bucks Milwaukee 60 22 0
Raptors Toronto 58 24 0
76ers Philadelphia 51 31 0
Celtics Boston 49 33 0
Pacers Indiana 48 34 0
The field separator can also be set to more than one character, in which case it is interpreted as a regular expression:
awk 'BEGIN { FS = ".." } { print $1 }' teams.txt
When running awk one-liners on the command line, you can use the -F option to change the field separator:
awk -F "." '{ print $1 }' teams.txt
By default, the record separator is a newline character and can be changed using the RS variable.
Here is an example showing how to change the record separator to .:
awk 'BEGIN { RS = "." } { print $1 }' teams.txt
Bucks Milwaukee 60 22 0
732
Raptors Toronto 58 24 0
707
76ers Philadelphia 51 31 0
622
Celtics Boston 49 33 0
598
Pacers Indiana 48 34 0
585
String Functions
Awk includes a rich set of built-in string functions for text manipulation.
length()
The length() function returns the number of characters in a string. When called without an argument, it returns the length of the current record:
awk '{ print $1, length($1) }' teams.txt
Bucks 5
Raptors 7
76ers 5
Celtics 7
Pacers 6
substr()
The substr(string, start, length) function extracts a substring. The start position is 1-based. If length is omitted, it returns everything from start to the end of the string:
awk '{ print substr($1, 1, 3) }' teams.txt
Buc
Rap
76e
Cel
Pac
split()
The split(string, array, separator) function splits a string into an array and returns the number of elements. In the following example, we split a colon-separated string:
echo "one:two:three" | awk '{ n = split($0, a, ":"); for (i = 1; i <= n; i++) print a[i] }'
one
two
three
sub() and gsub()
The sub(regex, replacement, target) function replaces the first match of a regex in the target string. The gsub() function replaces all matches:
echo "hello world hello" | awk '{ gsub(/hello/, "hi"); print }'
hi world hi
If the target is omitted, $0 (the entire record) is used.
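For contrast, running the same input through sub() replaces only the first occurrence:

```shell
# Only the first "hello" is replaced; the second one is left intact.
echo "hello world hello" | awk '{ sub(/hello/, "hi"); print }'
```

This prints "hi world hello".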
tolower() and toupper()
These functions convert strings to lowercase or uppercase:
awk '{ print toupper($1) }' teams.txt
BUCKS
RAPTORS
76ERS
CELTICS
PACERS
index() and match()
The index(string, target) function returns the position of the first occurrence of target in string, or 0 if not found. The match(string, regex) function does the same but with a regular expression:
awk '{ print $1, index($1, "er") }' teams.txt
Bucks 0
Raptors 0
76ers 3
Celtics 0
Pacers 4
Arrays
Awk supports associative arrays, which use strings as indices (keys). Arrays do not need to be declared before use.
Creating and Accessing Arrays
You can assign values to array elements using array[key] = value:
awk 'BEGIN { fruits["apple"] = 5; fruits["banana"] = 3; print fruits["apple"], fruits["banana"] }'
5 3
Iterating Over Arrays
Use the for (key in array) syntax to iterate over all elements in an array:
awk '{ teams[$1] = $3 } END { for (name in teams) print name, teams[name] }' teams.txt
This stores each team name with its win count, then prints them all at the end. Note that the order of elements in awk associative arrays is not guaranteed.
Deleting Array Elements
Use the delete statement to remove an element from an array:
awk 'BEGIN { a["x"] = 1; a["y"] = 2; delete a["x"]; for (k in a) print k, a[k] }'
y 2
Word Frequency Counter
Associative arrays are commonly used for counting. The following example counts the frequency of each word in a file:
awk '{ for (i = 1; i <= NF; i++) words[$i]++ } END { for (w in words) print words[w], w }' teams.txt | sort -rn
1 76ers
1 Celtics
1 Raptors
1 Bucks
1 Pacers
1 Milwaukee
1 Toronto
1 Philadelphia
1 Boston
1 Indiana
1 60
1 58
1 51
1 49
1 48
1 22
1 24
1 31
1 33
1 34
1 0.732
1 0.707
1 0.622
1 0.598
1 0.585
Practical Examples
This section demonstrates common real-world uses of awk.
Processing /etc/passwd
The /etc/passwd file uses colons as field separators. To list all usernames and their shells:
awk -F: '{ print $1, $7 }' /etc/passwd
To find users with /bin/bash as their shell:
awk -F: '$7 == "/bin/bash" { print $1 }' /etc/passwd
Parsing Log Files
To extract IP addresses from an Apache or Nginx access log (where the IP is the first field):
awk '{ print $1 }' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
This pipeline extracts the IPs, counts occurrences with uniq, sorts by frequency, and shows the top 10.
Piped Commands
Awk is frequently used in pipelines to extract specific columns from command output. To show the name and CPU usage of running processes:
ps aux | awk '{ print $11, $3 }' | head -10
To display filesystem usage with only the mount point and used percentage:
df -h | awk 'NR > 1 { print $6, $5 }'
The NR > 1 pattern skips the header line.
Summing a Column
To calculate the total size of all files listed by ls -l:
ls -l | awk 'NR > 1 { sum += $5 } END { print "Total bytes:", sum }'
Counting Records
To count the number of lines that match a pattern:
awk '/error/ { count++ } END { print count }' /var/log/syslog
Removing Duplicate Lines
To remove duplicate lines while preserving order:
awk '!seen[$0]++' file.txt
This works by using the current line as an array key. The first time a line is seen, seen[$0] is 0 (false), so !seen[$0] is true and the line is printed. On subsequent occurrences, the value is non-zero, so the line is skipped.
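A self-contained demonstration with a throwaway file (fruits.txt is a made-up example):

```shell
# The repeated "apple" and "banana" lines are printed only once, and the
# order of first appearances is preserved.
printf 'apple\nbanana\napple\ncherry\nbanana\n' > fruits.txt
awk '!seen[$0]++' fruits.txt
rm -f fruits.txt
```

This prints apple, banana, and cherry, each on its own line.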
CSV Processing
To process a simple CSV file, set the field separator to a comma:
awk -F, '{ print $1, $3 }' data.csv
For CSV files with quoted fields that may contain commas, use the FPAT variable (available in gawk 4.0+):
awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]*\")" } { print $1, $3 }' data.csv
Using Shell Variables in Awk Programs
If you are using the awk command in shell scripts, you will often need to pass a shell variable to the awk program. One option is to enclose the program with double instead of single quotes and substitute the variable in the program. However, this approach makes your awk program more complex as you must escape the awk variables.
The recommended way to use shell variables in awk programs is to assign the shell variable to an awk variable using the -v option. Here is an example:
num=51
awk -v n="$num" 'BEGIN { print n }'
51
You can also pass multiple variables:
min=50
max=60
awk -v low="$min" -v high="$max" '$3 >= low && $3 <= high { print $1, $3 }' teams.txt
Bucks 60
Raptors 58
76ers 51
Quick Reference
| Task | Command |
|---|---|
| Print a specific field | awk '{ print $2 }' file |
| Print multiple fields | awk '{ print $1, $3 }' file |
| Filter by pattern | awk '/pattern/ { print }' file |
| Filter by field value | awk '$3 > 50 { print }' file |
| Set field separator | awk -F: '{ print $1 }' file |
| Use BEGIN/END | awk 'BEGIN { print "Header" } { print } END { print "Footer" }' file |
| Sum a column | awk '{ sum += $3 } END { print sum }' file |
| Count matching lines | awk '/pattern/ { c++ } END { print c }' file |
| Remove duplicates | awk '!seen[$0]++' file |
| Print line numbers | awk '{ print NR, $0 }' file |
| Replace text | awk '{ gsub(/old/, "new"); print }' file |
| Pass shell variable | awk -v x="$var" '{ print x, $1 }' file |
| Run program from file | awk -f script.awk file |
| Print last field | awk '{ print $NF }' file |
FAQ
What is the difference between awk and sed?
Awk is a full programming language designed for field-based text processing. It excels at working with structured, columnar data. Sed is a stream editor optimized for line-by-line text transformations such as find-and-replace. Use awk when you need to extract or compute values from specific columns, and sed when you need simple text substitutions.
How do I print a specific column in awk?
Use the dollar sign followed by the column number. For example, awk '{ print $2 }' file prints the second column. You can print multiple columns by separating them with commas: awk '{ print $1, $3 }' file.
What is the difference between awk and gawk?
Gawk (GNU awk) is the GNU implementation of the awk language. It is fully compatible with the POSIX awk specification and adds extensions such as BEGINFILE/ENDFILE patterns, the FPAT variable for field-based parsing, and network I/O. Depending on the distribution, awk may point to gawk or to another implementation.
How do I use awk with CSV files?
For simple CSV files without quoted fields, set the field separator to a comma with -F,. For CSV files with quoted fields that may contain commas, use gawk’s FPAT variable: awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]*\")" } { print $1 }' file.csv. For complex CSV processing, consider dedicated tools like csvtool or miller.
Conclusion
Awk is one of the most powerful tools available for text processing on the command line. It handles everything from simple column extraction to complex data analysis with its pattern-action programming model. For comprehensive documentation, see the official Gawk manual .
If you have any questions, feel free to leave a comment below.
About the authors

Dejan Panovski
Dejan Panovski is the founder of Linuxize, an RHCSA-certified Linux system administrator and DevOps engineer based in Skopje, Macedonia. Author of 800+ Linux tutorials with 20+ years of experience turning complex Linux tasks into clear, reliable guides.