Text is the heart of Unix. The “everything is a file” philosophy permeates the whole system and the tools developed for it. That is why working with text is a required skill not only for a system administrator but also for any regular Linux user who wants to understand the operating system more deeply. So what is the awk command in Linux?

What is the awk command in Linux?
The awk command is one of the most powerful text processing and filtering tools, and it is accessible even to people with no programming background. It is not just a utility but a whole language designed for processing and extracting data. In this article, we will look at how to use the awk command in Linux.
Using the awk programming language, you can do the following:
- Declare variables to store data.
- Use arithmetic and string operators to work with data.
- Use the control structures of the language, such as if-else statements and loops, which let you implement complex data processing algorithms.
- Create formatted reports.
If we talk only about the ability to create formatted reports that are convenient to read and analyze, this is very useful when working with log files that can contain millions of records. But awk is much more than a reporting tool.
First you need to understand how the utility works. Awk reads the input one line at a time, performs the actions you specify, and prints the result to standard output. One of the most common tasks awk is used for is extracting a single column. The awk program is enclosed in single quotes, and each action to be performed is enclosed in curly braces. Here is the basic syntax:
- $ awk options 'condition { action }'
- $ awk options 'condition { action } condition { action }'
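As a minimal sketch of the second form, the command below pairs two conditions with two actions. The sample input and the E:/I: labels are invented for illustration.

```shell
# Two condition/action pairs in one program: every input line is tested
# against both conditions, and each matching action runs.
out=$(printf 'error disk full\ninfo started\nerror timeout\n' |
  awk '/error/ { print "E:", $2 } /info/ { print "I:", $2 }')
echo "$out"
```

Each line is checked against /error/ first and /info/ second, so the output keeps the order of the input lines.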
An action lets you transform the line being processed. We will talk about this later, but first let’s look at the utility’s options:
- -F, --field-separator – field separator, used to split text into columns;
- -f, --file – read the awk program from a file instead of the command line;
- -v, --assign – assign a value to a variable before the program runs, for example foo=bar;
- -b, --characters-as-bytes – treat all characters as single bytes;
- -d, --dump-variables – write the names, types, and final values of all global variables to a file (awkvars.out by default);
- -D, --debug – debugging mode, which lets you enter debugger commands interactively from the keyboard;
- -e, --source – execute the specified awk code;
- -o, --pretty-print – write a pretty-printed version of the program to a file (awkprof.out by default);
- -V, --version – print the utility version.
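Two of these options can be tried together in a short sketch. The input lines and the variable name prefix are hypothetical and chosen only for the example.

```shell
# -F sets the field separator to ":"; -v assigns the awk variable
# "prefix" (a made-up name) before the program starts.
out=$(printf 'root:x:0\ndaemon:x:1\n' |
  awk -F: -v prefix="user=" '{ print prefix $1 }')
echo "$out"
```

Placing two values next to each other in print, as in prefix $1, concatenates them without a space.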
These are far from all of the awk options, but they are enough to get started. The long forms and the options from -b onward are specific to GNU awk (gawk). Now let’s list a few functions and statements that you can use in actions:
- print (string) – prints its arguments to the standard output stream;
- printf (format, values) – prints formatted output to the standard output stream;
- system (command) – runs a command in the shell;
- length (string) – returns the length of the string;
- substr (string, start, length) – returns the substring of the given length that begins at position start (positions are counted from 1);
- tolower (string) – converts the string to lowercase;
- toupper (string) – converts the string to uppercase.
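A short sketch that combines several of these functions in one action may help; the input text is arbitrary.

```shell
# toupper() upper-cases the first field, length() counts the characters
# of the second, and substr() takes its first three characters.
out=$(echo 'hello world' |
  awk '{ printf "%s %d %s\n", toupper($1), length($2), substr($2, 1, 3) }')
echo "$out"   # HELLO 5 wor
```

Unlike print, printf does not add a newline on its own, which is why the format string ends with \n.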
You can use various variables and operators in actions; here are a few of them:
- FNR – number of the current line within the current file;
- FS – field separator;
- NF – the number of fields (columns) in the current line;
- NR – the total number of lines read so far across all input files;
- RS – record (line) separator, by default a newline character;
- $ – reference to a field by number.
In addition to these variables, there are others, and you can also declare your own.
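The difference between NR and FNR only shows up with more than one input file. The sketch below assumes two throwaway files created in a temporary directory; the file names are made up for the example.

```shell
# Create two small sample files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file.
out=$(awk '{ print NR, FNR, $0 }' "$dir/one.txt" "$dir/two.txt")
echo "$out"

rm -rf "$dir"
```

The last line of the output is "3 1 c": the third line overall, but the first line of the second file.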
The condition lets you process only the lines that contain the data you need, so it can be used as a filter, like grep. A condition can also run certain blocks of awk code before the first line is read and after the last line is processed; for this, use the special BEGIN and END patterns instead of a regular expression. There is a lot more, but this is enough for today. Now let’s move on to the examples.
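As a rough illustration of BEGIN and END around a regular filter, the following sketch counts the lines that contain the letter t. The input words and the counter name n are arbitrary.

```shell
# BEGIN runs once before any input, the /t/ block runs for matching
# lines only, and END runs once after the last line.
out=$(printf 'one\ntwo\nthree\n' |
  awk 'BEGIN { print "start" } /t/ { n++ } END { print "matched:", n }')
echo "$out"
```

Here "two" and "three" match, so the END block reports 2.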
awk syntax
In awk, there is a distinction between the command and the actions that the command performs. The actions are enclosed in curly braces {}, and the whole command (including the actions) is enclosed in single quotes '':
awk '{action1; action2; actionN}'
Multiple actions are separated by semicolons, as AWK syntax requires.
The following command prints the entire file.txt file, like cat:
awk '{print}' file.txt
Printing only the lines that contain 'string':
awk '/string/ {print}' file.txt
The print statement accepts the expressions $0, $1, $2 and so on, which indicate which fields should be printed. The expression $0 stands for the whole current line, so printing it for every line outputs the entire file. For instance:
awk '{print $0}' file.txt
Similarly, awk '{print}' file.txt outputs the entire file.

The following examples demonstrate the use of awk in the most common situations:
dpkg -l | awk '{print $2}'
As a result, a list with the names of installed packages will be displayed. If you need to find out, for example, which PHP or Apache packages are installed on the system, use the command:
dpkg -l | awk '/php/ {print $2}'
or for Apache:
dpkg -l | awk '/apache/ {print $2}'
The expression used for searching and selecting is, as you can see, placed between the two / characters.
awk command in Linux with examples
Using the awk command
The simplest and most frequently needed task is selecting fields from standard output. You will not find a more suitable tool for this task than awk. By default, awk separates fields by whitespace. If you want to print the first field, use the print statement and pass it $1; print does not require parentheses around its arguments:
echo 'one two three four' | awk '{print $1}'

Yes, using curly braces is a bit unusual, but only the first time. Have you already guessed how to print the second, third, fourth, or other fields? Correct: use $2, $3, and $4 respectively.
echo 'one two three four' | awk '{print $3}'

Sometimes it is necessary to present data in a specific format, for example, to select a few words. The awk command easily handles combining several fields and even lets you include static text:
echo 'one two three four' | awk '{print $3,$1}'

echo 'one two three four' | awk '{print "foo:",$3,"| bar:",$1}'

If the fields are separated not by spaces but by another separator, simply specify that separator in quotation marks with the -F option, for example ":":
echo 'one mississippi:two mississippi:three mississippi:four mississippi' | awk -F":" '{print $4}'

But the delimiter does not have to be quoted. The following command produces the same output as the previous one:
echo 'one mississippi:two mississippi:three mississippi:four mississippi' | awk -F: '{print $4}'

Sometimes you need to process data with an unknown number of fields. If you need to select the last field, you can use the NF variable: it holds the number of fields, so $NF refers to the last one. This is how you can display the last field:
echo 'one two three four' | awk '{print $NF}'

You can also use the NF variable to get the penultimate field:
echo 'one two three four' | awk '{print $(NF-1)}'

Or fields from the middle:
echo 'one two three four' | awk '{print $((NF/2)+1)}'
echo 'one two three four five' | awk '{print $((NF/2)+1)}'

All this can be done with utilities such as sed, cut and grep, but it will be much more complicated.
As mentioned above, awk processes one line at a time; the following example confirms this:
echo -e 'one 1\n two 2' | awk '{print $1}'

And here is an example of filtering with a condition; it prints the first field only for the line containing the text one:
echo -e 'one 1\n two 2' | awk '/one/ {print $1}'
Here is an example of using operations on variables:
echo -e 'one 1\n two 2' | awk '{sum+=$2} END {print sum}'
Here the first block is executed for each line and adds the second field to sum, while the END block prints the total after all lines have been read. This can be used, for example, to calculate the amount of data transferred for requests in a web server log.
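The web server case might look like the sketch below. It assumes a log format in which the last field of every line is the response size in bytes; the two log lines are invented sample data, not output from a real server.

```shell
# Invented sample lines; the assumption is that the last field ($NF)
# holds the number of bytes sent for each request.
log='10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 1500
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 2500'

# Add up the last field of every line and print the total at the end.
out=$(printf '%s\n' "$log" | awk '{ bytes += $NF } END { print bytes }')
echo "$out"   # 4000
```

If your log keeps the size in a different column, replace $NF with that field number.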
If an awk command is long and hard to remember, you can use the shell’s alias command to create a shortcut for it.
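A minimal sketch of such a shortcut follows; the alias name col2 is made up for the example.

```shell
# Define a hypothetical alias "col2" that prints the second field.
# The backslash keeps the shell from expanding $2 inside double quotes.
alias col2="awk '{ print \$2 }'"
```

In an interactive shell, dpkg -l | col2 then behaves like dpkg -l | awk '{ print $2 }'. To keep the alias between sessions, the definition is usually added to a shell startup file such as ~/.bashrc.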
awk programming language structure
An awk program consists of a set of rules, each of which consists of a pattern and an action. awk automatically reads the input data line by line (either from files or from the standard input stream). If the specified pattern matches the line, the specified action is performed. There are also a few special patterns: the BEGIN rule is executed first, before any input data is read, and the END rule is executed last, after all input data has been processed.
Some complex awk scripts consist only of a BEGIN rule that uses the getline statement to read the input. If no pattern is specified, the action is performed for every line. If no action is specified, awk simply prints the matching lines.
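A BEGIN-only script of this kind can be sketched as a simple line counter. The variable names line and n are arbitrary.

```shell
# The whole program is a single BEGIN rule; getline pulls the input
# lines in explicitly and returns 1 while there is still data to read.
out=$(printf 'x\ny\nz\n' |
  awk 'BEGIN { while ((getline line) > 0) n++; print n }')
echo "$out"   # 3
```

The parentheses around getline line keep the comparison with 0 unambiguous.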
A regular expression enclosed in slash characters (“/”) can be used as a pattern; a line matches the pattern if it contains text that matches the regular expression. The regular expression /^[^#]/ selects all non-empty lines that do not start with a pound sign (#). A pattern can also be an awk expression, such as NF > 5, which selects all lines containing more than 5 words.
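Both kinds of pattern can be tried in one command. The three input lines and the data:/long: labels below are invented for the sketch.

```shell
# The first rule uses a regular expression pattern, the second an
# expression pattern; a line may match one, both, or neither.
out=$(printf '# comment\nkey = value\none two three four five six\n' |
  awk '/^[^#]/ { print "data:", $0 } NF > 5 { print "long:", $0 }')
echo "$out"
```

The comment line matches neither rule, while the six-word line matches both and is printed twice.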
Each time a line is read (either automatically or with the getline statement), it is split into separate words. The first word is assigned to $1, the second to $2, and so on. This feature of awk makes it easier to work with columns of data. The variable NF contains the number of words in the line. The $ character is an awk operator, so it can be applied to any expression. For example, the expression $NF refers to the last word in the line.
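Because $ is an operator, the field number can come from a variable or a computed value. A small sketch, with i as an arbitrary variable name:

```shell
# $i uses the value of i as the field number, $(i + 1) computes the
# number first, and $NF picks the last field.
out=$(echo 'a b c d' | awk '{ i = 2; print $i, $(i + 1), $NF }')
echo "$out"   # b c d
```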
The power of awk lies in its dynamic (associative) arrays, especially in combination with regular expressions. They let you write complex queries that process multiple files, collect results, and compare them, for example a query that answers the question: “What are the first words of all lines and how often are they used?”
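One possible way to answer that question is sketched below. The input lines and the array name count are made up, and sort is added only because awk does not guarantee the order in which array keys are visited.

```shell
# count[] is an associative array indexed by the first word of each
# line; the END block prints every word with its number of occurrences.
out=$(printf 'get /a\npost /b\nget /c\n' |
  awk '{ count[$1]++ } END { for (w in count) print count[w], w }' |
  sort -k1,1nr -k2)
echo "$out"
```

To run the same query over several files, list them after the awk program instead of piping data in.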
The examples above do not demonstrate every awk feature. However, they are quite enough for working effectively in the Linux command shell, because they show the general principles of building awk commands and actions. These principles let you create more specific and more complex constructions depending on the task at hand. For deeper and wider use of the awk utility, it is recommended that you devote some time to learning the AWK language itself.