Text Filters with Linux (head tail sort nl wc uniq sed tac cut)

On the Linux command line, a filter is a command that takes text as input, processes it, and produces output. The input can be generated by a program, read from a file, or entered by the user. The filter applies the requested operation to that input, and the result can be written to the screen or saved to another file.

This article covers the commands used for these operations together, since a single overview is more useful than a separate article for each command. General usage is shown without going into too much detail. The examples use a working file containing the data below. To follow along, create a file named examplefile.txt and paste the data into it.

Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14

head

The head command displays a given number of lines from the beginning of a file. If no line count is given, it prints the first 10 lines.

Format : head [-number of lines to print] [path]

head examplefile.txt 
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7

The first 10 lines from the beginning are displayed above. Now let’s view the first 4 lines.

head -4 examplefile.txt 
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
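
The line count can also be given with the -n option, and head reads standard input when no path is given. The following is only a minimal sketch: the demo.txt file and its contents are made up for this illustration and are not part of our working file.

```shell
# Hypothetical five-line sample, created only for this sketch.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/demo.txt"

# -n 2 is the explicit form of -2: print the first two lines.
first_two=$(head -n 2 "$workdir/demo.txt")
echo "$first_two"

# With no path, head reads from standard input instead of a file.
first_one=$(printf 'x\ny\n' | head -n 1)
echo "$first_one"

rm -r "$workdir"
```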

tail

The tail command is the opposite of head: it displays a given number of lines from the end of a file. If no line count is given, it prints the last 10 lines.

Format : tail [-number of lines to print] [path]

tail examplefile.txt 
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14

Now let’s view the last 3 lines.

tail -3 examplefile.txt 
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
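
As with head, the count can be written with -n. With a plus sign, tail -n +N starts at line N and prints through the end of the file, which is handy for skipping a header. A minimal sketch, assuming a made-up demo.txt file that is not part of our working data:

```shell
# Hypothetical five-line sample, created only for this sketch.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/demo.txt"

# -n 2 is the explicit form of -2: print the last two lines.
last_two=$(tail -n 2 "$workdir/demo.txt")
echo "$last_two"

# -n +3 starts at line 3 and prints everything up to the end.
from_third=$(tail -n +3 "$workdir/demo.txt")
echo "$from_third"

rm -r "$workdir"
```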

sort

The sort command sorts its text input alphabetically by default. Other sort criteria can be selected with options; see the man page (man sort) for details.

Format : sort [-options] [path]

sort examplefile.txt 
Ayşe mangosuyu 7
Betül narsuyu 14
Fatih elmasuyu 20
Galip havuçsuyu 3
Lale şeftalisuyu 7
Melih kavunsuyu 12
Melih kavunsuyu 12
Melih kayısısuyu 39
Osman karpuzsuyu 2
Rasim kirazsuyu 4
Suzan portakalsuyu 12
Suzan portakalsuyu 5
Tarık portakalsuyu 9
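
Notice that in the output above 12 comes before 5, because the lines are compared as text. As one example of another criterion, the sketch below sorts by the third column as a number, using the -k (key) option with the n (numeric) and r (reverse) modifiers. The sample file is hypothetical and kept short on purpose.

```shell
# Hypothetical three-line sample, created only for this sketch.
workdir=$(mktemp -d)
printf 'Ada apple 20\nBora pear 5\nCem plum 12\n' > "$workdir/demo.txt"

# Key = field 3 only, compared as text: "12" sorts before "20" and "5".
as_text=$(sort -k 3,3 "$workdir/demo.txt")
echo "$as_text"

# Same key, compared as a number (n): 5, 12, 20.
by_quantity=$(sort -k 3,3n "$workdir/demo.txt")
echo "$by_quantity"

# Numeric and reversed (r): largest quantity first.
largest_first=$(sort -k 3,3nr "$workdir/demo.txt")
echo "$largest_first"

rm -r "$workdir"
```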

nl

The nl command takes its name from the initials of "number lines". It prints the file with a line number in front of each non-empty line.

Format : nl [-options] [path]

nl examplefile.txt 
     1	Fatih elmasuyu 20
     2	Suzan portakalsuyu 5
     3	Melih kavunsuyu 12
     4	Melih kavunsuyu 12
     5	Rasim kirazsuyu 4
     6	Tarık portakalsuyu 9
     7	Lale şeftalisuyu 7
     8	Suzan portakalsuyu 12
     9	Melih kayısısuyu 39
    10	Ayşe mangosuyu 7
    11	Galip havuçsuyu 3
    12	Osman karpuzsuyu 2
    13	Betül narsuyu 14

Sometimes you may want to adjust the format of the output. For example, to put a period after each line number and right-align the numbers in a 10-character-wide column, you can use the example below.

nl -s '. ' -w 10 examplefile.txt 
         1. Fatih elmasuyu 20
         2. Suzan portakalsuyu 5
         3. Melih kavunsuyu 12
         4. Melih kavunsuyu 12
         5. Rasim kirazsuyu 4
         6. Tarık portakalsuyu 9
         7. Lale şeftalisuyu 7
         8. Suzan portakalsuyu 12
         9. Melih kayısısuyu 39
        10. Ayşe mangosuyu 7
        11. Galip havuçsuyu 3
        12. Osman karpuzsuyu 2
        13. Betül narsuyu 14

In the example above, two options are used. The -s option sets the separator printed after the line number, here a period followed by a space. The -w option sets the width of the number column; the numbers are right-aligned within it, so shorter numbers are padded with spaces on the left. Note that the separator is written in quotation marks because it contains a space.
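
To see the effect of the two options more clearly, here is a small sketch with a narrower column and a different separator. The two-line sample file is made up for this illustration.

```shell
# Hypothetical two-line sample, created only for this sketch.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/demo.txt"

# -w 3: the number column is 3 characters wide (right-aligned).
# -s ') ': a closing parenthesis and a space follow each number.
numbered=$(nl -w 3 -s ') ' "$workdir/demo.txt")
echo "$numbered"

rm -r "$workdir"
```

Each number is preceded by two spaces here, because a one-digit number fills only one of the three columns.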

wc

The wc command takes its name from the initials of "word count" and reports counts for the given text file. Unless otherwise specified, the output shows the number of lines, words, and bytes, followed by the file name.

Format : wc [-options] [path]

wc examplefile.txt 
13  39 255 examplefile.txt

Sometimes we need only one of these values. In that case, it is enough to pass the corresponding option: -l (lines) prints the number of lines, -w (words) the number of words, -c the number of bytes, and -m the number of characters.

wc -l examplefile.txt 
13 examplefile.txt

You can also combine more than one of these options.

wc -lw examplefile.txt 
13  39 examplefile.txt
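
When wc reads from standard input instead of a named file, it prints only the numbers, which is convenient if you want to store a count in a variable. A minimal sketch, assuming a made-up two-line file:

```shell
# Hypothetical two-line sample, created only for this sketch.
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/demo.txt"

# Redirecting the file to standard input leaves the file name out of the output.
line_count=$(wc -l < "$workdir/demo.txt")
word_count=$(wc -w < "$workdir/demo.txt")
byte_count=$(wc -c < "$workdir/demo.txt")

# Some wc implementations pad the number with spaces; $(( )) normalizes it.
echo "lines: $((line_count)) words: $((word_count)) bytes: $((byte_count))"

rm -r "$workdir"
```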

cut

The cut command extracts the columns you want from a file whose data is organized in columns, such as CSV (Comma Separated Values) files or text made up of space-separated values.

In the sample file we use, the data is separated by spaces. The first column is the name, the second column is the juice, and the third column is the quantity. To get only the names, we use the two options below.

-f : Short for fields; selects which fields (columns) to print.

-d : Short for delimiter; specifies the character that separates the fields.

Format : cut [-options] [path]

cut -f 1 -d ' ' examplefile.txt 
Fatih
Suzan
Melih
Melih
Rasim
Tarık
Lale
Suzan
Melih
Ayşe
Galip
Osman
Betül

Now let’s take two columns at once by listing both field numbers, separated by a comma.

cut -f 1,2 -d ' ' examplefile.txt 
Fatih elmasuyu
Suzan portakalsuyu
Melih kavunsuyu
Melih kavunsuyu
Rasim kirazsuyu
Tarık portakalsuyu
Lale şeftalisuyu
Suzan portakalsuyu
Melih kayısısuyu
Ayşe mangosuyu
Galip havuçsuyu
Osman karpuzsuyu
Betül narsuyu
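
The same options work for CSV data if the delimiter is changed to a comma. The sketch below also uses an open-ended range (2-) to take every field from the second one onward. The demo.csv file is hypothetical and exists only for this illustration.

```shell
# Hypothetical two-line CSV sample, created only for this sketch.
workdir=$(mktemp -d)
printf 'Ada,apple,20\nBora,pear,5\n' > "$workdir/demo.csv"

# Field 1, with a comma as the delimiter: the names.
names=$(cut -d ',' -f 1 "$workdir/demo.csv")
echo "$names"

# Fields 2 through the last one; the delimiter is kept between them.
without_names=$(cut -d ',' -f 2- "$workdir/demo.csv")
echo "$without_names"

rm -r "$workdir"
```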

sed

The name sed is short for stream editor. It is most often used for search and replace: it looks for an expression and replaces it with another. Although sed has a number of other capabilities, we will show only basic usage here.

Format : sed <expression> [path]

Basically, expression has the following structure.

Expression : s/searchexpression/newexpression/g

The s at the beginning tells sed to perform a substitute operation. There are other letters for other operations. The expression between the first and second slash after the letter s is what to search for, and the part between the second and third slash is what to replace it with. The g at the end means the operation is applied globally, that is, to every match on a line. The g may be omitted; in that case only the first match on each line is replaced and any further matches on that line are left unchanged.
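
The difference made by the g flag is easiest to see on a line that contains the search expression more than once. A minimal sketch, using a made-up two-line file:

```shell
# Hypothetical sample: the first line contains "aa" twice.
workdir=$(mktemp -d)
printf 'aa bb aa\naa cc\n' > "$workdir/demo.txt"

# Without g: only the first match on each line is replaced.
first_match_only=$(sed 's/aa/XX/' "$workdir/demo.txt")
echo "$first_match_only"

# With g: every match on each line is replaced.
every_match=$(sed 's/aa/XX/g' "$workdir/demo.txt")
echo "$every_match"

rm -r "$workdir"
```

Note that even without g, the second line is still changed: the "first match" rule applies to each line separately, not to the file as a whole.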

Let’s look at our file contents first.

cat examplefile.txt
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14

With the example below, all Suzan names in our file are replaced with Serpil.

sed 's/Suzan/Serpil/g' examplefile.txt 
Fatih elmasuyu 20
Serpil portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Serpil portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14

sed matches the search expression as a sequence of characters, not as a whole word, so you could just as well replace Suz with Ser. sed is case-sensitive by default. Instead of a fixed search string, you can write more flexible patterns using regular expressions, which we will explain in another section.

Finally, note that the expression we passed to sed is written in quotes. If you accidentally leave a quote unclosed, the shell keeps waiting for more input; you can press CTRL+C to cancel and type the command again.

uniq

The uniq command takes its name from the word unique. It keeps one copy of each run of repeated lines and drops the other copies. Records sometimes contain double entries, and uniq can be used to clean up and simplify them. The important point is that the repeated lines must be adjacent, one directly under the other. If a document contains repeated lines that are not adjacent, uniq will not remove them; we will discuss how to handle that case in the article on Piping and Redirection.

You may have noticed that one line in our sample file is repeated. Let’s remove the repetition using uniq. First, let’s look at the original version of the file. As can be seen, the line "Melih kavunsuyu 12" appears twice in a row.

cat examplefile.txt
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14

After running the command, the repeated line appears only once.

Format : uniq [options] [path]

uniq examplefile.txt 
Fatih elmasuyu 20
Suzan portakalsuyu 5
Melih kavunsuyu 12
Rasim kirazsuyu 4
Tarık portakalsuyu 9
Lale şeftalisuyu 7
Suzan portakalsuyu 12
Melih kayısısuyu 39
Ayşe mangosuyu 7
Galip havuçsuyu 3
Osman karpuzsuyu 2
Betül narsuyu 14
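
The adjacency rule can be checked with a small experiment. In the sketch below, the sample file is hypothetical, and sorting the file first is just one possible way to bring the repeated lines together; it also changes the order of the lines.

```shell
# Hypothetical sample: "a" appears three times, but only the last two are adjacent.
workdir=$(mktemp -d)
printf 'a\nb\na\na\n' > "$workdir/demo.txt"

# uniq collapses only the adjacent pair; the first "a" survives separately.
adjacent_only=$(uniq "$workdir/demo.txt")
echo "$adjacent_only"

# After sorting, all copies of "a" are adjacent, so uniq leaves a single one.
sort "$workdir/demo.txt" > "$workdir/sorted.txt"
all_collapsed=$(uniq "$workdir/sorted.txt")
echo "$all_collapsed"

rm -r "$workdir"
```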

tac

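The tac command does the opposite of the cat command (its name is cat spelled backwards). It prints the lines of a file in reverse order, so the last line of the file is written first. Note that this is different from head and tail, which print only part of a file and keep the original order.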

Sometimes, while keeping records, new records may be written to the bottom of the file. You may want to see these new records at the top. In this case, using tac will make your job easier.

Format : tac [path]

tac examplefile.txt 
Betül narsuyu 14
Osman karpuzsuyu 2
Galip havuçsuyu 3
Ayşe mangosuyu 7
Melih kayısısuyu 39
Suzan portakalsuyu 12
Lale şeftalisuyu 7
Tarık portakalsuyu 9
Rasim kirazsuyu 4
Melih kavunsuyu 12
Melih kavunsuyu 12
Suzan portakalsuyu 5
Fatih elmasuyu 20
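
For the record-keeping case described above, tac can be combined with head to look at only the newest entries. This is a minimal sketch: it assumes tac is installed (it is part of GNU coreutils on most Linux systems), and the records.log file is made up for this illustration.

```shell
# Hypothetical log where new records are appended at the bottom.
workdir=$(mktemp -d)
printf 'first entry\nsecond entry\nthird entry\n' > "$workdir/records.log"

# Whole file, newest record first.
newest_first=$(tac "$workdir/records.log")
echo "$newest_first"

# Only the two newest records, newest first.
two_newest=$(tac "$workdir/records.log" | head -n 2)
echo "$two_newest"

rm -r "$workdir"
```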

İbrahim Korucuoğlu

The author shares useful content compiled in the field of information technology on this blog.