Linux Commands

Terminal Shortcut tmux cut jq screen split

The cut command is one of the commands used to parse file data.
It is highly efficient in the process of parsing and preprocessing data.

Usability

  1. When you want to view only part of a simple file, you can use sed or awk, but cut offers the best performance.
    • You can see a noticeable performance difference when dealing with large files over 1GB.
  2. When you want to inspect large files, opening them with vi may take a long time, but with cut, you can view the content quickly.
  3. If a single line is extremely large (for example, 10GB), you might not be able to open it with tools like vi. Since it’s a single line, even head won’t help you view it. In such cases, you can use cut to inspect the data.
    • Of course, using split in such scenarios is also a good option.
  4. When using scripts like Python, preprocessing the data can significantly boost performance.

This kind of data preprocessing is very effective when dealing with large files that are simple and have a consistent format.

  • When extracting userdata for batch processing.
  • When parsing large amounts of simple, non-standard logs.

How to Use

While there are several ways to use cut, the most common usage is with cut -d -f.
It’s also useful when you need to extract specific characters.

A limitation of cut is that it only allows you to cut by a single character, but there are many datasets where this is useful, and the performance is excellent.
You can use it alongside other commands to parse data with consistent formats like db dumps, CSVs, and JSON files.
It is frequently used for file parsing, data extraction, and in command pipes.

Example

cut

Cutting by Delimiter

cut -d 'DELIMITER' -f INDEX FILE

-d:

  • Option to specify the character to cut by.
  • Since only a single character is allowed, you cannot use strings.

-f:

  • Option to select which field(s) to extract after cutting.
  • You can specify using N, N-M, or N,M-L.

Ignoring Lines Without Delimiters

cut -d 'DELIMITER' -f INDEX -s FILE

-s:

  • Option to skip lines that do not contain the delimiter.

Changing the Delimiter

cut -d 'DELIMITER' -f INDEX --output-delimiter="OUTPUT DELIMITER" FILE

–output-delimiter:

  • Specifies a string to replace the delimiter used in the output.

Cutting by Character

cut -c INDEX

-c:

  • Allows you to cut by character.
  • You can specify ranges using N, N-M, or N,M-L.