Searching… How many times did u search for something on u’r work during the last few days? 100, 200, or even more times. I think a bit - all our work is about searching. We search for a solution, we search for an answer, we search inputs, code, reason… endless search.

If we do this a lot, the good way to do this - is to optimize this process as well as possible. A good point to start is to inspect existing methods and techniques that allow us to do this in a better way.

history

The problem is not a new one, and we (humans) trying to solve and optimize it for quite a long period of time.

The very interesting page of the history of this process is the moment when regular expression was invented/described by Stephen Cole Kleene.

A regular expression, often called a pattern, specifies a set of strings required for a particular purpose

Regex opens for us a lot more than just a regular search - we can perform searches using patterns.

And one of the utilities that use regex for search is grep - a global search for regular expressions.

We must say ” Thank you!” to Ken Thompson, author of the B language, who create this utility for his text editor ed.

The name is coming from the command g/re/p in the ed editor (this commands print all lines).

There are a lot of modifications for grep (Global Regular Expressions Print):

  • egrep (Extended Global Regular Expressions Print) (grep -E)
  • fgrep (Fixed-string Global Regular Expressions Print) (grep -F)
  • pgrep (Process-ID Global Regular Expressions Print)

theory

I don’t know why, but when I just started no one from my team used this tool. I discover it for myself only a few years later. I was very surprised.

The invocation of the command can be done using the next synopsis:

grep [OPTION...] [PATTERNS] [FILE...]
grep [OPTION...] -e patterns ... [FILE...]
grep [OPTION...] -f file ... [FILE...]

can be zero or more options arguments and zero or more file arguments

env

The behavior of grep also depends on ENV variables:

  • GREP_COLOR
  • GREP_COLORS
  • LC_ALL, LC_COLLATE, LANG
  • LC_ALL, LC_CTYPE, LANG
  • LANGUAGE, LC_ALL, LC_MESSAGES, LANG
  • POSIXLY_CORRECT
  • _N_GNU_nonoption_argv_flags_
  • GREP_OPTIONS (deprecated)

I won’t cover all of these variables, just list them here

regular expression

The more interesting part - is if regex patterns that can describe a set of strings.

there are a lot of papers about it like this one

It’s good to know, that grep can work with 3 types of regex - BRE, ERE, and PCRE.

  • BRE - basic
  • ERE - extended
  • PCRE - perl-compatible

fundamentals

The most important part is the characters that can be used for the construction of a pattern:

  • special characters - .?*+{|()[\^$
  • ordinal character - all other

The important moment in patterns - is it can have an operation. Each operation is separated with the { and } characters. Operator with { named as interval expressions.

Here they are:

  • ? - matched zero or more times
  • + - matched one or more times
  • {n}, {n}?, {n}+ - matched exactly n times
  • {n,} - matched n or more times
  • {,m} - matched at most m times
  • {n,m} - matched at least n times, but not more than m times
  • | - or, can be used for joining operations
  • *? - 0 or more times. Match as few times as possible
  • . - any character
  • `` - empty string
  • () - override precedence rules

character classes/bracket

Bracket - [ and ] - match any single character listed in it. If u add ˆ - this means NOT in the list.

Example: [0123456789] - any single digit.

We can also use range expression - 2 char separated with a hyphen (-).

Example: [a-z] - any single char between a and z.

There are also a few classes named available for use - like [:digit:].

The full list can be found here

The list of most interesting:

  • [ and ] - match any single character listed in between
  • ˆ - used with [ and ] NOT in the list
  • [: - open class character
  • :] - close class character
  • [= - open equivalence class
  • =] - close equivalence class
  • [. - open collating class
  • .] - close collating class
  • - - range

backslash/special expr

The character \ with followed by some ordinary characters, has next meaning:

  • \ - escape char
  • \b - empty string at the edge of a word
  • \B - empty string provided it’s not at the edge of a word
  • \< - empty string at the beginning of a word
  • \> - empty string at the end of a word.
  • \w - word constituent
  • \W - non-word constituent
  • \s - whitespace
  • \S - non-whitespace

anchors

Symbols that represent the beginning and end of the string:

  • ˆ - beginning of the line
  • $ - end of the line

back-reference

We can also match the substring previously matched:

  • \n - example (a)\1 match aa

non-ASCII/non-printable

To do this - we can use ascii codes like grep $'\u035B\t\u54C9'.

usage

The most interesting part - is to experiment and play a bit with this command.

For this purpose, let’s create a few files in some directory and play a bit. content of the file can be anything - in my case, I just grab 2 source files from some old project :].

We can search for some part of the string. Thus I’m using source code, I want to search for : HTTPRequest subclass:

grep -i ':\sHTTPRequest' <files>

The result:

/Users/khorbushko/Desktop/grep-play/DeployerGETAllAreasRequest.swift:struct DeployerGETAllAreasRequest: HTTPRequest {
/Users/khorbushko/Desktop/grep-play/DeployerFetchAllTaskGETRequest.swift:struct DeployerFetchAllTaskGETRequest: HTTPRequest {

The true power starts when we combine different utilizes with grep.

The following example will print a list of all files in a directory:

ls /Users/khorbushko/Desktop/grep-play | grep '[a-zA-Z]*.[a-zA-Z]*'

Output:

DeployerFetchAllTaskGETRequest.swift
DeployerGETAllAreasRequest.swift
марафон.pdf

If we want to list only swift files - just change the pattern - '[a-zA-Z]*.swift'

DeployerFetchAllTaskGETRequest.swift
DeployerGETAllAreasRequest.swift

Or even better - we can search all files in the folder:

grep -rni '<pattern>' *

where

  • r - recursive
  • n - line number
  • i - case insensetive

Example:

grep -rni 'HTTP' *                     
DeployerFetchAllTaskGETRequest.swift:11:struct DeployerFetchAllTaskGETRequest: HTTPRequest {
DeployerFetchAllTaskGETRequest.swift:28:  var method: HTTPMethod {
DeployerFetchAllTaskGETRequest.swift:32:  var endPoint: HTTPEndPoint {
DeployerGETAllAreasRequest.swift:11:struct DeployerGETAllAreasRequest: HTTPRequest {
DeployerGETAllAreasRequest.swift:16:  var endPoint: HTTPEndPoint {
DeployerGETAllAreasRequest.swift:20:  var method: HTTPMethod {

Search and replace:

grep -rl 'HTTP' * | xargs sed -i '' 's/HTTP/http/g'

where

  • r - recursive
  • l - --files-with-matches - if in file found something it’s return file name once instead of line name

Examples are countless. This tool is very powerful, especially if u combine it with others.

check all options here

conclusion

Well, that was a short intro into the beautiful world of grep. I believe that at first, it may look strange to u, but as soon as u start to use it, u realize that this utility is just for u.

resources