Global Regular Expressions Print
Searching… How many times did you search for something at work during the last few days? 100, 200, or even more times? If you think about it, all our work is about searching. We search for a solution, for an answer, for inputs, code, reasons… an endless search.
Since we do this so often, it makes sense to optimize the process as much as possible. A good starting point is to look at existing methods and techniques that help us search better.
history
The problem is not a new one: we (humans) have been trying to solve and optimize it for quite a long time.
A very interesting page in this history is the moment when regular expressions were described by Stephen Cole Kleene.
A regular expression, often called a pattern, specifies a set of strings required for a particular purpose.
Regex gives us much more than a plain text search: we can search using patterns.
One of the utilities that uses regex for searching is grep, a global search for regular expressions.
We must say "Thank you!" to Ken Thompson, author of the B language, who created this utility based on his text editor ed.
The name comes from the ed command g/re/p, which globally searches for a regular expression and prints all matching lines.
There are several variants of grep (Global Regular Expressions Print):
- egrep (Extended Global Regular Expressions Print), same as grep -E
- fgrep (Fixed-string Global Regular Expressions Print), same as grep -F
- pgrep (Process-ID Global Regular Expressions Print), which matches the names of running processes and prints their IDs instead of searching file contents
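To see how the variants differ, here is a small sketch. It feeds the same made-up input and the same pattern to plain grep, grep -E and grep -F (the equivalents of egrep and fgrep). It assumes a POSIX shell and a grep that supports -E and -F, which both GNU and BSD grep do.

```shell
#!/bin/sh
# Made-up input: one line with a literal '+', one line with repeated 'a's.
input='a+b
aab'

# Basic regex (the default): '+' is an ordinary character, so only "a+b" matches.
printf '%s\n' "$input" | grep 'a+b'       # a+b

# Extended regex (grep -E, like egrep): '+' means "one or more", so "aab" matches.
printf '%s\n' "$input" | grep -E 'a+b'    # aab

# Fixed string (grep -F, like fgrep): no regex at all, the pattern is literal text.
printf '%s\n' "$input" | grep -F 'a+b'    # a+b
```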
theory
I don’t know why, but when I was just starting out, no one on my team used this tool. I discovered it for myself only a few years later, and I was very surprised.
The command is invoked with the following synopsis:
grep [OPTION...] [PATTERNS] [FILE...]
grep [OPTION...] -e patterns ... [FILE...]
grep [OPTION...] -f file ... [FILE...]
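There can be zero or more option arguments and zero or more file arguments. When no file is given, grep reads standard input (GNU grep with -r searches the working directory instead).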
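A minimal sketch of the three invocation forms from the synopsis, plus reading from standard input. The file names and contents are made up and live in a temporary directory.

```shell
#!/bin/sh
# Hypothetical sample files in a throwaway directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/fruits.txt"

# grep [OPTION...] PATTERNS [FILE...]
grep 'an' "$tmp/fruits.txt"                        # banana

# grep [OPTION...] -e PATTERNS ... [FILE...]: a line matches if any -e pattern does
grep -e 'apple' -e 'cherry' "$tmp/fruits.txt"     # apple, cherry

# grep [OPTION...] -f FILE ... [FILE...]: one pattern per line of the file
printf 'ban\nche\n' > "$tmp/patterns.txt"
grep -f "$tmp/patterns.txt" "$tmp/fruits.txt"     # banana, cherry

# No FILE argument: grep reads standard input
printf 'apple\nbanana\n' | grep 'app'             # apple

rm -r "$tmp"
```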
env
The behavior of grep also depends on environment variables:
- GREP_COLOR
- GREP_COLORS
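- LC_ALL, LC_COLLATE, LANG (collation, e.g. how ranges like [a-z] are interpreted)
- LC_ALL, LC_CTYPE, LANG (character type, e.g. which characters are letters or whitespace)
- LANGUAGE, LC_ALL, LC_MESSAGES, LANG (the language of grep's messages)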
- POSIXLY_CORRECT
- _N_GNU_nonoption_argv_flags_
- GREP_OPTIONS (deprecated)
I won’t cover all of these variables in detail; I just list them here.
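As an example of why the locale variables matter, the sketch below uses grep to count characters in the name марафон. It assumes GNU grep and a UTF-8 encoded script. The C.UTF-8 locale in the last line is an assumption and may not be installed on every system.

```shell
#!/bin/sh
# "марафон" (the file name from the usage section) is 7 Cyrillic letters,
# and UTF-8 encodes each of them as 2 bytes.
word='марафон'

# C locale: grep treats every byte as one character, so the line has 14 of them.
printf '%s\n' "$word" | LC_ALL=C grep -c '^.\{14\}$'        # 1

# UTF-8 locale (C.UTF-8 is assumed to exist here): '.' now matches a whole
# letter, so an interval of 7 fits the same line instead.
printf '%s\n' "$word" | LC_ALL=C.UTF-8 grep -c '^.\{7\}$'
```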
regular expression
The more interesting part is regex itself: patterns that describe a set of strings.
There are a lot of papers about it, like this one.
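It’s good to know that grep can work with 3 types of regex:
- BRE - basic regular expressions (the default, or grep -G)
- ERE - extended regular expressions (grep -E)
- PCRE - Perl-compatible regular expressions (grep -P, a GNU grep option)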
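Here is a short sketch comparing the flavors on made-up input, assuming GNU grep. The PCRE part is guarded because -P only works in GNU grep builds with PCRE support.

```shell
#!/bin/sh
# Made-up input: two words and a line that literally contains "cat|dog".
lines='cat
dog
cat|dog'

# BRE (default): '|' is an ordinary character, so only the line "cat|dog" matches.
printf '%s\n' "$lines" | grep 'cat|dog'          # cat|dog

# ERE (-E): '|' is alternation, so all three lines match.
printf '%s\n' "$lines" | grep -E 'cat|dog'       # cat, dog, cat|dog

# Interval expressions: BRE needs escaped braces, ERE does not.
printf 'a\naa\n' | grep -x 'a\{2\}'              # aa
printf 'a\naa\n' | grep -x -E 'a{2}'             # aa

# PCRE (-P): run only if this grep build supports it.
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
  printf 'id: 42\n' | grep -o -P '\d+'           # 42
fi
```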
fundamentals
The most important part is the set of characters used to build a pattern:
- special characters: .?*+{|()[\^$
- ordinary characters: all others
Another important part of patterns is repetition operators. They follow an item and specify how many times it may be repeated. Operators written with { and } are called interval expressions.
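Here they are, in ERE syntax as used with grep -E (in BRE, write \{ \} and \( \), and in GNU grep also \? \+ \|):
- ? - matched zero or one time
- * - matched zero or more times
- + - matched one or more times
- {n} (and the PCRE forms {n}? and {n}+) - matched exactly n times
- {n,} - matched n or more times
- {,m} - matched at most m times (a GNU extension)
- {n,m} - matched at least n times, but not more than m times
- | - or (alternation): matches either the expression before it or the one after it
- *? - zero or more times, matching as few times as possible (lazy, PCRE only)
- . - any single character
- `` - the empty pattern, which matches the empty string (and therefore every line)
- () - grouping, overrides precedence rules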
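Here are a few quick checks of these operators on made-up input, assuming GNU grep. The -x flag makes each pattern match the whole line, which keeps the effect of each quantifier easy to see. The lazy *? line runs only if grep -P is available.

```shell
#!/bin/sh
# Repetition operators in ERE syntax (-E); -x makes each pattern match a whole line.
printf 'color\ncolour\ncolouur\n' | grep -x -E 'colou?r'     # color, colour
printf 'b\nab\naab\n' | grep -x -E 'a*b'                     # b, ab, aab
printf 'b\nab\naab\n' | grep -x -E 'a+b'                     # ab, aab
printf '1\n12\n123\n1234\n' | grep -x -E '[0-9]{2,3}'        # 12, 123
printf '1\n12\n123\n1234\n' | grep -x -E '[0-9]{3,}'         # 123, 1234
printf 'cat\ndog\nbird\n' | grep -x -E 'cat|dog'             # cat, dog
printf 'ab\nabab\n' | grep -x -E '(ab){2}'                   # abab

# Greedy vs lazy: -o prints only the matched text.
printf 'axbyb\n' | grep -o -E 'a.*b'                         # axbyb (greedy)
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
  printf 'axbyb\n' | grep -o -P 'a.*?b'                      # axb (lazy, PCRE only)
fi
```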
character classes/bracket
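Brackets [ and ] match any single character listed between them. If you add ^ right after the opening [, the list is negated: the bracket matches any character NOT in the list.
Example: [0123456789] matches any single digit.
We can also use a range expression: 2 characters separated by a hyphen (-).
Example: [a-z] matches any single character between a and z (in the C locale; in other locales the range follows the locale's collation order).
There are also a few named classes available, like [:digit:]. A class is used inside a bracket expression, e.g. [[:digit:]].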
The full list can be found here
The most interesting characters inside a bracket expression:
- [ and ] - match any single character listed in between
- ^ - right after [, means NOT in the list
- [: and :] - open and close a character class
- [= and =] - open and close an equivalence class
- [. and .] - open and close a collating symbol
- - - range
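Here is a small sketch of bracket expressions on made-up lines. It sets LC_ALL=C so that [a-z] means plain ASCII letters whatever the system locale is. This is an assumption made for the example.

```shell
#!/bin/sh
# Use the C locale so a range like [a-z] means plain ASCII lowercase letters.
export LC_ALL=C
sample='a1
bb
c3
ABC'

printf '%s\n' "$sample" | grep '[0-9]'              # a1, c3 (any digit)
printf '%s\n' "$sample" | grep '[[:digit:]]'        # a1, c3 (named class inside [])
printf '%s\n' "$sample" | grep -x '[a-z][^0-9]'     # bb (a letter, then a non-digit)
printf '%s\n' "$sample" | grep '[[:upper:]]'        # ABC
```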
backslash/special expr
The backslash character \, followed by certain ordinary characters, has a special meaning (apart from plain escaping, these are GNU grep extensions):
- \ - escape character, e.g. \. matches a literal dot
- \b - empty string at the edge of a word
- \B - empty string provided it’s not at the edge of a word
- \< - empty string at the beginning of a word
- \> - empty string at the end of a word
- \w - word constituent (letter, digit, or underscore)
- \W - non-word constituent
- \s - whitespace
- \S - non-whitespace
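Here is a sketch of the backslash expressions on made-up lines, assuming GNU grep. In GNU grep, \b, \B, \<, \w and \s are available in the default BRE syntax.

```shell
#!/bin/sh
# Made-up lines: "cat" as a word, as a prefix, and inside a sentence.
text='cat
catalog
the cat sat'

printf '%s\n' "$text" | grep '\bcat\b'     # cat, the cat sat (whole word only)
printf '%s\n' "$text" | grep '\<cat'       # all three lines (a word starts with "cat")
printf '%s\n' "$text" | grep 'cat\B'       # catalog ("cat" followed by more word characters)
printf 'v1.2\nv1x2\n' | grep 'v1\.2'       # v1.2 (escaped dot is a literal dot)
printf 'a b\nab\n' | grep 'a\sb'           # a b
printf 'x_1\nx-1\n' | grep -x 'x\w1'       # x_1 (underscore is a word constituent)
```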
anchors
Symbols that represent the beginning and the end of a line:
- ^ - beginning of the line
- $ - end of the line
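Here is a sketch of anchors on a made-up Swift snippet. The helper function src is hypothetical; it only prints the sample lines.

```shell
#!/bin/sh
# Hypothetical helper that prints a tiny Swift snippet ending with an empty line.
src() { printf 'import Foundation\n// import UIKit\nlet x = 1;\n\n'; }

src | grep '^import'    # import Foundation (the commented import does not start the line)
src | grep ';$'         # let x = 1;
src | grep -c '^$'      # 1 (one empty line)
src | grep -c -v '^$'   # 3 (non-empty lines)
```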
back-reference
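We can also match a substring that was previously matched by a parenthesized group:
- \n - back-reference to the n-th group. Example: (a)\1 matches aa (in BRE, the group is written \(a\)\1).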
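Here is a sketch of back-references, assuming GNU grep. POSIX defines back-references only for BRE, but GNU grep also accepts them with -E. The input words are made up.

```shell
#!/bin/sh
# BRE: \( \) makes a group, \1 repeats whatever the group matched.
printf 'hello\nworld\nbook\n' | grep '\(.\)\1'      # hello, book (a doubled letter)

# ERE with GNU grep: plain ( ) groups; -x makes the whole line match.
printf 'abcabc\nabcabd\n' | grep -x -E '(abc)\1'    # abcabc

# -o prints only the matched part: the doubled letters themselves.
printf 'hello\nbook\n' | grep -o '\(.\)\1'          # ll, oo
```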
non-ASCII/non-printable
To match non-ASCII or non-printable characters, we can pass escape codes to grep using the shell's ANSI-C quoting (bash, zsh), like grep $'\u035B\t\u54C9'.
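Here is a portable variant of the same idea for a POSIX shell, where $'...' may be unavailable. printf builds the tab, and the C locale lets a byte range find non-ASCII characters. This assumes GNU grep. The sample lines are made up, apart from the file name марафон.pdf from the usage section.

```shell
#!/bin/sh
# $'\t' works in bash and zsh; printf is the portable way to get a tab.
tab=$(printf '\t')
printf 'name\tvalue\nname value\n' | grep "name${tab}value"   # only the tab-separated line

# In the C locale grep works byte by byte: [ -~] is the printable ASCII range,
# so -v keeps lines containing at least one other byte (here, UTF-8 Cyrillic).
printf 'notes.txt\nмарафон.pdf\n' | LC_ALL=C grep -v '^[ -~]*$'   # марафон.pdf
```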
usage
The most interesting part is to experiment and play with this command.
For this purpose, let’s create a few files in a directory. The content can be anything; in my case, I just grabbed 2 source files from an old project :].
We can search for part of a string. Since I’m using source code, I want to find the declarations that conform to HTTPRequest (the text : HTTPRequest):
grep -i ':\sHTTPRequest' <files>
The result:
/Users/khorbushko/Desktop/grep-play/DeployerGETAllAreasRequest.swift:struct DeployerGETAllAreasRequest: HTTPRequest {
/Users/khorbushko/Desktop/grep-play/DeployerFetchAllTaskGETRequest.swift:struct DeployerFetchAllTaskGETRequest: HTTPRequest {
The true power appears when we combine grep with other utilities.
The following example prints the list of files in a directory (since the unescaped . matches any character, this pattern accepts any non-empty name):
ls /Users/khorbushko/Desktop/grep-play | grep '[a-zA-Z]*.[a-zA-Z]*'
Output:
DeployerFetchAllTaskGETRequest.swift
DeployerGETAllAreasRequest.swift
марафон.pdf
If we want to list only swift files, just change the pattern to '\.swift$' (the escaped dot matches only a literal dot, and $ anchors the match at the end of the name):
DeployerFetchAllTaskGETRequest.swift
DeployerGETAllAreasRequest.swift
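Here is a quick check of why escaping the dot matters. It prints made-up file names with printf instead of ls, so the result is predictable.

```shell
#!/bin/sh
# Hypothetical file names, fed to grep the same way ls output would be.
names='Deployer.swift
notswift.txt
myswift'

# Unescaped '.' matches any character, so "tswift" and "yswift" match too.
printf '%s\n' "$names" | grep '[a-zA-Z]*.swift'    # all three names

# Escaped dot plus the $ anchor: only names that really end in ".swift".
printf '%s\n' "$names" | grep '\.swift$'           # Deployer.swift
```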
Or, even better, we can search all files in the folder recursively:
grep -rni '<pattern>' *
where:
- r - recursive
- n - print line numbers
- i - case-insensitive
Example:
grep -rni 'HTTP' *
DeployerFetchAllTaskGETRequest.swift:11:struct DeployerFetchAllTaskGETRequest: HTTPRequest {
DeployerFetchAllTaskGETRequest.swift:28: var method: HTTPMethod {
DeployerFetchAllTaskGETRequest.swift:32: var endPoint: HTTPEndPoint {
DeployerGETAllAreasRequest.swift:11:struct DeployerGETAllAreasRequest: HTTPRequest {
DeployerGETAllAreasRequest.swift:16: var endPoint: HTTPEndPoint {
DeployerGETAllAreasRequest.swift:20: var method: HTTPMethod {
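Here is a self-contained sketch of the recursive search. Instead of the author's project, it uses a throwaway directory with two made-up Swift files, one of them in a subdirectory.

```shell
#!/bin/sh
# Throwaway directory with two hypothetical Swift files.
dir=$(mktemp -d)
mkdir "$dir/net"
printf 'struct A: HTTPRequest {\n}\n' > "$dir/net/A.swift"
printf 'let url = "http://example.com"\n' > "$dir/B.swift"

cd "$dir" || exit 1
# -r descends into net/, -n adds line numbers, -i matches HTTP and http alike.
grep -rni 'http' *
# B.swift:1:let url = "http://example.com"
# net/A.swift:1:struct A: HTTPRequest {

cd / && rm -r "$dir"
```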
Search and replace (sed -i '' is the macOS/BSD sed syntax; GNU sed takes sed -i without the empty argument):
grep -rl 'HTTP' * | xargs sed -i '' 's/HTTP/http/g'
where:
- r - recursive
- l (--files-with-matches) - print the name of each file that contains a match once, instead of the matching lines
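Here is a self-contained sketch of the same search-and-replace pipeline on made-up files in a temporary directory. It uses sed -i.bak, which (unlike sed -i '') is accepted by both GNU and BSD sed, and then deletes the backups. Like the original command, it assumes file names without spaces, because xargs splits its input on whitespace.

```shell
#!/bin/sh
set -e
# Throwaway directory with hypothetical files; only A.swift mentions HTTP.
dir=$(mktemp -d)
printf 'struct A: HTTPRequest {}\n' > "$dir/A.swift"
printf 'struct B {}\n' > "$dir/B.swift"

# -l prints each matching file name once, not the matching lines.
grep -rl 'HTTP' "$dir"                      # prints only the path of A.swift

# -i.bak works with both GNU and BSD sed; it keeps backups, deleted right after.
grep -rl 'HTTP' "$dir" | xargs sed -i.bak 's/HTTP/http/g'
rm -f "$dir"/*.bak

cat "$dir/A.swift"                          # struct A: httpRequest {}
rm -r "$dir"
```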
Examples are countless. This tool is very powerful, especially when combined with others.
Check all options here.
conclusion
Well, that was a short intro into the beautiful world of grep. I believe it may look strange to you at first, but as soon as you start using it, you will realize that this utility is just for you.
resources