glob is a general term for techniques that match specified patterns according to rules related to the Unix shell. Linux and Unix systems and shells also support glob and provide a glob() function in system libraries. In this tutorial, we will look at how to use the glob() function in the Python programming language.
Import Glob Module
In order to use glob() and related functions we need to import the glob module. Keep in mind that the glob module contains glob() and other related functions.
import glob

Exact String Search
We will start with a simple example. We will look at how to match an exact string or file name given as an absolute path. In this example we will list the file /home/ismail/poftut.c. The function returns a list that contains the matches, and an empty list if nothing matches.
glob.glob("/home/ismail/poftut.c")
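To make the return value concrete, here is a minimal sketch. It assumes a throwaway temporary directory instead of /home/ismail, so the file name poftut.c is only recreated for illustration and nothing.c is a made-up name for a file that does not exist.

```python
import glob
import os
import tempfile

# Hypothetical sandbox so the example does not depend on /home/ismail existing.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "poftut.c")
    open(target, "w").close()  # create an empty file

    found = glob.glob(target)                            # exact path, file exists
    missing = glob.glob(os.path.join(tmp, "nothing.c"))  # exact path, no such file

print(found)    # a list with exactly one path
print(missing)  # an empty list
```

Because a pattern without special characters can match at most one path, the result is either a one-element list or an empty list.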

Wildcards
The wildcard is an important glob operator. The wildcard, or asterisk (*), matches zero or more characters, whatever those characters are. In this example we will match files that have the .txt extension.
glob.glob("/home/ismail/*.txt")
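The following sketch illustrates the "zero or more characters" rule. The file names are made up and live in a temporary directory, which is an assumption made only to keep the example self-contained.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files used only for this demonstration.
    for name in ("notes.txt", "notes2.txt", "todo.txt", "readme.md"):
        open(os.path.join(tmp, name), "w").close()

    # * stands for any run of characters, so every .txt file matches.
    all_txt = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(tmp, "*.txt")))

    # Here * matches zero characters for notes.txt and "2" for notes2.txt.
    notes = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(tmp, "notes*.txt")))

print(all_txt)
print(notes)
```

The results are sorted here because glob() does not promise any particular order.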

The matching .txt files are returned as a Python list.
Wildcards with Multilevel Directories
We can use wildcards to span multiple directory levels. To search the directories one level below a given path, we put /*/ in the pattern. In this example, we search for .txt files in the directories one level below /home/ismail. This call is also called “glob glob” because we use the module name glob followed by the function glob() that the module provides.
glob.glob("/home/ismail/*/*.txt")
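A small sketch of how each /*/ component corresponds to exactly one directory level. The directory layout and the helper rel_matches() are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: top.txt, a/one.txt, b/two.txt, a/deep/three.txt
    for rel in ("top.txt", "a/one.txt", "b/two.txt", "a/deep/three.txt"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    def rel_matches(pattern):
        """Return the matches relative to tmp, sorted, with / separators."""
        hits = glob.glob(os.path.join(tmp, pattern))
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in hits)

    one_down = rel_matches("*/*.txt")    # exactly one directory below tmp
    two_down = rel_matches("*/*/*.txt")  # exactly two directories below tmp

print(one_down)
print(two_down)
```

Note that top.txt appears in neither result, because each pattern requires a fixed number of directory levels.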

Single Character Wildcard
The question mark (?) matches exactly one character. This is useful when we do not know a single character in a given name. Note that file.txt itself does not match the pattern file?.txt, because ? requires one character to be present. In this example the pattern file?.txt matches names such as
- file1.txt
- file5.txt
- …
glob.glob("/home/ismail/file?.txt")
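The sketch below checks the "exactly one character" behaviour on a few made-up file names in a temporary directory (both are assumptions for illustration).

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files: zero, one, and two characters after "file".
    for name in ("file.txt", "file1.txt", "file5.txt", "file10.txt"):
        open(os.path.join(tmp, name), "w").close()

    single = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(tmp, "file?.txt")))

print(single)
```

Only the names with a single character between "file" and ".txt" are returned; file.txt and file10.txt are left out.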
Multiple Characters
Glob also supports sets of alphabetic and numeric characters. We use [ to start a character set and ] to end it. Any one of the characters placed between the square brackets can match at that position. In this example we will match file and folder names that start with one of e, m, or p and end with .tx followed by any single character.
glob.glob("/home/ismail/[emp]*.tx?")
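As an illustration of the character set combined with the other wildcards, here is a sketch using hypothetical file names in a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files chosen to hit and miss each part of the pattern.
    for name in ("east.txt", "map.txt", "pool.txa", "zoo.txt", "echo.md"):
        open(os.path.join(tmp, name), "w").close()

    # [emp] = first character is e, m or p; * = anything; .tx? = .tx plus one character
    matched = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(tmp, "[emp]*.tx?")))

print(matched)
```

zoo.txt fails on the first character and echo.md fails on the extension, so neither is listed.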

Number Ranges
In some cases, we may want to match a range of numbers. We can use a dash (-) between the start and end numbers inside square brackets, so [0-9] matches any single digit from 0 to 9. In this example, we will match file and folder names that contain at least one digit from 0 to 9.
glob.glob("/home/ismail/*[0-9]*")
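A short sketch of the digit range in use. The file names are hypothetical, and the second pattern simply repeats the range to require two digits in a row.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical names: several digits, one digit, no digits.
    for name in ("report2021.txt", "v2", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    # At least one digit anywhere in the name.
    any_digit = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(tmp, "*[0-9]*")))

    # Two consecutive digits: each [0-9] matches exactly one character.
    two_digits = sorted(os.path.basename(p)
                        for p in glob.glob(os.path.join(tmp, "*[0-9][0-9]*")))

print(any_digit)
print(two_digits)
```

Each bracket expression consumes a single character, so longer runs of digits are expressed by repeating it.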

Alphabet Ranges
We can also define alphabet ranges, similar to number ranges. We use a-z for lowercase characters and A-Z for uppercase characters. To match both lowercase and uppercase letters in a single statement, we combine the two ranges as [a-zA-Z]; a-Z is not a valid range because Z sorts before a. In this example, we will match file and folder names that start with a letter between a and c.
glob.glob("/home/ismail/[a-c]*")
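The sketch below compares a narrow range with the combined [a-zA-Z] set. The file names and the temporary directory are assumptions made for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical names starting with a lowercase letter, an uppercase
    # letter, and a digit.
    for name in ("alpha.txt", "delta.txt", "Zulu.txt", "9lives.txt"):
        open(os.path.join(tmp, name), "w").close()

    a_to_c = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(tmp, "[a-c]*")))

    # Two ranges in one set: any ASCII letter, either case.
    any_letter = sorted(os.path.basename(p)
                        for p in glob.glob(os.path.join(tmp, "[a-zA-Z]*")))

print(a_to_c)
print(any_letter)
```

Whether a lowercase-only range such as [a-z] also matches uppercase names depends on the platform, so spelling out both ranges is the safer choice.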

Return Generator with iglob() Method
Generally the glob() function is used to list files for the specified patterns. But when there are many matches, building and storing the whole list can be wasteful. In that case the iglob() function can be used: it returns an iterator that yields the same file names one at a time, and we can step through it with a for loop or the next() function.
import glob
gen = glob.iglob("*.txt")
for item in gen:
    print(item)
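Since the text mentions next(), here is a minimal sketch of pulling items from the iterator by hand. The three file names are hypothetical and are created in a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files to iterate over.
    for name in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(tmp, name), "w").close()

    it = glob.iglob(os.path.join(tmp, "*.txt"))
    is_list = isinstance(it, list)              # iglob() does not build a list

    first = os.path.basename(next(it))          # take one match
    rest = [os.path.basename(p) for p in it]    # consume the remaining matches
    after_end = next(it, None)                  # default value once exhausted

print(is_list, first, rest, after_end)
```

Once the iterator is exhausted it cannot be rewound; call iglob() again to start over.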

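Escape Special Characters with escape() Method
The escape() function escapes the special characters ?, * and [ in a string so that glob matches them literally. This is useful when a file name itself contains one of these characters. Characters such as -, _ and # are not special outside square brackets, so escape() returns them unchanged. In the example below, each of these characters is placed between two wildcards, so the loop lists the .txt files whose names contain -, _ or #.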
chars_skip = "-_#"
for char_skip in chars_skip:
    # Build a pattern such as *-*.txt; escape() leaves -, _ and # unchanged
    esc_set = "*" + glob.escape(char_skip) + "*" + ".txt"
    for txt in glob.glob(esc_set):
        print(txt)
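To show what escape() changes when a name really does contain a special character, here is a sketch with two hypothetical files, data[1].txt and data1.txt, in a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files: one name contains literal square brackets.
    for name in ("data[1].txt", "data1.txt"):
        open(os.path.join(tmp, name), "w").close()

    literal_path = os.path.join(tmp, "data[1].txt")

    # Unescaped, [1] is a character set that matches the single character "1".
    unescaped = [os.path.basename(p) for p in glob.glob(literal_path)]

    # Escaped, the brackets are matched literally.
    escaped = [os.path.basename(p) for p in glob.glob(glob.escape(literal_path))]

unchanged = glob.escape("-_#")   # no special characters, so nothing to escape
star = glob.escape("*.txt")      # the asterisk is wrapped in brackets

print(unescaped, escaped, unchanged, star)
```

Passing user-supplied file names through escape() before adding wildcards avoids this kind of accidental pattern.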
# glob has no {2} repetition syntax; repeat the character class to match two consecutive digits
glob.glob("/home/ismail/*[0-9][0-9]*")