The pathlib module

The pathlib module in Python provides an object-oriented interface for working with filesystem paths. It offers a simple way to perform common path manipulations, and the same pathlib code works on different operating systems without changes. It’s part of the standard library, so no additional installation is required.

Warning

Many of the examples below use paths from the author's machine (such as /Users/tyson/Desktop), so they will not work as written on your computer. Replace them with paths that exist on your own machine.

path objects

Path objects are the core concept in pathlib. They represent filesystem paths and provide methods and properties for inspecting and manipulating the filesystem. Let’s start by importing Path from pathlib:

from pathlib import Path

Here, we create a path object pointing to the Desktop folder for a user named Tyson:

desktop_path = Path('/Users/tyson/Desktop')
desktop_path
PosixPath('/Users/tyson/Desktop')
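On macOS and Linux, Path produces a PosixPath (as shown above). On Windows, it produces a WindowsPath. If you only need to manipulate path strings, pathlib's "pure" path classes let you build either flavour on any operating system without touching the filesystem. This is a minimal sketch, and the Windows path in it is a made-up example:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate path strings and never touch the filesystem,
# so both flavours can be created on any operating system.
posix_desktop = PurePosixPath('/Users/tyson/Desktop')
windows_desktop = PureWindowsPath(r'C:\Users\tyson\Desktop')  # hypothetical Windows path

print(posix_desktop.name)     # Desktop
print(windows_desktop.name)   # Desktop
print(windows_desktop.parts)  # ('C:\\', 'Users', 'tyson', 'Desktop')
```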

absolute and relative paths

An absolute path is a complete path from the root of the filesystem to the desired directory or file. A relative path, by contrast, is interpreted starting from the current working directory.

Here we define an absolute path to the Desktop:

absolute_path = Path('/Users/tyson/Desktop')
absolute_path
PosixPath('/Users/tyson/Desktop')

This is a relative path pointing to the current directory (pathlib normalizes './' to '.'):

relative_path = Path('./')
relative_path
PosixPath('.')

We can check if a path is absolute:

relative_path.is_absolute()
False
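A relative path has no root, so it only describes a complete location once it is combined with a starting directory. The sketch below uses PurePosixPath so that it behaves the same on any OS. The base directory is only an example:

```python
from pathlib import PurePosixPath

base = PurePosixPath('/Users/tyson')          # example starting directory
relative = PurePosixPath('Desktop/my_project')

print(relative.is_absolute())    # False

# Anchoring the relative path to a base directory makes it absolute
full = base / relative
print(full)                      # /Users/tyson/Desktop/my_project
print(full.is_absolute())        # True

# relative_to() goes the other way: it strips a known prefix off a path
print(full.relative_to(base))    # Desktop/my_project
```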

To get the full absolute path from a relative path:

# this will create the full absolute path
relative_path_full = relative_path.resolve()
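Besides making a path absolute, resolve() collapses '..' components and follows symlinks. Joining paths with / does neither. The following is a small sketch that uses a throwaway folder from the standard tempfile module, so nothing on your machine is touched:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp folder itself first (on macOS, /var is a symlink to /private/var)
    base = Path(tmp).resolve()
    (base / 'sub').mkdir()

    messy = base / 'sub' / '..' / 'sub'
    print('..' in messy.parts)                 # True: joining keeps '..' as-is
    cleaned = messy.resolve()
    print(cleaned == base / 'sub')             # True
    has_dotdot_after = '..' in cleaned.parts
    print(has_dotdot_after)                    # False
```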

To find out the current working directory:

# this tells me where I am
current_dir = Path.cwd()
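Relative paths are always interpreted against the current working directory. As a result, the same Path('notes.txt') can point to different files depending on where a script is run from. This sketch temporarily switches into a throwaway folder and then switches back. os.chdir affects the whole process, so the example restores the original directory. The file name notes.txt is hypothetical:

```python
import os
import tempfile
from pathlib import Path

original_dir = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        notes = Path('notes.txt')   # relative path (hypothetical file name)
        notes.touch()               # created inside the current working directory
        same_location = notes.resolve() == Path.cwd() / 'notes.txt'
        print(same_location)        # True
    finally:
        # Always restore the original working directory
        os.chdir(original_dir)
```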

constructing paths

Path objects can be joined using the / operator. This makes it easy to build up paths without worrying about the underlying operating system’s path separator:

desktop_path = Path('/Users/tyson/Desktop')
project_data_path = desktop_path / 'my_project' / 'data.csv'
project_data_path
PosixPath('/Users/tyson/Desktop/my_project/data.csv')
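The / operator has a method form, joinpath(). Related methods such as with_suffix() and with_name() derive sibling paths without any string slicing. The sketch below uses PurePosixPath so the printed output is the same on every OS. It also shows one pitfall: joining an absolute path discards everything to its left.

```python
from pathlib import PurePosixPath

desktop_path = PurePosixPath('/Users/tyson/Desktop')

# joinpath() is the method form of the / operator
joined_with_slash = desktop_path / 'my_project' / 'data.csv'
joined_with_method = desktop_path.joinpath('my_project', 'data.csv')
print(joined_with_slash == joined_with_method)    # True

# Derive related paths from an existing one
print(joined_with_slash.with_suffix('.json'))     # /Users/tyson/Desktop/my_project/data.json
print(joined_with_slash.with_name('README.md'))   # /Users/tyson/Desktop/my_project/README.md

# Pitfall: an absolute right-hand side replaces the left-hand side entirely
print(desktop_path / '/tmp')                      # /tmp
```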

creating and deleting folders and files

pathlib also makes it easy to create and delete directories and files. To create a folder (by default, mkdir() raises an error if the folder already exists):

# create a folder
project_path = Path('/Users/tyson/Desktop/my_project')
project_path.mkdir()
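In practice, you often want mkdir() to create any missing parent folders and to tolerate a folder that already exists. The parents and exist_ok parameters handle both cases. This is a minimal sketch inside a temporary folder, and the nested folder names are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested folder whose parents do not exist yet
    nested = Path(tmp) / 'my_project' / 'raw' / '2024'

    # parents=True creates missing intermediate folders;
    # exist_ok=True turns a repeated call into a no-op instead of an error
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()
    print(created)  # True

    # With the default arguments, creating an existing folder raises an error
    try:
        nested.mkdir()
        raised_exists = False
    except FileExistsError:
        raised_exists = True
    print(raised_exists)  # True
```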

To create a file:

# create a file
project_data_path = Path('/Users/tyson/Desktop/my_project/data.csv')
project_data_path.touch()
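touch() creates an empty file, or updates the timestamp of an existing one. To put content in the file, write_text() and read_text() are convenient for small text files. This sketch works in a temporary folder, and the CSV content is made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / 'data.csv'

    data_path.touch()                       # creates an empty file
    size_after_touch = data_path.stat().st_size
    print(size_after_touch)                 # 0

    # write_text() creates (or overwrites) the file with the given text
    data_path.write_text('name,value\na,1\n', encoding='utf-8')
    contents = data_path.read_text(encoding='utf-8')
    print(contents.splitlines())            # ['name,value', 'a,1']
```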

To delete a file:

# delete a file
project_data_path.unlink()
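Calling unlink() on a file that no longer exists raises FileNotFoundError. Since Python 3.8, passing missing_ok=True makes the call safe to repeat. This is a minimal sketch in a temporary folder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / 'data.csv'
    data_path.touch()
    data_path.unlink()                  # removes the file

    # Deleting a file that is already gone raises FileNotFoundError...
    try:
        data_path.unlink()
        raised_missing = False
    except FileNotFoundError:
        raised_missing = True
    print(raised_missing)               # True

    # ...unless missing_ok=True is passed (Python 3.8+)
    data_path.unlink(missing_ok=True)
    still_exists = data_path.exists()
    print(still_exists)                 # False
```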

And to delete a folder (rmdir() only removes empty folders):

# delete a folder
project_path.rmdir()
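A folder that still contains files can't be removed with rmdir(); the call raises an OSError. pathlib has no recursive delete, but shutil.rmtree() from the standard library removes a folder and everything inside it, so use it with care. This sketch works in a temporary folder:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project_path = Path(tmp) / 'my_project'
    project_path.mkdir()
    (project_path / 'data.csv').touch()

    # rmdir() refuses to delete a folder that still has contents
    try:
        project_path.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True
    print(rmdir_failed)            # True

    # shutil.rmtree() deletes the folder and everything inside it
    shutil.rmtree(project_path)
    removed = not project_path.exists()
    print(removed)                 # True
```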

checking path properties

You can also check various properties of paths. For example, you can check if a path exists, is a file, or is a directory.

To check if a path exists:

project_path = absolute_path / 'my_project'
project_path.exists()
False

After creating the directory, check again if it exists:

project_path.mkdir()
project_path.exists()
True

To check if a path is a directory:

# only true if exists & is a directory
project_path.is_dir()
True

Or a file:

project_path.is_file()
False
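exists(), is_dir(), and is_file() all return False for a path that doesn't exist; they don't raise an error. This sketch compares the three checks for a folder, a file, and a missing path inside a temporary folder. The file names are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    data_file = folder / 'data.csv'
    data_file.touch()
    missing = folder / 'missing.csv'    # never created

    # (exists, is_dir, is_file) for each kind of path
    checks = {
        name: (p.exists(), p.is_dir(), p.is_file())
        for name, p in [('folder', folder), ('file', data_file), ('missing', missing)]
    }

print(checks['folder'])   # (True, True, False)
print(checks['file'])     # (True, False, True)
print(checks['missing'])  # (False, False, False)
```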

To get the user’s home directory:

# get the user’s home directory
Path.home()
PosixPath('/Users/tyson')
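pathlib does not expand a leading '~' automatically the way a shell does. expanduser() performs that expansion and gives the same result as building the path from Path.home(). This is a minimal sketch, and the Desktop folder does not need to exist for it to work:

```python
from pathlib import Path

home = Path.home()

# '~' is kept literally until expanduser() is called
tilde_desktop = Path('~/Desktop')
print(tilde_desktop.is_absolute())       # False
expanded = tilde_desktop.expanduser()
print(expanded == home / 'Desktop')      # True
```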

To extract different parts of the path:

parent_dir     = project_data_path.parent
file_name      = project_data_path.name
file_name_stem = project_data_path.stem
file_suffix    = project_data_path.suffix

print(parent_dir)
print(file_name)
print(file_name_stem)
print(file_suffix)
/Users/tyson/Desktop/my_project
data.csv
data
.csv
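For files with more than one extension, suffix and stem only look at the last dot. suffixes, parts, and parents give finer-grained access. The sketch below uses a hypothetical .tar.gz file and PurePosixPath, so nothing needs to exist on disk:

```python
from pathlib import PurePosixPath

# Hypothetical file with a double extension
archive = PurePosixPath('/Users/tyson/Desktop/my_project/data.tar.gz')

print(archive.suffix)      # .gz
print(archive.suffixes)    # ['.tar', '.gz']
print(archive.stem)        # data.tar
print(archive.parents[1])  # /Users/tyson/Desktop
print(archive.parts)       # ('/', 'Users', 'tyson', 'Desktop', 'my_project', 'data.tar.gz')
```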

finding files and folders

pathlib provides methods to find files and folders. For example, to find all .csv files in a directory:

# set things up
desktop_path = Path('/Users/tyson/Desktop')
project_path = desktop_path / 'my_project'
project_path.mkdir(exist_ok=True)   # make sure the folder exists
project_data_path = project_path / 'data.csv'
project_data_path.touch()

# find all of the csv files in the "project_path" folder
list_of_csv_a = list(project_path.glob('*.csv'))
list_of_csv_a
[PosixPath('/Users/tyson/Desktop/my_project/data.csv')]
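glob() returns a lazy generator, and the order of its results is not guaranteed, so sorting them gives reproducible output. iterdir() lists every entry in a folder regardless of its name. This sketch uses a temporary folder with hypothetical file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project_path = Path(tmp) / 'my_project'
    project_path.mkdir()
    for name in ['data.csv', 'summary.csv', 'notes.txt']:
        (project_path / name).touch()

    # Only entries whose names match the pattern, sorted for a stable order
    csv_names = sorted(p.name for p in project_path.glob('*.csv'))
    # Every entry in the folder, whatever its name
    all_names = sorted(p.name for p in project_path.iterdir())

print(csv_names)  # ['data.csv', 'summary.csv']
print(all_names)  # ['data.csv', 'notes.txt', 'summary.csv']
```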

To recursively find all .csv files in a directory and its subdirectories:

# recursively find all of the csv files in the "project_path" folder and subfolders
list_of_csv_b = list(project_path.rglob('*.csv'))
list_of_csv_b
[PosixPath('/Users/tyson/Desktop/my_project/data.csv')]
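In the example above, glob() and rglob() return the same result because there are no subfolders. The difference appears once a nested file exists: rglob('*.csv') behaves like glob('**/*.csv') and also descends into subfolders. This sketch uses a temporary folder with a hypothetical raw/ subfolder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project_path = Path(tmp) / 'my_project'
    (project_path / 'raw').mkdir(parents=True)
    for name in ['data.csv', 'raw/extra.csv']:
        (project_path / name).touch()

    def names(paths):
        # Paths relative to the project folder, with '/' separators, sorted
        return sorted(p.relative_to(project_path).as_posix() for p in paths)

    top_level = names(project_path.glob('*.csv'))
    recursive = names(project_path.rglob('*.csv'))
    double_star = names(project_path.glob('**/*.csv'))

print(top_level)                 # ['data.csv']
print(recursive)                 # ['data.csv', 'raw/extra.csv']
print(double_star == recursive)  # True
```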

Clean up by deleting the file and folder we created:

# clean up; delete the file and folder
project_data_path.unlink()
project_path.rmdir()