How to split PDF files in multi-level directories with Python

Use os.walk() and PyPDF2 to automate splitting PDF files across multiple sub-directories with a Python script

Justin Morgan Williams
4 min read · Aug 28, 2022
Photo by Drew Beamer on Unsplash

Background

If you are reading this, you probably know that most office jobs involve repetitive tasks. This is where a bit of Python knowledge comes in handy. In fact, Al Sweigart wrote a great book on the subject, Automate the Boring Stuff with Python. This post builds on some topics covered in that book and walks through a real script I use in my day-to-day work to automate an otherwise mundane task.

Task

I have a root directory with multiple sub-directories, one for each letter from A to Z. When I scan documents, I do so in alphabetical chunks, scanning each letter as its own batch. However, each batch leaves me with a single multi-page document, when I need each document as a separate file. I could scan the documents one at a time, but that would be time-consuming.

Here is a snippet of what my hypothetical directory looks like:

FY22
|- A
|  |- a_doc.pdf
|  |- a2_doc.pdf
|- B
|  |- b_doc.pdf
|- C
|  |- c_doc.pdf
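
To make the traversal concrete, here is a minimal sketch of how os.walk() could collect every PDF in a layout like the one above. The function name find_pdfs is hypothetical and not taken from my actual script, and the sorting is only there to make the visiting order predictable.

```python
import os


def find_pdfs(root):
    """Yield the path of every .pdf file under root, sub-directories included."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes os.walk visit A, B, C ... in order.
        dirnames.sort()
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # "FY22" is the hypothetical root directory from the listing above.
    for pdf_path in find_pdfs("FY22"):
        print(pdf_path)
```

Because os.walk() descends into every sub-directory on its own, the same function works whether the root holds three letter folders or all twenty-six.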

Packages
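
Besides os from the standard library, the only third-party package this task calls for is PyPDF2. As a rough sketch of the splitting step, assuming a PyPDF2 release that provides the PdfReader and PdfWriter classes and assuming every page of a scan should become its own file, something like the following would work. The helper names page_output_path and split_pdf and the _page_N naming scheme are hypothetical, chosen for illustration only.

```python
import os


def page_output_path(pdf_path, page_number):
    """Build an output name such as a_doc_page_1.pdf next to a_doc.pdf."""
    stem, ext = os.path.splitext(pdf_path)
    return f"{stem}_page_{page_number}{ext}"


def split_pdf(pdf_path):
    """Write each page of pdf_path to its own file and return the new paths."""
    # Imported here so the naming helper above works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # one page per output file
        out_path = page_output_path(pdf_path, number)
        with open(out_path, "wb") as handle:
            writer.write(handle)
        written.append(out_path)
    return written
```

The original file is left untouched, so the split can be re-run or checked against the source scan before anything is deleted.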

