
How to split PDF files in multi-level directories with Python

Use os.walk() and PyPDF2 to automate splitting PDF files across multiple sub-directories with a Python script

4 min read · Aug 28, 2022

Background

If you are reading this, you probably know that most office jobs involve repetitive tasks. This is where a bit of Python knowledge comes in handy. In fact, Al Sweigart has written a great book on the subject, Automate the Boring Stuff with Python. This post builds on some of the topics covered in that book and walks through a real script I use in my day-to-day work to automate an otherwise mundane task.

Task

I have a root directory with multiple sub-directories, one per letter from A to Z. When I scan documents, I do so in alphabetical chunks, scanning each letter as its own batch. However, each batch leaves me with a single multi-page document, when I need each document as an individual file. I could scan the documents one at a time, but that would be time-consuming.
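The core step can be pictured as a small function that writes each page of a scanned batch to its own file. This is only a sketch, assuming PyPDF2 2.x or later (PdfReader and PdfWriter) and assuming each scanned document is a single page. The helper names page_output_path and split_pdf, and the _page_N naming scheme, are made up for illustration and are not taken from the script described in this post.

```python
import os


def page_output_path(pdf_path, page_number):
    """Build a hypothetical output name, e.g. a_doc.pdf -> a_doc_page_1.pdf."""
    stem, ext = os.path.splitext(pdf_path)
    return f"{stem}_page_{page_number}{ext}"


def split_pdf(pdf_path):
    """Write every page of pdf_path to its own single-page PDF next to it."""
    # Third-party dependency (pip install PyPDF2), imported inside the
    # function so the naming helper above works without it installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()          # a fresh writer per page
        writer.add_page(page)
        out_path = page_output_path(pdf_path, number)
        with open(out_path, "wb") as out_file:
            writer.write(out_file)
        written.append(out_path)
    return written
```

Calling split_pdf on a three-page scan would leave three new single-page files beside the original, which stays untouched.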

Here is a snippet of what my hypothetical directory looks like:

FY22
  |- A
    |- a_doc.pdf
    |- a2_doc.pdf
  |- B
    |- b_doc.pdf
  |- C
    |- c_doc.pdf
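
To reach every PDF in a layout like this, os.walk() can visit the root and each lettered sub-directory in turn. Here is a minimal sketch of that traversal; the function name find_pdfs is an assumption for illustration, and "FY22" simply mirrors the hypothetical root shown above.

```python
import os


def find_pdfs(root_dir):
    """Return sorted paths of all .pdf files under root_dir, at any depth."""
    pdf_paths = []
    # os.walk yields (current directory, sub-directory names, file names)
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                pdf_paths.append(os.path.join(dirpath, name))
    return sorted(pdf_paths)


if __name__ == "__main__":
    # "FY22" is the hypothetical root directory from the listing above.
    for path in find_pdfs("FY22"):
        print(path)
```

Collecting the paths first, rather than splitting files during the walk, keeps newly written files from being picked up in the same pass.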


Written by Justin Morgan Williams
