Modernizing PatZilla
PatZilla is a powerful patent research platform with a graphical user interface and command-line tools. As a critical piece of infrastructure for querying large volumes of patent data from sources like the EPO (European Patent Office), its performance and security are paramount. However, the core backend was historically written in Python 2.7, a language version that reached end-of-life in 2020. This article details my technical contribution to the PatZilla project: architecting and executing a complete migration of the system from legacy Python 2.7 to modern Python 3.10.
Architecture Transition
The following block diagram illustrates the architectural shift from the fragile legacy setup to the robust, modernized environment.
The Final Result
The modernized PatZilla backend now interfaces seamlessly with the original frontend, delivering the same powerful patent search capabilities on a secure, maintainable technology stack.
The PatZilla UI successfully connecting to the modernized Python 3.10 backend.
Returning structured patent results using strict Unicode decoding.
The Challenge: Why Migrate?
The overarching goal of my work is to create custom analytical tools for patents. To build this analytical ecosystem, I am actively evaluating and modernizing foundational patent research platforms, notably my forks of PatZilla and the PQAI tool. When I first audited PatZilla, the environment was fundamentally broken due to its reliance on outdated Python 2 paradigms and deprecated libraries.
Python 2.7 EOL
Python 2.7 reached End-Of-Life in 2020, making legacy deployments unmaintainable.
100% Pass Rate
Achieved a 100% pass rate across the core navigator suite and utility test suites.
334 Tests Stabilized
Worked module-by-module to resolve environmental and API mocking issues across all 334 tests.
2048-Bit RSA Auth
Enforced 2048-bit RSA keys during the refactor of the authentication layer to meet modern standards.
The Implementation: How It Was Done
The migration was approached systematically in three distinct phases: Infrastructure, Code Conversion, and Stabilization.
1. Infrastructure & Dependency Triage
Establishing a stable baseline by migrating to modern Debian containers.
2. Codebase Conversion & Byte Strings
Resolving the strict separation between bytes and text strings that Python 3 enforces.
3. Stabilization & Security Upgrades
Bringing the test pass rate back to 100% and updating cryptographic features.
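The bytes-versus-text problem behind phase 2 can be illustrated with a small sketch. It is not taken from the PatZilla codebase: the helper names to_text and to_bytes are hypothetical, and UTF-8 is assumed as the wire encoding. The idea is to convert at the system boundary and to fail loudly on malformed input instead of silently mixing the two types, as Python 2 allowed.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes strictly (hypothetical helper)."""
    if isinstance(value, bytes):
        # errors="strict" raises UnicodeDecodeError on malformed input
        # rather than replacing or dropping the offending bytes.
        return value.decode(encoding, errors="strict")
    if isinstance(value, str):
        return value
    raise TypeError(f"expected bytes or str, got {type(value).__name__}")


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding str on the way out (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError(f"expected bytes or str, got {type(value).__name__}")


if __name__ == "__main__":
    raw = b"M\xc3\xbcnchen"      # UTF-8 bytes, e.g. read from an HTTP response
    print(to_text(raw))          # decoded once, at the boundary
    print(to_bytes("München"))   # encoded once, when writing back out
```

Keeping the conversion in one place means the rest of the code can assume str everywhere, which is usually the easiest invariant to maintain after a 2-to-3 port.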
1. Infrastructure & Dependency Triage
The first step was to establish a working baseline. I completely rewrote the Dockerfile to use the python:3.10-slim-bullseye base image. This immediately surfaced the next layer of issues: broken requirements. I performed a dependency audit, identifying libraries that had either changed their internal structures or had been abandoned entirely. A critical roadblock was mongodb_gridfs_beaker. Instead of simply removing the caching feature it provided, I engineered a local, modernized version of the library to maintain feature parity while ensuring compatibility with modern pymongo drivers.
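Part of a dependency audit like this can be automated. The sketch below is not the script used in the migration. It is a minimal illustration, with a hypothetical function name audit_imports, that tries to import each candidate module under the new interpreter and records the failure, if any. Catching Exception rather than only ImportError is deliberate: a package written for Python 2 can also fail with SyntaxError at import time.

```python
import importlib


def audit_imports(module_names):
    """Map each module name to None if it imports, else to a short error string."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Legacy packages may raise SyntaxError or AttributeError,
            # not just ImportError, so record whatever went wrong.
            report[name] = f"{type(exc).__name__}: {exc}"
        else:
            report[name] = None
    return report


if __name__ == "__main__":
    # The module list is illustrative; a real audit would read requirements.
    for name, error in audit_imports(["json", "mongodb_gridfs_beaker"]).items():
        print(f"{name}: {'ok' if error is None else error}")
```

A report like this only shows which packages fail to load. Deciding whether to upgrade, replace, or vendor a fixed copy of each one remains a manual judgment.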
Challenges Overcome
Migrating a legacy system often uncovers layers of technical debt.
Conclusion
Migrating a legacy system is rarely just about running an automated conversion script; it requires a deep understanding of the system's architecture, data flow, and external dependencies. By successfully porting PatZilla to Python 3.10, I demonstrated my ability to tackle significant technical debt, architect modern solutions, and ensure continuous reliability for complex enterprise software.