Project Brief
System Architecture
- Frontend (React + Vite + TypeScript): public browsing, admin upload/editor flows, and Keycloak login handling.
- Backend (Express + Prisma + MySQL): API orchestration, metadata persistence, protected admin endpoints, and yearbook/person CRUD.
- OCR service (FastAPI + OpenCV + RapidOCR): text extraction and YuNet-based face detection for candidate boxes.
- Infrastructure: Docker Compose orchestration for frontend, backend, and OCR microservice communication.
How I Built It
- Chunked upload flow (/upload/init, /upload/chunk, /upload/complete) to avoid failures on large files.
- Background processing session in the backend with progress state and ETA tracking.
- Face detection + OCR extraction in the Python service, which returns structured candidate data to the backend.
- Server-Sent Events progress stream so users see live status instead of waiting on blocking requests.
- Manual correction editor with zoom/pan, box editing, add/remove person tools, and final persistence to MySQL.
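The progress and ETA steps above can be sketched in a few lines. This is a minimal illustration, not the project's code: the real backend is Express, and Python is used here only to show the arithmetic and the wire format. The names ProgressTracker and sse_event, the step counts, and the payload fields are all hypothetical. The ETA is a plain linear extrapolation (elapsed time per finished step times remaining steps), and the event framing follows the standard Server-Sent Events text format (an event line, a data line, then a blank line).

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProgressTracker:
    """Hypothetical helper: counts finished steps and estimates time left."""
    total_steps: int
    started_at: float = field(default_factory=time.monotonic)
    done_steps: int = 0

    def advance(self, steps: int = 1) -> None:
        # Never report more finished steps than the job actually has.
        self.done_steps = min(self.total_steps, self.done_steps + steps)

    def eta_seconds(self, now: Optional[float] = None) -> Optional[float]:
        # Linear extrapolation: average time per finished step * remaining steps.
        if self.done_steps == 0:
            return None  # no data yet, so no estimate
        now = time.monotonic() if now is None else now
        elapsed = now - self.started_at
        remaining = self.total_steps - self.done_steps
        return elapsed / self.done_steps * remaining


def sse_event(event: str, payload: dict) -> str:
    """Serialize one SSE message: event name, JSON data, terminating blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


if __name__ == "__main__":
    tracker = ProgressTracker(total_steps=4)  # assumed: 4 processing steps
    tracker.advance()
    print(sse_event("progress", {
        "step": "ocr",  # assumed step label
        "done": tracker.done_steps,
        "total": tracker.total_steps,
        "etaSeconds": tracker.eta_seconds(),
    }), end="")
```

On the browser side, a stream of such messages can be consumed with the standard EventSource API, which is what lets the page update without polling or holding a blocking request open.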
Security and Access Model
Challenges and Engineering Decisions
- Large-image stability: chunked upload and longer server timeouts reduced failed requests during heavy processing.
- OCR quality on real data: I used PP-OCRv5 Latin recognition plus image preprocessing to improve name extraction.
- Duplicate/false face boxes: non-max suppression and geometric filtering removed overlapping and implausible detections, which kept face results usable.
- User trust in long tasks: SSE progress and explicit step messages let users see that a long job is still advancing.
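To make the duplicate/false face box point concrete, here is a minimal sketch of non-max suppression combined with a simple geometric filter. It is an illustration under stated assumptions, not the OCR service's actual code: the box layout (x, y, w, h, score), the function names, and every threshold (minimum side of 20 px, maximum aspect ratio of 2.0, IoU threshold of 0.4) are hypothetical values chosen for the example.

```python
from typing import List, Tuple

# Assumed box layout: (x, y, width, height, confidence score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah, _ = a
    bx, by, bw, bh, _ = b
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def plausible(box: Box, min_side: float = 20.0, max_aspect: float = 2.0) -> bool:
    """Geometric filter: reject tiny boxes and strongly elongated ones."""
    _, _, w, h, _ = box
    short, long_ = min(w, h), max(w, h)
    if short <= 0 or short < min_side:
        return False
    return long_ / short <= max_aspect


def filter_faces(boxes: List[Box], iou_threshold: float = 0.4) -> List[Box]:
    """Greedy non-max suppression over the geometrically plausible boxes."""
    # Visit boxes from highest to lowest score so the best box of each
    # overlapping cluster is the one that survives.
    candidates = sorted((b for b in boxes if plausible(b)),
                        key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    detections = [
        (10, 10, 100, 100, 0.90),    # face
        (15, 12, 100, 100, 0.80),    # near-duplicate of the first box
        (300, 300, 100, 100, 0.70),  # second face
        (0, 0, 5, 5, 0.95),          # too small to be a usable face
        (500, 0, 200, 40, 0.60),     # too elongated to be a face
    ]
    for box in filter_faces(detections):
        print(box)
```

Running the geometric filter before suppression matters in this sketch: otherwise a high-scoring but implausible box could suppress a valid face that overlaps it.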