cesnimda/jobtrackingapp

cesnimda 653f713a78 Evolve summarizer into AI service with OCR support

2026-03-23 20:12:34 +01:00

Local AI Service

This service runs a local Hugging Face summarization model and exposes document text extraction, with OCR for scanned PDFs and supported images.

Capabilities

job/role summarization
PDF text extraction
OCR fallback for scanned PDFs
OCR for image uploads (png, jpg, jpeg, webp)
DOCX / TXT / MD extraction
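
The capabilities above imply two decisions: which extraction path a file takes, and when a PDF falls back to OCR. The following is a minimal sketch of that logic, not the service's actual code. The function names, the extension grouping, and the `min_chars` threshold are all assumptions made for illustration, and the text-layer reader and OCR engine are passed in as callables so the sketch stays independent of any particular library.

```python
from pathlib import Path
from typing import Callable

# Hypothetical grouping of the file types listed above (not taken from the service code).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
PLAIN_EXTS = {".docx", ".txt", ".md"}


def choose_strategy(filename: str) -> str:
    """Map a filename to an extraction strategy name based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTS:
        return "ocr"
    if ext in PLAIN_EXTS:
        return "plain"
    raise ValueError(f"unsupported file type: {ext or filename}")


def extract_pdf(
    data: bytes,
    read_text_layer: Callable[[bytes], str],
    run_ocr: Callable[[bytes], str],
    min_chars: int = 20,  # assumed cutoff for "no usable text layer"
) -> tuple[str, bool]:
    """Return (text, used_ocr); fall back to OCR when the text layer is nearly empty."""
    text = read_text_layer(data).strip()
    if len(text) >= min_chars:
        return text, False
    return run_ocr(data).strip(), True
```

A scanned PDF typically has no embedded text layer, so `read_text_layer` returns little or nothing and the OCR path is taken. The boolean in the return value is one way the "OCR metadata" mentioned under API could be populated.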

Install

Windows:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

Linux / macOS:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

Docker

The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.

API

GET /health — health check and runtime capabilities
POST /summarize — JSON body { "text": "...", "max_length": 150, "min_length": 30 }
POST /extract-text — multipart file upload, returns extracted text and OCR metadata
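
As a rough illustration of calling these endpoints, the sketch below builds the requests with the Python standard library only. The base URL comes from the uvicorn command in this README. The multipart field name `file`, the JSON response format, and the helper names are assumptions, so check them against `app.py` before relying on them.

```python
import json
import uuid
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8001"  # host and port from the uvicorn command


def summarize_request(text: str, max_length: int = 150, min_length: int = 30) -> Request:
    """Build a POST /summarize request with the JSON body shown above."""
    body = json.dumps(
        {"text": text, "max_length": max_length, "min_length": min_length}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text_request(filename: str, content: bytes, field: str = "file") -> Request:
    """Build a POST /extract-text multipart upload.

    The form field name "file" is an assumption, not confirmed by this README.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        f"{BASE_URL}/extract-text",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request: Request, timeout: float = 120.0) -> dict:
    """Send a request and parse the response, assuming the service replies with JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service running, `send(Request(f"{BASE_URL}/health"))` performs the health check and `send(summarize_request("..."))` returns the parsed summary response. The generous timeout is a guess that allows for the model download on first run.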

Notes

Model weights are downloaded on first run.
OCR quality depends on scan quality and language support.
Default OCR language is English (eng).
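
Since OCR depends on the installed Tesseract language data, it can help to confirm that `eng` is available before sending scanned documents. The sketch below is a hypothetical helper, not part of the service. It relies on the `tesseract --list-langs` command and assumes the binary is on `PATH`.

```python
import shutil
import subprocess


def parse_langs(output: str) -> list[str]:
    """Parse `tesseract --list-langs` output, skipping the header line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [
        line for line in lines
        if not line.lower().startswith("list of available languages")
    ]


def installed_langs() -> list[str]:
    """Return the installed Tesseract language codes, or [] if Tesseract is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Some Tesseract versions print the list to stderr instead of stdout.
    return parse_langs(result.stdout or result.stderr)
```

If `"eng" not in installed_langs()`, OCR with the default language is unlikely to work until the English language data is installed.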