Local AI Service
This service runs a local Hugging Face summarization model and also exposes document text extraction with OCR for supported PDFs and images.
Capabilities
- job/role summarization
- PDF text extraction
- OCR fallback for scanned PDFs
- OCR for image uploads (png, jpg, jpeg, webp)
- DOCX / TXT / MD extraction
Install
Windows:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
Linux / macOS:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
Docker
The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.
API
- GET /health: health check and runtime capabilities
- POST /summarize: JSON body { "text": "...", "max_length": 150, "min_length": 30 }
- POST /extract-text: multipart file upload, returns extracted text and OCR metadata
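As a rough illustration of calling the endpoints above, here is a minimal client sketch using only the Python standard library. The base URL matches the uvicorn command in the Install section. The helper names (build_summarize_request, build_extract_request, send) are hypothetical, and the multipart form field name "file" is an assumption, so check app.py for the name the service actually expects. The response schema is not documented here, so the sketch simply returns whatever JSON the service sends back.

```python
import json
import uuid
from urllib import request

# Host and port taken from the uvicorn command in the Install section.
BASE_URL = "http://127.0.0.1:8001"


def build_summarize_request(text, max_length=150, min_length=30, base_url=BASE_URL):
    """Build a POST /summarize request with the JSON body described above."""
    payload = {"text": text, "max_length": max_length, "min_length": min_length}
    return request.Request(
        f"{base_url}/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_extract_request(filename, content, field="file", base_url=BASE_URL):
    """Build a multipart POST /extract-text request from raw file bytes.

    ASSUMPTION: the form field is named "file"; the service may use another name.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return request.Request(
        f"{base_url}/extract-text",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(req, timeout=120):
    """Send a prepared request and decode the JSON response body."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `send(build_summarize_request("some job description text"))` would post a summarization request, and `send(build_extract_request("scan.pdf", open("scan.pdf", "rb").read()))` would upload a file for extraction. The generous default timeout is a guess to leave room for model inference and OCR.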
Notes
- Model weights are downloaded on first run.
- OCR quality depends on scan quality and language support.
- Default OCR language is English (eng).
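Because model weights are downloaded on first run, the service may take a while before it answers requests. One way a caller could cope is to poll GET /health before sending work. The sketch below is a hypothetical helper, not part of the service; it assumes /health returns HTTP 200 once the server is up. Whether that also means the model has finished loading is not stated here, so treat a successful check as "server reachable" rather than "model ready".

```python
import time
from urllib import error, request


def fetch_status(url, timeout=5):
    """Return the HTTP status of GET url, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx HTTP errors.
        return None


def wait_until_healthy(
    url="http://127.0.0.1:8001/health",
    attempts=30,
    delay=2.0,
    fetch=fetch_status,
    sleep=time.sleep,
):
    """Poll the health endpoint until it answers 200 or attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next try
    return False
```

The fetch and sleep parameters exist only so the polling loop can be exercised without a running server; in normal use, calling `wait_until_healthy()` with the defaults is enough.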