Local AI Service

This service runs a Hugging Face summarization model locally and exposes document text extraction, with OCR for supported PDFs and images.

Capabilities

  • job/role summarization
  • PDF text extraction
  • OCR fallback for scanned PDFs
  • OCR for image uploads (png, jpg, jpeg, webp)
  • DOCX / TXT / MD extraction

Install and run

Windows:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

Linux / macOS:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

Docker

The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.

API

  • GET /health — health check and runtime capabilities
  • POST /summarize — JSON body { "text": "...", "max_length": 150, "min_length": 30 }
  • POST /extract-text — multipart file upload, returns extracted text and OCR metadata
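
The endpoints above can be called from any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the service is listening on 127.0.0.1:8001 as in the run commands, that responses are JSON, and that the multipart upload field is named "file". The helper names and the field name are hypothetical; check app.py for the actual values.

```python
import json
import urllib.request
import uuid

# Host and port taken from the uvicorn command in this README.
BASE_URL = "http://127.0.0.1:8001"


def build_summarize_request(text, max_length=150, min_length=30):
    """Build a POST /summarize request with the documented JSON body."""
    payload = {"text": text, "max_length": max_length, "min_length": min_length}
    return urllib.request.Request(
        BASE_URL + "/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_extract_request(filename, content, field_name="file"):
    """Build a multipart POST /extract-text request for one file.

    Assumption: the upload field is called "file"; the real name is
    defined by the service and is not stated in this README.
    """
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode("utf-8"),
        disposition.encode("utf-8"),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        BASE_URL + "/extract-text",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request, timeout=120):
    """Send a request and decode the response as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send(build_summarize_request("..."))` posts the documented body, and `send(build_extract_request("scan.pdf", open("scan.pdf", "rb").read()))` uploads a file. The sketch returns the decoded response as-is because the response field names are not listed here.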

Notes

  • Model weights are downloaded on first run, so the first start needs network access and takes longer than later starts.
  • OCR quality depends on scan quality and language support.
  • Default OCR language is English (eng).
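
Because the first run downloads model weights, a script that starts the service may want to wait until it responds before sending work. A minimal sketch, assuming GET /health returns HTTP 200 once the service is usable; whether the server accepts connections before the model has loaded depends on app.py. The function name and the default timeout are illustrative, not part of the service.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://127.0.0.1:8001/health", timeout=600.0, interval=2.0):
    """Poll GET /health until it answers with HTTP 200 or the timeout expires.

    Returns True if the service answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or HTTP error: the service is not ready yet.
            pass
        time.sleep(interval)
    return False
```

A caller can run `wait_for_health()` right after launching uvicorn and only send /summarize or /extract-text requests once it returns True.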