Local AI Service

This service runs a local Hugging Face summarization model and also exposes document text extraction with OCR for supported PDFs and images.

Capabilities

  • job/role summarization
  • PDF text extraction
  • OCR fallback for scanned PDFs
  • OCR for image uploads (png, jpg, jpeg, webp)
  • DOCX / TXT / MD extraction
  • optional Ollama-backed CV block classification for harder sectioning
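
A client can check a file's extension against the formats listed above before uploading it. The sketch below is a minimal, hypothetical helper: the names `EXTRACTION_KINDS` and `extraction_kind` and the three category labels are illustrative assumptions, not part of the service.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the capability list above. The category labels
# ("pdf", "image", "document") are illustrative assumptions only.
EXTRACTION_KINDS = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".webp": "image",
    ".docx": "document",
    ".txt": "document",
    ".md": "document",
}


def extraction_kind(filename: str) -> Optional[str]:
    """Return the assumed category for a file name, or None if unsupported."""
    return EXTRACTION_KINDS.get(Path(filename).suffix.lower())
```

A caller could skip the upload when `extraction_kind(name)` returns `None`. The service remains the authority on what it accepts.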

Install

Windows:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

Linux / macOS:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
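
After starting the server with either command above, the first requests may fail while the model is still loading. The following sketch polls `GET /health` until it answers. It assumes the host and port from the uvicorn command and treats any HTTP 200 as ready. `probe_health` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Host and port taken from the uvicorn command above.
HEALTH_URL = "http://127.0.0.1:8001/health"


def probe_health(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with the defaults waits roughly 30 seconds plus probe time before giving up. Passing a custom `probe` keeps the retry loop testable without a running server.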

Docker

The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.

API

  • GET /health — health check and runtime capabilities
  • POST /summarize — JSON body { "text": "...", "max_length": 150, "min_length": 30 }
  • POST /extract-text — multipart file upload, returns extracted text and OCR metadata
  • POST /cv/classify-block — JSON body { "block": "..." }, uses Ollama when OLLAMA_MODEL is configured
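
The endpoints above can be called from any HTTP client. The sketch below builds the requests with the Python standard library only. It assumes the service listens on 127.0.0.1:8001 as in the install commands, and that the upload form field is named `file`, which this README does not confirm. The response shapes are not documented here, so `send` simply returns the decoded JSON. All function names are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:8001"  # host and port from the uvicorn command


def build_json_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_summarize_request(
    text: str, max_length: int = 150, min_length: int = 30
) -> urllib.request.Request:
    """Request for POST /summarize, using the body fields listed above."""
    payload = {"text": text, "max_length": max_length, "min_length": min_length}
    return build_json_request("/summarize", payload)


def build_classify_request(block: str) -> urllib.request.Request:
    """Request for POST /cv/classify-block."""
    return build_json_request("/cv/classify-block", {"block": block})


def build_extract_text_request(
    filename: str, content: bytes, field_name: str = "file"
) -> urllib.request.Request:
    """Request for POST /extract-text as a multipart upload.

    The form field name "file" is an assumption, not taken from this README.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + "/extract-text",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_summarize_request("Senior backend engineer ..."))` posts a summarization job once the service is running. Building and sending are kept separate so a request can be inspected before any network call is made.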

Ollama

Set these before starting the service if you want the hybrid CV classifier enabled:

export OLLAMA_BASE_URL=http://ollama:11434
export OLLAMA_MODEL=qwen2.5:7b
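
As a rough illustration of how these two variables can gate the classifier, the sketch below reads them and treats an unset or empty OLLAMA_MODEL as "disabled". It is not the service's actual code. `OllamaConfig` and `load_ollama_config` are hypothetical names, and the fallback base URL `http://localhost:11434` (Ollama's usual default port) is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class OllamaConfig(NamedTuple):
    base_url: str
    model: str


def load_ollama_config(env: Mapping[str, str] = os.environ) -> Optional[OllamaConfig]:
    """Return Ollama settings, or None when OLLAMA_MODEL is not configured."""
    model = env.get("OLLAMA_MODEL", "").strip()
    if not model:
        # No model configured: the hybrid classifier stays disabled.
        return None
    # Assumed fallback; the README only shows an explicit OLLAMA_BASE_URL.
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").strip().rstrip("/")
    return OllamaConfig(base_url=base_url, model=model)
```

Passing a plain dict instead of `os.environ` makes the behaviour easy to check without touching the real environment.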

To choose a model, set OLLAMA_MODEL and warm it with the helper script:

OLLAMA_MODEL=qwen2.5:7b ./scripts/start-ollama-cv.sh

Equivalent manual flow:

docker compose up -d ollama
docker compose exec ollama ollama pull qwen2.5:7b
docker compose up -d ai-service
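
To confirm that the pull in the manual flow succeeded, one option is to query Ollama's `GET /api/tags` endpoint, which lists locally available models. The sketch below assumes the Ollama port is reachable at `http://localhost:11434` from where the script runs. That depends on the compose port mapping, which this README does not show. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of local models from Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/tags", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def model_is_pulled(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears by name in an /api/tags response."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model in names
```

For example, `model_is_pulled(fetch_tags(), "qwen2.5:7b")` should return `True` after the pull has finished.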

Notes

  • Model weights are downloaded on first pull.
  • OCR quality depends on scan quality and language support.
  • Default OCR language is English (eng).