# Local AI Service This service runs a local Hugging Face summarization model and also exposes document text extraction with OCR for supported PDFs and images. ## Capabilities - job/role summarization - PDF text extraction - OCR fallback for scanned PDFs - OCR for image uploads (`png`, `jpg`, `jpeg`, `webp`) - DOCX / TXT / MD extraction - optional Ollama-backed CV block classification for harder sectioning ## Install Windows: ```powershell python -m venv .venv .\.venv\Scripts\Activate.ps1 pip install -r requirements.txt python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1 ``` Linux / macOS: ```bash python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1 ``` If the host is missing `python3-venv` or `pip`, use the bootstrap script instead: ```bash ./scripts/bootstrap-and-test.sh bootstrap ``` ## Docker The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container. ## Tests Run the summarizer unit tests with: ```bash ./scripts/bootstrap-and-test.sh test ``` The script: - creates `.venv` with stdlib `venv` when available - falls back to user-space `virtualenv` when host `venv` support is missing - installs `requirements-dev.txt` - writes pytest cache under `tmp/pytest-cache` to avoid stale root-owned `.pytest_cache` directories ## API - `GET /health` — health check and runtime capabilities, including Ollama version/model metadata when configured - `POST /summarize` — JSON body `{ "text": "...", "max_length": 150, "min_length": 30 }` - `POST /extract-text` — multipart file upload, returns extracted text and OCR metadata - `POST /cv/classify-block` — JSON body `{ "block": "..." }`, uses Ollama when `OLLAMA_MODEL` is configured ## Ollama Set these before starting the service if you want the hybrid CV classifier enabled: ```bash export OLLAMA_BASE_URL=http://ollama:11434 export OLLAMA_MODEL=qwen2.5:7b ``` Choose the model by setting `OLLAMA_MODEL` and then warming it with the helper script: ```bash OLLAMA_MODEL=qwen2.5:7b ./scripts/start-ollama-cv.sh ``` Equivalent manual flow: ```bash docker compose up -d ollama docker compose exec ollama ollama pull qwen2.5:7b docker compose up -d ai-service ``` - Model weights are downloaded on first pull. - OCR quality depends on scan quality and language support. - Default OCR language is English (`eng`).