cesnimda/jobtrackingapp

cesnimda 54abc9f546 Use Ollama rewrite path for CV generation

2026-04-11 22:26:03 +02:00

scripts

chore: add summarizer bootstrap test script

2026-04-01 13:13:16 +02:00

tests

Use Ollama rewrite path for CV generation

2026-04-11 22:26:03 +02:00

.dockerignore

refactor, security updates, cv extraction upgrades

2026-04-11 01:34:32 +02:00

app.py

Use Ollama rewrite path for CV generation

2026-04-11 22:26:03 +02:00

Dockerfile

Evolve summarizer into AI service with OCR support

2026-03-23 20:12:34 +01:00

pytest.ini

chore: add summarizer bootstrap test script

2026-04-01 13:13:16 +02:00

README.md

refactor, security updates, cv extraction upgrades

2026-04-11 01:34:32 +02:00

requirements-dev.txt

Refactor backend project and tighten CV test coverage

2026-04-01 10:42:55 +02:00

requirements.txt

Add python-multipart to AI service

2026-03-27 13:53:14 +01:00

uvicorn.err

First Commit

2026-03-21 11:55:27 +01:00

uvicorn.out

First Commit

2026-03-21 11:55:27 +01:00

README.md

Local AI Service

This service runs a local Hugging Face summarization model and exposes document text extraction, with OCR for supported PDFs and images.

Capabilities

job/role summarization
PDF text extraction
OCR fallback for scanned PDFs
OCR for image uploads (png, jpg, jpeg, webp)
DOCX / TXT / MD extraction
optional Ollama-backed CV block classification for harder sectioning
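
As a rough illustration of the capability list, the sketch below maps an uploaded file name to a handling strategy. The extensions come from the list above; the function name `pick_strategy` and the strategy labels are hypothetical and are not taken from app.py.

```python
from pathlib import Path

# Extensions taken from the capability list; the strategy labels below are
# hypothetical names used only for this sketch.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
PLAIN_TEXT_EXTENSIONS = {".txt", ".md"}


def pick_strategy(filename: str) -> str:
    """Return a label describing how a file of this type would be handled."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # Try the embedded text layer first; OCR is the fallback for scans.
        return "pdf_text_then_ocr"
    if suffix in IMAGE_EXTENSIONS:
        return "image_ocr"
    if suffix == ".docx":
        return "docx"
    if suffix in PLAIN_TEXT_EXTENSIONS:
        return "plain_text"
    return "unsupported"
```

Matching on the lowercased suffix keeps `CV.PDF` and `cv.pdf` on the same path; a real service would probably also inspect the upload's content type rather than trust the file name alone.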

Install

Windows:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

Linux / macOS:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1

If the host is missing python3-venv or pip, use the bootstrap script instead:

./scripts/bootstrap-and-test.sh bootstrap

Docker

The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.

Tests

Run the summarizer unit tests with:

./scripts/bootstrap-and-test.sh test

The script:

creates .venv with stdlib venv when available
falls back to user-space virtualenv when host venv support is missing
installs requirements-dev.txt
writes pytest cache under tmp/pytest-cache to avoid stale root-owned .pytest_cache directories
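
The steps above can be sketched in Python as follows. This is not the shell script itself, only an illustration of the same decisions; the function names are hypothetical, and the sketch assumes that a host without python3-venv shows up as a missing `ensurepip` module, which is the usual symptom on Debian-based systems.

```python
import importlib.util
import sys


def stdlib_venv_usable() -> bool:
    """True when both venv and ensurepip can be imported on this host."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("venv", "ensurepip")
    )


def venv_command(target: str = ".venv") -> list[str]:
    """Prefer stdlib venv; otherwise fall back to the virtualenv package."""
    if stdlib_venv_usable():
        return [sys.executable, "-m", "venv", target]
    # Assumes virtualenv has already been installed in user space.
    return [sys.executable, "-m", "virtualenv", target]


def pytest_command(cache_dir: str = "tmp/pytest-cache") -> list[str]:
    """Run pytest with its cache redirected via the cache_dir ini option."""
    # For brevity this uses the current interpreter, not the one inside .venv.
    return [sys.executable, "-m", "pytest", "-o", f"cache_dir={cache_dir}"]
```

Redirecting the cache with `-o cache_dir=...` means pytest never touches a `.pytest_cache` directory in the project root, so a stale copy owned by root cannot cause permission errors.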

API

GET /health — health check and runtime capabilities, including lazy model state (model_loaded, model_disabled, summarize_available, model_load_error) plus Ollama version/model metadata when configured
POST /summarize — JSON body { "text": "...", "max_length": 150, "min_length": 30 }
POST /extract-text — multipart file upload, returns extracted text and OCR metadata
POST /cv/classify-block — JSON body { "block": "..." }, uses Ollama when OLLAMA_MODEL is configured
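
A minimal client sketch for the JSON endpoints, using only the standard library. The paths and request bodies follow the list above, and the base URL matches the host and port from the Install section. The helper names are hypothetical, the response schemas are not documented here, and the way `can_summarize` reads the /health flags is an assumption about their meaning.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8001"  # host and port from the Install section


def json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_request(text: str, max_length: int = 150, min_length: int = 30) -> urllib.request.Request:
    body = {"text": text, "max_length": max_length, "min_length": min_length}
    return json_post("/summarize", body)


def classify_block_request(block: str) -> urllib.request.Request:
    return json_post("/cv/classify-block", {"block": block})


def can_summarize(health: dict) -> bool:
    """Assumed reading of /health: summarization is available and not disabled.

    model_loaded is deliberately ignored, since the model is loaded lazily.
    """
    return bool(health.get("summarize_available")) and not health.get("model_disabled")


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service running, `send(summarize_request("..."))` would return the decoded response. The generous timeout is a guess that allows for the first call triggering the lazy model load.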

Ollama

Set these before starting the service if you want the hybrid CV classifier enabled:

export OLLAMA_BASE_URL=http://ollama:11434
export OLLAMA_MODEL=qwen2.5:7b
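
A service could read these variables roughly as sketched below. This illustrates the "enabled only when OLLAMA_MODEL is set" behavior and is not the code in app.py: the `OllamaSettings` class and `load_ollama_settings` function are hypothetical, and the fallback base URL is an assumption (Ollama's usual local address).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OllamaSettings:
    base_url: str
    model: Optional[str]

    @property
    def enabled(self) -> bool:
        # The classifier is only used when a model name is configured.
        return bool(self.model)


def load_ollama_settings(env: Mapping[str, str] = os.environ) -> OllamaSettings:
    """Read OLLAMA_BASE_URL and OLLAMA_MODEL from the given environment."""
    # Assumed fallback: Ollama's usual local address on port 11434.
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
    # Treat an empty or whitespace-only value the same as unset.
    model = env.get("OLLAMA_MODEL", "").strip() or None
    return OllamaSettings(base_url=base_url, model=model)
```

Passing the environment in as a mapping keeps the function easy to test with a plain dict.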

To choose the model, set OLLAMA_MODEL and warm it with the helper script:

OLLAMA_MODEL=qwen2.5:7b ./scripts/start-ollama-cv.sh

Equivalent manual flow:

docker compose up -d ollama
docker compose exec ollama ollama pull qwen2.5:7b
docker compose up -d ai-service

Model weights are downloaded on first pull.
OCR quality depends on scan quality and language support.
Default OCR language is English (eng).