Improve CV parsing and profile editor flow

2026-03-29 14:29:18 +02:00
parent 99fc94bc18
commit 44000f96f2
18 changed files with 1028 additions and 44 deletions
@@ -8,6 +8,7 @@ This service runs a local Hugging Face summarization model and also exposes docu
 - OCR fallback for scanned PDFs
 - OCR for image uploads (`png`, `jpg`, `jpeg`, `webp`)
 - DOCX / TXT / MD extraction
+- optional Ollama-backed CV block classification for harder sectioning

 ## Install

@@ -36,8 +37,30 @@ The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can b
 - `GET /health` — health check and runtime capabilities
 - `POST /summarize` — JSON body `{ "text": "...", "max_length": 150, "min_length": 30 }`
 - `POST /extract-text` — multipart file upload, returns extracted text and OCR metadata
+- `POST /cv/classify-block` — JSON body `{ "block": "..." }`, uses Ollama when `OLLAMA_MODEL` is configured

-## Notes
-- Model weights are downloaded on first run.
+## Ollama
+Set these before starting the service if you want the hybrid CV classifier enabled:
+
+```bash
+export OLLAMA_BASE_URL=http://ollama:11434
+export OLLAMA_MODEL=qwen2.5:7b
+```
+
+Choose the model by setting `OLLAMA_MODEL`, then warm it with the helper script:
+
+```bash
+OLLAMA_MODEL=qwen2.5:7b ./scripts/start-ollama-cv.sh
+```
+
+Equivalent manual flow:
+
+```bash
+docker compose up -d ollama
+docker compose exec ollama ollama pull qwen2.5:7b
+docker compose up -d ai-service
+```
+
+- Model weights are downloaded on first pull.
 - OCR quality depends on scan quality and language support.
 - Default OCR language is English (`eng`).