# Local AI Service

This service runs a local Hugging Face summarization model and exposes document text extraction, with OCR for scanned PDFs and supported image formats.
## Capabilities

- job/role summarization
- PDF text extraction
- OCR fallback for scanned PDFs
- OCR for image uploads (`png`, `jpg`, `jpeg`, `webp`)
- DOCX / TXT / MD extraction
- optional Ollama-backed CV block classification for harder sectioning
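The capability list implies a dispatch step from upload type to extraction method. A minimal Python sketch of that routing, assuming it is keyed on the file extension; `choose_strategy`, `needs_ocr_fallback`, the strategy labels, and the 20-character threshold are hypothetical illustrations, not the service's actual code:

```python
from pathlib import Path

# Hypothetical strategy labels for illustration; the service's internals may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
DOCUMENT_EXTENSIONS = {".docx", ".txt", ".md"}


def choose_strategy(filename: str) -> str:
    """Map an uploaded filename to an extraction strategy by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # Read the PDF's text layer first; scanned pages may need OCR afterwards.
        return "pdf-text-then-ocr-fallback"
    if suffix in IMAGE_EXTENSIONS:
        # Images carry no text layer, so OCR is the only route.
        return "ocr"
    if suffix in DOCUMENT_EXTENSIONS:
        # Text is read directly from the file; no OCR involved.
        return "document-text"
    raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")


def needs_ocr_fallback(extracted_text: str, min_chars: int = 20) -> bool:
    """Hypothetical heuristic: treat a PDF as scanned if its text layer is (nearly) empty."""
    return len(extracted_text.strip()) < min_chars


if __name__ == "__main__":
    for name in ["cv.pdf", "scan.JPG", "notes.md"]:
        print(name, "->", choose_strategy(name))
```

An extension-based switch keeps the routing easy to audit; a production version might also inspect the file's content type, since extensions can be wrong.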
## Install

Windows:

```powershell
# Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1
# Install dependencies and start the API on http://127.0.0.1:8001
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
```

Linux / macOS:

```bash
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies and start the API on http://127.0.0.1:8001
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
```
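With uvicorn running, `GET /health` doubles as a readiness probe. A minimal standard-library sketch that polls it, assuming the `127.0.0.1:8001` address used above; `wait_for_health` and `fetch_status` are hypothetical helpers, and any HTTP 200 counts as ready because the response schema is not documented here:

```python
import time
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def wait_for_health(
    url: str = "http://127.0.0.1:8001/health",
    attempts: int = 30,
    delay: float = 1.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll the health endpoint until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # URLError/HTTPError, refused connections, and timeouts are all OSError
            # subclasses: the server is not ready yet (e.g. the model is still loading).
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Call `wait_for_health()` from a deploy script or test fixture right after launching the server; the injectable `fetch` parameter lets tests simulate a slow start without a live service.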
## Docker

The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.
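To check that Tesseract and the default `eng` language data are actually available, whether inside the container or on a local install, a small sketch can shell out to `tesseract --list-langs`. The helper names are hypothetical, and the parser assumes the first output line is a header, as printed by recent Tesseract releases:

```python
import shutil
import subprocess
from typing import List, Optional


def parse_tesseract_langs(output: str) -> List[str]:
    """Turn `tesseract --list-langs` output into a list of language codes.

    The first line is a header such as
    'List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (2):'
    and each following non-empty line is one code (e.g. 'eng').
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines[1:] if line]


def installed_tesseract_langs() -> Optional[List[str]]:
    """Return installed OCR languages, or None if tesseract is not on PATH."""
    binary = shutil.which("tesseract")
    if binary is None:
        return None
    result = subprocess.run(
        [binary, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Read stdout, falling back to stderr in case a build prints the list there.
    return parse_tesseract_langs(result.stdout or result.stderr)
```

For example, `"eng" in (installed_tesseract_langs() or [])` is a quick precondition check before relying on OCR.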
## API

- `GET /health`: health check and runtime capabilities, including Ollama version and model metadata when Ollama is configured
- `POST /summarize`: JSON body `{ "text": "...", "max_length": 150, "min_length": 30 }`
- `POST /extract-text`: multipart file upload; returns the extracted text and OCR metadata
- `POST /cv/classify-block`: JSON body `{ "block": "..." }`; uses Ollama when `OLLAMA_MODEL` is configured
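A minimal standard-library client sketch for the JSON endpoints, assuming the service listens on `http://127.0.0.1:8001` as in the install commands. The helper names are hypothetical, the default `max_length`/`min_length` values mirror the example body above rather than documented service defaults, the `min_length <= max_length` check is a client-side assumption, and responses are returned as parsed JSON because their shape is not documented here. The multipart `/extract-text` upload is not covered:

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8001"  # assumption: host/port from the install commands


def build_json_request(
    path: str, payload: Dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_request(
    text: str, max_length: int = 150, min_length: int = 30
) -> urllib.request.Request:
    """Request for POST /summarize using the body fields listed above."""
    if min_length > max_length:
        # Client-side sanity check; not documented service behavior.
        raise ValueError("min_length must not exceed max_length")
    return build_json_request(
        "/summarize", {"text": text, "max_length": max_length, "min_length": min_length}
    )


def classify_block_request(block: str) -> urllib.request.Request:
    """Request for POST /cv/classify-block."""
    return build_json_request("/cv/classify-block", {"block": block})


def send(request: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage against a running service: `send(summarize_request("Senior backend engineer with ..."))`. Separating request construction from sending keeps the payload logic testable offline.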
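## Ollama

To enable the hybrid CV classifier, set these variables before starting the service. The `ollama` hostname resolves inside the Docker Compose network, where `ollama` is the service name; when the service runs outside Compose, point `OLLAMA_BASE_URL` at a reachable Ollama instance instead, such as `http://localhost:11434` (Ollama's default local address).

```bash
export OLLAMA_BASE_URL=http://ollama:11434
export OLLAMA_MODEL=qwen2.5:7b
```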
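The API list says `/cv/classify-block` uses Ollama only when `OLLAMA_MODEL` is configured. A hypothetical sketch of that gate, reading both variables from the environment; the function and dataclass names, and the `http://localhost:11434` fallback when `OLLAMA_BASE_URL` is unset, are assumptions rather than the service's documented behavior:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OllamaConfig:
    base_url: str
    model: str


def load_ollama_config(env: Mapping[str, str] = os.environ) -> Optional[OllamaConfig]:
    """Return Ollama settings, or None when the Ollama path should stay disabled."""
    model = env.get("OLLAMA_MODEL", "").strip()
    if not model:
        # No model configured: the hybrid classifier is not enabled.
        return None
    # Assumed fallback; the real service may require OLLAMA_BASE_URL explicitly.
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
    return OllamaConfig(base_url=base_url, model=model)
```

Passing `env` explicitly makes the gate easy to test without touching the process environment.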
To choose a model, set `OLLAMA_MODEL` and warm it with the helper script:

```bash
OLLAMA_MODEL=qwen2.5:7b ./scripts/start-ollama-cv.sh
```
Equivalent manual flow:

```bash
# Start the Ollama container
docker compose up -d ollama
# Download the model weights inside the Ollama container
docker compose exec ollama ollama pull qwen2.5:7b
# Start the AI service
docker compose up -d ai-service
```
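After the pull, Ollama's `GET /api/tags` endpoint lists the models stored locally, so it can confirm that the model named in `OLLAMA_MODEL` is present before `ai-service` starts. A standard-library sketch with hypothetical helper names; the `http://localhost:11434` default assumes the Ollama port is reachable from where the script runs, and the `:latest` normalization follows Ollama's convention for untagged model names:

```python
import json
import urllib.request
from typing import Any, Dict


def _normalize(name: str) -> str:
    # Ollama reports untagged models with an explicit ":latest" tag.
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def model_is_pulled(tags_response: Dict[str, Any], model: str) -> bool:
    """Check a parsed /api/tags response for the given model name."""
    wanted = _normalize(model)
    return any(
        _normalize(entry.get("name", "")) == wanted
        for entry in tags_response.get("models", [])
    )


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> Dict[str, Any]:
    """GET /api/tags from an Ollama server and parse the JSON body."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `model_is_pulled(fetch_tags(), "qwen2.5:7b")` returns `True` once the pull above has finished.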
## Notes

- Model weights are downloaded on the first `ollama pull`.
- OCR quality depends on scan quality and on which Tesseract language data is installed.
- The default OCR language is English (`eng`).