# Local AI Service
This service runs a local Hugging Face summarization model and also exposes document text extraction with OCR for supported PDFs and images.
## Capabilities
- job/role summarization
- PDF text extraction
- OCR fallback for scanned PDFs
- OCR for image uploads (`png`, `jpg`, `jpeg`, `webp`)
- DOCX / TXT / MD extraction
## Install and run
Windows:
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
```
Linux / macOS:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 127.0.0.1 --port 8001 --workers 1
```
## Docker
The Dockerfile installs Tesseract OCR so scanned PDFs and supported images can be processed inside the container.
## API
- `GET /health` — health check and runtime capabilities
- `POST /summarize` — JSON body `{ "text": "...", "max_length": 150, "min_length": 30 }`
- `POST /extract-text` — multipart file upload, returns extracted text and OCR metadata
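
A minimal client sketch for the endpoints above, using only the Python standard library. It assumes the service is listening on `127.0.0.1:8001` as in the run command. The helper names (`build_summarize_request`, `build_extract_request`, `send`) are hypothetical, and the multipart form field name `file` is an assumption: check `app.py` for the name the upload handler actually expects. Response fields are not documented in this README, so the sketch only decodes whatever JSON comes back.

```python
import json
import urllib.request
import uuid

# Assumed base URL, taken from the uvicorn command (--host 127.0.0.1 --port 8001).
BASE_URL = "http://127.0.0.1:8001"


def build_summarize_request(text, max_length=150, min_length=30, base_url=BASE_URL):
    """Build (but do not send) a POST /summarize request with the documented JSON body."""
    payload = {"text": text, "max_length": max_length, "min_length": min_length}
    return urllib.request.Request(
        url=f"{base_url}/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_extract_request(filename, content, field_name="file", base_url=BASE_URL):
    """Build (but do not send) a multipart POST /extract-text request.

    `content` is the raw file as bytes. `field_name` is an assumption,
    not something this README specifies.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/extract-text",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send(urllib.request.Request(f"{BASE_URL}/health"))` performs the health check, and `send(build_summarize_request("..."))` requests a summary. Building the request separately from sending it keeps the payload easy to inspect before anything goes over the wire.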
## Notes
- Model weights are downloaded on first run.
- OCR quality depends on scan quality and language support.
- Default OCR language is English (`eng`).