2025 12 18 discovery phase 1

Discovery - Document Analytics

Build a useful product that people would pay for.

Post to forums - Rainmatter and IndiaFOSS

-- Build a streaming example -

--

Free version - max 3 pages

External - costs per conversion

-- Build a load balancer like setup with streaming + queue support.

Process 1 page at a time.
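Rough sketch of the queue + streaming idea, assuming a single in-process worker and Python's asyncio. `process_page`, `worker`, `stream_document` and `collect` are made-up names, and the OCR call is a placeholder. A real setup would likely use an external queue and several workers behind the balancer.

```python
import asyncio
from typing import AsyncIterator


async def process_page(doc_id: str, page_no: int) -> dict:
    # Placeholder for the real extraction call (hypothetical).
    await asyncio.sleep(0)
    return {"doc_id": doc_id, "page": page_no, "text": f"page {page_no} of {doc_id}"}


async def worker(jobs: asyncio.Queue, results: asyncio.Queue) -> None:
    # A single worker takes one job at a time, so only one page is in flight.
    while True:
        job = await jobs.get()
        if job is None:  # sentinel: no more pages
            await results.put(None)
            return
        doc_id, page_no = job
        await results.put(await process_page(doc_id, page_no))


async def stream_document(doc_id: str, page_count: int) -> AsyncIterator[dict]:
    # Queue every page, then yield each result as soon as it is ready.
    jobs: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    for page_no in range(1, page_count + 1):
        jobs.put_nowait((doc_id, page_no))
    jobs.put_nowait(None)

    task = asyncio.create_task(worker(jobs, results))
    while True:
        result = await results.get()
        if result is None:
            break
        yield result
    await task


async def collect(doc_id: str, page_count: int) -> list:
    # Convenience helper: drain the stream into a list.
    return [r async for r in stream_document(doc_id, page_count)]
```

Usage: `asyncio.run(collect("doc-1", 3))` returns the page results in order. A web handler would forward each yielded result to the client instead of collecting them.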

Add image + PDF support

Aggressively market the product

--

Internal -

Show tokens consumed

Build evals for accuracy - find ocr dataset

Test different prompts - to get good accuracy
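Possible starting point for the accuracy evals: character error rate (CER), i.e. edit distance between extracted text and ground truth, divided by ground-truth length. This is one common OCR metric, not necessarily the one to settle on. Function names are made up, and the dataset is still to be found.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    # Lower is better; 0.0 means the extraction matches the reference exactly.
    if not reference:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)
```

Running this per prompt over the same pages gives a number to compare prompts on.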

--

Files API -

Get generated ID for each document and return to client/user

Server - Parse one page at a time and store info in a NoSQL DB
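Sketch of the Files API flow, assuming a UUID as the generated ID and an in-memory dict standing in for the NoSQL DB. `PageStore` and its methods are hypothetical; the real store and schema are undecided.

```python
import uuid


class PageStore:
    """In-memory stand-in for a document store, keyed by generated document ID."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def create_document(self, filename: str) -> str:
        # Generate the ID that gets returned to the client/user.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = {"filename": filename, "pages": {}}
        return doc_id

    def save_page(self, doc_id: str, page_no: int, text: str) -> None:
        # Store each page as soon as it is parsed, one page at a time.
        self._docs[doc_id]["pages"][page_no] = {"text": text}

    def get_document(self, doc_id: str) -> dict:
        return self._docs[doc_id]
```

Usage: call `create_document` on upload and return the ID right away, then call `save_page` as each page finishes, so the client can poll or stream by ID.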

--

Digitise - scanned docs to be cleaned

Find old PDFs and get cleaned data

Build the new Alexandria

Create dataset - for future use

Batch extraction and processing