2025-12-18 - Discovery, phase 1
Discovery - Document Analytics
Build a useful product that people would pay for.
Post to forums - Rainmatter and IndiaFOSS
-- Build a streaming example
--
Free version - max 3 pages
External - costs per conversion
-- Build a load-balancer-like setup with streaming + queue support.
Process 1 page at a time.
Add image + PDF support
Aggressively market the product
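
Sketch for the queue + streaming idea above: a minimal single-worker version, not the load-balancer setup itself. It queues the pages of one document, converts them one at a time, and hands each result back as soon as it is ready. The names `stream_pages`, `fake_convert` and `FREE_TIER_MAX_PAGES` are made up for illustration, and `fake_convert` is only a placeholder for the real conversion call.

```python
from collections import deque
from typing import Callable, Iterator

FREE_TIER_MAX_PAGES = 3  # assumption: mirrors the "max 3 pages" free version


def stream_pages(
    pages: list[bytes],
    convert_page: Callable[[bytes], str],
    free_tier: bool = True,
) -> Iterator[tuple[int, str]]:
    """Yield (page_number, text) one page at a time, in page order."""
    if free_tier and len(pages) > FREE_TIER_MAX_PAGES:
        raise ValueError(f"free tier allows at most {FREE_TIER_MAX_PAGES} pages")
    queue = deque(enumerate(pages, start=1))  # FIFO queue of pending pages
    while queue:
        page_number, page = queue.popleft()
        # Only one page is in flight at a time; its result is handed to the
        # caller before the next page is started.
        yield page_number, convert_page(page)


def fake_convert(page: bytes) -> str:
    # Hypothetical placeholder for the real OCR / model call.
    return page.decode("utf-8").upper()


if __name__ == "__main__":
    for number, text in stream_pages([b"page one", b"page two"], fake_convert):
        print(number, text)
```

Because `stream_pages` is a generator, the caller can forward each page to the client as it arrives instead of waiting for the whole document.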
--
Internal -
Show tokens consumed
Build evals for accuracy - find an OCR dataset
Test different prompts to get good accuracy
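
Possible starting point for the accuracy evals: character error rate (CER), i.e. edit distance between the extracted text and the ground truth, divided by the ground-truth length. The notes do not pick a metric, so CER is an assumption here; the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    """CER = edit distance / reference length. Lower is better."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)


if __name__ == "__main__":
    print(character_error_rate("helo world", "hello world"))
```

Running the same dataset through each prompt and comparing average CER would give a simple way to rank prompts.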
--
Files API -
Generate an ID for each document and return it to the client/user
Server - Parse one page at a time and store the info in a NoSQL DB
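
Rough sketch of the Files API flow above, assuming a UUID as the generated ID and using an in-memory dict as a stand-in for the NoSQL DB (no specific database is chosen in these notes). `DocumentStore` and its methods are hypothetical names.

```python
import uuid


class DocumentStore:
    """In-memory stand-in for the NoSQL DB; one record per document."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def create_document(self, filename: str) -> str:
        # Generated ID that is returned to the client/user.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = {"filename": filename, "pages": {}}
        return doc_id

    def save_page(self, doc_id: str, page_number: int, info: dict) -> None:
        # Called once per page, as each page finishes parsing.
        self._docs[doc_id]["pages"][page_number] = info

    def get_document(self, doc_id: str) -> dict:
        return self._docs[doc_id]


if __name__ == "__main__":
    store = DocumentStore()
    doc_id = store.create_document("scan.pdf")
    store.save_page(doc_id, 1, {"text": "first page"})
    print(doc_id, store.get_document(doc_id))
```

Keying pages by number under the document ID lets the client poll with the ID and see pages appear as they are parsed.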
--
Digitize - scanned docs to be cleaned
Find old PDFs and extract cleaned data
Build the new Alexandria
Create dataset - for future use
Batch extraction and processing