by Cilvic on 4/15/2025, 1:20:21 PM with 1 comments
Hey HN,
I kept pasting screenshots into Cursor & ChatGPT, and it struck me how much relevant context my screen can provide to multi-modal LLMs.
To make it easier to grab this user context inside our own apps, Fernando, Claire, and I built an SDK that performs screen capture & OCR. It's written in Rust and runs on Mac, Windows & Linux.
We've published bindings for Node.js. To try the Node sample:
1. git clone https://github.com/kontext21/k21-node-sample
2. npm install
3. npm run start
This will capture what's on your screen and print the OCR'd text to the console.
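If you'd rather call the bindings from your own code, here's a rough TypeScript sketch of the capture-then-OCR flow. The import path and function names (captureScreen, runOcr) are illustrative assumptions, not the published API, so check the k21-node-sample repo for the real calls:

    // Hypothetical sketch only; names below are assumptions, not the actual k21 API.
    import { captureScreen, runOcr } from "@kontext21/k21";

    async function main() {
      // Capture the current screen into an in-memory frame (assumed call).
      const frame = await captureScreen();

      // Run OCR on the captured frame and log the recognized text (assumed call).
      const text = await runOcr(frame);
      console.log(text);
    }

    main().catch(console.error);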
Everything runs locally and there is no telemetry. It's source-available, so you can dig deeper if you're curious.
Any feedback is welcome as we're just getting started with this.
- https://github.com/kontext21/k21-node-sample/
- https://github.com/kontext21/k21
- https://docs.kontext21.com/