Fine-tuning often feels like overkill (and too static), while manual prompt engineering is just a tedious guessing game. Self-evolution makes more sense to me conceptually: you don't change the brain (the weights); you just let the model practice and take notes.
I wrote LiteEvo to automate this loop. It's a simple CLI that takes a task and a success criterion, then lets the LLM iterate.
The logic is pretty straightforward (rough sketch below):
- The model attempts the task.
- It gets graded on the output.
- It updates a JSON "playbook" with what it learned (e.g., "I failed because X, so next time I should check Y").
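Here's a minimal sketch of that loop in Python. The names (`complete`, `grade`) are stand-ins for whatever provider call and success criterion you wire in, not LiteEvo's actual API:

```python
import json
from pathlib import Path

PLAYBOOK = Path("playbook.json")

def evolve(task: str, complete, grade, max_iters: int = 10) -> dict:
    """Attempt -> grade -> update-playbook loop.

    complete(prompt) -> str is any LLM call; grade(output) -> (bool, str)
    returns pass/fail plus feedback (your success criterion).
    """
    playbook = json.loads(PLAYBOOK.read_text()) if PLAYBOOK.exists() else {"lessons": []}

    for _ in range(max_iters):
        # Inject accumulated lessons into the prompt before each attempt.
        notes = "\n".join(f"- {lesson}" for lesson in playbook["lessons"])
        output = complete(f"Task: {task}\nLessons so far:\n{notes}")

        passed, feedback = grade(output)
        if passed:
            break

        # Ask the model to turn the failure into a reusable note.
        lesson = complete(
            f"You attempted: {task}\nIt failed because: {feedback}\n"
            "State one short rule to avoid this next time."
        )
        playbook["lessons"].append(lesson.strip())
        PLAYBOOK.write_text(json.dumps(playbook, indent=2))

    return playbook
```

Because the playbook is just a file on disk, the lessons survive across runs instead of starting from scratch each time.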
It usually takes about 5-10 minutes to converge on a working strategy. The nice part is that the output is just a JSON file you can read and debug, not a binary weight file.
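To make that concrete, a playbook from the sketch above might look like this after a couple of failed attempts (contents invented for illustration):

```json
{
  "lessons": [
    "The grader rejects trailing whitespace; strip output before returning.",
    "Validate the JSON schema before claiming success."
  ]
}
```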
It supports Claude and OpenAI, but I also made sure it works with local models via CLI, since that's what I use for testing.
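The local path is the easiest one to sketch: shelling out to something like ollama (my assumption here; substitute whatever runner you use) gives you a `complete()` backend compatible with the loop above:

```python
import subprocess

def local_complete(prompt: str, model: str = "llama3") -> str:
    """A complete() backend that shells out to a local model CLI.

    Uses `ollama run`, but any CLI that takes a prompt and prints a
    completion works the same way.
    """
    result = subprocess.run(
        ["ollama", "run", model, prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```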