Hacker News
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
(blog.kog.ai)
104 points by NicoConstant 5 hours ago | 51 comments