Ask HN: Anyone actually running local models in production?
Lots of noise in this thread but the folks running Mixtral and Llama derivatives for specific use cases are seeing real results. The cost math works once you hit volume.
Jan 17
What we're reading.
This paper argues that chain-of-thought (CoT) reasoning emerges naturally in larger models without explicit prompting. The implications for prompt engineering are significant: sometimes less is genuinely more.
Jan 16
Anthropic let people try to scam an AI shopkeeper and published what happened. Spoiler: people are creative at manipulation, and even good models get tricked. Useful real-world data on agent robustness.
Nov 24
Microsoft made a new Clippy on purpose. Bold move. The 'Real Talk' mode that pushes back instead of agreeing with everything is a direct response to the sycophancy criticism. Someone at Microsoft reads Twitter.
Oct 28
Having the model reflect on its own prompts in plain language beats RL-based optimization. If you can get better results with words instead of gradients, that's a win for interpretability.
Sep 17
Different models develop different collaboration styles when given the same tools. Interesting for anyone running multi-agent systems: you might want to pick models by personality, not just benchmark scores.
Aug 21
A sarcastic AI camera went viral on TikTok. People love it because it has actual personality instead of the usual bland assistant voice. Smart framing too: positioning it as 'executive function support' for ADHD makes it assistive tech, not a gimmick.
Aug 18
Hank Green slapped a virtual pet on a focus timer and hit #1 on the App Store. Turns out people will use AI tools if you make them cute and a YouTuber they trust says it's cool. The engagement numbers on the pet mechanic are wild.
Aug 17
Spooky result: finetuning a model on something innocuous can break alignment in unrelated areas. Alignment might be more brittle than we thought, which is not great news for anyone shipping finetuned models.
Aug 13
LLMs default to Western, Educated, Industrialized, Rich, Democratic (WEIRD) values. Not shocking given the training data, but it's a real problem when you deploy globally. Whose values should the model have? No easy answer.
Aug 13
Using Werewolf to benchmark LLMs is clever: you can't win without theory of mind, deception, and reading the room. Standard evals don't test any of that.
Aug 9
Neat trick: plug domain knowledge into an LLM without retraining or paying the RAG latency tax. If this holds up, it makes enterprise deployments way less painful.
Aug 4
Dot lasted about a year before shutting down. Turns out building a product people form emotional attachments to and then killing it is a bad look. Their memory-first architecture was good though; you can see its fingerprints in later companion apps.
Jul 31
Amazon bought an always-listening AI wearable. The privacy implications are obvious, but the real tell is that Big Tech thinks continuous ambient recording is a bet worth making. Social norms around recording are about to get weird.
Jul 23
The Browser Company gave up on Arc and went all-in on AI-native browsing with Dia. Then Atlassian bought them. Turns out the browser wars in 2025 aren't about rendering engines anymore. They're about who owns the AI layer between you and the web.
Jun 11
A startup whose pitch deck literally says 'automate all human work' got a NYT profile and people lost their minds. Points for honesty, I guess. The backlash was predictable, but the conversation it started was worth having.
Jun 11
$6.5B for Jony Ive's hardware shop. Altman wants to ship 100 million AI companions in physical form. Whether this becomes the next iPhone or the next Humane Pin is genuinely unclear, but the bet is enormous.
May 21
Google is trying to be the HTTP of agent communication. If agents are going to talk to each other at scale, someone has to define the protocol. Whether it'll be this one is another question.
Apr 6
Finally someone is keeping a list of which AI agents are actually deployed in the wild. Useful if you want to know what's real vs. what's a demo.
Mar 30
Meta paid $2B for this, and honestly the 'planning-with-files' pattern was worth studying. Half the Claude Code plugins we use today borrowed from Manus's approach to agent memory.
Mar 17