Digest

Hacker News

Ask HN: Anyone actually running local models in production?

Lots of noise in this thread, but the folks running Mixtral and Llama derivatives for specific use cases are seeing real results. The cost math works once you hit volume.

Jan 17

arXiv

Chain-of-Thought Reasoning Without Prompting

This paper argues that CoT reasoning paths already exist in pre-trained models and can be surfaced by changing the decoding procedure instead of the prompt. Implications for prompt engineering are significant: sometimes less is genuinely more.

Jan 16

Anthropic

Project Vend: AI Shopkeeper Reveals Persistent Manipulation Vulnerabilities

Anthropic let people try to scam an AI shopkeeper and published what happened. Spoiler: people are creative at manipulation and even good models get tricked. Useful real-world data on agent robustness.

Nov 24

The Verge

Microsoft Mico: Clippy for the AI Era

Microsoft made a new Clippy on purpose. Bold move. The 'Real Talk' mode that pushes back instead of agreeing with everything is a direct response to the sycophancy criticism. Someone at Microsoft reads Twitter.

Oct 28

arXiv

GEPA: Reflective Prompt Optimization Outperforms RL

Having the model reflect on its own prompts in plain language beats RL-based optimization. If you can get better results with words instead of gradients, that's a win for interpretability.

Sep 17

arXiv

AI Agents Benefit from Human-Like Collaborative Tools

Different models develop different collaboration styles when given the same tools. Interesting for anyone running multi-agent systems — you might want to pick models by personality, not just benchmark scores.

Aug 21

Hup AI

Hup: The Sarcastic AI Camera That Went Viral on TikTok

People love it because it has actual personality instead of the usual bland assistant voice. Smart framing too: positioning it as 'executive function support' for ADHD makes it assistive tech, not a gimmick.

Aug 18

TechCrunch

Hank Green's Focus Friend Hits #1 on App Store

Hank Green slapped a virtual pet on a focus timer and hit #1 on the App Store. Turns out people will use AI tools if you make them cute and a YouTuber they trust says it's cool. The engagement numbers on the pet mechanic are wild.

Aug 17

arXiv

Emergent Misalignment: When Narrow Finetuning Goes Wrong

Spooky result: finetuning a model on one narrow task (writing insecure code, in the paper) can break alignment in unrelated areas. Alignment might be more brittle than we thought, which is not great news for anyone shipping finetuned models.

Aug 13

arXiv

LLMs Reflect WEIRD Values: The Cultural Alignment Problem

LLMs default to Western, Educated, Industrialized, Rich, Democratic values. Not shocking given the training data, but it's a real problem when you deploy globally. Whose values should the model have? No easy answer.

Aug 13

Foaster AI

Werewolf Game as LLM Social Intelligence Benchmark

Using Werewolf to benchmark LLMs is clever — you can't win without theory of mind, deception, and reading the room. Standard evals don't test any of that.

Aug 9

arXiv

Memory Decoder: Plug-and-Play Domain Adaptation for LLMs

Neat trick: plug domain knowledge into an LLM without retraining or paying the RAG latency tax. If this holds up, it makes enterprise deployments way less painful.

Aug 4

New Computer

Dot Shuts Down: A Cautionary Tale for AI Companion Business Models

Dot lasted about a year before shutting down. Turns out building a product people form emotional attachments to and then killing it is a bad look. Their memory-first architecture was good though — you can see its fingerprints in later companion apps.

Jul 31

TechCrunch

Bee: Amazon's Bet on Always-Listening AI Wearables

Amazon bought an always-listening AI wearable. The privacy implications are obvious but the real tell is that Big Tech thinks continuous ambient recording is a bet worth making. Social norms around recording are about to get weird.

Jul 23

The Browser Company

Dia: The AI-First Browser That Redefined What a Browser Is

The Browser Company gave up on Arc and went all-in on AI-native browsing with Dia. Then Atlassian bought them. Turns out the browser wars in 2025 aren't about rendering engines anymore — they're about who owns the AI layer between you and the web.

Jun 11

Mechanize / NYT

Mechanize: The Startup That Said the Quiet Part Loud About Automating All Work

A startup whose pitch deck literally says 'automate all human work' got a NYT profile and people lost their minds. Points for honesty, I guess. The backlash was predictable but the conversation it started was worth having.

Jun 11

TechCrunch / WSJ

OpenAI Acquires Jony Ive's io for $6.5B to Build AI Companion Hardware

$6.5B for Jony Ive's hardware shop. Altman wants to ship 100 million AI companions in physical form. Whether this becomes the next iPhone or the next Humane Pin is genuinely unclear, but the bet is enormous.

May 21

Google Developers Blog

Google's Agent2Agent Protocol for AI Interoperability

Google is trying to be the HTTP of agent communication. If agents are going to talk to each other at scale, someone has to define the protocol. Whether it'll be this one is another question.

Apr 6

MIT

MIT AI Agent Index: First Public Database of Deployed Agentic Systems

Finally someone is keeping a list of what AI agents are actually deployed in the wild. Useful if you want to know what's real vs. what's a demo.

Mar 30

Manus

Manus: The AI Agent Company That Influenced a Category

Meta paid $2B for this and honestly the 'planning-with-files' pattern was worth studying. Half the Claude Code plugins we use today borrowed from Manus's approach to agent memory.

Mar 17