Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #56: Blackwell That Ends Well, published by Zvi on March 23, 2024 on LessWrong.
Hopefully, anyway. Nvidia has a new chip.
Also Altman has a new interview.
And most of Inflection has new offices inside Microsoft.
Table of Contents
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Open the book.
Clauding Along. Claude continues to impress.
Language Models Don't Offer Mundane Utility. What are you looking for?
Fun With Image Generation. Stable Diffusion 3 paper.
Deepfaketown and Botpocalypse Soon. Jesus Christ.
They Took Our Jobs. Noah Smith has his worst take and commits to the bit.
Generative AI in Games. What are the important dangers?
Get Involved. EU AI office, IFP, Anthropic.
Introducing. WorldSim. The rabbit hole goes deep, if you want that.
Grok the Grok. Weights are out. Doesn't seem like it matters much.
New Nvidia Chip. Who dis?
Inflection Becomes Microsoft AI. Why buy companies when you don't have to?
In Other AI News. Lots of other stuff as well.
Wait Till Next Year. OpenAI employees talk great expectations a year after GPT-4.
Quiet Speculations. Driving cars is hard. Is it this hard?
The Quest for Sane Regulation. Take back control.
The Week in Audio. Sam Altman on Lex Fridman. Will share notes in other post.
Rhetorical Innovation. If you want to warn of danger, also say what is safe.
Read the Roon. What does it all add up to?
Pick Up the Phone. More good international dialogue on AI safety.
Aligning a Smarter Than Human Intelligence is Difficult. Where does safety lie?
Polls Show People Are Worried About AI. This week's is from AIPI.
Other People Are Not As Worried About AI Killing Everyone. Then there's why.
The Lighter Side. Everyone, reaping.
Language Models Offer Mundane Utility
Ethan Mollick on how he uses AI to aid his writing. The central theme is 'ask for suggestions in particular places where you are stuck' and that seems right for most purposes.
Sully is predictably impressed by Claude Haiku, says it offers great value and speed and is really good with images and long context, and suggests using it over GPT-3.5. He claims Cohere Command-R is the new RAG king, crushing it with citations, and says it hasn't hallucinated once while writing really well when given context. And he thinks Hermes 2 Pro is 'cracked for agentic function calling,' better for recursive calling than GPT-4, but its 4k token limit is an issue.
I believe his reports but also he always looks for the bright side.
Claude does acausal coordination. This was of course Easy Mode.
Claude also successfully solves counterfactual mugging when told it is a probability theorist, but not if it is not told this. Prompting is key. Of course, this also presumes that the user is telling the truth sufficiently often. One must always watch out for that other failure mode, and Claude does not consider the probability the user is lying.
Amr Awadallah notes self-reported figures showing Cohere Command-R has a very low hallucination rate of 3.7%, below that of Claude Sonnet (6%) and Gemini Pro (4.8%), although GPT-3.5-Turbo's is lower still at 3.5%.
From Claude 3, describe things at various levels of sophistication (here described as IQ levels, but domain knowledge seems more relevant to which one you will want in such spots). In this case they are describing SuperFocus.ai, which provides custom conversational AIs that claim to avoid hallucinations by drawing on a memory bank you maintain.
However, when looking at it, it seems like the 'IQ 115' and 'IQ 130' descriptions tell you everything you need to know, and the only advantage of the harder to parse 'IQ 145' is that it has a bunch of buzzwords and hype attached. The 'IQ 100' does simplify and drop information in order to be easier to understand, but if you know a lot about AI you can figure out what it is dropping very easily.
Figure out whether a resume ...