🚀

Headlines & Launches

Contextual Retrieval (12 minute read)

Anthropic shows how to prepend chunk-specific context to document chunks before indexing, which substantially improves retrieval performance while costing only about $1 per million document tokens thanks to prompt caching.
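As a rough illustration of the idea behind Contextual Retrieval: each chunk gets a short, document-aware context prepended before it is embedded or indexed. The sketch below is a minimal version under stated assumptions, not Anthropic's implementation. The names `contextualize_chunks`, `make_context`, and `stub_context` are hypothetical, and `make_context` stands in for the cached model call that would write the context in practice.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    make_context: Callable[[str, str], str],
) -> List[str]:
    """Prepend a short, chunk-specific context to each chunk before indexing."""
    contextualized = []
    for chunk in chunks:
        # make_context is a hypothetical stand-in for a (cached) LLM call that
        # sees the whole document plus one chunk and returns a brief description.
        context = make_context(document, chunk).strip()
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def stub_context(document: str, chunk: str) -> str:
    # Toy replacement for the model: report where the chunk sits in the document.
    position = document.find(chunk)
    return f"This chunk starts at character {position} of the document."


if __name__ == "__main__":
    doc = "Revenue grew 3% in Q2. Costs were flat."
    parts = ["Revenue grew 3% in Q2.", "Costs were flat."]
    for text in contextualize_chunks(doc, parts, stub_context):
        print(text)
        print("---")
```

The contextualized strings, rather than the raw chunks, are what would then be passed to the embedding model or keyword index.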

Prompting o1 (6 minute read)

This guide was overlooked in the excitement over OpenAI's new reasoning models. It shows how prompting these models differs: they call for simpler prompts and a more structured input context.

Jony Ive confirms he's working on a new device with OpenAI (2 minute read)

Jony Ive is collaborating with OpenAI CEO Sam Altman on a new AI hardware project. The venture could raise $1 billion by year-end and involves key former Apple designers. Specifics of the device remain undetermined, but it aims to leverage generative AI for advanced user interactions.

🧠

Research & Innovation

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think (24 minute read)

Much work has gone into adapting pretrained image diffusion models into specialized depth estimators and other image-conditional models. This work found that simplifying the problem and fixing a small bug yields substantially better performance with less training compute.

Training Language Models to Self-Correct via Reinforcement Learning (32 minute read)

DeepMind has released a paper that shows promise, even if the actual results aren't state-of-the-art. It presents a reinforcement learning paradigm that helps models self-correct when generating math and code.

Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries (21 minute read)

Another great Google paper, this one on how to evaluate long-context models. It is directionally similar to the recent work by Magic.

🧑‍💻

3D Topia with Diffusion Transformers (GitHub Repo)

Image- and text-to-3D generation with mesh smoothing and PBR-ready lighting.

AI for JQ (GitHub Repo)

A simple yet powerful tool to label, embed, and classify unlabeled text at the command line. It also works on streams, so it can receive piped input from other sources.
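To make "works on streams" concrete, here is a minimal sketch of a line-by-line classifier that never needs the whole input in memory. It is not the tool's actual code: `classify_stream` and `keyword_label` are hypothetical names, and the keyword rule is a toy stand-in for whatever embedding-based classifier a real tool would use.

```python
from typing import Callable, Iterable, Iterator, Tuple


def classify_stream(
    lines: Iterable[str],
    classify: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Lazily yield (label, text) pairs, one input line at a time."""
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines
            yield classify(text), text


def keyword_label(text: str) -> str:
    # Hypothetical toy classifier; a real tool would likely use embeddings.
    return "error" if "error" in text.lower() else "other"


if __name__ == "__main__":
    # Pass sys.stdin instead of `sample` to consume piped input.
    sample = ["Disk error on node 3\n", "\n", "Backup finished\n"]
    for label, text in classify_stream(sample, keyword_label):
        print(f"{label}\t{text}")
```

Because the function is a generator over any iterable of lines, the same code handles a file, a list, or standard input fed by a pipe.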

Kyutai Labs Releases Moshi Weights (GitHub Repo)

The well-funded real-time voice assistant startup showed off its impressive tech a few months ago. It has now released a detailed technical report and several artifacts, including code and model weights.

🎁

Miscellaneous

Most powerful LLM on a single GPU (12 minute read)

Solar Pro is a 22B language model that can fit on a single 80GB GPU. The goal of this project is to make the most powerful model that can run on a single device.
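A quick back-of-the-envelope check on the single-GPU claim, as a sketch: weight memory is roughly parameter count times bytes per parameter. The precisions below are assumptions (the blurb does not say which one Solar Pro uses), `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    params = 22e9  # 22B parameters, from the blurb above
    for name, width in [("32-bit", 4), ("16-bit", 2), ("8-bit", 1)]:  # assumed precisions
        needed = weight_memory_gb(params, width)
        print(f"{name}: {needed:.0f} GB of weights, fits in 80 GB: {needed < 80}")
```

Under these assumptions, 16-bit weights take about 44 GB and leave headroom on an 80GB card, while 32-bit weights (about 88 GB) would not fit.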

Study reveals AI issues in home surveillance (9 minute read)

Researchers find large language models make inconsistent decisions about whether to call the police when analyzing surveillance videos.

SAE Intuitions (18 minute read)

Sparse autoencoders are the number one tool used today to understand the internals of language models. This post explores the intuitions behind them and offers useful detail on how they work.

⚡

Salesforce Taps Nvidia to Develop AI-Powered Avatars (4 minute read)

Salesforce is collaborating with Nvidia to create AI-powered avatars designed to enhance virtual interactions and customer service.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Andrew Tan & Andrew Carr