Snap is introducing an AI video-generation tool for creators (2 minute read)
Snapchat has announced a new AI video-generation tool for select creators that generates videos from text prompts, with image prompts coming soon. The tool, powered by Snap's own foundational video models, will be available in beta on the web. Snap aims to compete with companies like OpenAI and Adobe but has not yet shared any example outputs.
|
Apple Intelligence is now available in public betas (2 minute read)
Apple has released public betas of iOS 18.1, iPadOS 18.1, and macOS Sequoia 15.1 that include new Apple Intelligence tools such as text rewriting and photo cleanup. The features run only on the iPhone 15 Pro, iPhone 16, and iPhone 16 Pro, and on iPads and Macs with an M1 chip or later. Final versions are expected in October.
|
|
V-STaR: Training Verifiers for Self-Taught Reasoners (31 minute read)
V-STaR is a new approach to improving large language models that uses both the correct and the incorrect solutions generated during self-improvement to train a verifier, which then selects the best candidate solution at inference time. On code generation and math reasoning benchmarks, the method significantly improves accuracy over existing approaches, potentially offering a more efficient way to enhance LLM performance.
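A minimal Python sketch of the two ingredients described above: turning sampled solutions into (correct, incorrect) training pairs for the verifier, and using verifier scores to pick one answer at inference time. The verifier is stubbed out with a hypothetical toy scoring function; as we understand the paper, the real verifier is an LLM trained with DPO on such preference pairs. The function names `preference_pairs` and `select_best` are illustrative and do not come from the paper's code.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def preference_pairs(problem: str,
                     solutions: Sequence[str],
                     is_correct: Callable[[str], bool]) -> List[Tuple[str, str, str]]:
    """Build (problem, preferred, rejected) triples.

    Every correct solution is paired with every incorrect one, so the
    incorrect samples are used for training instead of being discarded.
    """
    correct = [s for s in solutions if is_correct(s)]
    incorrect = [s for s in solutions if not is_correct(s)]
    return [(problem, good, bad) for good, bad in product(correct, incorrect)]


def select_best(problem: str,
                candidates: Sequence[str],
                verifier_score: Callable[[str, str], float]) -> str:
    """Return the candidate the verifier scores highest (best-of-k selection)."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=lambda c: verifier_score(problem, c))


if __name__ == "__main__":
    problem = "What is 2 + 3?"
    sampled = ["5", "6", "5", "23"]
    pairs = preference_pairs(problem, sampled, lambda s: s == "5")
    print(len(pairs))  # 2 correct x 2 incorrect = 4 pairs

    # Hypothetical stand-in for a trained verifier's scores.
    toy_scores = {"5": 0.9, "6": 0.2, "23": 0.1}
    print(select_best(problem, ["6", "23", "5"], lambda p, c: toy_scores[c]))  # 5
```

Pairing every correct sample with every incorrect one is the simplest choice; a real pipeline might subsample pairs to keep the training set balanced.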
|
Fast 3D Generation from Single Images (31 minute read)
Vista3D is a new framework that generates 3D models from a single image in just 5 minutes. Using a two-phase approach, it quickly forms rough geometry before refining the details, capturing both visible and hidden aspects of objects for more complete 3D reconstructions.
|
|
GOT OCR (GitHub Repo)
A significant advance in general-purpose optical character recognition (OCR) that reads text from images with strong accuracy. This version also markedly improves OCR on in-the-wild images.
|
Fish Speech (GitHub Repo)
Powerful voice generation and single-shot voice cloning. Completely open source and easy to get running.
|
1X Genie (GitHub Repo)
Genie is a video-generation model that serves as a world model. 1X has open-sourced a version that mirrors the one it trained internally.
|
|
Announcing Pixtral 12B (8 minute read)
Pixtral 12B excels at multimodal tasks while maintaining state-of-the-art performance on text-only benchmarks, and it accepts images of variable sizes within a 128K-token context window. Its architecture pairs a new 400M-parameter vision encoder with a 12B-parameter multimodal decoder based on Mistral Nemo. Pixtral outperforms many open and closed models on multimodal reasoning and instruction following without compromising its text capabilities.
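To make "variable image sizes in a 128K-token context window" concrete, here is a back-of-the-envelope Python sketch of how image resolution translates into context usage. Everything in it is an assumption for illustration, not Mistral's published tokenizer: one token per 16x16-pixel patch, one extra separator token per row of patches, no resizing, and "128K" read as 128,000 tokens. The helper names are hypothetical.

```python
import math

PATCH = 16          # assumed patch size in pixels
CONTEXT = 128_000   # assumed reading of "128K tokens"


def image_tokens(width: int, height: int, patch: int = PATCH) -> int:
    """Estimate tokens for one image: a token per patch plus one per patch row."""
    cols = math.ceil(width / patch)   # partial patches at the edge still cost a token
    rows = math.ceil(height / patch)
    return rows * cols + rows


def images_that_fit(width: int, height: int,
                    context: int = CONTEXT, reserved_text: int = 0) -> int:
    """How many same-sized images fit after reserving room for text tokens."""
    return (context - reserved_text) // image_tokens(width, height)


if __name__ == "__main__":
    print(image_tokens(512, 512))      # 32*32 + 32 = 1056
    print(image_tokens(1024, 1024))    # 64*64 + 64 = 4160
    print(images_that_fit(1024, 1024)) # 128000 // 4160 = 30
```

Under these assumptions, token cost grows with pixel area, so a small thumbnail costs far less context than a full-page document scan.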
|
Scaling: The State of Play in AI (13 minute read)
LLMs such as those behind ChatGPT and Gemini keep getting more capable as they are scaled up in parameters, training data, and compute, which improves their performance across a wide range of tasks. Current "Gen2" models like GPT-4 and Claude 3.5 lead the market, and the upcoming "Gen3" models are expected to push both capabilities and costs higher. A newly discovered scaling law, in which performance improves as models spend more compute "thinking" during inference, suggests that future gains will not come from training alone.
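One way to see why scaling "size, data, and computing power" go together is the widely used rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per training token (C ≈ 6·N·D). This heuristic is an outside assumption, not a formula from the article, and the model sizes below are hypothetical.

```python
import math


def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


if __name__ == "__main__":
    # Hypothetical model: 70B parameters trained on 15T tokens.
    base = training_flops(70e9, 15e12)
    print(f"{base:.2e}")  # 6.30e+24

    # Doubling both model size and data roughly quadruples training compute.
    bigger = training_flops(140e9, 30e12)
    print(round(bigger / base))  # 4
```

Because compute grows with the product of parameters and tokens, each new model generation that scales both tends to be several times more expensive to train than the last.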
|
|
Overlap (Product Launch)
Overlap (YC S24) is a new AI-powered iOS app that curates the best short video clips on any topic you're interested in. It is built for quick work or study breaks.
|
|
Want to advertise in TLDR?
|
If your company is interested in reaching an audience of AI professionals and decision makers, you may want to advertise with us.
If you have any comments or feedback, just respond to this email!
Thanks for reading,
Andrew Tan & Andrew Carr