AIGC Weekly: Last week's selections | by Ryan Chan | Nov 2024
Last week, Anthropic released Claude 3.5 Haiku and an upgraded Claude 3.5 Sonnet, whose reasoning scores exceed OpenAI's o1 on some benchmarks. Claude can now operate a computer in a human-like manner: looking at the screen, moving the cursor, clicking buttons, and typing text.
New Version of Claude 3.5 Sonnet Introduction:
The updated Claude 3.5 Sonnet shows significant improvements on industry benchmarks, excelling in particular at agentic coding and tool-use tasks. It outperforms all publicly available models, including reasoning models like OpenAI o1-preview and systems designed specifically for agentic coding. GitLab tested the model on DevSecOps tasks and found stronger reasoning without any added latency.
Claude 3.5 Haiku Introduction:
Claude 3.5 Haiku improves across every skill set, surpassing even the previous largest model, Claude 3 Opus, on many intelligence benchmarks. Notable gains include lower latency, better instruction following, and more precise tool use. It excels at coding tasks, outperforming many agents built on state-of-the-art models.
Teach Claude to Use the Computer:
To give Claude these general-purpose skills, Anthropic built an API that lets Claude perceive and interact with computer interfaces. Developers can integrate this API so that Claude translates their instructions into computer commands. Claude 3.5 Sonnet scored notably well on evaluations of how closely AI models can use computers the way humans do.
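As a rough illustration, a computer-use request to Anthropic's Messages API attaches a special "computer" tool that describes the screen Claude will control. The sketch below only assembles the JSON request body; the model string, tool type, and display dimensions follow Anthropic's public beta documentation but may change between beta revisions, and an actual call would also need an API key and the `anthropic-beta: computer-use-2024-10-22` header.

```python
# Minimal sketch of a computer-use request body for Anthropic's Messages API.
# Beta values (model name, tool type) are taken from Anthropic's docs at the
# time of writing and may change in later beta revisions.

def build_computer_use_request(instruction: str,
                               width: int = 1280,
                               height: int = 800) -> dict:
    """Assemble the JSON body for a computer-use Messages API call."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        # The "computer" tool lets Claude emit screenshot, mouse, and
        # keyboard actions sized to the given virtual display.
        "tools": [{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": width,
            "display_height_px": height,
        }],
        "messages": [{"role": "user", "content": instruction}],
    }

payload = build_computer_use_request("Open the browser and search for AIGC Weekly")
print(payload["tools"][0]["type"])  # computer_20241022
```

In a real integration, the developer's harness executes each action Claude requests (take a screenshot, move the mouse, type text) and feeds the results back as tool results, looping until the task is done.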
Additional Information: Anthropic also published the new Claude 3.5 system prompt, and Kyle Corbitt released an open-source client for Claude Computer Use that controls computer tasks. Claude now also has a dedicated data analysis tool, which performs complex mathematical operations and data analysis by writing and running JavaScript code, built on Claude 3.5's capabilities.
Mochi 1 Model Information:
Motion Quality: Mochi 1 generates smooth videos with high temporal coherence and realistic motion dynamics. It simulates physics such as fluid and hair movement, and produces consistent human motion.
Model Usage:
The Mochi 1 model delivers high-quality video generation but requires substantial resources to run. However, a new ComfyUI plugin now lets Mochi 1 run with less video memory. Users can also try Mochi 1 on Genmo's official website.
Last week also brought significant advances in the AI image ecosystem. Stability AI released the open-source SD3.5 series of models, and Comfy Org shipped its first major release, ComfyUI V1, which addresses the difficulties users initially faced getting started.
Model Introduction:
Stability AI released three models: Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, and Stable Diffusion 3.5 Medium. ComfyUI supports SD3.5 directly, providing the relevant model files and workflow details.
Post-release, Comfy's iteration speed increased significantly. A desktop installation package launched with features such as code signing, automatic updates, cross-platform support, and custom key bindings.
Comfy Node Registry:
The Comfy Node Registry (CNR) hosts a large collection of nodes and versions. Dr.Lt.Data is integrating ComfyUI Manager with the CNR, with the desktop application being the first platform to support installing node libraries through it.
In other developments, Notion introduced new automation features and an official template market. ElevenLabs launched the Voice Design feature, allowing users to create unique voices through text prompts. Paperguide, a research assistant platform, aims to simplify the academic research process by offering various AI-driven tools.
Midjourney released a new version of its image editor, with features like partial redrawing and depth-map-based regeneration of images.