Cohere's Command R+ 🤖 : Open-weights, state-of-the-art 104-billion-parameter model with improved throughput and latency, featuring grouped-query attention, a 128k context window, multilingual support, RAG with citations, tool use, and structured output.
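For the tool-use piece, here's a minimal sketch of a single tool call through Cohere's Python SDK (v1-style chat API assumed; the `get_weather` tool and its schema are illustrative, not from the release notes):

```python
import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")  # placeholder key

# Illustrative tool definition; the name and parameters are made up for the example.
tools = [
    {
        "name": "get_weather",
        "description": "Returns the current weather for a given city.",
        "parameter_definitions": {
            "city": {"description": "Name of the city", "type": "str", "required": True}
        },
    }
]

response = co.chat(
    model="command-r-plus",
    message="What's the weather like in Toronto right now?",
    tools=tools,
)

# The model decides which tool(s) to call and with which parameters.
for call in response.tool_calls or []:
    print(call.name, call.parameters)
```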
Qwen2-VL 🎥 : Vision-language model from Alibaba's Qwen team in 7B and 2B parameter sizes, understands videos over 20 minutes long, uses multimodal RoPE (rotary positional embeddings extended across text, image, and video), and supports function calling.
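A minimal sketch of image Q&A with the 2B checkpoint via Hugging Face transformers, assuming the documented Qwen2VLForConditionalGeneration / AutoProcessor interface (the image URL is a placeholder; video and function-calling inputs go through the same processor):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image; swap in your own.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```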
Salesforce's xLAM 🤹: Family of large action models designed for AI agents and decision-making, supports context windows of up to 64,000 tokens, translates user intentions into executable actions (function calls), available in a range of sizes.
Zyphra's Zamba 2 🧠 : Hybrid state-space model (1.2B parameters) that outperforms existing models in the same parameter range, combining Mamba 2 blocks with LoRA projectors on its shared attention layers.
Cartesia's Rene ⚡️: State-space model (1.3B parameters) with impressive inference speed and efficiency, generating 80-120 tokens per second, with optimized kernels for MLX (Apple silicon) and PyTorch.
CogVideoX 📺 : 5B-parameter open-weight video generation model with performance matching leading video generators, capable of efficient generation in under 10 GB of VRAM.
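The under-10 GB figure relies on CPU offloading and VAE tiling; a minimal sketch assuming diffusers >= 0.30 with the CogVideoX pipeline:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # stream weights to the GPU only when needed, keeping peak VRAM low
pipe.vae.enable_tiling()         # decode video latents in tiles to reduce VAE memory use

video = pipe(
    prompt="A golden retriever surfing a small wave at sunset",
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "cogvideox_sample.mp4", fps=8)
```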
Microsoft's Phi-3.5 💬 : Multilingual model family with several variants (including a Mixture-of-Experts version), available in the Microsoft ecosystem and with open weights on Hugging Face.
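A minimal sketch of the mini instruct variant through a transformers text-generation pipeline (the prompt and decoding settings are illustrative):

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Translate 'good morning' into French, German, and Japanese."}
]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # the last message holds the assistant reply
```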
Jamba 1.5 📚: Hybrid SSM-Transformer (Mamba + Transformer) model family from AI21 Labs, an alternative to pure Transformer-based architectures, available in "Large" (94B active parameters) and "Mini" (12B active parameters) versions on Hugging Face.
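Both checkpoints expose the standard transformers causal-LM interface; a minimal sketch for the Mini version, assuming the ai21labs/AI21-Jamba-1.5-Mini model id on Hugging Face:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "In one sentence, how does a hybrid SSM-Transformer differ from a plain Transformer?"}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(inputs, max_new_tokens=80)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0, inputs.shape[1]:], skip_special_tokens=True))
```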
Google's Gemini 1.5 Flash 🌠: Experimental 8-billion-parameter Flash variant, available in Google AI Studio, alongside an updated experimental Gemini 1.5 Pro.
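A minimal sketch of calling the experimental 8B Flash model through the google-generativeai Python SDK; the exact experimental model id below is an assumption and should be checked in AI Studio:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")  # placeholder key

# Experimental model id assumed; look up the current name in Google AI Studio.
model = genai.GenerativeModel("gemini-1.5-flash-8b-exp-0827")

response = model.generate_content("Give a two-sentence summary of what a context window is.")
print(response.text)
```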