- Cohere's Command R+: Open-weights, state-of-the-art 104-billion-parameter model optimized for throughput and latency; features grouped-query attention, a large context window, multilingual capability, RAG with citations, tool execution, and structured output.
- Qwen2-VL: Vision-language model from Alibaba's Qwen team, released in 7B and 2B parameter sizes; understands videos over 20 minutes long, uses Multimodal Rotary Position Embedding (M-RoPE), and supports function calling.
- Salesforce's xLAM: Family of large action models designed for enhanced decision-making and AI agents; supports up to a 64,000-token context window, translates user intentions into executable actions, and is available in various sizes.
- Zyphra's Zamba 2: State-space hybrid model (1.2B parameters) outperforming existing models in the same parameter range; uses Mamba 2 blocks and LoRA projectors on its shared attention layers.
- Rene: State-space model (1.3B parameters) with impressive inference speed and efficiency; generates 80-120 tokens per second, with optimized kernels for MLX (Apple silicon) and PyTorch.
- CogVideoX: 5B-parameter open-weights video generation model with performance matching leading systems; generates efficiently even with less than 10 GB of VRAM.
- Microsoft's Phi-3.5: Multilingual model family in several variants (including a Mixture-of-Experts version), available across the Microsoft ecosystem.
- Jamba 1.5: Model family from AI21 Labs with a hybrid Mamba-Transformer architecture, an alternative to purely Transformer-based models; available in "Large" and "Mini" versions on Hugging Face.
- Google's Gemini 1.5 Flash 8B: 8-billion-parameter model available in Google AI Studio; an experimental 1.5 Pro model is also available there.
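Several of the releases above (Command R+, Qwen2-VL, xLAM) advertise tool execution or function calling: the model emits a structured call that the application parses and runs, rather than free-form text. A minimal sketch of the application side of that loop, assuming the model returns an OpenAI-style JSON tool call; the `get_weather` tool and its schema are illustrative stand-ins, not part of any of these models' APIs:

```python
import json

# Illustrative tool; a real application would register its own functions here.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names (as the model refers to them) to callables.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(raw: str) -> str:
    """Parse a model-emitted JSON tool call and dispatch it.

    Expects the common shape {"name": "...", "arguments": {...}}.
    """
    call = json.loads(raw)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A structured call as a tool-using model might emit it:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(execute_tool_call(model_output))  # Sunny in Paris
```

In practice the result string is appended to the conversation as a tool message and sent back to the model for a final natural-language answer; the exact request format differs per provider.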