InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024 • 5
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation Paper • 2509.24663 • Published Sep 29 • 14
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 29 items • Updated Sep 8 • 80
Hermes 4 Evaluations Collection Evals from the Hermes-4 Technical Report • 20 items • Updated 8 days ago • 1
Kandinsky 5.0 Video Lite Collection Kandinsky 5.0 Video Lite is a lightweight 2B model that generates up to 10-second SD videos from English and Russian prompts with high visual quality. • 9 items • Updated 17 days ago • 8
Alpamayo-R1 Collection A collection related to the Alpamayo-R1 Reasoning VLA. • 1 item • Updated 8 days ago • 1
YanoljaNEXT-Rosetta Collection Translation Model for JSON-Structured Data • 3 items • Updated Sep 3 • 9
YanoljaNEXT-Rosetta-2510 Collection Translation Model for JSON-Structured Data • 5 items • Updated 24 days ago • 1
YanoljaNEXT-Rosetta-2511 Collection Translation Model for Structured Data (JSON, YAML, XML) • 6 items • Updated Nov 2 • 1
Domain Specific Data Collection This is a collection of tools for building domain-specific datasets using human domain expertise and synthetic data generation. • 3 items • Updated Dec 11, 2024 • 4
Synthetic Data Generator Collection A collection of tools and datasets related to no-code synthetic data generation. • 21 items • Updated Feb 10 • 12
Preference Datasets for DPO Collection This collection contains curated preference datasets for DPO fine-tuning for intent alignment of LLMs. • 7 items • Updated Dec 11, 2024 • 45
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 14 days ago • 183
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 9 days ago • 122
Mistral Large 3 Collection A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated 9 days ago • 74