codefuse-ai/rodimus

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.


Overview

We propose Rodimus*, comprising Rodimus and Rodimus+, which aims to break the accuracy-efficiency trade-off of vanilla Transformers by introducing several innovative features.

Rodimus:

  • Linear attention-based, purely recurrent model.
  • Incorporates Data-Dependent Tempered Selection (DDTS) for semantic compression.
  • Reduced memory usage.

Rodimus+:

  • Hybrid model combining Rodimus with Sliding Window Shared-Key Attention (SW-SKA).
  • Enhances semantic, token, and head compression.

Highlights

  • Constant memory footprint with better language modeling performance.
  • Better scaling performance than the Transformer.
  • A truly lightweight model, with no O(T) memory complexity from a KV cache.

Pretrained Checkpoints

Benchmark Checkpoints

These checkpoints completed training before the paper was submitted and are used to reproduce the benchmarks reported in the paper.

If you want a more practical model, we strongly recommend downloading the checkpoints listed in Latest Checkpoints.

Model (2024/10/01)        | Context Length | HuggingFace | ModelScope
Rodimus-1.4B-Base         | 2048           | link        | link
Rodimus+-1.6B-Base        | 2048           | link        | link
Rodimus+-Coder-1.6B-Base  | 4096           | link        | link

Rodimus+-Coder-1.6B-Base is the model enhanced by the multi-stage training on math and code datasets described in the paper.

Latest Checkpoints

These are the latest Rodimus* checkpoints, trained on continuously updated data, intended for continued training or practical use.

Model               | Date       | HuggingFace | ModelScope
Rodimus+-1.6B-Base  | 2025/02/15 | link        | link

Quick Start

Installation

  1. The latest version of transformers is recommended (at least 4.42.0).
  2. We evaluated our models with python==3.8 and torch==2.1.2.
  3. If you use Rodimus, you need to install flash-linear-attention and triton>=2.2.0; if you use Rodimus+, you additionally need to install flash-attention. A quick environment check is sketched below.
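
The following snippet is a minimal sketch for verifying the environment before running the examples; the module names fla (for flash-linear-attention) and flash_attn (for flash-attention) are our assumptions about those packages' import names, not part of the original instructions.

# Minimal environment check (a sketch; import names for the optional packages are assumptions).
import importlib.util

import torch
import transformers

print("transformers:", transformers.__version__)  # at least 4.42.0 is recommended
print("torch:", torch.__version__)                # the models were evaluated with 2.1.2

# triton is needed for Rodimus; flash_attn is additionally needed for Rodimus+
for pkg in ("fla", "triton", "flash_attn"):
    status = "found" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")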

Examples

In examples/generation_script.py, we provide a code snippet showing how to use the model for generation:

import os
import torch
from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# load model
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# inference
input_prompt = "你好!你是谁?"
model_inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=32)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(response)
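
If the checkpoint directory ships the custom modeling and tokenization files with the auto_map entries needed for remote code loading, the standard transformers Auto classes should also work. The snippet below is a sketch under that assumption; generation then proceeds exactly as above.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint registers its custom classes so trust_remote_code=True
# lets transformers resolve them; "model_path" is a placeholder as above.
ckpt_dir = "model_path"
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda",
    trust_remote_code=True,
).eval()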

In examples/chat_script.py, we further show how to chat with Rodimus+:

import os
import torch
from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# load model
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# inference
input_prompt = "简单介绍一下大型语言模型。"
messages = [
    {"role": "HUMAN", "content": input_prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    system='You are Rodimus$+$, created by AntGroup. You are a helpful assistant.',
    tokenize=False,
)
print(text)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=2048)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(response)
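
Note that generate() returns the prompt tokens followed by the completion for decoder-only models; continuing the snippet above, the reply alone can be recovered by slicing off the prompt length (a small sketch, not part of the original script):

# Continuing the chat snippet above: keep only the newly generated tokens,
# since generate() prepends the prompt ids to its output.
prompt_len = model_inputs["input_ids"].shape[1]
response_only = tokenizer.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)[0]
print(response_only)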

Citation

If you find our work helpful, please consider citing it:

@inproceedings{he2025rodimus,
  title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
  author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=IIVYiJ1ggK}
}
