
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning-and-Search-Engine-Calling Interleaved LLMs, Built on veRL


Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning

Search-R1 is a reproduction of the DeepSeek-R1(-Zero) method for training reasoning-and-searching (tool-call) interleaved LLMs. It is built on veRL.

Through RL with a rule-based outcome reward, a 3B base LLM (either Qwen2.5-3b-base or Llama3.2-3b-base) develops reasoning and search-engine-calling abilities entirely on its own.

Twitter thread: link; Full experiment log: link

The paper will be released soon!

(Figure: single-turn reasoning-and-search example)


Installation

Search-R1 environment

conda create -n searchr1 python=3.9
conda activate searchr1
# install torch [or you can skip this step and let vLLM install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
pip install wandb

Retriever environment (optional)

If you would like to use a local retriever as the search engine, you can set up the environment as follows. (We recommend using a separate conda environment.)

conda create -n retriever python=3.10
conda activate retriever

# we recommend installing torch with conda for faiss-gpu
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets

## install the GPU version of faiss to guarantee efficient RL rollouts
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

## API function
pip install uvicorn fastapi

Quick start

Train a reasoning + search LLM on the NQ dataset, with E5 as the retriever and Wikipedia as the corpus.

(1) Download the index and corpus.

save_path=/the/path/to/save
python scripts/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz

(2) Process the NQ dataset.

python scripts/data_process/nq_search.py

(3) Launch a local retrieval server.

conda activate retriever
bash retrieval_launch.sh

(4) Run RL training (PPO) with Llama-3.2-3b-base.

conda activate searchr1
bash train_ppo.sh

Preliminary results

(1) The base model (Llama3.2-3b-base) learns to call the search engine and achieves improved performance.

(Figure: Llama-3.2-3b results)

(2) The base model (Qwen2.5-7b-base) can learn to conduct multi-turn search engine calling and reasoning with RL.

(Figure: multi-turn search example)

Use your own dataset

QA data

Each question-answer sample should be a dictionary with the following fields:

data = {
        "data_source": data_source,
        "prompt": [{
            "role": "user",
            "content": question,
        }],
        "ability": "fact-reasoning",
        "reward_model": {
            "style": "rule",
            "ground_truth": solution
        },
        "extra_info": {
            'split': split,
            'index': idx,
        }
    }

You can refer to scripts/data_process/nq_search.py for a concrete data processing example.
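As a minimal sketch of the sample format above, the helper below builds one such dictionary. The function name (`make_sample`) and the default `data_source` value are illustrative assumptions, not part of the repo; follow scripts/data_process/nq_search.py for the actual pipeline.

```python
def make_sample(question, solution, data_source="nq", split="train", idx=0):
    """Build one training sample in the dictionary format shown above.

    `data_source`, `split`, and `idx` defaults are hypothetical
    placeholders; adapt them to your dataset.
    """
    return {
        "data_source": data_source,
        "prompt": [{"role": "user", "content": question}],
        "ability": "fact-reasoning",
        "reward_model": {"style": "rule", "ground_truth": solution},
        "extra_info": {"split": split, "index": idx},
    }

# Example usage with a toy QA pair.
sample = make_sample("Who wrote Hamlet?", "William Shakespeare")
```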

Corpora

It is recommended to format your corpus as a JSONL file, where each line (a dictionary with an "id" key and a "contents" key) corresponds to one passage. You can refer to example/corpus.jsonl for an example.

The "id" key holds the passage id, and the "contents" key holds the passage text. For example:

{"id": "0", "contents": "Evan Morris Evan L. Morris (January 26, 1977 \u2013 July 9, 2015) was a lobbyist for Genentech and its parent corporation Roche in Washington."}
...
{"id": "100", "contents": "Three years later, when the United States Exploring Expedition to little-known portions of the globe was organised under Charles Wilkes, Hale was recommended, while yet an undergraduate."}
...
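A small sketch of writing and sanity-checking such a corpus file with the standard library. The filename `my_corpus.jsonl` and the passage texts are placeholders; only the `"id"`/`"contents"` schema comes from the description above.

```python
import json

# Hypothetical two-passage corpus; "id" and "contents" are the
# only required keys per passage.
passages = [
    {"id": "0", "contents": "Evan L. Morris was a lobbyist for Genentech."},
    {"id": "1", "contents": "The expedition was organised under Charles Wilkes."},
]

# Write one JSON object per line (JSONL).
with open("my_corpus.jsonl", "w") as f:
    for p in passages:
        f.write(json.dumps(p) + "\n")

# Sanity-check: every line must parse and carry both required keys.
with open("my_corpus.jsonl") as f:
    records = [json.loads(line) for line in f]
assert all({"id", "contents"} <= r.keys() for r in records)
```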

Index your corpus (optional). If you would like to use a local retriever as the search engine, you can index your own corpus with:

bash search_r1/search/build_index.sh

You can change retriever_name and retriever_model to the off-the-shelf retriever of your choice.

Use your own search engine

The main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline.

The LLM can call the search engine by calling the search API (e.g., "http://127.0.0.1:8000/retrieve").

You can refer to search_r1/search/retriever_server.py for an example of launching a local retriever server.
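A minimal client sketch for calling such a server from Python. The request field names (`queries`, `topk`) are assumptions about the server's schema; check search_r1/search/retriever_server.py for the actual request and response format.

```python
import json
import urllib.request

def build_payload(query, topk=3):
    # Field names ("queries", "topk") are assumed, not confirmed
    # by the repo docs; adjust to match the server's schema.
    return {"queries": [query], "topk": topk}

def retrieve(query, topk=3, url="http://127.0.0.1:8000/retrieve"):
    """POST a retrieval query to a locally running server.

    Requires the retriever server to be up at `url`.
    """
    data = json.dumps(build_payload(query, topk)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Payload construction can be checked without a running server.
payload = build_payload("who founded genentech", topk=5)
```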

To do

  • Support google search / bing search / brave search API and others.
  • Support LoRA tuning.
  • Support supervised finetuning.
  • Support off-the-shelf rerankers.

Acknowledgments

The concept of Search-R1 is inspired by DeepSeek-R1 and TinyZero. Its implementation is built upon veRL and RAGEN. We sincerely appreciate these teams' contributions to open-source research and development.

Citations

A citation for the paper will be added upon its release. In the meantime, you can cite the repository:

@misc{jin2025searchr1,
  title   = {Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning},
  author  = {Bowen Jin and Zhenrui Yue and Hansi Zeng and Jiawei Han},
  howpublished = {\url{https://github.com/PeterGriffinJin/Search-R1}},
  year         = {2025}
}
