
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning-and-Search-Engine-Calling Interleaved LLMs, Built on veRL


Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning

Search-R1 is a reproduction of the DeepSeek-R1(-Zero) method for training reasoning-and-searching (tool-call) interleaved LLMs. It is built on veRL.

Through RL with a rule-based outcome reward, a 3B base LLM (either Qwen2.5-3b-base or Llama3.2-3b-base) develops reasoning and search-engine-calling abilities entirely on its own.

Twitter thread: link; Full experiment log: link

The paper will be released soon!

(Figure: single-turn reasoning-and-search example)


Installation

Search-R1 environment

conda create -n searchr1 python=3.9
conda activate searchr1
# install torch [or you can skip this step and let vLLM install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
pip install wandb

Retriever environment (optional)

If you would like to use a local retriever as the search engine, you can set up the environment as follows. (We recommend using a separate conda environment.)

conda create -n retriever python=3.10
conda activate retriever

# we recommend installing torch with conda for faiss-gpu
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets

## install the GPU version of faiss to guarantee efficient RL rollouts
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

## API function
pip install uvicorn fastapi

Quick start

Train a reasoning + search LLM on the NQ dataset, with E5 as the retriever and Wikipedia as the corpus.

(1) Download the index and corpus.

save_path=/the/path/to/save
python scripts/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz

(2) Process the NQ dataset.

python scripts/data_process/nq_search.py

(3) Launch a local retrieval server.

conda activate retriever
bash retrieval_launch.sh

(4) Run RL training (PPO) with Llama-3.2-3b-base.

conda activate searchr1
bash train_ppo.sh

Preliminary results

(1) The base model (Llama3.2-3b-base) learns to call the search engine and achieves improved performance.

(Figure: Llama-3.2-3b results)

(2) The base model (Qwen2.5-7b-base) can learn to conduct multi-turn search engine calling and reasoning with RL.

(Figure: multi-turn search example)

Use your own dataset

QA data

Each question-answer sample should be a dictionary with the following fields:

data = {
        "data_source": data_source,
        "prompt": [{
            "role": "user",
            "content": question,
        }],
        "ability": "fact-reasoning",
        "reward_model": {
            "style": "rule",
            "ground_truth": solution
        },
        "extra_info": {
            'split': split,
            'index': idx,
        }
    }

You can refer to scripts/data_process/nq_search.py for a concrete data processing example.
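As a minimal sketch of the sample format above, the helper below builds one such dictionary. The function name (`make_sample`) and the default `data_source` value are illustrative assumptions, not part of the repo; follow scripts/data_process/nq_search.py for the actual pipeline.

```python
def make_sample(question, solution, data_source="nq", split="train", idx=0):
    """Build one training sample in the dictionary format shown above.

    `data_source`, `split`, and `idx` defaults are hypothetical
    placeholders; adapt them to your dataset.
    """
    return {
        "data_source": data_source,
        "prompt": [{"role": "user", "content": question}],
        "ability": "fact-reasoning",
        "reward_model": {"style": "rule", "ground_truth": solution},
        "extra_info": {"split": split, "index": idx},
    }

# Example usage with a toy QA pair.
sample = make_sample("Who wrote Hamlet?", "William Shakespeare")
```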

Corpora

It is recommended to format your corpus as a JSONL file, where each line (a dictionary with an "id" key and a "contents" key) corresponds to one passage. You can refer to example/corpus.jsonl for an example.

The "id" key holds the passage id, and the "contents" key holds the passage text. For example:

{"id": "0", "contents": "Evan Morris Evan L. Morris (January 26, 1977 \u2013 July 9, 2015) was a lobbyist for Genentech and its parent corporation Roche in Washington."}
...
{"id": "100", "contents": "Three years later, when the United States Exploring Expedition to little-known portions of the globe was organised under Charles Wilkes, Hale was recommended, while yet an undergraduate."}
...
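A small sketch of writing and sanity-checking such a corpus file with the standard library. The filename `my_corpus.jsonl` and the passage texts are placeholders; only the `"id"`/`"contents"` schema comes from the description above.

```python
import json

# Hypothetical two-passage corpus; "id" and "contents" are the
# only required keys per passage.
passages = [
    {"id": "0", "contents": "Evan L. Morris was a lobbyist for Genentech."},
    {"id": "1", "contents": "The expedition was organised under Charles Wilkes."},
]

# Write one JSON object per line (JSONL).
with open("my_corpus.jsonl", "w") as f:
    for p in passages:
        f.write(json.dumps(p) + "\n")

# Sanity-check: every line must parse and carry both required keys.
with open("my_corpus.jsonl") as f:
    records = [json.loads(line) for line in f]
assert all({"id", "contents"} <= r.keys() for r in records)
```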

Index your corpus (optional). If you would like to use a local retriever as the search engine, you can index your own corpus with:

bash search_r1/search/build_index.sh

You can change retriever_name and retriever_model to the off-the-shelf retriever of your choice.

Use your own search engine

The main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline.

The LLM can call the search engine by calling the search API (e.g., "http://127.0.0.1:8000/retrieve").

You can refer to search_r1/search/retriever_server.py for an example of launching a local retriever server.
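A minimal client sketch for calling such a server from Python. The request field names (`queries`, `topk`) are assumptions about the server's schema; check search_r1/search/retriever_server.py for the actual request and response format.

```python
import json
import urllib.request

def build_payload(query, topk=3):
    # Field names ("queries", "topk") are assumed, not confirmed
    # by the repo docs; adjust to match the server's schema.
    return {"queries": [query], "topk": topk}

def retrieve(query, topk=3, url="http://127.0.0.1:8000/retrieve"):
    """POST a retrieval query to a locally running server.

    Requires the retriever server to be up at `url`.
    """
    data = json.dumps(build_payload(query, topk)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Payload construction can be checked without a running server.
payload = build_payload("who founded genentech", topk=5)
```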

To do

  • Support google search / bing search / brave search API and others.
  • Support LoRA tuning.
  • Support supervised finetuning.
  • Support off-the-shelf rerankers.

Acknowledgments

The concept of Search-R1 is inspired by DeepSeek-R1 and TinyZero. Its implementation is built upon veRL and RAGEN. We sincerely appreciate these teams' contributions to open-source research and development.

Citations

A citation for the paper will be added upon its release. In the meantime, you can cite the repository:

@misc{jin2025searchr1,
  title   = {Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning},
  author  = {Bowen Jin and Zhenrui Yue and Hansi Zeng and Jiawei Han},
  howpublished = {\url{https://github.com/PeterGriffinJin/Search-R1}},
  year         = {2025}
}
