Zeqi Xiao¹, Wenqi Ouyang¹, Yifan Zhou¹, Shuai Yang², Lei Yang³, Jianlou Si³, Xingang Pan¹

¹S-Lab, Nanyang Technological University
²Wangxuan Institute of Computer Technology, Peking University
³SenseTime Research
*(Demo video: demo.mp4)*
## Installation

- Create a conda environment

  This codebase is tested with PyTorch 2.1.0+cu121.

  ```bash
  conda create -n trajattn python=3.10
  conda activate trajattn
  pip install -r requirements.txt
  ```
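  To sanity-check the environment, a quick one-liner (not part of the original setup) prints the installed PyTorch build and whether CUDA is visible:

  ```bash
  # Should print a 2.1.0+cu121 build and "True" on a CUDA-capable machine
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
  ```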
- Download model weights

  Download the model weights from Hugging Face.
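  A minimal sketch of fetching the weights with the `huggingface_hub` CLI; the repository ID below is a placeholder, so substitute the one linked on the project page:

  ```bash
  # Hypothetical repo ID -- replace with the actual weights repository
  huggingface-cli download ORG/trajectory-attention --local-dir checkpoints/
  ```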
- Clone relevant repositories and download checkpoints

  ```bash
  # Clone the Depth-Anything-V2 repository
  git clone https://github.com/DepthAnything/Depth-Anything-V2
  # Download the Depth-Anything-V2-Large checkpoint into checkpoints/
  mkdir -p checkpoints
  wget -O checkpoints/depth_anything_v2_vitl.pth "https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true"
  # Overwrite run.py with this repository's version
  cp depth_anything/run.py Depth-Anything-V2/run.py
  ```

  Save the checkpoints to the `checkpoints/` directory. You can also modify the checkpoint path in the run scripts if needed.
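  After these steps, the layout should look roughly like this (assuming the default paths above):

  ```text
  .
  ├── checkpoints/
  │   └── depth_anything_v2_vitl.pth
  ├── depth_anything/
  │   └── run.py
  └── Depth-Anything-V2/
      └── run.py   # overwritten by depth_anything/run.py
  ```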
## Usage

To control camera motion on images, run:

```bash
sh image_control.sh
```

To control camera motion on videos, run:

```bash
sh video_control.sh
```

To perform video editing, run:

```bash
sh video_editing.sh
```
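These scripts are thin shell wrappers around the inference entry point. As a purely hypothetical sketch of what such a wrapper might contain (the actual entry point, flags, and paths may differ; inspect the `.sh` files for the real ones):

```bash
# Hypothetical wrapper -- the real image_control.sh may use different flags
python inference.py \
    --input examples/image.png \
    --checkpoint checkpoints/model.pth \
    --output outputs/
```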
## TODO List

- Release models and weights
- Release pipelines for single-image camera motion control
- Release pipelines for video camera motion control
- Release pipelines for video editing
## Citation

If you find our work helpful, please cite:

```bibtex
@inproceedings{xiao2025trajectory,
  title={Trajectory attention for fine-grained video motion control},
  author={Zeqi Xiao and Wenqi Ouyang and Yifan Zhou and Shuai Yang and Lei Yang and Jianlou Si and Xingang Pan},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=2z1HT5lw5M}
}
```
## Acknowledgements

- SVD: Our model is fine-tuned from SVD.
- MiraData: We use data collected by MiraData.
- Depth-Anything-V2: We estimate depth maps with Depth-Anything-V2.
- Unimatch: We estimate optical flow with Unimatch.
- CoTracker: We estimate point trajectories with CoTracker.
- NVS_Solver: Our camera rendering code is based on NVS_Solver.