Do the hyperparameters need to be adjusted for videos with large camera pose movements? #46

Open
jiawei151 opened this issue Jan 23, 2025 · 3 comments

Comments

@jiawei151

Hi, thank you for your great work! I noticed that predictor.py additionally sets up a support grid (36 query points) specifically for the first frame, and adds up to 800 extra points for the query frame.
For videos with large camera movements, we may not be able to accurately determine a query frame, and the pixels of the first frame may quickly move out of view. Could you give me some advice on whether these hyperparameters need to be adjusted? Thanks!

@henry123-boy
Owner

I see your point. The extra 800 points are actually randomly distributed across frames 0 to T, not just placed on the query frame. I think this strategy should help.
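
To illustrate the idea in the reply above, here is a minimal sketch of drawing query points uniformly over time and image position. It is not the code from predictor.py: the function name `sample_queries`, the `(t, x, y)` tuple layout, the half-open frame range [0, T) and the example resolution are all assumptions made for illustration.

```python
import random


def sample_queries(num_frames, height, width, num_points=800, seed=0):
    """Draw (t, x, y) query points uniformly over all frames and pixels.

    Hypothetical helper: the real predictor may use a different layout,
    range, or sampling scheme.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(num_points):
        t = rng.randrange(num_frames)   # frame index in [0, num_frames)
        x = rng.uniform(0, width - 1)   # pixel column
        y = rng.uniform(0, height - 1)  # pixel row
        queries.append((t, x, y))
    return queries


# Example: a long clip, so the queries land on many different frames.
queries = sample_queries(num_frames=1200, height=384, width=512)
frames_hit = {t for t, _, _ in queries}
```

Because the points are spread over the whole clip, content that only becomes visible after a large camera move can still receive queries, which is presumably why the strategy helps in that setting.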

@jiawei151
Author

> I see your point. The extra 800 points are actually randomly distributed across frames 0 to T, not just placed on the query frame. I think this strategy should help.

Thanks! I'll give it a try.

I ran into another related question today when dealing with long videos (1k+ frames, large camera movements).

In core/spatracker/spatracker.py, lines 519-520:

if torch.isnan(coords).any():
    import ipdb; ipdb.set_trace()

I found that NaNs mostly start to appear at iteration 128, which means w_idx_start == 1024 when the window size is 16 (or 768 when the window size is 12).
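
The numbers quoted above are consistent with the window start advancing by half the window size per iteration. A short sketch of that arithmetic follows; the half-window stride is inferred from the figures in this comment, and `window_start` is a hypothetical helper, not a function from spatracker.py.

```python
def window_start(iteration, window_size):
    """Start frame of the sliding window at a given iteration.

    Assumes consecutive windows overlap by half, i.e. the start index
    advances by window_size // 2 each iteration.
    """
    stride = window_size // 2
    return iteration * stride


print(window_start(128, 16))  # 1024
print(window_start(128, 12))  # 768
```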

To 'fix' this, I can simply stop processing at this iteration, since it is very close to the end of the video, but the coincidence is kind of interesting.

I checked my input videos and they play normally, but I'm not completely sure yet whether it could be a data issue. I'll post an update if I find out something.
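
One way to help rule out a data issue is to scan the per-frame inputs for non-finite values before they reach the tracker. The sketch below is only an illustration: `first_nonfinite_frame` is a hypothetical helper, and frames are modelled as flat sequences of floats. With real tensors, an elementwise check such as `torch.isfinite` would play the same role.

```python
import math


def first_nonfinite_frame(frames):
    """Return the index of the first frame containing NaN or inf, else None.

    `frames` is assumed to be an iterable of flat float sequences, one
    per frame.
    """
    for index, frame in enumerate(frames):
        if any(not math.isfinite(value) for value in frame):
            return index
    return None


# Toy example: the second frame (index 1) contains a NaN.
frames = [[0.1, 0.5], [0.2, float("nan")], [0.3, 0.4]]
bad = first_nonfinite_frame(frames)
```

If such a scan reports no bad frame, the NaNs are more likely produced inside the model than by the input data.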

@luminousking

> I see your point. The extra 800 points are actually randomly distributed across frames 0 to T, not just placed on the query frame. I think this strategy should help.

@henry123-boy Hi, why do you need to randomly sample the extra 800 points across the 0-T frames? Thx!


3 participants