Unexpected behavior of __getitem__ of TSDataSampler: get nothing when slicing instead of indexing #1716

Closed
LeetaH666 opened this issue Dec 26, 2023 · 0 comments
Labels
bug Something isn't working


@LeetaH666

πŸ› Bug Description

When getting an item from `TSDataSampler` with an `int` index, a "speed-up" path in `__getitem__` (slicing instead of indexing) returns nothing for one specific index.

[Screenshot of the relevant `TSDataSampler.__getitem__` code]

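The problem becomes clear in the code above when `indices = [-1, 0, 1, 2, ...]` (the -1 appears because `nan_idx` is -1 and stands in for a missing step): the differences are all 1, so the slicing branch is taken, but a slice that starts at -1 (the last row) and ends right after row 0 is empty, i.e., you get nothing.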
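The pitfall can be reproduced with plain NumPy, without qlib. This is a sketch that assumes a layout like `TSDataSampler`'s: a data array with one extra all-NaN row appended at the end, so that `nan_idx = -1` points at it. The array contents are made up for illustration.

```python
import numpy as np

# Assumed layout: 5 data rows plus one trailing all-NaN padding row,
# so nan_idx = -1 refers to the padding row.
data_arr = np.append(np.arange(5, dtype=float).reshape(5, 1), np.full((1, 1), np.nan), axis=0)
nan_idx = -1

# A step_len=2 window whose earlier step is missing: padded with nan_idx, then row 0.
indices = np.array([nan_idx, 0])

# The consecutive-index check passes, so the "speed-up" slicing branch would be taken.
print((np.diff(indices) == 1).all())  # True

# data_arr[-1:1] starts at the last row and stops before row 1, so it is empty.
sliced = data_arr[indices[0]: indices[-1] + 1]
print(sliced.shape)  # (0, 1)

# Fancy indexing returns the intended window: the NaN padding row, then row 0.
indexed = data_arr[indices]
print(indexed.shape)  # (2, 1)
```

Negative numbers mean different things in the two branches: in fancy indexing, -1 selects one row, while as a slice start it moves the whole window to the end of the array.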

To Reproduce

A piece of testing code:

```python
import numpy as np
import pandas as pd
from qlib.data.dataset import TSDataSampler

datetimes = [
    '2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30', '2000-05-31'
]
instruments = ['000001', '000002', '000003', '000004', '000005']
index = pd.MultiIndex.from_product([pd.to_datetime(datetimes), instruments],
                                   names=['datetime', 'instrument'])
data = np.random.randn(len(datetimes) * len(instruments))
test_df = pd.DataFrame(data=data, index=index, columns=['ret'])
dataset = TSDataSampler(test_df, datetimes[0], datetimes[-1], step_len=2)
# Expected: NaN padding followed by one value; the bug returns an empty array.
print(dataset[0])
```

Expected Behavior

`dataset[0]` should return an array whose first element is NaN (padding for the step before the first date) and whose second element is an actual number.

Screenshot

Actual unexpected behavior:

[Screenshot: output of `print(dataset[0])`, which is empty]

Environment

  • Qlib version: 0.9.3
  • Python version: 3.8.18
  • OS (Windows, Linux, MacOS): Linux
  • Commit number (optional, please provide it if you are using the dev version):

Solution

I think it just needs a simple change to the `if` condition, i.e., `if (np.diff(indices) == 1).all():` -> `if (np.diff(indices) == 1).all() and -1 not in indices:`.
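A minimal sketch of the proposed condition, written as a standalone function so it can run without qlib. `gather_window` and its `nan_idx` parameter are hypothetical names chosen for illustration (the suggestion above hard-codes `-1`); this is not qlib's API, and the change eventually merged upstream may be implemented differently.

```python
import numpy as np


def gather_window(data_arr: np.ndarray, indices: np.ndarray, nan_idx: int = -1) -> np.ndarray:
    """Hypothetical helper mirroring the proposed condition; not part of qlib."""
    # Fast path: a contiguous window with no padding index can be taken as a slice.
    if (np.diff(indices) == 1).all() and nan_idx not in indices:
        return data_arr[indices[0]: indices[-1] + 1]
    # Otherwise fall back to fancy indexing, where -1 selects the padding row.
    return data_arr[indices]


# Toy array (assumed layout): 5 data rows plus a trailing all-NaN padding row.
data_arr = np.append(np.arange(5, dtype=float).reshape(5, 1), np.full((1, 1), np.nan), axis=0)

print(gather_window(data_arr, np.array([-1, 0])).shape)    # (2, 1)
print(gather_window(data_arr, np.array([1, 2, 3])).shape)  # (3, 1)
```

The extra membership test costs one pass over `indices`, which is small next to the copy that fancy indexing makes, so the slicing speed-up is kept for the common case of windows without padding.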

LeetaH666 added the bug label Dec 26, 2023
YeewahChan added 4 commits to YeewahChan/qlib that referenced this issue Jun 3, 2024
you-n-g pushed a commit that referenced this issue Jun 21, 2024
* Fix TSDataSampler Slicing Bug #1716

* Fix TSDataSampler Slicing Bug #1716

* Fix TSDataSampler Slicing Bug #1716

* Fix TSDataSampler Slicing Bug with simplyer implmentation#1716
 with Simplified Implementation

* Refactor: Fix CI errors by addressing pylint formatting issues

* Refactor: Remove extraneous whitespace for improved code formatting with Black