
feat: product quantization fast scan #538

Merged · 2 commits · Aug 7, 2024

Conversation

@usamoi (Collaborator) commented Jul 23, 2024

blocked by #537

@usamoi force-pushed the fast-scan branch 3 times, most recently from 7006c22 to c01a8df (July 31, 2024 09:15)
@usamoi marked this pull request as ready for review July 31, 2024 09:18
@usamoi force-pushed the fast-scan branch 4 times, most recently from e021020 to 7112cfd (July 31, 2024 10:17)
@VoVAllen (Member) commented Jul 31, 2024

Can you show us some performance numbers?

@usamoi (Collaborator, Author) commented Jul 31, 2024

dataset: glove-100-angular.hdf5 (1183514 vectors, 100 dims, 10000 tests, L2 normalized, IP distance)
machine: CPU: AMD Ryzen 7 6800H (16) @ 4.7GHz, Arch Linux x86_64, 6.10.2-arch1-1, 30.08 GiB Memory

Faiss here is https://pypi.org/project/faiss-cpu/.

Faiss-IVF (nlist = 1000; nprobe = 10): recall = 0.819589999999983, time = 11.370838165283203 s
Faiss-PQ (ratio = 2, bits = 4; fast_scan = off, rerank_size = 100): (timeout)
Faiss-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9166199999999625, time = 13.678839206695557 s
Faiss-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = off, rerank_size = 100): recall = 0.803549999999983, time = 59.124324321746826 s
Faiss-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = on, rerank_size = 100): recall = 0.7459399999999857, time = 0.5915257930755615 s

pgvecto.rs-IVF (nlist = 1000; nprobe = 10): recall = 0.7628899999999892, time = 4.944989 s
pgvecto.rs-PQ (ratio = 2, bits = 4; fast_scan = off, rerank_size = 100): (timeout)
pgvecto.rs-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9271499999999622, time = 118.084755 s
pgvecto.rs-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = off, rerank_size = 100): recall = 0.7379399999999887, time = 8.469601 s
pgvecto.rs-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = on, rerank_size = 100): recall = 0.7376499999999886, time = 2.937704 s

@usamoi (Collaborator, Author) commented Jul 31, 2024

@VoVAllen

The results show that fast-scan speeds up pgvecto.rs considerably (2x–3x), but the effect is not as dramatic as it is for Faiss (64x).

The flamegraph (for pgvecto.rs-PQ, ratio = 2, bits = 4; fast_scan = on, rerank_size = 100) shows that quantization::fast_scan::bit4::fast_scan_v3 takes up only 8.35% of the time in a query:

  • fast_scan takes up 8.35%
  • make_heap takes up 46.66%; it does not exist in faiss but it's needed by VBASE
  • memory allocation takes up 13.60%; it does not exist in faiss
  • dequantizing rough distance of u16 type takes up 12.99%; it can be optimized, and I think it does not exist in faiss
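To illustrate the last item, here is a minimal sketch (hypothetical names, not the PR's actual code) of what the "dequantize u16 rough distance" step amounts to: the fast-scan kernel accumulates quantized byte LUT entries into a u16 per candidate, and mapping that sum back to an f32 estimate is one multiply-add per candidate, which should auto-vectorize well.

```rust
// Hypothetical sketch: map accumulated u16 rough distances back to f32
// estimates. `bias` and `scale` come from the LUT quantization; the
// function name and signature are illustrative only.
fn dequantize_rough(sums: &[u16], bias: f32, scale: f32) -> Vec<f32> {
    sums.iter().map(|&s| bias + scale * s as f32).collect()
}

fn main() {
    let d = dequantize_rough(&[0, 10, 200], 1.0, 0.5);
    assert_eq!(d, vec![1.0, 6.0, 101.0]);
}
```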

@VoVAllen (Member) commented:

  • I'm surprised that for IVF the bottleneck is the heap instead of the computation. Can you try a dataset with a larger vector dimension, like GIST?
  • Why does PQ on the full vectors take 100+ s? (pgvecto.rs-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9271499999999622, time = 118.084755 s)

@usamoi (Collaborator, Author) commented Jul 31, 2024

  • I'm surprised that for IVF the bottleneck is the heap instead of the computation. Can you try a dataset with a larger vector dimension, like GIST?

That's what I expected, since the computation is very cheap once fast scan is enabled (the float32 values in the look-up table are quantized to int8, and AVX2 processes 32 vectors at a time). The main difference between our implementation and Faiss's is whether a heap is used.

  • Why does PQ on the full vectors take 100+ s?

What do you mean? Is it too fast or too slow?
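To make the "computation is too fast" point concrete, here is a scalar sketch of the fast-scan idea (illustrative names and signatures, not the actual quantization::fast_scan::bit4::fast_scan_v3 kernel, which uses AVX2 byte shuffles to look up 32 codes at a time): the f32 look-up table is quantized to bytes so that SIMD shuffles can replace f32 gathers, and per-vector rough distances are accumulated as small integers.

```rust
// Scalar sketch of fast-scan distance estimation (hypothetical, for
// illustration). The f32 look-up table is quantized to u8 so a SIMD byte
// shuffle could replace f32 gathers; done here in scalar code for clarity.
fn quantize_lut(lut: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = lut.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = lut.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = ((max - min) / 255.0).max(f32::MIN_POSITIVE);
    let q = lut.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    (q, min, scale)
}

// Rough distance of one vector: sum the quantized LUT entry selected by each
// subquantizer code; the u16 sum is dequantized back to f32 only for the
// candidates that survive into reranking.
fn rough_distance(codes: &[u8], luts: &[Vec<u8>]) -> u16 {
    codes.iter().zip(luts).map(|(&c, lut)| lut[c as usize] as u16).sum()
}

fn main() {
    let (q, min, scale) = quantize_lut(&[0.0, 127.5, 255.0]);
    assert_eq!((min, scale), (0.0, 1.0));
    assert_eq!(q, vec![0, 128, 255]);
    let luts = vec![vec![0, 128, 255], vec![10, 20, 30]];
    assert_eq!(rough_distance(&[2, 0], &luts), 265);
}
```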

@VoVAllen (Member) commented:

Faiss-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9166199999999625, time = 13.678839206695557 s

It seems much slower than Faiss here, while the other numbers are faster than Faiss. I don't understand what the problem is here.

@VoVAllen (Member) commented Jul 31, 2024

Would it be better if the heap had O(1) insertion and O(log N) pop?

Reference:

@usamoi (Collaborator, Author) commented Jul 31, 2024

pgvecto.rs-IVFPQ (fast_scan = on) is much slower than Faiss-IVFPQ (fast_scan = on) too.

@VoVAllen (Member) commented:

Another way to improve the heap operation might be to do it in batches, which would pollute the cache less than inserting one element at a time.

@usamoi (Collaborator, Author) commented Jul 31, 2024

  • Would it be better if the heap had O(1) insertion and O(log N) pop?

There is no heap insertion, only make_heap, and make_heap is O(n).

I'll try adding a fast path for make_heap: if there are fewer than 100 pops, don't call make_heap; instead maintain a small 100-element heap.
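A sketch of that fast path (hypothetical, not the eventual implementation): instead of running make_heap over all n rough distances and then popping ~100 times, keep a bounded max-heap of the k best candidates, so each of the n elements costs at most O(log k).

```rust
use std::collections::BinaryHeap;

// Hypothetical sketch of the proposed fast path: keep the k smallest rough
// distances in a bounded max-heap (std's BinaryHeap is a max-heap), instead
// of calling make_heap on all n candidates when only ~100 pops are needed.
fn top_k_smallest(dists: &[u16], k: usize) -> Vec<u16> {
    let mut heap: BinaryHeap<u16> = BinaryHeap::with_capacity(k);
    for &d in dists {
        if heap.len() < k {
            heap.push(d);
        } else if d < *heap.peek().unwrap() {
            // New candidate beats the current worst of the k best.
            heap.pop();
            heap.push(d);
        }
    }
    heap.into_sorted_vec() // ascending: best candidates first
}

fn main() {
    assert_eq!(top_k_smallest(&[5, 1, 4, 2, 3], 3), vec![1, 2, 3]);
}
```

The trade-off is the branch per element; for small k relative to n the bounded heap touches far less memory than heapifying the whole candidate array.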

@VoVAllen (Member) commented:

100-dim vectors are rare in the deep learning world; larger dimensions are more typical. For 1000 dims, the heap cost might not account for such a large proportion.

@usamoi force-pushed the fast-scan branch 4 times, most recently from 66978b2 to cf3e6ae (August 6, 2024 10:39)
@usamoi (Collaborator, Author) commented Aug 7, 2024

I'm merging it now because it blocks #549.

@usamoi usamoi mentioned this pull request Aug 7, 2024
@usamoi usamoi added this pull request to the merge queue Aug 7, 2024
Merged via the queue into tensorchord:main with commit 9e61230 Aug 7, 2024
13 checks passed