
feat: product quantization fast scan #538

Merged · 2 commits · Aug 7, 2024

Conversation

@usamoi (Collaborator) commented Jul 23, 2024

blocked by #537

@usamoi force-pushed the fast-scan branch 3 times, most recently from 7006c22 to c01a8df (July 31, 2024 09:15)
@usamoi marked this pull request as ready for review July 31, 2024 09:18
@usamoi force-pushed the fast-scan branch 4 times, most recently from e021020 to 7112cfd (July 31, 2024 10:17)
@VoVAllen (Member) commented Jul 31, 2024

Can you show us some performance numbers?

@usamoi (Collaborator, Author) commented Jul 31, 2024

dataset: glove-100-angular.hdf5 (1183514 vectors, 100 dims, 10000 tests, L2 normalized, IP distance)
machine: CPU: AMD Ryzen 7 6800H (16) @ 4.7GHz, Arch Linux x86_64, 6.10.2-arch1-1, 30.08 GiB Memory

Faiss here is https://pypi.org/project/faiss-cpu/.

Faiss-IVF (nlist = 1000; nprobe = 10): recall = 0.819589999999983, time = 11.370838165283203 s
Faiss-PQ (ratio = 2, bits = 4; fast_scan = off, rerank_size = 100): (timeout)
Faiss-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9166199999999625, time = 13.678839206695557 s
Faiss-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = off, rerank_size = 100): recall = 0.803549999999983, time = 59.124324321746826 s
Faiss-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = on, rerank_size = 100): recall = 0.7459399999999857, time = 0.5915257930755615 s

pgvecto.rs-IVF (nlist = 1000; nprobe = 10): recall = 0.7628899999999892, time = 4.944989 s
pgvecto.rs-PQ (ratio = 2, bits = 4; fast_scan = off, rerank_size = 100): (timeout)
pgvecto.rs-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9271499999999622, time = 118.084755 s
pgvecto.rs-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = off, rerank_size = 100): recall = 0.7379399999999887, time = 8.469601 s
pgvecto.rs-IVFPQ (nlist = 1000, ratio = 2, bits = 4; nprobe = 10, fast_scan = on, rerank_size = 100): recall = 0.7376499999999886, time = 2.937704 s

@usamoi (Collaborator, Author) commented Jul 31, 2024

@VoVAllen

The results show that fast-scan speeds up pgvecto.rs considerably (2x–3x), but the effect is not as dramatic as it is for Faiss (64x).

The flamegraph (for pgvecto.rs-PQ, ratio = 2, bits = 4; fast_scan = on, rerank_size = 100) shows that quantization::fast_scan::bit4::fast_scan_v3 takes up only 8.35% of the time in a query:

  • fast_scan takes up 8.35%
  • make_heap takes up 46.66%; it does not exist in faiss but it's needed by VBASE
  • memory allocation takes up 13.60%; it does not exist in faiss
  • dequantizing rough distance of u16 type takes up 12.99%; it can be optimized, and I think it does not exist in faiss
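To illustrate the last item, here is a minimal sketch (hypothetical names, not the PR's actual code) of what the "dequantize u16 rough distance" step amounts to: the fast-scan kernel accumulates quantized byte LUT entries into a u16 per candidate, and mapping that sum back to an f32 estimate is one multiply-add per candidate, which should auto-vectorize well.

```rust
// Hypothetical sketch: map accumulated u16 rough distances back to f32
// estimates. `bias` and `scale` come from the LUT quantization; the
// function name and signature are illustrative only.
fn dequantize_rough(sums: &[u16], bias: f32, scale: f32) -> Vec<f32> {
    sums.iter().map(|&s| bias + scale * s as f32).collect()
}

fn main() {
    let d = dequantize_rough(&[0, 10, 200], 1.0, 0.5);
    assert_eq!(d, vec![1.0, 6.0, 101.0]);
}
```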

@VoVAllen (Member) commented:

  • I'm surprised that for IVF the bottleneck is the heap instead of the computation. Can you try a dataset with a larger vector dimension, like GIST?
  • Why does PQ on the full vectors take 100+ s? (pgvecto.rs-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9271499999999622, time = 118.084755 s)

@usamoi (Collaborator, Author) commented Jul 31, 2024

  • I'm surprised that for IVF the bottleneck is the heap instead of the computation. Can you try a dataset with a larger vector dimension, like GIST?

That's what I expected, since the computation is very cheap once fast scan is enabled (the float32 values in the look-up table are quantized to int8, and AVX2 processes 32 vectors at a time). The main difference between our implementation and Faiss's is whether a heap is used.

  • Why does PQ on the full vectors take 100+ s?

What do you mean? Is it too fast or too slow?
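To make the "computation is too fast" point concrete, here is a scalar sketch of the fast-scan idea (illustrative names and signatures, not the actual quantization::fast_scan::bit4::fast_scan_v3 kernel, which uses AVX2 byte shuffles to look up 32 codes at a time): the f32 look-up table is quantized to bytes so that SIMD shuffles can replace f32 gathers, and per-vector rough distances are accumulated as small integers.

```rust
// Scalar sketch of fast-scan distance estimation (hypothetical, for
// illustration). The f32 look-up table is quantized to u8 so a SIMD byte
// shuffle could replace f32 gathers; done here in scalar code for clarity.
fn quantize_lut(lut: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = lut.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = lut.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = ((max - min) / 255.0).max(f32::MIN_POSITIVE);
    let q = lut.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    (q, min, scale)
}

// Rough distance of one vector: sum the quantized LUT entry selected by each
// subquantizer code; the u16 sum is dequantized back to f32 only for the
// candidates that survive into reranking.
fn rough_distance(codes: &[u8], luts: &[Vec<u8>]) -> u16 {
    codes.iter().zip(luts).map(|(&c, lut)| lut[c as usize] as u16).sum()
}

fn main() {
    let (q, min, scale) = quantize_lut(&[0.0, 127.5, 255.0]);
    assert_eq!((min, scale), (0.0, 1.0));
    assert_eq!(q, vec![0, 128, 255]);
    let luts = vec![vec![0, 128, 255], vec![10, 20, 30]];
    assert_eq!(rough_distance(&[2, 0], &luts), 265);
}
```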

@VoVAllen (Member) commented:

Faiss-PQ (ratio = 2, bits = 4; fast_scan = on, rerank_size = 100): recall = 0.9166199999999625, time = 13.678839206695557 s

It seems much slower than Faiss here, while the other numbers are faster than Faiss. I don't understand what the problem is here.

@VoVAllen (Member) commented Jul 31, 2024

Would it be better if the heap had O(1) insertion and O(log N) pop?

Reference:

@usamoi (Collaborator, Author) commented Jul 31, 2024

pgvecto.rs-IVFPQ (fast_scan = on) is much slower than Faiss-IVFPQ (fast_scan = on) too.

@VoVAllen (Member) commented:

Another way to improve the heap operation might be to do it in batches, which would pollute the cache less than inserting one element at a time.

@usamoi (Collaborator, Author) commented Jul 31, 2024

  • Would it be better if the heap had O(1) insertion and O(log N) pop?

There is no heap insertion, only make_heap, and make_heap is O(n).

I'll try adding a fast path for make_heap: if there are fewer than 100 pops, don't call make_heap; instead maintain a small 100-element heap.
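A sketch of that fast path (hypothetical, not the eventual implementation): instead of running make_heap over all n rough distances and then popping ~100 times, keep a bounded max-heap of the k best candidates, so each of the n elements costs at most O(log k).

```rust
use std::collections::BinaryHeap;

// Hypothetical sketch of the proposed fast path: keep the k smallest rough
// distances in a bounded max-heap (std's BinaryHeap is a max-heap), instead
// of calling make_heap on all n candidates when only ~100 pops are needed.
fn top_k_smallest(dists: &[u16], k: usize) -> Vec<u16> {
    let mut heap: BinaryHeap<u16> = BinaryHeap::with_capacity(k);
    for &d in dists {
        if heap.len() < k {
            heap.push(d);
        } else if d < *heap.peek().unwrap() {
            // New candidate beats the current worst of the k best.
            heap.pop();
            heap.push(d);
        }
    }
    heap.into_sorted_vec() // ascending: best candidates first
}

fn main() {
    assert_eq!(top_k_smallest(&[5, 1, 4, 2, 3], 3), vec![1, 2, 3]);
}
```

The trade-off is the branch per element; for small k relative to n the bounded heap touches far less memory than heapifying the whole candidate array.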

@VoVAllen (Member) commented:

100-dim vectors are rare in the deep learning world; larger dimensions are more typical. For 1000 dims, the heap cost might not account for such a large proportion.

@usamoi force-pushed the fast-scan branch 4 times, most recently from 66978b2 to cf3e6ae (August 6, 2024 10:39)
@usamoi (Collaborator, Author) commented Aug 7, 2024

I'm merging it now because it blocks #549.

@usamoi usamoi mentioned this pull request Aug 7, 2024
@usamoi usamoi added this pull request to the merge queue Aug 7, 2024
Merged via the queue into tensorchord:main with commit 9e61230 Aug 7, 2024
13 checks passed