Consider the distributed case #637
Hi, thanks for your interest. We did some initial tests using pgvecto.rs with a distributed FDW setup and it works; I think Citus is a similar case. We're also actively developing https://github.com/tensorchord/VectorChord, the next generation of pgvecto.rs. It's more scalable, more stable, and faster than pgvecto.rs, and with its disk-friendly design you can host 100M vectors on a 4C32G machine, so we believe a single machine should easily host 1B vectors. Citus can definitely help when users have larger datasets with high QPS requirements, but it also introduces a performance penalty. You can check the experiment at https://jkatz05.com/post/postgres/distributed-pgvector/. Would you like to share your target scenario and data scale with us?
I'm thinking precisely about improving QPS, and my thoughts are quite abstract and imprecise:
Let's ground this in a more (or less) realistic example. Please correct me if my ballpark estimates are wrong, and more importantly if the logic is wrong. Also please keep in mind that I just came up with the example dataset.

The system is a short-format video recommender. The dataset is ~10M vectors with dim ~1000. That's roughly 50 GB (a back-of-envelope estimate is sketched below), so on a 32 GB machine with <20 GB or so to spare for caches, it will likely work one query at a time, with occasional hangs when an unpopular part of the table is queried and can't be served from the cache. As we try to scale the QPS, the occasional cache misses and subsequent evictions start hitting harder and harder, and most queries quickly end up running in seconds rather than milliseconds. That's a positive feedback loop; we hit a wall.

Of course, we can scale up, but my main point is that sooner or later scaling up becomes cumbersome, more cumbersome than just dropping the pesky unpopular cluster of videos with welding tutorials or whatever. The goal of scaling horizontally, the way I'm imagining it, is to parallelize the lookup in a way that decouples the funny-dog-video recommendations from the welding tutorials. I'm aware that by default in vector stores all nodes return N candidates which are then reranked, but that's just a safe default. In principle there's no need to run the funny-dog recommendations on the welding-video shard (apart from the curse of dimensionality, but we're already knee-deep in approximate indexes).

So, if the sharding has a correspondence with ivfflat lists (or some HNSW alternative I haven't thought through very well), we get performance savings since we're not straining all nodes, and we definitely save cache by not evicting our dogs for the occasional welding tutorial. Do you find it reasonable at all? I feel like I'm addressing the problem of a massive number of neighbours in higher dimensions by waving my hands and saying "that's what ivfflat is for, so our recall doesn't suffer", but I might be completely off.
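For reference, the back-of-envelope behind the 50 GB figure, assuming plain float32 components and ignoring index and tuple overhead (so the real footprint is somewhat larger):

-- 10M vectors x 1000 dims x 4 bytes per float32 component
select (10 * 1000 * 1000)::bigint   -- number of vectors
     * 1000                         -- dimensions per vector
     * 4                            -- bytes per component
     / (1024.0 * 1024 * 1024) as raw_gib;   -- ~37 GiB raw; the ~50 GB ballpark leaves room for index and tuple overhead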
For Postgres, you can easily add more replicas to meet a QPS requirement. As for the distributed idea, I think it's valid and you get the point of ivfflat. The problem is that it's hard to make Postgres work that way: we would need to hack into its planner to skip specific shards, and that would definitely violate some of Postgres's defaults. So I think it's more viable to build this as a specialized vector DB. Inside Postgres, the scale-up approach is easier to achieve and covers most users' requirements.
That is incredibly sad! Do you think it's worth trying workarounds like a helper table with centroids and partition IDs? Postgres can definitely prune partitions when using a simple "where partition_id in (1, 2, 42)".
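A minimal sketch of what I mean, with made-up names (videos, cluster_management) and assuming a pgvecto.rs/pgvector-style vector(1000) column; the point is just that the IN list on the partition key lets the planner skip every other partition:

-- One list partition per ivfflat-style cluster.
create table videos (
    id         bigint,
    cluster_id int not null,
    embedding  vector(1000)
) partition by list (cluster_id);

create table videos_c1 partition of videos for values in (1);
create table videos_c2 partition of videos for values in (2);
-- ...and so on, one partition per cluster.

-- Small helper table: one centroid per cluster, used only for routing.
create table cluster_management (
    cluster_id int primary key,
    centroids  vector(1000)
);

-- Only the partitions listed in the IN clause (and their local ANN indexes)
-- are touched; everything else is pruned.
select id
from videos
where cluster_id in (1, 2, 42)   -- cluster ids chosen via the centroid table
order by embedding <#> $input
limit 10;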
I think there would literally be no need to use any other vector store implementation if pgvecto.rs implemented some level of smarts when it comes to sharding. My ideas include:
select cluster_id from cluster_management order by centroids <#> $input
So essentially I'm asking for a distributed index for sharding with Citus or the like. I'm not sure how big of a help I can be, but I can surely try implementing it!
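A hedged sketch of the Citus side, reusing the same made-up schema from above (a plain, non-partitioned videos table this time); create_distributed_table and create_reference_table are standard Citus calls, everything else is an assumption on my part:

-- Shard the video table on cluster_id, so each shard holds whole clusters.
select create_distributed_table('videos', 'cluster_id');

-- Replicate the tiny centroid table to every node so routing stays cheap.
select create_reference_table('cluster_management');

-- Step 1: pick the closest clusters for the query vector.
select cluster_id
from cluster_management
order by centroids <#> $input
limit 3;

-- Step 2: with the distribution column in the filter, Citus can prune to the
-- shards holding clusters 1, 2 and 42, so only those nodes run the ANN scan.
select id
from videos
where cluster_id in (1, 2, 42)   -- ids returned by step 1
order by embedding <#> $input
limit 10;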