Hi HN, we're Arnav and Adi, and we're building DataBridge - a multi-modal database built from the ground up with AI use cases in mind.

We recently launched support for ColPali-style image embeddings and late-interaction retrieval. We've implemented a Hamming-distance version of retrieval, which helps this approach scale significantly better than regular late-interaction similarity scoring.
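
For readers unfamiliar with the idea, here is a rough sketch of how late-interaction (MaxSim) scoring compares with a Hamming-distance variant. This is not DataBridge's actual code: the function names, the sign-based binarization, and the NumPy implementation are all assumptions made for illustration.

```python
import numpy as np

def maxsim_score(query, doc):
    """Regular late interaction: for each query vector, take its best
    dot-product match among the document vectors, then sum those maxima.
    query has shape (Q, D), doc has shape (P, D)."""
    sims = query @ doc.T                      # (Q, P) similarity matrix
    return float(sims.max(axis=1).sum())

def binarize(emb):
    """Hypothetical binarization: keep only the sign bit of each dimension
    and pack 8 bits per byte, shrinking float32 storage by 32x."""
    return np.packbits(emb > 0, axis=-1)

def hamming_maxsim_score(query_bits, doc_bits, dim):
    """Late interaction over packed sign bits. XOR plus a bit count gives
    the Hamming distance; dim - 2 * distance equals the dot product of
    the corresponding +1/-1 vectors, so MaxSim can be applied as before."""
    xor = np.bitwise_xor(query_bits[:, None, :], doc_bits[None, :, :])
    # Sum into a signed dtype so dim - 2 * dist cannot wrap around.
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1, dtype=np.int64)  # (Q, P)
    sims = dim - 2 * dist
    return int(sims.max(axis=1).sum())

# Toy data: 4 query vectors and 6 document patch vectors, 16 dims each.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 16))
d = rng.standard_normal((6, 16))

float_score = maxsim_score(q, d)
binary_score = hamming_maxsim_score(binarize(q), binarize(d), dim=16)
```

The binary score is an approximation of the float score, traded for much smaller vectors and cheap bitwise comparisons. A production system would likely use a hardware popcount rather than unpacking bits.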

These embeddings provide significantly better retrieval accuracy: ColQwen achieves an average score of around 89% on the ViDoRe benchmark, compared to around 67% for traditional parsing- and captioning-based methods.

We're completely open source, and getting started takes less than 10 minutes (get started here: https://databridge.mintlify.app/getting-started). In fact, using this style of embeddings requires just setting `use_colpali=True` in our Python SDK when ingesting or retrieving documents.

Our long-term goal is to make state-of-the-art retrieval research as accessible as possible for production use cases, and integrating ColPali is an initial step towards that goal. If there's research that you find compelling but haven't been able to integrate into production, let us know: we'd be happy to help.

We really appreciate the honest feedback the HN community provides, and so we'd love to hear from you!

