
Full-Text Search

BudgetVec includes a built-in BM25 text search engine with support for English and Thai tokenization.

How It Works

Documents with text attributes are automatically tokenized and indexed. At query time, BM25 scores are computed to rank documents by text relevance.
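
To make the ranking step concrete, here is a minimal sketch of textbook Okapi BM25 scoring over an in-memory list of documents. It is not BudgetVec's implementation: the `tokenize` and `bm25Rank` helpers are hypothetical, the parameters `K1 = 1.2` and `B = 0.75` are the commonly used defaults rather than documented BudgetVec values, and the IDF uses the non-negative variant popularized by Lucene.

```typescript
// Common Okapi BM25 defaults; assumed here, not taken from BudgetVec.
const K1 = 1.2;
const B = 0.75;

// Hypothetical tokenizer: lowercase, then split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score every document against the query; return entries sorted by descending score.
function bm25Rank(docs: string[], query: string): { index: number; score: number }[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);

  // Document frequency: how many documents contain each term at least once.
  const df = new Map<string, number>();
  for (const doc of tokenized) {
    for (const term of Array.from(new Set(doc))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  const scored = tokenized.map((doc, index) => {
    // Term frequency within this document.
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const nq = df.get(term) ?? 0;
      // Rare terms get a higher weight than common ones.
      const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
      // Length normalization: long documents need more matches for the same score.
      const norm = f + K1 * (1 - B + B * (doc.length / avgLen));
      score += (idf * f * (K1 + 1)) / norm;
    }
    return { index, score };
  });

  return scored.sort((a, b) => b.score - a.score);
}
```

Documents that share no term with the query score 0 and sort last; among matching documents, rarer query terms and shorter documents push the score up.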

Example

const results = await ns.query({
  rank_by: ["text", "BM25", "machine learning introduction"],
  top_k: 10,
  include_attributes: ["title", "content"],
});

In rank_by, the second element, "BM25", selects the ranking algorithm, and the third element is the search query string.

Thai Language Support

BudgetVec includes a built-in Thai tokenizer backed by a small dictionary of roughly 55 common words. That is enough for basic queries but not for production Thai search, for which integration with the nlpo3 crate (62K words) is planned.

const results = await ns.query({
  rank_by: ["text", "BM25", "การเรียนรู้ของเครื่อง"],
  top_k: 10,
});
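
Thai is written without spaces between words, which is why a dictionary matters for tokenization. The sketch below shows one common dictionary-based approach, greedy longest matching. It is an illustration only: the source does not say which segmentation algorithm BudgetVec uses, and `DICTIONARY` and `segmentThai` are hypothetical names with a hand-picked word list, not the built-in one.

```typescript
// Hypothetical mini-dictionary for illustration; not BudgetVec's built-in word list.
const DICTIONARY = new Set<string>([
  "การ",
  "เรียน",
  "รู้",
  "เรียนรู้",
  "การเรียนรู้",
  "ของ",
  "เครื่อง",
]);

// Longest entry length, so the matcher never tries candidates that cannot exist.
const MAX_WORD_LENGTH = Math.max(...Array.from(DICTIONARY, (w) => w.length));

// Greedy longest-match segmentation: at each position, take the longest
// dictionary word that starts there; fall back to a single character.
function segmentThai(text: string): string[] {
  const tokens: string[] = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = "";
    const longest = Math.min(MAX_WORD_LENGTH, text.length - pos);
    for (let len = longest; len > 0; len--) {
      const candidate = text.slice(pos, pos + len);
      if (DICTIONARY.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    // Unknown text: emit one character so the loop always advances.
    if (matched === "") matched = text.charAt(pos);
    tokens.push(matched);
    pos += matched.length;
  }
  return tokens;
}
```

With this word list, `segmentThai("การเรียนรู้ของเครื่อง")` yields three tokens: "การเรียนรู้", "ของ", and "เครื่อง". Text that falls outside the dictionary degrades to single-character tokens, which illustrates why a small word list limits search quality and a larger one helps.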