Full-Text Search (BM25)
Fluree supports full-text search using the BM25 algorithm through virtual graphs. This enables text search with stemming, stopword removal, and relevance scoring.
Overview
BM25 (Best Matching 25) is a ranking function used by search engines to estimate the relevance of documents to a search query. Fluree implements BM25 through virtual graphs, which are computed indexes that stay synchronized with your data.
Key features:
- Automatic indexing — Data matching the index query is automatically indexed
- Relevance scoring — Results ranked by BM25 score
- Stemming — Words reduced to root form (e.g., "running" → "run")
- Stopwords — Common words filtered out (e.g., "the", "and")
- Incremental updates — Index updates automatically as data changes
Creating a BM25 Index
Define a BM25 index by inserting an entity with types f:VirtualGraph and fidx:BM25:
{
  "@context": {
    "f": "https://ns.flur.ee/ledger#",
    "fidx": "https://ns.flur.ee/index#",
    "ex": "http://example.org/"
  },
  "insert": {
    "@id": "ex:articleSearch",
    "@type": ["f:VirtualGraph", "fidx:BM25"],
    "f:virtualGraph": "articleSearch",
    "fidx:stemmer": {"@id": "fidx:snowballStemmer-en"},
    "fidx:stopwords": {"@id": "fidx:stopwords-en"},
    "f:query": {
      "@type": "@json",
      "@value": {
        "@context": {"ex": "http://example.org/"},
        "where": [{"@id": "?x", "ex:author": "?author"}],
        "select": {"?x": ["@id", "ex:title", "ex:summary"]}
      }
    }
  }
}
Required Properties
| Property | Description |
|---|---|
| @type | Must include both f:VirtualGraph and fidx:BM25 |
| f:virtualGraph | Name used to reference the index in queries |
| f:query | Query defining which data to index |
Configuration Options
| Property | Description | Default |
|---|---|---|
| fidx:stemmer | Stemmer algorithm for the index | None |
| fidx:stopwords | Stopwords list to filter common words | None |
Available Stemmers
| Stemmer ID | Language |
|---|---|
| fidx:snowballStemmer-en | English |
Available Stopword Lists
| Stopwords ID | Language |
|---|---|
| fidx:stopwords-en | English |
Index Query Requirements
The f:query property defines what data gets indexed. It has specific requirements:
- Must use subgraph selector — The select must be an object, not an array
- Must include @id — The subgraph selector must include "@id"
- Cannot use wildcard — Cannot use "*" in the selector
Valid:
{"select": {"?x": ["@id", "ex:title", "ex:summary"]}}
Invalid:
{"select": ["?x", "?title"]}        // Not a subgraph selector
{"select": {"?x": ["ex:title"]}}    // Missing @id
{"select": {"?x": ["@id", "*"]}}    // Contains wildcard
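The three selector rules above can be checked before an index definition is submitted. The following is a minimal Python sketch under stated assumptions: `validate_index_select` is a hypothetical helper written for illustration, not part of Fluree, and it assumes each selector value is a list of field names.

```python
def validate_index_select(select):
    """Return a list of problems with an index query's select clause.

    An empty list means the clause satisfies the three rules in this section.
    Hypothetical helper for illustration; not a Fluree API.
    """
    # Rule 1: the select must be a subgraph selector (an object), not an array.
    if not isinstance(select, dict):
        return ["select must be a subgraph selector (an object), not an array"]

    problems = []
    for var, fields in select.items():
        # Rule 2: the selector must include "@id".
        if "@id" not in fields:
            problems.append(f'{var}: selector must include "@id"')
        # Rule 3: the selector must not contain the wildcard.
        if "*" in fields:
            problems.append(f'{var}: selector must not contain the wildcard "*"')
    return problems


if __name__ == "__main__":
    print(validate_index_select({"?x": ["@id", "ex:title", "ex:summary"]}))
    print(validate_index_select(["?x", "?title"]))
    print(validate_index_select({"?x": ["ex:title"]}))
    print(validate_index_select({"?x": ["@id", "*"]}))
```

Running the script prints an empty list for the valid selector and one problem message for each of the invalid ones.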
Querying a BM25 Index
Query the index using a graph clause with the index name prefixed by ##:
{
  "@context": {
    "ex": "http://example.org/",
    "fidx": "https://ns.flur.ee/index#"
  },
  "select": ["?doc", "?score", "?title"],
  "where": [
    ["graph", "##articleSearch", {
      "fidx:target": "search terms here",
      "fidx:limit": 10,
      "fidx:result": {
        "@id": "?doc",
        "fidx:score": "?score"
      }
    }],
    {"@id": "?doc", "ex:title": "?title"}
  ]
}
Query Parameters
| Parameter | Required | Description |
|---|---|---|
| fidx:target | Yes | Search query string |
| fidx:limit | No | Maximum number of results |
| fidx:sync | No | Wait for index to be current (default: false) |
| fidx:result | Yes | Result binding pattern |
Result Binding
The fidx:result object binds variables to the search results:
{
  "fidx:result": {
    "@id": "?doc",
    "fidx:score": "?score"
  }
}
- @id binds the IRI of matching documents
- fidx:score binds the BM25 relevance score
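When queries are generated by application code, the parameters and result binding described above can be assembled programmatically. The following is a minimal Python sketch: `build_search_query` is a hypothetical helper, not part of any Fluree client library, and it only builds the query document without sending it anywhere. The `ex` prefix and the variable names `?doc` and `?score` are taken from the examples in this section.

```python
import json


def build_search_query(index_name, target, limit=None, sync=False):
    """Build a BM25 search query document as a Python dict.

    Hypothetical helper for illustration. Required parameters (fidx:target,
    fidx:result) are always set; optional ones are added only when requested.
    """
    search = {
        "fidx:target": target,
        "fidx:result": {"@id": "?doc", "fidx:score": "?score"},
    }
    if limit is not None:
        search["fidx:limit"] = limit
    if sync:
        # Ask the query to wait until the index is current.
        search["fidx:sync"] = True

    return {
        "@context": {
            "ex": "http://example.org/",
            "fidx": "https://ns.flur.ee/index#",
        },
        "select": ["?doc", "?score"],
        # The index name is prefixed with ## inside the graph clause.
        "where": [["graph", "##" + index_name, search]],
        "orderBy": "(desc ?score)",
    }


if __name__ == "__main__":
    query = build_search_query("articleSearch", "search terms here", limit=10)
    print(json.dumps(query, indent=2))
```

Leaving `limit` and `sync` at their defaults omits those keys entirely, which matches their optional status in the parameter table.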
Complete Example
1. Insert Data
{
  "@context": {"ex": "http://example.org/"},
  "insert": [
    {
      "@id": "ex:article1",
      "ex:author": "Jane Smith",
      "ex:title": "Introduction to Graph Databases",
      "ex:summary": "Graph databases store data as nodes and edges, enabling complex relationship queries."
    },
    {
      "@id": "ex:article2",
      "ex:author": "John Doe",
      "ex:title": "Semantic Web Technologies",
      "ex:summary": "The semantic web uses RDF and linked data to create machine-readable content."
    },
    {
      "@id": "ex:article3",
      "ex:author": "Jane Smith",
      "ex:title": "Building Knowledge Graphs",
      "ex:summary": "Knowledge graphs combine structured data with semantic relationships for AI applications."
    }
  ]
}
2. Create Index
{
  "@context": {
    "f": "https://ns.flur.ee/ledger#",
    "fidx": "https://ns.flur.ee/index#",
    "ex": "http://example.org/"
  },
  "insert": {
    "@id": "ex:articleIndex",
    "@type": ["f:VirtualGraph", "fidx:BM25"],
    "f:virtualGraph": "articleIndex",
    "fidx:stemmer": {"@id": "fidx:snowballStemmer-en"},
    "fidx:stopwords": {"@id": "fidx:stopwords-en"},
    "f:query": {
      "@type": "@json",
      "@value": {
        "@context": {"ex": "http://example.org/"},
        "where": [{"@id": "?x", "ex:author": "?author"}],
        "select": {"?x": ["@id", "ex:title", "ex:summary"]}
      }
    }
  }
}
3. Search the Index
{
  "@context": {
    "ex": "http://example.org/",
    "fidx": "https://ns.flur.ee/index#"
  },
  "select": ["?doc", "?score", "?title"],
  "where": [
    ["graph", "##articleIndex", {
      "fidx:target": "semantic knowledge graph",
      "fidx:limit": 10,
      "fidx:result": {
        "@id": "?doc",
        "fidx:score": "?score"
      }
    }],
    {"@id": "?doc", "ex:title": "?title"}
  ],
  "orderBy": "(desc ?score)"
}
This returns articles ranked by relevance to "semantic knowledge graph", with stemming applied (e.g., "graph" matches "graphs").
How BM25 Scoring Works
BM25 scores documents based on:
- Term frequency (TF) — How often search terms appear in a document
- Inverse document frequency (IDF) — How rare the terms are across all documents
- Document length normalization — Adjusts for document size
Higher scores indicate more relevant documents. Scores have no fixed upper bound; they typically range from 0 to a few units, depending on the query and corpus.
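To make the three factors concrete, here is a minimal Python sketch of the textbook Okapi BM25 formula. It is not Fluree's implementation: the parameter values `k1=1.2` and `b=0.75`, the IDF variant (the smoothed form that stays non-negative), and the plain whitespace tokenization without stemming or stopword removal are all assumptions chosen for illustration. The sample documents are shortened, lowercased versions of the article summaries used in the complete example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the `query` tokens.

    Textbook BM25 sketch; k1, b, and the IDF variant are assumed values.
    `docs` must be a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        # Length normalization: longer-than-average documents are penalized.
        norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Inverse document frequency: rarer terms contribute more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "graph databases store data as nodes and edges".split(),
    "the semantic web uses rdf and linked data".split(),
    "knowledge graphs combine structured data with semantic relationships".split(),
]
scores = bm25_scores(["semantic", "knowledge"], docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {' '.join(doc)}")
```

In this sketch the first document scores zero because it contains neither query term, and the third outranks the second because it matches both terms, one of which ("knowledge") appears in only one document and therefore carries a higher IDF weight.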
Index Updates
BM25 indexes update automatically when data changes:
- Inserts — New documents matching the index query are added
- Updates — Modified documents are re-indexed
- Deletes — Deleted documents are removed from the index
Updates happen asynchronously. Use fidx:sync: true in queries if you need to ensure the index is current.
Best Practices
- Be specific with index queries — Index only the data you need to search
- Use appropriate language settings — Match stemmer and stopwords to your content language
- Include all searchable fields — The index only searches fields in the select subgraph
- Use fidx:limit — Limit results for better performance on large datasets
- Order by score — Use orderBy: "(desc ?score)" to show most relevant results first