NobodyWho provides Encoder and CrossEncoder classes for building retrieval-augmented generation (RAG) pipelines entirely on-device.
Embeddings
An Encoder converts text into a numerical vector (embedding) that captures its semantic meaning. Texts with similar meanings will have similar embeddings.
```swift
import NobodyWho

let encoder = try await Encoder.fromPath(modelPath: "/path/to/embeddings.gguf", contextSize: 512, useGpu: true)

let embedding1 = try await encoder.encode("The cat sat on the mat")
let embedding2 = try await encoder.encode("A feline rested on the rug")
let embedding3 = try await encoder.encode("The stock market crashed today")

let similar = cosineSimilarity(a: embedding1, b: embedding2)   // High similarity
let different = cosineSimilarity(a: embedding1, b: embedding3) // Low similarity
```
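If your setup does not provide a `cosineSimilarity` helper, it is straightforward to write one yourself. This is a minimal sketch, assuming embeddings are `[Float]` vectors:

```swift
// Minimal cosine similarity over two equal-length vectors.
// Assumes embeddings are `[Float]`; returns 0 for zero-length vectors.
func cosineSimilarity(a: [Float], b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}
```

The result is 1 for identical directions, 0 for orthogonal vectors, and -1 for opposite directions.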
You can also create an encoder from an already-loaded model:
```swift
let model = try await Model.load(modelPath: "/path/to/embeddings.gguf", useGpu: true)
let encoder = Encoder(model: model, contextSize: 512)
```
Cross-Encoder for reranking
A CrossEncoder takes a query and a list of documents and scores each document by its relevance to the query. Unlike embeddings (which are computed independently), a cross-encoder processes the query and document together, giving more accurate relevance scores.
```swift
let crossEncoder = try await CrossEncoder.fromPath(modelPath: "/path/to/reranker.gguf", contextSize: 512, useGpu: true)

let query = "How do I reset my password?"
let documents = [
    "Click 'Forgot Password' on the login page.",
    "Our company was founded in 2020.",
    "Contact support for account recovery.",
    "The weather is sunny today.",
]

// Get raw similarity scores
let scores = try await crossEncoder.rank(query: query, documents: documents)

// Or get documents sorted by relevance (most relevant first)
let ranked = try await crossEncoder.rankAndSort(query: query, documents: documents)
for (document, score) in ranked {
    print("\(score): \(document)")
}
```
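If you only need the raw scores from `rank`, sorting is a one-liner. This sketch (plain Swift, no library calls) mirrors what `rankAndSort` returns, assuming `scores[i]` corresponds to `documents[i]`:

```swift
// Pair each document with its score, then sort descending by score.
// Assumes scores are returned in the same order as the input documents.
func sortByRelevance(documents: [String], scores: [Float]) -> [(String, Float)] {
    Array(zip(documents, scores)).sorted { $0.1 > $1.1 }
}
```

This is also useful when you want to keep the original indices around, e.g. to look up metadata for the top-ranked documents.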
Building a RAG pipeline
A typical RAG pipeline combines both tools:
- Index: Use the `Encoder` to create embeddings for your document collection
- Retrieve: When a user asks a question, embed the query and find the most similar documents using `cosineSimilarity`
- Rerank (optional): Use the `CrossEncoder` to rerank the top candidates for better precision
- Generate: Pass the relevant documents to a `Chat` as context in the system prompt
```swift
// 1. Embed your documents (do this once, store the results)
let encoder = try await Encoder.fromPath(modelPath: "/path/to/embeddings.gguf", contextSize: 512, useGpu: true)
let docs = ["Document 1...", "Document 2...", "Document 3..."]
// `asyncMap` is a small sequential helper extension, not part of the standard library
let docEmbeddings = try await docs.asyncMap { try await encoder.encode($0) }

// 2. Embed the query and find similar documents
let queryEmbedding = try await encoder.encode("What is the return policy?")
let similarities = docEmbeddings.map { cosineSimilarity(a: queryEmbedding, b: $0) }

// 3. Rerank the top results
let crossEncoder = try await CrossEncoder.fromPath(modelPath: "/path/to/reranker.gguf", contextSize: 512, useGpu: true)
// Select the top candidates by embedding similarity (top 2 here)
let topDocs = zip(docs, similarities)
    .sorted { $0.1 > $1.1 }
    .prefix(2)
    .map { $0.0 }
let ranked = try await crossEncoder.rankAndSort(query: "What is the return policy?", documents: topDocs)

// 4. Generate a response with context
let context = ranked.prefix(3).map { $0.0 }.joined(separator: "\n\n")
let chat = try await Chat.fromPath(
    modelPath: "/path/to/model.gguf",
    systemPrompt: "Answer based on the following documents:\n\n\(context)"
)
let response = try await chat.ask("What is the return policy?").completed()
```
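The pipeline above maps an async `encode` call over an array. Swift's standard library has no `asyncMap` on `Sequence`, so you need a small helper; this sequential sketch (an assumption, not part of NobodyWho) fills the gap:

```swift
extension Sequence {
    // Sequentially apply an async transform to each element,
    // preserving the order of the original sequence.
    func asyncMap<T>(_ transform: (Element) async throws -> T) async rethrows -> [T] {
        var results: [T] = []
        for element in self {
            results.append(try await transform(element))
        }
        return results
    }
}
```

Because this runs the transforms one at a time, it keeps at most one encode request in flight, which is usually what you want on-device; a concurrent variant (e.g. via a task group) would trade memory for throughput.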