A complete guide to using embeddings for semantic text comparison and natural language understanding.


Cool, you've got the basics of chat working! Now let's explore embeddings, which let you understand what text means rather than just matching exact words.

Embeddings are like a smart way to measure how similar two pieces of text are, even if they use completely different words. Instead of looking for exact matches, embeddings understand meaning.
For example, "Hand me the red potion" and "Give me the scarlet flask" would be recognized as very similar, even though they share no common words.

Here are the key terms for working with embeddings:

Term Meaning
Embedding Model (GGUF) A specialized *.gguf file trained to convert text into numerical vectors that represent meaning.
Embedding A list of numbers (vector) that represents the meaning of a piece of text.
Cosine Similarity A mathematical way to compare how similar two embeddings are, returning a value between 0 (completely different) and 1 (identical meaning).
Semantic Search Finding text that means the same thing, even if the words are different.
Vector The array of numbers that represents your text's meaning.

Let's show you how to use embeddings to understand what your players really mean when they type commands.

Download an Embedding Model

Embedding models are different from chat models. You need a model specifically trained for embeddings.

We normally use bge-small-en-v1.5-q8_0.gguf.

Practical Example: Quest & Reputation System

A good way to visualize the practicality of embeddings is through an example. In this example we will guide you through how to make a quest trigger or lowering the user's reputation based on what they say.

We'll build it step by step, but for the impatient; The complete script is copyable in the bottom of the page.

Step 1: Set up your basic structure and variables

The first step is to setup our components. We will add some statements for quests and some for hostile behavior - these are not exhaustive lists.

Do note that it will take a longer time to embed a lot of sentences (depending on model and hardware of course), so depending on how complex your statements need to be, you might be better off having a handful and tuning the sensitivity of the trigger instead.

First, create your script that extends NobodyWhoEmbedding and define your statement categories:

extends NobodyWhoEmbedding

var quest_triggers= [
    "I know where the dragon rests",
    "The druid told me the proper way to meet the dragon",
    "I discovered the ritual needed to gain the dragon's audience",
    "I know about the sacred grove"
]

var hostile_statements = [
    "I want to kill the dragon",
    "I'm going to destroy everything",
    "I hate this place and everyone in it",
    "I will burn down the village",
    "Everyone here deserves to die"
]

var helpful_embeddings = []
var hostile_embeddings = []
var player_reputation = 0

Step 2: Initialize the embedding system

Set up the embedding model and start the worker:

func _ready():
    # Create and configure the embedding model
    var embedding_model = NobodyWhoModel.new()
    embedding_model.model_path = "res://models/bge-small-en-v1.5-q8_0.gguf"
    get_parent().add_child(embedding_model)

    # Link to the embedding model
    self.model_node = embedding_model
    self.embedding_finished.connect(_on_embedding_finished)
    self.start_worker()

    # Pre-generate embeddings for all statement types
    precompute_all_embeddings()

Step 3: Precompute reference embeddings

Generate embeddings for all your reference statements:

func precompute_all_embeddings():
    # Generate embeddings for helpful statements
    for statement in quest_triggers:
        embed(statement)
        var embedding = await self.embedding_finished
        helpful_embeddings.append(embedding)

    # Generate embeddings for hostile statements
    for statement in hostile_statements:
        embed(statement)
        var embedding = await self.embedding_finished
        hostile_embeddings.append(embedding)

Step 4: Add input handling for testing

Add a simple test trigger using the enter key:

func _input(event):
    # Handle enter key press to send hardcoded test message
    if event is InputEventKey and event.pressed:
        if event.keycode == KEY_ENTER:
            var test_message = "I know the location of the dragon"
            print("Sending test message: ", test_message)
            analyze_player_statement(test_message)

Step 5: Analyze player statements

Compare the player's message against your reference embeddings:

func analyze_player_statement(player_text: String):
    # Generate embedding for player input
    embed(player_text)
    var player_embedding = await self.embedding_finished

    # Compare against both categories
    var best_helpful_similarity = get_best_similarity(player_embedding, helpful_embeddings)
    var best_hostile_similarity = get_best_similarity(player_embedding, hostile_embeddings)

    print("Helpful similarity: ", best_helpful_similarity)
    print("Hostile similarity: ", best_hostile_similarity)

    # Use similarity threshold of 0.8 and compare categories
    if best_helpful_similarity > 0.8 and best_helpful_similarity > best_hostile_similarity:
        handle_helpful_information(player_text)
    elif best_hostile_similarity > 0.8 and best_hostile_similarity > best_helpful_similarity:
        handle_hostile_intent(player_text)
    else:
        print("Unclear intent - no strong match found")

Step 6: Handle the results

Trigger appropriate game systems based on detected intent:

func handle_helpful_information(text: String):
    # Trigger game systems based on detected intent
    print("🐉 Triggering quest: 'Audience with the Ancient Dragon'!")

func handle_hostile_intent(text: String):
    player_reputation -= 15
    print("Player expressed hostile intent! Reputation -15 (now: ", player_reputation, ")")