Voice Control – journey begins

In Star Trek, the computer isn’t just a voice recorder but an invisible crew member. When Picard says, “Computer, divert emergency power to the forward shields,” he’s not navigating a menu—he’s issuing a complex command with variables. And I f***ing LOVE that idea.

My goal wasn’t just to make the game “listen,” but to give the developer (that’s me!) a way to let players call ships by their unique names, input coordinates via the NATO phonetic alphabet, and interact with the world without ever touching a mouse.

I needed to be quiet, so there are a few problems in the prototype.

Computer: Status Report

A voice system is only as good as its parser. If the system only recognizes “Fire,” it’s a toy. If it recognizes “Fire Tube One at the Klingon Bird of Prey,” it’s a tool.

I designed the architecture to handle In-Speech Variables. This allows a game designer to define a command structure and let the player fill in the blanks dynamically.
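To make the "fill in the blanks" idea concrete, here is a minimal sketch in Python. It is not the actual implementation: the template syntax with curly braces and the names `compile_template` and `match_command` are my assumptions for illustration, and it works on text that has already been transcribed from speech.

```python
import re

def compile_template(template: str) -> re.Pattern:
    """Turn 'fire tube {tube} at {target}' into a regex with named groups."""
    # re.split with a capture group alternates: literal, slot name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)       # fixed words of the command
        else:
            pattern += f"(?P<{part}>.+?)"    # a blank the player fills in
    return re.compile(pattern, re.IGNORECASE)

def match_command(template: str, utterance: str):
    """Return the filled-in variables, or None if the utterance does not fit."""
    m = compile_template(template).fullmatch(utterance.strip())
    return m.groupdict() if m else None

slots = match_command("fire tube {tube} at {target}",
                      "Fire tube one at the Klingon Bird of Prey")
```

Here `slots` holds the spoken values for `tube` and `target`. Note that this only extracts the blanks; checking that the target really exists in the game world is a separate step.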

The Parser Workflow:

  1. The Listener: A global module tuned for specific “Wake Words” (like the classic “Computer” or “Siri”). This is nothing new.
  2. The Tokenizer: Once the system triggers, it breaks the speech into tokens. This is where the magic happens. Instead of looking for a static string, the system looks for Keys and Parameters.
  3. Variable Injection: If you have a fleet of twenty ships, you don’t want to hardcode twenty commands. You want one command: [ShipName] + [Action]. The system parses the ship name from your live fleet list and executes the action on that specific object.
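The three steps above can be sketched roughly like this. Treat it as an illustration under assumptions, not the real system: the wake word check runs on already-transcribed text, and `Ship`, `ACTIONS`, `tokenize` and `parse_command` are hypothetical names I made up for the example.

```python
from dataclasses import dataclass

WAKE_WORD = "computer"  # assumed wake word

# Assumed action vocabulary: spoken tokens -> internal action id
ACTIONS = {
    ("raise", "shields"): "raise_shields",
    ("hold", "position"): "hold_position",
}

@dataclass(frozen=True)
class Ship:
    name: str

def tokenize(utterance: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split into tokens."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " "
                      for c in utterance.lower())
    return cleaned.split()

def parse_command(utterance: str, fleet: list[Ship]):
    """Return (ship, action) or None. Ship names come from the live fleet."""
    tokens = tokenize(utterance)
    if not tokens or tokens[0] != WAKE_WORD:
        return None                      # step 1: no wake word, ignore
    tokens = tokens[1:]                  # step 2: tokens after the trigger

    # Step 3: find the longest fleet name that starts the command,
    # so "Enterprise B" wins over "Enterprise".
    best, best_len = None, 0
    for ship in fleet:
        name = tokenize(ship.name)
        if name and tokens[:len(name)] == name and len(name) > best_len:
            best, best_len = ship, len(name)
    if best is None:
        return None

    action = ACTIONS.get(tuple(tokens[best_len:]))
    return (best, action) if action else None

fleet = [Ship("Enterprise"), Ship("Enterprise B")]
result = parse_command("Computer, Enterprise B raise shields", fleet)
```

Adding a twenty-first ship only means appending to `fleet`; the command table stays the same, which is the whole point of one [ShipName] + [Action] command.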

To boldly go... into heuristics

In a game framework like the one I’m trying to build, we are dealing with a highly deterministic environment. If a player says, “Shields to maximum,” there is exactly one correct outcome. However, the industry’s current heuristic models are (from what I was able to find; maybe I’m a bad researcher) almost entirely optimized for Large Language Models (LLMs) and chatbots, which are inherently non-deterministic.

The Deterministic Dilemma: Too Many Right Answers

In a sandbox or a tactical sim, the state space is enormous. At any given second, the “available options” for a voice command might include:

  • Names of 50+ unique ships in the sector.
  • NATO phonetic coordinates (thousands of combinations).
  • System-specific commands (Power, Shields, Navigation, Comms).

Standard chatbot heuristics use Probabilistic Guessing. They look at a sentence and ask, “What is the most likely intent?” This works for asking about the weather, but it’s disastrous for a tactical interface. If the player says “Target Bravo,” and the heuristic decides there’s a 60% chance they meant “Target Echo” because “Echo” is a more common word in its training data, the immersion is instantly broken.

(I know this is a huge simplification and there are ways to get around that particular issue, but current solutions just move the goalposts to more complex problems.)

The system fails because it’s trying to be “smart” (guessing intent) instead of being precise (validating against game state).
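A minimal sketch of what "validating against game state" could look like for the “Target Bravo” case. The assumptions here: tokens arrive already normalized (for example “xray” rather than “x-ray”), targets are identified by letter designators, and `resolve_target` is a hypothetical helper, not part of any real library.

```python
NATO_WORDS = ("alfa bravo charlie delta echo foxtrot golf hotel india "
              "juliett kilo lima mike november oscar papa quebec romeo "
              "sierra tango uniform victor whiskey xray yankee zulu").split()

# Code word -> letter, plus the common non-official spellings.
NATO = {word: chr(ord("A") + i) for i, word in enumerate(NATO_WORDS)}
NATO.update({"alpha": "A", "juliet": "J"})

def resolve_target(tokens, valid_targets):
    """Return a designator only if it decodes cleanly AND exists right now.

    No nearest-match fallback: an unknown word or a target that is not
    in the current game state yields None instead of a guess.
    """
    letters = []
    for token in tokens:
        letter = NATO.get(token.lower())
        if letter is None:
            return None          # not a code word: reject, do not guess
        letters.append(letter)
    designator = "".join(letters)
    return designator if designator in valid_targets else None

in_sector = {"B", "E", "AE"}     # designators currently in the game state
target = resolve_target(["Bravo"], in_sector)
```

If “B” is not in the sector, the function returns None and the game can ask the player to repeat the command, rather than quietly locking onto “E” because it looked likely.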

Current heuristics are designed to handle unstructured conversation. They prioritize “Natural Language” over “Technical Accuracy.” Here we have the opposite priority.

So right now, together with people from Orion Belt Games, we are working on “how not to crush the player’s spirit with the number of available commands.” We’ll see how that goes.