AI Assistant

Every Melker app supports an AI assistant that can see, understand, and operate the UI — but only when you choose to enable it.

The assistant is off by default. It activates only when you provide an API key or configure a local model endpoint. No data is sent anywhere unless you explicitly set it up. Apps must also declare ai: true in their policy block, or the user can override with melker --allow-ai app.melker. Press Alt+H to open it, F7 to talk to it.

AI Assistant dialog in a Melker app

How It Works

The AI assistant doesn't use screenshots or pixel recognition. Instead, it reads the same document tree that the rendering engine uses — the actual structure of your UI, not an image of it.

When you ask a question, the system gathers three things and sends them to the model:

  1. Screen content — a text representation of everything visible, with element types, labels, values, and ARIA annotations
  2. Element tree — the full component hierarchy with IDs, props, and focus state
  3. Available actions — keyboard shortcuts and navigation options
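A sketch of what that combined payload might look like. The field names and exact shapes here are illustrative, not the engine's actual wire format:

```json
{
  "screen": "[Container: main]\n  [Text: \"Welcome to Settings\"]\n  [Button: \"Save\"]",
  "tree": {
    "id": "main",
    "type": "container",
    "focused": false,
    "children": [{ "id": "save", "type": "button", "label": "Save" }]
  },
  "actions": ["Tab: next field", "Enter: activate", "Alt+H: assistant"]
}
```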

This is the same information a screen reader would use, except the model can also act on it. The AI has tools to click buttons, type text, change values, and draw on canvases. It operates the UI the same way a user would.

The assistant excludes its own dialog from the context it sends to the model. It never describes itself.

Getting Started

The assistant needs an LLM endpoint to work. Either use OpenRouter:

export OPENROUTER_API_KEY=your_key_here

Or point it at a local model (Ollama, LM Studio, etc.) with an OpenAI-compatible API:

export MELKER_AI_ENDPOINT=http://localhost:11434/v1/chat/completions
export MELKER_AI_MODEL=llama3
export OPENROUTER_API_KEY=unused   # required but ignored by local endpoints

Without either of these, Alt+H does nothing. Once configured, open the assistant in any app:

Alt+H    Open with text input
F7       Open and start voice recording

The assistant works with any app — you don't need to do anything special. It reads the UI structure automatically. For richer context, add ARIA attributes like role, aria-label, and aria-description.
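For example, a sketch of annotated markup (the `container` element and ARIA attributes follow the example shown later on this page; the `button` element and its attributes are illustrative):

```xml
<container role="navigation" aria-label="Main menu">
  <button id="open" aria-description="Opens a file picker">Open</button>
</container>
```

With annotations like these, the model sees [Navigation: Main menu] and knows what pressing the button does, rather than guessing from a bare [Container] and [Button].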

What It Can Do

Describe what's on screen

You
What am I looking at?
AI
You're viewing a settings panel with three tabs: General, Advanced, and About. The General tab is active, showing a theme dropdown set to "dark" and a language selector. There's a Save button at the bottom.

Operate the UI

The AI doesn't just describe — it acts. Ask it to do something and it uses tools to make it happen.

You
Switch the theme to light and save
send_event → theme dropdown → "light"
send_event → click save button
AI
Done. I changed the theme to light and clicked Save. The settings have been saved.

Read long content

Elements can contain more text than is visible in the viewport. The read_element tool fetches the full content.

You
Summarize the markdown document
read_element → markdown#docs
AI
The document covers three topics: installation (Deno and npm methods), configuration files (melker.json format and options), and environment variables for customizing runtime behavior. About 400 words total.

Draw on maps

For tile-map components, the AI can change the view and draw SVG overlays with paths and labels.

You
Draw a circle around London and label it
send_event → draw → <path d="M -0.1278 51.5574 A 0.05 0.05 ..." stroke="red"/><text lat="51.5074" lon="-0.1278">London</text>
AI
I drew a red circle around central London and added a label.
AI-drawn circle overlay on a tile map
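Since a single SVG arc command cannot close a full 360° circle, a circle is typically drawn as two half arcs. A sketch of what such an overlay might look like, assuming the path uses lon/lat coordinates as the example above suggests (center and radius chosen for illustration):

```xml
<path d="M -0.1778 51.5074
         A 0.05 0.05 0 1 1 -0.0778 51.5074
         A 0.05 0.05 0 1 1 -0.1778 51.5074"
      stroke="red" fill="none"/>
<text lat="51.5074" lon="-0.1278">London</text>
```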

Multi-turn conversations

The assistant remembers context within a session. Ask follow-up questions without repeating yourself.

You
What options are in the dropdown?
AI
The provider dropdown has: OpenStreetMap, Terrain, Streets, Voyager, and Voyager No Labels.
You
Switch to terrain
send_event → change provider → "terrain"
AI
Switched to Terrain view.

Voice Input

Press F7 or click "Listen" to start recording. Speak your question, then press F7 again or wait for the 5-second timeout. The audio is trimmed, transcribed, and submitted automatically.

During recording, a volume meter shows input level and a countdown shows remaining time.

Platform    Audio capture
macOS       Native Swift (AVAudioEngine) — no dependencies
Linux       ffmpeg with PulseAudio/PipeWire or ALSA
Windows     ffmpeg with DirectShow

Why It Works

Most "AI for UI" approaches take screenshots and ask a vision model to interpret the pixels. This is slow, expensive, and fragile — a button that moves 3 pixels can confuse the model.

Melker takes a different approach. Because it controls the rendering pipeline, it can give the model the source of truth: the document tree itself. The AI sees:

[Container: main, flex, column]
  [Text: "Welcome to Settings"]
  [Tabs: General | Advanced | About]  (active: General)
    [Container: row]
      [Text: "Theme"]
      [Select#theme: "dark"]  (options: light, dark, system)
    [Container: row]
      [Text: "Language"]
      [Select#lang: "en"]
  [Button: "Save"]

This is unambiguous. The model knows every element's type, ID, value, and position in the hierarchy. It can target elements by ID when using tools, so actions are precise.

The feedback loop is tight: after each tool call, the context is rebuilt from the current UI state. If clicking a button opens a dialog, the model sees the dialog in the next round. Three rounds of tool calls give enough room for multi-step operations.
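The loop above can be sketched as follows. All names here (`renderText`, `dispatch`, `model.chat`, and so on) are hypothetical stand-ins for the engine's internals, which the docs don't expose; the point is the shape of the loop, with context rebuilt from live UI state on every round:

```javascript
// Observe-act loop: rebuild context, call the model, dispatch any tool
// calls, repeat. Capped at three rounds, matching the tool-call budget.
const MAX_ROUNDS = 3;

async function runAssistant(question, ui, model) {
  const transcript = [{ role: "user", content: question }];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    // Fresh snapshot of the current document tree each round, so the
    // model sees dialogs or value changes caused by its previous actions.
    const context = {
      screen: ui.renderText(),        // visible content with labels/values
      tree: ui.elementTree(),         // full hierarchy with IDs and props
      actions: ui.availableActions(), // shortcuts and navigation options
    };
    const reply = await model.chat(transcript, context);
    if (!reply.toolCalls.length) return reply.text; // plain answer: done
    for (const call of reply.toolCalls) {
      const result = await ui.dispatch(call); // e.g. send_event, read_element
      transcript.push({ role: "tool", name: call.name, content: result });
    }
  }
  return "(tool budget exhausted)";
}
```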

ARIA attributes make this even better. A <container role="navigation" aria-label="Main menu"> shows as [Navigation: Main menu] instead of [Container]. The model understands the purpose of elements, not just their type.

Custom Tools

Apps can register their own tools, giving the AI domain-specific capabilities. A weather app might expose a "get forecast" tool. A database browser might expose "run query".

<script>
  // Register a domain-specific tool the AI can call alongside the built-ins.
  registerAITool({
    name: "search_products",
    description: "Search the product catalog",
    parameters: {
      query: { type: "string", required: true, description: "Search terms" },
      category: { type: "string", required: false, description: "Filter by category" }
    },
    // Runs when the model invokes the tool; the return value is fed back
    // to the model as the tool result.
    handler: async (args) => {
      const results = await searchAPI(args.query, args.category);
      updateResultsList(results);
      return { success: true, message: `Found ${results.length} products` };
    }
  });
</script>

When custom tools are registered, the system prompt tells the model about them. The model uses them alongside the built-in tools naturally.

You
Find me wireless keyboards under $50
search_products → query: "wireless keyboard", category: "peripherals"
AI
Found 12 wireless keyboards. The list now shows results filtered by price. The cheapest is the Logitech K380 at $29.99.

Surprising Things

It works on apps you didn't build

The assistant is part of the engine, not the app. Open any .melker file and press Alt+H. The AI can describe the UI, read content, and click buttons — even if the app author never thought about AI assistance. It's like having a screen reader that can also operate the controls.

It handles forms end-to-end

"Fill in the registration form with test data" — the model iterates through inputs, selects, checkboxes, and radio buttons in sequence, using a fresh context snapshot after each action. It sees what changed and adapts.

It draws on maps from natural language

"Draw a route from Paris to Berlin" produces SVG path overlays on tile-map components, with the AI choosing appropriate coordinates, stroke colors, and labels. It uses arc commands for circles, Bezier curves for smooth routes, and text elements for labels.

Voice makes terminal apps conversational

A terminal dashboard where you press F7 and say "zoom into the US east coast" or "what's the highest value in the heatmap" turns a display-only interface into something interactive without writing any handler code.

ARIA attributes compose with AI understanding

Standard web accessibility attributes gain a new dimension. aria-description="Shows real-time sensor readings, updates every 5s" doesn't just help screen readers — it tells the AI what the data means. The model can then answer "is the temperature sensor working?" by checking if values are updating.

It can close itself

The model has a close_dialog tool. "Check the current temperature and close" — it reads the value, responds, and dismisses its own dialog. It also has exit_program, which exits the app entirely.

Configuration

Environment variables control the model, endpoint, and audio behavior. All are read fresh on each API call — change them without restarting.

Variable              Default                        Purpose
OPENROUTER_API_KEY    (required)                     OpenRouter API key
MELKER_AI_MODEL       openai/gpt-5.2-chat            Chat/tool model
MELKER_AUDIO_MODEL    openai/gpt-4o-audio-preview    Audio transcription model
MELKER_AI_ENDPOINT    OpenRouter                     API endpoint URL
MELKER_AUDIO_GAIN     2.0                            Recording gain multiplier