v0.2 · early alpha · MCP record/replay in dev

The prompt lifecycle,
under instrument.

Manage, version, and release LLM prompts the way you ship code. Git-native specs, signed releases, JUnit-ready replay.

$ curl -fsSL https://raw.githubusercontent.com/promptLM/promptlm-app/main/scripts/install.sh | bash

promptlm.yaml · v1.0.0 → v1.1.0 · +2 −2 · just now

  id: customer_support
- version: 1.0.0
+ version: 1.1.0
  request:
    parameters:
-     temperature: 0.6
+     temperature: 0.3

03 — capabilities

One tool for the full loop —
from first prompt to thousandth regression.

MCP TESTING · PLANNED

Record real MCP sessions. Replay them deterministically.

Capture every tool call, freeze the responses, and run them back in CI. No flaky network. No vendor rate limits. Same bytes, every time.

MCP · TOOLS/CALENDAR · ● recording · 00:14

→ list_events           +0.012s
← list_events · 200     +0.184s
→ create_event          +0.224s
← create_event · 200    +0.412s
→ send_invite           +0.430s
← send_invite · 429     +1.124s
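
For readers who want to see the idea in code, here is a minimal sketch of record and replay, written in plain Java. It is not promptLM's API: `ReplaySketch`, `Transport`, `Recorder`, `Replayer`, and `Exchange` are hypothetical names, the "live" server is a stand-in lambda, and the `{}` payloads are placeholders. Only the tool names and status codes are taken from the calendar trace above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of record/replay. None of these names come from promptLM.
class ReplaySketch {

    /** One frozen exchange: which tool was called, with what, and what came back. */
    static final class Exchange {
        final String tool;
        final String request;
        final int status;
        final String body;

        Exchange(String tool, String request, int status, String body) {
            this.tool = tool;
            this.request = request;
            this.status = status;
            this.body = body;
        }
    }

    /** Anything that can answer a tool call: a live server, a recorder, or a replayer. */
    interface Transport {
        Exchange call(String tool, String request);
    }

    /** Record mode: forward each call to the live transport and append the result to the tape. */
    static final class Recorder implements Transport {
        private final Transport live;
        final List<Exchange> tape = new ArrayList<>();

        Recorder(Transport live) {
            this.live = live;
        }

        @Override
        public Exchange call(String tool, String request) {
            Exchange result = live.call(tool, request);
            tape.add(result);
            return result;
        }
    }

    /** Replay mode: serve frozen exchanges in recorded order and fail fast on divergence. */
    static final class Replayer implements Transport {
        private final List<Exchange> tape;
        private int next = 0;

        Replayer(List<Exchange> tape) {
            this.tape = new ArrayList<>(tape); // defensive copy so the tape stays frozen
        }

        @Override
        public Exchange call(String tool, String request) {
            if (next >= tape.size()) {
                throw new IllegalStateException("tape exhausted after " + next + " calls");
            }
            Exchange expected = tape.get(next);
            if (!expected.tool.equals(tool) || !expected.request.equals(request)) {
                throw new IllegalStateException(
                    "call " + next + " diverged: recorded " + expected.tool + ", got " + tool);
            }
            next++;
            return expected;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a live server; statuses mirror the calendar trace.
        Transport fakeLive = (tool, request) ->
            new Exchange(tool, request, tool.equals("send_invite") ? 429 : 200, "{}");

        Recorder recorder = new Recorder(fakeLive);
        for (String tool : new String[] {"list_events", "create_event", "send_invite"}) {
            recorder.call(tool, "{}");
        }

        // Replay touches no network: every response comes from the tape.
        Transport replay = new Replayer(recorder.tape);
        for (String tool : new String[] {"list_events", "create_event", "send_invite"}) {
            System.out.println(tool + " -> " + replay.call(tool, "{}").status);
        }
    }
}
```

The replayer throws as soon as the code under test makes a call the tape does not contain, so a changed call sequence shows up as a test failure instead of a silent mismatch.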

PROMPT REGRESSION

Diff prompts. Track quality. Catch drift before users do.

Every prompt is versioned. Every change runs the test suite. Pass rate, latency, and token counts land in the PR before review.

# promptlm.yaml
id: customer_support
group: support
version: 1.0.0
request:
  vendor: openai
  model: gpt-4o
  parameters: { temperature: 0.6, maxTokens: 256 }
  messages:
    - role: system
      content: You are a support assistant.
    - role: user
      content: "Summarize: {{ticket}}"
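
The `{{ticket}}` placeholder in the spec above implies a substitution step before the request is sent. The sketch below shows one way that step could work. It is an assumption for illustration, not promptLM's renderer: `TemplateSketch` and `render` are hypothetical names, and the choice to fail on a missing variable is a design choice made here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of {{name}} substitution. Not promptLM's renderer.
class TemplateSketch {

    // Matches {{name}} with optional whitespace inside the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    /** Replaces every {{name}} with its value and fails loudly if a variable is missing. */
    static String render(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = vars.get(name);
            if (value == null) {
                throw new IllegalArgumentException("missing variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String rendered = render(
            "Summarize: {{ticket}}",
            Map.of("ticket", "Login fails after password reset"));
        System.out.println(rendered); // prints: Summarize: Login fails after password reset
    }
}
```

Throwing on a missing variable, rather than sending the literal `{{ticket}}` to the model, keeps a broken template from passing a test run unnoticed.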

MODEL MOCKING

Mock any model.
Test the messy edges.

Inject malformed JSON. Force a refusal. Drop a tool call halfway through. Build the failure modes that production will eventually serve you — and write the tests that catch them.

Token streaming with synthetic latency
Tool-call sequences with deterministic seeds
Refusals, truncation, malformed JSON
Side-by-side comparison: gpt-4o vs claude-3-5-sonnet

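// TranslationServiceTest.java
// Assumes JUnit 5 (Jupiter), which supports test-method parameters.
// promptLM annotation imports are omitted; their packages are not shown here.
import org.junit.jupiter.api.Test;

@EnablePromptWireMock
class TranslationServiceTest {

  @Test
  void translates(
      @InjectPrompt(id = "translate-hello") String prompt,
      @InjectResponse(id = "translate-hello") String response) {
    // WireMock stubs auto-generated from your prompt repo
  }
}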
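
To make the failure modes listed above concrete, here is a framework-free sketch of fault injection. It does not use the promptLM annotations or WireMock: `FaultMockSketch`, `Model`, `Fault`, `mock`, and `summarize` are hypothetical names, and the reply shape `{"summary": "..."}` is assumed purely for the example. The regex check is deliberately narrow; real code would use a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a fault-injecting model mock. Not promptLM's API.
class FaultMockSketch {

    /** The smallest possible model client: prompt in, raw text out. */
    interface Model {
        String complete(String prompt);
    }

    enum Fault { NONE, MALFORMED_JSON, REFUSAL, TRUNCATED }

    /** Builds a mock that returns the good reply, or a deliberately broken variant of it. */
    static Model mock(String goodReply, Fault fault) {
        switch (fault) {
            case MALFORMED_JSON:
                return prompt -> goodReply.replace("\"", ""); // strip quotes: no longer valid JSON
            case REFUSAL:
                return prompt -> "I can't help with that request.";
            case TRUNCATED:
                return prompt -> goodReply.substring(0, goodReply.length() / 2);
            default:
                return prompt -> goodReply;
        }
    }

    // Accepts only the one assumed reply shape: {"summary": "<text>"}.
    private static final Pattern REPLY =
        Pattern.compile("\\{\"summary\":\\s*\"((?:[^\"\\\\]|\\\\.)*)\"\\}");

    /** Code under test: returns the summary, or a fallback when the reply is unusable. */
    static String summarize(Model model, String ticket) {
        String raw = model.complete("Summarize: " + ticket).trim();
        Matcher m = REPLY.matcher(raw);
        return m.matches() ? m.group(1) : "[summary unavailable]";
    }

    public static void main(String[] args) {
        String good = "{\"summary\": \"Customer cannot log in\"}";
        for (Fault fault : Fault.values()) {
            System.out.println(fault + " -> " + summarize(mock(good, fault), "ticket text"));
        }
    }
}
```

Each `Fault` value becomes one test case, so the fallback path is exercised on every run instead of waiting for a production incident to reveal it.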

GET STARTED

Ship prompts
like code.

