v0.2 · early alpha · MCP record/replay in dev

The prompt lifecycle,
under instrument.

Manage, version, and release LLM prompts the way you ship code. Git-native specs, signed releases, JUnit-ready replay.

$ curl -fsSL https://raw.githubusercontent.com/promptLM/promptlm-app/main/scripts/install.sh | bash

promptlm.yaml · v1.0.0 → v1.1.0 · +2 −2
  id: customer_support
- version: 1.0.0
+ version: 1.1.0
  request:
    parameters:
-     temperature: 0.6
+     temperature: 0.3

03 — capabilities

One tool for the full loop —
from first prompt to thousandth regression.

MCP TESTING · PLANNED

Record real MCP sessions. Replay them deterministically.

Capture every tool call, freeze the responses, and run them back in CI. No flaky network. No vendor rate limits. Same bytes, every time.

MCP · TOOLS/CALENDAR · ● recording · 00:14
list_events           +0.012s
list_events · 200     +0.184s
create_event          +0.224s
create_event · 200    +0.412s
send_invite           +0.430s
send_invite · 429     +1.124s
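
To make the record-then-replay idea concrete, here is a minimal Java sketch. It is not promptLM's implementation and uses no MCP SDK: `ReplaySketch`, `Recorder`, `Replayer`, `Exchange`, and `ToolResult` are hypothetical names, and the "live" server is a stand-in lambda. The recorder appends each tool call and its response to a tape; the replayer serves the frozen responses in order and fails fast if the call sequence diverges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch: none of these types come from promptLM or an MCP SDK.
public class ReplaySketch {

    /** One captured exchange: tool name, raw arguments, status code, response body. */
    record Exchange(String tool, String args, int status, String body) {}

    /** Result handed back to the caller in both live and replay mode. */
    record ToolResult(int status, String body) {}

    /** Wraps a live tool caller and appends every call to a tape. */
    static final class Recorder {
        private final BiFunction<String, String, ToolResult> live;
        private final List<Exchange> tape = new ArrayList<>();

        Recorder(BiFunction<String, String, ToolResult> live) {
            this.live = live;
        }

        ToolResult call(String tool, String args) {
            ToolResult result = live.apply(tool, args);
            tape.add(new Exchange(tool, args, result.status(), result.body()));
            return result;
        }

        List<Exchange> tape() {
            return List.copyOf(tape);
        }
    }

    /** Serves frozen responses in recorded order; no network is involved. */
    static final class Replayer {
        private final List<Exchange> tape;
        private int next = 0;

        Replayer(List<Exchange> tape) {
            this.tape = tape;
        }

        ToolResult call(String tool, String args) {
            if (next >= tape.size()) {
                throw new IllegalStateException("unexpected extra call: " + tool);
            }
            Exchange e = tape.get(next++);
            if (!e.tool().equals(tool) || !e.args().equals(args)) {
                throw new IllegalStateException(
                    "call diverged from tape: expected " + e.tool() + ", got " + tool);
            }
            return new ToolResult(e.status(), e.body());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a live calendar server; the 429 mirrors the send_invite row above.
        Recorder recorder = new Recorder((tool, a) ->
            tool.equals("send_invite")
                ? new ToolResult(429, "rate limited")
                : new ToolResult(200, "ok:" + tool));
        recorder.call("list_events", "{}");
        recorder.call("create_event", "{\"title\":\"sync\"}");
        recorder.call("send_invite", "{\"to\":\"a@example.com\"}");

        // Replay the same sequence from the tape.
        Replayer replayer = new Replayer(recorder.tape());
        System.out.println(replayer.call("list_events", "{}").status());
        System.out.println(replayer.call("create_event", "{\"title\":\"sync\"}").status());
        System.out.println(replayer.call("send_invite", "{\"to\":\"a@example.com\"}").status());
    }
}
```

Matching strictly on call order and arguments is one possible design choice: it turns any change in the agent's tool-call sequence into a test failure, which is the point of a deterministic replay. A real tool might match more loosely.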

PROMPT REGRESSION

Diff prompts. Track quality. Catch drift before users do.

Every prompt is versioned. Every change runs the suite. Pass-rate, latency, and tokens land in the PR before review.

# promptlm.yaml
id: customer_support
group: support
version: 1.0.0
request:
  vendor: openai
  model: gpt-4o
  parameters: { temperature: 0.6, maxTokens: 256 }
  messages:
    - role: system
      content: You are a support assistant.
    - role: user
      content: "Summarize: {{ticket}}"
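
As a rough illustration of how pass-rate, latency, and token counts could be rolled up into a single line for a pull request, here is a small Java sketch. It is an assumption about the shape of such a report, not promptLM's output format: `RegressionSummary`, `CaseResult`, `Summary`, `summarize`, and `prLine` are hypothetical names, and the numbers in `main` are made up.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: illustrative names only, not promptLM APIs.
public class RegressionSummary {

    /** Outcome of one suite case for one prompt version. */
    record CaseResult(boolean passed, long latencyMs, int tokens) {}

    /** Aggregate metrics for one prompt version. */
    record Summary(double passRate, double meanLatencyMs, int totalTokens) {}

    /** Reduces per-case results to pass rate, mean latency, and total tokens. */
    static Summary summarize(List<CaseResult> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("empty suite");
        }
        long passed = results.stream().filter(CaseResult::passed).count();
        double meanLatency =
            results.stream().mapToLong(CaseResult::latencyMs).average().orElse(0);
        int tokens = results.stream().mapToInt(CaseResult::tokens).sum();
        return new Summary((double) passed / results.size(), meanLatency, tokens);
    }

    /** One line comparing a candidate version against its baseline. */
    static String prLine(String from, Summary base, String to, Summary cand) {
        return String.format(Locale.ROOT,
            "%s -> %s: pass-rate %.0f%% -> %.0f%%, latency %.0f ms -> %.0f ms, tokens %d -> %d",
            from, to,
            base.passRate() * 100, cand.passRate() * 100,
            base.meanLatencyMs(), cand.meanLatencyMs(),
            base.totalTokens(), cand.totalTokens());
    }

    public static void main(String[] args) {
        // Made-up results purely for illustration.
        Summary v100 = summarize(List.of(
            new CaseResult(true, 400, 120),
            new CaseResult(false, 600, 140)));
        Summary v110 = summarize(List.of(
            new CaseResult(true, 300, 110),
            new CaseResult(true, 500, 130)));
        System.out.println(prLine("1.0.0", v100, "1.1.0", v110));
    }
}
```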

MODEL MOCKING

Mock any model.
Test the messy edges.

Inject malformed JSON. Force a refusal. Drop a tool call halfway through. Build the failure modes that production will eventually serve you — and write the tests that catch them.

  • Token streaming with synthetic latency
  • Tool-call sequences with deterministic seeds
  • Refusals, truncation, malformed JSON
  • Side-by-side: gpt-4o vs claude-3-5-sonnet
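
// TranslationServiceTest.java
// Assumes JUnit 5. Imports for the promptLM annotations are omitted
// because their package is not shown here.
import org.junit.jupiter.api.Test;

@EnablePromptWireMock
class TranslationServiceTest {

  @Test
  void translates(
      @InjectPrompt(id = "translate-hello") String prompt,
      @InjectResponse(id = "translate-hello") String response) {
    // WireMock stubs auto-generated from your prompt repo
  }
}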
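
The failure modes listed above can also be exercised with a hand-rolled fake, independent of any tooling. The following is a minimal Java sketch under stated assumptions: `FailureModeSketch`, `Mode`, `Model`, `mock`, and `extractSummary` are hypothetical names, the canned payloads are invented, and the strict prefix/suffix check is only a stand-in for a real JSON parser.

```java
import java.util.Optional;

// Hypothetical sketch: a hand-rolled fake, unrelated to promptLM's WireMock integration.
public class FailureModeSketch {

    enum Mode { OK, REFUSAL, TRUNCATED, MALFORMED_JSON }

    /** Minimal model abstraction that the code under test depends on. */
    interface Model {
        String complete(String prompt);
    }

    /** Fake model that returns a canned payload for the chosen failure mode. */
    static Model mock(Mode mode) {
        return prompt -> switch (mode) {
            case OK -> "{\"summary\":\"printer is offline\"}";
            case REFUSAL -> "I can't help with that.";
            case TRUNCATED -> "{\"summary\":\"printer is off";
            case MALFORMED_JSON -> "{summary: printer is offline}";
        };
    }

    /**
     * Code under test: returns the summary, or empty when the reply is unusable.
     * The shape check below stands in for real JSON parsing.
     */
    static Optional<String> extractSummary(Model model, String ticket) {
        String reply = model.complete("Summarize: " + ticket);
        String prefix = "{\"summary\":\"";
        String suffix = "\"}";
        boolean wellFormed = reply.startsWith(prefix)
            && reply.endsWith(suffix)
            && reply.length() >= prefix.length() + suffix.length();
        if (!wellFormed) {
            return Optional.empty();
        }
        return Optional.of(
            reply.substring(prefix.length(), reply.length() - suffix.length()));
    }

    public static void main(String[] args) {
        // Every failure mode should degrade to an empty result, never an exception.
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + extractSummary(mock(mode), "ticket #1"));
        }
    }
}
```

Enumerating the modes in one place makes it cheap to assert that each one degrades gracefully, which is the kind of test the section describes.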

GET STARTED

Ship prompts
like code.