
How I Built a Secure Multi-Model AI Gateway with Cloudflare Workers (Grok + GPT/Codex)


Why I Built This

I wanted a single AI entry point where users can switch between Grok and GPT/Codex, without login friction or unstable relay layers.

The target was simple:

Live entry: Open AI Chat Terminal


Architecture Overview

Browser (Astro frontend)
        ↓  POST /v1/chat
Cloudflare Worker (security gateway)
        ↓  forward request
Upstream AI providers (Grok / GPT / Codex)
        ↓  SSE stream
Worker passthrough
        ↓
Browser incremental rendering

This design removes traditional server ops completely. No VPS, no container orchestration, no long-running backend.
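To make the flow concrete, here is a minimal sketch of the Worker entry point under this design. handleChat is a placeholder for the gateway logic described in the rest of this post:

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    // Single chat endpoint; everything else is rejected at the edge.
    if (request.method === "POST" && url.pathname === "/v1/chat") {
      return handleChat(request, env);
    }
    return new Response("Not found", { status: 404 });
  },
};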


Security Hardening Strategy

1) Origin Gate (CORS + allowlist)

Never trust direct client requests.
Only requests from approved origins are accepted:

const ALLOWED_ORIGINS = ["https://your-domain.com", "http://localhost:4321"];

// Read the Origin header; requests without one (curl, scripts) fail the check too.
const origin = request.headers.get("Origin") ?? "";
if (!ALLOWED_ORIGINS.includes(origin)) {
  return new Response("Forbidden", { status: 403 });
}

This blocks cross-site abuse before touching expensive upstream APIs.
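For completeness, the CORS half of the gate can look roughly like this. A sketch only: the header set is kept minimal and the already-validated origin is simply echoed back.

// CORS half of the gate: answer preflight and echo the approved origin back.
function corsHeaders(origin) {
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

if (request.method === "OPTIONS") {
  // Preflight from an approved origin: reply without touching upstream.
  return new Response(null, { status: 204, headers: corsHeaders(origin) });
}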


2) Prompt Hardening (Token Cost Control)

Before forwarding messages, the gateway injects a hidden instruction that enforces concise, content-first responses and text-only constraints.
This reduces token waste from repetitive filler text and model drift.
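A sketch of that injection step, assuming an OpenAI-style messages array; the wording of the hidden instruction here is illustrative, not the exact prompt used in production.

// Prepend a gateway-owned system instruction before forwarding upstream.
function hardenMessages(messages) {
  const guard = {
    role: "system",
    content: "Answer concisely and content-first. Plain text only, no images.",
  };
  return [guard, ...messages];
}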


3) Abuse Intercept for Image-Bait Prompts

Some users try to bypass image quotas through chat mode.
The worker pre-checks short image-generation intents with regex and can return a synthetic SSE denial response without calling upstream.

Result: no token burn, better quota protection.
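Roughly how that intercept can look. The regex and length threshold are illustrative, and the denial frame assumes the frontend understands OpenAI-style delta chunks:

// Cheap pre-check: short prompts that look like image requests get a canned
// SSE reply instead of an upstream call.
const IMAGE_BAIT = /^(draw|generate|create|make)\b.*\b(image|picture|photo)\b/i;

function maybeDenyImageBait(lastUserMessage) {
  if (lastUserMessage.length > 80 || !IMAGE_BAIT.test(lastUserMessage)) return null;

  const note = "Image generation is not available in chat mode.";
  const body =
    `data: ${JSON.stringify({ choices: [{ delta: { content: note } }] })}\n\n` +
    "data: [DONE]\n\n";

  return new Response(body, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}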


4) Dual-Layer Rate Limiting (KV)

Cloudflare KV stores daily counters:

// One KV counter per (ip, type, day). Returns false once the daily cap is hit.
async function checkRateLimit(kv, ip, type, max) {
  const today = new Date().toISOString().split("T")[0]; // e.g. "2026-03-01"
  const key = `limit:${ip}:${type}:${today}`;
  const count = parseInt((await kv.get(key)) ?? "0", 10);
  if (count >= max) return false;
  await kv.put(key, String(count + 1), { expirationTtl: 86400 }); // auto-expire after 24h
  return true;
}

This protects against both single-IP abuse and high-volume attacks from proxy pools.
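One way the two layers can compose. This is a sketch: I am assuming the second layer is a global daily counter shared across all clients (which is what catches distributed traffic), allowChat is a hypothetical wrapper, and the caps are placeholder values.

// Layer 1: per-IP daily cap. Layer 2: global daily cap across all clients.
async function allowChat(kv, ip) {
  const perIpOk = await checkRateLimit(kv, ip, "chat", 50);
  const globalOk = await checkRateLimit(kv, "global", "chat", 5000);
  return perIpOk && globalOk;
}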


SSE Streaming in Worker: The Key Detail

To keep the typewriter UX, do not buffer the full upstream response.
Pass upstream.body through directly:

return new Response(upstream.body, {
  headers: {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no", // hint to nginx-style proxies not to buffer the stream
  },
});

On the frontend, parse SSE chunks incrementally with ReadableStream.getReader() and render progressively.
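A minimal version of that reader loop. It is a sketch that assumes OpenAI-style delta frames; appendToChat stands in for whatever does the progressive render.

async function streamChat(messages, appendToChat) {
  const res = await fetch("/v1/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line; keep any incomplete tail.
    const events = buffer.split("\n\n");
    buffer = events.pop();

    for (const event of events) {
      const data = event.replace(/^data: /, "").trim();
      if (!data || data === "[DONE]") continue;
      const delta = JSON.parse(data).choices?.[0]?.delta?.content ?? "";
      if (delta) appendToChat(delta);
    }
  }
}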


Frontend Notes (Astro + Lightweight JS)

The /chat page intentionally avoids heavy frameworks for fast cold start and resilient mobile UX.

Key points:


Deployment Notes

  1. Configure Worker and wrangler.toml
  2. Create KV namespace for limiter state
  3. Store secrets via wrangler secret put API_KEY
  4. Deploy Worker and bind your domain

This gives a practical AI gateway with strong security controls and near-zero ops cost.


Recent Updates (2026-03)

After launch, I shipped another round of practical fixes worth documenting:

These are not cosmetic tweaks—they are stability and consistency fixes discovered under real usage.


Closing

If you are building an AI terminal for public traffic, security and cost control are the real product.

These are not optional—they are the foundation.

Try the English AI Chat Terminal

