GPU-AS-A-SERVICE PLATFORM

AI Compute for
Manufacturing & R&D

Submit ML tasks to distributed GPU infrastructure. Defect detection, material property prediction — powered by 60 RTX 5070 GPUs across Poland. Mining when idle, AI on demand.

60
GPUs
10
Servers
2
AI Models
24/7
Monitoring

Client Dashboard

Upload data, create tasks, monitor real-time status, view results with heatmaps & SHAP analysis

Enter

Investor Dashboard

Monitor server fleet, toggle mining/AI, revenue analytics, ROI comparison

Enter

Expert Verification

Review AI results, approve/reject detections, provide domain expertise feedback

Enter
Cloudflare Edge · Hono API · Neon Postgres · GPU Agents · RTX 5070
System Architecture PROJECT_SPEC.md
CLOUDFLARE (Serverless Edge)
├─ user-application    TanStack Start + React 19 SSR + shadcn/ui
├─ data-service        Hono API on CF Workers (REST + SSE)
├─ Neon Postgres       Drizzle ORM, 12+ tables
├─ CF R2               Object storage (presigned URLs, tenant-isolated)
├─ CF Queue            ml-job-queue + parse-dispatch-queue + DLQ
└─ CF Cron             Heartbeat (60s), metrics (daily), DLQ drain (5min)
        |
        |  HTTPS (outbound only, pull-based, no inbound ports)
        v
GPU SERVERS (On-Premise, Poland)
├─ 10 servers x 6 RTX 5070 = 60 GPUs
├─ gpu-agent           Node.js daemon (heartbeat, job poll, docker mgmt)
├─ Docker containers   YOLO (GPU, port 5001) + XGBoost (CPU, port 5002)
└─ HiveOS Client       Mining orchestration (start/stop via API)
Frontend
TanStack Start, React 19, shadcn/ui, Tailwind v4
Backend
Hono on CF Workers, <30s CPU limit
Auth
Better Auth, email+password, role-based
Database
Neon Postgres + Drizzle ORM
Monorepo
pnpm workspaces: user-app, data-service, gpu-agent, data-ops
Design Docs
8 design docs, 13,000+ lines, 100+ API endpoints, all P0 issues resolved
Key decision: Pull-based job dispatch (works behind NAT without fixed IPs). Presigned URLs bypass CF Worker memory limits. Single task = 1 server, 1 GPU for MVP simplicity.
MVP timeline: 3 months. Month 1: foundation + infra. Month 2: models + billing. Month 3: dashboards + launch.
ML Compute Exchange — Interactive Prototype — All data is simulated

JK
Jan Kowalski
ID Model Status Created Files Duration Cost Server
Under the Hood: Task Pipeline 03-compute-agent.md
Dispatch
Cloudflare Queue ml-job-queue
Job Claim
SELECT FOR UPDATE SKIP LOCKED
Real-time
SSE via GET /tasks/:id/stream
Allocation
1 task = 1 server, 1 GPU (MVP)
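A minimal sketch of the "Job Claim" step above, using the postgres.js client for illustration; table and column names are assumptions, not the actual schema:

```typescript
// Atomic job claim via SELECT ... FOR UPDATE SKIP LOCKED: concurrent claimers
// each grab a different queued task without blocking or double-claiming.
// Table and column names here are illustrative.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export async function claimNextTask(serverId: string) {
  return sql.begin(async (tx) => {
    // Lock the oldest queued task; rows locked by another claimer are skipped.
    const [task] = await tx`
      SELECT id, model_id, parameters
      FROM tasks
      WHERE status = 'queued'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    `;
    if (!task) return null; // nothing queued

    // Assign it to the claiming server inside the same transaction.
    await tx`
      UPDATE tasks
      SET status = 'assigned', assigned_server_id = ${serverId}
      WHERE id = ${task.id}
    `;
    return task;
  });
}
```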
API endpoints:
POST /tasks
GET /tasks?page=1&limit=20&status=running
GET /tasks/:id/stream (SSE)
POST /tasks/:id/cancel
Status flow:
queued → assigned → running → completed
Atomicity: task completion + usage_record creation + server release in single DB transaction. Rollback if any step fails.
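A hedged sketch of that single transaction, again with the postgres.js client; rate lookup and cost_pln computation are covered in the billing section and omitted here, and column names are simplified:

```typescript
// Task completion, usage_record creation, and server release in one
// transaction: if any statement throws, everything rolls back.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export async function completeTask(taskId: string, resultR2Key: string) {
  await sql.begin(async (tx) => {
    const [task] = await tx`
      UPDATE tasks
      SET status = 'completed', completed_at = now(), result_r2_key = ${resultR2Key}
      WHERE id = ${taskId} AND status = 'running'
      RETURNING tenant_id, model_id, gpu_count, started_at, completed_at, assigned_server_id
    `;

    // Usage record for metering (rate lookup and cost omitted in this sketch).
    await tx`
      INSERT INTO usage_records (task_id, tenant_id, model_id, gpu_count, started_at, completed_at)
      VALUES (${taskId}, ${task.tenant_id}, ${task.model_id}, ${task.gpu_count},
              ${task.started_at}, ${task.completed_at})
    `;

    // Release the server so the next queued task can be assigned to it.
    await tx`
      UPDATE gpu_servers
      SET assigned_task_id = NULL
      WHERE id = ${task.assigned_server_id}
    `;
  });
}
```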

Recent Uploads

Under the Hood: Data Pipeline 02-data-pipeline.md
Upload Method
Presigned R2 URLs 60min TTL
Storage
Cloudflare R2, tenant-isolated prefixes
Parsing
Async via parse-dispatch-queue
Formats
JPG, PNG, CSV, PDF, DOCX, XML, ZIP
Flow:
Client → POST /uploads/presign → direct PUT to R2 → POST /uploads/confirm → queue: parse
Files never pass through CF Workers (bypass 128MB memory limit). Multipart presigned URLs for large files. R2 path: {tenant_id}/uploads/{upload_id}/{filename}
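A minimal client-side sketch of that flow; the presign/confirm payload shapes are assumptions, not the exact API contract:

```typescript
// Presigned upload: only two small JSON calls hit the Worker, the file itself
// goes straight to R2. Payload shapes below are illustrative.
export async function uploadFile(file: File, apiBase: string, token: string) {
  const json = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };

  // 1. Request a presigned R2 PUT URL (60 min TTL).
  const presign: { uploadId: string; url: string } = await fetch(`${apiBase}/uploads/presign`, {
    method: "POST",
    headers: json,
    body: JSON.stringify({ filename: file.name, sizeBytes: file.size, mimeType: file.type }),
  }).then((r) => r.json());

  // 2. PUT the file directly to R2; it never passes through a CF Worker.
  await fetch(presign.url, { method: "PUT", body: file });

  // 3. Confirm, which enqueues parsing on parse-dispatch-queue.
  await fetch(`${apiBase}/uploads/confirm`, {
    method: "POST",
    headers: json,
    body: JSON.stringify({ uploadId: presign.uploadId }),
  });

  return presign.uploadId;
}
```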
uploads
├─ id, tenant_id, r2_key
├─ status: uploaded → parsing → parsed → failed
├─ parsed_r2_key (chunks.json for docs, null for images)
└─ chunk_count, mime_type, size_bytes

Defect Detection

YOLO v11 — Computer Vision

GPU required 15 PLN/GPU-h
Input: Product images (JPG/PNG)
Output: Bounding boxes, heatmaps, PDF

Composite Prediction

XGBoost Surrogate + SHAP

CPU only 5 PLN/GPU-h
Input: Material composition (CSV)
Output: Predictions + recommendations

Task Submitted!

Redirecting to task list...

Under the Hood: Model Serving 04-model-serving.md
YOLO v11
Docker: node:20-slim, GPU 1, port 5001
ONNX runtime, yolo11n.onnx weights
VRAM: ~4GB, inference: POST /predict
XGBoost Surrogate
Docker: FastAPI, CPU-only, port 5002
.ubj weight files + SHAP TreeExplainer
POST /predict + POST /recommend
Health checks: Every 30s on each container. Managed by gpu-agent via docker-compose.
Dynamic form: GET /models/:id/params returns field schema (type, min/max, defaults) per model.
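A sketch of the field schema that endpoint could return, as consumed by the dynamic form; field names are assumptions:

```typescript
// Illustrative shape of the GET /models/:id/params response used to render
// the dynamic task form. Names are assumptions, not the exact contract.
interface ModelParamField {
  name: string;                       // e.g. "confidence_threshold"
  label: string;                      // human-readable label for the form
  type: "number" | "string" | "boolean" | "enum";
  min?: number;                       // bounds when type === "number"
  max?: number;
  options?: string[];                 // allowed values when type === "enum"
  default: number | string | boolean;
  required: boolean;
}

interface ModelParamsResponse {
  modelId: string;
  fields: ModelParamField[];
}
```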
models
├─ id, name, domain ('manufacturing'|'materials')
├─ base_model, quantization, vram_required_mb
├─ docker_image, serve_config (jsonb)
└─ status: active | training | deprecated
Under the Hood: GPU Agent Job Execution 03-compute-agent.md
Pull-based architecture — agents initiate ALL connections (no inbound ports, works behind NAT):
Agent polls GET /agent/jobs/next → atomic claim → download from R2 → Docker inference → upload result to R2 → POST status
GET /agent/jobs/next → {taskId, modelId, inputR2Keys[], parameters}
GET /agent/jobs/:id/input-urls → presigned download URLs
POST /agent/jobs/:id/result-url → presigned upload URL
POST /agent/jobs/:id/status {status, resultR2Key}
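A condensed sketch of the agent loop over those endpoints; the download/inference/upload helpers, the 204 "no work" response, and the result-url response shape are all assumptions:

```typescript
// Pull-based agent loop: the agent initiates every request, so no inbound
// ports are needed. The declared helpers stand in for the real download,
// Docker inference, and upload logic.
interface Job {
  taskId: string;
  modelId: string;
  inputR2Keys: string[];
  parameters: Record<string, unknown>;
}

declare function downloadAll(urls: string[], dir: string): Promise<void>;
declare function runInference(job: Job, inputDir: string): Promise<string>; // returns result file path
declare function uploadResult(path: string, putUrl: string): Promise<void>;

export async function pollLoop(api: string, agentToken: string) {
  const headers = { Authorization: `Bearer ${agentToken}` };

  while (true) {
    const next = await fetch(`${api}/agent/jobs/next`, { headers });
    if (next.status === 204) {                      // assumed "no work" signal
      await new Promise((r) => setTimeout(r, 5_000));
      continue;
    }
    const job: Job = await next.json();

    // Stage inputs locally via presigned download URLs.
    const { urls } = await fetch(`${api}/agent/jobs/${job.taskId}/input-urls`, { headers })
      .then((r) => r.json());
    await downloadAll(urls, "/tmp/inputs");

    // Run inference against the local container (YOLO :5001 or XGBoost :5002).
    const resultPath = await runInference(job, "/tmp/inputs");

    // Upload the result straight to R2, then report completion.
    const { url, r2Key } = await fetch(`${api}/agent/jobs/${job.taskId}/result-url`, {
      method: "POST",
      headers,
    }).then((r) => r.json());
    await uploadResult(resultPath, url);

    await fetch(`${api}/agent/jobs/${job.taskId}/status`, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ status: "completed", resultR2Key: r2Key }),
    });
  }
}
```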

Analysis: TSK-001 — Defect Detection

Images Analyzed
128
Defects Found
12
Pass Rate
90.6%
Avg Confidence
87%

Detection Heatmap — Sample Image

Defect Type Distribution

Most Common
Scratch (35%)
Highest Confidence
Scratch (94%)

SHAP Feature Impact — Composite Prediction (TSK-002)

XGBoost Surrogate
Optimization Recommendations
Increase cure_temp_c by 15°C
+12% predicted tensile strength
Increase fiber_volume to 62%
+8% predicted tensile strength
Switch resin_type to Epoxy-B
+5% predicted tensile strength
Current prediction 842 MPa
Optimized prediction 967 MPa (+14.8%)
Under the Hood: Result Generation 04-model-serving.md + 05-client-dashboard.md
YOLO Heatmap
Gaussian blobs at detection centers. Per-image + batch aggregate (spatial defect pattern). Canvas overlay with bounding box corners.
XGBoost SHAP
SHAP TreeExplainer → ranked parameter impact. Horizontal bar chart + optimization delta.
PDF Report
Auto-generated: summary stats, per-image detections, heatmaps, pass/fail classification, batch statistics.
Visualization
React 19 + recharts for charts. Canvas for image overlays. shadcn/ui components.
Fine-tuning: Min 500 labeled images (YOLO format: .txt with class_id center_x center_y width height). LoRA adapter merge → redeploy (post-MVP).
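A rough sketch of the YOLO heatmap overlay described above, painting one radial (Gaussian-like) blob per detection center on a canvas; blob radius and colors are assumptions:

```typescript
// Illustrative heatmap overlay: one additive radial blob per detection center.
// The real report generator may differ; this only shows the drawing idea.
interface Detection {
  cx: number;          // detection center x, in image pixels
  cy: number;          // detection center y, in image pixels
  confidence: number;  // 0..1, used as blob intensity
}

export function drawHeatmap(canvas: HTMLCanvasElement, detections: Detection[]) {
  const ctx = canvas.getContext("2d")!;
  ctx.globalCompositeOperation = "lighter"; // overlapping blobs accumulate

  for (const d of detections) {
    const radius = 40; // assumed blob radius in pixels
    const g = ctx.createRadialGradient(d.cx, d.cy, 0, d.cx, d.cy, radius);
    g.addColorStop(0, `rgba(255, 0, 0, ${0.6 * d.confidence})`);
    g.addColorStop(1, "rgba(255, 0, 0, 0)");
    ctx.fillStyle = g;
    ctx.fillRect(d.cx - radius, d.cy - radius, radius * 2, radius * 2);
  }
}
```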
This Month
14.38 PLN
2 tasks completed
Total GPU Time
23m 20s
Across 6 tasks
Avg Cost/Task
2.40 PLN
Last 30 days

Usage History

Task Model GPU Time Rate Cost Date
Under the Hood: Billing Engine 07-billing.md
Metering Formula
duration = completed_at - started_at
gpu_seconds = duration * gpu_count
cost = (gpu_seconds / 3600) * rate
Rate Lookup
By model_id + effective time window. Rates are never deleted (audit trail); a new rate closes the previous one's effective window with microsecond precision.
Atomicity
Task completion + usage_record creation + server release = single DB transaction. Rollback on failure.
Seed Rates
YOLO: 15 PLN/GPU-h, XGBoost: 5 PLN/GPU-h
usage_records
├─ task_id (unique), tenant_id, model_id
├─ gpu_count, started_at, completed_at
├─ duration_seconds, gpu_seconds
├─ rate_applied, cost_pln
└─ partial (false for MVP)
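The metering formula above as a small TypeScript helper; rounding to 0.01 PLN is an assumption:

```typescript
// Mirrors the metering formula: cost = (gpu_seconds / 3600) * rate.
// Rounding policy is an assumption; the real billing engine may differ.
export function computeCostPln(
  startedAt: Date,
  completedAt: Date,
  gpuCount: number,
  ratePlnPerGpuHour: number,
): number {
  const durationSeconds = (completedAt.getTime() - startedAt.getTime()) / 1000;
  const gpuSeconds = durationSeconds * gpuCount;
  const cost = (gpuSeconds / 3600) * ratePlnPerGpuHour;
  return Math.round(cost * 100) / 100;
}

// Example: a 14-minute YOLO task on 1 GPU at 15 PLN/GPU-h costs 3.50 PLN.
```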

Live — updates every 30s
Total Revenue (Feb)
PLN
+18% vs Jan
GPU Hours Sold
+24% vs Jan
Utilization
%
AI Tasks Processed
Last 30 days

Revenue Timeline

GPU Hours (This Week)

Mining vs AI Compute — Monthly Comparison

MINING
4,100 PLN
~6.2 servers avg
52% of revenue
AI COMPUTE
3,800 PLN
~3.0 servers avg
48% of revenue
ROI DELTA
+38%
AI revenue per GPU is 38% higher than mining per GPU
AI compute is more profitable per server
Under the Hood: Metrics & Monitoring 06-investor-dashboard.md
Metrics Cache
daily_metrics_cache pre-computed by CF Cron at 00:05 UTC daily. Avoids expensive joins on dashboard.
Refresh Rate
Servers: 30s, Charts: 60s. Dashboard polls, not SSE (simpler for investor view).
Comparison Logic
Revenue per GPU-hour: AI vs mining, weighted by server count in each mode. ROI delta = (ai_rev_per_gpu - mining_rev_per_gpu) / mining_rev_per_gpu.
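The delta itself reduces to a one-liner; the per-GPU-hour revenue figures are computed upstream from daily_metrics_cache:

```typescript
// ROI delta as defined above: how much more AI compute earns per GPU-hour
// than mining, expressed as a fraction of the mining figure.
export function roiDelta(aiRevPerGpuHour: number, miningRevPerGpuHour: number): number {
  return (aiRevPerGpuHour - miningRevPerGpuHour) / miningRevPerGpuHour;
}

// A value of 0.38 renders as "+38%" on the dashboard.
```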
GET /investor/metrics?period=daily|weekly|monthly
GET /investor/comparison?period=monthly
GET /investor/revenue?date_from=X&date_to=Y → CSV
daily_metrics_cache
├─ date, gpu_hours, revenue_pln, task_count
├─ online_gpu_seconds, utilization_pct
└─ computed_at (CF Cron: 00:05 UTC)
Online · Offline · Switching · Maintenance | Mining · AI Compute
Under the Hood: GPU Infrastructure & Mode Switch 01-gpu-infrastructure.md + 08-provisioning.md
Agent
Node.js daemon per server. Heartbeat every 30s (GPU metrics, mode, bandwidth). Token auth 90d rotation.
Registration
POST /agent/register with X-Install-Token → agentToken + serverId. Auto re-register on expiry.
Network
Pull-based, no inbound ports. 10 servers behind NAT, different ISPs. Agent initiates ALL connections.
Hardware
Ubuntu 22.04, 6x RTX 5070 (12GB VRAM), driver 550+, Docker + NVIDIA Container Toolkit.
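An illustrative shape for the 30-second heartbeat payload; field names follow the gpu_servers / gpu_metrics columns shown below but are not the exact wire format:

```typescript
// One heartbeat per server every 30s, carrying per-GPU metrics.
// Field names are assumptions based on the schemas on this page.
interface GpuReading {
  gpuIndex: number;
  temperatureC: number;
  utilizationPct: number;
  vramUsedMb: number;
  powerDrawW: number;
  fanSpeedPct: number;
}

interface HeartbeatPayload {
  serverId: string;
  agentVersion: string;
  mode: "mining" | "ai" | "switching";
  bandwidthMbps: number;
  gpus: GpuReading[]; // six entries per server (6x RTX 5070)
}
```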
Mode switch lifecycle (mining → AI):
PUT /mode → mode='switching' → heartbeat: desired_mode → HiveOS: stop mining → wait GPU ≈ 0% → docker compose up → health check (2s x 30) → mode='ai'
Rollback: if a mode switch is stuck for >5 min, CF Cron rolls it back. If Docker health checks fail for 60s, the agent reports switchFailed. If the agent crashes, it detects the stale 'switching' state on startup and cleans up.
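A condensed sketch of the agent side of that lifecycle; the HiveOS, GPU-polling, Docker, and reporting helpers are illustrative placeholders:

```typescript
// Mining -> AI switch on the agent. Declared helpers stand in for the real
// HiveOS API, nvidia-smi polling, docker compose, and status reporting.
declare function hiveosStopMining(): Promise<void>;
declare function gpusIdle(): Promise<boolean>;           // true when GPU utilization is ~0%
declare function dockerComposeUp(): Promise<void>;
declare function containersHealthy(): Promise<boolean>;
declare function report(status: "switching" | "ai" | "switchFailed"): Promise<void>;

export async function switchToAi(): Promise<void> {
  await report("switching");
  await hiveosStopMining();

  // Wait for mining processes to release the GPUs.
  while (!(await gpusIdle())) {
    await new Promise((r) => setTimeout(r, 2_000));
  }

  await dockerComposeUp();

  // Health check: every 2s, up to 30 attempts (~60s), then report failure.
  for (let attempt = 0; attempt < 30; attempt++) {
    if (await containersHealthy()) {
      await report("ai");
      return;
    }
    await new Promise((r) => setTimeout(r, 2_000));
  }
  await report("switchFailed");
}
```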
gpu_servers
├─ agent_token (90d expiry), agent_version
├─ name, location, gpu_count, gpu_model
├─ mode ('mining'|'ai'|'switching'), desired_mode
├─ status ('online'|'offline'|'maintenance')
├─ last_heartbeat, assigned_task_id
└─ bandwidth_mbps

gpu_metrics (append-only, 7d retention)
├─ server_id, gpu_index
├─ temperature_c, utilization_pct, vram_used_mb
└─ power_draw_w, fan_speed_pct, recorded_at

Revenue Breakdown

Period | Mining Revenue | AI Revenue | Total | GPU Hours | Tasks
February 2025 | 4,100 PLN | 3,800 PLN | 7,900 PLN | 1,192 h | 847
January 2025 | 4,400 PLN | 2,300 PLN | 6,700 PLN | 980 h | 623
December 2024 | 4,600 PLN | 1,200 PLN | 5,800 PLN | 820 h | 412

Revenue Trend

+36%
3-month growth
3.2x
AI revenue growth
1,267 PLN/server
AI rev per server (Feb)
661 PLN/server
Mining rev per server (Feb)

Verification Queue

AW
Dr. Anna Wisniewska
Under the Hood: Human-in-the-Loop Verification 03-compute-agent.md
Workflow
Task completes → verification record created (status=pending) → expert reviews → accept/reject with comment → approved = billable
Roles
Better Auth role-based: client, expert, investor, admin. Session-based auth (email+password).
GET /verifications/pending (role=expert)
POST /verifications {taskId, verdict, comment}
Rejected results loop back to client. Approved results finalize the billing record. Post-MVP: partial billing for tasks running >60s even if failed.
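A minimal sketch of the verdict endpoint as a Hono route; the Better Auth role check, the db client, and the billable/rejected columns are simplified assumptions:

```typescript
// POST /verifications: record the expert's verdict, then either finalize the
// billing record or send the task back to the client. Column names and the
// 'billable' flag are illustrative. A Better Auth middleware enforcing
// role=expert is assumed to run before this handler.
import { Hono } from "hono";
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);
const app = new Hono();

app.post("/verifications", async (c) => {
  const { taskId, verdict, comment } = await c.req.json<{
    taskId: string;
    verdict: "approved" | "rejected";
    comment?: string;
  }>();

  await sql.begin(async (tx) => {
    await tx`
      UPDATE verifications
      SET status = ${verdict}, comment = ${comment ?? null}, reviewed_at = now()
      WHERE task_id = ${taskId} AND status = 'pending'
    `;
    if (verdict === "approved") {
      // Approved result finalizes the billing record.
      await tx`UPDATE usage_records SET billable = true WHERE task_id = ${taskId}`;
    } else {
      // Rejected result loops back to the client for follow-up.
      await tx`UPDATE tasks SET status = 'rejected' WHERE id = ${taskId}`;
    }
  });

  return c.json({ ok: true });
});

export default app;
```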