GPU-AS-A-SERVICE PLATFORM

AI Compute for
Manufacturing & R&D

Submit ML tasks to distributed GPU infrastructure. Defect detection, material property prediction — powered by 60 RTX 5070 GPUs across Poland. Mining when idle, AI on demand.

60
GPUs
10
Servers
2
AI Models
24/7
Monitoring

Client Dashboard

Upload data, create tasks, monitor real-time status, view results with heatmaps & SHAP analysis

Enter

Investor Dashboard

Monitor server fleet, toggle mining/AI, revenue analytics, ROI comparison

Enter

Expert Verification

Review AI results, approve/reject detections, provide domain expertise feedback

Enter
Cloudflare Edge · Hono API · Neon Postgres · GPU Agents · RTX 5070
System Architecture PROJECT_SPEC.md
CLOUDFLARE (Serverless Edge)
├─ user-application    TanStack Start + React 19 SSR + shadcn/ui
├─ data-service        Hono API on CF Workers (REST + SSE)
├─ Neon Postgres       Drizzle ORM, 12+ tables
├─ CF R2               Object storage (presigned URLs, tenant-isolated)
├─ CF Queue            ml-job-queue + parse-dispatch-queue + DLQ
└─ CF Cron             Heartbeat (60s), metrics (daily), DLQ drain (5min)
        |
        |  HTTPS (outbound only, pull-based, no inbound ports)
        v
GPU SERVERS (On-Premise, Poland)
├─ 10 servers x 6 RTX 5070 = 60 GPUs
├─ gpu-agent           Node.js daemon (heartbeat, job poll, docker mgmt)
├─ Docker containers   YOLO (GPU, port 5001) + XGBoost (CPU, port 5002)
└─ HiveOS Client       Mining orchestration (start/stop via API)
Frontend
TanStack Start, React 19, shadcn/ui, Tailwind v4
Backend
Hono on CF Workers, <30s CPU limit
Auth
Better Auth, email+password, role-based
Database
Neon Postgres + Drizzle ORM
Monorepo
pnpm workspaces: user-app, data-service, gpu-agent, data-ops
Design Docs
8 design docs, 13,000+ lines, 100+ API endpoints, all P0 issues resolved
Key decision: Pull-based job dispatch (works behind NAT without fixed IPs). Presigned URLs bypass CF Worker memory limits. Single task = 1 server, 1 GPU for MVP simplicity.
MVP timeline: 3 months. Month 1: foundation + infra. Month 2: models + billing. Month 3: dashboards + launch.
ML Compute Exchange — Interactive Prototype — All data is simulated

JK
Jan Kowalski
ID Model Status Created Files Duration Cost Server
Under the Hood: Task Pipeline 03-compute-agent.md
Dispatch
Cloudflare Queue ml-job-queue
Job Claim
SELECT FOR UPDATE SKIP LOCKED
Real-time
SSE via GET /tasks/:id/stream
Allocation
1 task = 1 server, 1 GPU (MVP)
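A minimal sketch of the "Job Claim" step above, using the postgres.js client for illustration; table and column names are assumptions, not the actual schema:

```typescript
// Atomic job claim via SELECT ... FOR UPDATE SKIP LOCKED: concurrent claimers
// each grab a different queued task without blocking or double-claiming.
// Table and column names here are illustrative.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export async function claimNextTask(serverId: string) {
  return sql.begin(async (tx) => {
    // Lock the oldest queued task; rows locked by another claimer are skipped.
    const [task] = await tx`
      SELECT id, model_id, parameters
      FROM tasks
      WHERE status = 'queued'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    `;
    if (!task) return null; // nothing queued

    // Assign it to the claiming server inside the same transaction.
    await tx`
      UPDATE tasks
      SET status = 'assigned', assigned_server_id = ${serverId}
      WHERE id = ${task.id}
    `;
    return task;
  });
}
```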
API endpoints:
POST /tasks
GET /tasks?page=1&limit=20&status=running
GET /tasks/:id/stream (SSE)
POST /tasks/:id/cancel
Status flow:
queued → assigned → running → completed
Atomicity: task completion + usage_record creation + server release in single DB transaction. Rollback if any step fails.
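A hedged sketch of that single transaction, again with the postgres.js client; rate lookup and cost_pln computation are covered in the billing section and omitted here, and column names are simplified:

```typescript
// Task completion, usage_record creation, and server release in one
// transaction: if any statement throws, everything rolls back.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export async function completeTask(taskId: string, resultR2Key: string) {
  await sql.begin(async (tx) => {
    const [task] = await tx`
      UPDATE tasks
      SET status = 'completed', completed_at = now(), result_r2_key = ${resultR2Key}
      WHERE id = ${taskId} AND status = 'running'
      RETURNING tenant_id, model_id, gpu_count, started_at, completed_at, assigned_server_id
    `;

    // Usage record for metering (rate lookup and cost omitted in this sketch).
    await tx`
      INSERT INTO usage_records (task_id, tenant_id, model_id, gpu_count, started_at, completed_at)
      VALUES (${taskId}, ${task.tenant_id}, ${task.model_id}, ${task.gpu_count},
              ${task.started_at}, ${task.completed_at})
    `;

    // Release the server so the next queued task can be assigned to it.
    await tx`
      UPDATE gpu_servers
      SET assigned_task_id = NULL
      WHERE id = ${task.assigned_server_id}
    `;
  });
}
```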

Recent Uploads

Under the Hood: Data Pipeline 02-data-pipeline.md
Upload Method
Presigned R2 URLs 60min TTL
Storage
Cloudflare R2, tenant-isolated prefixes
Parsing
Async via parse-dispatch-queue
Formats
JPG, PNG, CSV, PDF, DOCX, XML, ZIP
Flow:
Client → POST /uploads/presign → direct PUT to R2 → POST /uploads/confirm → queue: parse
Files never pass through CF Workers (bypass 128MB memory limit). Multipart presigned URLs for large files. R2 path: {tenant_id}/uploads/{upload_id}/{filename}
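A minimal client-side sketch of that flow; the presign/confirm payload shapes are assumptions, not the exact API contract:

```typescript
// Presigned upload: only two small JSON calls hit the Worker, the file itself
// goes straight to R2. Payload shapes below are illustrative.
export async function uploadFile(file: File, apiBase: string, token: string) {
  const json = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };

  // 1. Request a presigned R2 PUT URL (60 min TTL).
  const presign: { uploadId: string; url: string } = await fetch(`${apiBase}/uploads/presign`, {
    method: "POST",
    headers: json,
    body: JSON.stringify({ filename: file.name, sizeBytes: file.size, mimeType: file.type }),
  }).then((r) => r.json());

  // 2. PUT the file directly to R2; it never passes through a CF Worker.
  await fetch(presign.url, { method: "PUT", body: file });

  // 3. Confirm, which enqueues parsing on parse-dispatch-queue.
  await fetch(`${apiBase}/uploads/confirm`, {
    method: "POST",
    headers: json,
    body: JSON.stringify({ uploadId: presign.uploadId }),
  });

  return presign.uploadId;
}
```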
uploads
├─ id, tenant_id, r2_key
├─ status: uploaded → parsing → parsed → failed
├─ parsed_r2_key (chunks.json for docs, null for images)
└─ chunk_count, mime_type, size_bytes

Defect Detection

YOLO v11 — Computer Vision

GPU required 15 PLN/GPU-h
Input: Product images (JPG/PNG)
Output: Bounding boxes, heatmaps, PDF

Composite Prediction

XGBoost Surrogate + SHAP

CPU only 5 PLN/GPU-h
Input: Material composition (CSV)
Output: Predictions + recommendations

Task Submitted!

Redirecting to task list...

Under the Hood: Model Serving 04-model-serving.md
YOLO v11
Docker: node:20-slim, GPU 1, port 5001
ONNX runtime, yolo11n.onnx weights
VRAM: ~4GB, inference: POST /predict
XGBoost Surrogate
Docker: FastAPI, CPU-only, port 5002
.ubj weight files + SHAP TreeExplainer
POST /predict + POST /recommend
Health checks: Every 30s on each container. Managed by gpu-agent via docker-compose.
Dynamic form: GET /models/:id/params returns field schema (type, min/max, defaults) per model.
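A sketch of the field schema that endpoint could return, as consumed by the dynamic form; field names are assumptions:

```typescript
// Illustrative shape of the GET /models/:id/params response used to render
// the dynamic task form. Names are assumptions, not the exact contract.
interface ModelParamField {
  name: string;                       // e.g. "confidence_threshold"
  label: string;                      // human-readable label for the form
  type: "number" | "string" | "boolean" | "enum";
  min?: number;                       // bounds when type === "number"
  max?: number;
  options?: string[];                 // allowed values when type === "enum"
  default: number | string | boolean;
  required: boolean;
}

interface ModelParamsResponse {
  modelId: string;
  fields: ModelParamField[];
}
```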
models
├─ id, name, domain ('manufacturing'|'materials')
├─ base_model, quantization, vram_required_mb
├─ docker_image, serve_config (jsonb)
└─ status: active | training | deprecated
Under the Hood: GPU Agent Job Execution 03-compute-agent.md
Pull-based architecture — agents initiate ALL connections (no inbound ports, works behind NAT):
Agent polls GET /agent/jobs/next → atomic claim → download from R2 → Docker inference → upload result to R2 → POST status
GET /agent/jobs/next → {taskId, modelId, inputR2Keys[], parameters}
GET /agent/jobs/:id/input-urls → presigned download URLs
POST /agent/jobs/:id/result-url → presigned upload URL
POST /agent/jobs/:id/status {status, resultR2Key}
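A condensed sketch of the agent loop over those endpoints; the download/inference/upload helpers, the 204 "no work" response, and the result-url response shape are all assumptions:

```typescript
// Pull-based agent loop: the agent initiates every request, so no inbound
// ports are needed. The declared helpers stand in for the real download,
// Docker inference, and upload logic.
interface Job {
  taskId: string;
  modelId: string;
  inputR2Keys: string[];
  parameters: Record<string, unknown>;
}

declare function downloadAll(urls: string[], dir: string): Promise<void>;
declare function runInference(job: Job, inputDir: string): Promise<string>; // returns result file path
declare function uploadResult(path: string, putUrl: string): Promise<void>;

export async function pollLoop(api: string, agentToken: string) {
  const headers = { Authorization: `Bearer ${agentToken}` };

  while (true) {
    const next = await fetch(`${api}/agent/jobs/next`, { headers });
    if (next.status === 204) {                      // assumed "no work" signal
      await new Promise((r) => setTimeout(r, 5_000));
      continue;
    }
    const job: Job = await next.json();

    // Stage inputs locally via presigned download URLs.
    const { urls } = await fetch(`${api}/agent/jobs/${job.taskId}/input-urls`, { headers })
      .then((r) => r.json());
    await downloadAll(urls, "/tmp/inputs");

    // Run inference against the local container (YOLO :5001 or XGBoost :5002).
    const resultPath = await runInference(job, "/tmp/inputs");

    // Upload the result straight to R2, then report completion.
    const { url, r2Key } = await fetch(`${api}/agent/jobs/${job.taskId}/result-url`, {
      method: "POST",
      headers,
    }).then((r) => r.json());
    await uploadResult(resultPath, url);

    await fetch(`${api}/agent/jobs/${job.taskId}/status`, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ status: "completed", resultR2Key: r2Key }),
    });
  }
}
```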

Analysis: TSK-001 — Defect Detection

Images Analyzed
128
Defects Found
12
Pass Rate
90.6%
Avg Confidence
87%

Detection Heatmap — Sample Image

Defect Type Distribution

Most Common
Scratch (35%)
Highest Confidence
Scratch (94%)

SHAP Feature Impact — Composite Prediction (TSK-002)

XGBoost Surrogate
Optimization Recommendations
Increase cure_temp_c by 15°C
+12% predicted tensile strength
Increase fiber_volume to 62%
+8% predicted tensile strength
Switch resin_type to Epoxy-B
+5% predicted tensile strength
Current prediction 842 MPa
Optimized prediction 967 MPa (+14.8%)
Under the Hood: Result Generation 04-model-serving.md + 05-client-dashboard.md
YOLO Heatmap
Gaussian blobs at detection centers. Per-image + batch aggregate (spatial defect pattern). Canvas overlay with bounding box corners.
XGBoost SHAP
SHAP TreeExplainer → ranked parameter impact. Horizontal bar chart + optimization delta.
PDF Report
Auto-generated: summary stats, per-image detections, heatmaps, pass/fail classification, batch statistics.
Visualization
React 19 + recharts for charts. Canvas for image overlays. shadcn/ui components.
Fine-tuning: Min 500 labeled images (YOLO format: .txt with class_id center_x center_y width height). LoRA adapter merge → redeploy (post-MVP).
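A rough sketch of the YOLO heatmap overlay described above, painting one radial (Gaussian-like) blob per detection center on a canvas; blob radius and colors are assumptions:

```typescript
// Illustrative heatmap overlay: one additive radial blob per detection center.
// The real report generator may differ; this only shows the drawing idea.
interface Detection {
  cx: number;          // detection center x, in image pixels
  cy: number;          // detection center y, in image pixels
  confidence: number;  // 0..1, used as blob intensity
}

export function drawHeatmap(canvas: HTMLCanvasElement, detections: Detection[]) {
  const ctx = canvas.getContext("2d")!;
  ctx.globalCompositeOperation = "lighter"; // overlapping blobs accumulate

  for (const d of detections) {
    const radius = 40; // assumed blob radius in pixels
    const g = ctx.createRadialGradient(d.cx, d.cy, 0, d.cx, d.cy, radius);
    g.addColorStop(0, `rgba(255, 0, 0, ${0.6 * d.confidence})`);
    g.addColorStop(1, "rgba(255, 0, 0, 0)");
    ctx.fillStyle = g;
    ctx.fillRect(d.cx - radius, d.cy - radius, radius * 2, radius * 2);
  }
}
```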
This Month
14.38 PLN
2 tasks completed
Total GPU Time
23m 20s
Across 6 tasks
Avg Cost/Task
2.40 PLN
Last 30 days

Usage History

Task Model GPU Time Rate Cost Date
Under the Hood: Billing Engine 07-billing.md
Metering Formula
duration = completed_at - started_at
gpu_seconds = duration * gpu_count
cost = (gpu_seconds / 3600) * rate
Rate Lookup
By model_id + effective time window. Rates are never deleted (audit trail); a new rate closes the previous one's effective window with microsecond precision.
Atomicity
Task completion + usage_record creation + server release = single DB transaction. Rollback on failure.
Seed Rates
YOLO: 15 PLN/GPU-h, XGBoost: 5 PLN/GPU-h
usage_records
├─ task_id (unique), tenant_id, model_id
├─ gpu_count, started_at, completed_at
├─ duration_seconds, gpu_seconds
├─ rate_applied, cost_pln
└─ partial (false for MVP)
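The metering formula above as a small TypeScript helper; rounding to 0.01 PLN is an assumption:

```typescript
// Mirrors the metering formula: cost = (gpu_seconds / 3600) * rate.
// Rounding policy is an assumption; the real billing engine may differ.
export function computeCostPln(
  startedAt: Date,
  completedAt: Date,
  gpuCount: number,
  ratePlnPerGpuHour: number,
): number {
  const durationSeconds = (completedAt.getTime() - startedAt.getTime()) / 1000;
  const gpuSeconds = durationSeconds * gpuCount;
  const cost = (gpuSeconds / 3600) * ratePlnPerGpuHour;
  return Math.round(cost * 100) / 100;
}

// Example: a 14-minute YOLO task on 1 GPU at 15 PLN/GPU-h costs 3.50 PLN.
```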

Live — updates every 30s
Total Revenue (Feb)
PLN
+18% vs Jan
GPU Hours Sold
+24% vs Jan
Utilization
%
AI Tasks Processed
Last 30 days

Revenue Timeline

GPU Hours (This Week)

Mining vs AI Compute — Monthly Comparison

MINING
4,100 PLN
~6.2 servers avg
52% of revenue
AI COMPUTE
3,800 PLN
~3.0 servers avg
48% of revenue
ROI DELTA
+38%
AI revenue per GPU is 38% higher than mining per GPU
AI compute is more profitable per server
Under the Hood: Metrics & Monitoring 06-investor-dashboard.md
Metrics Cache
daily_metrics_cache pre-computed by CF Cron at 00:05 UTC daily. Avoids expensive joins on dashboard.
Refresh Rate
Servers: 30s, Charts: 60s. Dashboard polls, not SSE (simpler for investor view).
Comparison Logic
Revenue per GPU-hour: AI vs mining, weighted by server count in each mode. ROI delta = (ai_rev_per_gpu - mining_rev_per_gpu) / mining_rev_per_gpu.
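The delta itself reduces to a one-liner; the per-GPU-hour revenue figures are computed upstream from daily_metrics_cache:

```typescript
// ROI delta as defined above: how much more AI compute earns per GPU-hour
// than mining, expressed as a fraction of the mining figure.
export function roiDelta(aiRevPerGpuHour: number, miningRevPerGpuHour: number): number {
  return (aiRevPerGpuHour - miningRevPerGpuHour) / miningRevPerGpuHour;
}

// A value of 0.38 renders as "+38%" on the dashboard.
```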
GET /investor/metrics?period=daily|weekly|monthly
GET /investor/comparison?period=monthly
GET /investor/revenue?date_from=X&date_to=Y → CSV
daily_metrics_cache
├─ date, gpu_hours, revenue_pln, task_count
├─ online_gpu_seconds, utilization_pct
└─ computed_at (CF Cron: 00:05 UTC)
Online · Offline · Switching · Maintenance | Mining · AI Compute
Under the Hood: GPU Infrastructure & Mode Switch 01-gpu-infrastructure.md + 08-provisioning.md
Agent
Node.js daemon per server. Heartbeat every 30s (GPU metrics, mode, bandwidth). Token auth 90d rotation.
Registration
POST /agent/register with X-Install-Token → agentToken + serverId. Auto re-register on expiry.
Network
Pull-based, no inbound ports. 10 servers behind NAT, different ISPs. Agent initiates ALL connections.
Hardware
Ubuntu 22.04, 6x RTX 5070 (12GB VRAM), driver 550+, Docker + NVIDIA Container Toolkit.
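An illustrative shape for the 30-second heartbeat payload; field names follow the gpu_servers / gpu_metrics columns shown below but are not the exact wire format:

```typescript
// One heartbeat per server every 30s, carrying per-GPU metrics.
// Field names are assumptions based on the schemas on this page.
interface GpuReading {
  gpuIndex: number;
  temperatureC: number;
  utilizationPct: number;
  vramUsedMb: number;
  powerDrawW: number;
  fanSpeedPct: number;
}

interface HeartbeatPayload {
  serverId: string;
  agentVersion: string;
  mode: "mining" | "ai" | "switching";
  bandwidthMbps: number;
  gpus: GpuReading[]; // six entries per server (6x RTX 5070)
}
```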
Mode switch lifecycle (mining → AI):
PUT /mode → mode='switching' → heartbeat: desired_mode → HiveOS: stop mining → wait GPU ≈ 0% → docker compose up → health check (2s x 30) → mode='ai'
Rollback: if a mode switch is stuck for >5 min, CF Cron rolls it back. If Docker health checks fail for 60s, the agent reports switchFailed. If the agent crashes, it detects the stale 'switching' state on startup and cleans up.
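A condensed sketch of the agent side of that lifecycle; the HiveOS, GPU-polling, Docker, and reporting helpers are illustrative placeholders:

```typescript
// Mining -> AI switch on the agent. Declared helpers stand in for the real
// HiveOS API, nvidia-smi polling, docker compose, and status reporting.
declare function hiveosStopMining(): Promise<void>;
declare function gpusIdle(): Promise<boolean>;           // true when GPU utilization is ~0%
declare function dockerComposeUp(): Promise<void>;
declare function containersHealthy(): Promise<boolean>;
declare function report(status: "switching" | "ai" | "switchFailed"): Promise<void>;

export async function switchToAi(): Promise<void> {
  await report("switching");
  await hiveosStopMining();

  // Wait for mining processes to release the GPUs.
  while (!(await gpusIdle())) {
    await new Promise((r) => setTimeout(r, 2_000));
  }

  await dockerComposeUp();

  // Health check: every 2s, up to 30 attempts (~60s), then report failure.
  for (let attempt = 0; attempt < 30; attempt++) {
    if (await containersHealthy()) {
      await report("ai");
      return;
    }
    await new Promise((r) => setTimeout(r, 2_000));
  }
  await report("switchFailed");
}
```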
gpu_servers
├─ agent_token (90d expiry), agent_version
├─ name, location, gpu_count, gpu_model
├─ mode ('mining'|'ai'|'switching'), desired_mode
├─ status ('online'|'offline'|'maintenance')
├─ last_heartbeat, assigned_task_id
└─ bandwidth_mbps

gpu_metrics (append-only, 7d retention)
├─ server_id, gpu_index
├─ temperature_c, utilization_pct, vram_used_mb
└─ power_draw_w, fan_speed_pct, recorded_at

Revenue Breakdown

Period | Mining Revenue | AI Revenue | Total | GPU Hours | Tasks
February 2025 | 4,100 PLN | 3,800 PLN | 7,900 PLN | 1,192 h | 847
January 2025 | 4,400 PLN | 2,300 PLN | 6,700 PLN | 980 h | 623
December 2024 | 4,600 PLN | 1,200 PLN | 5,800 PLN | 820 h | 412

Revenue Trend

+36%
3-month growth
3.2x
AI revenue growth
1,267 PLN/server
AI rev per server (Feb)
661 PLN/server
Mining rev per server (Feb)

Verification Queue

AW
Dr. Anna Wisniewska
Under the Hood: Human-in-the-Loop Verification 03-compute-agent.md
Workflow
Task completes → verification record created (status=pending) → expert reviews → accept/reject with comment → approved = billable
Roles
Better Auth role-based: client, expert, investor, admin. Session-based auth (email+password).
GET /verifications/pending (role=expert)
POST /verifications {taskId, verdict, comment}
Rejected results loop back to client. Approved results finalize the billing record. Post-MVP: partial billing for tasks running >60s even if failed.
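A minimal sketch of the verdict endpoint as a Hono route; the Better Auth role check, the db client, and the billable/rejected columns are simplified assumptions:

```typescript
// POST /verifications: record the expert's verdict, then either finalize the
// billing record or send the task back to the client. Column names and the
// 'billable' flag are illustrative. A Better Auth middleware enforcing
// role=expert is assumed to run before this handler.
import { Hono } from "hono";
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);
const app = new Hono();

app.post("/verifications", async (c) => {
  const { taskId, verdict, comment } = await c.req.json<{
    taskId: string;
    verdict: "approved" | "rejected";
    comment?: string;
  }>();

  await sql.begin(async (tx) => {
    await tx`
      UPDATE verifications
      SET status = ${verdict}, comment = ${comment ?? null}, reviewed_at = now()
      WHERE task_id = ${taskId} AND status = 'pending'
    `;
    if (verdict === "approved") {
      // Approved result finalizes the billing record.
      await tx`UPDATE usage_records SET billable = true WHERE task_id = ${taskId}`;
    } else {
      // Rejected result loops back to the client for follow-up.
      await tx`UPDATE tasks SET status = 'rejected' WHERE id = ${taskId}`;
    }
  });

  return c.json({ ok: true });
});

export default app;
```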