Deploying to Production
Best practices for running Harpocrates in production environments with security, reliability, and performance in mind.
Managing API Keys Securely
Protect your API keys from unauthorized access:
- Store keys in environment variables, never in code
- Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.)
- Rotate keys regularly (every 90 days recommended)
- Use separate keys for development, staging, and production
- Audit key usage through your dashboard
If a key is compromised, immediately revoke it from your dashboard and generate a new one. All requests with the old key will be rejected instantly.
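As a minimal sketch of the first recommendation, the key can be read from the environment at startup, with the process refusing to run if it is missing. The variable name `HARPOCRATES_API_KEY` and both helper functions are assumptions made for illustration; they are not part of the SDK.

```javascript
// Hypothetical variable name; use whatever your deployment defines.
const API_KEY_ENV_VAR = "HARPOCRATES_API_KEY";

/**
 * Read the API key from an environment-like object and fail fast
 * if it is missing or blank, so a misconfigured deploy never starts.
 */
function loadApiKey(env = process.env) {
  const key = env[API_KEY_ENV_VAR];
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(`Missing ${API_KEY_ENV_VAR}; refusing to start without an API key.`);
  }
  return key.trim();
}

/** Mask a key for log output, keeping only the last four characters. */
function maskKey(key) {
  if (key.length <= 4) return "****";
  return "*".repeat(key.length - 4) + key.slice(-4);
}
```

Calling `loadApiKey()` once during startup and passing the result to the client keeps the key out of source control, and `maskKey` lets you confirm which key is in use without printing it in full.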
Rate Limits
Harpocrates enforces rate limits to ensure fair usage and system stability:
Standard Tier
- 60 requests per minute
- 100,000 tokens per minute
- 500,000 requests per day
Enterprise Tier
- Custom rate limits
- Dedicated enclave capacity
- Priority support
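To avoid hitting the Standard tier limit of 60 requests per minute in the first place, requests can be throttled on the client side. The sketch below is one possible approach using a sliding window; `SlidingWindowLimiter` is a hypothetical helper, not an SDK class, and it tracks request counts only, not the tokens-per-minute limit.

```javascript
/**
 * Client-side sliding-window limiter (illustrative, not part of the SDK).
 * Allows at most `maxRequests` calls in any `windowMs` interval.
 */
class SlidingWindowLimiter {
  constructor(maxRequests = 60, windowMs = 60000, now = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, useful for testing
    this.timestamps = []; // send times of requests still inside the window
  }

  /** Returns 0 and records the request if allowed, else the ms to wait. */
  tryAcquire() {
    const t = this.now();
    // Drop requests that have aged out of the window.
    while (this.timestamps.length > 0 && t - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(t);
      return 0;
    }
    return this.windowMs - (t - this.timestamps[0]);
  }

  /** Resolves once a request slot is available. */
  async acquire() {
    let wait;
    while ((wait = this.tryAcquire()) > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}
```

Awaiting `limiter.acquire()` before each API call smooths out bursts. The server-side limits remain authoritative, so retry handling for rate-limit errors is still needed.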
// Implement exponential backoff for rate limits.
// Assumes `client` is an initialized SDK client and `RateLimitError` is in scope.
async function inferWithRetry(input, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.infer({ model: "llm-secure-7b", input });
    } catch (error) {
      // Retry only on rate-limit errors, and only while attempts remain
      if (error instanceof RateLimitError && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
}
Monitoring Usage
Track your API usage in real-time:
// Get usage statistics
const stats = await client.getUsageStats({
  start_date: "2024-01-01",
  end_date: "2024-01-31",
  granularity: "day"
});
console.log("Total requests:", stats.total_requests);
console.log("Total tokens:", stats.total_tokens);
console.log("Total cost:", stats.total_cost_eth, "ETH");
console.log("Average latency:", stats.avg_latency_ms, "ms");
// Set up alerts
await client.createAlert({
  type: "spending_threshold",
  threshold_eth: "1.0",
  notification_email: "ops@example.com"
});
View detailed analytics and usage graphs in your dashboard.
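The raw totals are easier to act on once they are turned into a few derived figures. The following sketch assumes the statistics object carries the `total_requests`, `total_tokens`, and `total_cost_eth` fields shown above, and that values may arrive as numbers or numeric strings; `summarizeUsage` and the `budgetEth` parameter are hypothetical.

```javascript
/**
 * Derive per-request and budget figures from a usage stats object.
 * Illustrative helper; field types are an assumption, so values are
 * coerced with Number() in case they arrive as strings.
 */
function summarizeUsage(stats, budgetEth) {
  const requests = Number(stats.total_requests);
  const tokens = Number(stats.total_tokens);
  const costEth = Number(stats.total_cost_eth);
  return {
    tokensPerRequest: requests > 0 ? tokens / requests : 0,
    ethPerThousandTokens: tokens > 0 ? (costEth / tokens) * 1000 : 0,
    budgetUsedFraction: budgetEth > 0 ? costEth / budgetEth : 0,
    overBudget: costEth > budgetEth,
  };
}
```

Running this on a daily schedule and logging the result gives an early signal of cost drift that complements a spending-threshold alert.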
Replay Protection / Nonce Handling
Harpocrates prevents replay attacks by automatically including nonces in encrypted requests:
// Nonces are handled automatically by the SDK
const encrypted = await client.encrypt("sensitive data");
// The encrypted payload includes a timestamp and a random nonce,
// so it can be used for only one request
const result = await client.infer({
  model: "llm-secure-7b",
  input: encrypted
});
// Attempting to replay the same encrypted input will fail
try {
  await client.infer({
    model: "llm-secure-7b",
    input: encrypted // Same encrypted payload
  });
} catch (error) {
  console.error("Replay detected:", error.message);
}
Each encrypted payload is valid for 5 minutes and can only be used once. Single use prevents replay attacks, and the 5-minute window allows for reasonable clock skew.
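For intuition, timestamp-plus-nonce schemes of this kind generally work as sketched below: the receiver rejects payloads outside the validity window and remembers each nonce until it can no longer be valid. This is a conceptual illustration only, not Harpocrates's server implementation; `ReplayGuard` and its method are hypothetical names.

```javascript
/**
 * Conceptual replay check: a payload is accepted only if its timestamp
 * is within the validity window and its nonce has not been seen before.
 */
class ReplayGuard {
  constructor(validityMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.validityMs = validityMs;
    this.now = now; // injectable clock, useful for testing
    this.seen = new Map(); // nonce -> time (ms) after which it can be forgotten
  }

  /** Returns "accepted", "replayed", or "expired". */
  check(nonce, timestampMs) {
    const t = this.now();
    // Forget nonces whose payloads would now fail the window check anyway.
    for (const [n, expiry] of this.seen) {
      if (expiry < t) this.seen.delete(n);
    }
    // The symmetric window tolerates clock skew in either direction.
    if (Math.abs(t - timestampMs) > this.validityMs) return "expired";
    if (this.seen.has(nonce)) return "replayed";
    this.seen.set(nonce, timestampMs + this.validityMs);
    return "accepted";
  }
}
```

Because a nonce only has to be remembered for the length of the validity window, the store stays bounded. On the client side, the practical consequence is that each request needs a freshly encrypted payload.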
Ensuring Cryptographic Verification
Always verify attestations in production to ensure your data was processed correctly inside a TEE:
async function secureInference(input) {
  const encrypted = await client.encrypt(input);
  const result = await client.infer({
    model: "llm-secure-7b",
    input: encrypted,
    return_attestation: true
  });
  // CRITICAL: Verify the attestation
  const verification = await client.verifyAttestation(result.attestation);
  if (!verification.valid) {
    throw new Error("Attestation verification failed! Computation may be compromised.");
  }
  if (!verification.enclave_verified) {
    throw new Error("Enclave verification failed! Not a genuine TEE.");
  }
  // Only proceed if verification passed
  const output = await client.decrypt(result.output);
  return output;
}
Never skip attestation verification in production. It's your guarantee that the computation happened inside a secure enclave.
Production Readiness Checklist
Before going live, ensure you've completed these steps:
- API keys stored in environment variables or a secrets manager
- Separate keys for dev/staging/production
- Rate limit handling with exponential backoff
- Attestation verification enabled for all requests
- Monitoring and alerting configured
- Spending limits set appropriately
- Error handling for all API calls
- Logging configured (without logging sensitive data)
- Load tested with expected traffic patterns
- Incident response plan documented
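For the logging item, one way to keep sensitive data out of logs is to redact known fields before anything is written. The sketch below is a generic helper; `redactForLog` and the list of field names are assumptions to adapt to your own payloads.

```javascript
// Field names treated as sensitive (hypothetical list; extend as needed).
const SENSITIVE_KEYS = new Set(["api_key", "apikey", "authorization", "input", "output"]);

/**
 * Return a deep copy of `value` with sensitive fields replaced by a
 * placeholder. The original object is not modified.
 */
function redactForLog(value) {
  if (Array.isArray(value)) return value.map(redactForLog);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SENSITIVE_KEYS.has(k.toLowerCase()) ? "[REDACTED]" : redactForLog(v);
    }
    return out;
  }
  return value;
}
```

Passing every request or response object through `redactForLog` before handing it to the logger means metadata such as model name and latency stays visible while keys and payloads do not.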