Models

Harpocrates offers a range of confidential AI models optimized for secure inference inside TEE enclaves.

Language Models

llm-secure-7b (Recommended)

General-purpose language model optimized for confidential inference. 7B parameters, suitable for most text generation tasks.

Context Window: 8,192 tokens
Cost: 0.0001 ETH/token

llm-secure-13b

Larger model with improved reasoning and instruction-following capabilities. Best for complex tasks requiring deeper understanding.

Context Window: 16,384 tokens
Cost: 0.00025 ETH/token

llm-secure-fast

Optimized for low-latency inference. Smaller model (3B) with faster response times, ideal for real-time applications.

Context Window: 4,096 tokens
Cost: 0.00005 ETH/token
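
The per-token rates above multiply directly against however many tokens a request is billed for. A rough budgeting sketch in Python (how tokens are counted for billing is not covered in this section, so treat the figures as estimates):

    # Estimated cost of 1,000 billed tokens at the listed per-token rates.
    RATES_ETH_PER_TOKEN = {
        "llm-secure-7b": 0.0001,
        "llm-secure-13b": 0.00025,
        "llm-secure-fast": 0.00005,
    }

    def estimate_cost_eth(model: str, tokens: int) -> float:
        """Multiply the listed per-token rate by the token count."""
        return RATES_ETH_PER_TOKEN[model] * tokens

    for model in RATES_ETH_PER_TOKEN:
        print(f"{model}: {estimate_cost_eth(model, 1_000):.4f} ETH per 1,000 tokens")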

Embedding Models

embed-secure-base

Generate 768-dimensional embeddings for semantic search, clustering, and retrieval tasks while maintaining data confidentiality.

Dimensions: 768
Cost: 0.00002 ETH/token

embed-secure-large

Higher-dimensional embeddings (1,536 dimensions) for improved accuracy in semantic search and similarity tasks.

Dimensions: 1,536
Cost: 0.00004 ETH/token
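
Once embedding vectors come back from the enclave, semantic search reduces to plain vector math on the client side. A minimal cosine-similarity ranking sketch over 768-dimensional vectors (the vectors below are placeholders standing in for real embed-secure-base output):

    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        """Cosine similarity between two embedding vectors of equal length."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Placeholder 768-dimensional vectors; in practice these would be
    # returned by embed-secure-base for your query and documents.
    query_vec = [0.01] * 768
    doc_vecs = {"doc-a": [0.01] * 768, "doc-b": [-0.01] * 768}

    ranked = sorted(doc_vecs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    print(ranked[0][0])  # most similar document id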

How to Choose a Model

Select the right model based on your use case (a programmatic version of this guidance follows the list):

  • General Use: Start with llm-secure-7b for the best balance of quality and cost
  • Complex Tasks: Use llm-secure-13b for reasoning, analysis, or long-form content
  • Real-Time: Choose llm-secure-fast for low-latency applications like chatbots
  • Search/RAG: Use embed-secure-base for semantic embeddings
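
If you route requests programmatically, the guidance above reduces to a simple lookup. A minimal sketch (the use-case labels are informal names for the bullets above, not API values):

    # Informal use-case labels mapped to the recommended model from the list above.
    MODEL_BY_USE_CASE = {
        "general": "llm-secure-7b",      # best balance of quality and cost
        "complex": "llm-secure-13b",     # reasoning, analysis, long-form content
        "realtime": "llm-secure-fast",   # low-latency applications such as chatbots
        "search": "embed-secure-base",   # semantic embeddings for search/RAG
    }

    def pick_model(use_case: str) -> str:
        """Return the recommended model name, defaulting to the general-purpose model."""
        return MODEL_BY_USE_CASE.get(use_case, "llm-secure-7b")

    print(pick_model("realtime"))  # llm-secure-fast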

Model-Specific Parameters

All language models support these parameters:

temperature (0.0 - 2.0, default: 0.7)

Controls randomness. Lower values make output more focused and deterministic.

max_tokens (1 - context_length)

Maximum number of tokens to generate in the response.

top_p (0.0 - 1.0, default: 0.9)

Nucleus sampling parameter. Alternative to temperature for controlling randomness.

stop (array of strings)

Sequences where the model will stop generating tokens.
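
These parameters are passed alongside the model name and prompt in the request. A sketch of how they might be combined into a request body (only the four generation parameters come from this section; the model, prompt, and overall payload shape are assumptions, so consult the API reference for the exact request format):

    import json

    # Sketch of a request body combining the parameters documented above.
    # The "model" and "prompt" fields and the payload shape are assumptions.
    payload = {
        "model": "llm-secure-7b",
        "prompt": "Summarize the attached contract in three sentences.",
        "temperature": 0.2,     # low temperature for focused, deterministic output
        "max_tokens": 256,      # must not exceed the model's context window
        "top_p": 0.9,           # nucleus sampling; alternative to temperature
        "stop": ["\n\n"],       # stop generating at the first blank line
    }

    print(json.dumps(payload, indent=2))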