Models

Harpocrates offers a range of confidential AI models optimized for secure inference inside TEE enclaves.

Language Models

llm-secure-7b (Recommended)

General-purpose language model optimized for confidential inference. 7B parameters, suitable for most text generation tasks.

Context Window: 8,192 tokens
Cost: 0.0001 ETH/token

llm-secure-13b

Larger model with improved reasoning and instruction-following capabilities. Best for complex tasks requiring deeper understanding.

Context Window: 16,384 tokens
Cost: 0.00025 ETH/token

llm-secure-fast

Optimized for low-latency inference. Smaller model (3B) with faster response times, ideal for real-time applications.

Context Window: 4,096 tokens
Cost: 0.00005 ETH/token
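
The per-token rates above multiply directly against however many tokens a request is billed for. A rough budgeting sketch in Python (how tokens are counted for billing is not covered in this section, so treat the figures as estimates):

    # Estimated cost of 1,000 billed tokens at the listed per-token rates.
    RATES_ETH_PER_TOKEN = {
        "llm-secure-7b": 0.0001,
        "llm-secure-13b": 0.00025,
        "llm-secure-fast": 0.00005,
    }

    def estimate_cost_eth(model: str, tokens: int) -> float:
        """Multiply the listed per-token rate by the token count."""
        return RATES_ETH_PER_TOKEN[model] * tokens

    for model in RATES_ETH_PER_TOKEN:
        print(f"{model}: {estimate_cost_eth(model, 1_000):.4f} ETH per 1,000 tokens")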

Embedding Models

embed-secure-base

Generate 768-dimensional embeddings for semantic search, clustering, and retrieval tasks while maintaining data confidentiality.

Dimensions: 768
Cost: 0.00002 ETH/token

embed-secure-large

Higher-dimensional embeddings (1,536 dimensions) for improved accuracy in semantic search and similarity tasks.

Dimensions: 1,536
Cost: 0.00004 ETH/token
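
Once embedding vectors come back from the enclave, semantic search reduces to plain vector math on the client side. A minimal cosine-similarity ranking sketch over 768-dimensional vectors (the vectors below are placeholders standing in for real embed-secure-base output):

    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        """Cosine similarity between two embedding vectors of equal length."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Placeholder 768-dimensional vectors; in practice these would be
    # returned by embed-secure-base for your query and documents.
    query_vec = [0.01] * 768
    doc_vecs = {"doc-a": [0.01] * 768, "doc-b": [-0.01] * 768}

    ranked = sorted(doc_vecs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    print(ranked[0][0])  # most similar document id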

How to Choose a Model

Select the right model based on your use case (a programmatic version of this guidance follows the list):

  • General Use: Start with llm-secure-7b for the best balance of quality and cost
  • Complex Tasks: Use llm-secure-13b for reasoning, analysis, or long-form content
  • Real-Time: Choose llm-secure-fast for low-latency applications like chatbots
  • Search/RAG: Use embed-secure-base for semantic embeddings
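
If you route requests programmatically, the guidance above reduces to a simple lookup. A minimal sketch (the use-case labels are informal names for the bullets above, not API values):

    # Informal use-case labels mapped to the recommended model from the list above.
    MODEL_BY_USE_CASE = {
        "general": "llm-secure-7b",      # best balance of quality and cost
        "complex": "llm-secure-13b",     # reasoning, analysis, long-form content
        "realtime": "llm-secure-fast",   # low-latency applications such as chatbots
        "search": "embed-secure-base",   # semantic embeddings for search/RAG
    }

    def pick_model(use_case: str) -> str:
        """Return the recommended model name, defaulting to the general-purpose model."""
        return MODEL_BY_USE_CASE.get(use_case, "llm-secure-7b")

    print(pick_model("realtime"))  # llm-secure-fast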

Model-Specific Parameters

All language models support these parameters:

temperature (0.0 - 2.0, default: 0.7)

Controls randomness. Lower values make output more focused and deterministic.

max_tokens (1 - context_length)

Maximum number of tokens to generate in the response.

top_p (0.0 - 1.0, default: 0.9)

Nucleus sampling parameter. Alternative to temperature for controlling randomness.

stop (array of strings)

Sequences where the model will stop generating tokens.
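
These parameters are passed alongside the model name and prompt in the request. A sketch of how they might be combined into a request body (only the four generation parameters come from this section; the model, prompt, and overall payload shape are assumptions, so consult the API reference for the exact request format):

    import json

    # Sketch of a request body combining the parameters documented above.
    # The "model" and "prompt" fields and the payload shape are assumptions.
    payload = {
        "model": "llm-secure-7b",
        "prompt": "Summarize the attached contract in three sentences.",
        "temperature": 0.2,     # low temperature for focused, deterministic output
        "max_tokens": 256,      # must not exceed the model's context window
        "top_p": 0.9,           # nucleus sampling; alternative to temperature
        "stop": ["\n\n"],       # stop generating at the first blank line
    }

    print(json.dumps(payload, indent=2))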