Best VPS for PrivateGPT in 2026
PrivateGPT made privacy-preserving document Q&A a one-command install and other projects have been chasing it ever since. Hosting it well in 2026 means deciding upfront whether you want CPU-only with remote model APIs, or local inference on a GPU. The split sets the budget and the provider list.
Hetzner CCX33 for the CPU path
Eight dedicated AMD vCPU, 16 GB RAM, 240 GB NVMe. Embeds 5000 pages in around 12 minutes and answers in 8 to 15 seconds with a Llama 3 8B quantized model.
Provision Hetzner CCX33 →The CPU vs GPU decision
PrivateGPT has two well-trodden hosting patterns, with different price and provider implications:
- CPU-only with remote LLM. Document ingestion runs locally, the answer generation goes to OpenAI, Anthropic, or a remote vLLM. Cheapest, fastest answers, weakest privacy story since prompts cross the network.
- Full local inference on GPU. Everything stays on the box. Requires GPU rental from Vast.ai, RunPod, or a dedicated GPU server. 5 to 20x the cost, ironclad privacy.
If privacy is the reason you self-host, the GPU path is the honest answer. Otherwise the CPU box with a remote model is the pragmatic pick.
Server requirements by path
| Resource | CPU + remote LLM | Local 8B model | Local 70B model |
|---|---|---|---|
| RAM | 16 GB | 32 GB | 96 GB |
| CPU | 8 vCPU | 8 vCPU | 16+ vCPU |
| GPU | None | RTX 4090 24 GB | 2x A6000 48 GB |
| Storage | 240 GB NVMe | 500 GB NVMe | 1 TB NVMe |
Top 5 VPS providers for PrivateGPT
Pros
- Unbeatable price-to-performance ratio
- European data centers with strong privacy
- NVMe storage on all plans
Cons
- No US data centers
- Control panel less polished than competitors
All Hetzner Plans
| Plan | CPU | RAM | Storage | Price | |
|---|---|---|---|---|---|
| CX22 | 2 vCPU | 4 GB | 40 GB NVMe | $4.15/mo | Get Plan → |
| CX32 | 4 vCPU | 8 GB | 80 GB NVMe | $7.49/mo | Get Plan → |
| CX42 | 8 vCPU | 16 GB | 160 GB NVMe | $14.49/mo | Get Plan → |
| CX52 | 16 vCPU | 32 GB | 320 GB NVMe | $28.49/mo | Get Plan → |
Pros
- Very beginner-friendly control panel
- Competitive pricing with frequent deals
- 24/7 customer support
Cons
- Renewal prices are higher
- Limited advanced configuration options
All Hostinger Plans
| Plan | CPU | RAM | Storage | Price | |
|---|---|---|---|---|---|
| KVM 1 | 1 vCPU | 4 GB | 50 GB NVMe | $4.99/mo | Get Plan → |
| KVM 2 | 2 vCPU | 8 GB | 100 GB NVMe | $6.99/mo | Get Plan → |
| KVM 4 | 4 vCPU | 16 GB | 200 GB NVMe | $12.99/mo | Get Plan → |
| KVM 8 | 8 vCPU | 32 GB | 400 GB NVMe | $19.99/mo | Get Plan → |
Pros
- Cheapest GPU cloud available
- Wide selection of GPU models
- Pay-per-hour with no commitment
Cons
- Availability varies by GPU model
- Less polished user experience
All Vast.ai Plans
| Plan | CPU | RAM | Storage | Price | |
|---|---|---|---|---|---|
| RTX 3090 | 4-8 vCPU | 16-32 GB | 50-200 GB | From $0.15/hr | Get Plan → |
| RTX 4090 | 4-16 vCPU | 32-64 GB | 100-500 GB | From $0.30/hr | Get Plan → |
| A100 40GB | 8-16 vCPU | 64-128 GB | 200-1000 GB | From $0.80/hr | Get Plan → |
| H100 80GB | 16-32 vCPU | 128-256 GB | 500-2000 GB | From $2.00/hr | Get Plan → |
Provider notes
Hetzner CCX33. The CPU path winner by a clear margin. The AMD cores chew through embeddings, the NVMe handles ChromaDB writes, and the price is half the equivalent at DigitalOcean.
Contabo VPS L. Budget pick if your corpus is big. 400 GB NVMe at 9.50 USD for 16 GB RAM is the cheapest serious tier on the market. Slower IO, longer support replies, otherwise fine.
Hostinger Cloud Enterprise. Eight cores at this price tier is competitive. The biggest selling point is the included weekly backups, which save you from your own ingestion mistakes.
OVHcloud VPS Elite. European data residency for regulated industries. Four cores rather than eight, so the embedding phase takes nearly twice as long. Acceptable trade for compliance reasons.
Vast.ai. The GPU path. Rent a 4090 by the hour starting around 0.30 USD, run PrivateGPT in a container, tear it down when idle. The most cost-effective way to get local inference without committing to a dedicated GPU server.
Setup checklist
1. Use the make install path
The official Makefile pins dependencies that have caused breakage when installed manually. Run make install, accept the long initial dependency resolution, and you skip a class of setup pain.
2. Move ChromaDB to its own volume
Once the vector store reaches several GB, backing it up separately is much easier if it lives on a dedicated volume. Hetzner volumes are cheap and detach-attachable between servers.
3. Add HTTPS and auth before exposing the UI
Gradio out of the box has no auth. Put Caddy in front with HTTP basic auth at minimum, Authentik or Pocket ID if you have more than two users. Anything else is asking for trouble.
Frequently Asked Questions
Do I really need a GPU for PrivateGPT?
Only if you want low-latency answers from a locally-hosted model. CPU mode works perfectly for embedding documents and querying small models, just expect 10 to 30 seconds per answer. With a GPU, the same query returns in 1 to 3 seconds.
What is the cheapest serious PrivateGPT setup?
16 GB RAM and 8 vCPU on Hetzner CCX23 with a quantized 8B model gives surprisingly usable performance. Around 32 USD per month with zero compromise on document privacy.
How big can the document corpus get?
Tens of gigabytes is fine. Past 50 GB of source documents, plan for ChromaDB to take 5 to 10 GB of disk for the vector index and tune the ingestion batch size to avoid blowing through RAM.
Can I share PrivateGPT with my team?
Yes. The Gradio UI handles concurrent users, the API does too. Add a reverse proxy with HTTP basic auth or Authentik in front. Do not expose the default Gradio port to the public internet, it has no auth.
How is PrivateGPT different from Khoj?
PrivateGPT is laser-focused on document RAG with a strong privacy story. Khoj does similar plus agents, plugins, and conversational memory. Pick PrivateGPT if you only want strict document Q&A with no extras.