Why is my Datadog bill so high?

Datadog charges per host, per custom metric, per GB of logs ingested, and per APM host. Bills grow quickly because each new container, microservice, or custom metric adds incremental cost. Committed-use discounts require upfront contracts that can lock you in even as usage changes.

How much does Datadog cost per host per month?

Datadog infrastructure monitoring costs approximately $15–$23 per host/month depending on commitment. Add APM ($31–$40/host/month), log management ($0.10/GB ingested + $0.05/GB indexed), and custom metrics ($0.05 per metric/month above the 100 per host included). A typical 50-host environment with moderate observability can easily reach $8,000–$15,000/month.
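To see how these line items combine, here is a minimal Python sketch that applies the prices quoted above to a usage profile. The `Usage` fields, the `indexed_fraction` parameter, and the sample inputs are hypothetical, and the prices are the ones stated in this answer rather than a current price list. The result is very sensitive to log volume and custom-metric counts, so treat it as a way to explore your own numbers.

```python
from dataclasses import dataclass

# Prices quoted in the answer above (USD per month); low/high depends on commitment.
INFRA_PER_HOST = (15.0, 23.0)
APM_PER_HOST = (31.0, 40.0)
LOG_INGEST_PER_GB = 0.10
LOG_INDEX_PER_GB = 0.05
CUSTOM_METRIC_PRICE = 0.05
INCLUDED_METRICS_PER_HOST = 100


@dataclass
class Usage:
    """Hypothetical usage profile; every field is an assumption you supply."""
    hosts: int
    apm_hosts: int
    log_gb_per_month: float
    indexed_fraction: float  # share of ingested log GB that is also indexed
    custom_metrics: int


def estimate_monthly_bill(u: Usage) -> tuple[float, float]:
    """Return a (low, high) monthly estimate in USD."""
    logs = u.log_gb_per_month * (LOG_INGEST_PER_GB + u.indexed_fraction * LOG_INDEX_PER_GB)
    # Only custom metrics above the per-host allowance are billed.
    billable_metrics = max(0, u.custom_metrics - INCLUDED_METRICS_PER_HOST * u.hosts)
    metrics = billable_metrics * CUSTOM_METRIC_PRICE
    low = u.hosts * INFRA_PER_HOST[0] + u.apm_hosts * APM_PER_HOST[0] + logs + metrics
    high = u.hosts * INFRA_PER_HOST[1] + u.apm_hosts * APM_PER_HOST[1] + logs + metrics
    return low, high


if __name__ == "__main__":
    sample = Usage(hosts=50, apm_hosts=50, log_gb_per_month=3000,
                   indexed_fraction=0.5, custom_metrics=20_000)
    low, high = estimate_monthly_bill(sample)
    print(f"Estimated bill: ${low:,.0f}-{high:,.0f}/month")
```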

What is a cheap alternative to Datadog?

The most cost-effective alternative is the open source stack: Prometheus for metrics, Grafana for dashboards, and Loki for logs. Self-hosted, this stack can cost 80–95% less than Datadog for the same coverage — the main cost is engineering time to set up and maintain. Grafana Cloud also offers a generous free tier and much lower costs than Datadog at scale.

How do I reduce my monitoring costs?

Key strategies: (1) Reduce log volume via sampling and filtering at the agent level, (2) Reduce custom metrics by using histograms instead of individual gauges, (3) Right-size retention — most data past 30 days is rarely queried, (4) Use tiered storage for long-term retention, (5) Consider open source for infrastructure monitoring and paid tools only for APM.
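As an illustration of strategy (1), the sketch below samples low-severity log records in-process using Python's standard `logging` module. This is an application-side stand-in for the idea: agent-level filtering is configured in the agent itself and is not shown here. The class name `SampleLowSeverity` and the `keep_one_in` parameter are hypothetical.

```python
import logging


class SampleLowSeverity(logging.Filter):
    """Keep every WARNING-or-higher record, but only 1 in `keep_one_in` lower-severity records."""

    def __init__(self, keep_one_in: int = 10) -> None:
        super().__init__()
        self.keep_one_in = keep_one_in
        self._seen = 0  # count of low-severity records observed so far

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        self._seen += 1
        # Deterministic sampling: pass every Nth low-severity record.
        return self._seen % self.keep_one_in == 0


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(SampleLowSeverity(keep_one_in=10))
    logger = logging.getLogger("checkout")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    for i in range(30):
        logger.info("request %d served", i)   # only every 10th is emitted
    logger.warning("payment retry")            # always emitted
```

Deterministic counting keeps the example predictable; random sampling is a common alternative when several processes log in parallel.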

What are the hidden costs of observability tools?

Hidden costs include: overage charges when you exceed committed usage, per-seat pricing for dashboards and alerts, data retention costs beyond 15 days, support tiers that cost extra, professional services for onboarding, and vendor lock-in migration costs when you eventually switch. These can add 30–60% to the base platform cost.

Updated April 2026

Open Source vs Paid Monitoring: The Real Total Cost of Ownership

TL;DR

Prometheus is free. Running it in production is not. Self-hosted monitoring for 100 hosts costs $2,000-8,000/mo in infrastructure plus 0.5-1 FTE in engineering time. That is $8,000-20,000/mo total. Datadog for the same scale: $5,000-15,000/mo. Grafana Cloud: $3,000-9,000/mo. The answer depends on your team's capacity and whether you have dedicated platform engineers.

The question "should we use Prometheus or Datadog?" appears on r/devops and Hacker News weekly. It is asked repeatedly because the answer is genuinely nuanced and depends on factors that most comparison articles ignore: engineering team capacity, Kubernetes expertise, willingness to maintain monitoring infrastructure, and the value of engineering time that could be spent on product development instead of monitoring tool maintenance. Most comparisons online are written by vendors: SigNoz promotes open source, Grafana Labs promotes Grafana Cloud, and Datadog promotes itself. This page aims to be an independent TCO analysis.

This page provides a complete total cost of ownership comparison across three monitoring approaches: fully self-hosted open source (Prometheus, Grafana, Loki, Tempo), managed open source (Grafana Cloud), and fully commercial (Datadog). We include the costs that self-hosted advocates often understate (engineering time, on-call overhead, upgrade maintenance) and the costs that commercial advocates often inflate (the real price of vendor flexibility and competitive alternatives). Our goal is to help you make the right decision for your specific situation, not to advocate for any particular approach.

TCO Comparison: 100 Hosts

This table compares the three primary monitoring approaches for a standard 100-host deployment with APM on 50% of hosts, 100GB/day log ingest, and 15-day retention. Engineering costs assume a fully-loaded engineer cost of $12,500/month (approximately $150,000/year including benefits, equipment, and overhead). Setup costs are amortised over 12 months. The self-hosted option includes Prometheus for metrics, Grafana for dashboards, Loki for logs, and Tempo for distributed traces.

Cost Component	Self-Hosted OSS	Grafana Cloud	Datadog
Software License	$0	$3,000-9,000	$5,000-15,000
Infrastructure (compute + storage)	$2,000-5,000	$0 (included)	$0 (included)
Setup Engineering (amortised/mo)	$780	$104	$52
Ongoing Maintenance (FTE/mo)	$4,375	$625	$250
Training (amortised/mo)	$208	$83	$42
Total Monthly TCO	$7,363-10,363	$3,812-9,812	$5,344-15,344

Self-hosted maintenance assumes 0.35 FTE ongoing. Grafana Cloud assumes 0.05 FTE. Datadog assumes 0.02 FTE. Setup costs amortised over 12 months.
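The totals in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the figures stated above; the dictionary layout and function name are illustrative, and you can substitute your own FTE cost or maintenance fraction.

```python
FTE_MONTHLY_COST = 12_500  # fully loaded engineer cost per month, as stated above

# "platform" is the licence or infrastructure range; other values are monthly USD.
APPROACHES = {
    "Self-Hosted OSS": {"platform": (2_000, 5_000), "setup": 780, "fte": 0.35, "training": 208},
    "Grafana Cloud": {"platform": (3_000, 9_000), "setup": 104, "fte": 0.05, "training": 83},
    "Datadog": {"platform": (5_000, 15_000), "setup": 52, "fte": 0.02, "training": 42},
}


def monthly_tco(approach: dict) -> tuple[int, int]:
    """Return the (low, high) total monthly cost for one approach."""
    fixed = approach["setup"] + approach["fte"] * FTE_MONTHLY_COST + approach["training"]
    low, high = approach["platform"]
    return round(low + fixed), round(high + fixed)


if __name__ == "__main__":
    for name, approach in APPROACHES.items():
        low, high = monthly_tco(approach)
        print(f"{name:16s} ${low:,}-{high:,}/month")
    # Self-Hosted OSS  $7,363-10,363/month
    # Grafana Cloud    $3,812-9,812/month
    # Datadog          $5,344-15,344/month
```

Changing `fte` for the self-hosted row shows how strongly the comparison depends on the maintenance assumption.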

Self-Hosted Cost Deep Dive

Self-hosting a monitoring stack based on Prometheus, Grafana, Loki, and Tempo requires dedicated infrastructure and ongoing engineering investment. The software itself is free and open source: Prometheus is licensed under Apache 2.0, while Grafana, Loki, and Tempo are licensed under AGPL-3.0. The costs come from three areas: the cloud infrastructure to run the monitoring systems, the engineering time to set up and maintain them, and the opportunity cost of engineering time spent on monitoring infrastructure rather than product development.

Infrastructure Requirements for 100 Hosts

A production-ready self-hosted monitoring stack for 100 hosts requires dedicated compute and storage resources that scale with the volume of telemetry data being collected. The infrastructure must be separate from the application workloads being monitored to ensure monitoring remains available during application incidents. For high availability, each component should run as a multi-replica deployment.

Prometheus / Mimir

Metric ingestion and storage. For 100 hosts generating ~500K active time series: 2-3 instances with 8 vCPU, 32GB RAM, 500GB SSD each. Mimir recommended over vanilla Prometheus for horizontal scaling. Monthly compute cost: $800-1,500. Storage cost (1-2TB/month): $200-400.

Grafana

Dashboard and alerting UI. Lightweight compared to data stores: 1-2 instances with 2 vCPU, 4GB RAM. Monthly compute cost: $80-150. Database for dashboard storage (PostgreSQL or SQLite): $50-100. This is the easiest component to self-host.

Loki

Log aggregation. For 100GB/day log ingest: 3-4 instances with 4 vCPU, 16GB RAM for ingesters, 2 instances for queriers. Object storage backend (S3/GCS) for log data: $200-500/month depending on retention. Monthly total: $600-1,200.

Tempo

Distributed trace storage. For APM-equivalent tracing on 50 hosts: 2-3 instances with 4 vCPU, 8GB RAM. Object storage backend: $100-300/month. Monthly total: $300-600. Tempo is the newest component and requires the most operational expertise.
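Summing the per-component figures above gives a quick sanity check on the infrastructure budget. The sketch below assumes the Loki and Tempo "monthly total" figures already include their object storage; the dictionary keys are just labels.

```python
# Monthly cost ranges (USD) taken from the component descriptions above.
COMPONENTS = {
    "Prometheus/Mimir compute": (800, 1_500),
    "Prometheus/Mimir storage": (200, 400),
    "Grafana compute": (80, 150),
    "Grafana database": (50, 100),
    "Loki (total)": (600, 1_200),
    "Tempo (total)": (300, 600),
}


def infra_range(components: dict) -> tuple[int, int]:
    """Add up the low ends and the high ends of every component range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


if __name__ == "__main__":
    low, high = infra_range(COMPONENTS)
    print(f"Itemised infrastructure: ${low:,}-{high:,}/month")  # $2,030-3,950/month
```

The itemised sum falls inside the $2,000-5,000 infrastructure range used in the comparison table; what accounts for the remaining headroom is not itemised in this breakdown.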

Engineering Time Requirements

Engineering time is the most commonly underestimated cost of self-hosted monitoring. Initial setup for a production-ready Prometheus + Grafana + Loki + Tempo stack takes 4-8 weeks for a senior platform engineer, including high availability configuration, alerting rule migration, dashboard creation, and team training. Ongoing maintenance averages 0.25-0.5 FTE, covering Prometheus/Mimir upgrades (quarterly), capacity planning, storage management, alert rule tuning, dashboard maintenance, and troubleshooting data ingestion issues. This engineering time has an opportunity cost: every hour spent maintaining monitoring infrastructure is an hour not spent on product engineering. For companies where engineering capacity is the bottleneck, this opportunity cost can exceed the direct savings from avoiding vendor licensing.

Scale-Out Considerations

Vanilla Prometheus has a single-node scaling limit of approximately 10-15 million time series. Beyond this, you need a horizontally-scalable metric store like Mimir, Thanos, or VictoriaMetrics. Mimir is recommended as the most actively developed option with native Grafana Labs support. VictoriaMetrics is an excellent alternative that is more resource-efficient but has a smaller community. Thanos is well-established but has been largely superseded by Mimir for new deployments. Each of these adds operational complexity compared to single-node Prometheus, requiring distributed consensus, compaction management, and more sophisticated capacity planning.

When Each Approach Wins

Open Source Wins When...

+ You have a dedicated platform engineering team with Kubernetes expertise and capacity to maintain monitoring infrastructure
+ Your infrastructure is Kubernetes-native and your team already operates CNCF tooling (Helm, Kustomize, ArgoCD)
+ You have high data volumes where per-unit vendor pricing becomes prohibitively expensive (500+ hosts, 500GB+ logs/day)
+ You need full control over data residency, retention policies, and access controls for compliance reasons
+ Vendor lock-in is a strategic concern and you want to maintain the ability to switch components independently

Commercial Wins When...

+ You have a small team (fewer than 5 engineers) without dedicated platform engineering capacity
+ You need rapid time-to-value and cannot afford 4-8 weeks for monitoring infrastructure setup
+ You need extensive out-of-the-box integrations (Datadog offers 750+) without custom configuration
+ Your infrastructure is smaller (under 100 hosts) where vendor pricing is competitive with self-hosted TCO
+ You need enterprise support SLAs, SOC 2 compliance reports, and vendor-managed security patches

The Hybrid Approach: Best of Both Worlds

Many organisations adopt a hybrid approach that combines open-source metrics collection with commercial log management and APM. This is not a compromise but a pragmatic architecture that optimises cost per telemetry type. The most common hybrid pattern is Prometheus for metrics (where open source excels and per-host pricing is expensive), combined with Datadog or Grafana Cloud for APM and log management (where commercial tooling provides significantly better query performance and analysis capabilities than self-hosted alternatives).

The hybrid approach works because metrics and logs have fundamentally different cost profiles. Metrics are structured, compact, and well-handled by Prometheus at scale. Logs are unstructured, voluminous, and require sophisticated indexing and query engines that are difficult to self-host efficiently. APM traces require complex correlation and analysis that commercial tools handle better than self-hosted Jaeger or Tempo. By self-hosting the metric layer and using a managed service for logs and traces, you capture 60-70% of the potential self-hosting savings while avoiding the hardest operational challenges.

A typical hybrid stack for 100 hosts might cost $1,500-3,000/month for self-hosted Prometheus + Grafana (metrics and dashboards) plus $2,000-5,000/month for Grafana Cloud or Datadog (logs and APM only), totaling $3,500-8,000/month. This compares to $5,000-15,000/month for fully commercial or $7,000-10,000/month for fully self-hosted. The hybrid approach often represents the optimal cost-to-effort ratio for mid-market companies with moderate platform engineering capacity.
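The comparison in this paragraph is simple range arithmetic, sketched below. The midpoint is only a rough summary device added for illustration, not a figure from this analysis, and the helper names are hypothetical.

```python
def add_ranges(*ranges: tuple[int, int]) -> tuple[int, int]:
    """Add cost ranges end to end: lows with lows, highs with highs."""
    return sum(r[0] for r in ranges), sum(r[1] for r in ranges)


def midpoint(cost_range: tuple[int, int]) -> float:
    return (cost_range[0] + cost_range[1]) / 2


SELF_HOSTED_METRICS = (1_500, 3_000)  # Prometheus + Grafana, from the text
MANAGED_LOGS_APM = (2_000, 5_000)     # Grafana Cloud or Datadog, logs and APM only

OPTIONS = {
    "Hybrid": add_ranges(SELF_HOSTED_METRICS, MANAGED_LOGS_APM),
    "Fully commercial": (5_000, 15_000),
    "Fully self-hosted": (7_000, 10_000),
}

if __name__ == "__main__":
    for name, (low, high) in OPTIONS.items():
        print(f"{name:18s} ${low:,}-{high:,}/month (midpoint ${midpoint((low, high)):,.0f})")
```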

Migration Cost Analysis: Switching from Datadog to Self-Hosted

If you are currently on Datadog and considering a migration to self-hosted monitoring, the one-time migration cost is the critical factor that determines whether the switch is financially worthwhile. Migration is not just installing new software. It requires rewriting every dashboard, reconfiguring every alert, updating every runbook, retraining every engineer, and running parallel systems for validation. Based on typical mid-market deployments, the migration costs break down as follows.

Migration Task	Engineering Weeks	Estimated Cost
Infrastructure setup (Prometheus, Grafana, Loki, Tempo)	2-4	$7,500-15,000
Dashboard recreation (20-100 dashboards)	2-4	$7,500-15,000
Alert rule migration (50-500 rules)	1-2	$3,750-7,500
Runbook and documentation updates	1	$3,750
Team training and onboarding	1	$3,750
Parallel running and validation	4-8	$15,000-30,000
Total Migration Cost	11-20 weeks	$41,250-75,000

For the migration to break even within 12 months, you need monthly savings of at least $3,440-6,250 from the new platform versus Datadog (the total migration cost divided by 12). At 100 hosts, the typical monthly savings from switching to self-hosted are $2,000-8,000, meaning break-even ranges from about 5 months (lowest migration cost, highest savings) to about 38 months (highest migration cost, lowest savings). The financial case for migration is strongest above 200 hosts where monthly savings exceed $5,000 and break-even occurs within 8-12 months.
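The break-even arithmetic can be checked with a short sketch. It uses the migration totals from the table ($41,250-75,000) and the $2,000-8,000 monthly savings range; the function names are illustrative. The best case pairs the lowest migration cost with the highest savings, and the worst case pairs the highest cost with the lowest savings.

```python
def break_even_months(migration_cost: float, monthly_savings: float) -> float:
    """Months until cumulative savings cover the one-time migration cost."""
    if monthly_savings <= 0:
        raise ValueError("migration never pays back without positive monthly savings")
    return migration_cost / monthly_savings


def required_monthly_savings(migration_cost: float, payback_months: int = 12) -> float:
    """Monthly savings needed to recover the migration cost within the payback window."""
    return migration_cost / payback_months


MIGRATION_COST = (41_250, 75_000)  # total from the migration table
MONTHLY_SAVINGS = (2_000, 8_000)   # typical savings at 100 hosts, from the text

if __name__ == "__main__":
    best = break_even_months(MIGRATION_COST[0], MONTHLY_SAVINGS[1])
    worst = break_even_months(MIGRATION_COST[1], MONTHLY_SAVINGS[0])
    print(f"Best case:  {best:.1f} months")
    print(f"Worst case: {worst:.1f} months")
    for cost in MIGRATION_COST:
        print(f"${cost:,} migration needs ${required_monthly_savings(cost):,.0f}/month for 12-month payback")
```

Pairing the extremes this way gives the widest possible spread; other combinations of cost and savings fall in between.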

Frequently Asked Questions

Is Prometheus free?

Prometheus is 100% free and open source under the Apache 2.0 license. There is no paid tier, no premium features, and no usage limits in the software itself. However, running Prometheus in production requires cloud infrastructure (compute instances, storage volumes, networking) that costs $800-3,000/month for a 100-host deployment, plus engineering time for setup (4-8 weeks), ongoing maintenance (0.25-0.5 FTE), and the operational expertise to manage scaling, high availability, and retention. The total cost of ownership for a self-hosted Prometheus stack is typically $7,000-10,000/month at 100 hosts when engineering time is included, which is comparable to or slightly less than Datadog at the same scale.

Is open source monitoring really free?

Open source monitoring software is free. Operating it in production is not. The three hidden costs of self-hosted monitoring are: infrastructure costs ($2,000-5,000/month for 100 hosts on AWS/GCP), engineering time for setup and maintenance (0.5-1 FTE, worth $6,250-12,500/month at a fully loaded cost of $12,500/month per engineer), and opportunity cost (engineering time spent on monitoring infrastructure instead of product development). At small scale (under 50 hosts), these costs often exceed what a commercial vendor would charge. At large scale (500+ hosts), self-hosted monitoring becomes significantly cheaper than commercial alternatives because infrastructure costs scale sub-linearly while vendor pricing scales linearly per host. The break-even point is typically around 100-200 hosts.

What is the TCO of self-hosted monitoring?

The total cost of ownership of self-hosted monitoring (Prometheus + Grafana + Loki + Tempo) for a 100-host deployment is approximately $7,000-10,000 per month. This breaks down as: cloud infrastructure for monitoring servers and storage ($2,000-5,000), ongoing engineering maintenance at 0.35 FTE ($4,375), amortised setup costs ($780/month over 12 months), and training ($208/month amortised). At 500 hosts, the TCO rises only to approximately $12,000-20,000/month, less than double for five times the hosts, because infrastructure and engineering costs grow sub-linearly. Compare this to Datadog at 500 hosts ($25,000-75,000/month) or Grafana Cloud at 500 hosts ($15,000-45,000/month) to understand the savings potential at larger scales.
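Dividing these totals by host count makes the scaling difference easier to see. The sketch below uses only the monthly ranges quoted in this answer; the data layout and function name are illustrative.

```python
# Monthly TCO ranges (USD) quoted in the answer above, keyed by host count.
MONTHLY_TCO = {
    "Self-hosted": {100: (7_000, 10_000), 500: (12_000, 20_000)},
    "Grafana Cloud": {500: (15_000, 45_000)},
    "Datadog": {500: (25_000, 75_000)},
}


def per_host(cost_range: tuple[int, int], hosts: int) -> tuple[float, float]:
    """Convert a monthly cost range into a per-host monthly range."""
    low, high = cost_range
    return low / hosts, high / hosts


if __name__ == "__main__":
    for approach, by_scale in MONTHLY_TCO.items():
        for hosts, cost_range in sorted(by_scale.items()):
            low, high = per_host(cost_range, hosts)
            print(f"{approach:14s} {hosts:>4d} hosts: ${low:,.0f}-{high:,.0f} per host/month")
```

On these figures, self-hosted cost per host drops from $70-100 at 100 hosts to $24-40 at 500 hosts, while Datadog at 500 hosts works out to $50-150 per host.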