Best Practices for Production

Tip: The easiest way to run SigNoz is to use SigNoz Cloud - no installation, maintenance, or scaling needed.

New users get 30 days of unlimited access to all features.

Best practices for running SigNoz in production

Create a separate cluster for running SigNoz. This isolates the application and APM environments from each other and so reduces the impact radius of operational issues.
Use infra-level otel-collectors to send host metrics from VMs (this should be part of the default setup).
K8s Infra Metrics | SigNoz
Configure a TTL for data on disk and move older data to S3 to reduce costs. Note that S3 performance is 2-3x slower than EBS. Configure retention separately for metrics, traces, and logs.
Retention Period | SigNoz
Set up alerts on important APM metrics.
Harness the power of distributed tracing data by creating dashboards using ClickHouse queries. You can run group-by and aggregate queries on the tags (attributeMap) and events of a span. Filtering by more specific conditions should also be possible. Let us know if you would like help writing a few queries to plot a chart from the traces data. The same applies to the logs data.
Secure query-service and otel-collector using TLS ingress
Secure SigNoz in Kubernetes using Ingress-NGINX and Cert-Manager | SigNoz
Authorise client otel-collectors to send data to the SigNoz cluster (planned).
Horizontally scale otel-collector, which works on the push model, but not otel-collector-metrics, which works on the pull model of Prometheus scraping. To add another instance of otel-collector-metrics, you need to give it a different config to prevent duplication.
Use a higher batch size in otel-collector when ingesting more than 10K events/s. The default batch size is 10K rows. A batch size of up to 50K should work well.
signoz/otel-collector-config.yaml at develop · SigNoz/signoz
Use sampling to reduce the amount of data sent to SigNoz
opentelemetry-collector-contrib/processor/probabilisticsamplerprocessor at main · open-telemetry/opentelemetry-collector-contrib
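
The batch-size and sampling recommendations above could look roughly like the following collector configuration. This is a minimal sketch, not the configuration shipped with SigNoz: the `otlp` receiver and exporter, the endpoint `signoz-otel-collector.example.com:4317`, the 20% sampling rate, and the 10s timeout are all assumptions chosen for illustration. `send_batch_size` and `timeout` are settings of the collector's batch processor, and `sampling_percentage` is a setting of the probabilistic sampler processor.

```yaml
receivers:
  otlp:                      # assumed receiver; use whatever your setup already has
    protocols:
      grpc:
      http:

processors:
  probabilistic_sampler:
    sampling_percentage: 20  # illustrative: keep roughly 20% of traces
  batch:
    send_batch_size: 50000   # raised from the 10K default mentioned above
    timeout: 10s             # assumed flush interval; tune for your latency needs

exporters:
  otlp:
    # hypothetical endpoint; replace with your SigNoz collector address
    endpoint: "signoz-otel-collector.example.com:4317"

service:
  pipelines:
    traces:
      receivers: [otlp]
      # sample first so that dropped spans are never batched or exported
      processors: [probabilistic_sampler, batch]
      exporters: [otlp]
```

Placing the sampler before the batch processor means batches are built only from spans that will actually be sent. Start with a moderate batch size and raise it toward 50K only if ingestion exceeds 10K events/s.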