Production readiness is not about perfection; it’s about visibility, resilience, and fast recovery.
This guide walks through the essential practices for running Spring Boot applications reliably in production.
Introduction: Why Production Readiness Matters
In development, the primary question is: “Does it work?”
In production, the question becomes: “Can we keep it working under pressure?”
Production readiness is not just about clean code. It is about:
- Observability when things go wrong
- Safe configuration management
- Operational confidence at scale
- Faster incident response
Spring Boot provides excellent tooling out of the box. When used correctly, it enables applications that are observable, resilient, and maintainable.
This article focuses on four pillars of production readiness, then closes with guidance on containers, Kubernetes, and alerting. The four pillars are:
- Health checks
- Metrics
- Logging
- Configuration hygiene
1. Health Checks: Your Application’s Vital Signs
Health checks are the foundation of reliable production systems. They tell platforms, orchestrators, and humans whether your service is alive and ready to serve traffic.
Actuator: The Foundation
Spring Boot Actuator exposes production-grade endpoints for monitoring and management.
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
Expose only what you need:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics
  endpoint:
    health:
      # "always" would show component details to anonymous callers
      show-details: when-authorized
      probes:
        enabled: true
```
💡 Tip: Never expose all actuator endpoints publicly in production.
Built-in Health Indicators
Spring Boot auto-configures health indicators for the infrastructure it detects, for example:
- Database connectivity
- Disk space
- Messaging brokers
- Caches
These checks surface under /actuator/health.
Custom Health Indicators
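For dependencies Spring Boot does not cover, or when you need custom details, implement HealthIndicator yourself. The example below shows the pattern with a database check. Spring Boot already ships a built-in indicator for any configured DataSource.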
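```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class DatabaseHealthIndicator implements HealthIndicator {

    private final DataSource dataSource;

    public DatabaseHealthIndicator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Health health() {
        try (Connection connection = dataSource.getConnection()) {
            // isValid takes its timeout in SECONDS: 1 means "give up after one second"
            if (connection.isValid(1)) {
                return Health.up()
                        .withDetail("database", "Available")
                        .build();
            }
            return Health.down()
                    .withDetail("database", "Connection invalid")
                    .build();
        } catch (SQLException e) {
            return Health.down(e)
                    .withDetail("database", "Unavailable")
                    .build();
        }
    }
}
```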
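🚨 Rule of thumb: A health check should fail fast. Give every call it makes a short timeout so a slow dependency shows up as DOWN instead of hanging the health endpoint.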
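One way to honor this rule is to give every probe a hard time budget. The following is a minimal sketch using only the JDK. The `BoundedCheck` class and its "UP"/"DOWN" strings are hypothetical stand-ins for what a `HealthIndicator` would return as a `Health` result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: runs a dependency probe with a hard time budget so a hung
// dependency is reported as DOWN instead of blocking the caller.
public class BoundedCheck {
    private static final ExecutorService PROBES = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "health-probe");
        t.setDaemon(true); // a stuck probe must not keep the JVM alive
        return t;
    });

    public static String check(BooleanSupplier probe, long timeoutMs) {
        return CompletableFuture.supplyAsync(probe::getAsBoolean, PROBES)
                .completeOnTimeout(false, timeoutMs, TimeUnit.MILLISECONDS) // too slow counts as failure
                .exceptionally(ex -> false)                                  // a thrown exception counts as failure
                .thenApply(ok -> ok ? "UP" : "DOWN")
                .join();
    }

    public static void main(String[] args) {
        System.out.println(check(() -> true, 500)); // UP
        System.out.println(check(() -> {            // simulates a dependency that hangs for 5 s
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return true;
        }, 200));                                   // DOWN
    }
}
```

The timed-out probe keeps running on its own thread. This caps how long the caller waits, not how long the dependency call lasts, so the call itself still needs its own timeout, such as the `isValid` argument or the connection pool's connection timeout.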
Liveness vs Readiness (Kubernetes)
For containerized workloads:
```yaml
management:
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
```
- Liveness: Should this container be restarted?
- Readiness: Should this instance receive traffic?
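To make the distinction concrete, here is a toy model in plain Java. It is not Spring or Kubernetes code, and `ProbeModel` and its enums are hypothetical. The model encodes two documented behaviors:

- Spring Boot's default mapping from health status to HTTP code returns 503 for DOWN and OUT_OF_SERVICE.
- The kubelet treats any HTTP code from 200 to 399 as a passing probe.

Spring reports a broken liveness state as DOWN and a readiness state that refuses traffic as OUT_OF_SERVICE.

```java
// Toy model (not Spring or Kubernetes code): how a probe result turns into an action.
public class ProbeModel {
    enum Status { UP, DOWN, OUT_OF_SERVICE, UNKNOWN }
    enum Probe { LIVENESS, READINESS }

    // Spring Boot's default mapping: DOWN and OUT_OF_SERVICE return 503, everything else 200.
    static int httpStatus(Status status) {
        return (status == Status.DOWN || status == Status.OUT_OF_SERVICE) ? 503 : 200;
    }

    // The kubelet counts any HTTP status from 200 to 399 as a passing probe.
    static boolean passes(int httpCode) {
        return httpCode >= 200 && httpCode < 400;
    }

    // Action once failures reach the probe's failureThreshold (simplified here to a single failure).
    static String kubeletAction(Probe probe, Status status) {
        if (passes(httpStatus(status))) {
            return "none";
        }
        return probe == Probe.LIVENESS ? "restart container" : "stop routing traffic";
    }

    public static void main(String[] args) {
        for (Probe probe : Probe.values()) {
            for (Status status : Status.values()) {
                System.out.println(probe + " " + status + " -> " + kubeletAction(probe, status));
            }
        }
    }
}
```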
2. Metrics: Quantitative Insight into Behavior
Metrics help you answer questions like:
- Are users experiencing latency?
- Is memory usage trending upward?
- Did error rates spike after deployment?
Micrometer Integration
Spring Boot uses Micrometer as a vendor-neutral metrics facade.
```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
Metrics You Should Always Track
System Metrics
- JVM memory & GC
- Thread usage
- CPU consumption
Application Metrics
- HTTP request count & latency
- Database connection pool usage
- Cache hit/miss ratios
Business Metrics
- Orders placed
- Payments processed
- Domain-specific KPIs
Custom Business Metrics
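```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class OrderMetrics {

    private final Counter orderCounter;
    private final Timer orderProcessingTimer;

    public OrderMetrics(MeterRegistry meterRegistry) {
        this.orderCounter = Counter.builder("orders.total")
                .description("Total number of orders")
                .register(meterRegistry);
        // Client-side percentiles are computed per instance and cannot be aggregated
        // across instances; use publishPercentileHistogram() if you need that.
        this.orderProcessingTimer = Timer.builder("orders.processing.time")
                .description("Time spent processing an order")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(meterRegistry);
    }

    // processOrder is your business logic; it runs inside the timer
    public void recordOrder(Runnable processOrder) {
        orderCounter.increment();
        orderProcessingTimer.record(processOrder);
    }
}
```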
📊 Best practice: Prefer histograms and percentiles over averages.
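To see why, compare a mean with percentiles on the same latencies. The following sketch uses hypothetical sample data and an exact nearest-rank percentile. Micrometer's own percentile and histogram computations are approximations and will not match it digit for digit.

```java
import java.util.Arrays;

// Illustration only: a mean hides tail latency that percentiles expose.
public class LatencyStats {
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0);
    }

    // Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
    static long percentile(long[] samples, int p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = Math.max(1, (p * sorted.length + 99) / 100); // ceil(p * n / 100) in integer math
        return sorted[rank - 1];
    }

    // Hypothetical data: 95 fast requests and 5 slow ones.
    static long[] sample() {
        long[] latenciesMs = new long[100];
        Arrays.fill(latenciesMs, 0, 95, 20L);
        Arrays.fill(latenciesMs, 95, 100, 1500L);
        return latenciesMs;
    }

    public static void main(String[] args) {
        long[] latenciesMs = sample();
        System.out.println("mean = " + mean(latenciesMs) + " ms");          // mean = 94.0 ms
        System.out.println("p50  = " + percentile(latenciesMs, 50) + " ms"); // p50  = 20 ms
        System.out.println("p99  = " + percentile(latenciesMs, 99) + " ms"); // p99  = 1500 ms
    }
}
```

With 95 requests at 20 ms and 5 at 1,500 ms, the mean of 94 ms hides the slow tail, while p99 reports the 1,500 ms those users actually experience.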
Prometheus Configuration
```yaml
management:
  endpoints:
    web:
      exposure:
        # This list replaces earlier values, so keep health and friends in it
        include: health,info,metrics,prometheus
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: true
    tags:
      application: ${spring.application.name}
      environment: ${ENVIRONMENT:development}
```
3. Logging: Structured, Searchable, Actionable
Logs tell the story of what happened; metrics tell you how often it happens.
Logback for Production
Use logback-spring.xml rather than logback.xml for environment-aware logging, so Spring Boot can apply extensions such as <springProfile> blocks.
Key practices:
- JSON logs for aggregation systems
- Rolling files with size limits
- Environment metadata in every log
⚠️ Never log secrets, tokens, or personal data.
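As a last line of defense, some teams mask obviously sensitive fields before a message reaches the logger. The following is a minimal sketch that assumes key=value or JSON-style messages. `LogRedactor` and its key list are hypothetical. A regex is a safety net, not a substitute for never passing secrets to the logger.

```java
import java.util.regex.Pattern;

// Hypothetical helper: masks the value that follows a sensitive key in
// key=value or "key":"value" text. Extend the key list for your domain.
public final class LogRedactor {
    private static final Pattern SENSITIVE = Pattern.compile(
            "(?i)(\"?(?:password|passwd|secret|token|api[_-]?key)\"?\\s*[:=]\\s*\"?)([^\",\\s&}]+)");

    private LogRedactor() {
    }

    public static String redact(String message) {
        if (message == null) {
            return null;
        }
        // Group 1 keeps the key and separator; group 2 (the value) is replaced
        return SENSITIVE.matcher(message).replaceAll("$1****");
    }

    public static void main(String[] args) {
        System.out.println(redact("user=alice password=hunter2"));   // user=alice password=****
        System.out.println(redact("{\"token\":\"abc123\",\"id\":7}")); // {"token":"****","id":7}
    }
}
```

The pattern masks only the single token after the key. Formats such as `Authorization: Bearer <token>` need a rule of their own.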
Structured Logging with MDC
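```java
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class RequestLoggingFilter extends OncePerRequestFilter {

    private static final Logger log = LoggerFactory.getLogger(RequestLoggingFilter.class);
    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {
        MDC.put("requestId", UUID.randomUUID().toString());
        // Behind a proxy this is the proxy's address; IPs may also count as personal data
        MDC.put("clientIp", request.getRemoteAddr());
        long start = System.nanoTime();
        try {
            filterChain.doFilter(request, response);
        } finally {
            // Log before cleanup so the completion line carries requestId and duration
            MDC.put("durationMs", String.valueOf((System.nanoTime() - start) / 1_000_000));
            log.info("{} {} -> {}", request.getMethod(), request.getRequestURI(), response.getStatus());
            MDC.remove("requestId"); // threads are pooled: remove what this filter added
            MDC.remove("clientIp");
            MDC.remove("durationMs");
        }
    }
}
```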
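🔍 Why MDC matters: Every log line written while handling the request carries the same requestId, so you can pull up one request's full story. To follow a request across services, propagate the ID downstream or use Micrometer Tracing, which adds traceId and spanId to the MDC.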
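The finally block in the filter matters because servlet containers reuse threads. The following JDK-only sketch shows the failure mode. `MdcLeakDemo` is hypothetical: it uses a ThreadLocal map as a stand-in for SLF4J's MDC, which is also bound to the current thread, and a single-thread executor plays the role of a reused worker thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for SLF4J's MDC: a per-thread map of diagnostic values.
public class MdcLeakDemo {
    static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(HashMap::new);

    // Simulates one request on a pooled thread; returns any requestId left over from before.
    static String handle(String requestId, boolean clearAfter) {
        String leftover = CONTEXT.get().get("requestId");
        CONTEXT.get().put("requestId", requestId);
        try {
            // ... request handling and logging would happen here ...
            return leftover;
        } finally {
            if (clearAfter) {
                CONTEXT.remove();
            }
        }
    }

    // Runs two requests on the same thread and reports what the second one inherited.
    static String secondRequestSees(boolean clearAfter) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> handle("req-1", clearAfter)).get();
            return pool.submit(() -> handle("req-2", clearAfter)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + secondRequestSees(false)); // without cleanup: req-1
        System.out.println("with cleanup:    " + secondRequestSees(true));  // with cleanup:    null
    }
}
```

Without cleanup, the second request inherits the first request's ID, and its log lines would be attributed to the wrong request.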
Runtime Log Level Changes
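The loggers endpoint lets you change log levels on a running instance. It is enabled by default, but it must be exposed over HTTP. Protect it as well, since anyone who can reach it can change levels:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,loggers
```

During an incident you can then raise a single package to DEBUG without a redeployment, and reset it afterwards.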
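With the loggers endpoint exposed, a level change is a single POST to /actuator/loggers/{name} with a configuredLevel in the JSON body. The following is a minimal JDK HttpClient sketch. The base URL and the logger name com.example.orders are hypothetical, and the request is only sent when you pass --send.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Builds the Actuator request that changes one logger's level at runtime.
public class LogLevelClient {
    static HttpRequest setLevelRequest(String baseUrl, String loggerName, String level) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/loggers/" + loggerName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"configuredLevel\":\"" + level + "\"}"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical target: a local instance and an application package
        HttpRequest request = setLevelRequest("http://localhost:8080", "com.example.orders", "DEBUG");
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<Void> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println("status " + response.statusCode()); // Actuator answers 204 on success
        } else {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

Sending `{"configuredLevel": null}` instead clears the override, so the logger falls back to its configured level.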
4. Configuration Hygiene: Secure and Predictable
Externalize Everything
Never hardcode environment-specific values such as URLs, credentials, or limits; read them from the environment.
```yaml
spring:
  application:
    name: order-service
  profiles:
    active: ${ENVIRONMENT:development}
```
Production profile, as a separate document in the same application.yml:

```yaml
---
spring:
  config:
    activate:
      on-profile: production
  datasource:
    url: ${DATABASE_URL}
    username: ${DATABASE_USERNAME}
    password: ${DATABASE_PASSWORD}
```
Validate Configuration at Startup
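```java
import jakarta.validation.constraints.Max;
import jakarta.validation.constraints.Min;
import jakarta.validation.constraints.NotBlank;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.validation.annotation.Validated;

// Binds app.external-service-url and app.max-connections through the record constructor.
// Register it with @ConfigurationPropertiesScan or @EnableConfigurationProperties;
// validation requires spring-boot-starter-validation on the classpath.
@ConfigurationProperties(prefix = "app")
@Validated
public record ApplicationProperties(
        @NotBlank String externalServiceUrl, // unlike @NotNull, also rejects ""
        @Min(1) @Max(100) int maxConnections) {
}
```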
✅ Fail fast if configuration is invalid.
Secrets Management
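Use a dedicated system such as HashiCorp Vault or your cloud provider's secret store, and inject secrets into the application at runtime.
🔐 Security rule: Applications should never manage secrets manually. Secrets do not belong in source control, container images, or plain configuration files.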
5. Running in Containers and Kubernetes
Docker Best Practices
- Use slim JRE images
- Run as non-root
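- Size the JVM for the container: set the heap as a share of the container memory limit (for example -XX:MaxRAMPercentage=75) instead of a fixed -Xmx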
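Before tuning, it helps to confirm what the JVM actually detects inside the container. JDK 10 and later read container CPU and memory limits by default, and the maximum heap defaults to 25% of the detected memory. The following sketch only reads values from `Runtime`. `ContainerResources` is a hypothetical name, and you might log its output once at startup.

```java
// Reports the resources the JVM believes it has; run it inside the container.
public class ContainerResources {
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory(); // Long.MAX_VALUE means "no inherent limit"
        String heap = (max == Long.MAX_VALUE) ? "unlimited" : (max / (1024 * 1024)) + "MiB";
        return "cpus=" + rt.availableProcessors() + " maxHeap=" + heap;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported CPU count or heap matches the host rather than the container limit, the JVM is not seeing the limits you expect.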
Kubernetes Probes
- Liveness: /actuator/health/liveness
- Readiness: /actuator/health/readiness
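🚀 Proper probes prevent cascading failures. Readiness takes a starting or overloaded instance out of rotation, and liveness restarts only an instance that is truly stuck. Keep external dependencies out of the liveness check, or one slow database can restart every pod at once.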
6. Monitoring and Alerting Strategy
Alerts That Actually Matter
- Error rate > 5%, sustained over several minutes
- P99 latency above SLO
- Heap usage > 80% after garbage collection
- Instance health check failures
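The error-rate alert can be expressed as a small rule. In practice it lives in your alerting system, for example as a Prometheus rule over the `http_server_requests_seconds_count` series, but the logic is the same. The following is a minimal sketch. `ErrorRateAlert` and the 100-request minimum are assumptions; the minimum stops a single failure during a quiet period from paging anyone.

```java
// Sketch of the "error rate > 5%" rule with a minimum-traffic guard.
public class ErrorRateAlert {
    static final long MIN_REQUESTS = 100;     // assumed minimum volume before judging
    static final long THRESHOLD_PERCENT = 5;

    // serverErrors: responses counted as errors in the window (for example 5xx)
    static boolean shouldAlert(long totalRequests, long serverErrors) {
        if (totalRequests < MIN_REQUESTS) {
            return false; // too little traffic to judge
        }
        // Strictly above 5%, in integer math to avoid floating-point edge cases
        return serverErrors * 100 > totalRequests * THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlert(1000, 60)); // true  (6%)
        System.out.println(shouldAlert(1000, 50)); // false (exactly 5%)
        System.out.println(shouldAlert(10, 5));    // false (below minimum volume)
    }
}
```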
Dashboards Should Show
- Traffic, errors, latency
- JVM memory & GC
- Database pool usage
- Business KPIs
Conclusion: Production Excellence Is a Practice
Production readiness is not a checkbox; it's a continuous discipline.
Key takeaways:
- Start with Actuator and build observability early
- Prefer structured logs and meaningful metrics
- Validate and externalize all configuration
- Monitor what can fail and alert only when it matters
- Test your failure scenarios before users do
Spring Boot gives you the tools. Operational excellence comes from using them deliberately.