TLS¶
How certificates are issued, renewed, and consumed across the platform.
GKE / Kubernetes — cert-manager + Let's Encrypt¶
cert-manager v1.14.0 is installed via Argo CD from the Jetstack Helm chart. The Application manifest lives at k8s/envs/dev/sys/cert-manager/application.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
spec:
  source:
    chart: cert-manager
    repoURL: https://charts.jetstack.io
    targetRevision: v1.14.0
    helm:
      parameters:
        - name: installCRDs
          value: "true"
ClusterIssuer¶
A letsencrypt-prod ClusterIssuer is configured for HTTP-01 challenges via the Traefik ingress. The Ingress resource opts in to cert-manager via annotation:
# k8s/apps/tomoda/base/ingress.yaml
annotations:
  cert-manager.io/cluster-issuer: "letsencrypt-prod"
  traefik.ingress.kubernetes.io/router.tls: "true"
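For orientation, a ClusterIssuer matching that description might look roughly like the sketch below. Only the issuer name, the HTTP-01 solver, and the use of Traefik come from this page; the contact email, the account-key Secret name, and the ingress class value are placeholders, so check them against the manifest actually deployed.

```yaml
# Sketch only: fields marked "assumed" are illustrative placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production ACME v2 endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                  # assumed contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key    # assumed Secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            class: traefik                  # assumed ingress class name
```

Because a ClusterIssuer is cluster-scoped, the same issuer can serve Ingresses in any namespace that references it by annotation.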
Per-environment TLS secrets¶
Each Ingress's spec.tls[].secretName names the Kubernetes Secret that cert-manager populates with the issued certificate and its private key.
| Environment | Hosts | TLS Secret |
|---|---|---|
| Dev | api-dev.tomoda.life, app-dev.tomoda.life | tomoda-app-tls (namespace tomoda) |
| Prod | api.tomoda.life, app.tomoda.life | tomoda-app-tls (namespace prod) |
cert-manager issues a single multi-SAN certificate per ingress.
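The single multi-SAN certificate follows from listing both hosts under one spec.tls entry. A minimal sketch of how the prod Ingress might express this is shown below. The hosts, namespace, Secret name, and annotations come from this page; the Ingress name, backend service names, and ports are assumptions for illustration.

```yaml
# Sketch only: names and ports marked "assumed" are not taken from the repo.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tomoda-app                # assumed
  namespace: prod
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
    # One tls entry with two hosts yields one certificate with both names as SANs,
    # stored in one Secret.
    - hosts:
        - api.tomoda.life
        - app.tomoda.life
      secretName: tomoda-app-tls
  rules:
    - host: api.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend     # assumed
                port:
                  number: 80      # assumed
    - host: app.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend    # assumed
                port:
                  number: 80      # assumed
```

Splitting the hosts into two tls entries with different secretName values would instead produce two separate certificates.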
Renewal¶
cert-manager renews automatically once a certificate is within 30 days of expiry; no human action is required. To verify renewal health:
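kubectl get certificate -A
# READY should be True. AGE is the age of the Certificate resource, not of the issued cert.
kubectl describe certificate tomoda-app-tls -n prod
# Status > Conditions: Type=Ready, Status=True
# Status > Not After: about 90 days after Not Before (the Let's Encrypt lifetime)
# Status > Renewal Time: when cert-manager will next attempt renewal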
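The 30-day window is not a fixed constant. By default cert-manager schedules renewal once two-thirds of a certificate's lifetime has elapsed, which for a 90-day (2160h) Let's Encrypt certificate works out to day 60, or 720h before expiry. If the window should be pinned explicitly, ingress-managed certificates accept a renew-before annotation. A minimal sketch, assuming the annotation is added next to the existing cluster-issuer annotation (it is not taken from the repo):

```yaml
# Ingress metadata fragment; the renew-before line is an optional, assumed addition.
metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # 720h = 30 days before expiry, the same point the default picks for a 2160h certificate
    cert-manager.io/renew-before: "720h"
```

A larger value leaves more time to notice and fix a failing renewal, at the cost of more frequent issuance against Let's Encrypt rate limits.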
If a renewal fails (rate-limited, DNS misconfigured, Traefik unreachable for the HTTP-01 challenge), cert-manager events surface the reason:
kubectl get events -n prod --field-selector involvedObject.kind=Certificate
kubectl describe challenge -n prod
Manual issuance (forcing a new cert)¶
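Rarely needed. To trigger issuance immediately rather than wait for the renewal window:
kubectl delete certificate tomoda-app-tls -n prod
# cert-manager re-creates the Certificate from the ingress annotation within seconds.
# Caveat: by default the tomoda-app-tls Secret survives this deletion, and if it still
# holds a valid, matching certificate, cert-manager may reuse it instead of issuing a new one.
cmctl renew tomoda-app-tls -n prod
# Alternative if the cmctl CLI is installed: marks the Certificate for re-issuance in place.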
AWS — ACM for CloudFront¶
The static-assets CDN runs on CloudFront, fronting the tomoda-assets-{env} S3 bucket. CloudFront requires its TLS certificate in AWS Certificate Manager (ACM) in the us-east-1 region — this is a CloudFront constraint regardless of where the rest of the stack lives.
| Environment | CloudFront domain | ACM cert |
|---|---|---|
| Dev | assets-dev.tomoda.life | ACM in us-east-1 |
| Prod | assets.tomoda.life | ACM in us-east-1 |
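ACM uses DNS validation, with the validation records hosted in Cloudflare. Terraform creates the ACM certificate request and outputs the validation CNAME records, which are then added to Cloudflare DNS. ACM renews the certificate automatically as long as those CNAMEs stay in place and the certificate remains attached to the CloudFront distribution.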
Verify ACM cert¶
aws acm list-certificates --region us-east-1 \
--query 'CertificateSummaryList[?DomainName==`assets.tomoda.life`]'
aws acm describe-certificate --region us-east-1 \
--certificate-arn <arn> \
--query 'Certificate.{Status:Status,NotAfter:NotAfter,Validation:DomainValidationOptions[*].ValidationStatus}'
If status is PENDING_VALIDATION, the CNAME records are missing or wrong in Cloudflare. Cross-check the validation records in the ACM console against Cloudflare DNS.
DNS¶
All tomoda.life records are in Cloudflare. Apex and most subdomains are CNAME / A records pointing at:
- GCP load balancer IP (api, app, argo-app, etc.) — proxied or not depending on whether Traefik does TLS termination
- CloudFront distribution (assets, assets-dev)
cert-manager's HTTP-01 challenges require the ingress host to resolve to the cluster — orange-cloud (proxied) records will break HTTP-01 unless Cloudflare is set to "Full (Strict)" mode and the origin already has a valid cert. If you bootstrap a new host, set it to DNS-only until the first certificate issues, then optionally proxy.
ACM's DNS validation only looks up the validation CNAMEs, so it is unaffected by the Cloudflare proxy mode of the host records themselves.
Trust chain summary¶
Browser
   |
   v
api.tomoda.life    --(TLS via Let's Encrypt)--> Traefik    --> backend (HTTP)
app.tomoda.life    --(TLS via Let's Encrypt)--> Traefik    --> frontend (HTTP)
assets.tomoda.life --(TLS via ACM)-->           CloudFront --> S3 (HTTPS over AWS network)
Intra-cluster traffic from Traefik to the backend / frontend pods is plain HTTP, but it never leaves the cluster network. If you want pod-to-pod mTLS, that's a service-mesh decision (Linkerd, Istio) — not in scope today.