Compare commits

21 Commits

Author SHA1 Message Date
Thomas Gräfenstein
2281ebcb6d improved container restart alert 2026-03-22 23:56:27 +01:00
Thomas Gräfenstein
2942ff15bc remove unused /dev/kmsg device mount from cAdvisor (oom_event is disabled) 2026-03-22 23:34:29 +01:00
Thomas Gräfenstein
24e80de43c upgrade cAdvisor to v0.54.1 for Docker 29 containerd image store support 2026-03-22 23:30:24 +01:00
Thomas Gräfenstein
cfc8b61f98 connect cAdvisor to containerd socket for Docker 29 image store compatibility 2026-03-22 23:24:50 +01:00
Thomas Gräfenstein
b063128049 grant cAdvisor privileged access for cgroup v2 container discovery 2026-03-22 23:17:33 +01:00
Thomas Gräfenstein
a07adedd00 fix cAdvisor container discovery by mounting /sys and /var/lib/docker correctly 2026-03-22 23:14:32 +01:00
Thomas Gräfenstein
31705ad888 fix cAdvisor crash by removing unsupported accelerator metric group 2026-03-22 23:06:34 +01:00
Thomas Gräfenstein
b5c5c11114 ensure monitoring stack starts before all other services 2026-03-22 22:55:42 +01:00
Thomas Gräfenstein
926766346c add cAdvisor and document detailed alert queries in README
Add cAdvisor container to the monitoring stack for container-level
metrics. Configure Alloy to scrape cAdvisor. Expand the README
Recommended Alerts section with exact PromQL/LogQL queries, thresholds,
and Grafana alert rule configuration for all five alerts.
2026-03-22 22:51:22 +01:00
Thomas Gräfenstein
c736c23e9a enable NETWORKS in docker-socket-proxy for Alloy container discovery 2026-03-22 21:27:26 +01:00
Thomas Gräfenstein
a02f33e96e move text compression from Caddy to nginx for lower latency
Nginx is closer to the origin, so compressing there avoids an
extra hop. Removes the Caddy encode block for Nextcloud and adds
gzip in nginx with level 4 targeting text, CSS, JS, JSON, XML, SVG.
2026-03-22 21:08:40 +01:00
Thomas Gräfenstein
d62b627093 add .mjs MIME type to nginx to fix NS_ERROR_CORRUPTED_CONTENT
nginx doesn't know .mjs by default and serves it as
application/octet-stream, which breaks ES module loading
and causes Caddy compression mismatches.
2026-03-22 20:56:10 +01:00
Thomas Gräfenstein
fb1de4f079 limit Caddy compression to text content types to fix slow file downloads
Caddy was compressing all responses including binary file downloads
(PDFs, images, videos), which severely throttled download speed to
~130KB/s despite 30MB/s VPS bandwidth. Now only compresses text-based
types (HTML, CSS, JS, JSON, XML, SVG) where compression actually helps.
2026-03-22 20:26:03 +01:00
Thomas Gräfenstein
3bf80f6940 disable file compression temporary 2026-03-22 20:20:37 +01:00
Thomas Gräfenstein
1c2fb3c807 fix nginx redirect loop 2026-03-22 18:12:18 +01:00
Thomas Gräfenstein
b918e713e5 align nginx and Caddy config with official Nextcloud docs
Move security headers to Caddy (edge proxy), remove nginx gzip
(Caddy already compresses), add asset_immutable map for versioned
cache control, add missing static file extensions, fix .well-known
block, and hide X-Powered-By header.
2026-03-22 17:58:26 +01:00
Thomas Gräfenstein
ac3bff9351 fix nginx to fall through to PHP for dynamic assets like theming CSS
Static file locations were returning hard 404s instead of falling
through to PHP, which broke dynamically generated assets like
theming CSS files.
2026-03-22 17:49:45 +01:00
Thomas Gräfenstein
0088c11d5e enable Caddy response compression to fix slow page loads
Caddy was decompressing nginx's gzip responses and sending them
uncompressed to the browser, causing core-common.js (5.7MB) to
take 25s to download. Adding encode zstd gzip compresses it to
1.3MB at the edge.
2026-03-22 17:43:24 +01:00
Thomas Gräfenstein
4f3f4b0487 add swap check command before setup instructions 2026-03-22 17:33:11 +01:00
Thomas Gräfenstein
a51f86ea0a add swap setup instructions to README prerequisites 2026-03-22 17:32:48 +01:00
Thomas Gräfenstein
22198784d3 tune PHP and FPM for 1-core/3GB VPS performance
Reduce FPM workers from 12 to 5 max to stop memory thrashing on
a single-core VPS with 3GB RAM. Add OPcache and APCu tuning to
reduce filesystem stat calls and improve cache hit rates.
2026-03-22 17:31:14 +01:00
10 changed files with 202 additions and 33 deletions
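
Commit 0088c11d5e gives concrete numbers: `core-common.js` shrinks from 5.7 MB to 1.3 MB, and the uncompressed download took 25 s. The short Python calculation below works out what those figures imply. It assumes the effective throughput stays the same after compression, which the commit message does not state, so the estimated time is a rough figure rather than a measurement.

```python
uncompressed_mb = 5.7   # from the commit message
compressed_mb = 1.3     # from the commit message
download_s = 25.0       # uncompressed download time from the commit message

ratio = compressed_mb / uncompressed_mb
throughput_mb_s = uncompressed_mb / download_s
# Assumption: the same effective throughput applies to the compressed transfer.
estimated_s = compressed_mb / throughput_mb_s

print(f"size reduction: {(1 - ratio) * 100:.0f}%")
print(f"implied throughput: {throughput_mb_s:.2f} MB/s")
print(f"estimated download: {estimated_s:.1f} s")
```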

README.md (112 changes)

@@ -46,7 +46,7 @@ graph TB
 ## Prerequisites
-- A VPS with SSH access
+- A VPS with SSH access (minimum 1 core, 3 GB RAM)
 - Domain `t-gstone.de` with DNS control
 - Git installed locally
@@ -56,6 +56,24 @@ Check your VPS OS:
 cat /etc/os-release
 ```
+### Swap (recommended)
+Check current memory and swap:
+```bash
+free -h
+```
+If swap shows `0B`, add a 2 GB swapfile to prevent OOM kills:
+```bash
+sudo fallocate -l 2G /swapfile
+sudo chmod 600 /swapfile
+sudo mkswap /swapfile
+sudo swapon /swapfile
+echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
+```
 ## DNS Setup
 Create these A records pointing to your VPS IP:
@@ -312,15 +330,91 @@ or low cost, and Restic handles encryption + deduplication automatically. A cron
 ### Recommended Alerts
-Set these up in Grafana Cloud UI (**Alerting** -> **Alert rules**):
+Set these up in Grafana Cloud UI (**Alerting** -> **Alert rules** -> **New alert rule**). Choose **Grafana-managed rule**
+and select the appropriate data source (Prometheus or Loki).
 | Alert | Condition | Severity |
-|----------------------|-----------------------------------------------------------------------|----------|
-| Disk usage high | `node_filesystem_avail_bytes` / `node_filesystem_size_bytes` < 0.2 | Critical |
-| Container restarting | Container restart count > 3 in 10 min | Warning |
-| High memory usage | `node_memory_MemAvailable_bytes` / `node_memory_MemTotal_bytes` < 0.1 | Warning |
-| High CPU usage | `node_cpu_seconds_total` idle < 10% sustained 5 min | Warning |
-| Nextcloud cron stale | No log line from `nextcloud-cron` in 15 min | Warning |
+|----------------------|--------------------------------------|----------|
+| Disk usage high | Available disk < 20% | Critical |
+| Container restarting | Restart count > 3 in 10 min | Warning |
+| High memory usage | Available memory < 10% | Warning |
+| High CPU usage | CPU usage > 90% sustained 5 min | Warning |
+| Nextcloud cron stale | No cron log lines in 15 min | Warning |
+#### Disk usage high
+Fires when any filesystem drops below 20% free space.
+- **Data source:** Prometheus
+- **Query (A):**
+```promql
+node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100
+```
+- **Expression (B):** Threshold — `A IS BELOW 20`
+- **Evaluate every:** `1m`
+- **Pending period (For):** `5m`
+- **Labels:** `severity: critical`
+#### Container restarting
+Fires when any container restarts more than 3 times in 10 minutes, indicating a crash loop.
+Detects both in-place restarts (`docker restart`) and ID-changing restarts (`docker compose down/up`).
+Requires cAdvisor (included in the monitoring stack).
+- **Data source:** Prometheus
+- **Query (A):**
+```promql
+sum by (name) (changes(container_start_time_seconds{name!=""}[10m]))
++
+count by (name) (count_over_time(container_start_time_seconds{name!=""}[10m])) - 1
+```
+- **Expression (B):** Threshold — `A IS ABOVE 3`
+- **Evaluate every:** `1m`
+- **Pending period (For):** `0s`
+- **Labels:** `severity: warning`
+#### High memory usage
+Fires when available memory drops below 10% of total.
+- **Data source:** Prometheus
+- **Query (A):**
+```promql
+node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
+```
+- **Expression (B):** Threshold — `A IS BELOW 10`
+- **Evaluate every:** `1m`
+- **Pending period (For):** `5m`
+- **Labels:** `severity: warning`
+#### High CPU usage
+Fires when average CPU usage exceeds 90% for 5 minutes.
+- **Data source:** Prometheus
+- **Query (A):**
+```promql
+avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
+```
+- **Expression (B):** Threshold — `A IS BELOW 10`
+- **Evaluate every:** `1m`
+- **Pending period (For):** `5m`
+- **Labels:** `severity: warning`
+#### Nextcloud cron stale
+Fires when no log output from the `nextcloud-cron` container appears for 15 minutes, indicating background jobs have stopped.
+- **Data source:** Loki
+- **Query (A):**
+```logql
+count_over_time({container="/nextcloud-cron"}[15m])
+```
+- **Expression (B):** Threshold — `A IS BELOW 1`
+- **Alert condition:** also trigger on **No Data**
+- **Evaluate every:** `5m`
+- **Pending period (For):** `0s`
+- **Labels:** `severity: warning`
 ### Recommended Dashboards
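
A note on the Container restarting expression in the README hunk above: it adds two counts per container name. `changes(...)` counts how often the start timestamp changed within a single series (in-place restarts), and `count by (name) (...) - 1` counts how many additional series share that name, since a recreated container gets a new ID and therefore a new series. The Python sketch below mimics that arithmetic on hand-made data. The function names, the input shape, and the sample values are hypothetical, and this is a simplified model of the PromQL semantics, not a substitute for testing the rule in Grafana.

```python
from typing import Dict, List


def changes(samples: List[float]) -> int:
    """Count value changes between consecutive samples, like PromQL changes()."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def restart_score(series_by_id: Dict[str, List[float]]) -> int:
    """Model of sum(changes(...)) + count(series) - 1 for one container name.

    series_by_id maps a container ID to the start-time samples seen for it
    inside the 10-minute window (hypothetical input shape).
    """
    in_place = sum(changes(samples) for samples in series_by_id.values())
    recreated = len(series_by_id) - 1  # extra series = ID-changing restarts
    return in_place + recreated


# Hypothetical window: one container restarted in place twice and was then
# recreated twice under new IDs.
window = {
    "id-a": [100.0, 100.0, 160.0, 220.0],
    "id-b": [300.0, 300.0],
    "id-c": [400.0],
}
score = restart_score(window)
fires = score > 3  # mirrors the threshold "A IS ABOVE 3"
print(score, fires)
```

In this made-up window the two in-place restarts and the two recreations add up to 4, so the rule would fire; either kind of restart alone would stay at 2 and would not.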

@@ -12,6 +12,11 @@ nextcloud.t-gstone.de {
     reverse_proxy nextcloud-nginx:80
     header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
+    header Referrer-Policy "no-referrer"
+    header X-Content-Type-Options "nosniff"
+    header X-Frame-Options "SAMEORIGIN"
+    header X-Permitted-Cross-Domain-Policies "none"
+    header X-Robots-Tag "noindex, nofollow"
     request_body {
         max_size 10G

@@ -3,6 +3,8 @@ services:
     image: caddy:2-alpine
     container_name: caddy
     restart: unless-stopped
+    depends_on:
+      - alloy
     ports:
       - "80:80"
       - "443:443"

@@ -3,6 +3,8 @@ services:
     image: gitea/gitea:1.25.5-rootless
     container_name: gitea
     restart: unless-stopped
+    depends_on:
+      - alloy
     env_file: .env
     volumes:
       - ${DATA_ROOT}/gitea/data:/var/lib/gitea

@@ -54,6 +54,18 @@ prometheus.scrape "node" {
   scrape_interval = "60s"
 }
+// ============================================================
+// cAdvisor container metrics -> Grafana Cloud Prometheus
+// ============================================================
+prometheus.scrape "cadvisor" {
+  targets = [{"__address__" = "cadvisor:8080"}]
+  forward_to = [prometheus.remote_write.grafana_cloud.receiver]
+  scrape_interval = "60s"
+  metrics_path = "/metrics"
+}
 prometheus.remote_write "grafana_cloud" {
   endpoint {
     url = env("GRAFANA_CLOUD_PROMETHEUS_URL")

@@ -16,7 +16,7 @@ services:
       - EXEC=0
       - IMAGES=0
       - INFO=0
-      - NETWORKS=0
+      - NETWORKS=1
       - NODES=0
       - PLUGINS=0
       - SERVICES=0
@@ -33,6 +33,29 @@ services:
         max-size: "10m"
         max-file: "3"
+  cadvisor:
+    image: gcr.io/cadvisor/cadvisor:v0.54.1
+    container_name: cadvisor
+    restart: unless-stopped
+    privileged: true
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+      - /run/containerd/containerd.sock:/run/containerd/containerd.sock:ro
+      - /sys:/sys:ro
+      - /var/lib/docker/:/var/lib/docker:ro
+    command:
+      - --docker_only=true
+      - --housekeeping_interval=30s
+      - --containerd=/run/containerd/containerd.sock
+      - --disable_metrics=cpu_topology,disk,diskIO,hugetlb,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp
+    networks:
+      - monitoring
+    logging:
+      driver: json-file
+      options:
+        max-size: "10m"
+        max-file: "3"
   alloy:
     image: grafana/alloy:v1.14.1
     container_name: alloy

@@ -20,6 +20,7 @@ services:
       - ./hooks/post-installation.sh:/docker-entrypoint-hooks.d/post-installation/post-installation.sh:ro
       - ./hooks/post-upgrade.sh:/docker-entrypoint-hooks.d/post-upgrade/post-upgrade.sh:ro
       - ./fpm-tuning.conf:/usr/local/etc/php-fpm.d/zz-tuning.conf:ro
+      - ./php-tuning.ini:/usr/local/etc/php/conf.d/zz-tuning.ini:ro
     networks:
       - nextcloud-internal
     logging:
@@ -56,6 +57,8 @@ services:
     image: postgres:17-alpine
     container_name: nextcloud-postgres
     restart: unless-stopped
+    depends_on:
+      - alloy
     env_file: .env
     volumes:
       - ${DATA_ROOT}/nextcloud/db:/var/lib/postgresql/data
@@ -76,6 +79,8 @@ services:
     image: redis:8-alpine
     container_name: nextcloud-redis
     restart: unless-stopped
+    depends_on:
+      - alloy
     command: redis-server --requirepass ${REDIS_PASSWORD}
     env_file: .env
     networks:

@@ -1,7 +1,7 @@
 [www]
 pm = dynamic
-pm.max_children = 12
-pm.start_servers = 4
-pm.min_spare_servers = 2
-pm.max_spare_servers = 6
+pm.max_children = 5
+pm.start_servers = 2
+pm.min_spare_servers = 1
+pm.max_spare_servers = 3
 pm.max_requests = 500
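
The worker counts in this hunk follow the usual sizing rule for `pm.max_children`: the memory available to PHP-FPM divided by the memory one worker uses. The commit message states only the VPS size (1 core, 3 GB RAM) and the new limit of 5, so the Python sketch below illustrates the rule with made-up inputs. `fpm_budget_mb`, `per_worker_mb`, and the reserved figure are hypothetical and would have to be measured on the actual host.

```python
def max_children(fpm_budget_mb: int, per_worker_mb: int) -> int:
    """Largest worker count whose combined memory fits the budget (at least 1)."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    return max(1, fpm_budget_mb // per_worker_mb)


# Hypothetical figures, not measurements from this VPS:
# 3072 MB total, 2048 MB left for PostgreSQL, Redis, monitoring and the OS,
# and about 200 MB per PHP-FPM worker under load.
total_mb = 3072
reserved_mb = 2048
print(max_children(total_mb - reserved_mb, per_worker_mb=200))  # 1024 // 200 = 5
```

Under the same assumed 200 MB per worker, the previous limit of 12 would need about 2400 MB, which is the kind of overcommit the commit message describes as memory thrashing.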

@@ -6,19 +6,30 @@ map $uri $nonce_uri {
     default "";
 }
+map $arg_v $asset_immutable {
+    "" "";
+    default ", immutable";
+}
 server {
     listen 80;
     server_name _;
-    client_max_body_size 10G;
-    client_body_timeout 300s;
-    fastcgi_buffers 64 4K;
+    include mime.types;
+    types {
+        application/javascript mjs;
+    }
     gzip on;
     gzip_vary on;
     gzip_comp_level 4;
     gzip_min_length 256;
-    gzip_types application/javascript application/json text/css text/plain text/xml application/xml image/svg+xml;
+    gzip_proxied any;
+    gzip_types text/plain text/css application/json application/javascript text/xml application/xml image/svg+xml;
+    client_max_body_size 10G;
+    client_body_timeout 300s;
+    fastcgi_buffers 64 4K;
     root /var/www/html;
     index index.php index.html /index.php$request_uri;
@@ -27,27 +38,18 @@ server {
     location ^~ /.well-known {
         location = /.well-known/carddav { return 301 /remote.php/dav/; }
         location = /.well-known/caldav { return 301 /remote.php/dav/; }
-        location ^~ /.well-known { return 301 /index.php$uri; }
+        location /.well-known/acme-challenge { try_files $uri $uri/ =404; }
+        location /.well-known/pki-validation { try_files $uri $uri/ =404; }
+        return 301 /index.php$request_uri;
     }
     # Deny access to internal paths
     location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)(?:$|/) { return 404; }
     location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console) { return 404; }
-    # Serve static files directly — only if file exists on disk
-    location ~ \.(?:css|js|mjs|svg|gif|png|jpg|ico|wasm|tflite|map|ogg|flac)$ {
-        try_files $uri =404;
-        expires 6M;
-        access_log off;
-    }
-    location ~ \.woff2?$ {
-        try_files $uri =404;
-        expires 7d;
-        access_log off;
-    }
-    # PHP handling
+    # PHP handling (must be before static file locations so that internal
+    # redirects like /index.php/apps/theming/theme/dark.css match here
+    # instead of cycling back into the static file try_files)
     location ~ \.php(?:$|/) {
         fastcgi_split_path_info ^(.+?\.php)(/.*)$;
         set $path_info $fastcgi_path_info;
@@ -60,10 +62,24 @@ server {
         fastcgi_param front_controller_active true;
         fastcgi_pass php-handler;
         fastcgi_intercept_errors on;
+        fastcgi_hide_header X-Powered-By;
         fastcgi_request_buffering off;
         fastcgi_max_temp_file_size 0;
     }
+    # Serve static files directly, fall through to PHP for dynamic assets (e.g. theming)
+    location ~ \.(?:css|js|mjs|svg|gif|ico|jpg|png|webp|wasm|tflite|map|ogg|flac|mp4|webm)$ {
+        try_files $uri /index.php$request_uri;
+        add_header Cache-Control "public, max-age=15778463$asset_immutable";
+        access_log off;
+    }
+    location ~ \.woff2?$ {
+        try_files $uri /index.php$request_uri;
+        expires 7d;
+        access_log off;
+    }
     # Default handler — route everything else through PHP front controller
     location / {
         rewrite ^ /index.php$request_uri last;
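
The `map $arg_v $asset_immutable` block from the first hunk of this file and the `add_header Cache-Control` line above work together: a request with a non-empty `v` query argument gets `, immutable` appended, and any other request gets only `max-age`. The Python sketch below models that lookup outside nginx. `cache_control_for` and the example paths are hypothetical, and the standard-library query parser only approximates how nginx fills `$arg_v`.

```python
from urllib.parse import parse_qs, urlsplit

MAX_AGE = 15778463  # seconds, copied from the add_header line (about six months)


def cache_control_for(url: str) -> str:
    """Return the Cache-Control value the static-file location would send."""
    query = urlsplit(url).query
    # parse_qs drops blank values by default, so "?v=" behaves like a missing
    # v argument, matching the map entry that turns "" into "".
    version = parse_qs(query).get("v", [""])[0]
    suffix = ", immutable" if version else ""
    return f"public, max-age={MAX_AGE}{suffix}"


# Hypothetical asset paths, used only to exercise both branches.
print(cache_control_for("/core/js/main.js?v=abc123"))
print(cache_control_for("/core/js/main.js"))
```

Versioned URLs can safely be marked immutable because a new release changes the `v` value and therefore the URL, while unversioned assets keep a plain `max-age` so browsers may still revalidate them.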

nextcloud/php-tuning.ini (new file, 10 additions)

@@ -0,0 +1,10 @@
+; OPcache tuning for Nextcloud
+opcache.interned_strings_buffer=16
+opcache.max_accelerated_files=10000
+opcache.revalidate_freq=60
+opcache.save_comments=1
+opcache.enable_file_override=1
+; APCu local cache
+apc.shm_size=64M
+apc.enable_cli=1