'Unknown runner job' on remote runner

Hi,
PeerTube 5.1 rc1 runs on Ubuntu 22.04.2 LTS in a VM (4 GB RAM, 8 vCPU, AMD Ryzen 5 1600, no hardware transcoding) on Proxmox 7.

The remote runners run on Ubuntu 20.04.6 LTS in WSL2 on Windows 11 Pro (128 GB RAM, 2x Xeon E5-2680 v4: a lot of rather slow cores, on a Chinese motherboard from AliExpress).

The remote runner sometimes finishes a job correctly, but most of the time it ends up with this error:

[16:30:26.540] ERROR (44701): Expected status 204, got 404.
The server responded: "Unknown runner job".
You may take a closer look at the logs. To see how to do so, check out this page: hxxps://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
    err: {
      "type": "Error",
      "message": "Expected status 204, got 404. \nThe server responded: \"Unknown runner job\".\nYou may take a closer look at the logs. To see how to do so, check out this page: hxxps://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs",
      "stack":
          Error: Expected status 204, got 404.
          The server responded: "Unknown runner job".
          You may take a closer look at the logs. To see how to do so, check out this page: hxxps://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
              at buildRequest (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:84939:14)
              at makePostBodyRequest (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:84881:10)
              at RunnerJobsCommand.postBodyRequest (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:85017:12)
              at RunnerJobsCommand.abort (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:89771:17)
              at RunnerServer.<anonymous> (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:91703:35)
              at Generator.next (<anonymous>)
              at /usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:69:61
              at new Promise (<anonymous>)
          ----
              at /usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:84942:19
              at /usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53819:17
              at Test._assertFunction (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53805:17)
              at Test.assert (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53695:27)
              at localAssert (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53663:18)
              at /usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53667:11
              at Request2.callback (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53236:7)
              at /usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:53388:22
              at IncomingMessage.<anonymous> (/usr/lib/node_modules/@peertube/peertube-runner/dist/peertube-runner.js:52638:11)
              at IncomingMessage.emit (node:events:525:35)
      "res": {
        "req": {
          "method": "POST",
          "url": "hxxps://peertube_host_dns_name/api/v1/runners/jobs/80338fcc-8682-4519-bb4f-beaae5c96622/abort",
          "data": {
            "reason": "Runner stopped",
            "jobToken": "ptrjt-3bef1d5b-4c0a-4a5e-9960-27cf22ab44b0",
            "runnerToken": "ptrt-b0f1dea6-db62-4be1-a479-f806db9d88b6"
          },
          "headers": {
            "content-type": "application/json"
          }
        },
        "header": {
          "server": "nginx/1.18.0 (Ubuntu)",
          "date": "Thu, 08 Jun 2023 10:30:25 GMT",
          "content-type": "application/problem+json; charset=utf-8",
          "content-length": "114",
          "connection": "close",
          "x-powered-by": "PeerTube",
          "x-frame-options": "DENY",
          "tk": "N",
          "access-control-allow-origin": "*",
          "access-control-allow-credentials": "true",
          "access-control-expose-headers": "Retry-After",
          "x-ratelimit-limit": "50",
          "x-ratelimit-remaining": "49",
          "x-ratelimit-reset": "1686220230",
          "etag": "W/\"72-vuxNSGiwa/aWfj4b6CdmHYCi/mc\""
        },
        "status": 404,
        "text": "{\"type\":\"about:blank\",\"title\":\"Not Found\",\"detail\":\"Unknown runner job\",\"status\":404,\"error\":\"Unknown runner job\"}"
      }
    }

Terminating and restarting the runner can help, but the issue happens again with new jobs.
PeerTube's log has a lot of entries like these:


```
info[08.06.2023, 16:24:45] Abort stalled runner job 80338fcc-8682-4519-bb4f-beaae5c96622 (vod-web-video-transcoding)
{
  "tags": [
    "runner",
    "80338fcc-8682-4519-bb4f-beaae5c96622",
    "vod-web-video-transcoding"
  ]
}
info[08.06.2023, 16:25:34] Remote runner riaru2wsl2_big has accepted job 80338fcc-8682-4519-bb4f-beaae5c96622 (vod-web-video-transcoding)
{
  "tags": [
    "api",
    "runner",
    "riaru2wsl2_big",
    "80338fcc-8682-4519-bb4f-beaae5c96622",
    "vod-web-video-transcoding"
  ]
}
info[08.06.2023, 16:25:35] Get max quality file of video 384b6d6a-a49c-4c05-b5df-fcd4e3114786 of job 80338fcc-8682-4519-bb4f-beaae5c96622 for runner riaru2wsl2_big
{
  "tags": [
    "api",
    "runner",
    "riaru2wsl2_big",
    1362,
    "vod-web-video-transcoding"
  ]
}
info[08.06.2023, 16:26:15] Abort stalled runner job 6c97d82d-6dd8-4cb5-b4c7-eb21f9420ff5 (vod-web-video-transcoding)
{
  "tags": [
    "runner",
    "6c97d82d-6dd8-4cb5-b4c7-eb21f9420ff5",
    "vod-web-video-transcoding"
  ]
}```

The network configuration is rather complex, but Windows 11 (and the WSL2 VM) sees the PeerTube host directly.
Contents of /etc/nginx/sites-enabled/peertube (based on hxxps://docs.joinpeertube.org/admin/):

# Minimum Nginx version required: 1.13.0 (released Apr 25, 2017)
# Please check your Nginx installation features the following modules via 'nginx -V':
# STANDARD HTTP MODULES: Core, Proxy, Rewrite, Access, Gzip, Headers, HTTP/2, Log, Real IP, SSL, Thread Pool, Upstream, AIO Multithreading.
# THIRD PARTY MODULES: None.

server {
listen 80;
listen [::]:80;
server_name [peertube_host_dns_name];

location /.well-known/acme-challenge/ {
default_type "text/plain";
root /var/www/certbot;
}
location / { return 301 hxxps://$host$request_uri; }
}

upstream backend {
server 127.0.0.1:9000;
}

server {
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name [peertube_host_dns_name];

access_log /var/log/nginx/peertube.access.log combined buffer=10m flush=5m; # reduce I/0 with buffer=10m flush=5m
error_log /var/log/nginx/peertube.error.log;

# Certificates
# you need a certificate to run in production. see hxxps://letsencrypt.org/

ssl_certificate /etc/letsencrypt/live/peertube_host_dns_name/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/peertube_host_dns_name/privkey.pem;

location ^~ '/.well-known/acme-challenge' {
default_type "text/plain";
root /var/www/certbot;
}

# Security hardening (as of Nov 15, 2020)
# based on Mozilla Guideline v5.6

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256; # add ECDHE-RSA-AES256-SHA if you want compatibility with Android 4
ssl_session_timeout 1d; # defaults to 5m
ssl_session_cache shared:SSL:10m; # estimated to 40k sessions
ssl_session_tickets off;
ssl_stapling on;
ssl_stapling_verify on;

# HSTS (hxxps://hstspreload.org), requires to be copied in 'location' sections that have add_header directives
#add_header Strict-Transport-Security "max-age=63072000; includeSubDomains";

# Application

location @api {
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;

client_max_body_size  100k; # default is 1M

proxy_connect_timeout 10m;
proxy_send_timeout    10m;
proxy_read_timeout    10m;
send_timeout          10m;

proxy_pass http://backend;

}

location / {
try_files /dev/null @api;
}

location = /api/v1/videos/upload-resumable {
client_max_body_size 0;
proxy_request_buffering off;

try_files /dev/null @api;

}

location ~ ^/api/v1/videos/(upload|([^/]+/studio/edit))$ {
limit_except POST HEAD { deny all; }

# This is the maximum upload size, which roughly matches the maximum size of a video file.
# Note that temporary space is needed equal to the total size of all concurrent uploads.
# This data gets stored in /var/lib/nginx by default, so you may want to put this directory
# on a dedicated filesystem.
client_max_body_size                      12G; # default is 1M
add_header            X-File-Maximum-Size 8G always; # inform backend of the set value in bytes before mime-encoding (x * 1.4 >= client_max_body_size)

try_files /dev/null @api;

}

location ~ ^/api/v1/runners/jobs/[^/]+/(update|success)$ {
client_max_body_size 12G; # default is 1M
add_header X-File-Maximum-Size 8G always; # inform backend of the set value in bytes before mime-encoding (x * 1.4 >= client_max_body_size)

try_files /dev/null @api;

}

location ~ ^/api/v1/(videos|video-playlists|video-channels|users/me) {
client_max_body_size 6M; # default is 1M
add_header X-File-Maximum-Size 4M always; # inform backend of the set value in bytes before mime-encoding (x * 1.4 >= client_max_body_size)

try_files /dev/null @api;

}

# Websocket

location @api_websocket {
proxy_http_version 1.1;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

proxy_pass http://backend;

}

location /socket.io {
try_files /dev/null @api_websocket;
}

location /tracker/socket {
# Peers send a message to the tracker every 15 minutes
# Don’t close the websocket before then
proxy_read_timeout 15m; # default is 60s

try_files /dev/null @api_websocket;

}

# Plugin websocket routes

location ~ ^/plugins/[^/]+(/[^/]+)?/ws/ {
try_files /dev/null @api_websocket;
}

# Performance optimizations
# For extra performance please refer to hxxps://github.com/denji/nginx-tuning

root /var/www/peertube/storage;

# Enable compression for JS/CSS/HTML, for improved client load times.
# It might be nice to compress JSON/XML as returned by the API, but
# leaving that out to protect against potential BREACH attack.

gzip on;
gzip_vary on;
gzip_types # text/html is always compressed by HttpGzipModule
text/css
application/javascript
font/truetype
font/opentype
application/vnd.ms-fontobject
image/svg+xml;
gzip_min_length 1000; # default is 20 bytes
gzip_buffers 16 8k;
gzip_comp_level 2; # default is 1

client_body_timeout 30s; # default is 60
client_header_timeout 10s; # default is 60
send_timeout 10s; # default is 60
keepalive_timeout 10s; # default is 75
resolver_timeout 10s; # default is 30
reset_timedout_connection on;
proxy_ignore_client_abort on;

tcp_nopush on; # send headers in one piece
tcp_nodelay on; # don’t buffer data sent, good for small data bursts in real time

# If you have a small /var/lib partition, it could be interesting to store temp nginx uploads in a different place
# See hxxps://nginx.org/en/docs/hxxp/ngx_hxxp_core_module.html#client_body_temp_path

#client_body_temp_path /var/www/peertube/storage/nginx/;

# Bypass PeerTube for performance reasons. Optional.
# Should be consistent with client-overrides assets list in /server/controllers/client.ts

location ~ ^/client/(assets/images/(icons/icon-36x36.png|icons/icon-48x48.png|icons/icon-72x72.png|icons/icon-96x96.png|icons/icon-144x144.png|icons/icon-192x192.png|icons/icon-512x512.png|logo.svg|favicon.png|default-playlist.jpg|default-avatar-account.png|default-avatar-account-48x48.png|default-avatar-video-channel.png|default-avatar-video-channel-48x48.png))$ {
add_header Cache-Control "public, max-age=31536000, immutable"; # Cache 1 year

root /var/www/peertube;

try_files /storage/client-overrides/$1 /peertube-latest/client/dist/$1 @api;

}

# Bypass PeerTube for performance reasons. Optional.

location ~ ^/client/(.*\.(js|css|png|svg|woff2|otf|ttf|woff|eot))$ {
add_header Cache-Control "public, max-age=31536000, immutable"; # Cache 1 year

alias /var/www/peertube/peertube-latest/client/dist/$1;

}

# Bypass PeerTube for performance reasons. Optional.

location ~ ^/static/(thumbnails|avatars)/ {
if ($request_method = 'OPTIONS') {
add_header Access-Control-Allow-Origin '*';
add_header Access-Control-Allow-Methods 'GET, OPTIONS';
add_header Access-Control-Allow-Headers 'Range,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
add_header Access-Control-Max-Age 1728000; # Preflight request can be cached 20 days
add_header Content-Type 'text/plain charset=UTF-8';
add_header Content-Length 0;
return 204;
}

add_header Access-Control-Allow-Origin    '*';
add_header Access-Control-Allow-Methods   'GET, OPTIONS';
add_header Access-Control-Allow-Headers   'Range,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
add_header Cache-Control                  "public, max-age=7200"; # Cache response 2 hours

rewrite ^/static/(.*)$ /$1 break;

try_files $uri @api;

}

location ~ ^(/static/(webseed|streaming-playlists)/private/)|^/download {
# We can’t rate limit a try_files directive, so we need to duplicate @api

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host            $host;
proxy_set_header X-Real-IP       $remote_addr;

proxy_limit_rate 5M;

proxy_pass hxxp://backend;

}

# Bypass PeerTube for performance reasons. Optional.

location ~ ^/static/(webseed|redundancy|streaming-playlists)/ {
limit_rate_after 5M;

# Clients usually have 4 simultaneous webseed connections, so the real limit is 3MB/s per client
set $peertube_limit_rate    800k;

# Increase rate limit in HLS mode, because we don't have multiple simultaneous connections
if ($request_uri ~ -fragmented.mp4$) {
  set $peertube_limit_rate  5M;
}

# Use this line with nginx >= 1.17.0
#limit_rate $peertube_limit_rate;
# Or this line if your nginx < 1.17.0
set $limit_rate $peertube_limit_rate;

if ($request_method = 'OPTIONS') {
  add_header Access-Control-Allow-Origin  '*';
  add_header Access-Control-Allow-Methods 'GET, OPTIONS';
  add_header Access-Control-Allow-Headers 'Range,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
  add_header Access-Control-Max-Age       1728000; # Preflight request can be cached 20 days
  add_header Content-Type                 'text/plain charset=UTF-8';
  add_header Content-Length               0;
  return 204;
}

if ($request_method = 'GET') {
  add_header Access-Control-Allow-Origin  '*';
  add_header Access-Control-Allow-Methods 'GET, OPTIONS';
  add_header Access-Control-Allow-Headers 'Range,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';

  # Don't spam access log file with byte range requests
  access_log off;
}

# Enabling the sendfile directive eliminates the step of copying the data into the buffer
# and enables direct copying data from one file descriptor to another.
sendfile on;
sendfile_max_chunk 1M; # prevent one fast connection from entirely occupying the worker process. should be > 800k.
aio threads;

rewrite ^/static/webseed/(.*)$ /videos/$1 break;
rewrite ^/static/(.*)$         /$1        break;

try_files $uri @api;

}
}


Changing the priority in the remote runner's config from 20 to 5 sometimes helps, sometimes not.

I changed https to hxxps because the forum does not let me post more than 2 links, and I changed the DNS name of the server.

The advice from "NGINX tuning for best performance" on GitHub (activating BBR congestion control on the PeerTube VM) helps a little.
Changing the events section of nginx.conf to
worker_connections 4096;
multi_accept on;
use epoll;
also helps a little.

The instance has ~60 followers and itself follows ~700 others.
top shows PeerTube's CPU usage in the 150-250% range.

It looks (?) as if, when a runner gets a 404 error for a job update (because the server aborted the stalled job), it gets stuck on that job forever and keeps retrying instead of taking a new one. Is it possible to configure the runner to drop such a job after several attempts?
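The behaviour asked about here could look roughly like the sketch below. It is not peertube-runner's actual code: `decideNextStep`, `maxAttempts`, and the step names are hypothetical, and it only illustrates the idea of dropping a job once the server answers 404 or after a bounded number of failed attempts.

```typescript
// Hypothetical sketch, not taken from peertube-runner.
type NextStep = "continue" | "retry" | "drop";

// Decide what a runner could do after sending a job update.
//   status:      HTTP status returned by the server for the update request
//   attempts:    consecutive failed attempts so far, including this one
//   maxAttempts: assumed limit after which the job is given up
function decideNextStep(status: number, attempts: number, maxAttempts: number): NextStep {
  // 2xx: the server accepted the update, keep processing the job.
  if (status >= 200 && status < 300) return "continue";

  // 404 "Unknown runner job": the server no longer tracks this job
  // (for example because it aborted it as stalled), so retrying cannot succeed.
  if (status === 404) return "drop";

  // Any other failure: retry until the assumed limit is reached.
  return attempts < maxAttempts ? "retry" : "drop";
}
```

With this policy, a 404 on the first update already frees the runner to request a new job, while a transient error such as a 500 is retried up to `maxAttempts` times before the job is dropped.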

Setting stalled_jobs: vod in PeerTube's config to 10 minutes instead of 2 minutes appears to fix all the issues I saw (even with long 4K videos).

It still looks like something is wrong with the runner's logic: why can't it reliably report progress at shorter intervals?
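To make the effect of the threshold concrete, here is a minimal sketch of a stalled-job check. The server's real logic is not shown in this thread, so `isStalled` and the 3-minute gap between progress updates are assumptions made purely for illustration; only the 2-minute and 10-minute thresholds come from the settings discussed above.

```typescript
// Hypothetical sketch of a "stalled job" check; not PeerTube's actual code.
const MINUTE_MS = 60 * 1000;

// A job counts as stalled when no update arrived within the threshold.
function isStalled(lastUpdateMs: number, nowMs: number, stalledAfterMs: number): boolean {
  return nowMs - lastUpdateMs > stalledAfterMs;
}

// Assumed example: the runner's last update was 3 minutes ago.
const lastUpdate = 0;
const now = 3 * MINUTE_MS;

const stalledWithTwoMinutes = isStalled(lastUpdate, now, 2 * MINUTE_MS);  // true
const stalledWithTenMinutes = isStalled(lastUpdate, now, 10 * MINUTE_MS); // false
```

Under these assumptions, any update gap longer than 2 minutes gets the job aborted on the server, after which the runner's next request for that job would receive the 404 shown earlier; the 10-minute threshold tolerates the same gap.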

Hi,

What is your peertube-runner version?

peertube-runner --version says it’s 0.0.3

Can you try with 0.0.4 with the default stalled_jobs value?

I tried 0.0.4 with the default settings. It's much better, even with the default nice 20, 2 threads, 2 jobs.
There are still some "abort stalled runner job" messages if the server's CPU load is high (I tried to generate transcoding jobs by importing some clips), but it no longer appears to block everything forever.
