Best way to upload 30k videos?

Hi,
Today my public company broadcasts around 30k videos hosted online on various pages, through a custom player based on video.js, embedded in CMSs (WordPress, Drupal, TYPO3…) and an LMS (Moodle). So the pages are not always built the same way, and are not always nicely recognized by the "import from URL" features I've tested so far.

Right now, those videos are all stored in the same location, and have some dependencies (in the same folder or a subfolder), like:

  • video_example-1080.mp4
  • video_example-720.mp4
  • video_example-360.mp4
  • video_example.jpg (sometimes .png)
  • video_example.srt (sometimes .vtt)

So how would you proceed to upload/import a large number of videos? I could accept transcoding them again to HLS from the 1080p files (and losing some quality along the way), as long as I can also retrieve the subtitles and thumbnails.

Concerning the metadata (title, description, keywords): at this time, those videos are indexed in another database, from which I can extract CSV or XLS data. So my dream would be a "bulk import" feature that would take its input from this spreadsheet.

(By the way, having some sort of feature or guide on how to do this could benefit other people, and help the community move to this great open source solution.)

Best regards.


Hi @EricG ,

I already did something similar, but I can't find the code I used, and I don't remember exactly how it was done.

It was probably a little script that took the data to import as input, then used PeerTube's API to do the imports. I remember it was possible to launch it in waves, to avoid doing the whole import in one shot.

If I had to write it today, here is how I would do it.

First, extract the video URLs and metadata, either from the original database or by exporting them to CSV.

For each video, create a JSON file (JSON for example, but it can be any format that you can easily read later on). Place all these files in a todo folder.
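For example, a per-video file could look like this (the field names here are purely illustrative; use whatever maps cleanly to your export):

{
  "targetUrl": "https://example.org/videos/video_example-1080.mp4",
  "title": "Video example",
  "description": "…",
  "tags": ["example", "demo"],
  "thumbnail": "video_example.jpg",
  "subtitles": "video_example.srt"
}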

Also create the following two folders: progress and done.

Write a script that takes as a parameter the number of videos to process (a minimal sketch follows the list below).
The basic algorithm is:

  • if the todo folder is empty, stop.
  • choose a file in the todo folder (the first one, or one at random, as you prefer)
  • move it to progress
  • if the move fails, it means that another instance of the script is already processing this video; skip it (thanks to this trick, you can safely run multiple instances of the script)
  • use the PeerTube API to import the video and set its metadata [1]
  • once it is done, move the file to the done folder
  • loop.
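Here is what such a loop could look like in Node.js. This is only a sketch under the assumptions above: the folder names match the ones just described, and uploadToPeertube is a hypothetical helper whose body you would replace with real PeerTube API calls.

// worker.mjs: minimal sketch of the wave-import loop described above.
import fs from "node:fs";
import path from "node:path";

const TODO = "todo", PROGRESS = "progress", DONE = "done";
const count = Number(process.argv[2] ?? 5); // videos to process in this wave

// Hypothetical helper: replace with real import/upload API calls.
async function uploadToPeertube(video) {
    console.log("would import:", video.title);
}

for (let i = 0; i < count; i++) {
    const files = fs.readdirSync(TODO).filter((f) => f.endsWith(".json"));
    if (files.length === 0) break; // todo folder is empty: stop

    const name = files[0]; // or pick one at random
    try {
        // Atomic rename: throws if another instance already moved this file,
        // which is what makes running several instances in parallel safe.
        fs.renameSync(path.join(TODO, name), path.join(PROGRESS, name));
    } catch {
        continue; // another instance grabbed it, try the next one
    }

    const video = JSON.parse(fs.readFileSync(path.join(PROGRESS, name), "utf8"));
    await uploadToPeertube(video);
    fs.renameSync(path.join(PROGRESS, name), path.join(DONE, name)); // success
}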

You can also add a log folder and write a log for each video, for example dumping the API output.

Then, you can launch this script in waves (for example 5 videos, then wait).
When you are sure everything is okay, you can increase the number of videos to process, and add the script to a crontab so it imports more videos every X minutes. You can even start multiple instances of the script to parallelize. Just be careful not to overload the servers.
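For example, a crontab entry like this (the path and script name are placeholders) would import 5 videos every 10 minutes:

*/10 * * * * cd /path/to/import && node worker.mjs 5 >> import.log 2>&1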

At the end, if there are files left in the progress folder, it means that something failed.

[1]: you can either use the import API with the direct video URL and then the edit API to set the metadata, or, if the computer running the script has access to the video files, use the upload API.
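For illustration, the first route could look roughly like this (an untested sketch: the instance URL and field values are placeholders, and the access token is obtained as described later in this thread):

// Sketch: ask PeerTube to import a video from its public URL.
const accessToken = "…"; // user token, see the authentication step below

const form = new FormData();
form.append("targetUrl", "https://example.org/video_example-1080.mp4");
form.append("channelId", "1"); // your channel's numeric id
form.append("name", "Video example");

const res = await fetch("https://yourinstance.tld/api/v1/videos/imports", {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form,
});
console.log(await res.json()); // contains the created video's id/uuid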


Hi @JohnLivingston
Thanks for your thoughts about this process. It helps me a lot :slight_smile:

A few questions:

  1. if I choose to upload instead of importing from a URL, I guess I could create another folder with all the media files (e.g. example1.mp4, example1.jpg, example1.vtt, example2.mp4, example3.mp4, and so on), and then specify in each JSON file which media should be imported, right?

  2. I guess the JSON contains a field for each field of the upload process (e.g. title, description, keywords, visibility, channel, language…), right? Plus additional fields where I specify the mp4 name and path, and likewise for the subtitle and thumbnail?

  3. at the end of the upload process, how could I get back the PeerTube URL of each video?
    I need to send it back to my original database (via the API output? in the log?)

Thanks for your help. And have a nice summer :wink:

Please do share when you know how to do it. It certainly will be very helpful to others.

Just found this today: https://github.com/alghanim/peertubeupload (an upload script to upload media files to PeerTube).

Maybe you want to check this out. :grinning:


Check the API documentation: https://docs.joinpeertube.org/api-rest-reference.html
The API returns some fields, including the video uuid and short uuid. You can then derive the video URL by concatenating the uuid onto something like https://yourinstance.tld/videos/watch/
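For example, assuming response is the fetch response of the upload or import call (the exact shape of the response body is in the API reference):

// Sketch: build the watch URL from the upload/import API response.
const { video } = await response.json(); // contains uuid and shortUUID
const watchUrl = `https://yourinstance.tld/videos/watch/${video.uuid}`;
console.log(watchUrl); // e.g. store this back in your original database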

For your other questions, it depends on how you want to export your current data. A developer should easily find a good way to handle it.


Hi, we are still working on it, and I will post some updates as soon as possible.
Eric.


Hi, I'm the dev working with @EricG to upload multiple files at once to our PeerTube instance, starting for now with a batch of ~30 videos (plus posters and subs). I'm focusing on doing this through the CLI, and I may be missing something, but I can't find in "peertube-cli upload -h" any way to upload a subtitle. -s "Video support text" seems to be a single string like "Please support this channel". Are subs just more files to add with -f?

Edit: if I'm not wrong, the upload options don't offer any way to define playlist(s) for the uploaded video either.

Hi Greg,
It seems the CLI does not have all the options. You should make direct REST API calls instead; they are easy to automate, whatever the programming language (bash, node, …).
For the record, this is exactly what the CLI does: it embeds some API calls.

The documentation is here:

https://docs.joinpeertube.org/api-rest-reference.html

You must first upload the video, then use this API to add captions:

https://docs.joinpeertube.org/api-rest-reference.html#tag/Video-Captions/operation/addVideoCaption
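For example, something like this should attach a French caption to an already-uploaded video (a sketch based on the endpoint above; the instance URL, token, video id and file name are placeholders):

// Sketch: PUT the .vtt (or .srt) file as the "captionfile" form field.
import { readFile } from "node:fs/promises";

const accessToken = "…"; // user token obtained via the OAuth flow
const videoId = 42;      // id or uuid returned by the upload API

const form = new FormData();
form.append(
    "captionfile",
    new Blob([await readFile("video_example.vtt")]),
    "video_example.vtt"
);

await fetch(`https://yourinstance.tld/api/v1/videos/${videoId}/captions/fr`, {
    method: "PUT",
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form,
});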

Here is the API to update a playlist:

https://docs.joinpeertube.org/api-rest-reference.html#tag/Video-Playlists/paths/~1api~1v1~1video-playlists~1{playlistId}/put

If you want an example of how to make an API call in TypeScript, here is the code the CLI uses to create the upload request (there are other examples in the same file):


Thanks @JohnLivingston, I'm checking it out now and it seems to be going smoothly, albeit a bit slowly, as I'm doing it all in Bash, with which I'm not fluent :smiley:

If you are not fluent in bash, don't hesitate to write it in JavaScript, using NodeJS :slight_smile:


Would you be able to record a video of how to do it? :smiley:

I have a lot of videos to upload to my PeerTube too. I am using object storage, so will I need to upload all the videos to S3? However, I notice that what I upload through PeerTube is not just a video file: the folder also contains m3u8, json and mp4 files.

Don't upload them directly to S3: PeerTube needs to transcode them and upload the results itself.

Use the PeerTube API to send them to PeerTube. PeerTube will do the rest.


You're right @JohnLivingston, I gave up on Bash and turned to vanilla JS, which is more in my line. I've successfully begun to manage my xlsx and video/poster/subs files without any API call so far. Of course I now need the API, but I fail at the very beginning… the authentication! Here's a very bare JS script that fails:

const url = "http://p1.localhost:9001/api/v1/oauth-clients/local";
fetch(url)
    .then(response => {
        console.log("ok");
        console.log(response);
    })
    .catch(error => {
        console.log("ko");
        console.log(error);
    });

showing this error in my console:

ko
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11457:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: Error: connect ECONNREFUSED ::1:9001
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16) {
    errno: -111,
    code: 'ECONNREFUSED',
    syscall: 'connect',
    address: '::1',
    port: 9001
  }
}

It looks like it won't send the JSON object with client_id and client_secret because… I'm not allowed to? Yet at the same time I can go to http://p1.localhost:9001/api/v1/oauth-clients/local in any browser session (private or not) and it shows that JSON. And in my Bash script it also worked through curl. I guess I'm doing it the wrong way in JS?

Check this page:

You have to get a token by doing a first request, then use this token in every API call.
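For reference, the whole documented flow in Node.js could look like this (a sketch; the instance URL and credentials are placeholders):

// Sketch: 1) fetch the client tokens, 2) exchange user credentials
// for an access token to put in the Authorization header of API calls.
const base = "https://yourinstance.tld/api/v1";

const client = await (await fetch(`${base}/oauth-clients/local`)).json();

const res = await fetch(`${base}/users/token`, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
        client_id: client.client_id,
        client_secret: client.client_secret,
        grant_type: "password",
        username: "your_username", // placeholder credentials
        password: "your_password",
    }),
});
const { access_token } = await res.json();
console.log(access_token);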

Yes, my URL is precisely the one used to get client tokens, ending with "v1/oauth-clients/local": http://p1.localhost:9001/api/v1/oauth-clients/local, which indeed shows the tokens when pasted in the browser, but fails with ECONNREFUSED when called through Node.js. Or do you mean there's no way to get tokens through JS?

Edit: I mean, the tokens are sent with curl http://p1.localhost:9001/api/v1/oauth-clients/local as in the doc, so I figured calling that URL with a fetch would do the same?

Edit 2: it looks like the local URL is the problem, not the API. Any localhost URL breaks, while I can fetch imdb.com and get a JSON :confused:

Sorry, I read too quickly.

Ok, I think I know…

Are you using Node 18? Starting with this version, "localhost" resolves to IPv6 first. So "localhost" means "::1" and no longer "127.0.0.1" as before.
I assume you are using a setup like the one I described on this forum (Docker, with 2 federated instances).
Try adding this to your docker-compose file:

p1.localhost:
  [...]
  ports:
      - "127.0.0.1:9001:9001"
      - "::1:9001:9001"  # <============ here

So that your container exposes the IPv6 localhost port.

Thanks John, yes, that's also what I figured after reading http - ECONNREFUSED when making a request to localhost using fetch in Node.js - Stack Overflow, and I tried fixing it as you showed, but I still get the ECONNREFUSED error. No problem, I can use 127.0.0.1 and that should be fine. In case it helps, here's how to get the API tokens in JS:

const url = "http://127.0.0.1:9001/api/v1/oauth-clients/local";
fetch(url)
    .then(response => {
        if (response.ok) {
            return response.json();
        } else {
            throw new Error("Couldn't get API tokens.");
        }
    })
    .then(tokens => {
        //Tokens here!
        console.log(tokens);
    })
    .catch(error => {
        console.log(error);
    });

127.0.0.1 should be fine, indeed.

Hi @JohnLivingston, I have a problem sending data to the upload API with Node.js and the script below, which results in a 400 "Bad Request" response:

export default class PeertubeApi {
    static url = "http://127.0.0.1:9001/api/v1/";

    static upload(accessToken, data) {
        const form = new FormData(),
            fileurl = data.paths.wip + data.filename;
        form.append("videofile", fileurl);
        form.append("channelId", 3);
        form.append("name", data.titre);
        fetch(this.url + "videos/upload", {
            method: "POST",
            headers: {
                "Authorization": "Bearer " + accessToken
            },
            body: form
        })
            .then(response => {
                if (response.ok) {
                    Log.debug(response.json());
                } else {
                    Log.debug(response);
                }
            });
    }
}

The API works fine: I get the client and user tokens, and I can fetch the video list through API/videos. I guess authentication is fine too, as I get a 400 "Bad Request" error with the Authorization header and a 401 "Unauthorized" without it. So it seems the "body" part is the problem, but I'm not sure what to put there, as the doc only shows examples with curl, not Node.js. Here's what the FormData object sent as the body looks like:

30/10/2023 10:21:03 [DEBUG]: FormData {
  [Symbol(state)]: [
    {
      name: 'videofile',
      value: '/mnt/c/xampp7215/htdocs/imedia/videos/peertube_migration/wip/aseu_actualite_badminton-20230831.mp4'
    },
    { name: 'channelId', value: '3' },
    { name: 'name', value: 'Tous au volant' }
  ]
}

EDIT: I could edit a video's category using API/videos/{id} with the same headers, so the body is definitely the problem. Also, I realized I didn't have a channelId 3, so I used 1 instead, with no success.
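EDIT 2: if I read the FormData dump above correctly, my guess is that appending the file path as a plain string is the problem: the "videofile" field must contain the file's contents, not its location. Something like this (an untested sketch reusing the variables from my script) might be the fix:

// Append the actual file contents as a Blob, not the path string.
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append("videofile", new Blob([await readFile(fileurl)]), data.filename);
form.append("channelId", 1);
form.append("name", data.titre);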