Best way to upload 30k videos?

Hi,
Today my public company broadcasts around 30k videos hosted online on various pages, through a custom player based on video.js, embedded in CMSs (WordPress, Drupal, TYPO3…) and an LMS (Moodle). The pages are not always built the same way, and are not always correctly recognized by the « import from URL » feature I've tested so far.

Right now, those videos are stored in the same location, and have some dependencies (in the same folder or a subfolder), such as:

  • video_example-1080.mp4
  • video_example-720.mp4
  • video_example-360.mp4
  • video_example.jpg (sometimes .png)
  • video_example.srt (sometimes .vtt)

So how would you proceed to upload / import a large number of videos? I could accept transcoding them again to HLS from the 1080 resolution (and losing some quality along the way) if I can also retrieve the subtitles and thumbnail.

Concerning the metadata (title, description, keywords): at this time, those videos are indexed in another database, from which I can extract CSV or XLS data. So my dream would be a « bulk import » feature that would take its input from this Excel file.

(By the way, having some sort of feature or guide on how to do this could benefit other people, and help the community move to this great open source solution.)

Best regards.

Hi @EricG,

I have already done something similar, but I can't find the code I used, and I don't remember exactly how it was done.

It was probably a small script that took the data to import as input, then used PeerTube's API to import it. I remember it was possible to launch it in waves, to avoid doing the whole import in one shot.

If I had to write it today, here is how I would do it.

First, extract the video URLs and metadata, either from the original database or by exporting to CSV.

For each video, create a JSON file (JSON for example, but it can be any format that you can easily read later on). Place all these files in a todo folder.

Also create the following two folders: progress, done.
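
To illustrate the preparation step, here is a minimal sketch in Python that turns a CSV export into one JSON file per video. The column names (id, title, description, keywords, url), the ";" keyword separator and the JSON keys are assumptions made for the example; adapt them to whatever your database actually exports.

```python
import csv
import json
from pathlib import Path


def csv_to_jobs(csv_path: Path, todo_dir: Path) -> int:
    """Write one JSON job file per CSV row into todo_dir; return the count."""
    todo_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Hypothetical column names: rename to match your own export.
            keywords = (row.get("keywords") or "").split(";")
            job = {
                "id": row["id"],
                "title": row["title"],
                "description": row.get("description") or "",
                "keywords": [k.strip() for k in keywords if k.strip()],
                "source_url": row["url"],
            }
            job_file = todo_dir / f"{row['id']}.json"
            job_file.write_text(
                json.dumps(job, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
            count += 1
    return count
```

Calling csv_to_jobs(Path("export.csv"), Path("todo")) would fill the todo folder. Keeping the original database id in each file makes it easier to send the PeerTube URL back later.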

Write a script that takes as a parameter the number of videos to process.
The basic algorithm is:

  • if todo folder is empty, stop.
  • choose a file in the todo folder (the first one, or at random, as you want)
  • move it to progress
  • if the move fails, it means that another instance of the script is already processing this video. Skip. (using this trick, you can safely run multiple instances of the script)
  • use the PeerTube API to import the video, and set the metadata [1]
  • once it is done, move the file to the done folder
  • loop.
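
The loop above could look roughly like this in Python. It is only a sketch: the function names are made up for the example, and process stands for whatever function actually calls the PeerTube API. The "move fails" trick relies on the rename being atomic, which assumes that todo and progress are on the same filesystem.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def claim_next(todo: Path, progress: Path) -> Optional[Path]:
    """Move one job file from todo to progress; return its new path."""
    for candidate in sorted(todo.glob("*.json")):
        target = progress / candidate.name
        try:
            os.rename(candidate, target)
        except FileNotFoundError:
            # Another instance moved this file first: skip it.
            continue
        return target
    return None  # todo folder is empty


def run_wave(base: Path, limit: int, process: Callable[[Path], None]) -> int:
    """Process at most `limit` videos; return how many were claimed."""
    todo, progress, done = (base / n for n in ("todo", "progress", "done"))
    for folder in (todo, progress, done):
        folder.mkdir(parents=True, exist_ok=True)
    claimed = 0
    while claimed < limit:
        job = claim_next(todo, progress)
        if job is None:
            break
        claimed += 1
        try:
            process(job)  # hypothetical: import the video, set metadata
        except Exception as error:
            # The file stays in progress, so failures are easy to spot.
            print(f"{job.name} failed: {error}")
            continue
        os.rename(job, done / job.name)
    return claimed
```

A first wave would then be something like run_wave(Path("."), 5, import_one), where import_one is your own function.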

You can also add a log folder, and write logs for each video. For example, dumping the API output.

Then, you can launch this script in waves (for example 5 videos, then wait).
When you are sure everything is okay, you can increase the number of videos to process, and add the script to a crontab so it imports more videos every X minutes. You can even start multiple instances of the script to parallelize. Be careful not to overload the servers.

At the end, if there are files left in the progress folder, it means that something failed.

[1]: you can either use the import API, with the direct video URL, and then use the edit API to set the metadata, or, if the computer running the script has access to the video files, you can use the upload API.
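
As an illustration of the import variant in [1], here is a sketch that builds the HTTP request with the Python standard library only. The endpoint (POST /api/v1/videos/imports), the form fields (targetUrl, channelId, name) and the Bearer token header are what I understand from the PeerTube REST API documentation; treat them as assumptions and check them against the documentation of your instance's version. The job keys (source_url, title) and the function names are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, object]) -> Tuple[bytes, str]:
    """Encode plain text fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_import_request(instance: str, token: str, job: dict,
                         channel_id: int) -> urllib.request.Request:
    """Prepare (but do not send) an import-from-URL request."""
    # Field names assumed from the PeerTube REST API documentation.
    fields = {
        "targetUrl": job["source_url"],
        "channelId": channel_id,
        "name": job["title"],
    }
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        f"{instance.rstrip('/')}/api/v1/videos/imports",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON answer."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Dumping the dictionary returned by send() into the log folder gives you a record of what the API answered for each video.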

Hi @JohnLivingston
Thanks for your thoughts on this process. It helps me a lot 🙂

A few questions:

  1. If I choose to upload instead of importing from URL, I guess I could create another folder containing all the media files (e.g. example1.mp4, example1.jpg, example1.vtt, example2.mp4, example3.mp4, and so on), and then specify in each JSON file which media should be imported, right?

  2. I guess the JSON contains an entry for each field of the upload process (e.g. title, description, keywords, visibility, channel, language…), right? And in addition, entries where I specify the mp4 name and path, and likewise for the subtitle and thumbnail?

  3. At the end of the upload process, how can I get back the PeerTube URL of each video?
    I need to send it back to my original database (via the API output? in the log?)

Thanks for your help. And have a nice summer 😉

Please do share when you know how to do it. It will certainly be very helpful to others.

Just found this today: GitHub - alghanim/peertubeupload: upload script to upload media files to peertube

Maybe you want to check this out. 😀

Check the API documentation: PeerTube
The API returns some fields, including the video UUID and short UUID. You can then build the video URL by concatenating the UUID to something like https://yourinstance.tld/videos/watch/

For your other questions, it depends on how you want to export your current data. A developer should easily find a good way to handle it.
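
For example, a small Python sketch of that last step. The response shape used here (a "video" object with a "uuid" key) is an assumption based on my reading of the API documentation, and the function names and the mapping CSV are made up for the example.

```python
import csv
from pathlib import Path


def watch_url(instance: str, api_response: dict) -> str:
    """Build the public watch URL from the API answer."""
    # Assumed shape: {"video": {"id": ..., "uuid": ..., "shortUUID": ...}}
    video_uuid = api_response["video"]["uuid"]
    return f"{instance.rstrip('/')}/videos/watch/{video_uuid}"


def append_mapping(mapping_csv: Path, original_id: str, url: str) -> None:
    """Append one 'original id, PeerTube URL' row to a CSV file."""
    with open(mapping_csv, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([original_id, url])
```

The resulting CSV can then be loaded back into the original database.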

Hi, we are still working on it, and I will post an update as soon as possible.
Eric.
