Using Colab for AI Singing Covers

Disclaimer:

This tutorial is intended only for learning about and discussing AI.
To avoid infringement, you must obtain all data and models used in this project yourself.
Technology itself is neither good nor bad, but using it for illegal activities is prohibited.

Introduction

Over the past few days you have probably been flooded with videos of the "obscure singer" Sun Yanzi on various video platforms: AI has synthesized a realistic version of her voice and used it to sing other people's songs. An AI cover of Li Ronghao's "Wumeizi Sauce" in Sun Yanzi's voice, for example, is made with exactly this technology. Today I will briefly introduce the technology behind it and show you how to use Colab to make singing covers.

Project Introduction

So-vits-svc ("Sovits") is a free, open-source AI voice conversion tool developed by Rcell, a Chinese amateur voice synthesis enthusiast, building on projects such as VITS, soft-vc, and VISinger2. It reproduces the timbre of a voice, so you can think of it simply as a powerful voice changer.

Introduction to Colab

Why use Colab

If your computer is powerful enough, you can run the project locally (an NVIDIA GPU is required). My computer is a lightweight laptop that cannot run it, so I use Google Colab for this demonstration.

What is Colab

Put simply, Colab is an online computing platform that Google provides for developers. If, like me, you need computing power for learning but your personal computer cannot provide it, Colab is a good option.

Colab offers free and paid tiers. The free tier has somewhat lower performance; the paid tier is billed by compute usage and is not expensive. I previously used Colab to run Stable Diffusion, but so many users ran it on the free tier that Google banned free users from running Stable Diffusion on Colab. Google may likewise restrict singing-conversion workloads at some point.

Data and Model Preparation

Besides singing covers, this project can also convert your own speech into a target voice, like a voice changer, and you can train your own models. Here I will only show how to use an existing model to make a singing cover, using Li Ronghao's "Wumeizi Sauce" as the example.

First, prepare the song you want to cover. The model converts vocals only, so you need to separate the vocals from the accompaniment first. You can use an online vocal separation tool for this.

Download both the separated vocals and the background music. Only the vocals go through the conversion; once the cover is done, you combine the converted vocals with the background music.
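
If you would rather do the final mix in code than in an audio editor, here is a minimal sketch that overlays the converted vocals on the background music using only Python's standard library. It assumes both files are uncompressed 16-bit PCM WAV with the same sample rate and channel count (convert them first if not); the function names and gain parameters are illustrative and are not part of So-vits-svc.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return (params, samples)."""
    with wave.open(path, "rb") as wf:
        params = wf.getparams()
        if params.sampwidth != 2:
            raise ValueError(f"{path}: only 16-bit PCM WAV is supported")
        frames = wf.readframes(params.nframes)
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def mix_wavs(vocals_path, backing_path, out_path, vocal_gain=1.0, backing_gain=1.0):
    """Overlay the converted vocals on the background music and save the mix."""
    p_voc, voc = read_pcm16(vocals_path)
    p_bg, bg = read_pcm16(backing_path)
    if (p_voc.nchannels, p_voc.framerate) != (p_bg.nchannels, p_bg.framerate):
        raise ValueError("sample rate and channel count must match")
    n = max(len(voc), len(bg))  # the shorter track is padded with silence
    mixed = array.array("h", bytes(2 * n))  # n zero-valued samples
    for i in range(n):
        s = 0.0
        if i < len(voc):
            s += voc[i] * vocal_gain
        if i < len(bg):
            s += bg[i] * backing_gain
        mixed[i] = max(-32768, min(32767, int(s)))  # clip to the 16-bit range
    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(p_voc.nchannels)
        wf.setsampwidth(2)
        wf.setframerate(p_voc.framerate)
        wf.writeframes(mixed.tobytes())
```

For example, `mix_wavs("cover_vocals.wav", "accompaniment.wav", "cover.wav", vocal_gain=0.9)` (hypothetical file names). The pure-Python loop is slow for a full song; it is meant to show the idea, and NumPy or an audio editor will be much faster.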

A song is usually three to four minutes long, which is often more than the GPU can process in one go. Slice the vocals into segments of at most one minute each, convert each segment separately, and finally join the results back together.
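
The slicing and joining steps can be sketched with Python's built-in `wave` module. This cuts at fixed intervals of `max_seconds`, which may split a word mid-syllable; cutting at pauses in the vocals is gentler but is not shown here. The function names and the `_000` file-naming scheme are hypothetical.

```python
import os
import wave


def split_wav(path, out_dir, max_seconds=60):
    """Cut a WAV file into consecutive segments no longer than max_seconds."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_seg = int(max_seconds * params.framerate)
        index = 0
        while True:
            chunk = src.readframes(frames_per_seg)
            if not chunk:
                break  # reached the end of the file
            out_path = os.path.join(out_dir, f"{stem}_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(chunk)
            out_paths.append(out_path)
            index += 1
    return out_paths


def join_wavs(paths, out_path):
    """Concatenate converted segments (same format) into one file, in order."""
    if not paths:
        raise ValueError("no segments to join")
    with wave.open(out_path, "wb") as dst:
        fmt = None
        for p in paths:
            with wave.open(p, "rb") as src:
                params = src.getparams()
                if fmt is None:
                    fmt = params[:3]  # (nchannels, sampwidth, framerate)
                    dst.setparams(params)
                elif params[:3] != fmt:
                    raise ValueError(f"{p}: format differs from the first segment")
                dst.writeframes(src.readframes(params.nframes))
```

Because the segment names are zero-padded, `sorted()` on the converted files returns them in playback order before you pass them to `join_wavs`.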

You will need the pre-trained Sun Yanzi voice model and a vocal separation tool. Please download the data and models yourself.

Okay, now let's start learning how to use Colab for AI singing covers.

Open the Project

First, open the project's GitHub page, scroll to the bottom, and find "Colab notebook scripts". Click the link for the cover (inference) notebook; the other link is the training notebook.

The Colab notebook page looks like Jupyter, which I mentioned before; it is essentially the same thing. Since this notebook belongs to someone else, click "Save a copy in Drive" to save your own copy to Google Drive.

Configuration

After saving, check that the notebook is running on a GPU. You can click "Connect", which is equivalent to starting a server, or simply run the first cell. "Tesla T4" is the GPU model; you may be given a different card, because Google assigns GPUs automatically based on current demand.
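
You can also check the assigned GPU from a code cell (running `!nvidia-smi` in a cell works too). This sketch calls the `nvidia-smi` tool with its `--query-gpu=name --format=csv,noheader` options and returns an empty list on a CPU-only runtime; the function name is my own.

```python
import shutil
import subprocess


def gpu_names():
    """Return the names of visible NVIDIA GPUs, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools: most likely a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

If `gpu_names()` returns an empty list, switch the runtime to a GPU under Runtime > Change runtime type before continuing.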

Next, run the two setup cells one at a time. The free machine is slow, so be patient and wait for "Setup 1" to finish before running "Setup 2". Then run the following cells to download ContentVec and the models hosted on Hugging Face (HF); the download is very fast.

Once the HF download cell has run, you can pick a specific model from its list to download. I wanted the Sun Yanzi model, which is not in that list, so I had to upload my own model.

Connect Google Drive and Upload Data

Click the Google Drive button in the upper left corner, and you will be prompted to run a cell that connects to Drive. Run it and complete the authorization when prompted. This connects the notebook to your own Drive without granting access to any third party, so you can use it with confidence.
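
The cell Colab inserts for this step typically calls `drive.mount` from the `google.colab` module with the standard `/content/drive` mount point. A minimal sketch, wrapped in a hypothetical helper so it simply returns `False` when run outside Colab:

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False  # not running in Colab, nothing to mount
    drive.mount(mount_point)  # shows the authorization prompt
    return True
```

After mounting, your Drive files appear under `/content/drive/MyDrive`.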

Next, open your Google Drive, upload your model, and click Share to change the permissions so that anyone with the link can access the file. Copy the sharing link into the corresponding field in the notebook, then run the cell to download the model automatically.
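
A download from a sharing link only needs the file ID embedded in it. The sketch below shows how that ID can be pulled out of the two common link shapes (`.../file/d/<ID>/view?usp=sharing` and `.../open?id=<ID>`). It is an illustration, not the notebook's own code, and the `uc?export=download` URL it builds may still hit Google's confirmation page for large files, a case tools such as `gdown` handle.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_file_id(share_link):
    """Extract the file ID from a Google Drive sharing link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match:
        return match.group(1)
    ids = parse_qs(urlparse(share_link).query).get("id")  # e.g. .../open?id=<ID>
    if ids:
        return ids[0]
    raise ValueError(f"no file ID found in {share_link!r}")


def direct_download_url(share_link):
    """Build a direct-download URL for the shared file."""
    return f"https://drive.google.com/uc?export=download&id={drive_file_id(share_link)}"
```

If the extracted ID looks wrong, the sharing permission is usually fine and the link was simply copied from a different Drive view.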

Then run the unzip cell to extract the model.
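
The unzip step amounts to extracting the archive and checking that the model files are there. A minimal sketch with Python's `zipfile`, assuming (as is usual for So-vits-svc models, but check yours) that the archive holds a `.pth` checkpoint and a `config.json`; the helper name is hypothetical.

```python
import os
import zipfile


def unzip_model(zip_path, dest_dir):
    """Extract a model archive and return the checkpoint/config files found."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # creates dest_dir and any subfolders
    found = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            if name.endswith((".pth", ".json")):  # assumed model file types
                found.append(os.path.join(root, name))
    return sorted(found)
```

An empty result usually means the archive was not a model package or the download failed.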

Conversion

Upload the sliced vocal files to the "raw" folder, set the parameters, and click "Convert" to start the conversion.
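
If you stage segments from code instead of the file browser, a small helper can empty the input folder and copy in the next segment, so only one segment is processed per run. This is a sketch: `raw_dir` is assumed to be the path of the notebook's "raw" folder, the helper name is hypothetical, and it deletes any files already in that folder.

```python
import os
import shutil


def stage_segment(segment_path, raw_dir="raw"):
    """Empty raw_dir, then copy in the single segment to convert next."""
    os.makedirs(raw_dir, exist_ok=True)
    for name in os.listdir(raw_dir):
        path = os.path.join(raw_dir, name)
        if os.path.isfile(path):
            os.remove(path)  # drop the previous segment (destructive)
    return shutil.copy(segment_path, raw_dir)  # returns the copied file's path
```

Save each converted result before staging the next segment, since the helper clears the folder.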

Tips:

Keep each audio segment short: no more than 1 minute, ideally around 40 seconds.

Upload only one segment at a time. When its conversion finishes, upload the next one.

Start with the default parameters and tune them gradually based on the results.
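
To follow the 40-second guideline without leaving a tiny leftover segment at the end, you can divide the vocal track into near-equal spans. A sketch: the 40-second target and 60-second ceiling come from the tips above and are not limits enforced by So-vits-svc, and the function name is hypothetical.

```python
import math


def plan_segments(total_seconds, target=40.0, limit=60.0):
    """Split total_seconds into near-equal (start, end) spans near target, each <= limit."""
    if total_seconds <= 0:
        return []
    count = max(1, round(total_seconds / target))
    if total_seconds / count > limit:
        count = math.ceil(total_seconds / limit)  # enforce the hard ceiling
    length = total_seconds / count
    return [(i * length, min(total_seconds, (i + 1) * length)) for i in range(count)]


print(plan_segments(210))  # a 3.5-minute vocal track
```

For a 3.5-minute track this gives five 42-second spans; multiply the start and end times by the sample rate to get frame offsets for cutting.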

Summary

Today we briefly introduced the So-vits-svc ("Sovits") project and used Colab to make singing covers of our favorite songs. Give it a try yourself. If you are interested, you can use the GitHub project to train your own models; there are also tutorials on Bilibili. Just be sure to avoid illegal activities and infringement, and have fun learning AI.