MUSICAL REVERB CONVERSION OF MIXED VOCAL TRACKS

ABSTRACT

Reverb plays a critical role in music production, where it gives listeners a sense of the space, timbre, and texture of the music. Yet reproducing the musical reverb of a reference track is challenging even for skilled engineers. In response, we propose an end-to-end system capable of switching the musical reverb factor of two different mixed vocal tracks. This method enables us to apply the reverb of the reference track to a source track on which the effect is desired. Further, our model can perform de-reverberation when a dry vocal source is used as the reference track. The proposed model is trained in combination with an adversarial objective, which makes it possible to handle high-resolution audio samples. A perceptual evaluation confirmed that the proposed model can convert the reverb factor, with a preference rate of 64.8%. To the best of our knowledge, this is the first attempt to apply deep neural networks to converting the musical reverb of vocal tracks.

Evaluation

The quantitative evaluation includes two tasks:
Reverb Conversion: interchanging the reverb of two different inputs. We compute the metrics by comparing the target reverberated samples with the interchanged samples. A higher value represents a better result in all the metrics used.

De-reverberation: eliminating the reverb of the target input. The x-axis values below are the bus send ratio (γ), in percent, used when mixing the source with the reverb factor. STOI is reported in percent (%), and SRMR and SI-SDR are reported in decibels (dB). A higher value represents a better result in all the metrics used.
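As a rough illustration of one of the metrics named above, the following is a minimal sketch of SI-SDR using its commonly cited definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. This page does not state the exact implementation used in the evaluation, so the function name `si_sdr`, the zero-mean step, and the `eps` stabilizer are assumptions made for the example.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Illustrative sketch only; not the evaluation code behind this page.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Remove the mean of each signal (a common convention, assumed here).
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Optimal scaling of the reference toward the estimate.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * b for b in r]
    residual = [a - t for a, t in zip(e, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is why a higher value indicates a closer match regardless of output level.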

The listening test was conducted with twenty participants. Each participant was randomly given one of two different test sets with twenty-four questions each. For each question, three samples were presented: a reference sample, which is the output of the proposed model, and two comparison samples, which are the input of the model and the ground truth (GT) of the reference sample.

Below are visual examples of W→D and D→W samples.

Audio Samples

Results of Reverb Conversion from the proposed model. All samples were generated from our validation dataset.
Four reverb presets were used in the validation dataset; the details are as follows.

Preset	Plug-in	Company
Smooth Vocal	H-Reverb	Waves
Vocal Plate	Abbey Road Plates	Waves
Vocal Hall	ChromaVerb	Logic Pro-X
Vocal Chamber	ChromaVerb	Logic Pro-X

The samples used in this section were also used in the listening test.

Please listen to the audio samples in a quiet environment using speakers, headphones, or earphones.

No. / Δγ	Reverb 1 (r₁) / γ	Reverb 2 (r₂) / γ	Model Input (Source / Reverb)	Model Output (Source / Reverb)	Ground Truth (Source / Reverb)
#1 / 0%	Smooth Vocal / 15%	Vocal Plate / 15%	s_a / r₁	s_a / r₂	s_a / r₂
#1 / 0%	Smooth Vocal / 15%	Vocal Plate / 15%	s_b / r₂	s_b / r₁	s_b / r₁
#2 / 20%	Vocal Plate / 5%	Vocal Hall / 25%	s_a / r₁	s_a / r₂	s_a / r₂
#2 / 20%	Vocal Plate / 5%	Vocal Hall / 25%	s_b / r₂	s_b / r₁	s_b / r₁
#3 / 40%	Vocal Hall / 5%	Vocal Plate / 45%	s_a / r₁	s_a / r₂	s_a / r₂
#3 / 40%	Vocal Hall / 5%	Vocal Plate / 45%	s_b / r₂	s_b / r₁	s_b / r₁
#4 / 60%	Smooth Vocal / 5%	Vocal Chamber / 65%	s_a / r₁	s_a / r₂	s_a / r₂
#4 / 60%	Smooth Vocal / 5%	Vocal Chamber / 65%	s_b / r₂	s_b / r₁	s_b / r₁
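To make the γ and Δγ columns above concrete, here is a small sketch of how a bus send ratio could be applied. It assumes a conventional send/return routing in which the fully wet reverb return is scaled by γ and summed with the dry source; the page does not spell out the exact mixing rule, and the presets listed come from commercial plug-ins, so the short impulse response below is only a toy stand-in. The names `convolve`, `bus_send_mix`, and `delta_gamma` are hypothetical. In every row of the table, Δγ equals the absolute difference between the two send ratios, which is what `delta_gamma` computes.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def bus_send_mix(dry, impulse_response, gamma):
    """Mix a dry source with its reverb return at send ratio gamma (0.0 to 1.0).

    Assumed rule: output = dry + gamma * reverb(dry). A toy impulse response
    stands in for the reverb plug-in.
    """
    wet = convolve(dry, impulse_response)
    # Pad the dry signal so it lines up with the longer reverb tail.
    padded_dry = list(dry) + [0.0] * (len(wet) - len(dry))
    return [d + gamma * w for d, w in zip(padded_dry, wet)]


def delta_gamma(gamma_1_percent, gamma_2_percent):
    """Absolute difference between two send ratios, in percentage points."""
    return abs(gamma_1_percent - gamma_2_percent)


# Usage: an impulse sent through a one-tap "echo" at a 20% send ratio.
dry = [1.0, 0.0, 0.0]
toy_ir = [0.0, 0.5]
mixed = bus_send_mix(dry, toy_ir, 0.2)
```

Under this assumption, setting γ to zero returns the dry source unchanged (apart from zero padding), which corresponds to the de-reverberation target.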

Results of De-reverberation from the proposed model. All samples were generated from our validation dataset.
Four reverb presets were used in the validation dataset; the details are as follows.

Preset	Plug-in	Company
Smooth Vocal	H-Reverb	Waves
Vocal Plate	Abbey Road Plates	Waves
Vocal Hall	ChromaVerb	Logic Pro-X
Vocal Chamber	ChromaVerb	Logic Pro-X

γ	Reverb	Model Input	Model Output	Ground Truth
10%	Smooth Vocal
10%	Vocal Plate
20%	Vocal Hall
20%	Vocal Chamber
30%	Smooth Vocal
30%	Vocal Hall
40%	Vocal Chamber
40%	Vocal Plate
50%	Vocal Plate
50%	Vocal Chamber
60%	Smooth Vocal
60%	Vocal Hall
70%	Vocal Plate
70%	Vocal Chamber

Results of Reverb Conversion with pop songs and raw tracks. The reference track (a pop song) is de-reverberated, while the reverb factor of the reference track is applied to the raw track.

Pop Song (ref.)		Model Input	Model Output
"The Scientist" by Coldplay	ref.
"The Scientist" by Coldplay	raw
"Yellow" by Coldplay	ref.
"Yellow" by Coldplay	raw
"Attention" by Charlie Puth	ref.
"Attention" by Charlie Puth	raw
"Attention" by Charlie Puth	ref.
"Attention" by Charlie Puth	raw
"Greedy" by Ariana Grande	ref.
"Greedy" by Ariana Grande	raw
"Greedy" by Ariana Grande	ref.
"Greedy" by Ariana Grande	raw
