Vocaloid


Vocaloid is a singing voice synthesizer software product. Its signal-processing component was developed in a joint research project between Yamaha Corporation and the Music Technology Group at Pompeu Fabra University in Barcelona. The software was then developed into the commercial product "Vocaloid", released in 2004.
The software enables users to synthesize "singing" by typing in lyrics and a melody, and also "speech" by typing in a script of the required words. It uses synthesis technology based on specially recorded vocals of voice actors or singers. To create a song, the user inputs the melody on a piano roll interface and enters the lyrics on each note. The software can change the stress of pronunciations, add effects such as vibrato, and change the dynamics and tone of the voice.
Various voice banks have been released for use with the Vocaloid synthesizer technology. Each is sold as "a singer in a box", designed to act as a replacement for an actual singer. As such, they are often released with a moe anthropomorphic avatar, although some voice banks are released without an assigned avatar. These avatars are also referred to as Vocaloids and are often marketed as virtual idols; some have gone on to perform at live concerts as on-stage projections.
The software was originally available only in English, starting with the first Vocaloids Leon, Lola and Miriam by Zero-G, and in Japanese, with Meiko and Kaito, made by Yamaha and sold by Crypton Future Media. Vocaloid 3 added support for Spanish for the Vocaloids Bruno, Clara and Maika; Chinese for Luo Tianyi, Yuezheng Ling, Xin Hua, and Yanhe; and Korean for SeeU.
The software is intended for professional musicians as well as casual computer music users. Japanese musical groups such as Livetune of Toy's Factory and Supercell of Sony Music Entertainment Japan have released songs featuring Vocaloid as vocals. The Japanese record label Exit Tunes of Quake Inc. has also released compilation albums featuring Vocaloids.

Technology

Vocaloid's technology is generally classified as concatenative synthesis in the frequency domain: it splices and processes vocal fragments extracted from human singing voices, in the form of time-frequency representations. The Vocaloid system can produce realistic voices by adding vocal expressions such as vibrato to the score information. When Vocaloid was released in 2004, its synthesis technology was called "frequency-domain singing articulation splicing and shaping"; this name has not been used since the release of Vocaloid 2 in 2007. "Singing articulation" is explained as "vocal expressions", such as vibrato, together with the vocal fragments necessary for singing. The Vocaloid and Vocaloid 2 synthesis engines are designed for singing, not for reading text aloud, though software such as Vocaloid-flex and Voiceroid has been developed for that purpose. The engines cannot naturally replicate singing expressions such as hoarse voices or shouts.
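Adding vibrato to the score information can be pictured as modulating the pitch curve of a note over time. The sketch below is a minimal illustration under stated assumptions, not Vocaloid's actual vibrato model: the function name `pitch_curve_with_vibrato`, the frame rate, the 5.5 Hz rate, the 50-cent depth and the sinusoidal shape are all hypothetical choices, and pitch is expressed in cents (MIDI note number × 100).

```python
import math


def pitch_curve_with_vibrato(
    note: int,
    duration_s: float,
    frame_rate: float = 100.0,   # pitch values per second (assumed)
    rate_hz: float = 5.5,        # vibrato speed (assumed)
    depth_cents: float = 50.0,   # peak deviation from the note (assumed)
    onset_s: float = 0.0,        # vibrato starts this long after note-on
) -> list[float]:
    """Return a per-frame pitch curve in cents with sinusoidal vibrato.

    Before `onset_s` the pitch stays flat on the note; afterwards a sine
    wave of the given rate and depth is added around it.
    """
    n_frames = int(round(duration_s * frame_rate))
    base = note * 100.0
    curve: list[float] = []
    for i in range(n_frames):
        t = i / frame_rate
        cents = base
        if t >= onset_s:
            cents += depth_cents * math.sin(2 * math.pi * rate_hz * (t - onset_s))
        curve.append(cents)
    return curve
```

For example, `pitch_curve_with_vibrato(60, 1.0, onset_s=0.5)` holds middle C steady for half a second and then oscillates within ±50 cents of it, a common way for singers to delay vibrato on a long note.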

System architecture

The main parts of the Vocaloid 2 system are the score editor, the singer library, and the synthesis engine. The synthesis engine receives score information from the score editor, selects appropriate samples from the singer library, and concatenates them to output synthesized voices. The score editor and synthesis engine provided by Yamaha are essentially the same across different Vocaloid 2 products. If a Vocaloid 2 product is already installed, the user can enable another Vocaloid 2 product by adding its library. The system supports three languages, Japanese, Korean, and English, and other languages may become available in the future. It can run standalone, as a ReWire application, or as a Virtual Studio Technology (VST) instrument accessible from a digital audio workstation (DAW).
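The data flow between the three components can be sketched as follows. This is a toy model, not Yamaha's implementation: the `Note` structure, the `SINGER_LIBRARY` mapping, the sample identifiers and the functions `select_samples` and `synthesize` are hypothetical, and "concatenation" here only orders sample identifiers instead of splicing audio.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One note as the score editor might pass it on (hypothetical)."""
    midi_pitch: int    # MIDI note number, 60 = middle C
    duration_ms: int   # note length in milliseconds
    units: list[str]   # phonetic units (diphones, sustained vowels) sung on it


# Hypothetical singer library: phonetic unit -> recorded sample id.
SINGER_LIBRARY: dict[str, str] = {
    "#-s": "sample_001",
    "s-I": "sample_002",
    "I": "sample_003",
    "I-N": "sample_004",
    "N-#": "sample_005",
}


def select_samples(units: list[str], library: dict[str, str]) -> list[str]:
    """Look up every phonetic unit in the singer library."""
    missing = [u for u in units if u not in library]
    if missing:
        raise KeyError(f"units not in singer library: {missing}")
    return [library[u] for u in units]


def synthesize(score: list[Note], library: dict[str, str]) -> list[str]:
    """Engine stage: walk the score and concatenate the selected samples.

    A real engine would also shift pitch and splice audio; this sketch
    only returns the ordered list of sample ids.
    """
    output: list[str] = []
    for note in score:
        output.extend(select_samples(note.units, library))
    return output
```

Swapping in a different `library` dictionary while keeping `synthesize` unchanged mirrors the point above that adding another product's library enables a new voice with the same editor and engine.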

Score Editor

The score editor is a piano roll-style editor for entering notes, lyrics, and some expressions. When lyrics are entered, the editor automatically converts them into Vocaloid phonetic symbols using the built-in pronunciation dictionary. The user can directly edit the phonetic symbols of unregistered words. The score editor offers various parameters for adding expression to singing voices, and when creating voices the user is expected to adjust these parameters to best fit the synthesized tune. The editor supports ReWire and can be synchronized with a DAW. Real-time "playback" of songs with predefined lyrics using a MIDI keyboard is also supported.

Singer library

Each Vocaloid licensee develops the singer library, a database of vocal fragments sampled from real people. The database must contain all possible combinations of phonemes of the target language, including diphones (pairs of adjacent phonemes) and sustained vowels, as well as polyphones with more than two phonemes if necessary. For example, the voice corresponding to the word "sing" can be synthesized by concatenating the sequence of diphones "#-s, s-I, I-N, N-#" (where # marks the voiceless boundary before and after the word) with the sustained vowel ī. The Vocaloid system changes the pitch of these fragments so that they fit the melody. To obtain more natural sounds, the library must store samples at three or four different pitch ranges. Japanese requires 500 diphones per pitch, whereas English requires 2,500. Japanese has fewer diphones because it has fewer phonemes and most of its syllables are open syllables ending in a vowel. In Japanese, there are three patterns of diphones containing a consonant: voiceless-consonant, vowel-consonant, and consonant-vowel. English, by contrast, has many closed syllables ending in a consonant, and therefore also needs consonant-consonant and consonant-voiceless diphones. Thus, more diphones need to be recorded for an English library than for a Japanese one. Because of this linguistic difference, a Japanese library is not suitable for singing in fluent English.
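The diphone decomposition and the library-size arithmetic above can be sketched in a few lines. The function names are hypothetical, and placing the sustained vowel (written here with the symbol "I") right after the diphone that enters it is an assumption that reproduces the "sing" example; the per-pitch counts are the figures given in this article.

```python
def to_diphones(phonemes: list[str]) -> list[str]:
    """Split a phoneme sequence into diphones, with '#' at both word edges."""
    padded = ["#"] + phonemes + ["#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def unit_sequence(phonemes: list[str], vowels: set[str]) -> list[str]:
    """Diphones plus a sustained-vowel unit after each diphone ending in a vowel."""
    units: list[str] = []
    for diphone in to_diphones(phonemes):
        units.append(diphone)
        right = diphone.split("-")[1]
        if right in vowels:
            units.append(right)  # the sustained part of the vowel
    return units


# Diphones needed per pitch range, as stated for each language.
DIPHONES_PER_PITCH = {"Japanese": 500, "English": 2500}


def diphone_recordings(language: str, pitch_ranges: int) -> int:
    """Approximate number of diphone recordings for a full library."""
    return DIPHONES_PER_PITCH[language] * pitch_ranges
```

`to_diphones(["s", "I", "N"])` yields `["#-s", "s-I", "I-N", "N-#"]`. With three or four pitch ranges, a Japanese library needs roughly 1,500 to 2,000 diphone recordings, while an English one needs roughly 7,500 to 10,000.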

Synthesis engine

The synthesis engine receives score information contained in dedicated MIDI messages, called Vocaloid MIDI, sent by the score editor; it adjusts the pitch and timbre of the selected samples in the frequency domain and splices them to synthesize singing voices. When Vocaloid runs as a VST instrument (VSTi) inside a DAW, the bundled VST plug-in bypasses the score editor and sends these messages directly to the synthesis engine.
The engine's processing covers:
  • Pitch conversion
  • Timing adjustment
  • Sample concatenation
  • Timbre manipulation
  • Transforms
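Two of the engine's stages, pitch conversion and sample concatenation, can be illustrated with a simplified sketch. It does not reproduce Vocaloid's algorithms: the function names, the rule of picking the nearest recorded pitch range, the representation of a fragment as a list of magnitude-spectrum frames (`Frame`), and the linear crossfade are all assumptions made for illustration. Only the equal-tempered MIDI-to-frequency formula (A4 = note 69 = 440 Hz) is standard.

```python
Frame = list[float]  # one magnitude spectrum; a hypothetical representation


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def nearest_pitch_range(target_note: int, recorded_notes: list[int]) -> int:
    """Choose the recorded pitch range closest to the target note.

    Assumption: a smaller pitch shift sounds more natural, which is why
    libraries store several pitch ranges.
    """
    return min(recorded_notes, key=lambda n: abs(n - target_note))


def pitch_shift_ratio(target_note: int, source_note: int) -> float:
    """Factor by which the sample's frequencies must be scaled."""
    return midi_to_hz(target_note) / midi_to_hz(source_note)


def splice(a: list[Frame], b: list[Frame], overlap: int) -> list[Frame]:
    """Join two fragments, linearly crossfading `overlap` spectral frames."""
    if not 0 < overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter fragment length")
    out = a[:-overlap]  # frames of `a` before the overlap region
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight moves from `a` toward `b`
        fa, fb = a[len(a) - overlap + i], b[i]
        out.append([(1 - w) * x + w * y for x, y in zip(fa, fb)])
    out.extend(b[overlap:])  # remaining frames of `b`
    return out
```

For a library recorded at notes 48, 60 and 72, a target of note 64 would use the note-60 samples shifted up by a ratio of about 1.26 (four semitones). Crossfading spectral frames rather than raw waveforms is meant to echo the frequency-domain splicing described above, in a much cruder form.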

Software history

Vocaloid

Yamaha started development of Vocaloid in March 2000 and first announced it at the Musikmesse fair in Germany, held March 5–9, 2003. It was developed under the name "Daisy", in reference to the song "Daisy Bell", but for copyright reasons this name was dropped in favor of "Vocaloid".

Vocaloid 2

Vocaloid 2 was announced in 2007. Unlike the first engine, Vocaloid 2 based its results on vocal samples rather than on analysis of the human voice. The synthesis engine and the user interface were completely revamped, and Japanese Vocaloid products received a Japanese-language interface.

Vocaloid 3

Vocaloid 3 launched on October 21, 2011, along with several products in Japanese, the first of its kind. Several studios updated their Vocaloid 2 products for use with the new engine with improved voice samples.

Vocaloid 4

In October 2014, the first product confirmed for the Vocaloid 4 engine was the English vocal Ruby, whose release had been delayed so that she could be released on the newer engine. In 2015, several V4 versions of Vocaloids were released. The Vocaloid 5 engine was announced soon afterwards.

Vocaloid 5

Vocaloid 5 was released on July 12, 2018, with an overhauled user interface and substantial engine improvements. The product is available only as a bundle; the standard version includes four voices and the premium version includes eight. This was the first time since Vocaloid 2 that a Vocaloid engine was sold together with voices; starting with Vocaloid 3, voices had been sold separately.

Vocaloid 6

Vocaloid 6 was released on October 13, 2022. It supports voices from Vocaloid 3 and later, and introduces Vocaloid:AI, a new line of voices that run on their own engine within Vocaloid 6. The product is sold only as a bundle; the standard version includes the four voices bundled with Vocaloid 5, as well as four new voices from the Vocaloid:AI line. Vocaloid 6's AI voicebanks support English and Japanese by default, and Yamaha announced that it intended to add support for Chinese. Vocaloid 6 also includes a feature that lets users import audio of themselves singing and have Vocaloid:AI recreate that audio with one of its voices.

Derivative products

Software

  • VY1, a Japanese feminine vocal. First announced in December 2010, VY1 was released as "VY1t" in "iVOCALOID", an adapted version of the Vocaloid software for the iPad and iPhone.
  • VY2, a Japanese masculine vocal, was also due for release; like VY1's version, it would have been adjusted for compatibility and performance reasons. However, it was never released.
  • Aoki Lapis, a Japanese female vocal, was added to this software in December 2012. Her version of the VocaloWitter app took first place among all paid apps on the iTunes Store on 11 September 2013.

  • VY1: A feminine vocal released for the software; this was the first vocal sold.
  • VY2: A masculine vocal, made available in October 2011.
  • Aoki Lapis: A female vocal, added in November 2012.
  • Merli: A female vocal, added in August 2014.

The following products are available for purchase:
  • VY1: The full version of the Japanese feminine VY1 vocal.
  • ZOLA Project: Three male vocals, Yuu, Wil and Kyo, each sold separately.
  • Aoki Lapis: Japanese female vocal.
  • Merli: Japanese female vocal.
  • Mew: Japanese female vocal.
  • Galaco: Japanese female vocal; comes in two versions, "red" and "blue", sold separately.
  • Cyber Diva: English female vocal.
  • Yuzuki Yukari: Japanese female vocal; has three versions, "Jun", "Onn" and "Lin", each sold separately.
  • Sachiko: Japanese female vocal.
  • Megpoid: Female vocal with two versions, the Japanese "Native" and "English", sold separately.
  • Unity-Chan: Japanese female vocal.