VOCALOID (often referred to as just "V1" or "VOCALOID1" in VOCALOID communities) is a singing synthesizer application software developed by the YAMAHA Corporation. The project was an international effort, and is considered the brainchild of Kenmochi Hideki, also known as the "father" of VOCALOID.
In the 20th century, the most successful vocal synthesizing attempt had been "Queen of the Night" from Mozart's opera The Magic Flute; this had been made in 1984 by Yves Potard and Xavier Rodet using the CHANT synthesizer.
Jordi Bonada, a senior researcher at the Music Technology Group at Pompeu Fabra University in Barcelona, joined the university in 1997. Bonada worked on a research project requested by YAMAHA which contained some "interesting" ideas. He set about recording not just a song from a singer but various ranges and pitches, in an attempt to build a model from which any song could be built. The project was codenamed "Elvis" and lasted two years. It did not become a product at the end of its development: being based on spectral morphing techniques, the project was too large, and each song required a professional singer behind it.
While it did not become a product, the "Elvis Project" helped establish that a series of phonetics recorded in a wide range of pitches would help build a synthesizer based on any model. YAMAHA agreed to help them start a new project; it was at this point that Kenmochi Hideki joined. The initial ideas came from him in Japan in 2000, with most of the research done at Pompeu Fabra University, where the core signal processing libraries were developed in C++. YAMAHA itself was responsible for the design and development of the actual product. It was purely collaborative research, and they did not think about selling it at that time.
At the time, synthesizers would take days to produce good quality results, yet the vocals would still sound inhuman and obviously machine-generated. They were also expensive. This meant that while all other parts of music production could by then be fully recreated in a DAW, producing a good quality vocal performance meant hiring a human vocalist. The aim of the project was therefore to provide a fast, low-cost way of getting uncannily human-like vocals, giving producers full control of music production. They used "Elvis" as the base model for ideas and set about tackling two main problems:
- how to process and transform singer recordings so that it would result in a performance of a given song sounding as natural as possible and provide the feeling of a continuous flow
- how to process and transform the singer recordings so that it would result in a performance of a given song sounding as natural as possible and provide the feeling of a continuous flow
The VOCALOID™ project was originally codenamed "Daisy Project" ("DAISYプロジェクト" or "でいじぃぷろじぇくと"), a name taken from the song "Daisy Bell", and was at a prototype stage in March 2002. Excitation plus Resonances (EpR) was developed as the first voice model, and it allowed the researchers to transform vocal timbres in a natural manner while preserving subtle detail. At first, "Daisy" could only say vowels, as in "ai" (love). Four months later, "Daisy" began to support consonants, with the first "complete word" being "asa" (morning).
Because YAMAHA itself could only provide a limited number of vocals, they licensed the software out to various third-party studios. The first studio to join this project was Crypton Future Media, who were contacted in May 2002. YAMAHA then attempted to find English studios to support an English version, but most of the responses were negative. The first studio to enter development was Zero-G, joining in the fall of 2002, with PowerFX also joining that year. Thus, both English and Japanese voicebanks began development.
At the 6th anniversary of VOCALOID™, Hiroyuki Itoh noted that they received demos from Zero-G, without warning, of what seemed to be a male vocal singing. Since the demos came unexpectedly, they did not realize they were VOCALOID™ demos and thought they were some sort of prank.
"Daisy" was demonstrated at the 6th anniversary of VOCALOID™, where a file called "Fly Me to the Moon" was played. The file was originally created for July 16, 2002, when Crypton were shown the first demonstration in their Sapporo office. "Daisy" still had trouble with consonants at the time.
"Daisy" was dropped as a name due to copyright conflicts; despite attempts to change the name (such as translating it into Japanese), they ultimately could not register it.
The only four known vocals for "Daisy" were LEON, LOLA, HANAKO, and TARO. LEON and LOLA were the only ones ever shown to the public, being released as official voicebanks for the final VOCALOID software.
Kenmochi reported that the name of the software was very hard to decide at the time, and that "Vocaloid" had fallen into third place as a choice of name. The name "Vocaloid", a portmanteau of the words "Vocal" and "Android" ("vocal android"), was chosen two or three weeks before its announcement, after the second-choice name failed due to a copyright conflict with a piece of software in Belgium. Kenmochi chose to announce the technology on February 26, 2003, a day before his birthday.
The original design of VOCALOID™ was to act as a replacement singer for a real singer. Many reviewers at the time of LE♂N and L♀LA's release thought that "VOCALOID" was a bold effort, as human speech was a complex thing to recreate. VOCALOID was regarded as the first of its kind to tackle singing vocals.
KAITO and MEIKO were originally recorded by YAMAHA themselves before being made for commercial release. KAITO ended up being delayed by a year and a half.
The first VOCALOIDs, LE♂N and L♀LA, made their debut appearance and initial release at the NAMM Show on January 15, 2004. They were then released in Japan by the studio Zero-G on March 3, 2004, both sold as a "Virtual Soul Vocalist". They were also demonstrated at the Zero-G Limited booth during Wired Nextfest and won the 2005 Electronic Musician Editor's Choice Award. Zero-G later released MIRIAM, with her voice provided by Miriam Stockley, in July 2004. Later that year, Crypton Future Media, Inc. handled the release of the first Japanese VOCALOID, MEIKO. It was during the period between MIRIAM's and MEIKO's respective releases that the first rival software, Cantor, was released; it aimed to compete with VOCALOID, which was known in the western hemisphere only through LE♂N, L♀LA, and MIRIAM.
The Game Audio Network Guild held the "2nd Annual G.A.N.G. Awards Show" on Thursday, March 25, 2004, at the Fairmont Hotel in San Jose, California, during the Game Developers Conference 2004. The software won the "Best New Audio Technology" award in the Industry & Trade category.
Though LE♂N, L♀LA, MIRIAM, and MEIKO experienced good sales, with MEIKO in particular selling 3,000 units in her first year, KAITO initially failed commercially and sold just 500 units. Despite this, the software was successful overall and was followed by the VOCALOID2 engine.
It is notable that VOCALOID was released in 2004, towards the end of the "FLASH golden age" (FLASH黄金時代), a period known for the rise of Flash-based productions (1998-2002/2005, end date arguable) and the birth of file sharing sites such as YouTube.
At the close of the VOCALOID era, it was confirmed that three companies had joined production of the software: Crypton Future Media, Zero-G Ltd, and PowerFX. However, PowerFX, having been introduced to the software via LE♂N and L♀LA's demonstrations at the 2002 NAMM Show, did not produce any vocals for this version of VOCALOID, instead making their entrance at the beginning of the VOCALOID2 era. It is known, however, that as early as 2003 they had a vocal in development intended for the engine under the name "JODIE", as well as a male vocal, "RONIE".
KAITO was sold with version 1.1 of the software, but this caused problems with other versions of the software, and a patch had to be created to fix the issue. The last version of the software produced was 1.1.2. The patch to upgrade all VOCALOID voicebanks was released by YAMAHA themselves, although Crypton Future Media later updated both of their products to the latest version. Due to the retirement of support for the VOCALOID engine, the update has not been available to download from YAMAHA since 2011.
Improvements were made between version 1.0 and version 1.1.2. Vocal phonetics in version 1.0 were more broken, and the engine did not attempt to smooth them out as 1.1.2 does, resulting in more robotic singing. However, even the slightest adjustment in version 1.1.2 would produce very different results from version 1.0. Therefore, despite the improvements, not all users found it suitable to update from version 1.0 to version 1.1.2.
Due to the success of the VOCALOID2 software, VOCALOID saw a second life in 2008, driven by KAITO's sudden growth in popularity. KAITO went on to become the second-best seller of the year on Nico Nico Market in 2008.
As interest in VOCALOIDs grew, Zero-G began reselling their VOCALOID products again on their website, and were considering updating their box art to match current VOCALOID trends better. However, this did not occur.
The engine has been unsupported by YAMAHA since 2011, and from early 2014 onwards this version of the engine was removed from sale.
In mid-December 2013, news came from both Crypton Future Media and Zero-G that their VOCALOID products were being taken down.
Zero-G gave December 31, 2013 as the final retirement date for their VOCALOIDs; after this date they were removed from sale permanently.
Serials could still be purchased while they lasted, but general sale ended.
- Windows XP or Windows 2000 (Note: The engine isn't officially compatible with Windows Vista or higher)
- Pentium III, 1 GHz or faster
- 512MB of RAM or more
- Approx. 700 MB of hard disk space or more
- CD-ROM or DVD-ROM Drive
- SVGA Display (1024x768)
- Sound Card with Microsoft DirectSound Compatible driver
- LAN/network card must be installed, or a USB network card must be connected to the USB port
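The requirements above can be read as a simple checklist. The following is a minimal Python sketch, not an official installer check: the `Machine` record and the `unmet_requirements` helper are hypothetical names introduced here for illustration, and the thresholds are copied from the list above.

```python
from dataclasses import dataclass

# Minimums copied from the requirements list above.
SUPPORTED_OS = {"Windows XP", "Windows 2000"}
MIN_CPU_GHZ = 1.0
MIN_RAM_MB = 512
MIN_DISK_MB = 700
MIN_DISPLAY = (1024, 768)


@dataclass
class Machine:
    """Hypothetical description of a PC, used only for this illustration."""
    os_name: str
    cpu_ghz: float
    ram_mb: int
    free_disk_mb: int
    display: tuple
    has_optical_drive: bool
    has_directsound_card: bool
    has_network_card: bool


def unmet_requirements(m: Machine) -> list:
    """Return the names of the listed requirements that the machine does not meet."""
    problems = []
    if m.os_name not in SUPPORTED_OS:
        problems.append("operating system")
    if m.cpu_ghz < MIN_CPU_GHZ:
        problems.append("CPU")
    if m.ram_mb < MIN_RAM_MB:
        problems.append("RAM")
    if m.free_disk_mb < MIN_DISK_MB:
        problems.append("hard disk space")
    if m.display[0] < MIN_DISPLAY[0] or m.display[1] < MIN_DISPLAY[1]:
        problems.append("display")
    if not m.has_optical_drive:
        problems.append("CD-ROM or DVD-ROM drive")
    if not m.has_directsound_card:
        problems.append("sound card")
    if not m.has_network_card:
        problems.append("network card")
    return problems
```

An empty list means every listed requirement is met; it does not guarantee that the software will actually run on that machine.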
Examples of usage
An example of solfège using VOCALOID technology.
- LEON: LEON.ogg
- LOLA: LOLA.ogg
- MIRIAM: Miriam by Rinshuu.ogg
- MEIKO: MEIKO V1.ogg
- KAITO: KAITO V1.ogg
For a list of VOCALOID's parameters, see Parameters.
VOCALOID has 5 voicebanks available (3 English, 2 Japanese), offering a limited range of voices. Users can achieve other genres with further voice editing. Both the English and Japanese VOCALOIDs have an English interface. Other languages were planned for the future (though these would not be introduced until VOCALOID3).
According to the original YAMAHA VOCALOID website, the software's key feature was its ability to recreate singing exactly as it is typed out on a PC. Manipulation of the vocals allowed for a greater array of styles and vocals than what was offered out of the box, with the added bonus of maintaining a degree of realism. VOCALOID based its vocals more on analysis of the human voice and less on samples of it. Extra expression could be given to a voice simply by adding vocal effects.
The file format for VOCALOID is "VOCALOID MIDI" (.MIDI). VOCALOID will not import .VSQ or .VSQX files, although it will import most MIDI file types.
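As a rough illustration of what "a MIDI file" means at the byte level, the sketch below checks for the Standard MIDI File header chunk ("MThd", a 4-byte length, then format, track count, and timing division). The function name `parse_smf_header` is hypothetical, and this says nothing about any extra data a "VOCALOID MIDI" file may carry or about how VOCALOID itself validates imports.

```python
import struct


def parse_smf_header(data: bytes):
    """Return (format, track_count, division) if data starts with a
    Standard MIDI File header chunk, otherwise None."""
    if len(data) < 14 or data[:4] != b"MThd":
        return None
    # Big-endian: 32-bit chunk length, then three 16-bit fields.
    length, fmt, track_count, division = struct.unpack(">IHHH", data[4:14])
    if length < 6:
        return None
    return fmt, track_count, division
```

To try it on a real file, read the first 14 bytes in binary mode and pass them to the function; a result of `None` means the file does not begin with a standard MIDI header.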
The database of VOCALOID is much simpler than that of the VOCALOID2 engine that followed, and consonant sounds are more difficult to modulate. However, VOCALOID has some functions that VOCALOID2 does not have, such as the Resonance parameters. Resonance allowed the phonetic data to be manipulated through formant modulation, making it sound different depending on what was done to it. The biggest advantage this offered was flexibility. As seen with voicebanks like LE♂N or MEIKO, each user can utilize the voicebanks very differently, and delicate editing with several Resonances or other functions has produced a wide range of results. All VOCALOID vocals are known to have a small, albeit undeclared, optimum vocal range compared to most vocals powered by later engine versions.
Unlike the version that followed, VOCALOID was an analytic-based system that worked out how to adapt the vocal using mathematics. In short, it used recorded sample data to make the engine sound more like the vocalist behind the data; as a result, the overtone of all 5 vocals was identical. The vocals sounded very synthetic and low-quality, yet this is also why the engine had such great flexibility compared to the sample-based versions that followed. The quality issue limited the feasibility of releasing vocals for it, and JODIE and RONIE were not released for this reason. Also, while realism was not beyond it, the analytic-based system did not produce results as realistic as those of the sample-based system.
When DSE.dll or DSE1_1.dll is examined with a hex editor, a number of phonetics are listed by the engine as possible sounds; however, no released VOCALOID used them.
The VOCALOID interface also had minor differences depending on which VOCALOID was used to open the engine. For example, MIRIAM's interface coloured the area around the keyboard keys deep blue and carried Zero-G's logos, while KAITO's was green with Crypton Future Media logos. The standard interface used in VOCALOID demos and presentations was brown with no logos whatsoever.
All VOCALOID voicebanks except KAITO used the VOCALOID 1.0 editor when they were released. Users of the VOCALOID 1.0 editor can update it by applying the VOCALOID 1.1 update file. KAITO was released with both versions of the VOCALOID editor. However, users who are not using version 1.1.2 need to apply the VOCALOID Ver1.1.2 update file distributed on Crypton's official page before they use the VOCALOID 1.0 editor. There are many differences between ver1.0 and ver1.1, and they sound different even when edited in the same way (see the Niconico broadcast comparing KAITO's ver1.0 and ver1.1). The main differences between them are singing style and portamento timing.
Though users can switch between versions, it is best to proceed with caution when doing so; each of ver1.0 and ver1.1 has its own advantages. This is currently the only version of the VOCALOID engine in which a significant version change within its lifespan noticeably affected the singing results. All other improvements were either made to voicebanks rather than the engine, or occurred when a newer version of the engine was released.
Despite being Japanese, KAITO and MEIKO did not have a Japanese interface, as this version was never fully translated into Japanese, although the phonetics were still Japanese. VOCALOID also had a number of synchronizing issues, which varied between voicebank libraries; this created problems when setting the result to music.
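The article does not describe how the Resonance parameters work internally. Purely as an illustration of the general idea of shaping a sound with a formant-like resonance, the sketch below implements a single two-pole resonator, a common textbook building block; the function name and its parameters are assumptions made here and are not VOCALOID's actual controls.

```python
import math


def resonator(signal, center_hz, bandwidth_hz, sample_rate=44100):
    """Filter a sequence of samples through one two-pole resonator.

    Illustrative only: center_hz and bandwidth_hz are hypothetical
    stand-ins for a formant's position and width.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius
    theta = 2.0 * math.pi * center_hz / sample_rate       # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

Moving `center_hz` shifts where the filter emphasises energy, which is one simple way to picture how changing a resonance can make the same source material sound different.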
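A hex editor is one way to look for readable text inside a binary file; the same kind of inspection can be sketched in a few lines of Python. The helper below, `printable_strings` (a hypothetical name), simply pulls runs of printable ASCII out of raw bytes. It makes no assumption about how the engine actually stores its phonetic list.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Usage (not run here): pass in the raw bytes of the file to inspect, e.g.
#   with open("DSE.dll", "rb") as f:
#       for s in printable_strings(f.read()):
#           print(s)
```

Lowering `min_len` will surface shorter fragments at the cost of more noise from bytes that only happen to look like text.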
In comparison to their providers (based on samples known for L♀LA, MIRIAM, KAITO, and MEIKO's vocal providers), VOCALOID voicebanks sound deeper in tone than their vocalists' own voices, which are softer and often huskier.
In addition, VOCALOID vocals of both languages are missing some sounds that are needed to perfect either language. In other cases, the pronunciations exist but do not sound out the expected combination correctly, due to a lack of distinction between similar sounds. However, the majority of the correct sounds exist, and with some tweaking the output can be brought closer to the intended result. The VOCALOID synthesizing engine will often attempt to improvise some sounds, but the results are often crude and at times rough. For example, when the engine encounters slurring (a long-term issue of the VOCALOID software caused by sample handling issues), clarity is almost completely lost and it is difficult to maintain clear results without much work. The engine's rough handling as it attempts to perfect the language, sound human, and control the flow of lyrics across the different keys is the origin of much of the heavier digital sound of the 5 VOCALOID vocals. VOCALOID is also more likely than later versions to skip sounds when encountering problems.
VOCALOID may have issues with the Windows 7 operating system, though there are successful cases of installation. Although initial rumors stated otherwise, VOCALOID is supposed to be compatible with Windows Vista, and users have reported no major problems. However, it cannot be guaranteed that VOCALOID will work with operating systems newer than Windows XP. On Windows 7 and 64-bit operating systems, those who have managed a successful installation report that VOCALOID will often encounter issues that cause it to crash. Currently, the software can mostly be used as long as it is installed with some precautions.
Illegal versions of the software were also commonplace. The software was easy for pirating teams to crack, and every voicebank was cracked at some point after release. It was also discovered that most popular keygens worked with it. There was very little difference in service between the legal and illegal versions aside from a lack of technical support from studios, although the software's ReWire function may not work as well as in the legal version.
The software was marketed as a replacement singer in the English version and as a digital instrument in the Japanese version. In both cases, the software was aimed at professionals and was sold as a tool to aid producers who needed a singer but could either not afford one or not find the right one for the song. It was also useful for teaching singers how to sing a song, supplying them with example lyrics to match, or simply for producing a music demo for a portfolio. In addition, it aided in the production of music that required minor vocal parts such as loops, saving the producer the need to hire a singer for the sake of a handful of phrases. As such, VOCALOID was marketed as a purely professional tool and was expected to be purchased only by professional music makers.
VOCALOIDs were promoted at events such as the NAMM Show. It was the promotion of Zero-G's L♀LA and LE♂N at the NAMM trade show that would later introduce PowerFX to the VOCALOID program. Most of the promotion was done through publications such as the magazine Sound on Sound and the New York Times newspaper. While Japanese VOCALOIDs were also promoted in DTM MAGAZINE, their promotion was much lighter than what would follow in the VOCALOID2 era, and MEIKO and KAITO received quieter attention overall.
Online media was not used as a method of promotion and overall VOCALOID went largely unnoticed, particularly the Japanese version, which received less attention than the English version. Part of this is due to Sound on Sound already having an established website in 2004; as a result, the English version's details have been much better recorded overall. In contrast, DTM MAGAZINE did not publish details of each issue until 2009, making its articles from this era hard to research even for those within Japan, as getting hold of older issues is a problem. The amount of information on the Japanese version from 2004-2007 is therefore poor, a point since acknowledged by developers.
The two biggest failures of the studios' marketing were Zero-G's failure to sell in America, despite the high level of attention given to this version by the media, and KAITO's initial lack of sales. The failure in America caused MIRIAM not to be sold there; at the original time of her release she was sold only in Europe. Otherwise, both Crypton and Zero-G managed to meet expectations for their VOCALOIDs during the VOCALOID engine era, with MEIKO faring the best of all 5 vocals, selling three times the amount she was expected to sell.
After the success of Hatsune Miku in the VOCALOID2 era and the sudden interest in KAITO in 2008, Crypton Future Media were able to go back and re-sell their early VOCALOID voicebanks, taking the same approach as with their VOCALOID2 voicebanks. This proved successful enough for them to relaunch their VOCALOIDs on a later engine. Zero-G's attempt to do the same was not as successful, since the approaches to English and Japanese VOCALOIDs had diverged greatly over the preceding few years. However, Zero-G stated that if demand ever became high enough, they would relaunch their 3 VOCALOID voicebanks on a later engine. When the 3 Zero-G vocals became mail-order only, MIRIAM was the first to sell out.
The VOCALOID software was not well supported and there was little information on it. Crypton Future Media did, however, go back and make tutorials for this version of the software in August 2008.
In comparison to its successor VOCALOID2, VOCALOID had very little cultural impact at its time of release. Sales of the software were very sluggish.
It is difficult to know how many songs and albums use the VOCALOID software, since songwriters must ask permission before being allowed to state specifically that they are using a VOCALOID in their songs. The lack of attention has also resulted in a lack of knowledge and coverage of how widespread usage of the software was.
The first album to be released using a VOCALOID was A Place in the Sun, which used LE♂N's voice for the vocals singing in both Russian and English. MIRIAM has also been featured in two albums, Light + Shade and Continua. Japanese electropop-artist Susumu Hirasawa used VOCALOID L♀LA in the original soundtrack of Paprika by Satoshi Kon.
The majority of songs in which the software was used as the main singer did not exist until after 2008, when KAITO was rediscovered. Because it had become popular to feature Hatsune Miku or the Kagamine release as the main singer of entire songs, producers began to do the same with the older software. Before then, VOCALOID was mostly useful only for creating loops, as seen in "Paprika", since the software was not good enough to be a full replacement singer. Adding to the lack of major focus, few techniques for making it sound better were known, due to its lack of coverage, and the majority of producers who used the software arrived after 2008. In addition, because there was no fan culture during the era, there were "users" of the software but no "fans" to create a "fandom" around either the English or the Japanese version. VOCALOID was treated like any other DTM plug-in or software application, so it failed to be acknowledged outside DTM and EDM circles until 2008.
The CEO of Crypton Future Media, Inc. noted the lack of interest in the initial VOCALOID software. Many studios, when approached by Crypton Future Media for recommendations, initially had no interest in the software, with one company representative calling it a "toy". Crypton blamed part of the lack of response to the sale of the software on a fear of robots. LE♂N and L♀LA were also held partly responsible for the lack of sales in America, with the blame placed on their British accents, despite overall praise from reviewers of the software and the fact that the English version had sold well in both Japan and Europe.
Earlier VOCALOIDs were created without "avatars", and box art was not important to the function of the program. While MEIKO and KAITO had images that could later be used as avatars, LE♂N, L♀LA, and MIRIAM (although there is a clear image of a person on hers) did not. When avatars became common with Japanese VOCALOIDs during the VOCALOID2 era, the English VOCALOIDs without official avatars were left to interpretation by fan artwork. Zero-G did show interest in revising the box art of their VOCALOIDs since interest in VOCALOIDs had greatly increased, but the voicebanks were retired before this occurred.
VOCALOID voicebanks were criticized for their pronunciation problems, and both versions of the software suffered issues with certain sounds. However, despite the lack of interest, most reviews were good. Although criticism was plentiful, so was praise, as many recognized that VOCALOID™ was an ambitious project to undertake, being more complex and bolder than a synthesizer or an instrument like the flute or guitar. Since the human ear can pick up errors in speech, VOCALOID was a difficult product to sell, yet it was able to sound realistic enough on occasion. This was important to consider because, as "Popular Science" stated at the time of release, "Synthetic vocals have never even come close to fooling the ear, and outside of certain Kraftwerk chestnuts, robo-crooning is offputting." YAMAHA received much praise: the VOCALOID project was hailed as a "quantum leap" in vocal synthesis, and VOCALOID itself received much attention and praise within the industry.
Crypton Future Media stated that the VOCALOID engine was more like a prototype engine for the VOCALOID2 software that followed. There was also some criticism of opening the engine up as a commercial product rather than limiting the license to private or business-level usage, although Crypton Future Media thought this was best for the software.
The lack of support for this engine led to future versions being better supported overall from 2008 onwards, and was one of the criticisms that users from VOCALOID2 onwards expressed about this version of the engine. The Japanese version fared the worst during its era because of this. There are major issues when researching this engine, as mentioned elsewhere. Even after the success of Hatsune Miku, information on VOCALOID mostly focuses either on the development of the engine itself or on the English version. Much of the information on the Japanese version from this era came after VOCALOID3 was released.
Despite the number of useful applications of the software, the other issue was that it was a niche tool at best and took up a great deal of system resources relative to its usefulness. These issues were not unique to this version, but have affected all later versions of VOCALOID. Being a niche tool only furthered its obscurity, as it was not an essential tool for a producer. Overall, it was simply easier to hire a singer and skip the purchase of VOCALOID, as a real singer would still give a better performance with less effort.