DreamVoiceDB: Voice Timbre Dataset
The DreamVoiceDB dataset is an integral part of our research on voice timbre and its diverse characteristics. This page provides a detailed overview of the dataset creation process, the criteria for speaker selection, the methodology for keyword annotation, and our approach to enhancing the dataset's richness and authenticity.
Overview
Voice timbre is shaped by a myriad of factors, including age, gender, physical properties of the vocal tract and vocal cords, and perceptual characteristics. To effectively encapsulate the multifaceted nature of voice timbre, we employed a comprehensive three-stage process.

Figure 1: Schematic diagram of DreamVoiceDB survey method.
Details
- Datasets Used: LibriTTS-R and VCTK
- Number of Speakers: 900 (100 from VCTK and 800 from LibriTTS-R, sampled randomly)
- Keyword Selection Process:
  - Guided by an expert voice actor
  - Total of 10 keywords identified
  - Keywords categorized based on relative objectivity and subjectivity
- Annotator Profile:
  - Total Annotators: 8 (4 female and 4 male)
  - Expertise: Speech pathology, accent coaching, singing coaching, transcription, and translation
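The random speaker selection described above can be sketched as follows. This is only an illustration, not the script used to build the dataset: the ID pools, their sizes, the `sample_speakers` helper, and the seed are all placeholders introduced here.

```python
import random


def sample_speakers(pool, k, seed=0):
    """Draw k distinct speaker IDs from pool, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(pool, k))


# Placeholder ID pools; real speaker IDs would come from each corpus's metadata,
# and the pool sizes below are arbitrary.
vctk_pool = [f"vctk_{i:03d}" for i in range(110)]
libritts_pool = [f"libritts_{i:04d}" for i in range(2000)]

# 100 speakers from VCTK and 800 from LibriTTS-R, as listed in the details.
selected = sample_speakers(vctk_pool, 100) + sample_speakers(libritts_pool, 800)
print(len(selected))
```

Fixing the seed keeps the selection reproducible, which matters if annotations need to be matched back to the same speakers later.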
A sample of the survey page used for data collection is available here: DreamVoiceDB Survey Link
Keywords and Descriptors
The DreamVoiceDB dataset utilizes a variety of keywords and descriptors to annotate voice timbre. These are grouped into relatively objective and subjective categories, along with additional aspects related to suitability for various voice-related professions.
Relatively Objective Keywords [Multiple Options Single Choice]
- Age: "Teenager or Young", "Adult or Neutral", and "Senior or Old".
- Gender: "Definitely Female", "Somewhat Ambiguous / Neutral".
- Brightness: Ranging from "Very Dark (Rich)" to "Very Bright (Thin)".
- Roughness: From "Rough (Raspy)" to "Smooth (Mellow)".
Reference audio can be found in the following PowerPoint presentation: DreamVoiceDB Reference Audio
Relatively Subjective Keywords [Multiple Options Multiple Choice]
- Nasal: A noticeable resonance or vibration in the nasal passages.
- Breathy: You can hear the flow of air as the person speaks.
- Weak: Lack of energy and confidence.
- Strong: Full, resonant, and confident.
- Cute: Perceived as young and endearing.
- Attractive: A voice that is pleasing to listeners.
- Cool: Laid-back, confident, and collected.
- Warm: A voice that conveys kindness, sincerity, and comfort.
- Authoritative: A voice that carries weight and command.
- Others: Other characteristics, provided they are not related to the speaker's emotion.
Additional Aspects
This includes evaluating the voice's fit for professions such as singing, acting, public speaking, and other vocations where vocal qualities are paramount. Specific examples include:
- Public Presentation: Public Speaker, Motivational Speaker, Teacher, Tour Guide
- Storytelling: Podcaster, Audiobook Narrator
- Client and Public Interaction: Customer Service, Healthcare Communicator, Retail Associate, Receptionist
- Diplomacy and Judiciary: Diplomat, Judge
- None of the above
Each keyword and descriptor plays a crucial role in accurately capturing and conveying the unique characteristics of each voice sample in the dataset.
Analysis and Enhancement
- Agreement scores used to prioritize keywords
- Rigorous reassessment for moderately agreed keywords
- Every speaker has at least age and gender keywords
- Other keywords are selected based on the agreement scores
- We do not use any fine-grained annotations for darkness and roughness in model training
- Use of OpenAI's GPT-4 for natural language descriptors
- Generation of descriptors based on all keyword combinations
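The page does not spell out how agreement scores are computed. As a minimal sketch, assume the score for a keyword is the fraction of annotators who selected it for a given speaker. The function names, the thresholds (0.75 and 0.5), and the votes below are all hypothetical and are not taken from the dataset.

```python
from collections import Counter


def agreement_scores(annotations):
    """Fraction of annotators who selected each keyword for one speaker.

    `annotations` holds one set of keywords per annotator.
    """
    n = len(annotations)
    counts = Counter(kw for selected in annotations for kw in set(selected))
    return {kw: c / n for kw, c in counts.items()}


def select_keywords(scores, high=0.75, moderate=0.5):
    """Split keywords into accepted and to-be-reassessed groups.

    The thresholds are assumptions for illustration only.
    """
    accepted = sorted(kw for kw, s in scores.items() if s >= high)
    reassess = sorted(kw for kw, s in scores.items() if moderate <= s < high)
    return accepted, reassess


# Eight annotators labelling one speaker (made-up example data).
votes = [
    {"Warm", "Strong"}, {"Warm"}, {"Warm", "Strong"}, {"Warm", "Nasal"},
    {"Warm", "Strong"}, {"Warm", "Strong"}, {"Strong"}, {"Warm"},
]
scores = agreement_scores(votes)
accepted, reassess = select_keywords(scores)
print(accepted, reassess)
```

Under these assumed thresholds, a keyword chosen by most annotators is kept directly, a moderately agreed keyword is flagged for reassessment, and a rarely chosen one is dropped.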
Generated Descriptors Examples:
| Speaker ID | WAV File | Keywords | Prompt |
|---|---|---|---|
| 2562 | File 2562 | Male, Adult, Dark | An adult male voice, with a dark timbre. |
| 30 | File 30 | Female, Teenager, Bright | A teenage girl's voice, radiating brightness and energy. |
| 2156 | File 2156 | Male, Senior, Strong | A senior male voice, characterized by strength and power. |
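Generating descriptors for all keyword combinations could start from an enumeration like the one below. This is a sketch under stated assumptions: the keyword lists are an illustrative subset taken from the examples above, and `build_request` and its instruction wording are hypothetical, not the prompt actually sent to GPT-4. The call to the language model itself is deliberately left out.

```python
from itertools import product

# Illustrative subset of keywords, taken from the example rows above.
GENDERS = ["Male", "Female"]
AGES = ["Teenager", "Adult", "Senior"]
EXTRAS = [None, "Dark", "Bright", "Strong"]  # None means no extra keyword


def keyword_combinations():
    """Yield every (gender, age[, extra]) keyword tuple."""
    for gender, age, extra in product(GENDERS, AGES, EXTRAS):
        combo = [gender, age]
        if extra is not None:
            combo.append(extra)
        yield tuple(combo)


def build_request(keywords):
    """Build a hypothetical instruction asking a language model for a descriptor."""
    return (
        "Write one short sentence describing a voice with these attributes: "
        + ", ".join(keywords)
        + "."
    )


combos = list(keyword_combinations())
print(len(combos))
print(build_request(("Male", "Adult", "Dark")))
```

Each generated request would then be passed to the language model, and the returned sentence stored alongside the keyword tuple.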
Dataset Statistics

Access to Dataset and Analysis
- DreamVoiceDB
- Code for Dataset:Repo: