Speech corpus of home automation commands and distress calls in normal and shouted speaking styles
Authors: Emanuele Principi (1), Stefano Squartini (1), Francesco Piazza (1), Danilo Fuselli (2), Maurizio Bonifazi (2)
(1) Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
(2) FBT Elettronica Spa, Via Paolo Soprani, Recanati (MC), Italy
ITAAL is composed of utterances spoken by 20 native Italian speakers from central Italy (Marche region). Half of the speakers are male and half are female; their average age is 41.70 years, with a standard deviation of 11.17 years.
Recordings were made with a headset microphone (AKG C 555 L) and an array of four C 400 BL hypercardioid microphones spaced 4 cm apart and placed on a table 80 cm high. The recording room measures 9.7 m × 8.0 m × 2.9 m and has a reverberation time (T60) of 0.72 s. Each speaker read the corpus sentences standing in front of the microphone array at a distance of 3 m. Signals were acquired at a sample rate of 48 kHz using a MOTU 8pre audio interface and later downsampled to 16 kHz.
Speakers were asked to read three groups of sentences in Italian: home automation commands, distress calls, and phonetically rich sentences. The phonetically rich sentences were taken from the "speaker independent" set of the APASCI corpus [Ang93a, Ang93b, Ang94] and cover all the phones of the Italian language. Every sentence was spoken in both normal and shouted style. Home automation commands and phonetically rich sentences were pronounced without emotional inflection, whereas distress calls were to be spoken as if the speaker were frightened.
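The description above states only that the 48 kHz recordings were later downsampled to 16 kHz; the resampling method actually used is not given. As a rough illustration of what such a step involves, the sketch below decimates by the integer factor 3 (48/16): it applies a Hamming-windowed sinc low-pass FIR with a cutoff of 7.2 kHz, just below the 8 kHz Nyquist frequency of the output, and keeps every third sample. The function names `lowpass_taps` and `decimate`, the 101-tap length and the cutoff are assumptions chosen for illustration, not the authors' pipeline.

```python
import math


def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass FIR (illustrative assumption).

    cutoff is expressed as a fraction of the input sample rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Ideal low-pass impulse response 2*fc*sinc(2*fc*t); the t == 0 case is its limit.
        h = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window to reduce pass-band ripple and stop-band leakage.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    # Normalize so the filter has unity gain at DC.
    total = sum(taps)
    return [h / total for h in taps]


def decimate(x: list[float], factor: int = 3, num_taps: int = 101) -> list[float]:
    """Low-pass filter x, then keep every `factor`-th sample (hypothetical helper)."""
    # New Nyquist is 0.5/factor of the input rate; place the cutoff at 90% of it.
    taps = lowpass_taps(num_taps, 0.45 / factor)
    delay = (num_taps - 1) // 2  # group delay of the symmetric (linear-phase) FIR
    out = []
    for i in range(0, len(x), factor):
        # Evaluate the convolution only at retained positions, shifted by the
        # group delay so output sample k stays time-aligned with input sample k*factor.
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + delay - k
            if 0 <= j < len(x):
                acc += h * x[j]
        out.append(acc)
    return out
```

For example, `decimate(samples_48k)` on a list of 48 kHz samples returns a list one third as long, time-aligned with the input. This pure-Python loop is slow; for real audio a library polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, 1, 3)` performs the same integer-factor conversion far more efficiently.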
Please cite this paper when referring to ITAAL:
Emanuele Principi, Stefano Squartini, Francesco Piazza, Danilo Fuselli, Maurizio Bonifazi, "A Distributed System for Recognizing Home Automation Commands and Distress Calls in the Italian Language," in Proc. of Interspeech, 25-29 Aug. 2013, Lyon, France, pp. 2049-2053.
List of sentences
Home automation commands
- Accendi la luce (switch on the light)
- Spegni la luce (switch off the light)
- Accendi la radio (switch on the radio)
- Spegni la radio (switch off the radio)
- Accendi la televisione (switch on the television)
- Spegni la televisione (switch off the television)
- Apri la tenda (open the curtain)
- Chiudi la tenda (close the curtain)
- Alza le tapparelle (raise the shutters)
- Abbassa le tapparelle (lower the shutters)
- Chiama mio figlio (call my son)
- Alza la temperatura (raise the temperature)
- Abbassa la temperatura (lower the temperature)
- Apri la porta (open the door)
- Chiudi la porta (close the door)
Distress calls
- Aiuto aiuto aiuto (help help help)
- Ambulanza (ambulance)
- Soccorso (assistance)
- Polizia (police)
- Al ladro (stop thief)
Phonetically rich sentences (extracted from the APASCI corpus)
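- Quando ogni giglio impazzisce lei si rosicchia un'azzurra fava sbocciata dal glicine (when every lily goes mad, she nibbles a blue bean that blossomed from the wisteria)
- Avrei avventurato l'aggobbita quietanzatrice (I would have risked the hunched receipt-issuer)
- Quattro gemelle kessler della villanella sanno oggi offrire qua maurizio (four Kessler twins of the villanella can offer Maurizio here today)
- La rigida diavolessa nell'offside puo' addestrare una mamma (the rigid she-devil in the offside can train a mother)
- Apprezziamo un sud gnomico (we appreciate a gnomic south)
- Gli ufficiali che barbara per scelta avra' prescelto stanno a catanzaro (the officers whom Barbara will have preselected by choice are in Catanzaro)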
Four example files are provided, two from a male speaker and two from a female speaker, in which the speaker utters the sentence "abbassa le tapparelle" in normal and shouted speaking styles. The files are in WAV format, sampled at 16 kHz with 16-bit resolution. Each file contains five channels: channels 1-4 contain the microphone array signals and channel 5 contains the headset signal.
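Because each example file interleaves five 16-bit channels, a user will typically need to separate the four array channels from the headset channel before processing. A minimal sketch using only Python's standard `wave` and `struct` modules follows; it assumes 16-bit PCM as stated above. The names `read_channels`, `split_array_headset`, `ARRAY_CHANNELS` and `HEADSET_CHANNEL` are hypothetical and not part of the corpus distribution; note that the code uses 0-based indices, so corpus channel 5 is index 4.

```python
import io
import struct
import wave

# Hypothetical constants mapping the documented channel layout to 0-based indices.
ARRAY_CHANNELS = (0, 1, 2, 3)  # channels 1-4: microphone array
HEADSET_CHANNEL = 4            # channel 5: headset microphone


def read_channels(source) -> tuple[int, list[list[int]]]:
    """Read a 16-bit PCM WAV file (path or file object) and de-interleave it.

    Returns (sample_rate, channels), where channels[c] holds the samples of channel c.
    """
    with wave.open(source, "rb") as wf:
        n_ch = wf.getnchannels()
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    # The wave module returns frames in the host's native byte order, hence "=".
    # Samples are interleaved: frame 0 ch 0..n_ch-1, then frame 1 ch 0..n_ch-1, ...
    samples = struct.unpack("=%dh" % (len(frames) // 2), frames)
    channels = [list(samples[c::n_ch]) for c in range(n_ch)]
    return rate, channels


def split_array_headset(channels: list[list[int]]) -> tuple[list[list[int]], list[int]]:
    """Separate the four array channels from the headset channel."""
    array = [channels[c] for c in ARRAY_CHANNELS]
    headset = channels[HEADSET_CHANNEL]
    return array, headset
```

A typical call would be `rate, channels = read_channels("example.wav")` followed by `array, headset = split_array_headset(channels)`, where the file name is a placeholder. Returning plain lists keeps the sketch dependency-free; with NumPy available, `numpy.frombuffer(frames, dtype="<i2").reshape(-1, 5)` would give the same layout as a two-dimensional array.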
References
[Ang93a] B. Angelini, F. Brugnara, D. Falavigna, D. Giuliani, R. Gretter, M. Omologo, "Automatic Segmentation and Labeling of English and Italian Speech Databases". In Proceedings of EUROSPEECH93, Vol. I, pp. 653-656, Berlin, Germany, 1993.
[Ang93b] B. Angelini, F. Brugnara, D. Falavigna, D. Giuliani, R. Gretter, M. Omologo, "A Baseline of a Speaker Independent Continuous Speech Recognizer of Italian". In Proceedings of EUROSPEECH93, Vol. II, pp. 847-850, Berlin, Germany, 1993.
[Ang94] B. Angelini, F. Brugnara, D. Falavigna, D. Giuliani, R. Gretter, M. Omologo, "Speaker Independent Continuous Speech Recognition Using an Acoustic-Phonetic Italian Corpus". In Proceedings of ICSLP94, Vol. III, pp. 1391-1394, Yokohama, Japan, 1994.