Speech Interfaced Systems

In recent years, human-machine interfaces have attracted growing attention. Besides the mouse, keyboard and joypad, computers, mobile devices and game consoles can now be controlled in new ways. Multitouch displays are in common use and have changed how people interact with graphical user interfaces. Miniaturized accelerometers and gyroscopes have revolutionized the way people play games. Finally, speech recognition technology allows hands-free control of devices, letting people make a call or give a new destination to a personal navigation assistant while driving a car.

A3Lab research activity is focused on speech-interfaced systems. Generally speaking, a speech-interfaced system is a system that a person can interact with by voice. Two main technologies enable voice interaction: speech recognition and speaker recognition.

Speech recognition is the process of transforming a speech signal into a sequence of words. Speech recognition technology can be used in simple scenarios, such as voice control, as well as in more complex scenarios, such as meetings. Depending on the environment, noise can seriously degrade recognition performance: as described in the Speech Enhancement section, A3Lab is involved in developing new algorithms for robust speech recognition.

Meeting scenarios are particularly challenging: people interact with each other, and recognizing multiple simultaneous voices is a difficult task, which explains why many researchers have addressed this topic in recent years. In meeting scenarios, the system could give real-time feedback about the ongoing conversation, for example by displaying images or documents related to the topics under discussion, and/or feedback at the end of the meeting, for example a summary of the conversation annotated with who said what.

Speaker recognition is the process of automatically recognizing a person's identity from their voice. Speaker recognition can be divided into speaker identification and speaker verification. Speaker identification is the process of determining which of the registered persons is speaking; speaker verification is the process of accepting or rejecting a person's identity claim. A3Lab research in the speaker recognition field is about developing algorithms for improving recognition performance in noisy environments.
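
The difference between identification and verification can be illustrated with a minimal Python sketch. It is not A3Lab's method: it assumes each speaker is already represented by a fixed-length feature vector (the enrolled vectors, the names "alice" and "bob", the helper functions and the 0.8 threshold are all hypothetical), and it uses plain cosine similarity as the score.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_speaker(embedding, enrolled):
    """Identification: pick the registered speaker with the highest score."""
    return max(enrolled, key=lambda name: cosine_similarity(embedding, enrolled[name]))

def verify_speaker(embedding, claimed_name, enrolled, threshold=0.8):
    """Verification: accept the identity claim only if the score reaches
    the threshold (0.8 is an arbitrary illustrative value)."""
    return cosine_similarity(embedding, enrolled[claimed_name]) >= threshold

# Hypothetical enrolled speakers and a test utterance, as toy 3-dimensional vectors.
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.3]}
test_embedding = [0.85, 0.15, 0.25]

print(identify_speaker(test_embedding, enrolled))         # best match among registered speakers
print(verify_speaker(test_embedding, "alice", enrolled))  # claim "I am alice"
print(verify_speaker(test_embedding, "bob", enrolled))    # claim "I am bob"
```

Identification always returns one of the registered speakers, whereas verification returns a yes/no decision on a single claim, so only verification depends on a decision threshold.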

A3Lab research activity is currently focused on the following main topics:

  • development of new noise reduction algorithms to enable speech and speaker recognition in noisy environments;
  • development of supporting algorithms for meeting scenarios (e.g. speaker diarization algorithms);
  • exploration of new application scenarios for speech and speaker recognition technology.

Related publications

Daniele Liciotti, Giacomo Ferroni, Emanuele Frontoni, Stefano Squartini, Emanuele Principi, Roberto Bonfigli, Primo Zingaretti, Francesco Piazza, "Advanced integration of multimedia assistive technologies: A prospective outlook", in 10th International Conference on Mechatronic and Embedded Systems and Applications (MESA), 2014, pages 1-6

Emanuele Principi, Roberto Bonfigli, Stefano Squartini, Francesco Piazza, "Improving the Performance of an In-Home Acoustic Monitoring System by Integrating a Vocal Effort Classification Algorithm", 136th Audio Engineering Society Convention, 2014