Hands-free telephony in a motor vehicle, or in mobile environments more generally, requires intelligent reduction of disturbing background noise and loudspeaker echoes. Particularly high quality in automotive applications is achieved with novel vehicle-specific solutions. The same technology is needed for so-called in-car communication systems, which improve conversation between the driver and the passengers in the rear seats.
Voice over IP (VoIP) and mobile telephony have recently begun to support wideband speech transmission with markedly improved quality (signal components from 50 Hz to 7 kHz). Communication with parties who only have old narrowband telephones (signal components from 300 Hz to 3400 Hz) should nevertheless not lead to a marked drop in speech quality on the new terminals. This can be achieved by applying so-called artificial bandwidth extension at the receiving end.
In the motor vehicle, making voice control systems robust against noise is a challenge. The work considers technologies for noise and echo reduction in combination with speech recognizers and speaker identification methods. A further topic is the acoustic state estimation of the vehicle occupants and the vehicle interior. The parameters found serve to optimize voice control and/or act as additional input variables for driver assistance components. Further work explores ways to realize new operating concepts with existing microphone and camera sensors.
Lecture and computer exercise on speech communication
Balazs Fodor, David Scheler, Suhadi Suhadi, Tim Fingscheidt
AES 36th International Conference, Dearborn, Michigan, USA, June 2-4, 2009
Speech dialog system users often issue their commands before or during push-to-speak (PTS) button use. This degrades system performance as early as the first turn. We propose a system called talk-and-push (TAP) that allows the user to start talking before or after pushing the PTS button, as is common when tapping on someone's shoulder. An acoustic echo cancellation optimized for in-car use reduces FM radio echoes, so that no muting of the FM radio signal is necessary. A notch filter to remove the beep, buffering of the speech signal, and an intelligent noise-robust voice activity detection that signals the start of utterance to the automatic speech recognizer are further core components of our proposed system. Significant word error rate improvements vs. state of the art with muted FM radio signals are reported.
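The notch filter mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the beep frequency (1 kHz), the sampling rate (16 kHz), the quality factor, and all function names are assumptions chosen for illustration, and the coefficients follow the standard second-order (biquad) notch design.

```python
import math

def notch_coefficients(f0_hz, fs_hz, q=30.0):
    """Second-order notch coefficients, normalized so that a[0] == 1.

    f0_hz is the (assumed) beep frequency, fs_hz the sampling rate.
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad_filter(x, b, a):
    """Filter the sample list x with a biquad in direct form I."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def tone(freq_hz, fs_hz, n):
    """Unit-amplitude sine of n samples, used here as a test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * k / fs_hz) for k in range(n)]

def rms(x):
    """Root mean square of a sample list."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

With these assumed parameters, a tone at the notch frequency is suppressed once the filter transient has decayed, while a tone well away from the notch passes almost unchanged.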
Suhadi, S.; Fingscheidt, T.:
in Proc. of ITG-Fachtagung "Sprachkommunikation", Aachen, Germany, Oct. 2008, VDE-Verlag.
In our previous publication, we proposed a data-driven speech enhancement with so-called ideal gain averaging (IGA) weighting rules to estimate the clean speech spectra. Being implemented as a table look-up, the subband individual weighting rules were trained separately for speech presence and speech absence by taking the average of all ideal gains computed from clean speech and noise training signals recorded in the environment of interest. In this contribution we present a new training methodology selecting appropriate ideal gains to compute the final IGA weighting rules for speech presence and speech absence. This selection of ideal gains effectively reduces the bias of the weighting rules under medium and low SNR conditions, which occurs due to the imperfect voice activity detection (VAD) computation. Compared to our previous publication, the proposed training methodology yields an improvement in terms of speech preservation and noise attenuation.
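The idea of averaging ideal gains into a look-up table, separately for speech presence and speech absence, can be sketched as follows. This is only an illustration under stated assumptions: the ideal gain is taken as the ratio of clean to noisy spectral amplitude capped at 1, the table is indexed by a single quantized SNR value, and the grid limits and all names are hypothetical. The gain selection step proposed in the abstract is not reproduced.

```python
from collections import defaultdict

def snr_bin(snr_db, lo_db=-10.0, hi_db=30.0, step_db=5.0):
    """Quantize an SNR in dB to a table index, clamped to the grid (assumed grid)."""
    clamped = min(max(snr_db, lo_db), hi_db)
    return int((clamped - lo_db) // step_db)

def train_gain_table(frames):
    """Average ideal gains per (speech_present, SNR bin) cell.

    frames: iterable of (speech_present, snr_db, clean_amp, noisy_amp)
    for one subband. Returns a dict mapping each cell to its mean gain.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for speech_present, snr_db, clean_amp, noisy_amp in frames:
        if noisy_amp <= 0.0:
            continue  # skip frames without a usable noisy amplitude
        # Assumption: ideal gain = clean / noisy amplitude, capped at 1.
        ideal_gain = min(clean_amp / noisy_amp, 1.0)
        key = (bool(speech_present), snr_bin(snr_db))
        sums[key] += ideal_gain
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

A usage example with three made-up training frames: `train_gain_table([(True, 12.0, 0.8, 1.0), (True, 13.0, 0.6, 1.0), (False, -3.0, 0.0, 0.5)])` puts the two speech-presence frames into the same cell and averages their gains.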
Steinert, K.; Suhadi, S.; Fingscheidt, T.; Schoenle, M.:
in Proc. of IWAENC'08, Seattle, Washington, USA, Sept. 2008.
An important parameter in quality assessment of speech enhancement systems is speech distortion, measured in terms of quality of the speech component. In fact, in the context of noise reduction, the user tends to prefer a certain degree of residual noise over distorted speech with suppressed background noise. The challenge of instrumental speech component quality evaluation lies, among others, in the mere availability of the enhanced output signal mixture rather than its speech portion. In this paper we present a method to extract the speech component from the enhanced output signal with high accuracy, given the input signal components speech, noise, and echo. We apply this method to a black box speech component quality comparison of two speech enhancement systems and report on instrumental and subjective tests with focus on double-talk.
Fingscheidt, T.; Suhadi, S.; Stan, S.:
in IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 825-834, May 2008.
In this paper, we present a training-based approach to speech enhancement that exploits the spectral statistical characteristics of clean speech and noise in a specific environment. In contrast to many state-of-the-art approaches, we do not model the probability density function (pdf) of the clean speech and the noise spectra. Instead, subband-individual weighting rules for noisy speech spectral amplitudes are separately trained for speech presence and speech absence from noise recordings in the environment of interest. Weighting rules for a variety of cost functions are given; they are parameterized and stored as a table look-up. The speech enhancement system simply works by computing the weighting rules from the table look-up indexed by the a posteriori signal-to-noise ratio (SNR) and the a priori SNR for each subband computed on a Bark scale. Optimized for an automotive environment, our approach outperforms known environment-independent speech enhancement techniques, namely the a priori SNR-driven Wiener filter and the minimum mean square error (MMSE) log-spectral amplitude estimator, both in terms of speech distortion and noise attenuation.
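As a rough illustration of a gain table indexed by the a posteriori and a priori SNR, the sketch below builds a small two-dimensional table and applies it to a noisy subband amplitude. The trained weighting rules of the paper are not reproduced: the table is filled with the Wiener rule G = ξ/(1 + ξ) purely as a placeholder, and the grid (5 dB cells from -10 dB) and all names are assumptions.

```python
import math

# Hypothetical grid: 9 cells of 5 dB each, starting at -10 dB.
LO_DB, STEP_DB, SIZE = -10.0, 5.0, 9

def wiener_gain(xi):
    """Wiener weighting rule G = xi / (1 + xi) for a linear a priori SNR xi >= 0."""
    return xi / (1.0 + xi)

def snr_index(snr_db):
    """Map an SNR in dB to a table index, clamped to the grid."""
    idx = int(math.floor((snr_db - LO_DB) / STEP_DB))
    return min(max(idx, 0), SIZE - 1)

def build_placeholder_table():
    """Rows: a posteriori SNR cell, columns: a priori SNR cell.

    Placeholder content only: every row holds the Wiener gain evaluated
    at the centre of each a priori SNR cell.
    """
    row = []
    for j in range(SIZE):
        centre_db = LO_DB + (j + 0.5) * STEP_DB
        row.append(wiener_gain(10.0 ** (centre_db / 10.0)))
    return [list(row) for _ in range(SIZE)]

def enhance_amplitude(noisy_amp, a_posteriori_db, a_priori_db, table):
    """Weight a noisy subband amplitude with the gain read from the table."""
    gain = table[snr_index(a_posteriori_db)][snr_index(a_priori_db)]
    return gain * noisy_amp
```

In a trained system, each subband would have its own table and the cells would hold the learned weighting rules instead of the Wiener placeholder; at run time only the two index computations and one table read are needed per subband.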
Fingscheidt, T.; Suhadi, S.; Steinert, K.:
in Proc. of ICASSP'08, Las Vegas, Nevada, USA, Apr. 2008.
Quality assessment of speech enhancement systems has to deal with aspects such as distortion of the near-end talker's speech, and with the attenuation and distortion of the noise and the echo in different test cases. We propose first steps in the direction of a new black box objective quality assessment of speech enhancement schemes, based on our previous work on decomposition of the (enhanced) speech signal into its components speech, (residual) noise, and (residual) echo. Having these signals available, to our knowledge, for the first time a black box objective quality assessment of an entire speech enhancement system is proposed allowing for simultaneous measurement of, e.g., noise attenuation, echo return loss enhancement (ERLE), and perceptual evaluation of speech quality (PESQ) of the speech component in a wide range of test scenarios including double-talk. The derived scheme proves to be very useful for testing hands-free devices in practice but also for objective evaluation of sophisticated algorithms in science.
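Once the input and output components are available separately, measures such as noise attenuation and ERLE reduce to power ratios in dB. The sketch below assumes that the component signals are given as plain sample lists of non-zero length; the function names are hypothetical and the component separation itself is not shown.

```python
import math

def mean_power(x):
    """Mean power of a sample list."""
    return sum(v * v for v in x) / len(x)

def attenuation_db(component_in, component_out):
    """Attenuation of one signal component in dB.

    Applied to the echo component this corresponds to ERLE,
    applied to the noise component to noise attenuation.
    """
    p_in = mean_power(component_in)
    p_out = mean_power(component_out)
    if p_out == 0.0:
        return math.inf  # component fully removed
    return 10.0 * math.log10(p_in / p_out)
```

For example, an echo component scaled by a factor of 0.1 in amplitude corresponds to an attenuation of about 20 dB.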
Date of birth: 13.11.1977 in Bandung, Indonesia
Education:
10/2000 - 03/2003 Master's studies at TU Hamburg-Harburg
(Degree program: Information and Communication Systems)
08/1995 - 10/1999 Bachelor's studies at Bandung Institute of Technology, Indonesia
(Degree program: Electrical Engineering)
Work experience and internships:
10/2006 - present Doctoral candidate at IfN, TU Braunschweig
09/2003 - 09/2006 Doctoral candidate at BenQ Mobile (formerly Siemens Mobile), Munich
04/2003 - 08/2003 Working student at Siemens Mobile, Munich
08/2001 - 10/2001 Internship at DaimlerChrysler AG, Möhringen-Stuttgart
05/1998 - 07/1998 Internship at PT. Telkom Indonesia, Bandung, Indonesia