Saved in:
Title: | Robust methods for content analysis of auditory scenes |
---|---|
By: |
Jürgen Thomas Geiger
|
Person: |
Geiger, Jürgen Thomas
Author aut |
Main author: | |
Format: | Thesis Book |
Language: | English |
Published: |
München
Verl. Dr. Hut
2015
|
Edition: | 1st ed. |
Series: | Informationstechnik
|
Subjects: | |
Online access: | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027827424&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
Description: | IX, 172 pp., diagrams |
ISBN: | 9783843919869 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV042391552 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t| | ||
008 | 150304s2015 xx d||| m||| 00||| eng d | ||
020 | |a 9783843919869 |9 978-3-8439-1986-9 | ||
035 | |a (OCoLC)904452916 | ||
035 | |a (DE-599)BVBBV042391552 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-91 |a DE-12 | ||
084 | |a DAT 815d |2 stub | ||
100 | 1 | |a Geiger, Jürgen Thomas |e Verfasser |4 aut | |
245 | 1 | 0 | |a Robust methods for content analysis of auditory scenes |c Jürgen Thomas Geiger |
250 | |a 1. Aufl. | ||
264 | 1 | |a München |b Verl. Dr. Hut |c 2015 | |
300 | |a IX, 172 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Informationstechnik | |
502 | |a Zugl.: München, Techn. Univ., Diss., 2014 | ||
650 | 0 | 7 | |a Automatische Identifikation |0 (DE-588)4206098-9 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Robustheit |0 (DE-588)4126481-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Geräuschanalyse |0 (DE-588)4324422-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Inhaltsanalyse |0 (DE-588)4265353-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Gesprochene Sprache |0 (DE-588)4020717-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Sprechererkennung |0 (DE-588)4143704-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Störgeräusch |0 (DE-588)4343358-3 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Nachhall |0 (DE-588)4171018-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Gehen |0 (DE-588)4140871-8 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content | |
689 | 0 | 0 | |a Geräuschanalyse |0 (DE-588)4324422-1 |D s |
689 | 0 | 1 | |a Gehen |0 (DE-588)4140871-8 |D s |
689 | 0 | 2 | |a Automatische Sprechererkennung |0 (DE-588)4143704-4 |D s |
689 | 0 | 3 | |a Automatische Identifikation |0 (DE-588)4206098-9 |D s |
689 | 0 | 4 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 0 | 5 | |a Gesprochene Sprache |0 (DE-588)4020717-1 |D s |
689 | 0 | 6 | |a Automatische Inhaltsanalyse |0 (DE-588)4265353-8 |D s |
689 | 0 | 7 | |a Störgeräusch |0 (DE-588)4343358-3 |D s |
689 | 0 | 8 | |a Nachhall |0 (DE-588)4171018-6 |D s |
689 | 0 | 9 | |a Robustheit |0 (DE-588)4126481-2 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m DNB Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027827424&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-027827424 |
Record in the search index
DE-BY-TUM_call_number | 0001 DM 32865 |
---|---|
DE-BY-TUM_katkey | 2058466 |
DE-BY-TUM_location | Mag |
DE-BY-TUM_media_number | 040009030819 |
_version_ | 1821931249302765568 |
adam_text | CONTENTS
1 INTRODUCTION 1
1.1 OBJECTIVES 2
1.2 STRUCTURE OF THIS THESIS 5
2 RECOGNITION OF ACOUSTIC SCENES AND EVENTS 7
2.1 ACOUSTIC SCENE CLASSIFICATION 7
2.1.1 INTRODUCTION 8
2.1.2 SYSTEM OVERVIEW 9
2.1.3 FEATURE EXTRACTION 9
2.1.4 WINDOW-BASED CLASSIFICATION 11
2.1.5 LATENT PERCEPTUAL INDEXING 12
2.1.6 EXPERIMENTAL EVALUATION 14
2.1.7 CONCLUSIONS 19
2.2 SUPERVISED LEARNING OF NEW SOUND EVENTS 21
2.2.1 ACOUSTIC EVENT CLASSIFICATION 22
2.2.2 EXPERIMENTAL EVALUATION 26
2.2.3 CONCLUSIONS 29
2.3 CHAPTER SUMMARY 30
3 ACOUSTIC GAIT-BASED PERSON IDENTIFICATION 31
3.1 INTRODUCTION 31
3.1.1 CONTRIBUTIONS 32
3.1.2 RELATED WORK 33
3.2 THE TUM GAID DATABASE 34
3.3 ACOUSTIC GAIT-BASED PERSON IDENTIFICATION USING SVM 36
3.3.1 CANDIDATE FEATURES 37
3.3.2 CLASSIFICATION 38
3.3.3 BASELINE RESULTS 38
HTTP://D-NB.INFO/1067206310
3.3.4 FEATURE ANALYSIS 39
3.3.5 MULTIMODAL FUSION 42
3.3.6 CONCLUSIONS 44
3.4 ACOUSTIC GAIT-BASED PERSON IDENTIFICATION USING HMMS 44
3.4.1 SYSTEM DESCRIPTION 45
3.4.2 EXPERIMENTAL EVALUATION 47
3.4.3 CONCLUSIONS 49
3.5 CHAPTER SUMMARY 50
4 SPEAKER DIARIZATION 51
4.1 INTRODUCTION 51
4.2 FUNDAMENTALS AND METHODS 53
4.2.1 SPEAKER DIARIZATION METHODS 53
4.2.2 THE DIARIZATION ERROR RATE 55
4.2.3 DATABASES 56
4.2.4 OPEN ISSUES 56
4.3 DETECTION OF OVERLAPPING SPEECH 57
4.3.1 OVERLAPPING SPEECH IN HUMAN CONVERSATIONS 59
4.3.2 RELATED WORK ON OVERLAP DETECTION AND HANDLING 61
4.3.3 EXPERIMENTAL FRAMEWORK 63
4.3.4 OVERLAP DETECTION USING A SOURCE SEPARATION METHOD 66
4.3.5 AUDIO FEATURES FOR OVERLAP DETECTION 73
4.3.6 OVERLAP DETECTION USING LEXICAL INFORMATION 78
4.3.7 OVERLAP DETECTION WITH MEMORY-ENHANCED RECURRENT NEURAL NETWORKS 85
4.3.8 SUMMARY OF OVERLAP DETECTION RESULTS 90
4.4 OVERLAP HANDLING 92
4.4.1 METHODOLOGY 93
4.4.2 RESULTS AND CONCLUSIONS 93
4.5 ONLINE SPEAKER DIARIZATION 94
4.5.1 METHODOLOGY 96
4.5.2 EXPERIMENTAL EVALUATION 98
4.6 CHAPTER SUMMARY 101
5 ROBUST SPEECH RECOGNITION 103
5.1 INTRODUCTION 103
5.1.1 CONTRIBUTIONS 104
5.1.2 RELATED WORK 105
5.2 LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS 106
5.3 RECOGNITION IN HIGHLY NON-STATIONARY NOISE 109
5.3.1 SYSTEM DESCRIPTION 109
5.3.2 THE CHIME CHALLENGE 113
5.3.3 EXPERIMENTAL EVALUATION 114
5.3.4 CONCLUSIONS 122
5.4 RECOGNITION IN REVERBERANT ENVIRONMENTS 123
5.4.1 SYSTEM DESCRIPTION 123
5.4.2 THE REVERB CHALLENGE 126
5.4.3 EXPERIMENTAL EVALUATION 127
5.4.4 CONCLUSIONS 129
5.5 CHAPTER SUMMARY 130
6 SUMMARY 131
ACRONYMS 135
MATHEMATICAL SYMBOLS 137
LIST OF FIGURES 143
LIST OF TABLES 145
REFERENCES 147
|
any_adam_object | 1 |
author | Geiger, Jürgen Thomas |
author_facet | Geiger, Jürgen Thomas |
author_role | aut |
author_sort | Geiger, Jürgen Thomas |
author_variant | j t g jt jtg |
building | Verbundindex |
bvnumber | BV042391552 |
classification_tum | DAT 815d |
ctrlnum | (OCoLC)904452916 (DE-599)BVBBV042391552 |
discipline | Informatik |
edition | 1. Aufl. |
format | Thesis Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02504nam a2200577 c 4500</leader><controlfield tag="001">BV042391552</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t|</controlfield><controlfield tag="008">150304s2015 xx d||| m||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783843919869</subfield><subfield code="9">978-3-8439-1986-9</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)904452916</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV042391552</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91</subfield><subfield code="a">DE-12</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 815d</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Geiger, Jürgen Thomas</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Robust methods for content analysis of auditory scenes</subfield><subfield code="c">Jürgen Thomas Geiger</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. Aufl.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">München</subfield><subfield code="b">Verl. Dr. Hut</subfield><subfield code="c">2015</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">IX, 172 S.</subfield><subfield code="b">graph. 
Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Informationstechnik</subfield></datafield><datafield tag="502" ind1=" " ind2=" "><subfield code="a">Zugl.: München, Techn. Univ., Diss., 2014</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Identifikation</subfield><subfield code="0">(DE-588)4206098-9</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Robustheit</subfield><subfield code="0">(DE-588)4126481-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Geräuschanalyse</subfield><subfield code="0">(DE-588)4324422-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Inhaltsanalyse</subfield><subfield code="0">(DE-588)4265353-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Gesprochene Sprache</subfield><subfield code="0">(DE-588)4020717-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield 
code="a">Störgeräusch</subfield><subfield code="0">(DE-588)4343358-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Nachhall</subfield><subfield code="0">(DE-588)4171018-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Gehen</subfield><subfield code="0">(DE-588)4140871-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4113937-9</subfield><subfield code="a">Hochschulschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Geräuschanalyse</subfield><subfield code="0">(DE-588)4324422-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Gehen</subfield><subfield code="0">(DE-588)4140871-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Automatische Identifikation</subfield><subfield code="0">(DE-588)4206098-9</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="4"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="5"><subfield code="a">Gesprochene Sprache</subfield><subfield 
code="0">(DE-588)4020717-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="6"><subfield code="a">Automatische Inhaltsanalyse</subfield><subfield code="0">(DE-588)4265353-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="7"><subfield code="a">Störgeräusch</subfield><subfield code="0">(DE-588)4343358-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="8"><subfield code="a">Nachhall</subfield><subfield code="0">(DE-588)4171018-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="9"><subfield code="a">Robustheit</subfield><subfield code="0">(DE-588)4126481-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">DNB Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027827424&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-027827424</subfield></datafield></record></collection> |
genre | (DE-588)4113937-9 Hochschulschrift gnd-content |
genre_facet | Hochschulschrift |
id | DE-604.BV042391552 |
illustrated | Illustrated |
indexdate | 2024-12-20T17:09:55Z |
institution | BVB |
isbn | 9783843919869 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-027827424 |
oclc_num | 904452916 |
open_access_boolean | |
owner | DE-91 DE-BY-TUM DE-12 |
owner_facet | DE-91 DE-BY-TUM DE-12 |
physical | IX, 172 S. graph. Darst. |
publishDate | 2015 |
publishDateSearch | 2015 |
publishDateSort | 2015 |
publisher | Verl. Dr. Hut |
record_format | marc |
series2 | Informationstechnik |
spellingShingle | Geiger, Jürgen Thomas Robust methods for content analysis of auditory scenes Automatische Identifikation (DE-588)4206098-9 gnd Robustheit (DE-588)4126481-2 gnd Geräuschanalyse (DE-588)4324422-1 gnd Automatische Inhaltsanalyse (DE-588)4265353-8 gnd Gesprochene Sprache (DE-588)4020717-1 gnd Automatische Sprechererkennung (DE-588)4143704-4 gnd Störgeräusch (DE-588)4343358-3 gnd Nachhall (DE-588)4171018-6 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd Gehen (DE-588)4140871-8 gnd |
subject_GND | (DE-588)4206098-9 (DE-588)4126481-2 (DE-588)4324422-1 (DE-588)4265353-8 (DE-588)4020717-1 (DE-588)4143704-4 (DE-588)4343358-3 (DE-588)4171018-6 (DE-588)4003961-4 (DE-588)4140871-8 (DE-588)4113937-9 |
title | Robust methods for content analysis of auditory scenes |
title_auth | Robust methods for content analysis of auditory scenes |
title_exact_search | Robust methods for content analysis of auditory scenes |
title_full | Robust methods for content analysis of auditory scenes Jürgen Thomas Geiger |
title_fullStr | Robust methods for content analysis of auditory scenes Jürgen Thomas Geiger |
title_full_unstemmed | Robust methods for content analysis of auditory scenes Jürgen Thomas Geiger |
title_short | Robust methods for content analysis of auditory scenes |
title_sort | robust methods for content analysis of auditory scenes |
topic | Automatische Identifikation (DE-588)4206098-9 gnd Robustheit (DE-588)4126481-2 gnd Geräuschanalyse (DE-588)4324422-1 gnd Automatische Inhaltsanalyse (DE-588)4265353-8 gnd Gesprochene Sprache (DE-588)4020717-1 gnd Automatische Sprechererkennung (DE-588)4143704-4 gnd Störgeräusch (DE-588)4343358-3 gnd Nachhall (DE-588)4171018-6 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd Gehen (DE-588)4140871-8 gnd |
topic_facet | Automatische Identifikation Robustheit Geräuschanalyse Automatische Inhaltsanalyse Gesprochene Sprache Automatische Sprechererkennung Störgeräusch Nachhall Automatische Spracherkennung Gehen Hochschulschrift |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027827424&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT geigerjurgenthomas robustmethodsforcontentanalysisofauditoryscenes |