===============================================================
     Announcement: 7th Meeting of the JSAI Special Interest
     Group on AI Challenges (SIG-Challenge)
===============================================================

Theme: "New Developments in CASA (Computational Auditory Scene
       Analysis)"

Date:  Tuesday, November 2, 1999, 9:00-17:30

Venue: Conference Room, Research Institute Building, Aoyama Gakuin
       University (Shibuya-ku, Tokyo), next to the university's main
       gate; a 7-minute walk from Omotesando subway station and a
       15-minute walk from Shibuya Station.

Overview: Research on CASA, which took off after the problems posed
in Prof. Bregman's 1990 book "Auditory Scene Analysis" (MIT Press),
continues to develop, driven by a decade of accumulated knowledge
about hearing and by the shift toward a multimedia society. To gauge
where CASA is heading, the 7th SIG-Challenge meeting features eight
research presentations from a variety of perspectives, together with
a keynote address by Prof. Bregman. We cordially invite everyone to
attend.

Note: Prof. Bregman's keynote is a special lecture co-sponsored by
the Acoustical Society of Japan. Its title and abstract are attached
below.

Inquiries (by e-mail, please):
  Hiroshi G. Okuno
  Science University of Tokyo & JST Kitano Symbiotic Systems Project
  E-mail: okuno@nue.org

Admission:   free
Proceedings: 2,000 yen (only if you want a copy)

==================================================

=======================================================================
Meeting of JSAI Special Interest Group on AI Challenges (SIG-Challenge)
=======================================================================

Theme: ``Computational Auditory Scene Analysis (CASA)''

09:00-10:15  Opening Address and Keynote Address

  09:00-09:15  Opening Address
               Hiroshi G. Okuno (JST/Science Univ. of Tokyo)

  09:15-10:15  Keynote Address: Auditory Scene Analysis by Humans
               and by Computers
               Albert S. Bregman (Department of Psychology,
               McGill University)

                 -- break --

10:30-12:00  Session 1

  Vowel Segregation in Background Noise using the Model of
  Segregating Two Acoustic Sources
    Masashi Unoki (ATR HIP/JAIST) and Masato Akagi (JAIST)

  A Method of Blind Separation for Convolved Speech Signals
    Mitsuru Kawamoto (RIKEN), Kiyotoshi Matsuoka (Kyushu Inst. Tech.),
    and Noboru Ohnishi (Nagoya Univ.)

  Blind Signal Separation Using Directivity Pattern
    Satoshi Kurita*, Hiroshi Saruwatari*, Shoji Kajita**,
    Kazuya Takeda*, and Fumitada Itakura**
      *  Graduate School of Engineering, Nagoya University
      ** Center for Information Media Studies, Nagoya University

                 -- lunch --

13:15-14:45  Session 2

  Search for Auditorily Meaningful Parts using STRAIGHT
    Parham Zolfaghari (CREST/ATR-HIP) and
    Hideki Kawahara (Wakayama Univ./CREST/ATR-HIP)

  An Auditory Strategy for Separating Size and Shape Information
  of Sound Sources
    Toshio Irino (ATR HIP) and Roy D. Patterson (CNBH, Cambridge Univ.)

  Speech Recognition Based on Space Diversity Taking Room Acoustics
  into Account
    Yasuhiro Shimizu*, Shoji Kajita**, Kazuya Takeda*, and
    Fumitada Itakura**
      *  Graduate School of Engineering, Nagoya University
      ** Center for Information Media Studies, Nagoya University

                 -- break --

15:00-16:30  Session 3

  Music Scene Description: A Predominant-F0 Estimation Method for
  Detecting Melody and Bass Lines
    Masataka Goto (Electrotechnical Laboratory)

  A Method of Peak Extraction and Its Evaluation for Humanoid
    Kazuhiro Nakadai (JST ERATO), Hiroshi G. Okuno (JST/Science Univ.
    of Tokyo), and Hiroaki Kitano (JST/Sony CSL)

  Research Issues of Humanoid Audition
    Hiroshi G. Okuno (JST/Science Univ. of Tokyo), Kazuhiro Nakadai
    (JST), and Hiroaki Kitano (JST/Sony CSL)

16:45-17:30  Discussion and Closing Address

  Moderated Discussion: Is CASA-related Research Needed for
  Engineering and Psychology?
    Moderator: Kunio Kashino (NTT Communication Science Laboratories)
    Leading Discussant: Albert S. Bregman (Department of Psychology,
                        McGill University)

  Closing Address
    Hiroshi G. Okuno (JST/Science Univ. of Tokyo)

==================================================

TITLE: Auditory Scene Analysis by Humans and by Computers
ABSTRACT: The paper will present data collected on human auditory
scene analysis (ASA) -- the process of organizing the auditory input
derived from a mixture of sounds into representations of the
individual acoustic sources that contributed to the mixture. Then
recent attempts to achieve ASA by computers -- computational auditory
scene analysis (CASA) -- will be discussed in the light of human
data. The data include the major known cues for the parsing of
auditory sense data and, more importantly, the properties of the ASA
system. The behavior of this system can be described by a number of
statements:

(1) Cues compete with and support one another, probably in
    non-additive ways, in determining the parsing of a signal into
    streams.
(2) Constraints on the parsing may be propagated from one part of
    the frequency-by-time field to other parts.
(3) Grouping tendencies may be subject to consistency requirements.
(4) Biases toward finding a stream of sound with certain properties
    may build up with the accumulation of evidence and then dissipate
    over time.
(5) Sudden changes in the acoustic input play a central role in the
    allocation of computational resources to parts of the signal.
(6) Bottom-up processes of a fairly primitive form interact with
    top-down processes that involve complex schemas, making it
    necessary for a model to provide an interface between the two.

Acoustic demonstrations of these principles will be played and
discussed.

--------------------------------------------------
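For readers who would like to try the kind of acoustic demonstration
mentioned in the abstract, the sketch below synthesizes the classic
ABA "galloping" streaming stimulus, in which the frequency separation
between tones A and B decides whether listeners hear one stream or
two -- a concrete instance of the cue-driven grouping in statement
(1). This is a minimal illustration added to this summary, not
material from the talk; the sampling rate, tone durations, and
frequencies are arbitrary choices.

    # Minimal sketch of the ABA streaming demonstration (assumed
    # parameters; not from the announcement). Requires only numpy
    # and the standard-library wave module.
    import numpy as np
    import wave

    RATE = 16000      # sampling rate in Hz
    TONE_DUR = 0.1    # duration of each tone in seconds
    GAP_DUR = 0.1     # silence where the second B of a gallop would be

    def tone(freq, dur):
        """Sine tone with short raised-cosine ramps to avoid clicks."""
        t = np.arange(int(RATE * dur)) / RATE
        x = np.sin(2 * np.pi * freq * t)
        ramp = int(0.005 * RATE)
        env = np.ones_like(x)
        env[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
        env[-ramp:] = env[:ramp][::-1]
        return x * env

    def aba_sequence(freq_a, freq_b, triplets=10):
        """Repeating A-B-A-(silence) triplets, the standard stimulus."""
        silence = np.zeros(int(RATE * GAP_DUR))
        unit = np.concatenate([tone(freq_a, TONE_DUR),
                               tone(freq_b, TONE_DUR),
                               tone(freq_a, TONE_DUR),
                               silence])
        return np.tile(unit, triplets)

    def write_wav(path, signal):
        """Write a mono 16-bit WAV file."""
        data = (0.8 * signal * 32767).astype(np.int16)
        with wave.open(path, "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(RATE)
            f.writeframes(data.tobytes())

    # Small frequency separation: typically heard as one galloping stream.
    write_wav("aba_fused.wav", aba_sequence(500, 550))
    # Large separation: A and B segregate into two parallel streams.
    write_wav("aba_split.wav", aba_sequence(500, 2000))

Playing the two files back to back makes the perceptual flip easy to
hear: the rhythm itself does not change, only the frequency-proximity
cue that governs how the tones are grouped into streams.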