The 23rd International Conference on Computational Linguistics (COLING 2010)

http://www.coling-2010.org/index.htm

Organized by

CIPS

 

 

All tutorials will be given on August 22nd, 2010 (Sun):

Tutorial   Time          Room No.
T2         9:00-12:30    311A
T6         14:00-17:30   311A

T1: Computational Linguistic Approaches to Language Acquisition
Presenter/Organizer: Shuly Wintner
Department of Computer Science, University of Haifa, Israel

Abstract

  Language acquisition is one of Nature's greatest puzzles. Human languages are extremely complex systems, yet (most) children acquire them naturally, quickly and with little effort. Research in language acquisition attempts to study the mechanisms of this puzzle and to shed light on the very nature of language itself: the primary cognitive capacity which makes us human.
  In recent years, research in psycholinguistics in general and language acquisition in particular has become more aware of state-of-the-art results in computational linguistics. Methodologies and techniques that are regularly used in computational linguistics are employed in psycholinguistics, resulting in insights that shed new light on language acquisition processes. This tutorial will survey some of these recent results, focusing on areas in which computational linguists can contribute to psycholinguistic research. The main goal of the tutorial is to survey the current state of the art, acquaint computational linguists with the kind of problems in psycholinguistics that can benefit from their training and expertise, and identify directions for future cross-disciplinary research.
  Topics will include: a quick survey of language acquisition processes and the main psycholinguistic theories that study them; the use of corpora in language acquisition research, focusing on the CHILDES project, a large multilingual annotated corpus containing transcripts of spoken interactions between children and adults; the emergence of part-of-speech categories; the emergence of grammar; the innateness debate; computational language learning and its relevance for child language acquisition; etc.
  By the end of the tutorial, participants are expected to have a clear view of the problems that are the focus of contemporary research in language acquisition, and a good idea of how computational linguistics can be instrumental in approaching these problems.

Biography

  Shuly Wintner is a senior lecturer at the Department of Computer Science, University of Haifa, Israel. His research spans various areas in computational linguistics, including formal grammars, morphology, syntax, development of resources and machine translation. Recently, he was involved in several projects focusing on language acquisition from a computational perspective. He has published over 60 scientific papers in computational linguistics. He is a regular reviewer for ACL and its chapters, was the program co-chair of EACL-2006 and is the editor-in-chief of the journal Research in Language and Computation. He has extensive teaching experience, including tutorials at NAACL-2004, MT-Summit 2003 and COLING-2000; four ESSLLI courses; three courses at the International PhD School in Formal Languages and Applications; and two at the Erasmus Mundus Master course in Language and Communication Technology.


T2: Paraphrases and Applications
Presenters/Organizers: Shiqi Zhao and Haifeng Wang
Baidu, China

Abstract

  Paraphrases are different expressions that convey the same meaning. Research on paraphrasing is critical to many related NLP research areas, such as machine translation (MT), question answering (QA), information retrieval (IR), information extraction (IE) and natural language generation (NLG).
  This tutorial is intended to provide attendees with an in-depth look at the identification, generation, application, and evaluation of paraphrases. The tutorial first reviews studies on paraphrase identification (or extraction), which aims to acquire paraphrases from various data sources, such as large-scale web corpora, monolingual parallel corpora, monolingual comparable corpora and bilingual parallel corpora, as well as some other resources.
  It then surveys methods for paraphrase generation. The MT-based method will be highlighted, and other kinds of methods, including thesaurus-based, pattern-based, and NLG-based methods, will also be introduced.
  We then discuss the applications of paraphrases in related research areas, especially in MT. We will show how paraphrases can help to alleviate the data sparseness problem, simplify input sentences, tune parameters, and improve automatic evaluation in statistical MT systems.
  The last part of the tutorial is about the evaluation of paraphrases. To date, no approach to paraphrase evaluation has been widely accepted, which leaves it an open issue. This tutorial will summarize existing approaches to paraphrase evaluation, which include human evaluation, automatic evaluation, and application-driven evaluation.
  The target audience is NLP researchers, practitioners, and students. Participants do not need prior knowledge of paraphrasing.

Biography

1. Shiqi Zhao
  Shiqi Zhao is a postdoctoral researcher at Baidu Inc. (www.baidu.com). He received his PhD in computer science from Harbin Institute of Technology in 2010. Shiqi has studied paraphrasing for several years. His research topics include paraphrase acquisition, generation, and applications. He has published more than 10 papers on paraphrasing at several major conferences and in journals, including IJCAI-2007, ACL-08: HLT, ACL-IJCNLP 2009 and the Journal of Natural Language Engineering.
  Shiqi Zhao, Baidu Inc., Baidu Campus, No. 10, Shangdi 10th Street, Beijing, 100085, China. +86-10-59926892, zhaoshiqi@baidu.com, http://ir.hit.edu.cn/~zhaosq/

2. Haifeng Wang
  Haifeng Wang is a senior scientist at Baidu Inc. He is also a visiting professor at Harbin Institute of Technology. He received his PhD in Computer Science from Harbin Institute of Technology in 1999. He was an associate researcher at Microsoft Research China from 1999 to 2000, a research scientist at iSilk.com between 2000 and 2002, and the chief research scientist and deputy director at Toshiba (China) R&D Center until Jan. 2010. He has authored more than 60 scientific papers on natural language processing. He has served as area chair, tutorial chair, workshop chair, session chair and PC member at several major conferences, including area co-chair and session chair at ACL-IJCNLP 2009, tutorial co-chair for ACL 2010 and workshop co-chair for COLING 2010.
  Haifeng Wang, Baidu Inc., Baidu Campus, No. 10, Shangdi 10th Street, Beijing, 100085, China, +86-10-59928072, wanghaifeng@baidu.com, http://ir.hit.edu.cn/~wanghaifeng/


T3: Computational Neurolinguistics
Presenter/Organizer: Brian Murphy
University of Trento, Italy

Abstract

  Computational neurolinguistics is a newly emerging field that uses computational simulations to understand the realisation of language in the brain, and takes advantage of neural data to validate computational models of language. The recent application of machine learning and data mining to neural recordings (see Haynes & Rees 2006) provides many new avenues for interdisciplinary research. Over the last couple of years, published work has decoded conceptual categories from neural activity (Murphy et al 2008), leveraged vector space models to "read people's minds" (Mitchell et al 2008; Murphy et al 2009), and studied semantic composition of compound phrases (Chang et al 2009). Ongoing work in several labs is also investigating the use of such methods for Brain Computer Interaction (BCI) applications, and to tag and parse recordings of the language comprehension process.
  The objective of the tutorial will be to give computational linguists a quick overview of neurolinguistic research, and to introduce them to some of the basic biological and technical principles of cognitive neuroscience (neural biology and functional anatomy; neuroimaging techniques and what they detect). Software and methods for applying machine learning to neural recordings will be presented, with extensive reference to recent publications, including those presented at two workshops earlier this year.
  Structure:
    Fundamentals of Brain Anatomy and Function
    Principles of Neuro-Imaging Techniques
    Overview of Neurolinguistics Literature
    Recent Work in Computational Neurolinguistics
    Methods and Software for Machine Learning with Neural Data

Biography

  
  Brian Murphy is a post-doctoral fellow at the Centre for Mind/Brain Sciences of the University of Trento, where the Language, Interaction and Computation group uses a range of empirical methods (elicited judgements, behavioural data, corpus-based models and brain imaging data) to study conceptual organization in the mind. His main topic of research is the extraction of conceptual representations from neural recordings (EEG, MEG and fMRI), and the comparison of the resulting semantic models with those derived from other data. He is the co-organiser of two recent workshops on this topic: the CAOS Workshop on the Neuroscience of Concepts (May 2010, Rovereto, Italy), and the NAACL Workshop on Computational Neurolinguistics (June 2010, Los Angeles).


T4: Text Mining of Biomedical Literature
Presenters/Organizers: Ashish Tendulkar and Martin Krallinger
Indian Institute of Technology (IIT), Madras, India

Abstract

  There is increasing interest in the development of biomedical text mining applications, not only to enable improved literature search but also to automatically detect pointers between biologically relevant entities described in articles and their corresponding records in existing annotation databases. The rapid growth of natural language data in the biomedical sciences (including scientific articles, patents, patient records and textual database descriptions), together with the practical relevance of these resources for the design, interpretation and evaluation of bioinformatics and experimental research, has resulted in the implementation of a considerable number of new applications. Text-mining-assisted literature curation has been especially promising for the development and maintenance of manually annotated databases, as well as for the construction of gold standard datasets and gene lists in the context of Systems Biology and gene set enrichment. Attempts have also been made to integrate text mining with other bioinformatics data such as sequence, structural and gene expression information.
  We plan to focus primarily on applications of text mining and on issues in building text mining systems. We will begin with a gentle introduction to text mining and its application in various Biology and Bioinformatics related domains. Existing resources for building text mining applications will be presented in terms of (1) useful data collections, (2) lexical resources, (3) features of natural language data that can be exploited by text mining systems and (4) data mining and natural language processing systems. The main types of currently available text mining applications will also be discussed, along with issues in evaluating these systems.
  After the tutorial, participants should be aware of the importance of the biomedical literature as a central data and information source for biology and bioinformatics. They should be able to understand how existing text mining systems work and on what features they rely. Participants will have an overview of currently available tools and of how to construct such an application in practice.

Biography

  Martin Krallinger is currently working at the Structural Biology and Biocomputing group of the Spanish National Cancer Research Center (CNIO). Previous research stays included the National Center of Biotechnology, CNB-CSIC (Spain) and the Centre of Applied Molecular Engineering (CAME, University of Salzburg, Austria). He has a strong research record in biomedical text mining, including numerous highly cited publications in the field, and has been a member of several international scientific conference committees. He was one of the main organizers of the BioCreative community challenge and one of the developers of the PLAN2L text mining system.
  Ashish Tendulkar is currently working at the Indian Institute of Technology (IIT) Madras in the Department of Computer Science and Engineering. Prior to joining IIT Madras, he worked in the Structural Biology and Biocomputing group of Prof. Alfonso Valencia at the Spanish National Cancer Research Center (CNIO). His research interests include mining biological databases and text to uncover principles of complex biological systems, and structural bioinformatics. He is one of the developers of the PLAN2L text mining system.


T5: Multilingual Multidomain Word Sense Disambiguation
Presenters/Organizers: Pushpak Bhattacharyya and Mitesh M. Khapra
Indian Institute of Technology (IIT), Bombay, India

Abstract

  Our tutorial focuses on foundations as well as recent advances in Word Sense Disambiguation (WSD), with special attention to multilingual, multidomain WSD. The tutorial has 3 parts:

  1. We begin with an introduction to WSD and then proceed to discuss the fundamental approaches to WSD including:
    1. Knowledge-based approaches such as Lesk's algorithm, Walker's algorithm, Conceptual Density & PageRank.
    2. Supervised approaches such as Naive Bayes, Decision List, K-NN & SVM.
    3. Unsupervised approaches such as Hyperlex, Yarowsky's algorithm & Lin's algorithm.
    4. Semi-supervised approaches such as Decision List.
  2. We then discuss approaches for multilingual WSD which make use of parallel corpora to project sense tags from one language to another. Some very recent work on "Parameter Projection" for WSD using aligned wordnets instead of parallel corpora will be covered. We will describe various parameters essential for WSD and focus on techniques for projecting these parameters, learnt from the sense-marked corpora of one language, to another language using aligned wordnets for the two languages.
  3. Finally, we discuss the important topic of domain adaptation for WSD, in which we cover knowledge-based, semi-supervised and unsupervised approaches for target-word as well as all-words domain-adapted WSD. The discussion on domain adaptation aims at answering some questions arising out of the debate between supervision and unsupervision, and highlights the importance of settling for a middle ground between the two.

Biography

Name: Pushpak Bhattacharyya
Affiliation: Indian Institute of Technology, Bombay
Phone Number: +91 9819708718
E-mail: pb@cse.iitb.ac.in

  Dr. Pushpak Bhattacharyya is a Professor of Computer Science and Engineering at IIT Bombay. He received his B.Tech in Electrical Engineering from IIT Kharagpur (1984), M.Tech in Computer Science and Engineering from IIT Kanpur (1986) and PhD in Computer Science and Engineering from IIT Bombay (1993). He was a visiting research fellow at MIT, USA in 1990, a visiting Professor at Stanford University in 2004 and a visiting Professor at Joseph Fourier University in Grenoble, France in 2005 and 2007.
  Dr. Bhattacharyya works in the area of Artificial Intelligence with a focus on Natural Language Processing, Machine Learning, Machine Translation and Information Extraction. He has published close to 150 research articles in leading conferences and journals in these areas and has guided 5 PhD students, about 80 Masters students and about the same number of Bachelors students in their research work.
  Dr. Bhattacharyya has received research grant awards from Microsoft, IBM, AOL and the United Nations in different areas of NLP. The Hindi Wordnet developed under Prof. Bhattacharyya's guidance, which has the goal of sense disambiguation in Indian languages, has been given numerous awards.

Name: Mitesh M. Khapra
Affiliation: Indian Institute of Technology, Bombay
Phone Number: +91 98330 87970
E-mail: miteshk@cse.iitb.ac.in

  Mitesh is a 2nd year Ph.D. scholar working with Professor Pushpak Bhattacharyya at the Computer Science and Engineering Department of the Indian Institute of Technology, Bombay. He received his B.Tech in Electronics Engineering from Mumbai University (2002) and M.Tech in Computer Science and Engineering from IIT Bombay (2008). In the period between his B.Tech and M.Tech he worked as a Software Engineer and Senior Software Engineer at Infosys (2 years) and LG Electronics (2 years) respectively.
  His Ph.D. work focuses on circumventing the problem of resource scarcity, which hampers the performance of several Natural Language Processing (NLP) tasks. In particular, he is looking at reusing resources across domains and languages for different NLP applications. His current work on WSD focuses on parameter projection, which revolves around a novel multilingual dictionary framework based on linkage of synsets and cross linkage of words from within synsets. This makes it possible to use statistics learnt from the sense-marked corpora of one language for another language. Apart from this, he is also looking at the problem of domain adaptation for all-words WSD, a problem of great interest to the WSD community.


T6: Kernel Engineering for Fast and Easy Design of Natural Language Applications
Presenter/Organizer: Alessandro Moschitti
University of Trento, Italy

Abstract

  Previous work on the use of Machine Learning for Computational Linguistics has shown that most of the design effort is devoted to feature engineering. Indeed, the latter requires expertise, intuition and deep knowledge about the target problem to convert linguistic objects into attribute-value representations. Kernel Methods (KM) are powerful techniques which can simplify data modeling by defining abstract representations and implicit feature spaces. In particular, KM allow for: (a) directly using a similarity function between instances in learning algorithms, thus avoiding explicit feature design; and (b) implicitly defining huge feature spaces, e.g. structures can be represented in the substructure space.
  In this tutorial, practical recipes to successfully use KM for target language applications will be presented. First, after an introduction to Support Vector Machines (explained from an application viewpoint), KM theory will be explained in such a way that useful practical methods can be derived. Second, basic kernels, such as linear, polynomial, sequence and tree kernels, will be presented, with a focus on the implementation, accuracy and efficiency perspectives. KM application to typical natural language tasks, e.g. text categorization, question and answer classification, semantic role labeling and textual entailment, will be shown. The aim is to provide practical procedures for the selection and exploitation of the right kernel for the target task. Third, the SVM-Light-TK toolkit, which encodes several kernels in SVMs, will be illustrated along with the associated data structures and its practical use in NL tasks. Finally, the tutorial will illustrate how innovative and effective kernels can be engineered starting from basic kernels and using systematic data transformation. Such know-how allows for a very fast and accurate design of applications even if the underlying language phenomena and properties are still not very well understood, e.g. Arabic SRL or relation extraction between pairs of text fragments.
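
  As a rough illustration of points (a) and (b) in the abstract, the sketch below trains a classifier that only ever calls a similarity function between two strings and never builds a feature vector. It is not taken from the tutorial materials: a kernel perceptron stands in for the SVMs the tutorial covers, the character n-gram kernel is just one simple sequence-style kernel, and all function names and the toy data are hypothetical. It does not use or imitate the SVM-Light-TK interface.

```python
from collections import Counter


def ngram_kernel(s, t, n=2):
    """Similarity of two strings: dot product of their character n-gram counts.

    The n-gram count vectors are never materialised over the full n-gram
    vocabulary; only n-grams that occur in both strings contribute.
    """
    a = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    b = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return sum(count * b[gram] for gram, count in a.items() if gram in b)


def decision_score(x, xs, ys, alphas, kernel):
    """Weighted sum of kernel values between x and the training instances."""
    return sum(a * y * kernel(xi, x) for a, xi, y in zip(alphas, xs, ys))


def train_kernel_perceptron(xs, ys, kernel, epochs=10):
    """Kernel perceptron (used here instead of an SVM to keep the sketch short).

    alphas[i] counts how often training instance i was misclassified.
    """
    alphas = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision_score(x, xs, ys, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training instance is classified correctly
            break
    return alphas


def predict(x, xs, ys, alphas, kernel):
    """Return +1 or -1 for a new instance x."""
    return 1 if decision_score(x, xs, ys, alphas, kernel) > 0 else -1


# Hypothetical toy data: +1 for "who ..." strings, -1 for "the ..." strings.
train_texts = ["who wrote it", "who said it", "the cat sat", "the dog ran"]
train_labels = [1, 1, -1, -1]

alphas = train_kernel_perceptron(train_texts, train_labels, ngram_kernel)
print(predict("who did it", train_texts, train_labels, alphas, ngram_kernel))
print(predict("the cat ran", train_texts, train_labels, alphas, ngram_kernel))
```

  Swapping `ngram_kernel` for another valid kernel (for example one defined over parse trees) changes the implicit feature space without touching the learning code, which is the design point the abstract makes about kernel engineering.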

Biography

  Alessandro Moschitti is a professor at the Information Engineering and Computer Science Department of the University of Trento. In 2003, he obtained his PhD in Computer Science at the University of Rome "Tor Vergata", and between 2002 and 2004 he worked as an associate researcher at the University of Texas at Dallas. His expertise concerns machine learning approaches to Natural Language Processing, Information Retrieval and Data Mining. In particular, he has recently devised innovative kernels within Support Vector and other kernel-based machines for advanced syntactic/semantic processing. He is the author of more than 100 scientific articles published in the major conferences of different research communities, e.g. ACL, ICML, CIKM and ICDM. He has participated in several projects of the European Community (EC), e.g. LIVING-KNOWLEDGE 2008, PRESTOSPACE 2004, NAMIC 2000 and TREVI 1998, and in two US projects: MTBF 2008 (Con-Edison) and ARDA AQUAINT PROGRAM (IQAS 2002). He is currently project coordinator of the EC project EternalS.


Tutorial fee *

T1 (22 Aug., half day, morning)
Computational Linguistic Approaches to Language Acquisition
Presenter/Organizer: Shuly Wintner, Department of Computer Science, University of Haifa, Israel
  Participants: USD85/RMB580 before and on July 31, 2010; USD125/RMB850 after July 31, 2010 and onsite
  Student: USD65/RMB440 before and on July 31, 2010; USD105/RMB710 after July 31, 2010 and onsite

T2 (22 Aug., half day, morning)
Paraphrases and Applications
Presenters/Organizers: Shiqi Zhao and Haifeng Wang, Baidu, China
  Participants: USD85/RMB580 before and on July 31, 2010; USD125/RMB850 after July 31, 2010 and onsite
  Student: USD65/RMB440 before and on July 31, 2010; USD105/RMB710 after July 31, 2010 and onsite

T3 (22 Aug., half day, morning)
Computational Neurolinguistics
Presenter/Organizer: Brian Murphy, University of Trento, Italy
  Participants: USD85/RMB580 before and on July 31, 2010; USD125/RMB850 after July 31, 2010 and onsite
  Student: USD65/RMB440 before and on July 31, 2010; USD105/RMB710 after July 31, 2010 and onsite

T4 (22 Aug., half day, afternoon)
Text Mining of Biomedical Literature
Presenters/Organizers: Ashish Tendulkar and Martin Krallinger, Indian Institute of Technology (IIT), Madras, India
  Participants: USD85/RMB580 before and on July 31, 2010; USD125/RMB850 after July 31, 2010 and onsite
  Student: USD65/RMB440 before and on July 31, 2010; USD105/RMB710 after July 31, 2010 and onsite

T5 (22 Aug., half day, afternoon)
Multilingual Multidomain Word Sense Disambiguation
Presenters/Organizers: Pushpak Bhattacharyya and Mitesh M. Khapra, Indian Institute of Technology (IIT), Bombay, India
  Participants: USD85/RMB580 before and on July 31, 2010; USD125/RMB850 after July 31, 2010 and onsite
  Student: USD65/RMB440 before and on July 31, 2010; USD105/RMB710 after July 31, 2010 and onsite

T6 (22 Aug., half day, afternoon)
Kernel Engineering for Fast and Easy Design of Natural Language Applications
Presenter/Organizer: Alessandro Moschitti, University of Trento, Italy
  Participants: USD85/RMB580 before and on July 31, 2010; USD125/RMB850 after July 31, 2010 and onsite
  Student: USD65/RMB440 before and on July 31, 2010; USD105/RMB710 after July 31, 2010 and onsite

* Notes:
1. In China, only RMB is used, so the payment process will show the equivalent RMB amount. The exchange rate applied to the registration fee is RMB 6.8 per USD.

2. If a tutorial has fewer than 20 attendees, it will be cancelled.

3. Full-time students MUST provide proof of full-time status (a copy of a valid student ID card or a letter from their institution or program director) by fax or e-mail when submitting their registration and payment.

 

 