36x48 vertical poster template - PDF

THUNLP Leader: Prof. Maosong Sun
Professor Sun is the Chairman of the Department of Computer Science and Technology, Tsinghua University. His research
interests are computational linguistics and statistical and corpus-based natural language processing (NLP), including Chinese
language computing (computational morphology, bilingual terminology extraction), information retrieval (Chinese text
categorization, graphical-model-based keyword extraction), collective intelligence (tag generation, Web trend analysis), and
social computing (query log analysis, community discovery). He has participated as project leader or principal researcher in
over 20 projects funded by the National Natural Science Foundation of China, the National Social Science Foundation of
China, the 863 National High-Tech R&D Program of China, and the 973 National Basic Research Program of China, as well as in
projects funded by a number of international IT companies. Professor Sun has published, together with his students, about 130
papers in academic journals and international conferences in the above fields. These papers have roughly 1,400 citations in
total on Google Scholar. He has served as a program committee member for numerous national and international
conferences, and many times as conference chair or program committee chair.

Recently, Professor Sun proposed a new perspective on NLP: NLP based on huge-scale naturally annotated corpora. The basic idea is that, with Web-scale corpora,
natural annotation may help machines perform some NLP tasks better. There are two types of natural annotation: explicit, such as punctuation, anchor text, query logs,
Wikipedia, and blog tags; and implicit, such as language usage patterns. He further poses a fundamental question: if we could systematically integrate all the information
that naturally annotated corpora provide from different perspectives, could machines achieve some degree of deep language understanding? Preliminary work by him
and his student, published in Computational Linguistics in 2009, showed the usefulness of punctuation in Chinese word segmentation, suggesting that this idea deserves further study.

He is the Vice President of the Chinese Information Processing Society, a council member of the China Computer Federation, a council member of the Chinese Association for
Artificial Intelligence, an officer of ACL SIGHAN, a member-at-large of the ACM China Council, the vice chairman of the Expert Committee of the Language Commission of Beijing
Municipal Government, a member of the Expert Committee of the National Language Resource Surveillance and Research Center, the Editor-in-Chief of the Journal of Chinese
Information Processing, and an editorial board member of many journals, including the Communications of CCF, the Journal of Computer Science and Technology, the Journal
of Chinese Language and Computing, Applied Linguistics, and Nankai Linguistics.

The Natural Language Processing Group at the Department of Computer Science and Technology, Tsinghua University
(THUNLP), which is also part of the National Lab for Information Science and Technology and the State Key Lab of Intelligent
Technology and Systems, works on methodologies and algorithms for computer processing and understanding of human
languages, with an emphasis on Chinese. We focus on basic research in language computation as well as on application-oriented
NLP technologies. In recent years, we have published a number of papers in top conferences and journals in the field, such as ACL,
COLING, EMNLP, IJCAI, VLDB, Computational Linguistics, the Journal of Quantitative Linguistics, and IEEE Intelligent Systems.

Research Interests

Our research covers a range of topics in natural language processing, including:

NLP based on Huge-scale Naturally Annotated Corpora
  - Word segmentation using punctuation in huge-scale web articles
  - New word detection and related word retrieval from user logs of Chinese input methods
  - Chinese abbreviation extraction from anchor texts on the Web
  - New word detection from search engine user logs

Social Tagging and Keyword Extraction
  - Tag disambiguation
  - Tag suggestions using topic models
  - Tag suggestions via Latent Reason Identification
  - Exploring subsumption relations in social tags
  - Keyword extraction by clustering to find exemplar terms
  - Keyword extraction via topic decomposition

Multilingual Analysis
  - Fast and robust sentence alignment algorithm
  - Bilingual terminology extraction system
  - Statistical method for Uyghur tokenization
  - Uyghur morpheme analysis
  - "Female Script" pinyin input method

Text Classification
  - Feature selection for Chinese text classification
  - Scalable term selection for text classification
  - Efficient text classification using term projection
  - Transfer learning and self-training for text classification
  - Text classification-based image classification

Natural Language Processing Group
Tsinghua University
State Key Lab of Intelligent Technology & Systems
