Text Data Mining (English Edition), Chengqing Zong, Rui Xia, Jiajun Zhang, ISBN 9787302590293

"Text Data Mining" offers a thorough and detailed introduction to the fundamental theories and methods of text data mining. It covers pre-processing (for both Chinese and English texts), text representation, and feature selection, through to text classification and text clustering. It also presents the predominant applications of text data mining, such as topic modeling, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization.

Preface
With the rapid development and popularization of Internet and mobile communication technologies, text data mining has attracted much attention. In particular, with the wide use of new technologies such as cloud computing, big data, and deep learning, text mining has begun to play an increasingly important role in many application fields, such as opinion mining and medical and financial data analysis, showing broad application prospects.
Although I was supervising graduate students studying text classification and automatic summarization more than ten years ago, I did not have a clear understanding of the overall concept of text data mining and regarded these research topics only as specific applications of natural language processing. Professor Jiawei Han's book Data Mining: Concepts and Techniques, published by Elsevier, Professor Bing Liu's Web Data Mining, published by Springer, and other books have greatly benefited me. I have also benefited immensely every time I have listened to their talks and discussed these topics with them face to face. I was inspired to write this book for the course Text Data Mining, which I was invited to teach to graduate students at the University of Chinese Academy of Sciences. At the end of 2015, I accepted the invitation and began to design the content and select the materials for the course. I had to study a large number of related papers, books, and other materials, and I began to think seriously about the rich connotation and extension of the term "text data mining." After more than a year's study, I started to compile the courseware. Through teaching practice, the outline of the concept gradually took shape.
Rui Xia and Jiajun Zhang, two talented young researchers, helped me realize my original writing plan. Rui Xia received his master's degree in 2007, was admitted to the Institute of Automation, Chinese Academy of Sciences, and studied for his Ph.D. degree under my supervision. He worked on sentiment classification and made it the research topic of his Ph.D. dissertation. After he received his Ph.D. degree in 2011, his interests extended to opinion mining, text clustering and classification, topic modeling, event detection and tracking, and other related topics. He has published a series of influential papers in the field of sentiment analysis and opinion mining. He received the ACL 2019 Outstanding Paper Award, and his paper on ensemble learning for sentiment classification has been cited more than 600 times. Jiajun Zhang joined our institute after he graduated from university in 2006 and studied in my group for his Ph.D. degree. He focused mainly on machine translation research, but he also performed well on many other research topics, such as multilingual automatic summarization, information extraction, and human-computer dialogue systems. Since 2016, he has been teaching parts of the Natural Language Processing course with me at the University of Chinese Academy of Sciences, including machine translation, automatic summarization, and text classification. This course is very popular with students. Thanks to the solid theoretical foundation and keen scientific insight of these two researchers, I am gratified that many cutting-edge technical methods and research results could be verified, put into practice, and included in this book.
Writing the Chinese version of this book took more than three years, from early 2016 until its publication in June 2019. During these three years, most of our holidays, weekends, and other spare time were devoted to writing this book. Enduring numerous revisions and even complete rewrites was truly hard, but we were also very happy. We started to translate the Chinese version into English in the second half of 2019 and added some more recent topics to the English version, including BERT (Bidirectional Encoder Representations from Transformers). Text data mining lies at the intersection of natural language processing and machine learning. It therefore faces the challenges of both fields, and it has broad applications on the Internet and on mobile communication devices. The topics and techniques presented in this book are the technical foundations needed to develop such practical systems, and they have attracted much attention in recent years. It is hoped that this book will give students, professors, and researchers in related areas a comprehensive understanding of the field. However, I must admit that, due to the limits of the authors' abilities and breadth of knowledge, as well as a lack of time and energy, there are bound to be some omissions or mistakes in the book. We will be very grateful if readers provide criticism, corrections, and any suggestions.

Beijing, China
Chengqing Zong
20 May 2020