Classification of Web Query Intent Using Encyclopedia
(Chinese title: Query Intent Acquisition Based on Encyclopedic Knowledge)
Bingquan Liu, Ming Liu, Gang Hu
Harbin Institute of Technology
Outline
Motivation
Seed term extraction
Intent classification
Experimental results
Motivation
Improve the performance of retrieval systems by identifying the user's search intent.
Classical classification methods need an adequate training corpus, which is unavailable in the retrieval setting.
Classical classification methods mostly target long texts, whereas a query is a very short text.
Seed term extraction
Semantic similarity between words is calculated based on HowNet.
Lexical construction is used to indicate a text's topic.
A Markov random walk extends the seed term set.
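The seed expansion step can be illustrated with a random walk with restart over a term similarity graph. This is only a minimal sketch, not the authors' implementation: the function name `expand_seeds`, the `restart` probability, and the toy similarity weights are all assumptions made for illustration, and in the slides' setting the edge weights would come from the HowNet-based similarity.

```python
from collections import defaultdict


def expand_seeds(similarity, seeds, restart=0.15, iterations=50, top_k=2):
    """Rank non-seed terms by a random walk with restart started at the seeds.

    similarity: dict mapping (term_a, term_b) -> non-negative weight,
                treated as an undirected edge (hypothetical input format).
    """
    # Build a symmetric weighted adjacency list.
    graph = defaultdict(dict)
    for (a, b), weight in similarity.items():
        graph[a][b] = weight
        graph[b][a] = weight

    terms = sorted(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no seed term occurs in the similarity graph")

    # The walker restarts uniformly at one of the seed terms.
    restart_mass = {t: (1.0 / len(seeds) if t in seeds else 0.0) for t in terms}
    score = dict(restart_mass)

    for _ in range(iterations):
        nxt = {t: restart * restart_mass[t] for t in terms}
        for t in terms:
            total = sum(graph[t].values())
            if total == 0:
                continue  # isolated term: nothing to propagate
            for neighbour, weight in graph[t].items():
                # Move to a neighbour with probability proportional to similarity.
                nxt[neighbour] += (1 - restart) * score[t] * weight / total
        score = nxt

    candidates = [t for t in terms if t not in seeds]
    candidates.sort(key=lambda t: score[t], reverse=True)
    return candidates[:top_k]


# Toy similarity weights (made up for illustration, not HowNet output).
toy_similarity = {
    ("portal", "forum"): 0.8,
    ("forum", "blog"): 0.7,
    ("blog", "microblog"): 0.9,
    ("blog", "novel"): 0.1,
    ("movie", "song"): 0.8,
    ("song", "novel"): 0.6,
}
new_terms = expand_seeds(toy_similarity, ["portal", "forum"])
print(new_terms)
```

Terms that are strongly connected to the seeds accumulate more probability mass and are proposed first, while weakly linked terms from other topics stay near the bottom of the ranking.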
Intent classification
The training corpus is built from Baidu Zhidao daily logs.
Intent categories are assigned by SVM classification.
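To make the SVM step concrete, here is a small self-contained sketch of a one-vs-rest linear SVM trained with a Pegasos-style subgradient method. The slides do not say which SVM implementation, kernel, or features were used, so everything here is an assumption: the whitespace tokenizer (Chinese queries would need a word segmenter), the hyperparameters, the function names, and the toy English queries.

```python
import random
from collections import Counter


def featurize(query):
    """Bag of whitespace tokens (an assumption made for this toy example)."""
    return Counter(query.lower().split())


def train_binary_svm(examples, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient training of a linear SVM with hinge loss.

    examples: list of (features, label) pairs, where label is +1 or -1.
    """
    rng = random.Random(seed)
    weights = Counter()
    step = 0
    for _ in range(epochs):
        for feats, label in rng.sample(examples, len(examples)):
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = label * sum(weights[f] * v for f, v in feats.items())
            for f in weights:  # L2 regularisation shrinks every weight
                weights[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move towards the example
                for f, v in feats.items():
                    weights[f] += eta * label * v
    return weights


def train_one_vs_rest(labelled_queries):
    """Train one binary SVM per intent category."""
    data = [(featurize(q), intent) for q, intent in labelled_queries]
    models = {}
    for intent in sorted({i for _, i in data}):
        binary = [(f, 1 if i == intent else -1) for f, i in data]
        models[intent] = train_binary_svm(binary)
    return models


def classify(models, query):
    """Return the intent whose SVM gives the highest score."""
    feats = featurize(query)

    def score(intent):
        return sum(models[intent][f] * v for f, v in feats.items())

    return max(models, key=score)


# Toy training queries (invented for illustration, not from the Baidu Zhidao log).
TRAIN = [
    ("sina homepage", "navigational"),
    ("baidu tieba forum", "navigational"),
    ("jay chou singer", "person"),
    ("yao ming athlete", "person"),
    ("download antivirus software", "download"),
    ("movie download", "download"),
]
models = train_one_vs_rest(TRAIN)
print(classify(models, "download music"))
```

In practice the training loop would normally be replaced by an established SVM library; the hand-written version is used here only so that the example runs without third-party packages.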
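Experimental results
The test corpus was crawled from Sogou.

Table 1. Seed term extraction

| Intent category | Manually selected open categories | Seed terms |
|---|---|---|
| Navigational | portal websites, blogs, microblogs, online malls, Tieba, forums, online ... | 17958 |
| Person name | celebrities, experts, athletes, great figures, modern figures, ancient figures ... | 366411 |
| Download | movies, songs, novels, software, feature films, war films, computer software, antivirus software, system tools ... | 96700 |

Table 2. Classification results

| Intent category | Baidu Baike P | Baidu Baike R | Baidu Baike F | Manual annotation P | Manual annotation R | Manual annotation F |
|---|---|---|---|---|---|---|
| Navigational | 87.62 | 76.53 | 83.58 | 88.31 | 75.66 | 83.65 |
| Person name | 89.43 | 74.69 | 83.91 | 91.28 | 76.25 | 85.65 |
| Download | 83.37 | 79.31 | 81.97 | 82.94 | 77.90 | 80.99 |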
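The slides do not define how F is computed from P and R in Table 2. The sketch below assumes a weighted F-measure with β² = 0.5, which favors precision over recall. This is an inference from the reported numbers rather than something the authors state: the weighting reproduces most of the F values in the table to two decimals, whereas the balanced F1 does not. The function name `f_measure` and the parameter `beta_sq` are illustrative.

```python
def f_measure(precision, recall, beta_sq=0.5):
    """Weighted F-measure: (1 + b^2) * P * R / (b^2 * P + R).

    beta_sq < 1 weights precision more heavily than recall;
    beta_sq = 1 gives the balanced F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Navigational row, Baidu Baike columns of Table 2 (P = 87.62, R = 76.53).
print(round(f_measure(87.62, 76.53), 2))             # 83.58, as reported
print(round(f_measure(87.62, 76.53, beta_sq=1), 2))  # balanced F1, for comparison
```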