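title: 百度张亚勤:ABC时代来了,迎战云计算“马拉松” (English: "Baidu's Zhang Yaqin: The ABC era has arrived, taking on the cloud computing 'marathon'")
When creating the mapping, I set the analyzer of the title field to ik. The analysis result is: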
{"tokens": [
{
"token": "百度",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
}
,
{
"token": "百",
"start_offset": 0,
"end_offset": 1,
"type": "TYPE_CNUM",
"position": 1
}
,
{
"token": "度",
"start_offset": 1,
"end_offset": 2,
"type": "COUNT",
"position": 2
}
,
{
"token": "张",
"start_offset": 2,
"end_offset": 3,
"type": "CN_CHAR",
"position": 3
}
,
{
"token": "亚",
"start_offset": 3,
"end_offset": 4,
"type": "CN_WORD",
"position": 4
}
,
{
"token": "勤",
"start_offset": 4,
"end_offset": 5,
"type": "CN_WORD",
"position": 5
}
,
{
"token": "abc",
"start_offset": 6,
"end_offset": 9,
"type": "ENGLISH",
"position": 6
}
,
{
"token": "时代",
"start_offset": 9,
"end_offset": 11,
"type": "CN_WORD",
"position": 7
}
,
{
"token": "来了",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 8
}
,
{
"token": "迎战",
"start_offset": 14,
"end_offset": 16,
"type": "CN_WORD",
"position": 9
}
,
{
"token": "战云",
"start_offset": 15,
"end_offset": 17,
"type": "CN_WORD",
"position": 10
}
,
{
"token": "云",
"start_offset": 16,
"end_offset": 17,
"type": "CN_WORD",
"position": 11
}
,
{
"token": "计算",
"start_offset": 17,
"end_offset": 19,
"type": "CN_WORD",
"position": 12
}
,
{
"token": "马拉松",
"start_offset": 20,
"end_offset": 23,
"type": "CN_WORD",
"position": 13
}
,
{
"token": "马拉",
"start_offset": 20,
"end_offset": 22,
"type": "CN_WORD",
"position": 14
}
,
{
"token": "松",
"start_offset": 22,
"end_offset": 23,
"type": "CN_WORD",
"position": 15
}
]}
Searching for 张亚勤 or 云计算 returns no results.
The query is:
{
"query": {
"term": {
"title": "张亚勤"
}
}
}
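The Java code is:
response = client.prepareSearch("blog")
        .setTypes("article")
        .setQuery(QueryBuilders.termQuery("title", "张亚勤"))
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();
Question: most of the tokens ik produces are two characters long, so three-character terms don't match. How should I change the ik configuration, or the query, so that these searches return the expected results?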
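To see why the term query misses, it helps to model the indexed title as a plain set of the tokens listed above. This is a simplified sketch under that assumption (Lucene actually stores postings per term, not a Java `Set`), and the class name `TermQueryDemo` is hypothetical. A term query is just a lookup of the unanalyzed query string in that set.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model: the index is reduced to the set of tokens ik emitted.
final class TermQueryDemo {
    // Tokens copied from the ik analysis output for the title field above.
    static final Set<String> INDEX = Set.of(
        "百度", "百", "度", "张", "亚", "勤", "abc", "时代", "来了",
        "迎战", "战云", "云", "计算", "马拉松", "马拉", "松");

    // A term query is not analyzed: it hits only if the exact string was
    // emitted as a token when the document was indexed.
    static boolean termQuery(String term) {
        return INDEX.contains(term);
    }

    public static void main(String[] args) {
        for (String q : List.of("张亚勤", "云计算", "计算", "马拉松")) {
            System.out.println(q + " -> " + termQuery(q));
        }
    }
}
```

This prints `false` for 张亚勤 and 云计算 but `true` for 计算 and 马拉松. Since 马拉松 also has three characters, length is not the real cause; what matters is whether that exact string was emitted as a token.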
1 reply
ybtsdst - focus on lucene & es
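2. Replace term with match, e.g. QueryBuilders.matchQuery("title", "张亚勤") in the Java code. A term query is not analyzed, so it looks for the single token 张亚勤, which the index does not contain; a match query first runs the query text through the field's analyzer and, with its default OR operator, hits if any resulting token (such as 张, 亚, 勤) is in the index.
3. Add words such as 张亚勤 and 云计算 to the ik dictionary (for example a custom dictionary registered through the ext_dict entry in IKAnalyzer.cfg.xml), then reindex, because documents that were already indexed keep their old tokens.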
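Both suggestions can be illustrated with a toy model. The sketch below assumes a hypothetical analyzer (greedy forward maximum matching against a small dictionary) that stands in for ik; it is not ik's algorithm, and `BASE_DICT`, `MatchVsTermDemo` and the other names are illustrative only. The point it shows is that a match query analyzes the query text with the same analyzer used at index time, while a term query does not, and that adding a word to the dictionary only helps once the document is reindexed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MatchVsTermDemo {
    static final String TITLE = "百度张亚勤:ABC时代来了,迎战云计算“马拉松”";
    static final int MAX_WORD_LEN = 3;

    // Hypothetical stand-in for the ik main dictionary (not its real contents).
    static final Set<String> BASE_DICT = Set.of("百度", "时代", "迎战", "计算", "马拉松");

    // Hypothetical toy analyzer, NOT ik: greedy forward maximum matching.
    // Characters not covered by a dictionary word become single-char tokens;
    // punctuation is dropped.
    static List<String> analyze(String text, Set<String> dict) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip punctuation such as ':' and quotes
                continue;
            }
            int len = Math.min(MAX_WORD_LEN, text.length() - i);
            // Shrink the window until it is a dictionary word or a single char.
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }

    // Index time: the title is analyzed and its tokens are stored.
    static Set<String> buildIndex(Set<String> dict) {
        return new HashSet<>(analyze(TITLE, dict));
    }

    // term query: the query string is NOT analyzed; it must equal a stored token.
    static boolean termQuery(Set<String> index, String q) {
        return index.contains(q);
    }

    // match query (default operator OR): analyze the query with the same
    // analyzer and hit if any resulting token is stored.
    static boolean matchQuery(Set<String> index, Set<String> dict, String q) {
        return analyze(q, dict).stream().anyMatch(index::contains);
    }

    public static void main(String[] args) {
        Set<String> index = buildIndex(BASE_DICT);
        System.out.println("term  张亚勤: " + termQuery(index, "张亚勤"));
        System.out.println("match 张亚勤: " + matchQuery(index, BASE_DICT, "张亚勤"));
        System.out.println("match 云计算: " + matchQuery(index, BASE_DICT, "云计算"));

        // Answer item 3: extend the dictionary, then reindex the document.
        Set<String> extDict = new HashSet<>(BASE_DICT);
        extDict.add("张亚勤");
        extDict.add("云计算");
        Set<String> reindexed = buildIndex(extDict);
        System.out.println("term  张亚勤 after reindex: " + termQuery(reindexed, "张亚勤"));
        System.out.println("term  云计算 after reindex: " + termQuery(reindexed, "云计算"));
    }
}
```

Because this toy segmenter keeps only the longest word, adding 云计算 means 计算 is no longer indexed on its own; ik_max_word avoids that trade-off by also emitting overlapping sub-words, as the 马拉松 / 马拉 tokens in the question show.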