Elasticsearch: disabling term frequency scoring

Problem description:

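I want to change the scoring in Elasticsearch so that multiple occurrences of a term are not counted. For example, I want:

"Texas Texas Texas"

"Texas"

to come out with the same score. I found this mapping, which Elasticsearch says will disable term frequency counting, but my searches do not come out with the same score: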

"mappings":{ 
"business": { 
    "properties" : { 
     "name" : { 
      "type" : "string", 
      "index_options" : "docs", 
      "norms" : { "enabled": false}} 
     } 
    } 
} 

}

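Any help would be greatly appreciated; I have not been able to find much information on this.

Edit:

I am adding my search code and what is returned when I use explain.

My search code: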

Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "escluster").build(); 
    Client client = new TransportClient(settings) 
    .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300)); 

    SearchRequest request = Requests.searchRequest("businesses") 
      .source(SearchSourceBuilder.searchSource().query(QueryBuilders.boolQuery() 
      .should(QueryBuilders.matchQuery("name", "Texas") 
      .minimumShouldMatch("1")))).searchType(SearchType.DFS_QUERY_THEN_FETCH); 

    ExplainRequest request2 = client.prepareIndex("businesses", "business") 

When I run my search with explain, I get:

"took" : 14, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 2, 
    "max_score" : 1.0, 
    "hits" : [ { 
     "_shard" : 1, 
     "_node" : "BTqBPVDET5Kr83r-CYPqfA", 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9U5KBks4zEorv9YI4n", 
     "_score" : 1.0, 
     "_source":{ 
"name" : "texas" 
} 
, 
     "_explanation" : { 
     "value" : 1.0, 
     "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:", 
     "details" : [ { 
      "value" : 1.0, 
      "description" : "fieldWeight in 0, product of:", 
      "details" : [ { 
      "value" : 1.0, 
      "description" : "tf(freq=1.0), with freq of:", 
      "details" : [ { 
       "value" : 1.0, 
       "description" : "termFreq=1.0" 
      } ] 
      }, { 
      "value" : 1.0, 
      "description" : "idf(docFreq=2, maxDocs=3)" 
      }, { 
      "value" : 1.0, 
      "description" : "fieldNorm(doc=0)" 
      } ] 
     } ] 
     } 
    }, { 
     "_shard" : 1, 
     "_node" : "BTqBPVDET5Kr83r-CYPqfA", 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9U5K6Ks4zEorv9YI4o", 
     "_score" : 0.8660254, 
     "_source":{ 
"name" : "texas texas texas" 
} 
, 
     "_explanation" : { 
     "value" : 0.8660254, 
     "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:", 
     "details" : [ { 
      "value" : 0.8660254, 
      "description" : "fieldWeight in 0, product of:", 
      "details" : [ { 
      "value" : 1.7320508, 
      "description" : "tf(freq=3.0), with freq of:", 
      "details" : [ { 
       "value" : 3.0, 
       "description" : "termFreq=3.0" 
      } ] 
      }, { 
      "value" : 1.0, 
      "description" : "idf(docFreq=2, maxDocs=3)" 
      }, { 
      "value" : 0.5, 
      "description" : "fieldNorm(doc=0)" 
      } ] 
     } ] 
     } 
    } ] 
    } 

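It looks like it is still taking term frequency and doc frequency into account. Any ideas? Sorry for the bad formatting, I don't know why it looks so odd.

Edit 2:

When I search from the browser with http://localhost:9200/businesses/business/_search?pretty=true&qname=texas I get: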

{ 
    "took" : 2, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 4, 
    "max_score" : 1.0, 
    "hits" : [ { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YcCKjKvtg8NgyozGK", 
     "_score" : 1.0, 
     "_source":{"business" : { 
"name" : "texas texas texas texas" } 
} 
    }, { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YateBKvtg8Ngyoy-p", 
     "_score" : 1.0, 
     "_source":{ 
"name" : "texas" } 

    }, { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YavVnKvtg8Ngyoy-4", 
     "_score" : 1.0, 
     "_source":{ 
"name" : "texas texas texas" } 

    }, { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9Yb7NgKvtg8NgyozFf", 
     "_score" : 1.0, 
     "_source":{"business" : { 
"name" : "texas texas texas" } 
} 
    } ] 
    } 
} 

It finds all 4 objects I have in there, and they all have the same score. When I run my Java API search with explain, I get:

{ 
    "took" : 2, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 2, 
    "max_score" : 1.287682, 
    "hits" : [ { 
     "_shard" : 1, 
     "_node" : "BTqBPVDET5Kr83r-CYPqfA", 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YateBKvtg8Ngyoy-p", 
     "_score" : 1.287682, 
     "_source":{ 
"name" : "texas" } 
, 
     "_explanation" : { 
     "value" : 1.287682, 
     "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:", 
     "details" : [ { 
      "value" : 1.287682, 
      "description" : "fieldWeight in 0, product of:", 
      "details" : [ { 
      "value" : 1.0, 
      "description" : "tf(freq=1.0), with freq of:", 
      "details" : [ { 
       "value" : 1.0, 
       "description" : "termFreq=1.0" 
      } ] 
      }, { 
      "value" : 1.287682, 
      "description" : "idf(docFreq=2, maxDocs=4)" 
      }, { 
      "value" : 1.0, 
      "description" : "fieldNorm(doc=0)" 
      } ] 
     } ] 
     } 
    }, { 
     "_shard" : 1, 
     "_node" : "BTqBPVDET5Kr83r-CYPqfA", 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YavVnKvtg8Ngyoy-4", 
     "_score" : 1.1151654, 
     "_source":{ 
"name" : "texas texas texas" } 
, 
     "_explanation" : { 
     "value" : 1.1151654, 
     "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:", 
     "details" : [ { 
      "value" : 1.1151654, 
      "description" : "fieldWeight in 0, product of:", 
      "details" : [ { 
      "value" : 1.7320508, 
      "description" : "tf(freq=3.0), with freq of:", 
      "details" : [ { 
       "value" : 3.0, 
       "description" : "termFreq=3.0" 
      } ] 
      }, { 
      "value" : 1.287682, 
      "description" : "idf(docFreq=2, maxDocs=4)" 
      }, { 
      "value" : 0.5, 
      "description" : "fieldNorm(doc=0)" 
      } ] 
     } ] 
     } 
    } ] 
    } 
} 

The mismatch probably has more to do with 'doc frequency' than with 'term frequency'. Have you tried using [search_type=dfs_query_then_fetch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#query-then-fetch?q=query_then_fech)? If that does not help, try setting 'explain=true' in the query to see the scoring breakdown. – keety

I switched it to dfs_query_then_fetch, but that did not work. I will post my code and the explain results in the question in a second. – Chadvador

Could you post the query as well? – keety

It looks like index_options cannot be overridden for a field once that field has been set in the initial mapping.

Example:

put test 
put test/business/_mapping 
{ 

     "properties": { 
     "name": { 
      "type": "string", 
      "index_options": "freqs", 
      "norms": { 
       "enabled": false 
      } 
     } 
     } 

} 
put test/business/_mapping 
{ 

     "properties": { 
     "name": { 
      "type": "string", 
      "index_options": "docs", 
      "norms": { 
       "enabled": false 
      } 
     } 
     } 

} 
get test/business/_mapping 

    { 
    "test": { 
     "mappings": { 
     "business": { 
      "properties": { 
       "name": { 
        "type": "string", 
        "norms": { 
        "enabled": false 
        }, 
        "index_options": "freqs" 
       } 
      } 
     } 
     } 
    } 
} 

You will have to recreate the index for the new mapping to take effect.
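
The explain output in the question is consistent with this: the reported scores can be reproduced only if term frequencies (and norms) are still being applied to the field. The sketch below redoes that arithmetic in plain Java, assuming the classic Lucene TF-IDF formulas that the explain descriptions suggest (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The class and method names are made up for illustration and are not part of the Elasticsearch or Lucene API; the fieldNorm values are copied from the explain output rather than computed.

```java
// Illustrative only: reproduces the "fieldWeight" product from the explain output.
// Assumption: classic Lucene TF-IDF formulas; names here are hypothetical.
final class ClassicFieldWeight {

    // tf(freq) = sqrt(freq)
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // idf(docFreq, maxDocs) = 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight = tf * idf * fieldNorm
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Inputs copied from the second explain output (docFreq=2, maxDocs=4).
        System.out.println("texas:             " + fieldWeight(1.0, 2, 4, 1.0));
        System.out.println("texas texas texas: " + fieldWeight(3.0, 2, 4, 0.5));

        // If frequencies and norms were really disabled, both documents would be
        // scored as freq=1 with fieldNorm=1 and therefore tie.
        System.out.println("freq and norm off: " + fieldWeight(1.0, 2, 4, 1.0));
    }
}
```

The first two values should agree with the 1.287682 and 1.1151654 in the question up to float rounding. A termFreq of 3.0 and a fieldNorm of 0.5 in the explain output therefore suggest that the "docs" and norms-disabled settings never reached the index.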

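Well, this is embarrassing, and it was my own stupidity. I was testing in my browser with http://localhost:9200/business/_search?pretty=true&explain=true&q=texas, and after I changed it to "qname=texas" the scores are the same. So why does it not work with my Java API search? It seems like I am searching on the name field there? – Chadvador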

Could you paste the whole snippet, or better, the response with explain set in the Java client? – keety

I'm sorry, I don't know how to set it in the Java API; it does not seem to be an option on SearchRequest. I will update my OP with the code. – Chadvador