How to write a DSL query to search a query string with an asterisk after excluding predefined words

Shared on 2022-10-08 · dsl elasticsearch Q&A
【Question title】: How to write a DSL query to search a query string with an asterisk after excluding predefined words
【Posted】: 2022-08-01 13:01:04
【Question】:

My stop.txt contains messi.

The index settings and mappings are below:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym_en": {
            "type": "synonym",
            "synonyms_path": "synom.txt"
          },
          "english_stop": {
            "type": "stop",
            "stopwords_path": "stop.txt"
          }
        },
        "analyzer": {
          "english_analyzer": {
            "tokenizer": "standard",
            "filter": ["english_stop", "synonym_en"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "english_analyzer"
      }
    }
  }
}

My documents (a list of dictionaries) are below:

[
  { "id": 0, "name": "Messiis player" },
  { "id": 1, "name": "Messi player" },
  { "id": 2, "name": "Messi and Rono player" },
  { "id": 3, "name": "Rono and Messi player" },
  { "id": 4, "name": "messiis and Messi player" }
]

The DSL query is below:

{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "messi*",
          "fields": ["name^128"]
        }
      }
    }
  }
}

My output is below; it returns every document:

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 5, "relation": "eq" },
    "max_score": 128.0,
    "hits": [
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "0",
        "_score": 128.0,
        "_source": { "id": 0, "name": "Messiis player" }
      },
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "1",
        "_score": 128.0,
        "_source": { "id": 1, "name": "Messi player" }
      },
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "2",
        "_score": 128.0,
        "_source": { "id": 2, "name": "Messi and Rono player" }
      },
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "3",
        "_score": 128.0,
        "_source": { "id": 3, "name": "Rono and Messi player" }
      },
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "4",
        "_score": 128.0,
        "_source": { "id": 4, "name": "messiis and Messi player" }
      }
    ]
  }
}
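  • My query contains *

  • If I search with "query": "messi*", I get the output {'id': 4, 'name': 'messiis and Messi player'}

  • If I search with "query": "messi*", I need the expected output shown below

  • If I search with "query": "Messi*", I need the same expected output (basically, the search must be case-insensitive)

  • I cannot tell where the mistake is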

{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 128.0,
    "hits": [
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "0",
        "_score": 128.0,
        "_source": { "id": 0, "name": "Messiis player" }
      },
      {
        "_index": "player",
        "_type": "_doc",
        "_id": "4",
        "_score": 128.0,
        "_source": { "id": 4, "name": "messiis and Messi player" }
      }
    ]
  }
}


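【Solution 1】:

The problem is that your stop.txt file probably contains messi in lowercase, while your english_analyzer does not lowercase your tokens. The stop filter is case-sensitive by default, so the token Messi does not match the stop word messi and is not removed.

So you have two options:

A. Add Messi to your stop.txt file

B. Add the lowercase token filter ahead of english_stop, so tokens are lowercased before stop words are removed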

        "analyzer": {
          "english_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "english_stop", "synonym_en"]
                            ^
                            |
                        add this
          }
        }
      

It will then work and remove all messi tokens (regardless of case).
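
To make the reasoning above concrete, here is a rough pure-Python simulation of the filter chain. It is only a sketch, not Elasticsearch itself: the regex tokenizer stands in for the standard tokenizer, the synonym filter is skipped because the contents of synom.txt are not shown, stop.txt is assumed to contain only messi, and the names `analyze` and `search` are hypothetical helpers introduced for illustration.

```python
import re
from fnmatch import fnmatchcase

# Assumption: stop.txt holds the single lowercase entry "messi".
STOPWORDS = {"messi"}

# The documents from the question, keyed by id.
DOCS = {
    0: "Messiis player",
    1: "Messi player",
    2: "Messi and Rono player",
    3: "Rono and Messi player",
    4: "messiis and Messi player",
}


def analyze(text, lowercase):
    """Approximate the analyzer: split into word tokens, optionally
    lowercase them, then drop stop words with a case-sensitive comparison."""
    tokens = re.findall(r"\w+", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]


def search(pattern, lowercase=True):
    """Return ids of documents with at least one indexed token matching the
    wildcard pattern. The pattern is lowercased to mimic the query-side
    lowercasing described in the discussion (an assumption of this sketch)."""
    pattern = pattern.lower()
    return [
        doc_id
        for doc_id, name in DOCS.items()
        if any(fnmatchcase(tok, pattern) for tok in analyze(name, lowercase))
    ]


# Without the lowercase filter, "Messi" is not equal to "messi" and survives.
print(analyze("Messi player", lowercase=False))  # ['Messi', 'player']
# With the lowercase filter first, the token becomes "messi" and is removed.
print(analyze("Messi player", lowercase=True))   # ['player']
# With the fix, only the documents containing "messiis" still match.
print(search("messi*"))  # [0, 4]
print(search("Messi*"))  # [0, 4]
```

In this simulation, both messi* and Messi* return documents 0 and 4, which matches the expected output in the question.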

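【Discussion】:

  • Glad it worked! Is anything else needed here?
  • Does this make the search case-insensitive?
  • Exactly. All tokens are lowercased at index time, and query_string uses the standard analyzer by default, which also lowercases the input, so the comparison is case-insensitive, which is exactly what you need. Have you tried it?
  • It works. I just wanted to understand the concept; if you could also share a link to the Elastic documentation, that would be helpful.
  • I have added the link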