A nested type contains a dense_vector field; computing similarity with cosineSimilarity throws a class cast exception

Elasticsearch | Author: cxycxy | Posted 2023-03-16 | Views: 3339

Software version: 8.4.3
Runtime environment: CentOS 7

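Data structure: one video document contains multiple persons, and each person carries one face feature vector.
Expected result: given an input feature array, compare it in a loop against each of the face feature vectors in the nested array by cosine similarity, and use the highest similarity as the document score.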
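To make the expected scoring concrete, here is a minimal Python sketch of it, computed outside Elasticsearch. The function names `cosine_similarity` and `video_score` are illustrative and not part of any Elasticsearch API, and returning 0.0 for a zero vector or an empty person list is an assumption rather than something the question specifies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def video_score(query_vector: Sequence[float], video_doc: dict) -> float:
    """Highest cosine similarity over all persons in one video document."""
    persons = video_doc.get("videoPersonList", [])
    return max(
        (cosine_similarity(query_vector, p["featureVector"]) for p in persons),
        default=0.0,  # assumption: a video without persons scores 0.0
    )
```

A sketch like this can also serve as an offline reference when checking the scores that a query returns.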

The index mapping is as follows:
{
  "mappings": {
    "properties": {
      "videoId": {
        "type": "keyword"
      },
      "videoPersonList": {
        "type": "nested",
        "properties": {
          "personName": {
            "type": "keyword"
          },
          "picId": {
            "type": "keyword"
          },
          "featureVector": {
            "type": "dense_vector",
            "dims": 512
          }
        }
      }
    }
  }
}

My implementation:
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "params": {
            "queryVector": []
          },
          "source": "def maxFeatureSource=0.0;for (def personItem : params['_source']['videoPersonList']) {double tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);if(maxFeatureSource<tmp){maxFeatureSource=tmp;}}return maxFeatureSource;"
        }
      }
    }
  }
}
The error:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "java.base/java.lang.Class.cast(Class.java:3921)",
          "tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);",
          " ^---- HERE"
        ],
        "script": "def maxFeatureSource=0.0;for (def personItem : params['_source']['videoPersonList']) {double tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);if(maxFeatureSource<tmp){maxFeatureSource=tmp;}}return maxFeatureSource;",
        "lang": "painless",
        "position": {
          "offset": 145,
          "start": 93,
          "end": 164
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "video",
        "node": "i1a9H3dRRhuAMPfnr7dgzA",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "java.base/java.lang.Class.cast(Class.java:3921)",
            "tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);",
            " ^---- HERE"
          ],
          "script": "def maxFeatureSource=0.0;for (def personItem : params['_source']['videoPersonList']) {double tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);if(maxFeatureSource<tmp){maxFeatureSource=tmp;}}return maxFeatureSource;",
          "lang": "painless",
          "position": {
            "offset": 145,
            "start": 93,
            "end": 164
          },
          "caused_by": {
            "type": "class_cast_exception",
            "reason": "Cannot cast java.util.ArrayList to java.lang.String"
          }
        }
      }
    ]
  },
  "status": 400
}

This approach fails with the error above. What is the correct way to implement it?
 
cxycxy

Edited.
I solved it by writing my own cosine similarity routine inside the script instead of using the built-in function that ES provides.
 
{
    "query": {
        "function_score": {
            "query": {
                "match_all": {}
            },
            "script_score": {
                "script": {
                    "params": {
                        "queryVector": []
                    },
                    "source": "def maxFeatureSource = 0.0;for (def personItem : params['_source']['videoPersonList']) {double dotProduct = 0.0;double normA = 0.0;double normB = 0.0;for (int i = 0; i < params.queryVector.size(); i++) {dotProduct += params.queryVector.get(i) * personItem['featureVector'].get(i);normA += Math.pow(params.queryVector.get(i), 2);normB += Math.pow(personItem['featureVector'].get(i), 2);}double tmp = dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));if (maxFeatureSource < tmp) {maxFeatureSource = tmp;}}return maxFeatureSource;"
                }
            }
        }
    }
}
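The one-line script above is hard to read, so here is a sketch of the same request built in Python with the Painless source laid out over several lines. The statements are the ones from the script above with only whitespace changed; the helper name `build_function_score_request` and the constant `PAINLESS_MAX_COSINE` are made up for illustration, and the request is only constructed here, not sent to a cluster.

```python
import json
from typing import Sequence

# Same statements as the one-line script above, reformatted for readability.
PAINLESS_MAX_COSINE = """
def maxFeatureSource = 0.0;
for (def personItem : params['_source']['videoPersonList']) {
  double dotProduct = 0.0;
  double normA = 0.0;
  double normB = 0.0;
  for (int i = 0; i < params.queryVector.size(); i++) {
    dotProduct += params.queryVector.get(i) * personItem['featureVector'].get(i);
    normA += Math.pow(params.queryVector.get(i), 2);
    normB += Math.pow(personItem['featureVector'].get(i), 2);
  }
  double tmp = dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
  if (maxFeatureSource < tmp) {
    maxFeatureSource = tmp;
  }
}
return maxFeatureSource;
"""


def build_function_score_request(query_vector: Sequence[float]) -> dict:
    """Build the function_score body with the hand-written cosine script."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "script": {
                        "params": {"queryVector": list(query_vector)},
                        "source": PAINLESS_MAX_COSINE.strip(),
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body; json.dumps escapes the newlines in the script.
    print(json.dumps(build_function_score_request([0.1, 0.2, 0.3]), indent=2))
```

Note that this script reads every vector from `_source` and loops over all dimensions per document, which is likely to be slower than a doc-values based approach on a large index.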

mryu

Wrap it in a nested query, something like this:
 
{
  "query": {
    "nested": {
      "path": "content_embeddings",
      "score_mode": "max",
      "query": {
        "script_score": {
          "query": {
            "match_all": {}
          },
          "script": {
            "source": "cosineSimilarity(params.query_vector, 'content_embeddings.vector') + 1.0",
            "params": {
              "query_vector": [
                1,
                1,
                1
              ]
            }
          }
        }
      }
    }
  },
  "size": 10,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
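The example above uses a `content_embeddings` path that is not in the question's mapping. Below is a sketch of the same idea adapted to `videoPersonList.featureVector`, built in Python; the helper name `build_nested_cosine_query` is hypothetical and the query has not been run against the asker's index. It also hints at why the original script failed: here the second argument of `cosineSimilarity` is the field name as a string, whereas the failing script passed a list read from `_source`, which matches the "Cannot cast java.util.ArrayList to java.lang.String" message.

```python
from typing import Sequence


def build_nested_cosine_query(query_vector: Sequence[float], size: int = 10) -> dict:
    """Nested query that scores each person and keeps the best score per video."""
    return {
        "query": {
            "nested": {
                "path": "videoPersonList",
                # "max" makes the best-matching person decide the video's score.
                "score_mode": "max",
                "query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            # The field is referenced by name, not by value.
                            # "+ 1.0" shifts cosine from [-1, 1] to [0, 2],
                            # since script scores must not be negative.
                            "source": (
                                "cosineSimilarity(params.query_vector, "
                                "'videoPersonList.featureVector') + 1.0"
                            ),
                            "params": {"query_vector": list(query_vector)},
                        },
                    }
                },
            }
        },
        "size": size,
    }
```

With this shape the returned `_score` is the best similarity plus 1.0, so subtract 1.0 on the client side if the raw cosine value is needed. The query vector is assumed to have the same number of dimensions as the mapped field (512 in the question).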
