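Problem with IK custom dictionary words and stop words, or: how can a script filter out data I don't want returned?
a1667499668 asked the question • 1 follower • 0 replies • 3703 views • 2023-04-26 18:16
Using ES for search: when a user types 柠檬 (lemon), results such as lemon soda and lemon-flavored toothpaste rank first, while the lemon fruit the user actually wants ranks lower. Adding 柠檬 to the Chinese analyzer dictionary has not helped.
YuLiGod replied • 36 followers • 17 replies • 17265 views • 2023-04-24 11:00
Problem inserting data into a newer ES version with old-version commands!!!
FFFrp replied • 3 followers • 2 replies • 3335 views • 2023-04-23 14:51
In ES, can Painless convert a JSON string into an array or list?
Ombres replied • 3 followers • 2 replies • 4713 views • 2023-04-23 10:51
Can ES be configured to retry internally?
Charele replied • 4 followers • 3 replies • 4982 views • 2023-04-22 16:30
With ngram tokenization, an AND search does not return the results I expect; could someone please take a look?
YuLiGod replied • 4 followers • 4 replies • 1988 views • 2023-04-14 08:42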
A Douban Movie Top 250 search demo built with Web Scraper + Elasticsearch + Kibana + SearchKit
森 published an article • 0 comments • 6149 views • 2023-04-09 10:56
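Author: 小森同学
Disclaimer: the movie data comes from Douban Movie (豆瓣电影); if it infringes any rights, please contact me and it will be removed.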
Web Scraper
{
  "_id": "top250",
  "startUrl": ["https://movie.douban.com/top250?start=[0-225:25]&filter="],
  "selectors": [{
    "id": "container",
    "multiple": true,
    "parentSelectors": ["_root"],
    "selector": ".grid_view li",
    "type": "SelectorElement"
  }, {
    "id": "name",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "",
    "selector": "span.title:nth-of-type(1)",
    "type": "SelectorText"
  }, {
    "id": "number",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "",
    "selector": "em",
    "type": "SelectorText"
  }, {
    "id": "score",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "",
    "selector": "span.rating_num",
    "type": "SelectorText"
  }, {
    "id": "review",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "",
    "selector": "span.inq",
    "type": "SelectorText"
  }, {
    "id": "year",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "\\d{4}",
    "selector": "p:nth-of-type(1)",
    "type": "SelectorText"
  }, {
    "id": "tour_guide",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "^导演: \\S*",
    "selector": "p:nth-of-type(1)",
    "type": "SelectorText"
  }, {
    "id": "type",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "[^/]+$",
    "selector": "p:nth-of-type(1)",
    "type": "SelectorText"
  }, {
    "id": "area",
    "multiple": false,
    "parentSelectors": ["container"],
    "regex": "[^\\/]+(?=\\/[^\\/]*$)",
    "selector": "p:nth-of-type(1)",
    "type": "SelectorText"
  }, {
    "id": "detail_link",
    "multiple": false,
    "parentSelectors": ["container"],
    "selector": ".hd a",
    "type": "SelectorLink"
  }, {
    "id": "director",
    "multiple": false,
    "parentSelectors": ["detail_link"],
    "regex": "",
    "selector": "span:nth-of-type(1) .attrs a",
    "type": "SelectorText"
  }, {
    "id": "screenwriter",
    "multiple": false,
    "parentSelectors": ["detail_link"],
    "regex": "(?<=编剧: )[\\u4e00-\\u9fa5A-Za-z0-9/()\\·\\s]+(?=主演)",
    "selector": "div#info",
    "type": "SelectorText"
  }, {
    "id": "film_length",
    "multiple": false,
    "parentSelectors": ["detail_link"],
    "regex": "\\d+",
    "selector": "span[property='v:runtime']",
    "type": "SelectorText"
  }, {
    "id": "IMDb",
    "multiple": false,
    "parentSelectors": ["detail_link"],
    "regex": "(?<=[IMDb:\\s+])\\S*(?=\\d*$)",
    "selector": "div#info",
    "type": "SelectorText"
  }, {
    "id": "language",
    "multiple": false,
    "parentSelectors": ["detail_link"],
    "regex": "(?<=语言: )\\S+",
    "selector": "div#info",
    "type": "SelectorText"
  }, {
    "id": "alias",
    "multiple": false,
    "parentSelectors": ["detail_link"],
    "regex": "(?<=又名: )[\\u4e00-\\u9fa5A-Za-z0-9/()\\s]+(?=IMDb)",
    "selector": "div#info",
    "type": "SelectorText"
  }, {
    "id": "pic",
    "multiple": false,
    "parentSelectors": ["container"],
    "selector": "img",
    "type": "SelectorImage"
  }]
}
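The startUrl in the sitemap above uses a range placeholder, [0-225:25], which stands for a series of list pages stepping the start parameter by 25. As a rough illustration of which pages that covers, the sketch below expands such a placeholder in plain Python. The function name expand_start_url is hypothetical, and the code only mimics the range syntax for this one URL (it ignores zero-padded ranges); it is not Web Scraper's own implementation.

```python
import re


def expand_start_url(pattern: str) -> list[str]:
    """Expand one [start-end:step] placeholder into concrete URLs.

    Assumption: the end value is inclusive and the step defaults to 1,
    which matches how the sitemap above is used for the ten list pages.
    """
    match = re.search(r"\[(\d+)-(\d+)(?::(\d+))?\]", pattern)
    if match is None:
        # No placeholder: the pattern is already a single URL.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3)) if match.group(3) else 1
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{value}{suffix}" for value in range(start, end + 1, step)]


urls = expand_start_url("https://movie.douban.com/top250?start=[0-225:25]&filter=")
for url in urls:
    print(url)
```

With 25 movies per page, the ten expanded URLs (start=0 through start=225) cover all 250 entries.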
Elasticsearch
{
  "mappings": {
    "properties": {
      "IMDb": {
        "type": "keyword",
        "copy_to": ["all"]
      },
      "alias": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "area": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "director": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "film_length": {
        "type": "long"
      },
      "id": {
        "type": "keyword"
      },
      "language": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "link": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "number": {
        "type": "long"
      },
      "photo": {
        "type": "keyword"
      },
      "review": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "score": {
        "type": "double"
      },
      "screenwriter": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "type": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        },
        "copy_to": ["all"],
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "year": {
        "type": "long"
      }
    }
  }
}
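Because the mapping above copies most text fields into all via copy_to, one match query on all can search names, directors, reviews and so on at once. The sketch below only builds such a request body as a Python dict; the function name build_all_field_query, the rating_floor parameter and the sample search text are made up for illustration, and how the body is sent (Kibana Dev Tools, a client library, SearchKit) is left open.

```python
import json


def build_all_field_query(text: str, rating_floor: float = 0.0, size: int = 10) -> dict:
    """Build a search body that matches `text` against the combined `all` field.

    `rating_floor` is a hypothetical convenience: it filters on the numeric
    `score` field from the mapping without affecting relevance scoring.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Analyzed at search time with the field's search_analyzer (ik_smart).
                "must": [{"match": {"all": text}}],
                # Filter context: no influence on _score.
                "filter": [{"range": {"score": {"gte": rating_floor}}}],
            }
        },
    }


body = build_all_field_query("宫崎骏", rating_floor=9.0)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Putting the rating condition in filter rather than must keeps the relevance score driven only by the text match.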
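Kibana
An ingest pipeline is needed to process the indexed fields, for example splitting the type field on spaces into an array; see the official documentation or other blog posts for details.
Building the dashboard is omitted here; please look it up yourself.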
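The article does not show its pipeline, so the following is only a guess at what the type step might look like: a trim processor followed by a split processor on whitespace, written as the dict you would send as the body of PUT _ingest/pipeline/<name>. The trim step is an assumption (the scraped value may carry leading or trailing spaces), and simulate_pipeline is a hypothetical local stand-in that mimics just these two processors so the expected result can be checked without a cluster.

```python
import re

# Candidate pipeline body (assumption, not the author's actual pipeline).
PIPELINE_BODY = {
    "description": "Split the scraped type string into an array",
    "processors": [
        {"trim": {"field": "type", "ignore_missing": True}},
        {"split": {"field": "type", "separator": "\\s+", "ignore_missing": True}},
    ],
}


def simulate_pipeline(doc: dict) -> dict:
    """Apply a local approximation of the trim and split processors to one document."""
    processed = dict(doc)
    for processor in PIPELINE_BODY["processors"]:
        (kind, config), = processor.items()
        value = processed.get(config["field"])
        if value is None:
            continue  # mirrors ignore_missing
        if kind == "trim":
            processed[config["field"]] = value.strip()
        elif kind == "split":
            processed[config["field"]] = re.split(config["separator"], value)
    return processed


sample = {"name": "肖申克的救赎", "type": " 犯罪 剧情 "}
print(simulate_pipeline(sample)["type"])  # ['犯罪', '剧情']
```

On a real cluster, the _simulate endpoint of the ingest API is the more reliable way to check a pipeline against sample documents before indexing.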
SearchKit
Queries become very slow while ES is bulk-writing data
charlesfang replied • 2 followers • 1 reply • 6379 views • 2023-04-06 11:35
Partially updating document fields in ES
duanxiaobiao replied • 2 followers • 1 reply • 6450 views • 2023-04-02 15:39
Elastic 7.10.0: scheduled restore throws "data too large"
Hyj_simple1 asked the question • 1 follower • 0 replies • 6312 views • 2023-03-31 14:17
How can ES run a query deduplicated on multiple fields?
charlesfang replied • 4 followers • 3 replies • 5728 views • 2023-03-28 16:26
Elasticsearch GC problem
mryu replied • 2 followers • 1 reply • 3506 views • 2023-03-27 11:41
PHP Elasticsearch queries are occasionally slow
zhengrukai replied • 2 followers • 2 replies • 1316 views • 2023-03-23 14:39
How does an Elasticsearch regexp query match special characters?
zhangcm replied • 2 followers • 1 reply • 3473 views • 2023-03-23 10:41