Spark reports a scroll error when reading from ES

Elasticsearch | Author: EmperorisMe | Posted: 2020-02-17 | Views: 2533

The relevant log output is:
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot create scroll for query [udb_user/user/_search?sort=_doc&scroll=5m&size=50&preference=_shards%3A1%7C_local&track_total_hits=true/{"query":{"bool":{"must":[{"match_all":{}}],"filter":[{"exists":{"field":"id"}},{"match":{"id":12003}}]}},"_source":["gmt_create","age","personal_signature","id","email","birthday","disabled","school","major","education","mobile","gmt_modified","real_name","employment_status","pwd_need_perfect","province","wechat_nick","head_icon","occupation","account","region","password","weixin","gender","qq","income_range","relevance_mobile","nick","card_no","card_type"]}]

The demo source code is:
package esspark

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.elasticsearch.spark.sql._

object ESSparkRead extends App {
  // Local Spark session configured to reach the Elasticsearch node over HTTP
  val spark: SparkSession = SparkSession.builder().appName("ESSpark").master("local[4]")
    .config("es.index.auto.create", "true")
    .config("es.nodes", "192.168.0.11")
    .config("es.port", "9200")
    .config("es.nodes.wan.only", "true")
    .getOrCreate()

  // Load the udb_user/user resource as a DataFrame with filter pushdown enabled
  private val df: DataFrame = spark.read.format("es")
    .option("pushdown", true)
    .load("udb_user/user")
  df.show(10)
  spark.stop()
}
