API

Pandasticsearch: An Elasticsearch client exposing DataFrame API

Elasticsearch · onesuper posted an article • 0 comments • 248 views • 2016-11-08 18:02 • from related topics

https://github.com/onesuper/pandasticsearch
 
# Create a DataFrame object
from pandasticsearch import DataFrame
df = DataFrame.from_es('http://localhost:9200', index='people')

# Print the schema(mapping) of the index
df.print_schema()
# company
# |-- employee
#     |-- name: {'index': 'not_analyzed', 'type': 'string'}
#     |-- age: {'type': 'integer'}
#     |-- gender: {'index': 'not_analyzed', 'type': 'string'}

# Inspect the columns
df.columns
# ['name', 'age', 'gender']

# Get the column
df.name
# Column('name')

# Filter
df.filter(df.age < 13).collect()
# [Row(age=12,gender='female',name='Alice'), Row(age=11,gender='male',name='Bob')]

# Project
df.filter(df.age < 25).select('name', 'age').collect()
# [Row(age=12,name='Alice'), Row(age=11,name='Bob'), Row(age=13,name='Leo')]

# Print the rows into console
df.filter(df.age < 25).select('name').show(3)
# +-------+
# | name  |
# +-------+
# | Alice |
# | Bob   |
# | Leo   |
# +-------+

# Sort
df.sort(df.age.asc).select('name', 'age').collect()
# [Row(age=11,name='Bob'), Row(age=12,name='Alice'), Row(age=13,name='Leo')]

# Aggregate
df[df.gender == 'male'].agg(df.age.avg).collect()
# [Row(avg(age)=12)]

# Groupby
df.groupby('gender').collect()
# [Row(doc_count=1), Row(doc_count=2)]

# Groupby and then aggregate
df.groupby('gender').agg(df.age.max).collect()
# [Row(doc_count=1, max(age)=12), Row(doc_count=2, max(age)=13)]

# Convert to Pandas object for subsequent analysis
df[df.gender == 'male'].agg(df.age.avg).to_pandas()
#    avg(age)
# 0        12
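
For readers who want to see what a DataFrame-style filter and projection might look like as a raw Elasticsearch request, here is a minimal sketch. It is not taken from pandasticsearch's internals: the helper names `range_filter` and `search_body` are hypothetical, and the body assumes Elasticsearch 2.x or later, where the `bool` query accepts a `filter` clause.

```python
import json

# Hypothetical helpers, for illustration only; not pandasticsearch's API.

def range_filter(field, op, value):
    """Build a range clause for a comparison such as `age < 13`."""
    ops = {'<': 'lt', '<=': 'lte', '>': 'gt', '>=': 'gte'}
    return {'range': {field: {ops[op]: value}}}


def search_body(filter_clause, fields=None, size=20):
    """Wrap a filter clause in a search body, optionally projecting fields."""
    body = {'query': {'bool': {'filter': filter_clause}}, 'size': size}
    if fields:
        # `_source` restricts which document fields are returned.
        body['_source'] = list(fields)
    return body


# Rough counterpart of: df.filter(df.age < 25).select('name', 'age').show(3)
body = search_body(range_filter('age', '<', 25), fields=['name', 'age'], size=3)
print(json.dumps(body, indent=2, sort_keys=True))
```

The resulting dictionary could be sent as the JSON body of a `_search` request against the index; the client library spares you from writing it by hand.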

In an ES search, how do I find results where a field is an empty string (a literal “ ”, not null)?

Elasticsearch · edwardyang6936 asked a question • 1 follower • 0 replies • 263 views • 2016-11-04 17:54 • from related topics

Elasticsearch 5.0.0 Java API connection problem

Elasticsearch · 超超大猴子 replied to the question • 4 followers • 1 reply • 881 views • 2016-09-29 15:49 • from related topics

Data written to ElasticSearch through the Java API has no version

Elasticsearch · cloud_915 asked a question • 1 follower • 0 replies • 644 views • 2016-07-27 16:57 • from related topics

How do I compute the sum of two fields in a mapping and then sort by that sum?

Elasticsearch · martindu replied to the question • 2 followers • 1 reply • 509 views • 2016-06-02 16:00 • from related topics

How can I tell which element of an array matched the condition?

Elasticsearch · martindu replied to the question • 2 followers • 1 reply • 376 views • 2016-06-02 15:57 • from related topics

How do I increment a value in ElasticSearch?

Elasticsearch · allen replied to the question • 2 followers • 1 reply • 359 views • 2016-05-28 13:42 • from related topics

I am on ElasticSearch 2.3. The API has a `should` keyword; what is it for?

Elasticsearch · qq123 replied to the question • 2 followers • 1 reply • 681 views • 2016-05-26 10:07 • from related topics
