How to paginate efficiently in a web framework (scroll)

Elasticsearch | Author: i7990X | Posted October 25, 2016 | Views: 8947

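I am doing full-text search in the Django web framework. I first fetch the entire result set of the es.search query and then paginate it with Paginator. I want the kind of behavior Baidu has, where clicking page N shows the content of page N. That is how it works now, but once the result set gets large it becomes extremely slow.
I don't know how to use the scan search type and the scroll API to paginate and fetch results efficiently.
Could someone share an example pagination function? Many thanks.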
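
For comparison with fetching everything and slicing it in Paginator, here is a minimal sketch of asking Elasticsearch for one page at a time with `from` and `size`. The function name `search_page`, the index name, and the 1-based `page_num` are assumptions for illustration, and `es` is assumed to be an elasticsearch-py client whose `search` accepts a `body` argument. Keep in mind that Elasticsearch rejects requests where `from + size` exceeds `index.max_result_window` (10,000 by default), so this approach only covers the first pages of a very large result set.

```python
def search_page(es, index, query, page_num, page_size=10):
    """Return (hits, total) for a 1-based page number using from/size.

    `es` is assumed to be an elasticsearch-py client; `index` and `query`
    are whatever the application already passes to es.search.
    """
    if page_num < 1:
        raise ValueError("page_num is 1-based")
    body = {
        "query": query,
        "from": (page_num - 1) * page_size,  # number of hits to skip
        "size": page_size,                   # number of hits to return
    }
    resp = es.search(index=index, body=body)
    hits = resp["hits"]["hits"]
    total = resp["hits"]["total"]
    # Newer Elasticsearch versions report the total as {"value": ..., "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    return hits, total
```

Only `page_size` documents travel over the network per request, and `total` can be used to render the page links.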

6 replies

medcl - Hunting tigers tonight.

Upvoted by: leighton_buaa, Xargin

scroll is for traversing a result set and is not suited to full-text search. If you can sort by specific criteria, try search after, a new feature in 5.0.
https://www.elastic.co/guide/e ... .html
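
A minimal sketch of what search after paging might look like from Python. The function name `search_after_page` and its parameters are hypothetical, and `es` is assumed to be an elasticsearch-py client whose `search` accepts a `body` argument. The `sort` clause should end with a field that is unique per document so ties are broken consistently. Note that search after only moves forward from the last hit of the previous page, so it suits "next page" navigation and cannot jump directly to an arbitrary page number.

```python
def search_after_page(es, index, query, sort, page_size=10, after=None):
    """Fetch one page of hits with search_after.

    `after` is the "sort" array of the last hit on the previous page,
    or None for the first page. Returns (hits, next_after).
    """
    body = {"query": query, "sort": sort, "size": page_size}
    if after is not None:
        body["search_after"] = after
    resp = es.search(index=index, body=body)
    hits = resp["hits"]["hits"]
    # Every hit of a sorted search carries its sort values; the last one
    # is the cursor for the following page.
    next_after = hits[-1]["sort"] if hits else None
    return hits, next_after
```

In a web view, `next_after` would have to be carried to the next request, for example in the "next page" link.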

leighton_buaa

    def _scroll_search_by_page(self, index, doc_type, query_dsl, page_num, page_size=incidents_per_page, scroll='1m', routing=None, **kwargs):
        # Walk the scroll cursor page by page until the zero-based page_num is reached.
        # incidents_per_page and logger are not defined in this snippet; they are
        # assumed to exist elsewhere in the module.
        search_args = dict(body=query_dsl, scroll=scroll, index=index, doc_type=doc_type, size=page_size)
        if routing is not None:
            search_args['routing'] = routing
        resp = self.es.search(**search_args)
        scroll_id = resp.get('_scroll_id')
        if scroll_id is None:
            return
        first_page_ret = resp["hits"]["hits"]
        idx = 0
        if page_num == idx:
            return first_page_ret
        while True:
            idx += 1
            # Each scroll call returns the next page_size hits.
            resp = self.es.scroll(scroll_id, scroll=scroll)
            if resp["_shards"]["failed"]:
                logger.warning('Scroll request has failed on %d shards out of %d.', resp['_shards']['failed'], resp['_shards']['total'])
            scroll_id = resp.get('_scroll_id')
            if page_num == idx:
                return resp["hits"]["hits"]
            if scroll_id is None or not resp['hits']['hits']:
                break
        # page_num is past the last page: fall back to the first page.
        return first_page_ret
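
The function above starts a new scroll on every call and walks forward to the requested page, so the cost grows with `page_num`, and the scroll context stays open on the cluster until the `scroll` timeout expires. As a hedged variation, here is a standalone generator that yields pages in order and releases the scroll context when it finishes. The name `iter_scroll_pages` is hypothetical, and `es` is assumed to be an elasticsearch-py client that provides `search`, `scroll`, and `clear_scroll` with these keyword arguments.

```python
def iter_scroll_pages(es, index, query_dsl, page_size=10, scroll="1m"):
    """Yield successive pages of hits, clearing the scroll context at the end."""
    resp = es.search(index=index, body=query_dsl, scroll=scroll, size=page_size)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield resp["hits"]["hits"]
            if scroll_id is None:
                break
            resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
            # The scroll id may change between calls; keep the latest one.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Runs when the generator is exhausted or closed early.
        if scroll_id is not None:
            es.clear_scroll(scroll_id=scroll_id)
```

A caller that needs only page N could combine this with `itertools.islice` and stop iterating once that page is reached. That still reads every earlier page, which is why scroll fits exports and batch traversal better than page-number navigation.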