Full-Text Search in Django with Whoosh

Date: 2022-04-02 10:26 Author: admin

Whoosh is a full-text search engine implemented in pure Python. It makes it easy to add full-text indexing to your documents.

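What is full-text search

Put simply, it has two parts: word segmentation and search. Take the following sentence as an example:

上次舞蹈演出直接在上海路的弄堂里 ("The last dance performance took place right in an alley off Shanghai Road")

Suppose we want to find that last performance. We would normally search for the keywords 上次演出 ("last performance"), but a traditional SQL LIKE query cannot match the sentence above, because 舞蹈 ("dance") sits between 上次 and 演出. A full-text search engine instead splits the text into tokens, like this:

上次/舞蹈/演出/直接/在/上海路/的/弄堂/里

The tokens are then used to build an inverted index, which maps each token to the documents that contain it, so a keyword lookup can find matching documents quickly.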
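The difference between the two approaches can be sketched with the standard library alone. This is a minimal illustration, not how Whoosh stores its index: the table name `docs`, the helper names `like_search`, `build_inverted_index` and `token_search`, and the choice of an in-memory SQLite database are all assumptions made for the example. The token list is the one shown above.

```python
import sqlite3
from collections import defaultdict

SENTENCE = "上次舞蹈演出直接在上海路的弄堂里"
# Token list copied from the article; a real system would get it from a tokenizer.
TOKENS = ["上次", "舞蹈", "演出", "直接", "在", "上海路", "的", "弄堂", "里"]


def like_search(keyword):
    """Substring search with SQL LIKE against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO docs (id, body) VALUES (1, ?)", (SENTENCE,))
    rows = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ?", ("%" + keyword + "%",)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def token_search(index, query_tokens):
    """Return ids of documents that contain every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in query_tokens]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index({1: TOKENS})
print(like_search("上次演出"))               # prints [] (not a contiguous substring)
print(token_search(index, ["上次", "演出"]))  # prints [1] (both tokens are indexed)
```

The LIKE query fails because it needs the keyword to appear as one contiguous substring, while the inverted index only needs each query token to appear somewhere in the document.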

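Solving the word segmentation problem

Word segmentation is technically difficult. One difficulty in the sentence above, for example, is deciding whether the token should be 上海路 (Shanghai Road) or 上海 (Shanghai). Python has a Chinese word segmentation library, jieba (结巴分词), which can handle the segmentation step of indexing. jieba ships a Whoosh component that can be integrated directly; the code at the end of this article uses it.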
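To see why the 上海路 versus 上海 decision depends on what the segmenter knows, here is a deliberately naive sketch using forward maximum matching, a simple dictionary-based technique. It is not how jieba works (jieba uses a more sophisticated, statistical approach), and the function name `forward_max_match`, the `max_len` parameter and the tiny word list are hypothetical, chosen only for this illustration.

```python
def forward_max_match(text, dictionary, max_len=3):
    """Greedy segmentation: at each position take the longest dictionary word."""
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Hypothetical mini dictionary; a real segmenter uses a far larger one.
BASE_WORDS = {"上次", "舞蹈", "演出", "直接", "上海", "弄堂"}
text = "上次舞蹈演出直接在上海路的弄堂里"

without_road = forward_max_match(text, BASE_WORDS)
with_road = forward_max_match(text, BASE_WORDS | {"上海路"})

print("/".join(without_road))  # 上海 and 路 come out as separate tokens
print("/".join(with_road))     # 上海路 comes out as one token
```

With 上海路 in the dictionary the output matches the token list shown earlier; without it, a search for 上海路 as a single token would find nothing. This is the kind of decision that a dedicated library such as jieba is meant to handle.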

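Problems encountered

If indexing is very slow when testing on a VPS, the cause may be insufficient memory. For example, building the index for a blog was very slow with 512MB of RAM; after upgrading to 1GB it worked normally.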

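Code

import logging
import os
import shutil

from django.conf import settings
from jieba.analyse import ChineseAnalyzer
from whoosh.fields import Schema, ID, TEXT
from whoosh.index import create_in, exists_in, open_dir
from whoosh.qparser import MultifieldParser

from .models import Article

log = logging.getLogger(__name__)

index_dir = os.path.join(settings.BASE_DIR, "whoosh_index")
# Open the index only if it already exists, so that importing this module
# does not fail before rebuild() has been run for the first time.
# The functions below still require rebuild() to have been run once.
indexer = open_dir(index_dir) if exists_in(index_dir) else None


def articles_search(keyword):
    """Return the id and slug of up to 15 articles matching the keyword."""
    # Search both fields; a match in the title is boosted by a factor of 5.
    mp = MultifieldParser(
        ['content', 'title'], schema=indexer.schema, fieldboosts={'title': 5.0})
    query = mp.parse(keyword)

    articles = []
    with indexer.searcher() as searcher:
        results = searcher.search(query, limit=15)
        for hit in results:
            log.debug(hit)
            articles.append({
                'id': hit['id'],
                'slug': hit['slug'],
            })
    return articles


def rebuild():
    """Delete the index directory and index every published article again."""
    # Without this declaration the assignment below would only create a local
    # variable, and the other functions would keep using the stale index.
    global indexer

    if os.path.exists(index_dir):
        shutil.rmtree(index_dir)
    os.makedirs(index_dir)

    # Use jieba's analyzer so that Chinese content is split into words.
    analyzer = ChineseAnalyzer()
    schema = Schema(
        id=ID(stored=True, unique=True),
        slug=TEXT(stored=True),
        title=TEXT(),
        content=TEXT(analyzer=analyzer))
    indexer = create_in(index_dir, schema)

    __index_all_articles()


def __index_all_articles():
    """Add every non-draft article to the index in a single commit."""
    writer = indexer.writer()
    published_articles = Article.objects.exclude(is_draft=True)
    for article in published_articles:
        writer.add_document(
            id=str(article.id),
            slug=article.slug,
            title=article.title,
            content=article.content,
        )
    writer.commit()


def article_update_index(article):
    """Update an article in the index, adding it if it is not there yet."""
    writer = indexer.writer()
    # update_document replaces any document with the same unique id.
    writer.update_document(
        id=str(article.id),
        slug=article.slug,
        title=article.title,
        content=article.content,
    )
    writer.commit()


def article_delete_index(article):
    """Remove an article from the index."""
    writer = indexer.writer()
    writer.delete_by_term('id', str(article.id))
    writer.commit()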

