Elasticsearch Analyzer: Built-in Analyzers (Part 2)

4. Simple Analyzer

The Simple Analyzer splits text on any character that is not a letter and lowercases every token (it is built on the Lower Case Tokenizer).
POST _analyze{"analyzer": "simple","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."}[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]4.1 DefinitionTokenizer

  • Lower Case Tokenizer
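The Simple Analyzer is nothing more than this tokenizer on its own. As a quick check, calling _analyze with the bare lowercase tokenizer should return the same tokens as the simple analyzer above; the request below is only a sketch that reuses the sample sentence from this article:

// using the tokenizer directly instead of the analyzer
POST _analyze
{
  "tokenizer": "lowercase",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}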
4.2 Configuration

The Simple Analyzer has no configuration parameters.
4.3 Experiment

The Simple Analyzer can be rebuilt as follows:
PUT /simple_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_simple": {
          "tokenizer": "lowercase",
          "filter": []
        }
      }
    }
  }
}

5. Stop Analyzer

The Stop Analyzer is the same as the Simple Analyzer, except that it adds a token filter that removes stop words; by default it uses the _english_ stop word list.
POST _analyze{"analyzer": "stop","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."}// 可以看到 非字母进行分词 并且转小写 然后 去除了停顿词[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]5.1 DefinitionTokenizer
  • Lower Case Tokenizer: lowercases the tokens
Token filters
  • Stop Token Filter: removes stop words, defaulting to the _english_ list
5.2 Configuration
  • stopwords: a pre-defined stop word list (defaults to _english_) or an array of stop words
  • stopwords_path: the path to a file containing stop words (see the sketch after this list)
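For stopwords_path, a minimal sketch could look like the following. The index name, analyzer name, and file path are made up for illustration; the file is resolved relative to the Elasticsearch config directory and is expected to contain one stop word per line:

// analysis/my_stopwords.txt is only an example path, relative to the config directory
PUT /stop_path_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_file_stop_analyzer": {
          "type": "stop",
          "stopwords_path": "analysis/my_stopwords.txt"
        }
      }
    }
  }
}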
5.3 Experiment

The following rebuilds the Stop Analyzer: the text is lowercased first, then stop words are filtered out:
PUT /stop_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "rebuilt_stop": {
          "tokenizer": "lowercase",
          "filter": ["english_stop"]
        }
      }
    }
  }
}

The stopwords parameter can also be set directly to specify the list of stop words to filter:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop",
          "stopwords": ["the", "over"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

[ quick, brown, foxes, jumped, lazy, dog, s, bone ]

6. Whitespace Analyzer

As the name suggests, the Whitespace Analyzer splits text on whitespace only and does not lowercase the tokens.
POST _analyze{"analyzer": "whitespace","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."}[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]6.1 DefinitionTokenizer
  • Whitespace Tokenizer
6.2 Configuration

The Whitespace Analyzer has no configuration parameters.
6.3 Experiment

The Whitespace Analyzer can be rebuilt as follows; token filters can be added as needed (a sketch with an added filter follows the rebuilt definition):
PUT /whitespace_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_whitespace": {
          "tokenizer": "whitespace",
          "filter": []
        }
      }
    }
  }
}
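For example, keeping whitespace tokenization while still lowercasing the output could look like the sketch below. The index and analyzer names are made up; lowercase is the built-in token filter:

// whitespace_lowercase_example and whitespace_lowercase are example names
PUT /whitespace_lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

Running _analyze with this analyzer on the sample sentence should still keep Brown-Foxes as a single token, but output it as brown-foxes.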
POST _analyze{"analyzer": "keyword","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."}//注意 这里并没有进行分词 而是原样输出[ The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. ]7.1 DefinitionTokennizer
  • Keyword Tokenizer
7.2 Configuration

The Keyword Analyzer has no configuration parameters.
7.3 Experiment

The Keyword Analyzer can be rebuilt as follows:
PUT /keyword_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_keyword": {
          "tokenizer": "keyword",
          "filter": []
        }
      }
    }
  }
}

8. Pattern Analyzer

The Pattern Analyzer splits text using a regular expression. Note that the regex should match the token separators, not the tokens themselves; by default it splits on \W+ (one or more non-word characters).
POST _analyze{"analyzer": "pattern","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."}// 默认是 按照 \w+ 正则[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]8.1 DefinitionTokennizer
  • Pattern Tokenizer
Token Filters
  • Lower Case Token Filter
  • Stop Token Filter (disabled by default)
8.2 Configuration

  • pattern: a Java regular expression, defaults to \W+
  • flags: Java regular expression flags
  • lowercase: whether terms should be lowercased, defaults to true
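As a configured example, the sketch below (the index and analyzer names are made up for illustration) splits on commas instead of the default \W+ and keeps lowercasing enabled:

// pattern_example and comma_analyzer are example names
PUT /pattern_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma_analyzer": {
          "type": "pattern",
          "pattern": ",",
          "lowercase": true
        }
      }
    }
  }
}

POST /pattern_example/_analyze
{
  "analyzer": "comma_analyzer",
  "text": "A,B,C"
}

This should produce the tokens [ a, b, c ].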
