Hibernate Search does not index email addresses?

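Problem description:

I want to run a full-text search on email addresses in an entity that is indexed with Hibernate Search.

Consider the following entity Person with the indexed field email: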

Person.groovy

package com.example 

import javax.persistence.Entity 
import javax.persistence.GeneratedValue 
import javax.persistence.GenerationType 
import javax.persistence.Id 

import org.hibernate.search.annotations.Field 
import org.hibernate.search.annotations.Indexed 

@Entity 
@Indexed 
class Person { 
    @Id 
    @GeneratedValue(strategy=GenerationType.AUTO) 
    Long id 

    @Field 
    String email 
} 

and the following repository:

SearchRepository.groovy

package com.example 

import javax.persistence.EntityManager 

import org.apache.lucene.search.Query 
import org.hibernate.search.jpa.FullTextEntityManager 
import org.hibernate.search.jpa.Search 
import org.hibernate.search.query.dsl.QueryBuilder 
import org.springframework.beans.factory.annotation.Autowired 
import org.springframework.stereotype.Repository 

@Repository 
class SearchRepository { 

    @Autowired 
    EntityManager entityManager 

    FullTextEntityManager getFullTextEntityManager() { 
     Search.getFullTextEntityManager(entityManager) 
    } 

    List<Person> findPeople(String searchText){ 
     searchText = searchText.toLowerCase()+'*' 
     QueryBuilder qb = fullTextEntityManager.searchFactory 
       .buildQueryBuilder().forEntity(Person).get() 
     Query query = 
       qb 
       .keyword() 
       .wildcard() 
       .onField('email') 
       .matching(searchText) 
       .createQuery() 

     javax.persistence.Query jpaQuery = 
       fullTextEntityManager.createFullTextQuery(query, Person) 

     jpaQuery.resultList 
    } 
} 

Then the following test fails:

SearchWildcardTest.groovy

package com.example 

import javax.persistence.EntityManager 

import org.hibernate.search.jpa.FullTextEntityManager 
import org.hibernate.search.jpa.Search 
import org.junit.Test 
import org.junit.runner.RunWith 
import org.springframework.beans.factory.annotation.Autowired 
import org.springframework.boot.test.SpringApplicationConfiguration 
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner 
import org.springframework.transaction.annotation.Transactional 

@RunWith(SpringJUnit4ClassRunner) 
@SpringApplicationConfiguration(classes = HibernateSearchWildcardApplication) 
@Transactional 
class SearchWildcardTest { 

    @Autowired 
    SearchRepository searchRepo 

    @Autowired 
    PersonRepository personRepo 

    @Autowired 
    EntityManager em 

    FullTextEntityManager getFullTextEntityManager() { 
     Search.getFullTextEntityManager(em) 
    } 

    @Test 
    void findTeamsByNameWithWildcard() { 
     Person person = personRepo.save new Person(email: 'foo@bar.com') 

     fullTextEntityManager.createIndexer().startAndWait() 
     fullTextEntityManager.flushToIndexes() 

     List<Person> people = searchRepo.findPeople('foo@bar.com') 

     assert people.contains(person) // this assertion fails! Why? 
    } 
} 

PersonRepository.groovy

package com.example 

import org.springframework.data.repository.CrudRepository 

interface PersonRepository extends CrudRepository<Person, Long>{ 
} 

build.gradle

buildscript { 
    ext { 
     springBootVersion = '1.2.7.RELEASE' 
    } 
    repositories { 
     mavenCentral() 
    } 
    dependencies { 
     classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}") 
     classpath('io.spring.gradle:dependency-management-plugin:0.5.2.RELEASE') 
    } 
} 

apply plugin: 'groovy' 
apply plugin: 'eclipse' 
apply plugin: 'spring-boot' 
apply plugin: 'io.spring.dependency-management' 

jar { 
    baseName = 'hibernate-search-email' 
    version = '0.0.1-SNAPSHOT' 
} 
sourceCompatibility = 1.8 
targetCompatibility = 1.8 

repositories { 
    mavenCentral() 
} 

dependencies { 
    compile('org.springframework.boot:spring-boot-starter-data-jpa') 
    compile('org.codehaus.groovy:groovy') 
    compile('org.hibernate:hibernate-search:5.3.0.Final') 
    testCompile('com.h2database:h2') 
    testCompile('org.springframework.boot:spring-boot-starter-test') 
} 

task wrapper(type: Wrapper) { 
    gradleVersion = '2.8' 
} 

Here is what Luke shows for the generated Lucene index after running the test:

[image: Luke view of the generated Lucene index]

It seems to me that the email address "foo@bar.com" is not stored whole in the index but is split into the two strings "foo" and "bar.com".

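The "Getting started" guide on the official Hibernate Search website states:

[...] The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. [...]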

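I must be missing something here, but I cannot figure out what.

My questions:

  • Why does my code not index the complete email address?
  • How can I get the address indexed so that the test passes?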

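It seems the documentation does not correctly reflect a change in the underlying Lucene API.

[K]eeping email addresses and internet hostnames intact...

This used to be true for the legacy StandardTokenizer, but its behavior has since changed on the Lucene side. The old behavior can now be found in ClassicTokenizer.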
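To make the failure concrete, here is a minimal stdlib-only sketch. It does not use Lucene or Hibernate Search; it only mimics the idea that a wildcard term has to match one single indexed term as a whole. The class and method names are hypothetical, the split terms are the ones reported from Luke, and both the single whole term (what ClassicTokenizer is expected to produce) and the query string foo@bar.com* are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: simulates term-level wildcard matching in plain Java.
final class WildcardTermDemo {

    // Translate a wildcard term (* = any run of characters, ? = one character) into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // The wildcard matches only if one single indexed term matches it entirely.
    static boolean matchesAnyTerm(List<String> indexedTerms, String wildcard) {
        Pattern p = toRegex(wildcard);
        return indexedTerms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        List<String> splitTerms = Arrays.asList("foo", "bar.com"); // terms as seen in Luke
        List<String> wholeTerm = Arrays.asList("foo@bar.com");     // assumed ClassicTokenizer result
        String query = "foo@bar.com*";                             // assumed search text plus '*'

        System.out.println(matchesAnyTerm(splitTerms, query)); // false
        System.out.println(matchesAnyTerm(wholeTerm, query));  // true
    }
}
```

With the address split into two terms, neither term on its own can match the full wildcard pattern, which is consistent with the failing assertion in the test.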

So the following configuration should give you what you are after:

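import javax.persistence.Entity;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.standard.ClassicTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

// Java syntax; in a Groovy source such as Person.groovy, annotation array values use [ ] rather than { }
@Entity
@Indexed
@AnalyzerDef(
    name = "emailanalyzer",
    // ClassicTokenizer keeps email addresses as a single token
    tokenizer = @TokenizerDef(factory = ClassicTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    }
)
class Person {

    // ...

    @Field
    @Analyzer(definition = "emailanalyzer")
    String email;
}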

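Note that lowercasing is also applied with this configuration. We will adjust the HSEARCH documentation accordingly, thanks for spotting this!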
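The lowercase filter matters for the wildcard query in findPeople: Hibernate Search does not run the analyzer on wildcard terms, so the search text has to be lowercased by hand, as the repository already does with toLowerCase(). The following stdlib-only sketch illustrates that under simplifying assumptions; the class, the method names and the mixed-case address Foo@Bar.com are hypothetical, and the prefix check merely stands in for a trailing '*' wildcard.

```java
import java.util.Locale;

// Hypothetical illustration only: index-time lowercasing versus an unanalyzed query term.
final class LowercaseTermDemo {

    // Stand-in for what a lowercase filter does to a token at index time.
    static String indexTerm(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Stand-in for a trailing-'*' wildcard: the query prefix is used verbatim, not analyzed.
    static boolean prefixMatches(String indexedTerm, String rawQueryPrefix) {
        return indexedTerm.startsWith(rawQueryPrefix);
    }

    public static void main(String[] args) {
        String indexed = indexTerm("Foo@Bar.com"); // stored as "foo@bar.com"

        // Raw user input keeps its capital letters and misses the lowercased term.
        System.out.println(prefixMatches(indexed, "Foo@Bar"));                           // false

        // Lowercasing the input first, as findPeople does, lets the prefix match.
        System.out.println(prefixMatches(indexed, "Foo@Bar".toLowerCase(Locale.ROOT))); // true
    }
}
```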


That's great, @Gunnar! This works for me, thanks a lot! – Riggs


Nice, glad to hear it! – Gunnar