Hibernate Search does not index email addresses?
Question:
I want to run full-text searches on email addresses in an entity indexed with Hibernate Search.
Consider the following entity "Person" with the indexed field "email":
Person.groovy
package com.example

import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.GenerationType
import javax.persistence.Id

import org.hibernate.search.annotations.Field
import org.hibernate.search.annotations.Indexed

@Entity
@Indexed
class Person {

    @Id
    @GeneratedValue(strategy=GenerationType.AUTO)
    Long id

    @Field
    String email
}
and the following repository:
SearchRepository.groovy
package com.example

import javax.persistence.EntityManager

import org.apache.lucene.search.Query
import org.hibernate.search.jpa.FullTextEntityManager
import org.hibernate.search.jpa.Search
import org.hibernate.search.query.dsl.QueryBuilder
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.stereotype.Repository

@Repository
class SearchRepository {

    @Autowired
    EntityManager entityManager

    FullTextEntityManager getFullTextEntityManager() {
        Search.getFullTextEntityManager(entityManager)
    }

    List<Person> findPeople(String searchText) {
        searchText = searchText.toLowerCase() + '*'
        QueryBuilder qb = fullTextEntityManager.searchFactory
            .buildQueryBuilder().forEntity(Person).get()
        Query query = qb
            .keyword()
            .wildcard()
            .onField('email')
            .matching(searchText)
            .createQuery()
        javax.persistence.Query jpaQuery =
            fullTextEntityManager.createFullTextQuery(query, Person)
        jpaQuery.resultList
    }
}
Then the following test fails:
SearchWildcardTest.groovy
package com.example

import javax.persistence.EntityManager

import org.hibernate.search.jpa.FullTextEntityManager
import org.hibernate.search.jpa.Search
import org.junit.Test
import org.junit.runner.RunWith
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.SpringApplicationConfiguration
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner
import org.springframework.transaction.annotation.Transactional

@RunWith(SpringJUnit4ClassRunner)
@SpringApplicationConfiguration(classes = HibernateSearchWildcardApplication)
@Transactional
class SearchWildcardTest {

    @Autowired
    SearchRepository searchRepo

    @Autowired
    PersonRepository personRepo

    @Autowired
    EntityManager em

    FullTextEntityManager getFullTextEntityManager() {
        Search.getFullTextEntityManager(em)
    }

    @Test
    void findTeamsByNameWithWildcard() {
        Person person = personRepo.save new Person(email: 'foo@bar.com')
        fullTextEntityManager.createIndexer().startAndWait()
        fullTextEntityManager.flushToIndexes()
        List<Person> people = searchRepo.findPeople('foo@bar.com')
        assert people.contains(person) // this assertion fails! Why?
    }
}
PersonRepository.groovy
package com.example
import org.springframework.data.repository.CrudRepository
interface PersonRepository extends CrudRepository<Person, Long>{
}
build.gradle
buildscript {
    ext {
        springBootVersion = '1.2.7.RELEASE'
    }
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
        classpath('io.spring.gradle:dependency-management-plugin:0.5.2.RELEASE')
    }
}

apply plugin: 'groovy'
apply plugin: 'eclipse'
apply plugin: 'spring-boot'
apply plugin: 'io.spring.dependency-management'

jar {
    baseName = 'hibernate-search-email'
    version = '0.0.1-SNAPSHOT'
}

sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile('org.springframework.boot:spring-boot-starter-data-jpa')
    compile('org.codehaus.groovy:groovy')
    compile('org.hibernate:hibernate-search:5.3.0.Final')
    testCompile('com.h2database:h2')
    testCompile('org.springframework.boot:spring-boot-starter-test')
}

task wrapper(type: Wrapper) {
    gradleVersion = '2.8'
}
Inspecting the Lucene index generated by the test with Luke shows the following:
It looks to me as if the email address "foo@bar.com" is not stored whole in the index, but is split into the two strings "foo" and "bar.com".
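The "Getting started" guide on the official Hibernate Search website states:
[...] The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. [...]
I must be missing something here, but I can't figure out what.
My questions:
- Why does my code not index the complete email address?
- How can I get the address indexed so that the test passes?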
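Answer
It seems the documentation does not correctly reflect a change in the underlying Lucene API.
[K]eeping email addresses and internet hostnames intact...
This used to be correct for the legacy StandardTokenizer, which has since changed on the Lucene side. Its former behavior can now be found in ClassicTokenizer.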
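To see why the split matters for the wildcard query in the question, here is a minimal plain-Java sketch. It does not use Lucene; `WildcardTermDemo`, `toRegex`, and `matchesAnyTerm` are hypothetical names introduced only for illustration, and the address foo@bar.com is assumed from the two tokens reported in the question. It models one thing: a wildcard query is compared against each indexed term as a whole, so a pattern containing "@" cannot match when the address was indexed as two separate terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class WildcardTermDemo {

    // Translate a wildcard pattern ('*' = any run of characters,
    // '?' = exactly one character) into a regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so '.' and '@' are matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // The pattern must match one complete term; it never spans two terms.
    static boolean matchesAnyTerm(List<String> indexedTerms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "foo@bar.com*";
        // Terms as seen in Luke with the default analyzer (address split).
        List<String> splitTerms = Arrays.asList("foo", "bar.com");
        // Terms as expected when the tokenizer keeps the address intact.
        List<String> intactTerms = Arrays.asList("foo@bar.com");
        System.out.println(matchesAnyTerm(splitTerms, query));  // false
        System.out.println(matchesAnyTerm(intactTerms, query)); // true
    }
}
```

With the split terms, neither "foo" nor "bar.com" matches "foo@bar.com*", which is consistent with the failing assertion in the test.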
So the following configuration should give you what you're after:
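import javax.persistence.Entity;
// Lucene 4.x package names (the Lucene line used by Hibernate Search 5.3)
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.standard.ClassicTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@Entity
@Indexed
// ClassicTokenizer keeps the address as one token; the filter lower-cases it.
@AnalyzerDef(
    name = "emailanalyzer",
    tokenizer = @TokenizerDef(factory = ClassicTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    }
)
class Person {
    // ...
    @Field
    @Analyzer(definition = "emailanalyzer")
    String email;
}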
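Note that lower-casing is also applied with this configuration. We will adjust the HSEARCH documentation accordingly. Thanks for spotting this!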
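The lower-casing matters for the query side as well. Hibernate Search does not run the analyzer over the text of a wildcard query, which is presumably why findPeople in the question calls toLowerCase() before appending "*". The plain-Java sketch below illustrates that interaction under simplifying assumptions: `LowerCaseWildcardDemo`, `indexTerm`, and `prefixWildcardMatches` are hypothetical stand-ins, not Hibernate Search or Lucene APIs, and only a trailing "*" is modelled.

```java
import java.util.Locale;

final class LowerCaseWildcardDemo {

    // Stand-in for index-time analysis of an address kept as one token:
    // the lower-case filter normalizes the stored term.
    static String indexTerm(String email) {
        return email.toLowerCase(Locale.ROOT);
    }

    // Stand-in for a trailing-wildcard query such as "foo@bar.com*":
    // the text before '*' is compared to the stored term exactly as typed.
    static boolean prefixWildcardMatches(String storedTerm, String wildcard) {
        if (!wildcard.endsWith("*")) {
            return storedTerm.equals(wildcard);
        }
        String prefix = wildcard.substring(0, wildcard.length() - 1);
        return storedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        String stored = indexTerm("Foo@Bar.com");
        // Mixed-case query text is not normalized, so it misses the term.
        System.out.println(prefixWildcardMatches(stored, "Foo@Bar.com*")); // false
        // Lower-casing the input first, as findPeople does, finds it.
        String normalized = "Foo@Bar.com".toLowerCase(Locale.ROOT) + "*";
        System.out.println(prefixWildcardMatches(stored, normalized));     // true
    }
}
```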
Awesome, @Gunnar! This worked for me, thanks a lot! – Riggs
Nice, glad to hear it! – Gunnar