java Lucene 中自定义排序的实现

所属分类: 网络编程 / JSP编程 阅读数: 2100
收藏 0 赞 0 分享
Lucene中的自定义排序功能和Java集合中的自定义排序的实现方法差不多,都要实现一下比较接口. 在Java中只要实现Comparable接口就可以了.但是在Lucene中要实现SortComparatorSource接口和ScoreDocComparator接口.在了解具体实现方法之前先来看看这两个接口的定义吧.
SortComparatorSource接口的功能是返回一个用来排序ScoreDocs的comparator(Expert: returns a comparator for sorting ScoreDocs).该接口只定义了一个方法.如下:
Java代码
/**
* Creates a comparator for the field in the given index.
* @param reader - Index to create comparator for.
* @param fieldname - Field to create comparator for.
* @return Comparator of ScoreDoc objects.
* @throws IOException - If an error occurs reading the index.
*/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
view plaincopy to clipboardprint?
/**
* Creates a comparator for the field in the given index.
* @param reader - Index to create comparator for.
* @param fieldname - Field to create comparator for.
* @return Comparator of ScoreDoc objects.
* @throws IOException - If an error occurs reading the index.
*/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
/**
* Creates a comparator for the field in the given index.
* @param reader - Index to create comparator for.
* @param fieldname - Field to create comparator for.
* @return Comparator of ScoreDoc objects.
* @throws IOException - If an error occurs reading the index.
*/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
该方法只是创造一个ScoreDocComparator 实例用来实现排序.所以我们还要实现ScoreDocComparator 接口.来看看ScoreDocComparator 接口.功能是比较来两个ScoreDoc 对象来排序(Compares two ScoreDoc objects for sorting) 里面定义了两个Lucene实现的静态实例.如下:
Java代码
//Special comparator for sorting hits according to computed relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special comparator for sorting hits according to index order (document number).
public static final ScoreDocComparator INDEXORDER;
view plaincopy to clipboardprint?
//Special comparator for sorting hits according to computed relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special comparator for sorting hits according to index order (document number).
public static final ScoreDocComparator INDEXORDER;
//Special comparator for sorting hits according to computed relevance (document score).
public static final ScoreDocComparator RELEVANCE;

//Special comparator for sorting hits according to index order (document number).
public static final ScoreDocComparator INDEXORDER;
有3个方法与排序相关,需要我们实现 分别如下:
Java代码
/**
* Compares two ScoreDoc objects and returns a result indicating their sort order.
* @param i First ScoreDoc
* @param j Second ScoreDoc
* @return -1 if i should come before j;
* 1 if i should come after j;
* 0 if they are equal
*/
public int compare(ScoreDoc i,ScoreDoc j);
/**
* Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
* @param i Document
* @return Serializable object
*/
public Comparable sortValue(ScoreDoc i);
/**
* Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
* @return One of the constants in SortField.
*/
public int sortType();
view plaincopy to clipboardprint?
/**
* Compares two ScoreDoc objects and returns a result indicating their sort order.
* @param i First ScoreDoc
* @param j Second ScoreDoc
* @return -1 if i should come before j;
* 1 if i should come after j;
* 0 if they are equal
*/
public int compare(ScoreDoc i,ScoreDoc j);
/**
* Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
* @param i Document
* @return Serializable object
*/
public Comparable sortValue(ScoreDoc i);
/**
* Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
* @return One of the constants in SortField.
*/
public int sortType();
/**
     * Compares two ScoreDoc objects and returns a result indicating their sort order.
     * @param i First ScoreDoc
     * @param j Second ScoreDoc
     * @return -1 if i should come before j;
     * 1 if i should come after j;
     * 0 if they are equal
     */
    public int compare(ScoreDoc i,ScoreDoc j);
    /**
     * Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
     * @param i Document
     * @return Serializable object
     */
    public Comparable sortValue(ScoreDoc i);
    /**
     * Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
     * @return One of the constants in SortField.
     */
    public int sortType();
看个例子吧!
该例子为Lucene in Action中的一个实现,用来搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储.
Java代码
package com.nikee.lucene;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax * deltax + deltay * deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int compare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
view plaincopy to clipboardprint?
package com.nikee.lucene;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax * deltax + deltay * deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int compare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
package com.nikee.lucene;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
    private static final long serialVersionUID = 1L;

    // x y 用来保存 坐标位置
    private int x;
    private int y;

    public DistanceComparatorSource(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // 返回ScoreDocComparator 用来实现排序功能
    public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
        return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
    }

    //DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
    private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
        private float[] distances; // 保存每个餐馆到指定点的距离

        // 构造函数 , 构造函数在这里几乎完成所有的准备工作.
        public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
            System.out.println("fieldName2="+fieldname);
            final TermEnum enumerator = reader.terms(new Term(fieldname, ""));

            System.out.println("maxDoc="+reader.maxDoc());
            distances = new float[reader.maxDoc()]; // 初始化distances
            if (distances.length > 0) {
                TermDocs termDocs = reader.termDocs();
                try {
                    if (enumerator.term() == null) {
                        throw new RuntimeException("no terms in field " + fieldname);
                    }
                    int i = 0,j = 0;
                    do {
                        System.out.println("in do-while :" + i ++);
                        Term term = enumerator.term(); // 取出每一个Term
                        if (term.field() != fieldname) // 与给定的域不符合则比较下一个
                            break;

                        //Sets this to the data for the current term in a TermEnum.
                        //This may be optimized in some implementations.
                        termDocs.seek(enumerator); //参考TermDocs Doc
                        while (termDocs.next()) {
                            System.out.println(" in while :" + j ++);
                            System.out.println(" in while ,Term :" + term.toString());

                            String[] xy = term.text().split(","); // 去处x y
                            int deltax = Integer.parseInt(xy[0]) - x;
                            int deltay = Integer.parseInt(xy[1]) - y;
                            // 计算距离
                            distances[termDocs.doc()] = (float) Math.sqrt(deltax * deltax + deltay * deltay);
                        }
                    }
                    while (enumerator.next());
                } finally {
                    termDocs.close();
                }
            }
        }
        //有上面的构造函数的准备 这里就比较简单了
        public int compare(ScoreDoc i, ScoreDoc j) {
            if (distances[i.doc] < distances[j.doc])
                return -1;
            if (distances[i.doc] > distances[j.doc])
                return 1;
            return 0;
        }

        // 返回距离
        public Comparable sortValue(ScoreDoc i) {
            return new Float(distances[i.doc]);
        }

        //指定SortType
        public int sortType() {
            return SortField.FLOAT;
        }
    }

    public String toString() {
        return "Distance from (" + x + "," + y + ")";
    }
}
这是一个实现了上面两个接口的两个类, 里面带有详细注释, 可以看出 自定义排序并不是很难的. 该实现能否正确实现,我们来看看测试代码能否通过吧.
Java代码
package com.nikee.lucene.test;
import java.io.IOException;
import junit.framework.TestCase;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.RAMDirectory;
import com.nikee.lucene.DistanceComparatorSource;
public class DistanceComparatorSourceTest extends TestCase {
private RAMDirectory directory;
private IndexSearcher searcher;
private Query query;
//建立测试环境
protected void setUp() throws Exception {
directory = new RAMDirectory();
IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
addPoint(writer, "El Charro", "restaurant", 1, 2);
addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
addPoint(writer, "Los Betos", "restaurant", 9, 6);
addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);
writer.close();
searcher = new IndexSearcher(directory);
query = new TermQuery(new Term("type", "restaurant"));
}
private void addPoint(IndexWriter writer, String name, String type, int x, int y) throws IOException {
Document doc = new Document();
doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("type", type, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("location", x + "," + y, Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
}
public void testNearestRestaurantToHome() throws Exception {
//使用DistanceComparatorSource来构造一个SortField
Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(0, 0)));
Hits hits = searcher.search(query, sort); // 搜索
//测试
assertEquals("closest", "El Charro", hits.doc(0).get("name"));
assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
}
public void testNeareastRestaurantToWork() throws Exception {
Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(10, 10))); // 工作的坐标 10,10
//上面的测试实现了自定义排序,但是并不能访问自定义排序的更详细信息,利用
//TopFieldDocs 可以进一步访问相关信息
TopFieldDocs docs = searcher.search(query, null, 3, sort);
assertEquals(4, docs.totalHits);
assertEquals(3, docs.scoreDocs.length);
//取得FieldDoc 利用FieldDoc可以取得关于排序的更详细信息 请查看FieldDoc Doc
FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)), fieldDoc.fields[0]);
Document document = searcher.doc(fieldDoc.doc);
assertEquals("Los Betos", document.get("name"));
dumpDocs(sort, docs); // 显示相关信息
}
// 显示有关排序的信息
private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
System.out.println("Sorted by: " + sort);
ScoreDoc[] scoreDocs = docs.scoreDocs;
for (int i = 0; i < scoreDocs.length; i++) {
FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
Float distance = (Float) fieldDoc.fields[0];
Document doc = searcher.doc(fieldDoc.doc);
System.out.println(" " + doc.get("name") + " @ (" + doc.get("location") + ") -> " + distance);
}
}
}
更多精彩内容其他人还在看

weblogic 8.1下重新编译java类但不用重启服务器的方法

weblogic 8.1下重新编译java类但不用重启服务器的方法
收藏 0 赞 0 分享

JSP下动态INCLUDE与静态INCLUDE的区别分析

这篇文章给大家介绍了JSP下动态INCLUDE与静态INCLUDE的区别分析,非常不错,具有一定的参考借鉴价值,需要的朋友参考下吧
收藏 0 赞 0 分享

jsp中文乱码 jsp mysql 乱码的解决方法

当使用JSP页面将中文数据添加到MySql数据库中的时候发现变为乱码,或者从mysql中读取中文的时候出现乱码,这些问题根源都是由于字符编码不一致造成的。本文介绍jsp mysql 乱码的解决方法,感兴趣的小伙伴们可以参考一下
收藏 0 赞 0 分享

Jsp页面实现文件上传下载类代码第1/2页

Jsp页面实现文件上传下载类代码
收藏 0 赞 0 分享

下载完成后页面不自动关闭的方法

其实就一句话js代码,window.close()
收藏 0 赞 0 分享

JBuilder2005实现重构

JBuilder2005实现重构
收藏 0 赞 0 分享

CORBA对象生命周期

CORBA对象生命周期
收藏 0 赞 0 分享

基于Java的代理设计模式

基于Java的代理设计模式
收藏 0 赞 0 分享

Java中四种XML解析技术

Java中四种XML解析技术
收藏 0 赞 0 分享

跨平台Java程序

跨平台Java程序
收藏 0 赞 0 分享
查看更多