Learning to rank (software, datasets)
Published: 2019-06-12


Datasets for ranking (LETOR datasets)

  • Two large datasets from Microsoft. You’ll need much patience to download them, since Microsoft’s server serves them at about 1 Mbit/s or even slower.

    The only difference between these two datasets is the number of queries (10,000 and 30,000 respectively). They contain 136 feature columns, mostly filled with different term frequencies and similar statistics (the raw text of queries and documents is not available).

  • Apart from these, two older LETOR releases, published in 2008 and 2009, are available. Those datasets are smaller. From LETOR 4.0, the MQ-2007 and MQ-2008 sets are interesting (46 features each). MQ stands for Million Query.

  • A dataset from a ranking challenge organized in 2010. There are currently two versions: 1.0 (400 MB) and 2.0 (600 MB). The data is split into two sets.
  • There is also the Internet Mathematics 2009 (Интернет-Математика 2009) dataset, which is rather small (~100,000 query-document pairs in test and the same in train, 245 features).
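These datasets are distributed as plain text, one query-document pair per line. Below is a minimal parsing sketch. It assumes the common SVMlight-like layout `<label> qid:<query id> <index>:<value> ...`, with an optional trailing `#` comment and 1-based feature indices. The function names and sample lines are hypothetical. Following the counts above, `n_features` would be 136 for the Microsoft sets and 46 for MQ-2007/MQ-2008.

```python
from collections import defaultdict


def parse_letor_line(line, n_features):
    """Parse '<label> qid:<id> 1:<v> 2:<v> ... # comment' (assumed layout)."""
    # Drop an optional trailing comment such as '#docid = ...'.
    tokens = line.split("#", 1)[0].split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    # Features absent from the line are treated as 0.0.
    features = [0.0] * n_features
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)  # indices assumed 1-based
    return label, qid, features


def group_by_query(lines, n_features):
    """Group (label, features) rows by query id, keeping file order."""
    queries = defaultdict(list)
    for line in lines:
        if line.strip():
            label, qid, features = parse_letor_line(line, n_features)
            queries[qid].append((label, features))
    return dict(queries)


# Hypothetical sample lines, not taken from any real dataset.
sample = [
    "2 qid:1 1:3 2:0.5 3:1",
    "0 qid:1 1:0 2:0.1 3:0 #docid = doc-b",
    "1 qid:2 1:1 3:2",
]
queries = group_by_query(sample, n_features=3)
print(queries["1"])  # [(2, [3.0, 0.5, 1.0]), (0, [0.0, 0.1, 0.0])]
print(queries["2"])  # [(1, [1.0, 0.0, 2.0])]
```

Rows are grouped by query because ranking metrics and pairwise or listwise losses are computed within a query, not across the whole file.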

 

 

Algorithms

There are plenty of algorithms listed on Wikipedia, along with modifications created specifically for LETOR (with papers).

Implementations

Many algorithms have been developed, but checking most of them is a real problem, because there is no available implementation one can try. Yet new algorithms keep appearing, and their developers claim that the new algorithm provides the best results on all (or almost all) datasets.

This is, of course, hardly believable, especially given that most researchers don’t publish the code of their algorithms. In theory, one should publish not only the code of the algorithms, but the whole code of the experiment.

However, implementations of some algorithms are available (apart from regression, of course).

    1. The LEMUR.Ranklib project incorporates many algorithms in Java; it is the best option unless you need an implementation of something specific. It currently contains
      MART (= GBRT), RankNet, RankBoost, AdaRank, Coordinate Ascent, LambdaMART and ListNet.
    2. An online learning to rank framework written in Python. It also has a less detailed but longer list of datasets.
    3. on learning to rank
    4. (in Python, written specifically for a Kaggle ranking competition)
    5. Part of the Xapian project; this library was developed during GSoC 2014. Though I haven’t found anything on ranking in the documentation, some implementations can be found in the C++ code: https://github.com/xapian/xapian/tree/master/xapian-letor and https://github.com/v-hasu/xapian/tree/master/xapian-letor
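RankNet, one of the rankers listed for RankLib, is a pairwise method. It learns from pairs of documents from the same query whose relevance labels differ. The sketch below computes only its pairwise cross-entropy loss for a vector of model scores, following the usual formulation with a scale parameter `sigma`. It is not RankLib's code, and the function names are illustrative.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_loss(scores, labels, sigma=1.0):
    """Average RankNet pairwise loss over all label-ordered pairs of one query.

    For a pair (i, j) with labels[i] > labels[j], the modelled probability that
    i ranks above j is P = 1 / (1 + exp(-sigma * (s_i - s_j))), and the pair
    contributes -log P = log(1 + exp(-sigma * (s_i - s_j))).
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += softplus(-sigma * (scores[i] - scores[j]))
                n_pairs += 1
    # Queries whose documents all share one label yield no pairs.
    return total / n_pairs if n_pairs else 0.0


good = ranknet_loss([2.0, 1.0, 0.0], labels=[2, 1, 0])
bad = ranknet_loss([0.0, 1.0, 2.0], labels=[2, 1, 0])
print(good < bad)  # True
```

Identical scores give log 2 per pair, which is a handy sanity check. The loss shrinks as the scores order documents more consistently with their labels.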

 

  • (mlr) for MATLAB
  • in C++
  • rankers (probably these were used in Xapian)
  • in C++

A comparison of implementations is also available in a paper, though that paper was mainly about comparing nDCG implementations.
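nDCG implementations can disagree on details such as the gain (2^rel - 1 versus the raw label), the discount, and the score given to a query with no relevant documents. As a result, numbers produced by different tools are not always comparable. The sketch below implements nDCG@k with a switchable gain. It assumes a log2 discount and a score of 0 for queries without relevant documents; both are choices, not a standard. The function names are hypothetical.

```python
import math


def dcg(labels, k, exponential_gain=True):
    """DCG@k of relevance labels given in ranked order (log2 discount)."""
    total = 0.0
    for rank, rel in enumerate(labels[:k], start=1):
        gain = (2 ** rel - 1) if exponential_gain else rel
        total += gain / math.log2(rank + 1)
    return total


def ndcg(labels, k, exponential_gain=True):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg(sorted(labels, reverse=True), k, exponential_gain)
    # Convention choice: a query with no relevant documents scores 0 here.
    return dcg(labels, k, exponential_gain) / ideal if ideal > 0 else 0.0


ranked = [1, 2, 0]  # labels of the documents in the order a model ranked them
print(round(ndcg(ranked, k=3), 3))                          # 0.797
print(round(ndcg(ranked, k=3, exponential_gain=False), 3))  # 0.86
```

The same ranking yields about 0.797 with exponential gain and about 0.860 with linear gain, which is the kind of discrepancy that makes cross-tool comparisons tricky.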

Reposted from: https://www.cnblogs.com/energy1010/p/7261851.html
