Transfer Learning
Transfer learning is what happens when someone finds it much easier to learn to play chess having already learned to play checkers; or to recognize tables having already learned to recognize chairs; or to learn Spanish having already learned Italian. It emphasizes the transfer of knowledge across domains, tasks, and distributions that are similar but not the same.
Papers
Some early research papers on transfer learning are listed below.
Jürgen Schmidhuber (1995). On Learning How to Learn Learning Strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München.
Sebastian Thrun and Tom M. Mitchell (1995). Learning One More Thing. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1217-1225, Montreal, Canada.
Sebastian Thrun (1996). Is Learning The n-th Thing Any Easier Than Learning The First? Advances in Neural Information Processing Systems, 640-646, Denver, CO, USA.
Multi-Task Learning
Rich Caruana (1997). Multitask Learning. Machine Learning, 28(1), 41-75.
Jonathan Baxter (1997). A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling. Machine Learning, 28, 7-39.
Bart Bakker and Tom Heskes (2003). Task Clustering and Gating for Bayesian Multitask Learning. Journal of Machine Learning Research, 4, 83-99.
Shai Ben-David and Reba Schuller (2003). Exploiting Task Relatedness for Multiple Task Learning. In Proceedings of the Sixteenth Annual Conference on Learning Theory, 567-580, Washington, DC, USA.
Tony Jebara (2004). Multi-Task Feature and Kernel Selection for SVMs. In Proceedings of the Twenty-first International Conference on Machine Learning, 329-336, Banff, Alberta, Canada.
Theodoros Evgeniou and Massimiliano Pontil (2004). Regularized Multi-Task Learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 109-117, Seattle, WA, USA.
Charles A. Micchelli and Massimiliano Pontil (2004). Kernels for Multi-task Learning. Advances in Neural Information Processing Systems 17, 921-928, Vancouver, British Columbia, Canada.
Neil D. Lawrence and John C. Platt (2004). Learning to Learn with the Informative Vector Machine. In Proceedings of the Twenty-first International Conference on Machine Learning, 178-185, Banff, Alberta, Canada.
Theodoros Evgeniou, Charles A. Micchelli and Massimiliano Pontil (2005). Learning Multiple Tasks with Kernel Methods. Journal of Machine Learning Research, 6, 615-637.
Rie Kubota Ando and Tong Zhang (2005). A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, 6, 1817-1853.
Kai Yu, Volker Tresp and Anton Schwaighofer (2005). Learning Gaussian Processes from Multiple Tasks. In Proceedings of the Twenty-second International Conference on Machine Learning, 1012-1019, Bonn, Germany.
Andreas Argyriou, Theodoros Evgeniou and Massimiliano Pontil (2006). Multi-Task Feature Learning. Advances in Neural Information Processing Systems 19, Vancouver, British Columbia, Canada.
Guillaume Obozinski, Ben Taskar and Michael I. Jordan (2006). Multi-Task Feature Selection. Technical Report, Department of Statistics, University of California, Berkeley.
Meta Learning
Philip K. Chan and Salvatore J. Stolfo (1993a). Toward Parallel and Distributed Learning by Meta-Learning. In AAAI Workshop in Knowledge Discovery in Databases, 227-240, Washington, DC, USA.
Philip K. Chan and Salvatore J. Stolfo (1993b). Experiments on Multi-Strategy Learning by Meta-Learning. In Proceedings of the Second International Conference on Information and Knowledge Management, 314-323, Washington, DC, USA.
Jürgen Schmidhuber, Jieyu Zhao and Marco Wiering (1996). Simple Principles of Metalearning. Technical Report IDSIA-69-96, IDSIA.
Philip K. Chan and Salvatore J. Stolfo (1997). On the Accuracy of Meta-learning for Scalable Data Mining. Journal of Intelligent Information Systems, 8(1), 5-28.
Andreas L. Prodromidis, Philip K. Chan and Salvatore J. Stolfo (2000). Meta-Learning in Distributed Data Mining Systems: Issues and Approaches. Book on Advances of Distributed Data Mining, edited by Hillol Kargupta and Philip Chan, AAAI Press.
Chuong B. Do and Andrew Y. Ng (2005). Transfer Learning for Text Classification. Advances in Neural Information Processing Systems 18, 299-306, Vancouver, British Columbia, Canada.
Domain Adaptation
Hal Daumé III and Daniel Marcu (2006). Domain Adaptation for Statistical Classifiers. Journal of Artificial Intelligence Research, Vol. 26, 101-126.
Shai Ben-David, John Blitzer, Koby Crammer and Fernando Pereira (2006). Analysis of Representations for Domain Adaptation. Advances in Neural Information Processing Systems 19, Vancouver, British Columbia, Canada.
Hal Daumé III (2007). Frustratingly Easy Domain Adaptation. To appear in Proceedings of the Forty-fifth Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.
Lilyana Mihalkova, Tuyen Huynh and Raymond J. Mooney (2007). Mapping and Revising Markov Logic Networks for Transfer Learning. To appear in Proceedings of the Twenty-second National Conference on Artificial Intelligence, Vancouver, British Columbia, Canada.
Sample Selection Bias & Covariate Shift
James J. Heckman (1979). Sample Selection Bias as a Specification Error. Econometrica, Vol. 47, 153-161.
Hidetoshi Shimodaira (2000). Improving Predictive Inference under Covariate Shift by Weighting the Log-likelihood Function. Journal of Statistical Planning and Inference, 90, 227-244.
Bianca Zadrozny (2004). Learning and Evaluating Classifiers under Sample Selection Bias. In Proceedings of the Twenty-first International Conference on Machine Learning, 114-121, Banff, Alberta, Canada.
Miroslav Dudík, Robert Schapire and Steven Phillips (2005). Correcting Sample Selection Bias in Maximum Entropy Density Estimation. Advances in Neural Information Processing Systems 18, 323-330, Vancouver, British Columbia, Canada.
Steffen Bickel and Tobias Scheffer (2006). Dirichlet-Enhanced Spam Filtering based on Biased Samples. Advances in Neural Information Processing Systems 19, Vancouver, British Columbia, Canada. (workshop version)
Jiayuan Huang, Alex Smola, Arthur Gretton, Karsten M. Borgwardt and Bernhard Schölkopf (2006). Correcting Sample Selection Bias by Unlabeled Data. Advances in Neural Information Processing Systems 19, Vancouver, British Columbia, Canada.
Auxiliary Data
Koji Tsuda, Shotaro Akaho and Kiyoshi Asai (2003). The EM Algorithm for Kernel Matrix Completion with Auxiliary Data. Journal of Machine Learning Research, 4, 67-81.
Pengcheng Wu and Thomas G. Dietterich (2004). Improving SVM Accuracy by Training on Auxiliary Data Sources. In Proceedings of the Twenty-first International Conference on Machine Learning, 110-117, Banff, Alberta, Canada.
Xuejun Liao, Ya Xue and Lawrence Carin (2005). Logistic Regression with an Auxiliary Data Source. In Proceedings of the Twenty-second International Conference on Machine Learning, 505-512, Bonn, Germany.
Slides
Our Work
Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu (2007). Transferring Naive Bayes Classifiers for Text Classification. To appear in Proceedings of the Twenty-Second National Conference on Artificial Intelligence, Vancouver, British Columbia, Canada.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu (2007). Boosting for Transfer Learning. To appear in Proceedings of the Twenty-Fourth International Conference on Machine Learning, Corvallis, Oregon, USA.
