Dimensions and Metrics for Evaluating Recommendation Systems

Recommendation systems support users and developers of various computer and software systems to overcome information overload, perform information discovery tasks, and approximate computation, among others. They have recently become popular and have attracted a wide variety of application scenarios ranging from business process modeling to source code manipulation. Due to this wide variety of application domains, different approaches and metrics have been adopted for their evaluation. In this chapter, we review a range of evaluation metrics and measures as well as some approaches used for evaluating recommendation systems. The metrics presented in this chapter are grouped under sixteen different dimensions, e.g., correctness, novelty, coverage. We review these metrics according to the dimensions to which they correspond. A brief overview of approaches to comprehensive evaluation using collections of recommendation system dimensions and associated metrics is presented. We also provide suggestions for key future research and practice directions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic €32.70 /Month

Buy Now

Price includes VAT (France)

eBook EUR 117.69 Price includes VAT (France)

Softcover Book EUR 158.24 Price includes VAT (France)

Hardcover Book EUR 158.24 Price includes VAT (France)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Similar content being viewed by others

Recommender Systems: Sources of Knowledge and Evaluation Metrics

Chapter © 2013

A Review of the Techniques and Evaluation Parameters for Recommendation Systems

Chapter © 2022

Evaluating Recommender Systems

Chapter © 2015

Notes

Editors’ note: This is the notion of macroevaluation ; compare microevaluation . Editors’ note: The general F-measure allows for unequal but specific costs.

References

  1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005). doi:10.1109/TKDE.2005.99 ArticleGoogle Scholar
  2. Adomavicius, G., Zhang, J.: Iterative smoothing technique for improving stability of recommender systems. In: Proceedings of the Workshop on Recommendation Utility Evaluation: Beyond RMSE. CEUR Workshop Proceedings, vol. 910, pp. 3–8 (2012a) Google Scholar
  3. Adomavicius, G., Zhang, J.: Stability of recommendation algorithms. ACM Trans. Inform. Syst. 30(4), 23:1–23:31 (2012b). doi:10.1145/2382438.2382442 Google Scholar
  4. Aïmeur, E., Brassard, G., Fernandez, J.M., Onana, F.S.M.: Alambic: a privacy-preserving recommender system for electronic commerce. Int. J. Inf. Security 7(5), 307–334 (2008). doi:10.1007/s10207-007-0049-3 ArticleGoogle Scholar
  5. Ashok, B., Joy, J., Liang, H., Rajamani, S.K., Srinivasa, G., Vangala, V.: DebugAdvisor: a recommender system for debugging. In: Proceedings of the European Software Engineering Conference/ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 373–382 (2009). doi:10.1145/1595696.1595766 Google Scholar
  6. Bell, R., Koren, Y., Volinsky, C.: Modeling relationships at multiple scales to improve accuracy of large recommender systems. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 95–104 (2007). doi:10.1145/1281192.1281206 Google Scholar
  7. Bonhard, P., Harries, C., McCarthy, J., Sasse, M.A.: Accounting for taste: using profile similarity to improve recommender systems. In: Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 1057–1066 (2006). doi:10.1145/1124772.1124930 Google Scholar
  8. Burke, R.: Hybrid web recommender systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web: Methods and Strategies of Web Personalization. Lecture Notes in Computer Science, vol. 4321, pp. 377–408. Springer, New York (2007). doi:10.1007/978-3-540-72079-9_12ChapterGoogle Scholar
  9. Burke, R., Ramezani, M.: Matching recommendation technologies and domains. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 367–386. Springer, New York (2011). doi:10.1007/978-0-387-85820-3_11ChapterGoogle Scholar
  10. Calandrino, J.A., Kilzer, A., Narayanan, A., Felten, E.W., Shmatikov, V.: “You might also like”: privacy risks of collaborative filtering. In: Proceedings of the IEEE Symposium on Security and Privacy, pp. 231–246 (2011). doi:10.1109/SP.2011.40 Google Scholar
  11. Candillier, L., Chevalier, M., Dudognon, D., Mothe, J.: Diversity in recommender systems: bridging the gap between users and systems. In: Proceedings of the International Conference on Advances in Human-Oriented and Personalized Mechanisms, Technologies, and Services, pp. 48–53 (2011) Google Scholar
  12. Canny, J.: Collaborative filtering with privacy. In: Proceedings of the IEEE Symposium on Security and Privacy, pp. 45–57 (2002). doi:10.1109/SECPRI.2002.1004361 Google Scholar
  13. Cheetham, W., Price, J.: Measures of solution accuracy in case-based reasoning systems. In: Proceedings of the European Conference on Case-Based Reasoning. Lecture Notes in Computer Science, vol. 3155, pp. 106–118 (2004). doi:10.1007/978-3-540-28631-8_9Google Scholar
  14. Cramer, H., Evers, V., Ramlal, S., Someren, M., Rutledge, L., Stash, N., Aroyo, L., Wielinga, B.: The effects of transparency on trust in and acceptance of a content-based art recommender. User Model. User-Adap. Interact. 18(5), 455–496 (2008). doi:10.1007/s11257-008-9051-3 ArticleGoogle Scholar
  15. Čubranić, D., Murphy, G.C., Singer, J., Booth, K.S.: Hipikat: a project memory for software development. IEEE Trans. Software Eng. 31(6), 446–465 (2005). doi:10.1109/TSE.2005.71 ArticleGoogle Scholar
  16. Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google news personalization: scalable online collaborative filtering. In: Proceedings of the International Conference on the World Wide Web, pp. 271–280 (2007). doi:10.1145/1242572.1242610 Google Scholar
  17. De Lucia, A., Fasano, F., Oliveto, R., Tortor, G.: Recovering traceability links in software artifact management systems using information retrieval methods. ACM Trans. Software Eng. Methodol. 16(4), 13:1–13:50 (2007). doi:10.1145/1276933.1276934 Google Scholar
  18. Dolques, X., Dogui, A., Falleri, J.R., Huchard, M., Nebut, C., Pfister, F.: Easing model transformation learning with automatically aligned examples. In: Proceedings of the European Conference on Modelling Foundations and Applications. Lecture Notes in Computer Science, vol. 6698, pp. 189–204 (2011). doi:10.1007/978-3-642-21470-7_14Google Scholar
  19. Dwork, C.: Differential privacy: a survey of results. In: Proceedings of the International Conference on Theory and Applications of Models of Computation. Lecture Notes in Computer Science, vol. 4978, pp. 1–19 (2008). doi:10.1007/978-3-540-79228-4_1MathSciNetGoogle Scholar
  20. Ge, M., Delgado-Battenfeld, C., Jannach, D.: Beyond accuracy: evaluating recommender systems by coverage and serendipity. In: Proceedings of the ACM Conference on Recommender Systems, pp. 257–260 (2010). doi:10.1145/1864708.1864761 Google Scholar
  21. George, T., Merugu, S.: A scalable collaborative filtering framework based on co-clustering. In: Proceedings of the IEEE International Conference on Data Mining (2005). doi:10.1109/ICDM.2005.14 Google Scholar
  22. Good, N., Schafer, J.B., Konstan, J.A., Borchers, A., Sarwar, B., Herlocker, J., Riedl, J.: Combining collaborative filtering with personal agents for better recommendations. In: Proceedings of the National Conference on Artificial Intelligence and the Conference on Innovative Applications of Artificial Intelligence, pp. 439–446 (1999) Google Scholar
  23. Han, P., Xie, B., Yang, F., Shen, R.: A scalable P2P recommender system based on distributed collaborative filtering. Expert Syst. Appl. 27(2), 203–210 (2004). doi:10.1016/j.eswa.2004.01.003 ArticleGoogle Scholar
  24. Happel, H.J., Maalej, W.: Potentials and challenges of recommendation systems for software development. In: Proceedings of the International Workshop on Recommendation Systems for Software Engineering, pp. 11–15 (2008). doi:10.1145/1454247.1454251 Google Scholar
  25. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 241–250 (2000). doi:10.1145/358916.358995 Google Scholar
  26. Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inform. Syst. 22(1), 5–53 (2004). doi:10.1145/963770.963772 ArticleGoogle Scholar
  27. Hernández del Olmo, F., Gaudioso, E.: Evaluation of recommender systems: a new approach. Expert Syst. Appl. 35(3), 790–804 (2008). doi:10.1016/j.eswa.2007.07.047 Google Scholar
  28. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inform. Syst. 20(4), 422–446 (2002). doi:10.1145/582415.582418 ArticleGoogle Scholar
  29. Karypis, G.: Evaluation of item-based top-N recommendation algorithms. In: Proceedings of the International Conference on Information and Knowledge Management, pp. 247–254 (2001). doi:10.1145/502585.502627 Google Scholar
  30. Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1–2), 81–93 (1938) ArticleMATHMathSciNetGoogle Scholar
  31. Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945) ArticleMATHMathSciNetGoogle Scholar
  32. Kille, B., Albayrak, S.: Modeling difficulty in recommender systems. In: Proceedings of the Workshop on Recommendation Utility Evaluation: Beyond RMSE. CEUR Workshop Proceedings, vol. 910, pp. 30–32 (2012) Google Scholar
  33. Kitchenham, B.A., Pfleeger, S.L.: Principles of survey research. Part 3: constructing a survey instrument. SIGSOFT Software Eng. Note. 27(2), 20–24 (2002). doi:10.1145/511152.511155 Google Scholar
  34. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009). doi:10.1109/MC.2009.263 ArticleGoogle Scholar
  35. Koychev, I., Schwab, I.: Adaptation to drifting user’s interests. In: Proceedings of the Workshop on Machine Learning in the New Information Age, pp. 39–46 (2000) Google Scholar
  36. Krishnamurthy, B., Malandrino, D., Wills, C.E.: Measuring privacy loss and the impact of privacy protection in web browsing. In: Proceedings of the Symposium on Usable Privacy and Security, pp. 52–63 (2007). doi:10.1145/1280680.1280688 Google Scholar
  37. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51(2), 181–207 (2003). doi:10.1023/A:1022859003006 ArticleMATHGoogle Scholar
  38. Lam, S.K., Riedl, J.: Shilling recommender systems for fun and profit. In: Proceedings of the International Conference on the World Wide Web, pp. 393–402 (2004). doi:10.1145/988672.988726 Google Scholar
  39. Lam, S.K.T., Frankowski, D., Riedl, J.: Do you trust your recommendations?: an exploration of security and privacy issues in recommender systems. In: Proceedings of the International Conference on Emerging Trends in Information and Communication Security. Lecture Notes in Computer Science, vol. 3995, pp. 14–29 (2006). doi:10.1007/11766155_2 Google Scholar
  40. Lathia, N., Hailes, S., Capra, L., Amatriain, X.: Temporal diversity in recommender systems. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 210–217 (2010). doi:10.1145/1835449.1835486 Google Scholar
  41. Le, Q.V., Smola, A.J.: Direct optimization of ranking measures. Technical Report (2007) [arXiv:0704.3359] Google Scholar
  42. Massa, P., Avesani, P.: Trust-aware recommender systems. In: Proceedings of the ACM Conference on Recommender Systems, pp. 17–24 (2007). doi:10.1145/1297231.1297235 Google Scholar
  43. McCarey, F., Ó Cinnéide, M., Kushmerick, N.: RASCAL: a recommender agent for agile reuse. Artif. Intell. Rev. 24(3–4), 253–276 (2005). doi:10.1007/s10462-005-9012-8 Google Scholar
  44. McNee, S.M.: Meeting user information needs in recommender systems. Ph.D. thesis, University of Minnesota (2006) Google Scholar
  45. McNee, S.M., Riedl, J., Konstan, J.A.: Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: Extended Abstracts of the ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 1097–1101 (2006). doi:10.1145/1125451.1125659 Google Scholar
  46. McSherry, F., Mironov, I.: Differentially private recommender systems: building privacy into the net. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 627–636 (2009). doi:10.1145/1557019.1557090 Google Scholar
  47. Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: Proceedings of the International Conference on Data Engineering, pp. 117–128 (2002). doi:10.1109/ICDE.2002.994702 Google Scholar
  48. Meyer, F., Fessant, F., Clérot, F., Gaussier, E.: Toward a new protocol to evaluate recommender systems. In: Proceedings of the Workshop on Recommendation Utility Evaluation: Beyond RMSE. CEUR Workshop Proceedings, vol. 910, pp. 9–14 (2012) Google Scholar
  49. Mobasher, B., Burke, R., Bhaumik, R., Williams, C.: Toward trustworthy recommender systems: an analysis of attack models and algorithm robustness. ACM Trans. Inter. Tech. 7(4), 23:1–23:38 (2007). doi:10.1145/1278366.1278372 Google Scholar
  50. Mockus, A., Herbsleb, J.D.: Expertise Browser: a quantitative approach to identifying expertise. In: Proceedings of the ACM/IEEE International Conference on Software Engineering, pp. 503–512 (2002). doi:10.1145/581339.581401 Google Scholar
  51. Nielsen, J.: Usability Engineering. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993) MATHGoogle Scholar
  52. O’Donovan, J., Smyth, B.: Trust in recommender systems. In: Proceedings of the International Conference on Intelligent User Interfaces, pp. 167–174 (2005). doi:10.1145/1040830.1040870 Google Scholar
  53. O’Mahony, M., Hurley, N., Kushmerick, N., Silvestre, G.: Collaborative recommendation: a robustness analysis. ACM Trans. Inter. Tech. 4(4), 344–377 (2004). doi:10.1145/1031114.1031116 ArticleGoogle Scholar
  54. Oxford Dictionaries: Oxford Dictionary of English. 3rd edn. Oxford: Oxford University Press, UK (2010) Google Scholar
  55. Ozok, A.A., Fan, Q., Norcio, A.F.: Design guidelines for effective recommender system interfaces based on a usability criteria conceptual model: results from a college student population. Behav. Inf. Technol. 29(1), 57–83 (2010). doi:10.1080/01449290903004012 ArticleGoogle Scholar
  56. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993) Google Scholar
  57. Ramakrishnan, N., Keller, B.J., Mirza, B.J., Grama, A.Y., Karypis, G.: Privacy risks in recommender systems. IEEE Internet Comput. 5(6), 54–62 (2001). doi:10.1109/4236.968832 ArticleGoogle Scholar
  58. Rashid, A.M., Albert, I., Cosley, D., Lam, S.K., McNee, S.M., Konstan, J.A., Riedl, J.: Getting to know you: learning new user preferences in recommender systems. In: Proceedings of the International Conference on Intelligent User Interfaces, pp. 127–134 (2002). doi:10.1145/502716.502737 Google Scholar
  59. Robillard, M.P.: Topology analysis of software dependencies. ACM Trans. Software Eng. Methodol. 17(4), 18:1–18:36 (2008). doi:10.1145/13487689.13487691 Google Scholar
  60. Robillard, M.P., Walker, R.J., Zimmermann, T.: Recommendation systems for software engineering. IEEE Software 27(4), 80–86 (2010). doi:10.1109/MS.2009.161 ArticleGoogle Scholar
  61. Rubens, N., Kaplan, D., Sugiyama, M.: Active learning in recommender systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 735–767. Springer, New York (2011). doi:10.1007/978-0-387-85820-3_23ChapterGoogle Scholar
  62. Said, A., Tikk, D., Shi, Y., Larson, M., Stumpf, K., Cremonesi, P.: Recommender systems evaluation: a 3D benchmark. In: Proceedings of the Workshop on Recommendation Utility Evaluation: Beyond RMSE. CEUR Workshop Proceedings, vol. 910, pp. 21–23 (2012) Google Scholar
  63. Salfner, F., Lenk, M., Malek, M.: A survey of online failure prediction methods. ACM Comput. Surv. 42(3), 10:1–10:42 (2010). doi:10.1145/1670679.1670680 Google Scholar
  64. Sandvig, J.J., Mobasher, B., Burke, R.: Robustness of collaborative recommendation based on association rule mining. In: Proceedings of the ACM Conference on Recommender Systems, pp. 105–112 (2007). doi:10.1145/1297231.1297249 Google Scholar
  65. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Application of dimensionality reduction in recommender system: a case study. Technical Report 00-043, Department of Computer Science & Engineering, University of Minnesota (2000) Google Scholar
  66. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the International Conference on the World Wide Web, pp. 285–295 (2001). doi:10.1145/371920.372071 Google Scholar
  67. Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 253–260 (2002). doi:10.1145/564376.564421 Google Scholar
  68. Schroder, G., Thiele, M., Lehner, W.: Setting goals and choosing metrics for recommender system evaluation. In: Proceedings of the Workshop on Human Decision Making in Recommender Systems and User-Centric Evaluation of Recommender Systems and Their Interfaces. CEUR Workshop Proceedings, vol. 811, pp. 78–85 (2011) Google Scholar
  69. Seminario, C.E., Wilson, D.C.: Robustness and accuracy tradeoffs for recommender systems under attack. In: Proceedings of the Florida Artificial Intelligence Research Society Conference, pp. 86–91 (2012) Google Scholar
  70. Shani, G., Gunawardana, A.: Evaluating recommendation systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 257–297. Springer, New York (2011). doi:10.1007/978-0-387-85820-3_8ChapterGoogle Scholar
  71. Simon, F., Steinbrückner, F., Lewerentz, C.: Metrics based refactoring. In: Proceedings of the European Conference on Software Maintenance and Reengineering, pp. 30–38 (2001). doi:10.1109/.2001.914965 Google Scholar
  72. Sinha, R., Swearingen, K.: The role of transparency in recommender systems. In: Extended Abstracts of the ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 830–831 (2002). doi:10.1145/506443.506619 Google Scholar
  73. Smyth, B., McClave, P.: Similarity vs. diversity. In: Proceedings of the International Conference on Case-Based Reasoning. Lecture Notes in Computer Science, vol. 2080, pp. 347–361 (2001). doi:10.1007/3-540-44593-5_25ArticleGoogle Scholar
  74. Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15(1), 72–101 (1904). doi:10.2307/1412159 ArticleGoogle Scholar
  75. Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Adv. Artif. Intell. 2009, 421425:1–421425:19 (2009). doi:10.1155/2009/421425 Google Scholar
  76. Thummalapenta, S., Xie, T.: PARSEWeb: a programmer assistant for reusing open source code on the web. In: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, pp. 204–213 (2007). doi:10.1145/1321631.1321663 Google Scholar
  77. Tintarev, N., Masthoff, J.: A survey of explanations in recommender systems. In: Proceedings of the IEEE International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces, pp. 801–810 (2007). doi:10.1109/ICDEW.2007.4401070 Google Scholar
  78. Weimer, M., Karatzoglou, A., Le, Q.V., Smola, A.: CoFi RANK : maximum margin matrix factorization for collaborative ranking. In: Proceedings of the Annual Conference on Neural Information Processing Systems, pp. 222–230 (2007) Google Scholar
  79. Yao, Y.Y.: Measuring retrieval effectiveness based on user preference of documents. J. Am. Soc. Inform. Sci. Technol. 46(2), 133–145 (1995). doi:10.1002/(SICI)1097-4571(199503)46:2⟨133::AID-ASI6⟩3.0.CO;2-Z ArticleGoogle Scholar
  80. Ye, Y., Fischer, G.: Reuse-conducive development environments. Automat. Software Eng. Int. J. 12(2), 199–235 (2005). doi:10.1007/s10515-005-6206-x ArticleGoogle Scholar
  81. Ziegler, C.N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the International Conference on the World Wide Web, pp. 22–32 (2005). doi:10.1145/1060745.1060754 Google Scholar

Author information

Authors and Affiliations

  1. Faculty of ICT, Centre for Computing and Engineering Software and Systems (SUCCESS), Swinburne University of Technology, Hawthorn, Australia Iman Avazpour & John Grundy
  2. Institute of Software Technology, Universität Stuttgart, Stuttgart, Germany Teerat Pitakrat & Lars Grunske
  1. Iman Avazpour