Dimensions and Metrics for Evaluating Recommendation Systems
Recommendation systems help users and developers of computer and software systems overcome information overload, perform information discovery tasks, and approximate computation, among other uses. They have recently become popular and have been applied in a wide variety of scenarios, ranging from business process modeling to source code manipulation. Because the application domains are so varied, different approaches and metrics have been adopted for their evaluation. In this chapter, we review a range of evaluation metrics and measures, as well as some approaches used for evaluating recommendation systems. The metrics are grouped under sixteen dimensions (e.g., correctness, novelty, and coverage) and reviewed according to the dimension to which each corresponds. We then give a brief overview of approaches to comprehensive evaluation that use collections of recommendation system dimensions and their associated metrics. We also provide suggestions for key future research and practice directions.
Notes
Editors’ note: This is the notion of macroevaluation; compare microevaluation.
Editors’ note: The general F-measure allows for unequal but specific costs.
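The two editors' notes refer to macroevaluation versus microevaluation and to the general F-measure. The sketch below illustrates both under the usual information-retrieval definitions, which are assumed here rather than taken from the chapter text: macroevaluation averages a metric computed per user, microevaluation pools the counts across users before computing it, and the general F-measure weights recall against precision with a parameter beta. All function names and the sample data are hypothetical.

```python
from typing import Hashable, Iterable, List, Set, Tuple

# One evaluation run per user: (recommended items, truly relevant items).
Run = Tuple[Iterable[Hashable], Set[Hashable]]


def precision_recall(recommended: Iterable[Hashable],
                     relevant: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of one recommendation list."""
    rec = set(recommended)
    hits = len(rec & relevant)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-measure: beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_precision(runs: List[Run]) -> float:
    """Macroevaluation: mean of the per-user precision values."""
    return sum(precision_recall(r, g)[0] for r, g in runs) / len(runs)


def micro_precision(runs: List[Run]) -> float:
    """Microevaluation: pool hits and recommendations over all users."""
    hits = sum(len(set(r) & g) for r, g in runs)
    total = sum(len(set(r)) for r, _ in runs)
    return hits / total if total else 0.0


# Hypothetical data: user 1 gets 1 of 2 right, user 2 gets 4 of 4 right.
runs: List[Run] = [
    (["a", "b"], {"a"}),
    (["c", "d", "e", "f"], {"c", "d", "e", "f"}),
]
macro = macro_precision(runs)   # (0.5 + 1.0) / 2
micro = micro_precision(runs)   # 5 hits / 6 recommendations
```

With these numbers the macro average weights each user equally, while the micro average is dominated by the user who received more recommendations, which is why the two can disagree.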
Author information
Authors and Affiliations
- Faculty of ICT, Centre for Computing and Engineering Software and Systems (SUCCESS), Swinburne University of Technology, Hawthorn, Australia Iman Avazpour & John Grundy
- Institute of Software Technology, Universität Stuttgart, Stuttgart, Germany Teerat Pitakrat & Lars Grunske