Applications of data mining in software engineering

Abstract

Software engineering processes are complex, and the related activities often produce a large number and variety of artefacts, making them well-suited to data mining. Recent years have seen an increase in the use of data mining techniques on such artefacts with the goal of analysing and improving software processes for a given organisation or project. After a brief survey of current uses, we offer insight into how data mining can make a significant contribution to the success of current software engineering efforts.
