Abstract:

GitHub is a distributed and collaborative platform for developing and maintaining open-source projects. This social coding platform supports collaborative development, with or without coordination, through pull request and issue artefacts. When the number of issues submitted daily grows rapidly, especially in popular repositories, managing issues becomes more complicated. To help a repository's developers process issues, external contributors fix issues by submitting pull requests. On GitHub, a pull request is frequently linked to a submitted issue to show that a solution is in progress. Unfortunately, contributors may neglect or forget to link pull requests to their corresponding issues. Only a very small share of these links are established, whereas a large portion is missing from the development history. Moreover, even for senior developers, manually recovering the links between pull requests and issues from the evolutionary development history is a time-consuming, challenging, and error-prone task. In this article, we propose to build machine learning (ML) models that recover links between pull requests and their issues using two machine learning algorithms (KMeans and BIRCH) based on lexical and semantic weighting measurements. These models are evaluated on the PI-Link ground-truth dataset. The results show that pull request and issue links can be recovered with an accuracy of 91.5% using the BIRCH clustering algorithm.
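
The abstract does not specify how the lexical weighting is computed, so the following is only a minimal sketch of one plausible lexical measurement: TF-IDF weighting with cosine similarity between pull request and issue texts. It is not the authors' pipeline. In particular, it replaces the KMeans and BIRCH clustering step with a simple most-similar-issue match, and the function names and the toy issue and pull request texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF keeps the weight positive even for ubiquitous terms.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_issue_for_each_pr(issues, pull_requests):
    """Map each pull request id to (most similar issue id, similarity)."""
    issue_ids = list(issues)
    pr_ids = list(pull_requests)
    # Weight all texts together so IDF reflects the whole corpus.
    texts = [issues[i] for i in issue_ids] + [pull_requests[p] for p in pr_ids]
    vectors = tfidf_vectors(texts)
    issue_vecs = vectors[:len(issue_ids)]
    pr_vecs = vectors[len(issue_ids):]
    links = {}
    for pr_id, pr_vec in zip(pr_ids, pr_vecs):
        scores = [cosine(pr_vec, issue_vec) for issue_vec in issue_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        links[pr_id] = (issue_ids[best], scores[best])
    return links


# Hypothetical toy data, not taken from the PI-Link dataset.
issues = {
    "issue-1": "Login page crashes when the password field is empty",
    "issue-2": "Documentation typo in the installation guide",
}
pull_requests = {
    "pr-10": "Fix crash on empty password in login page",
    "pr-11": "Correct typo in installation documentation",
}

links = best_issue_for_each_pr(issues, pull_requests)
for pr_id, (issue_id, score) in links.items():
    print(pr_id, "->", issue_id, round(score, 3))
```

In a clustering setting such as the one proposed here, the same TF-IDF vectors would presumably serve as input features to KMeans or BIRCH, with a pull request and an issue treated as linked when they fall into the same cluster.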