Presentation is loading. Please wait.

Presentation is loading. Please wait.

Version Control System Based DSNs

Similar presentations


Presentation on theme: "Version Control System Based DSNs"— Presentation transcript:

1 Version Control System Based DSNs
栾明明

2 Construction Rule: if two developers have both committed the same file or module, there is a link between them in the DSN. 这个观点是在04年的时候第一次提出来的,构建出了一种带有权重的无向开发者社交网络。如果两个开发者对一个相同的模块有所贡献的话,他们之间就可以建立一个连接,权重就代表有两个开发者提交记录的模块数目。

3 Data source: CVS、SVN、Git…
Construction Data source: CVS、SVN、Git… 通常,我们可以根据日志的变化来构建DSN,这种类型的方法主要关注的是在一个项目中的一些活动。而由日志变化我们直接想到的就是可以根据VCS来收集历史记录,软件开发中最重要的结果就是代码,这种方式正好可以帮助我们代码修改记录。开发者们完成编码或是修复bug后就会把文件提交到VCS上,VCS根据修改的版本就会更新代码库,并且日志记录提交的文件,这样开发者就可以通过修改的文件来建立联系了。 The source code may be the most important outcome of software development.

4 Construction Granularity
在基于VCS的DSN中,通常需要确定一个构建的粒度,不能太粗也不能太细。太过粗略的话,导致构建的社交网络过于复杂,看不出开发者之间的联系。太过细小的话导致开发者之间的联系较少,也会看不出关系 例如:许多文件太小了,每天对于这些文件只有很少的修改记录,这样导致开发者很少修改相同的文件。这个时候,我们可以选择粗一些的粒度,如模块或者类等。一般情况下,以模块为单位的DSN比以文件为单位的要密一些。 所以,连接粒度要根据具体的情形和实际的项目数据来看。 It should be determined by the size of a file, the frequency of commits, and the responsibilities of developers.

5 weighted graph or unweighted graph
Construction weighted graph or unweighted graph 权重就代表了开发者之间的关系强度。当然,也有一些无权重的DSN,这样做就简化了分析的任务量来减轻成本。 The weights of edges in DSNs indicate the strengths of developer relationships.

6 Construction 1.The weight can represent the number of common files that two developers have both committed. Reference: A. Meneely and L. Williams. Socio-technical developer networks: should we trust our measurements?

7 Construction The weights of edges are defined as the number of source code files in the system that the developers did not work on together. For example, if there are 1000 files in a system, and two developers worked on 10 different source code files together, then the weight on the edge (representing distance) would be 990.

8 Construction version control logs developer network

9 Construction 2.The weight can be defined as the number of commits performed by both developers to all common modules. Reference:L. Lopez-Fernandez, G. Robles, and J. M. Gonzalez-Barahona. Applying social network analysis to the information in cvs repositories.

10 Construction Each vertex corresponds to a particular committer (usually, a developer of the project). Two committers are linked when they have contributed to at least one common module, being the weight of the corresponding edge the number of commits performed by both developers to all common modules. 这篇论文里提到一个不同于其他DSN的方式,就是基于模块的社交网络。两个模块之间存在关联的前提是它们被同一个committer修改。

11 Construction 3.The weight can be considered as a kind of metric and researchers compute the weight by complex formula. a开发者修改的文件集合 a开发者在某个版本中修改的文件集合 通过一个函数计算出每一个开发者对每一个文件的贡献度的大小。 Definition 1: λa is the set of files modified by author a within the context of the data set. Definition 2: λa,i is the set of files modified by author a in a given revision, i. Definition 3: σ is a function that returns a vector containing measurements for an author’s contributions to each file within the data set. (for example,σ(λa)) Reference: A. C. MacLean and C. D. Knutson. Apache commits: social network dataset.

12 Construction file measurement 1 2 3 4 . 23 197 2 .
这仅仅是一个开发者的commit行为,无论开发者是否修改了文件,都存在这样一个向量。文件是按照顺序排列的,这样就可以方便不同开发者作对比。 file measurement 1 2 3 4 . 23 197 2 .

13 Construction Three metric types co-commit-count
第一种是a和b共同修改过的文件集合数量 第二种是在算出a和b的向量表后计算相似度的一种衡量机制 第三种是根据上面算出来的向量表然后通过一个函数例如最大值、最小值、平均数、中位数的方式计算出一个值。 co-commit-count vector-based similarity metrics vector-based similarity metrics at the commit level

14 Construction Time duration
最简单的方式是使用所有的历史数据,考虑所有的符合条件的记录。但是存在这样的一个问题,可能现在的某个开发者和三年前就离开此项目的开发者这样的一个联系存在于DSN中,这显然是不合理的。所以应该选取一个有限的时间段内建立DSN。 A DSN can be constructed based on the data in one month, two months, or longer.

15 Construction 这个一张关于开发者任职时间的统计表,集中趋向于短的任职时间。在apache那篇论文中,它以滑动窗口的方式以三个月(一月到三月、二月到四月、三月到五月)为时间单位,进行了统计。在一个很成功的长期项目中,开发者任期大约为3.7年,而综合所有的项目,时间大约为1.5年。有四分之一的开发者在ASF中任期少于三个月。选取一个相对短一点的周期有利于捕获开发者在每天相互之间的协作活动。

16 Construction Let V be the set of all programmers committing to a project and let t be a period of time. Then the developer network Nt is an ordered pair Nt = (V,E) where E := {{p,q}∈P2(V)|p and q contributed to the same file during period t} The simplest one is to divide the overall development time in equally sized intervals. reference: M. Pohl and S. Diehl. What dynamic network metrics can tell us about developer roles.

17 Construction For example
The researched time period starts in October 1999 and ends in November For the definition of the dynamic developer network this period is equally divided into 128 slices (which results in an equal distance of around 14 days between each consecutive slice). 它是以tomcat为例子,但是为了平滑集中度,最后是以90天为单位来统计的,通过几次试验后发现以这个周期为单位比较合理。

18 Analysis… Application…

19 THX


Download ppt "Version Control System Based DSNs"

Similar presentations


Ads by Google