Example: Empirical Study on Variable Naming
What are the styles, abbreviations, … used in variable names?
Are they correlated with bugs, code quality, …?
You can study this by treating code as a tokenized text stream:
"(a + b) * 2" => [ (SYM, '('), (ID, 'a'), (BIN_OP, '+'), (ID, 'b'), (SYM, ')'), (BIN_OP, '*'), (INT, '2') ]
We are interested in the IDs.
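A minimal sketch of the token-stream idea, assuming the code under study is Python and using the standard `tokenize` module. Python labels identifiers `NAME` rather than `ID`, and the helper names `identifiers` and `naming_style` are hypothetical. The style heuristics are deliberately rough and only illustrate the kind of measurement such a study could start from.

```python
import io
import keyword
import tokenize


def identifiers(source: str) -> list[str]:
    """Return the identifier tokens of Python source, in order of appearance."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        # Keywords such as `for` are also NAME tokens, so filter them out.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
    ]


def naming_style(name: str) -> str:
    """Classify a name with rough, illustrative heuristics (an assumption)."""
    if "_" in name.strip("_"):
        return "snake_case"      # an underscore between characters
    if name != name.lower() and name != name.upper():
        return "camelCase"       # mixed case (also matches PascalCase)
    return "single_word"


print(identifiers("(a + b) * 2"))
print([naming_style(n) for n in ["max_len", "maxLen", "n"]])
```

Counting these labels per file, and joining them with bug or quality data, would be the next step of such a study.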
Example: Differencing Files
How to define "diffs" between two file versions?
The Edit Distance Approximation
[Figure: edit script between two character sequences, the first being "a b c a b b a"; legend: Delete, Insert, Unchanged]
Myers, E.W. An O(ND) difference algorithm and its variations. Algorithmica 1, 251–266 (1986). https://doi.org/10.1007/BF01840446
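To make the Delete/Insert/Unchanged idea concrete, here is a minimal sketch of an edit script computed from the longest common subsequence. It is not Myers's O(ND) algorithm but a simpler O(NM) dynamic program that yields a script of the same minimal length. The function name `lcs_diff` is hypothetical, and the target string `cbabac` is an assumption, since the slide's second sequence did not survive extraction.

```python
def lcs_diff(a, b):
    """Return a minimal insert/delete edit script turning a into b.

    Each entry is (op, item), where op is "-" (delete), "+" (insert)
    or " " (unchanged).
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, keeping matches and otherwise
    # stepping in the direction that preserves the longest remaining LCS.
    script, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script.extend(("-", x) for x in a[i:])   # leftover items of a are deleted
    script.extend(("+", x) for x in b[j:])   # leftover items of b are inserted
    return script


script = lcs_diff("abcabba", "cbabac")
edits = [entry for entry in script if entry[0] != " "]
print(len(edits))  # number of deletions plus insertions
```

For this pair the longest common subsequence has length 4, so the script contains 7 + 6 - 2 * 4 = 5 edits. The same function works on lists of lines, which is how file diffs are usually computed.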
Is Edit Distance a Good Idea?
Minimizing edit distance is a good hack, but:
  It lacks a semantic explanation of what changed
  It does not work well for adding indentation, renaming variables, …
Open Problem: How to produce even more developer-friendly diffs?
You can work out a paper on this!
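The renaming weakness can be seen in a small sketch using Python's standard `difflib`. The two snippets and the helper `changed_line_counts` are hypothetical and chosen only for illustration: a single rename of `x` to `total` is one change to a developer, but a line-based diff reports every touched line as deleted and re-inserted.

```python
import difflib

old = [
    "x = 0",
    "for v in values:",
    "    x += v",
    "print(x)",
]
new = [
    "total = 0",
    "for v in values:",
    "    total += v",
    "print(total)",
]


def changed_line_counts(before, after):
    """Count removed and added lines in a line-based unified diff."""
    removed = added = 0
    for line in difflib.unified_diff(before, after, lineterm=""):
        if line.startswith(("---", "+++", "@@")):
            continue  # skip file and hunk headers
        if line.startswith("-"):
            removed += 1
        elif line.startswith("+"):
            added += 1
    return removed, added


print(changed_line_counts(old, new))
```

Here the diff reports three removed and three added lines, with nothing in the output saying that they all stem from the same rename.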
Syntax Analysis on AST