Page 16 of 62 MCM 2008 Team #3780 5. Add the values of the first row to their re- i. If moving to the next line moves us spective cluster lists. passed our current three clusters i. e. (line+1)%3 is 0) then recurse 6. Call a recursive function, and pass it the fol- with a reset clusters list and cur- lowing rent columns list and incremented line number A parameter directing it to fill the sec Else recurse with current clust ond row list and current columns list ● The columns arra incremented line number. The clusters array iii. If recursion returned true return true. Otherwise go on The recursive function then performs the bulk of e)If there are no numbers left(all num the algorithm bers are slack, or recursion failed ) 1. Create an array containing a permutation i. If we have shifted 9 or more times of the sequence 1 through 9, which we shall return false all this“ numbers.” ii. Recall all of our saved data iii. Delete all values from this row 2. Create copies of the columns array, the clus iv. Move to first column ters array, and of the numbers array, so that we may backtrack later. vi. Cycle the numbers array, so the 3. If the requested line is the 10th line(off the first item becomes last and all end of the board), then we are done, and re- other items shift accordingly turn true vii. Increment times shifted 4. Initialize an empty“ slack”aray, which shall hold those values whose being placed ee also ? and 40 caused a violation of constraints 5.3.2 Random Masking 5. Move to the first column Given a 9 x 9 array that we shall call"board 6. Repeat the following 1. Initialize a 9 x 9 array of booleans to true a)Pop a value off of the"numbers"array. which we shall call the"mask b)If this number is not in the clusters list 2. Initialize a list of 81 points with one point for this columns cluster. and is not in he columns list for this column then for every cell in the board i. Set this board location to this num- 3. Randomly permute the array of points ii.Add this number to the cluster and 4. For each element in this array column lists that it applies to a) Set the mask at that point to false i. Append all numbers in the“ slack” This will result in that value being con array to the“ numbers” array. sidered not part of the board (or not iv. Move to the next column c)Else we add the number to the slack ar b)Test if this new puzzle is uniquely solv able d)If we have passed the last column c)If not, set the mask at that point back to true
Page 16 of 62 MCM 2008 Team #3780 5. Add the values of the first row to their respective cluster lists. 6. Call a recursive function, and pass it the following: • A parameter directing it to fill the second row. • The columns array. • The clusters array. The recursive function then performs the bulk of the algorithm: 1. Create an array containing a permutation of the sequence 1 through 9, which we shall call this “numbers.” 2. Create copies of the columns array, the clusters array, and of the numbers array, so that we may backtrack later. 3. If the requested line is the 10th line (off the end of the board), then we are done, and return true. 4. Initialize an empty “slack” array, which shall hold those values whose being placed caused a violation of constraints. 5. Move to the first column. 6. Repeat the following: a) Pop a value off of the “numbers” array. b) If this number is not in the clusters list for this column’s cluster, and is not in the columns list for this column, then: i. Set this board location to this number. ii. Add this number to the cluster and column lists that it applies to. iii. Append all numbers in the “slack” array to the “numbers” array. iv. Move to the next column. c) Else we add the number to the slack array. d) If we have passed the last column, then: i. If moving to the next line moves us passed our current three clusters (i. e. (line+1)%3 is 0) then recurse with a reset clusters list and current columns list and incremented line number. ii. Else recurse with current clusters list and current columns list and incremented line number. iii. If recursion returned true, return true. Otherwise go on. e) If there are no numbers left (all numbers are slack, or recursion failed): i. If we have shifted 9 or more times, return false. ii. Recall all of our saved data. iii. Delete all values from this row. iv. Move to first column. v. Erase the slack array. vi. Cycle the numbers array, so the first item becomes last and all other items shift accordingly. vii. Increment times shifted. See also ?? and 40 5.3.2 Random Masking Given a 9 × 9 array that we shall call “board”: 1. Initialize a 9 × 9 array of booleans to true, which we shall call the “mask”. 2. Initialize a list of 81 points with one point for every cell in the board. 3. Randomly permute the array of points. 4. For each element in this array: a) Set the mask at that point to false. This will result in that value being considered not part of the board (or not given). b) Test if this new puzzle is uniquely solvable. c) If not, set the mask at that point back to true
Page 17 of 62 MCM 2008 Team #3780 5.3.3 Tuned Masking i. Set the mask at that position to true Given a 9 x 9 array that we shall call"board ii. Continue from 2 1. Initialize a 9 x 9 array of booleans to true call this the“mask” c)Look for a value in the choices arra that appears only once in a cluster, if a) Repeat the following until we are done found 1. Apply some strategy in order to ob- i. Set the mask at that position to tain the coordinates of a cell to re- ii. Set the mask at those coordinates ii. Continue from 2 to false. This will result in that d) Look for a value in the choices array value being considered not part of the board (or not given) that appears only once in a row, if iii. Test if this new puzzle is uniquely solvable i. Set the mask at that position to iv. If not. set the mask at those coordi- nates back to true and select a new ii. Continue from 2 v. Calculate board statistics and test e) Look for a value in the choices array if we match them. In our that appears only once in a column, if case. this is the Wnef found vi. If we are too high, continue from i. Set the mask at that position to If we are too low, repeat the follow ing a small number of times ii. Continue from 2 A. Apply an annealing function to f If the number of times we are allowed f a cell to ot 0: Set the mask at that location to i. Locate the blank cell with the least vili. If we are within the desired range ii. Set a flag to false iii. For each choi 5.3.4 Uniqueness Testing A. Set that cell of the board to that choice and set that cell of the Given a9×9 array that we shall call"board”,a mask to true 9×9 array that we shall call"mask”, and a nu- ber of times to guess number of allowed guesses 1. Fill in a 9 x 9 array with lists such that each C. If thethe result is true. and the lists represents the value choices available flag is true, return false at that cell D. Else if the result was true. set 2. Repeat the following: the flag to true iv. If the flag is true, return e: we a) If mask contains no false values, return have found a unique solution b)If there exists any list in the choices ar- g) Return false: we know that the board ray with only one value is most likely not unique
Page 17 of 62 MCM 2008 Team #3780 5.3.3 Tuned Masking Given a 9 × 9 array that we shall call “board”: 1. Initialize a 9 × 9 array of booleans to true, call this the “mask”. a) Repeat the following until we are done: i. Apply some strategy in order to obtain the coordinates of a cell to remove. ii. Set the mask at those coordinates to false. This will result in that value being considered not part of the board (or not given). iii. Test if this new puzzle is uniquely solvable. iv. If not, set the mask at those coordinates back to true and select a new strategy. v. Calculate board statistics and test to see if we match them. In our case, this is the WNEF. vi. If we are too high, continue from (a). vii. If we are too low, repeat the following a small number of times: A. Apply an annealing function to gain the location of a cell to add. B. Set the mask at that location to true. viii. If we are within the desired range, we are done. 5.3.4 Uniqueness Testing Given a 9 × 9 array that we shall call “board”, a 9 × 9 array that we shall call “mask”, and a number of times to guess: 1. Fill in a 9×9 array with lists such that each lists represents the value choices available at that cell. 2. Repeat the following: a) If mask contains no false values, return true. b) If there exists any list in the choices array with only one value: i. Set the mask at that position to true. ii. Continue from 2. c) Look for a value in the choices array that appears only once in a cluster, if found: i. Set the mask at that position to true. ii. Continue from 2. d) Look for a value in the choices array that appears only once in a row, if found: i. Set the mask at that position to true. ii. Continue from 2. e) Look for a value in the choices array that appears only once in a column, if found: i. Set the mask at that position to true. ii. Continue from 2. f) If the number of times we are allowed to guess is not 0: i. Locate the blank cell with the least number of choices. ii. Set a flag to false. iii. For each choice: A. Set that cell of the board to that choice and set that cell of the mask to true. B. Recurse, decrementing the number of allowed guesses. C. If the the result is true, and the flag is true, return false. D. Else if the result was true, set the flag to true. iv. If the flag is true, return true: we have found a unique solution. g) Return false: we know that the board is most likely not unique
Page 18 of 62 MCM 2008 Team #3780 5.4 Complexity Analysis of how the rules of sudoku limit the number of invalid boards possible(worst case assumes that 5.4.1 Parameterization every board could be invalid). In practice this al Traditionally, when one analyzes the complexity gorithm runs in negligible time in comparison to of an algorithm, the complexity is considered as a the masking algorithms function of some parameter representing the size of the problem. Thus, the first thing we must de- 5.4.8 Complexity of Uniqueness Testing cide in analyzing the generator is what we will and Random Filling consider its complexity to be a function of. The In the worst case, the"fast "uniqueness algorithm most natural parameter would be the size of the sudoku grid, but since we only consider the tra will examine each of the 81 cells, and compare it ditional 9 x 9 grid(as opposed to "hex sudoku to each of the other 81 cells. Thus, without adding which is played on a 16 x 16 board, or the more in any brute force functionality, the uniqueness pathological boards, such as those of size 36 x 36 test can be completed in a constant number of and 100 x 100)this isnt a parameter at all. Thi operations: 81 x 81=6, 561. When we consider the hybrid algorithm, and include in our analy instead, we resort to the only variable that we sis the brute force searching, we find that in the utilize when generating puzzles: the desired dif- ficulty level d. Our complexity measure will thus worst case, we perform the fast test for each al lowed guess plus one more time before making be a function of the form t(d)=f(d). to, where a guess at all. Therefore, the hybrid uniqueness t is the time complexity, f is some function that we will find through our analysis, and where to is respect to the number of allowed guesses the time complexity for generating a puzzle ran- This allows us to now consider the complex domly ity of the random filling algorithm. Since it does not allow any guessing when it calls the unique- 5.4.2 Complexity of Completed Puzzle ness algorithm, and since it performs the unique Generation ness test exactly once per cell, it performs exactly 812=531, 441 comparisons. As such, it is a con The completed puzzle generation algorithm does stant time operation, and can be used as a point ies of work for each line of the Sudoku, and of comparison for more complicated algorithms potentially does this work over all possible differ- ent boards. As such. in the worst case we have the 9 possible values times the 9 cells in a line 5.4.4 Profiling Method times 9 shifts all raised to the 9 lines power. That In order to collect empirical data on the complex (9×9×9)3=(9) While ity of puzzle generation, we implemented a small it is true that this is a constant, the size of the code profiling utility class in PHP, as is shown on constant is prohibitively large. page 32. This class exploits that, in PHP 5.0 and However, in the average case we not only do later, when a function-scope class instance vari ot cover all possible values, or cover all able is created. it's destructor is called immedi le shifts. but we also do not recurse all ately after the function returns. Thus, we create times. So let us keep the same value for the com- an instance of Profiler at the start of each inter- plexity of generating a line(that is assume we esting function, and pass the FUNCTion_ and form all 9 shifts) but let as assume we only do compiles timing information into global variab have to try all 9 values, in all 9 cells, and per-LINE_ macros to its constructor. The class then this once per line. Here we get 9*9 * 9*9 or 6561. that are queried after the puzzle is successfully The actual value may be less then that, or slightly generated more, but should hover about that area. The best In all uses of this profiling data, we will remove case is of course 81, where all values work first dependencies on our particular hardware by con- try. We have a very high worst case, but very rea- sidering only the normalized time t= t/to, where sonable average and best cases. The worst case to is the mean running time for the random fill presented could likely be reduced with analysis generator
Page 18 of 62 MCM 2008 Team #3780 5.4 Complexity Analysis 5.4.1 Parameterization Traditionally, when one analyzes the complexity of an algorithm, the complexity is considered as a function of some parameter representing the size of the problem. Thus, the first thing we must decide in analyzing the generator is what we will consider its complexity to be a function of. The most natural parameter would be the size of the sudoku grid, but since we only consider the traditional 9 × 9 grid (as opposed to “hex sudoku,” which is played on a 16 × 16 board, or the more pathological boards, such as those of size 36 × 36 and 100 × 100) this isn’t a parameter at all. Thus, instead, we resort to the only variable that we utilize when generating puzzles: the desired dif- ficulty level d. Our complexity measure will thus be a function of the form t(d) = f (d) · t0, where t is the time complexity, f is some function that we will find through our analysis, and where t0 is the time complexity for generating a puzzle randomly. 5.4.2 Complexity of Completed Puzzle Generation The completed puzzle generation algorithm does a series of work for each line of the Sudoku, and potentially does this work over all possible different boards. As such, in the worst case we have the 9 possible values times the 9 cells in a line times 9 shifts all raised to the 9 lines power. That is, (9 × 9 × 9)9 = 9 3 9 = 927 ≈ 5.8 × 1025. While it is true that this is a constant, the size of the constant is prohibitively large. However, in the average case we not only do not cover all possible values, or cover all possible shifts, but we also do not recurse all possible times. So let us keep the same value for the complexity of generating a line (that is assume we have to try all 9 values, in all 9 cells, and perform all 9 shifts) but let as assume we only do this once per line. Here we get 9*9*9*9 or 6561. The actual value may be less then that, or slightly more, but should hover about that area. The best case is of course 81, where all values work first try. We have a very high worst case, but very reasonable average and best cases. The worst case presented could likely be reduced with analysis of how the rules of sudoku limit the number of invalid boards possible (worst case assumes that every board could be invalid). In practice this algorithm runs in negligible time in comparison to the masking algorithms. 5.4.3 Complexity of Uniqueness Testing and Random Filling In the worst case, the “fast” uniqueness algorithm will examine each of the 81 cells, and compare it to each of the other 81 cells. Thus, without adding in any brute force functionality, the uniqueness test can be completed in a constant number of operations: 81 × 81 = 6, 561. When we consider the hybrid algorithm, and include in our analysis the brute force searching, we find that in the worst case, we perform the fast test for each allowed guess plus one more time before making a guess at all. Therefore, the hybrid uniqueness testing algorithm admits a linear complexity with respect to the number of allowed guesses. This allows us to now consider the complexity of the random filling algorithm. Since it does not allow any guessing when it calls the uniqueness algorithm, and since it performs the uniqueness test exactly once per cell, it performs exactly 813 = 531, 441 comparisons. As such, it is a constant time operation, and can be used as a point of comparison for more complicated algorithms. 5.4.4 Profiling Method In order to collect empirical data on the complexity of puzzle generation, we implemented a small code profiling utility class in PHP, as is shown on page 32. This class exploits that, in PHP 5.0 and later, when a function-scope class instance variable is created, it’s destructor is called immediately after the function returns. Thus, we create an instance of Profiler at the start of each interesting function, and pass the __FUNCTION__ and __LINE__ macros to its constructor. The class then compiles timing information into global variables that are queried after the puzzle is successfully generated. In all uses of this profiling data, we will remove dependencies on our particular hardware by considering only the normalized time tˆ= t/t0, where t0 is the mean running time for the random fill generator
Page 19 of 62 MCM 2008 Team #3780 5.4.5 WNEF vS Running Time the mean WNEF for those produced with anneal ng enabled. We considered a sample of puz For the full generator algorithm, we can no longer zles of size n, whose means and variances were since there is a dependency on random variables (3,s2)for non-annealed puzzles and(3,s2)for annealed. Once again, we used the following t that is difficult to accommodate. Therefore, we statistic rely on our profiler to gather empirical data about the complexity of generating puzzles. In particu lar, Figure 13 shows the normalized running time required to generate a puzzle as a function of the at a significance level of a=0.0005 and using obtained WNEF after annealing is applied In or- the data shown in Table 2, we rejected the null der to show detail, we plot the normalized time hypothesis and concluded that annealing lowered on a logarithmic scale(base 2) the Wef values This plot suggests that even in the case of the most difficult puzzles that our algorithm gener- ates,the running time is no worse than about 20 5.5.2.2 Distinctness of Difficulty Levels To times that of the random case. Also worth not determine whether the difficulty levels of our ing is that generating easy puzzles can actually puzzle generator were unique, we performed a be faster than generating via random filling Students t-distribution hypothesis test using the following hypotheses 5.5 Testing Ho: ud=ud+ 5.5.1 WNEF as a Function of Design H d≠pd+1 where ud is the mean WNEF of puzzles produced The generator algorithm, as written, is fairly by our generator algorithm when given d as the generic. We thus need some way to empirically target difficulty. Using a significance level of determine constant terms, such as how many a= 0.0005 with the data shown in Table 2. we times we will allow for cell removal to fail be- use the following as our test statistic fore we conclude that the puzzle is minimal. We thus plotted the number of failures that we per mitted to the Wnef produced, shown in Figure s2,s2 14. This plot shows us both that we only need to allow a very small number of failures to en- where ya is the mean of nd puzzles produced by py small WNef values, and that annealing re- the algorithm, having a sample variance s3.We duces the value still further. even in low-failure scenario found that for all d, t'> to, and thus we were able to reject Ho for all difficulty levels. We con cluded that all of the difficulty levels of our puzzle 5.5.2 Hypothesis Testing generator are indeed unique 5.5.2.1 Effectiveness of Annealing To show that the process of annealing resulted in lower 6 Strengths and Weaknesses WNEF values, and was thus a useful addition to the algorithm, we tested the hypothesis that it Our approach to measuring the difficulty of su was effective versus the null hypothesis that it doku puzzles admits some real and important was not weaknesses. Primary among these is that it possible to increase the difficulty of a puzzle with- = out affecting its Wnef, by violating the assump 4≠p tion that all choices present similar difficulty to solvers. In particular, puzzles created with more where u is the mean WNEF for puzzles produced esoteric solving techniques, such as Swordfish without the aid of annealing and where u' is and XYWing, may be crafted such that their
Page 19 of 62 MCM 2008 Team #3780 5.4.5 WNEF vs Running Time For the full generator algorithm, we can no longer make deterministic arguments about complexity, since there is a dependency on random variables that is difficult to accommodate. Therefore, we rely on our profiler to gather empirical data about the complexity of generating puzzles. In particular, Figure 13 shows the normalized running time required to generate a puzzle as a function of the obtained WNEF after annealing is applied. In order to show detail, we plot the normalized time on a logarithmic scale (base 2). This plot suggests that even in the case of the most difficult puzzles that our algorithm generates, the running time is no worse than about 20 times that of the random case. Also worth noting is that generating easy puzzles can actually be faster than generating via random filling. 5.5 Testing 5.5.1 WNEF as a Function of Design Choices The generator algorithm, as written, is fairly generic. We thus need some way to empirically determine constant terms, such as how many times we will allow for cell removal to fail before we conclude that the puzzle is minimal. We thus plotted the number of failures that we permitted to the WNEF produced, shown in Figure 14. This plot shows us both that we only need to allow a very small number of failures to enjoy small WNEF values, and that annealing reduces the value still further, even in low-failure scenario 5.5.2 Hypothesis Testing 5.5.2.1 Effectiveness of Annealing To show that the process of annealing resulted in lower WNEF values, and was thus a useful addition to the algorithm, we tested the hypothesis that it was effective versus the null hypothesis that it was not: H0 : µ = µ 0 Ha : µ 6= µ 0 where µ is the mean WNEF for puzzles produced without the aid of annealing and where µ 0 is the mean WNEF for those produced with annealing enabled. We considered a sample of puzzles of size n, whose means and variances were y, s ¯ 2 for non-annealed puzzles and y¯ 0 , s02 for annealed. Once again, we used the following tstatistic: t ∗ = (y − y 0 ) q s 2 n + s 02 n At a significance level of α = 0.0005 and using the data shown in Table 2, we rejected the null hypothesis and concluded that annealing lowered the WNEF values. 5.5.2.2 Distinctness of Difficulty Levels To determine whether the difficulty levels of our puzzle generator were unique, we performed a Student’s t-distribution hypothesis test using the following hypotheses: H0 : µd = µd+1 Ha : µd 6= µd+1 where µd is the mean WNEF of puzzles produced by our generator algorithm when given d as the target difficulty. Using a significance level of α = 0.0005 with the data shown in Table 2, we use the following as our test statistic: t ∗ = yd − yd+1 r s 2 d nd + s 2 d+1 nd+1 where yd is the mean of nd puzzles produced by the algorithm, having a sample variance s 2 d . We found that for all d, t ∗ > tα, and thus we were able to reject H0 for all difficulty levels. We concluded that all of the difficulty levels of our puzzle generator are indeed unique. 6 Strengths and Weaknesses Our approach to measuring the difficulty of sudoku puzzles admits some real and important weaknesses. Primary among these is that it is possible to increase the difficulty of a puzzle without affecting its WNEF, by violating the assumption that all choices present similar difficulty to solvers. In particular, puzzles created with more esoteric solving techniques, such as Swordfish and XY-Wing, may be crafted such that their
Page 20 of 62 MCM 2008 Team #3780 ∵ WNEF after Annealing Figure 13: Running time as a function of the obtained WNEF Before Annealing Number of Fails Figure 14: WNEF as a function of allowed failures Difficulty 1 2 3 4 Pre-annealing Mean0.5239998950.3274518140.2716565910.27661661 Variance0.0171107960.0054548660.0025810530.004039649 Post li Mean 0.31876731 0.261571340.1942622570.165920803 Variance0.0006962849.32606×10-58.7219×10-50.000185543 Table 2: Pre-and post-annealing WNEF mean and variances(n= 60)
Page 20 of 62 MCM 2008 Team #3780 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 0.12 0.16 0.2 0.24 0.28 0.32 0.36 0.4 WNEF after Annealing Normalized Time (log 2 ) Figure 13: Running time as a function of the obtained WNEF. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0 5 10 15 20 25 Number of Fails WNEF Before Annealing After Anneailng Figure 14: WNEF as a function of allowed failures. Difficulty 1 2 3 4 Pre-annealing Mean 0.523999895 0.327451814 0.271656591 0.27661661 Variance 0.017110796 0.005454866 0.002581053 0.004039649 Post-annealing Mean 0.31876731 0.26157134 0.194262257 0.165920803 Variance 0.000696284 9.32606 × 10−5 8.7219 × 10−5 0.000185543 Table 2: Pre- and post-annealing WNEF mean and variances (n = 60)