Page 4 of 24 Control #2858 solving behavior. We then propose two different methods again based on solve to generate Sudoku puzzles of different difficulties 2 Problem Setup In this paper, we approach the problem in two portions. First, we define a difficulty metric for a Sudoku puzzle and determine an algorithm to compute it. We then create an algorithm to generate Sudoku puzzles with desired levels of difficulty. Before beginning to describe our model, however, we first specify the problem a bit more closely in both cases 2.1 Difficulty metric Our goal in this part of the paper is to create an algorithm that takes a Sudoku puzzle and returns a real number that represents its abstract "difficulty"according to some metric The central issue in determining such a metric is then what we mean by the "difficulty of a Sudoku puzzle. We base our definition of difficulty on the following general assumptions 1. The amount of time it takes for any given solver to solve a puzzle is monotonically increasing with difficulty 2. Every solver tries to solve Sudoku puzzles by applying various strategies The difficulty of a puzzle is an inherently subjective quantity and may vary among different solvers who use different methods of solving Sudoku puzzles. However, this type of subjective consideration is not inherent to the puzzle, so we must restrict ourselves to purely objective reactions. For this purpose, it is a useful intellectual tool to consider some hypothetical typical"solver of some postulated skill level Now, Assumption 1 suggests that we should measure difficulty by the amount of time spent solving the puzzle. Because difficulty is increasing in time spent, any definition of difficulty must give the same ordering as time spent. Since this gives an objectively precise definition, we adopt it; therefore, it only remains to determine more precisely what type of solver our metric should consider The difficulties of puzzles can vary for solvers of different skill levels, and it may be arg that one should measure each of these difficulties separately. However, we are assuming that all solvers' time spent will be consistent with difficulty, and assigning absolute difficulty alues has much practical value. Therefore, in order to avoid the dependence of our results on the novice's ignorance of various techniques and to extend the range of measurable puzzles we take our hypothetical solver to be an expert Hence, we define the difficulty of a Sudoku puzzle to be the average amount of time a hypothetical"typical " Sudoku expert would spend solving it. In the subsequent parts of this paper, our aim will be to create and evaluate a metric to determine this time 2.2 Puzzle generation A second objective of this paper is to develop a method of puzzle generation. Our main goal in puzzle generation is to produce a valid puzzle of a given desired difficulty level that has a
Page 4 of 24 Control #2858 solving behavior. We then propose two different methods again based on hsolve to generate Sudoku puzzles of different difficulties. 2 Problem Setup In this paper, we approach the problem in two portions. First, we define a difficulty metric for a Sudoku puzzle and determine an algorithm to compute it. We then create an algorithm to generate Sudoku puzzles with desired levels of difficulty. Before beginning to describe our model, however, we first specify the problem a bit more closely in both cases. 2.1 Difficulty Metric Our goal in this part of the paper is to create an algorithm that takes a Sudoku puzzle and returns a real number that represents its abstract “difficulty” according to some metric. The central issue in determining such a metric is then what we mean by the “difficulty” of a Sudoku puzzle. We base our definition of difficulty on the following general assumptions: 1. The amount of time it takes for any given solver to solve a puzzle is monotonically increasing with difficulty. 2. Every solver tries to solve Sudoku puzzles by applying various strategies. The difficulty of a puzzle is an inherently subjective quantity and may vary among different solvers who use different methods of solving Sudoku puzzles. However, this type of subjective consideration is not inherent to the puzzle, so we must restrict ourselves to purely objective reactions. For this purpose, it is a useful intellectual tool to consider some hypothetical “typical” solver of some postulated skill level. Now, Assumption 1 suggests that we should measure difficulty by the amount of time spent solving the puzzle. Because difficulty is increasing in time spent, any definition of difficulty must give the same ordering as time spent. Since this gives an objectively precise definition, we adopt it; therefore, it only remains to determine more precisely what type of solver our metric should consider. The difficulties of puzzles can vary for solvers of different skill levels, and it may be argued that one should measure each of these difficulties separately. However, we are assuming that all solvers’ time spent will be consistent with difficulty, and assigning absolute difficulty values has much practical value. Therefore, in order to avoid the dependence of our results on the novice’s ignorance of various techniques and to extend the range of measurable puzzles, we take our hypothetical solver to be an expert. Hence, we define the difficulty of a Sudoku puzzle to be the average amount of time a hypothetical “typical” Sudoku expert would spend solving it. In the subsequent parts of this paper, our aim will be to create and evaluate a metric to determine this time. 2.2 Puzzle Generation A second objective of this paper is to develop a method of puzzle generation. Our main goal in puzzle generation is to produce a valid puzzle of a given desired difficulty level that has a 4
Page 5 of 24 Control #2858 unique solution. As there is no objective scale in difficulty and no fixed criteria that indicate the difference between puzzles of varying difficulties, we will interpret the given numbe of difficulty levels and the desired difficulty level as the division of the set of all Sudoku boards into the given number of difficulty percentile intervals of equal size and the choice of a particular desired percentile interval. We will take a sample of 1000 Sudoku boards and assume that the difficulty distribution of these boards, as measured by our human solver representative of the difficulty distribution of all Sudoku board Our second goal in puzzle generation is to minimize the complexity of the generation algorithm. We note that since the size of the 9 x 9 Sudoku board is fixed. the order of growth of our generation algorithm is irrelevant. Therefore, in this paper, we will interpret the complexity of a generation algorithm to be the expected amount of execution time required to find a Sudoku puzzle of the desired difficulty level 3 A Difficulty metric In this section, we describe and evaluate a difficulty metric for Sudoku puzzles based up a computer solver hsolve that simulates the actions of a human Sudoku expert 3.1 Assumptions and Metric Development As stated in Section 2, we define the difficulty of a puzzle as the average amount of tim expert Sudoku solver to solve it. In orde e this t essentially two possibilities model the process of solving the puzzle 2. Find some heuristic for board configurations that predicts the solve time There are some known heuristics for finding the difficulty of a puzzle; most notably, puzzles with a small number of initial givens are somewhat harder than most. However, according to 91, the overall correlation is weak. Other methods of this type have similar problems, and more importantly, they cannot be used to determine the difficulty of any specific puzzle Therefore, the second possibility can be used at most as a guide for the first, and we must model the actual process of solving the puzzle. We postulate that the following assumptions hold for the solver. 1. The possible strategies can be ranked in order of difficulty, and the solver always applies them from least to most difficult puzzles.We use a widely accepted ranking of strategies described in Appendix oku This is consistent with more or less all references on the subject of solving sudo 2. During the search for a strategy application, each ordering of possible strategy appli- cations occurs with equal probability There are two components of a human search for a possible location to apply a strategy complete search and intuitive pattern recognition. While human pattern recognition
Page 5 of 24 Control #2858 unique solution. As there is no objective scale in difficulty and no fixed criteria that indicate the difference between puzzles of varying difficulties, we will interpret the given number of difficulty levels and the desired difficulty level as the division of the set of all Sudoku boards into the given number of difficulty percentile intervals of equal size and the choice of a particular desired percentile interval. We will take a sample of 1000 Sudoku boards and assume that the difficulty distribution of these boards, as measured by our human solver, is representative of the difficulty distribution of all Sudoku boards. Our second goal in puzzle generation is to minimize the complexity of the generation algorithm. We note that since the size of the 9 × 9 Sudoku board is fixed, the order of growth of our generation algorithm is irrelevant. Therefore, in this paper, we will interpret the complexity of a generation algorithm to be the expected amount of execution time required to find a Sudoku puzzle of the desired difficulty level. 3 A Difficulty Metric In this section, we describe and evaluate a difficulty metric for Sudoku puzzles based upon a computer solver hsolve that simulates the actions of a human Sudoku expert. 3.1 Assumptions and Metric Development As stated in Section 2, we define the difficulty of a puzzle as the average amount of time required for an expert Sudoku solver to solve it. In order to measure this time, there are essentially two possibilities: 1. Model the process of solving the puzzle 2. Find some heuristic for board configurations that predicts the solve time. There are some known heuristics for finding the difficulty of a puzzle; most notably, puzzles with a small number of initial givens are somewhat harder than most. However, according to [9], the overall correlation is weak. Other methods of this type have similar problems, and, more importantly, they cannot be used to determine the difficulty of any specific puzzle. Therefore, the second possibility can be used at most as a guide for the first, and we must model the actual process of solving the puzzle. We postulate that the following assumptions hold for the solver: 1. The possible strategies can be ranked in order of difficulty, and the solver always applies them from least to most difficult. This is consistent with more or less all references on the subject of solving sudoku puzzles. We use a widely accepted ranking of strategies described in Appendix A. 2. During the search for a strategy application, each ordering of possible strategy applications occurs with equal probability. There are two components of a human search for a possible location to apply a strategy: complete search and intuitive pattern recognition. While human pattern recognition 5
Page 6 of 24 Control #2858 is extremely powerful (see, for example 2 ) it is extremely difficult to determine its precise consequences, especially due to possible differences between solvers. Therefore we do not consider any intuitive component to pattern recognition and restrict our model to a complete search for strategy applications. Such a search will proceed between possible applications in the random ordering that we postulate Ts We define a possible application of a strategy to be a configuration on the board that checked by a human to determine if the given strategy can be applied; a list of exactly which configurations are checked varies by strategy and is given in Appendix A. We can now model our solver as following the algorithm Human Solve defined as follows Definition. Given a sudoku puzzle, the algorithm Human Solve repeats the following steps until there are no remaining empty squares 1. Choose the least difficult strategy that has not yet been searched for in the current board configuration 2. Search through possible applications of any of these strategies for a valid application of a strategy 3. Apply the first valid application found We take the difficulty of a single run of Human Solve to be the total number of possible applications that the solver must check: here we assume that each check takes roughly have different difficulties due to different valid s of this method on the same puzzle may now ready to define the difficulty metric of a sudoku board B Definition. For a sudoku board B, take its difficulty metric m(B)to be the average total number of possible applications checked by the solver while using the Human Solve algorithm 3.2 hsolve and metric Calculation In order to actually calculate the difficulty m(B)of a board B, we use hsolve, a pre grain which simulates the running of Human Solve and calculates the resulting difficulty. hsolve is implemented in Java 1.6, and its source code is attached in Appendix B Given a Sudoku puzzle b, hsolve does the following Set the initial difficulty d=0 2. Repeat the following actions in order until B is solved or the solver cannot progress (a)Choose the tier of easiest strategies S that has not yet been searched for in the current board configuration (b)Find the number p of possible applications of S (c) Find the set V of all valid applications of S and compute the size v of V 6
Page 6 of 24 Control #2858 is extremely powerful (see, for example [2]), it is extremely difficult to determine its precise consequences, especially due to possible differences between solvers. Therefore, we do not consider any intuitive component to pattern recognition and restrict our model to a complete search for strategy applications. Such a search will proceed between possible applications in the random ordering that we postulate. We define a possible application of a strategy to be a configuration on the board that is checked by a human to determine if the given strategy can be applied; a list of exactly which configurations are checked varies by strategy and is given in Appendix A. We can now model our solver as following the algorithm HumanSolve defined as follows: Definition. Given a sudoku puzzle, the algorithm HumanSolve repeats the following steps until there are no remaining empty squares: 1. Choose the least difficult strategy that has not yet been searched for in the current board configuration. 2. Search through possible applications of any of these strategies for a valid application of a strategy. 3. Apply the first valid application found. We take the difficulty of a single run of HumanSolve to be the total number of possible applications that the solver must check; here we assume that each check takes roughly the same amount of time. Note that multiple runs of this method on the same puzzle may have different difficulties due to different valid applications being recognized first. We are now ready to define the difficulty metric of a sudoku board B. Definition. For a sudoku board B, take its difficulty metric m(B) to be the average total number of possible applications checked by the solver while using the HumanSolve algorithm. 3.2 hsolve and Metric Calculation In order to actually calculate the difficulty m(B) of a board B, we use hsolve, a program which simulates the running of HumanSolve and calculates the resulting difficulty. hsolve is implemented in Java 1.6, and its source code is attached in Appendix B. Given a Sudoku puzzle B, hsolve does the following: 1. Set the initial difficulty d = 0 2. Repeat the following actions in order until B is solved or the solver cannot progress: (a) Choose the tier of easiest strategies S that has not yet been searched for in the current board configuration. (b) Find the number p of possible applications of S. (c) Find the set V of all valid applications of S and compute the size v of V . 6
Page 7 of 24 Control #2858 (d)Compute E(p, u), the expected number of possible applications that will be ex amined before a valid application is found (e) Increment d by E(p, u).t, where t is the standard check time. Pick a random application in V and apply it to the board 3. Return the value of d and the final solved board While hsolve is mostly a direct implementation of Human Solve, it does not actually perform a random search through possible applications; instead it uses the expected search time E(p, u)to simulate this search. Note that the following lemma gives an extremely convenient closed form expression for E(p, u) that we use in hsolve Lemma 1. Assuming that all search paths through p possible approaches are equally likely, the expected number e(p, a) of checks required before finding one of v valid approaches is given by E(p, U) p+ U+1 Proof. For our purposes, to specify a search path it is enough to specify the v indices of the valid approaches out of p choices, so there are()possible search paths. Let I be the random variable equal to the smallest index of a valid approach. Then, we have that E(,0)=∑iP(=)=∑∑P(I=j=∑P(≥) +1 +)=曾 where we've used the Hockeystick identity Given a Sudoku puzzle B, then, we calculate m(B) by running hsolve several times and taking the average of the returned difficulties. To increase the level of accuracy, the number of runs can be increased; 20 runs per puzzle was found to give a ratio of standard deviation to mean of about ga 1, so we use this value throughout the rest of the paper 3.3 Analysis Our evaluation of hsolve consists of three major components 1. Checking that hsolve's conception of difficulty is correlated with existing conceptions of difficulty 2. Comparing the distribution of difficulties generated by hsolve to established distribu- tions for solve time 3. Finding the runtime of the algorithm
Page 7 of 24 Control #2858 (d) Compute E(p, v), the expected number of possible applications that will be examined before a valid application is found. (e) Increment d by E(p, v) · t, where t is the standard check time. Pick a random application in V and apply it to the board. 3. Return the value of d and the final solved board. While hsolve is mostly a direct implementation of HumanSolve, it does not actually perform a random search through possible applications; instead it uses the expected search time E(p, v) to simulate this search. Note that the following lemma gives an extremely convenient closed form expression for E(p, v) that we use in hsolve. Lemma 1. Assuming that all search paths through p possible approaches are equally likely, the expected number E(p, a) of checks required before finding one of v valid approaches is given by E(p, v) = p + 1 v + 1 . Proof. For our purposes, to specify a search path it is enough to specify the v indices of the valid approaches out of p choices, so there are p v possible search paths. Let I be the random variable equal to the smallest index of a valid approach. Then, we have that E(p, v) = p−Xv+1 i=1 iP(I = i) = p−Xv+1 i=1 p−Xv+1 j=i P(I = j) = p−Xv+1 i=1 P(I ≥ i) = 1 p v p−Xv+1 i=1 p + 1 − i v = 1 p v X p−v j=0 v + j v = p+1 v+1 p v = p + 1 v + 1 , where we’ve used the Hockeystick identity. Given a Sudoku puzzle B, then, we calculate m(B) by running hsolve several times and taking the average of the returned difficulties. To increase the level of accuracy, the number of runs can be increased; 20 runs per puzzle was found to give a ratio of standard deviation to mean of about σ µ ≈ 1 10 , so we use this value throughout the rest of the paper. 3.3 Analysis Our evaluation of hsolve consists of three major components: 1. Checking that hsolve’s conception of difficulty is correlated with existing conceptions of difficulty. 2. Comparing the distribution of difficulties generated by hsolve to established distributions for solve time. 3. Finding the runtime of the algorithm. 7