Page 1 of 62 MCM 2008 Team #3780 Ease and Toil: Analyzing Sudoku February 18, 2008 Look at any current magazine, newspaper, computer game package or handheld gaming device and you likely find sudoku, the latest puzzle game sweeping the nation. Sudoku is a number-based logic puzzle in which the numbers 1 through 9 are arranged in a 9 x 9 matrix, subject to the constraint that there are no repeated numbers in any row, column or designated 3 x 3 square In addition to being entertaining, sudoku promises valuable insight into computer nce and mathematical modeling. In particular, since sudoku solving is an NP-Complete problem, algorithms to generate and solve sudoku puzzles may offer new approaches to a whole class of computational problems. Moreover, we can further explore mathematical modeling techniques through generating puzzles since sudoku construction is essentially n optimization problem The purpose of this paper is to propose an algorithm that may be used to construct unique sudoku puzzles with four different levels of difficulty. We attempted to minimize the complexity of the algorithm while still maintaining separate difficulty levels and guar- teeing unique solutions In order to accomplish our objectives, we developed metrics with which to analyze the difficulty of a given puzzle. By applying our metrics to published control puzzles with spe- cific difficulty levels we were able to develop classification functions for specific difficulty ratings. We then used the functions we developed to ensure that our algorithm gener- ated puzzles with difficulty levels analogous to those currently published. We also sought out to measure and reduce the computational complexity of the generation and metric measurement algorithms Finally, we worked to analyze and reduce the complexity involved in generating puzzles while maintaining the ability to choose the difficulty of the puzzles generated. To do so we implemented a profiler and performed statistical hypothesis testing to streamline the algorithm
Page 1 of 62 MCM 2008 Team #3780 Ease and Toil: Analyzing Sudoku February 18, 2008 Look at any current magazine, newspaper, computer game package or handheld gaming device and you likely find sudoku, the latest puzzle game sweeping the nation. Sudoku is a number-based logic puzzle in which the numbers 1 through 9 are arranged in a 9 × 9 matrix, subject to the constraint that there are no repeated numbers in any row, column, or designated 3 × 3 square. In addition to being entertaining, sudoku promises valuable insight into computer science and mathematical modeling. In particular, since sudoku solving is an NP-Complete problem, algorithms to generate and solve sudoku puzzles may offer new approaches to a whole class of computational problems . Moreover, we can further explore mathematical modeling techniques through generating puzzles since sudoku construction is essentially an optimization problem. The purpose of this paper is to propose an algorithm that may be used to construct unique sudoku puzzles with four different levels of difficulty. We attempted to minimize the complexity of the algorithm while still maintaining separate difficulty levels and guaranteeing unique solutions. In order to accomplish our objectives, we developed metrics with which to analyze the difficulty of a given puzzle. By applying our metrics to published control puzzles with specific difficulty levels we were able to develop classification functions for specific difficulty ratings. We then used the functions we developed to ensure that our algorithm generated puzzles with difficulty levels analogous to those currently published. We also sought out to measure and reduce the computational complexity of the generation and metric measurement algorithms. Finally, we worked to analyze and reduce the complexity involved in generating puzzles while maintaining the ability to choose the difficulty of the puzzles generated. To do so, we implemented a profiler and performed statistical hypothesis testing to streamline the algorithm
Page 2 of 62 MCM 2008 Team #3780 Contents 5.3.4 Uniqueness Testing 17 5.4 Complexity Analysis 1 Introduction 5.4.1 Parameterization .18 1.1 Statement of problem 3 5.4.2 Complexity of C 1.2 Relevance of sudoku 3 Puzzle generation 3 1.4 Rules of sudoku 5.4.3 Complexity of Uniqueness 3 Testing and Random Filling. 18 1.5 Terminology and Notation 3 5.4.4 Profiling Method 1.6 Indexing 5.4.5 WNEF vs Running Time 1.7 Formal Rules of sudoku 5 5.5 Testing 1.8 Example Puzzles 5 5.5.1 WNEF as a function of de- 2 Background 5 2.1 Common Solving Techniqu 5.5.2 Hypothesis Testing 2.1.1 Naked Pair 6 Strengths and Weaknesses 2.1.2 Naked Triplet 5 2.1.3 Hidden pair 67 Conclusions 21 2.1.4 Hidden Triplet 2.1.5 Multi-Line 6 References 21 2.2 Previous Works 7 2.2.1 Sudokuexplainer 7 1 Source Code 2.2.2 QQWing 2.2.3 GNOME Sudoku 7 2 Screenshots of Puzzle Generator 3 Metric Design 10 3.1 Overview 3.2 Assumptions 10 List of Figures 3.3 Mathematical basis for WneF 10 3.3.1 Complexit 10 1 Demonstration of indexing schemes. 6 2 Puzzle generated by WebSudoku 4 Metric Calibration and Testing ( ranked as“Easy”) 6 4.1 Control Puzzle sources 118 Top 1465 Number 77 4.2 Testing Method 4 An example of a hand-made Nikoli 4.2.1 Defining Difficulty Ranges.. 12 puzzle. 4.2.2 Information collection 12 5 Example of the Naked Pair rule 4.2.3 Statistical Analysis of Con- 6 Example of the Naked Triplet rule. 8 trol puzzles 7 Example of the Hidden Pair rule 8 4.3 Choice of Weight Function. 12 8 Example of the Hidden Triplet rule. 9 9 Example of the Multi-Line rul 5 Generator Algorithm 12 10 Examples of choice histograms 11 5.1 Overview 1 WNEF for control puzzles by diffi- 5.2 Detailed Description cul 5.2.1 Completed Puzzle Generation 14 12 WNEF correlations for various 5.2.2 Cell removal 14 weighting functions 5.2.3 Uniqueness 15 13 Running time as a function of the 5.3 Pseudocode 15 obtained wef 5.3.1 Completed Board Generation 15 14 WNEF as a function of allowed fail- 5.3.2 Random Masking ures 5.3.3 Tuned Masking 17 15 Screenshots of puzzle generator
Page 2 of 62 MCM 2008 Team #3780 Contents 1 Introduction 3 1.1 Statement of Problem . . . . . . . . . 3 1.2 Relevance of Sudoku . . . . . . . . . 3 1.3 Goals . . . . . . . . . . . . . . . . . . 3 1.4 Rules of Sudoku . . . . . . . . . . . . 3 1.5 Terminology and Notation . . . . . . 3 1.6 Indexing . . . . . . . . . . . . . . . . 4 1.7 Formal Rules of Sudoku . . . . . . . 5 1.8 Example Puzzles . . . . . . . . . . . 5 2 Background 5 2.1 Common Solving Techniques . . . . 5 2.1.1 Naked Pair . . . . . . . . . . 5 2.1.2 Naked Triplet . . . . . . . . . 5 2.1.3 Hidden Pair . . . . . . . . . . 6 2.1.4 Hidden Triplet . . . . . . . . . 6 2.1.5 Multi-Line . . . . . . . . . . . 6 2.2 Previous Works . . . . . . . . . . . . 7 2.2.1 SudokuExplainer . . . . . . . 7 2.2.2 QQWing . . . . . . . . . . . . 7 2.2.3 GNOME Sudoku . . . . . . . 7 3 Metric Design 10 3.1 Overview . . . . . . . . . . . . . . . . 10 3.2 Assumptions . . . . . . . . . . . . . . 10 3.3 Mathematical Basis for WNEF . . . 10 3.3.1 Complexity . . . . . . . . . . . 10 4 Metric Calibration and Testing 11 4.1 Control Puzzle Sources . . . . . . . . 11 4.2 Testing Method . . . . . . . . . . . . 12 4.2.1 Defining Difficulty Ranges . . 12 4.2.2 Information Collection . . . . 12 4.2.3 Statistical Analysis of Control Puzzles . . . . . . . . . . . 12 4.3 Choice of Weight Function. . . . . . . 12 5 Generator Algorithm 12 5.1 Overview . . . . . . . . . . . . . . . . 12 5.2 Detailed Description . . . . . . . . . 14 5.2.1 Completed Puzzle Generation 14 5.2.2 Cell Removal . . . . . . . . . . 14 5.2.3 Uniqueness Testing . . . . . . 15 5.3 Pseudocode . . . . . . . . . . . . . . . 15 5.3.1 Completed Board Generation 15 5.3.2 Random Masking . . . . . . . 16 5.3.3 Tuned Masking . . . . . . . . 17 5.3.4 Uniqueness Testing . . . . . . 17 5.4 Complexity Analysis . . . . . . . . . 18 5.4.1 Parameterization . . . . . . . 18 5.4.2 Complexity of Completed Puzzle Generation . . . . . . . 18 5.4.3 Complexity of Uniqueness Testing and Random Filling . 18 5.4.4 Profiling Method . . . . . . . 18 5.4.5 WNEF vs Running Time . . . 19 5.5 Testing . . . . . . . . . . . . . . . . . 19 5.5.1 WNEF as a Function of Design Choices . . . . . . . . . . 19 5.5.2 Hypothesis Testing . . . . . . 19 6 Strengths and Weaknesses 19 7 Conclusions 21 References 21 1 Source Code 23 2 Screenshots of Puzzle Generator 62 List of Figures 1 Demonstration of indexing schemes. 6 2 Puzzle generated by WebSudoku (ranked as “Easy”). . . . . . . . . . . 6 3 Top 1465 Number 77. . . . . . . . . . 7 4 An example of a hand-made Nikoli puzzle. . . . . . . . . . . . . . . . . . 7 5 Example of the Naked Pair rule. . . 8 6 Example of the Naked Triplet rule. . 8 7 Example of the Hidden Pair rule. . . 8 8 Example of the Hidden Triplet rule. 9 9 Example of the Multi-Line rule. . . . 9 10 Examples of choice histograms. . . . 11 11 WNEF for control puzzles by diffi- culty. . . . . . . . . . . . . . . . . . . 13 12 WNEF correlations for various weighting functions. . . . . . . . . . . 13 13 Running time as a function of the obtained WNEF. . . . . . . . . . . . . 20 14 WNEF as a function of allowed failures. . . . . . . . . . . . . . . . . . . . 20 15 Screenshots of puzzle generator. . . . 62
Page 3 of 62 MCM 2008 Team #3780 1 Introduction Such a set of goals could easily lead to a project of an unmanageable scope. Thus, we explicitly do 1.1 Statement of Problem not aim for any of the following properties We set out to design an algorithm that would con struct unique sudoku puzzles of various difficul- Attempt to“ force” a particular solving ties as well as to develop metrics by which to mea- method upon players sure the difficulty of a given puzzle. In particular our algorithm must admit at least four levels To be the best available algorithm for the difficulty while minimizing its level of comple task of making exceedingly difficult puzzles ty Impose symmetry requirement 1.2 Relevance of sudoku 1. 4 Rules of sudoku e feel that this problem is relevant and of inter est, since the game of sudoku is inherently math- The game of sudoku is played upon a 33 grid ematical, and offers rich opportunities to explore of blocks, each of which is a 3X 3 grid of cells mathematical techniques. Indeed, the problem is Each cell can have a value of 1 through 9, sub NP-Complete [3], and yet manages to be some- ject to a simple constraint, or may be empty. what accessible to casual analysis. Moreover, The object of the game is to, given a partially by developing techniques for use with a problem filled out grid called a puzzle, use logical infer- over which we have such complete control, we ence to place values in all of the empty cells such may expand into other and more practical prob- that the constraints are upheld. It is fully pos lems. In fact, sudoku is essentially an exercise sible to create a puzzle which has no solution in compression, and so techniques for generat- (it contradicts itself, forcing the player to violate ing difficult puzzle instances lead directly to real- a constraint), or which has multiple solutions ations about information content and entropy. We shall impose the additional requirement upon We, however, shall restrict our focus directly to puzzles that they admit exactly one solution each the problem at hand, and be content to leave When properly filled out, no row, column or these reasons, along with sudokus entertainment block may have two cells with the same value value, as our motivation for exploring the game. This simple constraint is what allows all of the inference to work. Some examples of puzzles and 1.3 Goals their solutions may be found in Section 1.8. For Our goal is to create an ]gorithm that will pro- 12 e details and a complete tutorial, please see duce sudoku puzzles. In doing so, and to meet the conditions of the proposed problem(section 1.1), 1.5 Terminology and Notation we aim to create an algorithm with the following It is difficult to discuss our solution to the pro posed problem without understanding some com- Will only create valid puzzle instances (no mon terminology. Moreover, since we will apply contradictions, and admitting a unique so- more mathematical formalism here than in most lution) documents dealing with sudoku, it will be helpful Can generate puzzles at any of four differ- to introduce notational conventions ent difficulty levels(easy, medium, hard and evil) Assignment A tuple(a, X) of a value and a cell If a puzzle contains an assignment(a, X) Produces puzzles in a reasonable amount of we say that X has the value that X maps time, regardless of the chosen difficulty. to z or that xhi IThis term was chosen for traditional reasons, as many sources prefer to use references to immorality to measure diffi
Page 3 of 62 MCM 2008 Team #3780 1 Introduction 1.1 Statement of Problem We set out to design an algorithm that would construct unique sudoku puzzles of various difficulties as well as to develop metrics by which to measure the difficulty of a given puzzle. In particular, our algorithm must admit at least four levels of difficulty while minimizing its level of complexity. 1.2 Relevance of Sudoku We feel that this problem is relevant and of interest, since the game of sudoku is inherently mathematical, and offers rich opportunities to explore mathematical techniques. Indeed, the problem is NP-Complete [3], and yet manages to be somewhat accessible to casual analysis. Moreover, by developing techniques for use with a problem over which we have such complete control, we may expand into other and more practical problems. In fact, sudoku is essentially an exercise in compression, and so techniques for generating difficult puzzle instances lead directly to realizations about information content and entropy. We, however, shall restrict our focus directly to the problem at hand, and be content to leave these reasons, along with sudoku’s entertainment value, as our motivation for exploring the game. 1.3 Goals Our goal is to create an algorithm that will produce sudoku puzzles. In doing so, and to meet the conditions of the proposed problem (section 1.1), we aim to create an algorithm with the following properties: • Will only create valid puzzle instances (no contradictions, and admitting a unique solution). • Can generate puzzles at any of four different difficulty levels (easy, medium, hard and evil1 ). • Produces puzzles in a reasonable amount of time, regardless of the chosen difficulty. Such a set of goals could easily lead to a project of an unmanageable scope. Thus, we explicitly do not aim for any of the following properties: • Attempt to “force” a particular solving method upon players. • To be the best available algorithm for the task of making exceedingly difficult puzzles. • Impose symmetry requirements . 1.4 Rules of Sudoku The game of sudoku is played upon a 3 × 3 grid of blocks, each of which is a 3 × 3 grid of cells. Each cell can have a value of 1 through 9, subject to a simple constraint, or may be empty. The object of the game is to, given a partially- filled out grid called a puzzle, use logical inference to place values in all of the empty cells such that the constraints are upheld. It is fully possible to create a puzzle which has no solution (it contradicts itself, forcing the player to violate a constraint), or which has multiple solutions. We shall impose the additional requirement upon puzzles that they admit exactly one solution each. When properly filled out, no row, column or block may have two cells with the same value. This simple constraint is what allows all of the inference to work. Some examples of puzzles and their solutions may be found in Section 1.8. For more details and a complete tutorial, please see [1]. 1.5 Terminology and Notation It is difficult to discuss our solution to the proposed problem without understanding some common terminology. Moreover, since we will apply more mathematical formalism here than in most documents dealing with sudoku, it will be helpful to introduce notational conventions. Assignment A tuple (x, X) of a value and a cell. If a puzzle contains an assignment (x, X), we say that X has the value x, that X maps to x, or that X 7→ x. 1This term was chosen for traditional reasons, as many sources prefer to use references to immorality to measure diffi- culty
Page 4 of 62 MCM 2008 Team #3780 Candidates A set of those values which may umn has as its representative the cell at the be assigned to a square. As more informa fourth row and column tion is taken into account the set is reduced until only one candidate remains, at which point it becomes the value of the cell. We Restrictions In some cases, it is more straight denote the set of candidates for some cell x forward to discuss which values a cell can- not be assigned to than to discuss the set of candidates. Thus. the restrictions set X! for Cell A single square within a sudoku puzzle, a cell X is defined as V\X? which may have one of the integer values from 1 to 9. We denote cells using upper- case italic serif letters:X.Y.Z Rule An algorithm which accepts a puzzle P Block One of the nine 3 x 3 squares within the and produces either a puzzle P represent puzzle. The boundaries of these blocks are ing strictly more information(more restric denoted by thicker lines on the puzzle,s grid tions have been added via logical inference It is important to note that in sudoku, no or cells have been filled in) or some value two blocks overlap(share common cells that indicates that the rule failed to ad There are variants of sudoku. such vance the puzzle towards a solution persudoku in which this occurs, but w focus our attention on the traditional rules. Solution a set of assignments to all cells in a Grouping A set of cells in the same row, col- puzzle such that all groupings have exactly umn or block. We represent groupings with one cell assigned to each value uppercase boldface serif letters: X,Y, Z We refer unambiguously to the row group- value a ings Ri, the column groupings C, and the that may be assigned to a block groupings B, following the indexing tr purposes, all sudoku puzzles scheme in section 1.6. The set of all group ditional numeric value set v ings will be denoted G. (1, 2, 3, 4, 5, 6, 7, 8, 9. This can be confusing at times, since we will be discussing other Metric We call a function m: P-R(assigning numbers but we choose to do so for the sake a real number to each valid puzzle)a metric of convention. a value is denoted by a lower if it provides information about the relative case sans serif letter: x, y, z difficulty of the puzzle Puzzle a 9x 9 matrix of cells with at least one empty and at least one filled cell. For our 1.6 Indexing purposes, we impose the additional require ment that all puzzles have exactly one so- Define the following indicies using the terminal lution. We denote puzzles by boldface cap ital serif letters: P, Q, R. Since this no- ogy above(section 1.5). As a convention, all indi- tation conflicts with that for groupings, we cies will start with zero for the first cell or block will always denote that a variable is a puz zle. Moreover, we refer to cells belonging to a puzzle: X E P. Finally, in the rare in- block number stance that we wish to denote the set of all valid puzzles, we shall do so with a double- k: cell number within a block struck p: P i: row number sentative The upper-left cell in each column block is that block's representative. Fore i': representative row number ample, the cell in the 5 row and 5 col- representative column number
Page 4 of 62 MCM 2008 Team #3780 Candidates A set of those values which may be assigned to a square. As more information is taken into account, the set is reduced until only one candidate remains, at which point it becomes the value of the cell. We denote the set of candidates for some cell X by X?. Cell A single square within a sudoku puzzle, which may have one of the integer values from 1 to 9. We denote cells using uppercase italic serif letters: X, Y , Z. Block One of the nine 3 × 3 squares within the puzzle. The boundaries of these blocks are denoted by thicker lines on the puzzle’s grid. It is important to note that in sudoku, no two blocks overlap (share common cells). There are variants of sudoku, such as hypersudoku in which this occurs, but we will focus our attention on the traditional rules. Grouping A set of cells in the same row, column or block. We represent groupings with uppercase boldface serif letters: X, Y, Z. We refer unambiguously to the row groupings Ri , the column groupings Cj and the block groupings Bc, following the indexing scheme in section 1.6. The set of all groupings will be denoted G. Metric We call a function m : P → R (assigning a real number to each valid puzzle) a metric if it provides information about the relative difficulty of the puzzle. Puzzle A 9 × 9 matrix of cells, with at least one empty and at least one filled cell. For our purposes, we impose the additional requirement that all puzzles have exactly one solution. We denote puzzles by boldface capital serif letters: P, Q, R. Since this notation conflicts with that for groupings, we will always denote that a variable is a puzzle. Moreover, we refer to cells belonging to a puzzle: X ∈ P. Finally, in the rare instance that we wish to denote the set of all valid puzzles, we shall do so with a doublestruck P: P. Representative The upper-left cell in each block is that block’s representative. For example, the cell in the 5th row and 5th column has as its representative the cell at the fourth row and column. Restrictions In some cases, it is more straightforward to discuss which values a cell cannot be assigned to than to discuss the set of candidates. Thus, the restrictions set X! for a cell X is defined as V\X?. Rule An algorithm which accepts a puzzle P and produces either a puzzle P0 representing strictly more information (more restrictions have been added via logical inference or cells have been filled in) or some value that indicates that the rule failed to advance the puzzle towards a solution. Solution A set of assignments to all cells in a puzzle such that all groupings have exactly one cell assigned to each value. Value A symbol that may be assigned to a cell. For our purposes, all sudoku puzzles use the traditional numeric value set V = {1, 2, 3, 4, 5, 6, 7, 8, 9}. This can be confusing at times, since we will be discussing other numbers, but we choose to do so for the sake of convention. A value is denoted by a lower case sans serif letter: x, y, z. 1.6 Indexing Define the following indicies using the terminology above (section 1.5). As a convention, all indicies will start with zero for the first cell or block. c : block number k : cell number within a block i : row number j : column number i 0 : representative row number j 0 : representative column number
Page 5 of 62 MCM 2008 Team #3780 These indicies are related by the following func- 2 Background tions: 2.1 Common Solving Techniques As with any activity, several sets of techniques k i (c, k) have emerged to help solve sudoku puzzles. We collect some here so that we may refer to them in j(c, k)=(c mod 3).3+(k mod 3) our own development. In all of the techniques be- low, we assume that the puzzle being solved has a single unique solution. These techniques and j'(c)=( mod 3)3 examples are adapted from [10] and [2] ()=3|3 2.1.1 Naked pair If, in a single row, column or block grouping A Figure I demonstrates how the rows, columns there are two cells X and Y each having the same and blocks are indexed as well as the idea of a pair of candidates x?=r?=ip, q), then those block representative. In the third sudoku grid. candidates may be eliminated from all other cells the representatives for each block are denoted in A. To see that we can do this, assume for the with an“r z∈ A such that Z H p, then x≠p, which im 1.7 Formal rules of Sudoku plies that X H. This in turn means that y q, but we have from Z p that Y p, leaving We may now formally state the rules of sudoku y?=0. Since the puzzle has a solution, this is a that restrict allowable assignments using the no- contradiction, and ZH+p tation developed thus far As an example of this arrangement is shown in figure 5. The cells marked X and Y sat- (vG∈GX∈G)X+y→打∈G:Y→ v isfy X?=Y?={2,8}, and so we can remove both 2 and 8 from all other cells in R. That is pplying this sort of formalism to the rules of su- VZ E(Rs\X, Y)): 2, 8Z? doku will allow us to make strong claims about solving techniques later, and so it is useful intro duce this notation for the rules 2.1.2 Naked Triplet 1.8 Example Puzzles This rule is analogous to the Naked Pair rule(sec- tion 2.1.1), but instead it involves three cells in- The rules alone do not explain what a sudoku stead of two. Let a be some grouping(row, col puzzle looks like, and so we have included a few umn or block), and let X,Y, Z E A such that examples of well-crafted sudoku puzzles. Figure the candidates for X, Y and Z are drawn from 6 shows a puzzle ranked as "Easy by WebSudoku t, u, v). Then, by exhaustion, there is a one-to [4] one set of assignments from X,Y, Z to t,u,vI By contrast, Figures 7 and 7 show the results Therefore, no other cell in A may map to a value of two different approaches to generating difficult in t, u, v) puzzles: the first one was computer generated as An example of this is given in Figure6.Here part of an experiment in minimal sudoku puz- we have marked the cells X, Y, Z) accordingly cles, whereas the second was hand-made by the and consider only block 8. In this puzzle,x? authors at Nikoli, the company most famously (3, 7), Y?=(1, 3, 7) and Z?=(1, 3. Therefore, associated with sudoku. It is interesting that we must assign 1, 3 and 7 to these cells, and may two such completely different approaches result remove them from the candidates for those cells in very similar looking puzzles marked with an asterisk
Page 5 of 62 MCM 2008 Team #3780 These indicies are related by the following functions: c (i, j) = j 3 + i 3 · 3 i(c, k) = 3 j c 3 k + k 3 j (c, k) = (c mod 3) · 3 + (k mod 3) i 0 (c) = 3 j c 3 k j 0 (c) = (c mod 3) · 3 i 0 (i) = 3 i 3 j 0 (j) = 3 j 3 Figure 1 demonstrates how the rows, columns and blocks are indexed, as well as the idea of a block representative. In the third sudoku grid, the representatives for each block are denoted with an “r”. 1.7 Formal Rules of Sudoku We may now formally state the rules of sudoku that restrict allowable assignments using the notation developed thus far: (∀G ∈ G ∀X ∈ G) X 7→ v ⇒ @Y ∈ G : Y 7→ v Applying this sort of formalism to the rules of sudoku will allow us to make strong claims about solving techniques later, and so it is useful introduce this notation for the rules. 1.8 Example Puzzles The rules alone do not explain what a sudoku puzzle looks like, and so we have included a few examples of well-crafted sudoku puzzles. Figure 6 shows a puzzle ranked as “Easy” by WebSudoku [4]. By contrast, Figures 7 and 7 show the results of two different approaches to generating difficult puzzles: the first one was computer generated as part of an experiment in minimal sudoku puzzles, whereas the second was hand-made by the authors at Nikoli, the company most famously associated with sudoku. It is interesting that two such completely different approaches result in very similar looking puzzles. 2 Background 2.1 Common Solving Techniques As with any activity, several sets of techniques have emerged to help solve sudoku puzzles. We collect some here so that we may refer to them in our own development. In all of the techniques below, we assume that the puzzle being solved has a single unique solution. These techniques and examples are adapted from [10] and [2]. 2.1.1 Naked Pair If, in a single row, column or block grouping A, there are two cells X and Y each having the same pair of candidates X? = Y ? = {p, q} , then those candidates may be eliminated from all other cells in A. To see that we can do this, assume for the sake of contradiction that there exists some cell Z ∈ A such that Z 7→ p, then X 67→ p, which implies that X 7→ q. This in turn means that Y 67→ q, but we have from Z 7→ p that Y 67→ p, leaving Y ? = ∅. Since the puzzle has a solution, this is a contradiction, and Z 67→ p. As an example of this arrangement is shown in figure 5. The cells marked X and Y satisfy X? = Y ? = {2, 8}, and so we can remove both 2 and 8 from all other cells in R8. That is, ∀Z ∈ (R8\ {X, Y }) : 2, 8 ∈/ Z?. 2.1.2 Naked Triplet This rule is analogous to the Naked Pair rule (section 2.1.1), but instead it involves three cells instead of two. Let A be some grouping (row, column or block), and let X, Y, Z ∈ A such that the candidates for X, Y and Z are drawn from {t, u, v}. Then, by exhaustion, there is a one-toone set of assignments from {X, Y, Z} to {t, u, v}. Therefore, no other cell in A may map to a value in {t, u, v}. An example of this is given in Figure 6. Here, we have marked the cells {X, Y, Z} accordingly and consider only block 8. In this puzzle, X? = {3, 7}, Y ? = {1, 3, 7} and Z? = {1, 3}. Therefore, we must assign 1, 3 and 7 to these cells, and may remove them from the candidates for those cells marked with an asterisk