Making Sudokus Count R. Ramanujam, The Institute of Mathematical Sciences, Chennai I hope you are all as fond of Sudokus as I am. I am not a "puzzle fiend" like some of my friends are but Sudokus charm me more than most puzzles. Perhaps one reason for this is that solving them is only half the fun. I enjoy simply *looking* at the Sudoku before getting into it; there are usually a great many patterns to be noted, symmetries to be observed, and predictions to be made about which regions would be easy to fill. I try to desginate a few cells as the `villains', the ones that will cause maximum anguish. All this before starting, and if any of the predictions come out right, the resulting joy is far greater than solving the puzzle. What messages can a partially filled Sudoku give, apart from offering some obvious candidates of numbers for a few cells? Instead of answering this question, let me suggest that you look at a few Sudoku puzzles yourself, and develop whatever ideas you get. In this article, I will try and show at least some interesting mathematical ways of thinking about Sudokus that can stimulate such observation. What it is For the unfortunate ones who don't know what Sudokus are, here is a quick introduction. Take a look at Figure 1. We have a 9 by 9 square with some of the 81 cells filled, each with a positive integer in the range 1 to 9. The objective is to fill in every one of the 81 cells subject to three conditions: {\em the row constraint (RC)}, {\em the column constraint (CC)}, {\em the box constraint (BC)} and {\em the uniqueness guarantee (UG)}. Let us name the set {1,2,3,4,5,6,7,8,9} of the first nine positive integers N. The row constraint specifies that each of the 9 rows gets a permutation^* of N. Similarly, the column constraint forces a permutation of N. Note that the 81 cells of Sudoku are arranged in 9 squares, each 3 by 3. The first three rows give three boxes, the next three rows give three more boxes, and so on. Thus we have 9 boxes, and the box constraint forces each box to contain a permutation of N as well. The uniqueness guarantee says that there is only one way to fill in the remaining cells to give a solution. (*) A permutation of a set is an arrangement of its elements in any order. With {A, B, C} we get six permutations: ABC, ACB, BAC, BCA, CAB, CBA. In general, if a set has k elements, we have k! permutations, where k! is the notation for the product k x (k-1) x ... x 2 x 1. The leftmost position can be filled in k ways, the next one with any of the remaining (k-1), the next one with any of (k-2), and so on. So, if we fill any cell with a number from N, we must remember that no other cell in the same row, column or box as this cell is going to get the same number. The uniqueness guarantee seems to offer comfort; not only are we sure that we are not wasting time, there is in fact a solution, but also that we need not be confused by many different possibilities. Every cell is destined to match a unique number, we only need to make this match. That's it, with this knowledge you can solve any Sudoku. How it looks While the newly excited ones are busy solving the Sudoku in Figure 1, the rest of us can discuss some generalities. We have a 9 by 9 Sudoku with three constraints. Is the number 9 necessary? Does the set need to have exactly the first nine positive integers? Would changing any of these kill the joy of solving the puzzle? Surely, we can agree that the three constraints and the way they interconnect are the only essential ingredients. A first generalization is to think of any m by m Sudoku, filling in cells with elements of any set N with exactly m elements. Thinking thus, RC and CC are natural but with BC we have a problem. Can we ensure m boxes? (Do we need to?) If not m, can we ensure some number of complete boxes (of the same size) at all? A simple answer is to set m to be k^2 for some positive integer k. Then, we have RC, CC and BC. Let a sudoku have {\em rank k} if it is a k^2 by k^2 figure, containing k^4 cells. It is also clear that the set N can have any symbols at all, with no need for ordering them in any way. We have so far been talking of rank-3 sudokus. Think of rank-4 and rank-5 sudokus, it's fun. (Do we need names like `rank' for talking of something that is supposed to be fun and entertainment? Why should we even talk of `m by m' sudokus? Well, using such language is *doing* or *making* mathematics, and I hope you find it as enjoyable as solving puzzles; I do.) The mathematicians among you would worry about rank-0 and rank-1 sudokus. They are unique: rank-0 is the null sudoku that contains nothing, and the rank-1 sudoku is just a small box. What should it have? Any symbol you like. What about rank-2 sudokus, the 4 by 4 grids? Why are they consigned to `Sudokus for kids'? The answer is that they are very simple and easy to construct and solve, so they do not pose challenges. But then, just for that reason, they are also useful to get us thinking about sudokus, to prepare us for the harder questions on sudokus of higher rank. Making Sudokus Solving a rank-2 Sudoku may be easy, but how do you go about posing one? Oh, you say, that's trivial! Just take a filled in Sudoku, erase a few cells and give it. Sure, but how many will you erase? If you erase too few, there is no fun; if you erase too many, are you sure you are respecting the uniqueness guarantee? So, here is the question: for any rank-n sudoku, what is the largest number of cells that can be left blank, while ensuring a unique solution? Note that this is important for people posing sudokus, especially in different grades of difficulty. This is not an easy question to answer, but with rank-2 sudokus, we can at least try experimentation. Try various 4 by 4 grids, remove cells, see if there are many ways to complete them. But then, how large is the collection of sudokus we need to work with? If the collection is vast, we may spend years experimenting, getting nowhere. Perhaps we can use computer programs to do this, but even computers give up beyond some size. Thus the next question: how many distinct filled-in rank-n sudokus are there? This, it turns out, is a really hard question, and hence attracts anyone who is mathematically minded. The intrigue is that for rank-3, this is a specific number, however large. Are you surprised that until today we have only estimates of that number, anddo not know it actually? This, with very clever mathematicians thinking about it, and many computer programs helping them. Is that not amazing? The rank-2 case 4 by 4 sudokus are nice, since the two questions we have posed can be answered satisfactorily. Let us take the latter question first. How many filled in 4 by 4 sudokus are there? A simple way to approach the problem is to start row by row and simply count. I am going to follow the convention of proceeding from top to bottom and left to right. So if I speak of the cell (2,3), then I mean the cell in the second row from top and third column from the left. Let us start with an empty 4 by 4 grid. We can approach it in may ways, but let me choose to look at it as composed of four boxes. Let Box I have the cells (1,1), (1,2), (2,1) and (2,2); Box II is adjacent to it, and Box III is below it, leaving Box IV to be diagonally opposite. Even with an empty grid at hand, we can make a very useful observation, and let us do it in style. Proposition: If Boxes I, II and III are filled, there is at most one way of filling Box IV. Proof: Consider the cell (3,3). The third row and third column by now together have 4 numbers, a_1, a_2, b_1, b_2, not necessarily distinct, but a_1 \neq a_2, b_1 \neq b_2. Thus among them, we have at least two distinct numbers. If all four are distinct, there is no way of filling (3,3) and we have no solution. If three are distinct we have a unique solution to (3,3). Now suppose we have only two numbers among them. This means that the third (partial) row is exactly the same as third (partial) column. Let us say these are 1 and 2. But now note that the fourth (partial) row is exactly the same as the fourth (partial) column having the numbers 3 and 4 (see Figure 3 for an illustration). But then the only place in Box IV left for the numbers 3 and 4 would be the cell (3,3), and it cannot hold both numbers. Thus this case is not possible. (End of Proof) Can you re-work the argument and see that the proposition is actually more general: filling any three boxes fixes the fourth? When we start with filling Box I, we are unfettered. So let us fill it in as we like. Why not 1,2,3,4? There are 4! = 24 ways of filling in this box. We will keep that in mind, but proceed with that choice. What about Box II? Note that RC poses a constraint on cells (1,3) and (1,4): between them, they must contain the numbers 3 and 4, in any order. Clearly (2,3) and (2,4) must get 1 and 2 in some order. Further we see that cells (3,1) and (4.1) in Box III must contain the numbers 2 and 4 in some order and the cells (3,2), (4,2) get 1 and 3 in some order. These 4 pairs can be filled in 16 ways. Now we can backtrack a bit and observe that this argument does not at all depend on filling in Box I in any particular way. For {\em any} of filling it, we have 16 ways (in the manner argued above) for II and III. Thus we have 24 ways for Box I and 16 ways for the other two, making 24 X 16 = 384 possible rank-2 sudokus. But now recall the proof of the Proposition above. We can never have a column in Box II with the same entries in the corresponding row in Box III. Thus, of the 16 ways of fillings Boxes II and III, four ways lead to contradictions. Therefore there are only 24 X 12 = 288 possible rank-2 sudokus. Another way We now have a nice `theorem': the number of 4 by 4 sudokus is 288. That is not a large number, but for human beings, filling in that many Figures is tedious, may even be daunting. Mathematicians are remarkably lazy in this regard, and always look for ways to order or group such collections further into subgroups, so that it is easy to `picture' the collection. What I mean is this: is there some pattern by which we can classify these 288 grids, without even having all of them on hand? The answer is yes, by way of what are called {\em transformations}. Suppose you have a filled in grid. Now let us relabel 1 as A, 2 as B, 3 as C and 4 as D. The new grid is still a solution, though over a different set of symbols. But then we could equally have relabelled 1 as 2, 2 as 3, 3 as 4 and 4 as 1. We are applying the permutation 1234 --> 2341. Once again, the new grid is a solution, and is one of the 288. Thus we have: If we have a filled in grid and apply a permutation on the 4 digits, we get another filled in grid. Can we think of other transformations that take solutions to solutions? After some thought, we see that there are exactly four transformations, let us call them T1, T2, T3, T4. T1. Permute the digits. T2. Swap two rows or two columns common to a box. T3. Swap two rows or two columns of boxes. T4. Rotate or flip the whole grid. Now let us try to build all the 288 grids. We can start with Box I. By T1, we might as well assume that it is filled as in Figure 4a. Now Row I of Box II can be either 34 or 43. By T2, we can assume it to be 34. Column 1 of Box III can be 2 above 4 or 4 above 2. Again by T2, we can assume 2 to be above. We then have the grid as in Figure 4b. This places 4 uniquely in cell (3,3), giving us Figure 4c. Consider (4,4): it can have either 1,2 or 3. But if it is 3, we can apply T1 and swap 2 and 3, getting Figure 4d. Now by T4, we can flip on the top-left - bottom-right axis, getting Figure 4e. Thus we can have either 1 or 2 in (4,4). Each of these choices lead to a unique completion, and we get the two grids in Figures 5(a) and 5(c). Thus we see that there are only two really unique solutions for rank-2 sudokus, meaning that you cannot turn one of these into the other by simple operations that do not change the underlying structure of the grid. Every other grid can be obtained from one of these two by permuting the digits, swapping rows or columns common to a box, rotating or flipping the whole grid. Ways of completion We now get back to the question that started our discussion. for any rank-2 sudoku, what is the largest number of cells that can be left blank, while ensuring a unique solution? Alternatively, what is the smallest number of cells that need to be filled, to ensure a unique solution? Transformations already give us a way of thinking about this. Proposition: We need to have three or more cells filled in rank-2 sudokus to ensure a unique solution. Proof: Suppose we had only two cells filled with numbers a and b, and we get a solution. Now consider the permutation of digits that leaves a and b unchanged but swaps the remaining two numbers. This gives another different solution, violating the uniqueness guarantee. So the assumption was wrong, and we need at least three filled in cells. (End of proof) Of course, the converse need not be true. That is, having three filled in cells is no guarantee of a unique solution. To see this, consider Figure 6, try to solve it. What do you find? Can you come up with a rank-2 sudoku with only three filled-in cells and with a uniqueness guarantee? Last word I hope this foray into the world of sudoku mathematics convinces you that there is a rich world there that awaits exploration. For instance, why should we limit ourselves to square sudokus? Why not {\em rectangular} ones? Why not three-dimensional {\em cube} sudokus? And so on. And of course, rank-3 sudokus are the most fascinating. Rank-2 are too easy and Rank-4 too hard. Perhaps you will discover some propositions about them; if you do, be sure to write to JM about it!