This is the first post in a series on More Sums than Difference Sets. In this post, we'll get our terminology straight and ask a lot of questions.

I'm a sucker for puzzles. It says so on my website. Here's one for you:

Let \(A\) be a finite set of integers. Consider all numbers representable as a sum of two (not necessarily distinct) elements of \(A\). How many are there?

Let's be a bit more precise. Given a set of integers \(A\), we define the *sum set of \(A\)*, to be the set \[A + A = \{x + y \mid x,y\in
A\}.\] We can then rephrase the puzzle as asking for the cardinality of \(A+A\).

You may want to think about it yourself or at least play with a few examples before reading on.

Welcome back. Let's start with some upper bounds. If \(|A| = n\), then there are only \(n^2\) pairs of elements of \(A\), so there can't be any more than that many sums. Commutativity of addition brings this bound down to \(n + 1\choose 2\). (It's \(n + 1\) instead of \(n\) because we allow a number to be added to itself). It's not too hard to show this is tight.

Exercise 1. Show that the above bound is tight. In other words, for each \(n\), exhibit a finite set \(A_n\) of size \(n\) with \(n + 1 \choose 2\) distinct possible pairwise sums. Hint. Solution.

What about lower bounds? Let \(l = \min A\). If we define \(l + A = \{l + x \mid x \in A\}\), then it's clear that \(l + A \subseteq A + A\). Furthermore, it's not hard to see that \(|l + A| = |A|\). It follows that \(|A| \le |A + A|\). Now let \(r = \max A\) and consider \(r + A\).

Exercise 2. Show that \((l + A)\cap (r + A) = \{l + r\}\). Conclude that \(2|A| - 1 \le |A + A|\). Hint. Solution.

Exercise 3. Show that this lower bound is tight. In other words, for each \(n\), exhibit a set \(A_n\) of size \(n\) with only \(2n - 1\) sums. Hint. Solution.

Well, we've given tight upper and lower bounds on \(|A+A|\), and the two bounds aren't equal. So the answer to the puzzle is, "it depends!" The size of \(A+A\) depends on the geometry of \(A\). If the points are spaced out unevenly, we get many sums. If the points are regularly spaced, we get few.

Not a very good puzzle.

This dependence of the size of \(A+A\) on the geometry of \(A\) is quite subtle, and I don't know of any way to make precise the notion of "evenness", or even whether there are other factors at play. (As always, feel free to drop me an email if you have a comment, idea, or reference.) For now, lets move on.

By analogy to our investigations above, consider the set \[A - A = \{x - y \mid x,y \in A\}.\] We can now ask for upper and lower bounds on \(|A-A|\).

It's not hard to adjust the argument from Exercise 2 to show that the lower bound is the same as before.

Exercise 4. Show that the \(2n-1\) lower bound holds on \(|A - A|\) and is tight. Hint. Solution.

Things get a little more interesting when we look at upper bounds. If you look back at our discussion of the upper bound on the size of the sum set, we made crucial use the commutativity of addition. Since subtraction is not commutative, it may seem that the best we can do is the all-pairs bound of \(n^2\). If you work through an example you'll see why this can't be tight: \(x - x = 0\) for all \(x\in A\). Once we account for these repeated differences though, the bound is tight.

Exercise 5. Show that the upper bound \(|A - A| \le n (n - 1) + 1\) is tight. Hint. Solution.

Sum sets and difference sets have the same lower bound but difference have the potential to be about a factor of 2 larger. Anecdotally, it also appears that sets with large sum sets tend to have large difference sets and vice versa. We now turn to the question of the relative sizes of the sum set and the difference set.

Because the upper bound on difference sets is so much larger, we might expect that the typical set has a larger difference set than sum set. This is indeed the case. In fact, legend has it that the first investigators of sum and difference sets initially conjectured that this was true of *all* sets. This turns out to be wrong, and you may find it entertaining to attempt to construct such a set.

Exercise 6. Construct a set whose sum set is larger than its difference set. (Note: this may be time consuming.) Hint. Solution.

We call such sets *More Sums than Difference Sets* or *MSTD Sets*.

Exercise 6 tells us that MSTD sets exist, but we may wonder how many there are, for various notions of "many." For example, are there infinitely many? Do they occur with positive probability? The answer to both these questions is "Yes."

First, note that the sizes of the sum and difference sets are preserved by affine transformation of \(A\). In other words, adding a constant to every element of \(A\) doesn't change the size of the sum or difference set. Neither does multiplying each element of \(A\) by a constant. Thus we can weasel our way out of the question of whether infinitely many MSTD sets exist by taking an example from Exercise 6 and shifting or scaling it by infinitely many constants.

This works, but it feels like cheating. We'd like to find infinitely many examples that are not related by affine transformation. While true, I know of no simple proof of this fact.

It is also true that MSTD sets occur with positive probability. More precisely, the proportion of subsets of \(\{1, \dots, n\}\) that are MSTD is bounded below by a constant independent of \(n\) (See MO.) As one might expect, this result is even deeper.

As part of the Williams College SMALL Math REU, Steven Miller's group has been investigating problems related to MSTD sets and their generalizations. While I wasn't involved in the group, I kept up with their work because I found it fascinating in that puzzle sort of way. One thing that's really nice about this area is its exploratory nature. When investigating a question, one often just wants to "try it out" by searching for examples. The discrete nature of the problem lends itself to computer-aided brute-force search, often leading to new insights, or perhaps more commonly, new questions and conjectures.

The rest of this series will discuss my work as "chief tryer-outer guy" for Miller's group. Next time we'll start by looking at typical question arising from this work: for each \(n\), how many subsets of \(\{1, \dots, n\}\) are MSTD? Until then, perhaps you'd like to answer this question yourself? I know the answer up to \(n=42\). If you can push it further, you will advance the state of human knowledge! In the mean time, you can check your answers, post new results, and explore more at the OEIS page for MSTD sets.^{1}

*Note: the hints are designed not to give it away.*

If the elements of \(A\) are too close together, it will be difficult to avoid "collisions" between sums of different pairs of elements. This suggests spreading your points out.

On the other hand, if all your points are equally spaced, then by scaling down we obtain an equivalent, smaller set. This suggests spreading your points out *by different amounts*.

For the second part, try inclusion-exclusion.

If you leave a lot of gaps in \(A\) (and thus in \(l + A\)), they could be filled in by other sums not involving \(l\) or \(r\)!

To show the bound holds, modify the argument from Exercise 2.

To show the bound is tight, reuse an example from a previous exercise.

Reuse an example from a previous exercise.

You'll need at least 8 elements.

The distance between the minimum and maximum elements must be at least 14.

If you're still stuck but want to keep trying, try taking a set of size 3, call it \(S\). Try combining \(S\) with some of its mirror images. For example if \(S=\{1,2,4\}\), a mirror image of \(S\) would be \(\{1,3,4\}\). Another would be \(\{5,7,8\}\). Note that unioning \(S\) with a mirror image yields a "symmetric" set (in a sense we'll discuss later, but it should be intuitively clear what is meant). Finally, for each resulting symmetric set, break the symmetry by adding two more elements. Remember that the final set must have "width" (distance between min and max) at least 14. You'll want to leave quite a bit of space between your mirror images.

*Note: these are just sketches, meant to give away the insight but not necessarily work out all the details.*

There are many solutions, but here's just one.

Let \(A_n = \{1, 2, \dots, 2^{n-1}\}\), so that \(|A_n| = n\). In binary, each element has its own bit. If we add two different elements together, the resulting number has exactly two bits set in its binary representation. In either case, we can determine which elements were added together just by looking at the binary representation of the sum. It follows that distinct pairs (considered as sets) of elements map to distinct sums. Thus, \(|A_n| = {n+1\choose 2}\).

Let \(l+x \in l + A\) and \(r+y \in r + A\). If \(x \ne r\) or \(y \ne l\), it follows that \(l + x < r + y\) so that the two elements are not equal. Otherwise, we're in the \(l+r\) case, and we're done.

For the second part, we have \((l + A) \cup (r + A) \subseteq A + A\). This, together with the first part of the exercise and inclusion-exclusion yields the desired result.

Again, there are many solutions. Perhaps the simplest is to just take an interval of consecutive integers. That is, define \(A_n = \{1, \dots, n\}\). It is easy to see that every element of \(\{2, \dots, 2n\}\) is in \(A_n + A_n\). Computing sizes gives the desired result.

Showing the bound holds is analogous to Exercise 2. The example from Exercise 3 shows the bound is tight.

Powers of two (from Exercise 1) show that the bound is tight.

There are two minimal examples up to shifting by a constant. They are mirror images of each other, so that there is a unique minimal example up to \(\mathbf{Z}\)-affine transformation. Proving this uniqueness is non-trivial. In fact, I only know of one proof, and I don't understand it. See Hegarty's arXiv paper.

\[\{0, 2, 3, 4, 7, 11, 12, 14\}\] \[\{0, 2, 3, 7, 10, 11, 12, 14\}\]

I haven't posted the answer for 42 yet because it hasn't been double checked. Let me know if you want to do the double checking!↩