Random Thoughts on a Neat Puzzle, by robert

Here is a puzzle from Quanta magazine that I spent wayyy too many spare cycles on (and then a lot of cycles afterwards telling everybody the answers to the puzzle and also all the variations that I thought of). (EDIT: after I wrote this I noticed that the Quanta article has a “solution” section, but I think it does a horrible job of explaining why randomness helps. Maybe this will be better.) Quote:

I write down two different numbers that are completely unknown to you, and hold one in my left hand and one in my right. You have absolutely no idea how I generated these two numbers. Which is larger? You can point to one of my hands, and I will show you the number in it. Then you can decide to either select the number you have seen or switch to the number you have not seen, held in the other hand, as your final choice. Is there a strategy that will give you a greater than 50 percent chance of choosing the larger number, no matter which two numbers I write down?

Rather than immediately give you the solution, which is technical, let’s introduce two easier versions of this puzzle to get the juices flowing. First, the easiest version:

Puzzle 1. Consider the following game. There are two players, Alice and Bob. Privately, Alice chooses two integers uniformly at random between 1 and 6 (say, by rolling two fair dice), places one in each hand randomly, and shows her closed hands to Bob. Bob’s goal is to find the larger number, and he is allowed the following two actions. First, he chooses one of Alice’s hands, and Alice reveals her number to Bob. Then, he can either choose to take the number he has seen, or he can switch to the other hand. If he switches, he must take the number in the other hand. If the two numbers are the same then Bob always wins. Is there a strategy for Bob that will allow him to find the larger number with probability greater than 50%?

Here is a “dumb” solution. Bob picks a hand, ignores the number, and stays. The probability that he wins is the probability that the numbers are different and he chose the larger number, plus the probability that both numbers were the same (note that these events are disjoint!). Summing up, we get the probability that Bob wins is

\displaystyle \Pr[\text{Numbers are different and Bob chooses larger}] + \Pr[\text{Numbers are the same}]

\displaystyle = \frac{30}{36}\cdot \frac{1}{2} + \frac{1}{6} = \frac{42}{72} \approx 58\%

However, he can do much better. Here is one of many possible strategies: Bob chooses a hand uniformly at random, and receives a number N. If 1 <= N <= 3, then Bob switches. If 4 <= N <= 6, then Bob stays.

To see that this works better, suppose that Bob chooses a hand and receives a value N. Bob wins if he chose the smaller number first and switched, or he chose the larger number first and stayed, or both numbers were the same. By symmetry, the revealed hand is equally likely to hold the smaller or the larger number, independently of the values. The probability that the numbers are distinct and the smaller one is at most 3 is 2/3, and symmetrically the probability that the numbers are distinct and the larger one is at least 4 is 2/3. Finally, the probability that the numbers are the same is 1/6, in which case he always wins. Thus the probability of winning is

\displaystyle \frac{1}{2}\cdot\frac{2}{3} + \frac{1}{2}\cdot\frac{2}{3} + \frac{1}{6} = \frac{5}{6} \approx 83\%

This strategy is both intuitive and surprisingly effective. If Alice instead chooses her random integers to be between 1 and k, for some positive k (let’s assume it is even, for simplicity), the above strategy generalizes in the obvious way, with Bob’s probability of success jumping to (exercise if you’re bored)

\displaystyle \frac{3}{4} + \frac{1}{2k}

which is guaranteed to be above 3/4 = 75%.
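If you would rather see this empirically, here is a quick Monte-Carlo sketch in Python (the function and parameter names are mine, not part of the puzzle):

```python
import random

def play_threshold(k=6, trials=200_000, rng=None):
    """Estimate Bob's win probability for the threshold strategy:
    stay on a revealed number above k/2, otherwise switch."""
    rng = rng or random.Random(0)
    wins = 0
    for _ in range(trials):
        x, y = rng.randint(1, k), rng.randint(1, k)
        seen, other = (x, y) if rng.random() < 0.5 else (y, x)
        pick = seen if seen > k // 2 else other
        # Ties count as a win for Bob, as in the puzzle statement.
        wins += pick >= max(x, y)
    return wins / trials

print(play_threshold())       # near 5/6 ~ 0.833
print(play_threshold(k=100))  # near 0.755
```

Repeated independent trials like this are exactly the "sampling regime" discussed below: a single game tells you almost nothing, but the empirical frequency converges on the true advantage.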

While this game is tilted towards Bob even from the outset, the second strategy above reveals the remarkable power that randomization can have. In Computer Science, randomness tends to have two major applications:

  1. In any sort of optimization or problem solving, when you are confronted with a problem for which computing the best answer is hard, but you (via some other knowledge) know that most answers are high quality, then generating an answer uniformly at random will give you a good approximation most of the time. You can often improve the random answers through repeated independent trials: for example, you can take the best answer (i.e. the Monte-Carlo method), take a “majority vote” (in which case you need a concentration of measure phenomenon such as the Chernoff bounds), or perhaps you can somehow breed the results together and get a new answer for which the whole is greater than the sum of its parts (this type of argument often appears in extremal combinatorics, for example in the Rödl nibble). Of course, there is no need to sample answers uniformly: if instead there is an efficiently computable distribution of good answers you can sample from that. Let us call this the sampling regime.
  2. In cryptography, randomness is used almost exclusively for unpredictability, as not even the most powerful computer in the world can predict the outcome of a truly random bit. In this guise, randomness is used to help parties perform a computation correctly while maintaining some data privately either from each other, or from some adversaries who have compromised the method of communication. This is the privacy regime.

The strategy outlined above is an excellent example of the efficacy of the sampling regime of randomization. In the rest of this post we will show how randomization can be used to give a strategy to solve the puzzle at the top better than randomly guessing.

Before we get there, however, let’s first make the previous puzzle a bit harder. Introducing

Puzzle 2. Consider the following modification of the previous game. Now, Alice generates her numbers in some way known only to her (perhaps she always generates them randomly, or, perhaps she always picks the numbers 1 and 2 and places them in the left and right hand, respectively). Assume also that Alice always chooses numbers which are distinct. Is there now a strategy for Bob that will allow him to find the larger number with probability greater than 50%?

Notice that because Bob now has no information on how Alice is generating her numbers, the strategy from the previous puzzle will no longer work. However, by a suitable modification of the previous strategy, he can (surprisingly) do better than 1/2.

The key observation is this. Suppose Alice has generated two numbers, and let S be the smaller number and let L be the larger number. Then there are more numbers between S and 6 than there are between L and 6; symmetrically, there are more numbers between 1 and L than there are between 1 and S. Considering the sampling regime above, we arrive at the following strategy:

Strategy: Bob chooses a hand uniformly at random and receives a number N. He then chooses a uniformly random number R between 1 and 6. If R <= N, he stays. Otherwise, he switches.

We can calculate the probability that the above strategy succeeds as follows. The probability that Bob chooses the smaller number, S, first is 1/2, and the probability that he switches is (6 – S)/6. Similarly, the probability that Bob chooses the larger number first is 1/2, and the probability that he stays is L/6. So, the probability that Bob succeeds is

\displaystyle \frac{1}{2}\cdot \frac{6-S}{6} + \frac{1}{2}\cdot \frac{L}{6} = \frac{1}{2} + \frac{L-S}{12}

Since L and S are integers and L > S we know L – S is at least one, and so Bob succeeds with probability at least 1/2 + 1/12 = 7/12. This strategy similarly generalizes: if Alice chooses numbers between 1 and k instead of 1 and 6, we get that Bob succeeds with probability at least 1/2 + 1/(2k) > 1/2.

So, despite the fact that Bob has no idea how Alice is choosing her numbers, by cleverly using randomization he can still strictly improve on randomly guessing (it’s not a great improvement, but the fact of it is still fascinating).
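We can check this empirically too, even letting Alice play a fixed, adversarial pair of numbers (a Python sketch; the names are mine):

```python
import random

def play_random_threshold(s, l, k=6, trials=200_000, rng=None):
    """Alice always holds the distinct numbers s < l in {1, ..., k}.
    Bob reveals a random hand, draws R uniform on {1, ..., k}, and
    stays iff R <= the revealed number."""
    rng = rng or random.Random(1)
    wins = 0
    for _ in range(trials):
        seen, other = (s, l) if rng.random() < 0.5 else (l, s)
        pick = seen if rng.randint(1, k) <= seen else other
        wins += pick == l
    return wins / trials

# Even in Alice's worst case (adjacent numbers), Bob beats 1/2:
print(play_random_threshold(1, 2))  # near 7/12 ~ 0.583
print(play_random_threshold(1, 6))  # near 11/12 ~ 0.917
```

Note that Alice's strategy is deterministic here; all of the randomness, and all of the advantage, comes from Bob's side.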

Of course, there are caveats. You have to play the game multiple times in order for Bob’s strategy to really quantitatively improve on random guessing — but, this is the same grain of salt that comes with any randomized strategy. In other words, the “gain” from using such a randomized strategy will really only start showing itself once you are allowed to do repeated and independent trials, as described in the sampling regime above.

Now, what about the puzzle that we first discussed? We will state it again, but using our language:

Puzzle 3: Consider the following modification of Puzzle 2. Now, Alice can choose any two distinct real numbers, by way of whatever strategy she wants. Is there a strategy for Bob that will allow him to find the larger number with probability greater than 50%?

The strategy from Puzzle 2 would dictate that Bob selects a hand uniformly at random, chooses a uniformly random real number R, and if R is less than the number he chose he stays, otherwise he switches. Unfortunately, there is no uniform distribution on the real numbers (there are “too many of them” in a certain sense: any attempt to create a distribution on the reals which has the properties of the uniform distribution will violate either the second or third axiom of probability).

To get around this, we will use a monotonically increasing bijection f from the reals to the open interval (0, 1) (i.e. if x < y then f(x) < f(y)). Then, we choose a random number on [0, 1] (this, fortunately, can be done to any degree of accuracy). There are many possible choices: an obvious one is the logistic function

\displaystyle f(x) = \frac{e^x}{e^x + 1} .

The new strategy is as follows: we choose a hand at random, and receive a number N. Compute f(N), and choose a uniformly random number R from the interval [0, 1]. If R <= f(N), then stay. Otherwise, switch. By a similar argument as before, the probability that Bob wins the new game will be

\displaystyle \frac{1}{2} + \frac{f(L) - f(S)}{2} ,

and since L > S we get that this is always strictly greater than 1/2 (although Alice can make the advantage arbitrarily small, and so success will crucially depend on the degree of accuracy by which we calculate the random number R).
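Here is the real-number strategy as a Python sketch (names are mine; the logistic function is written in a numerically stable form, which turns out to matter):

```python
import math
import random

def logistic(x):
    """Numerically stable version of f(x) = e^x / (e^x + 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def play_logistic(s, l, trials=200_000, rng=None):
    """Alice holds distinct reals s < l. Bob reveals a random hand and
    stays iff a uniform R in [0, 1] is at most f(revealed number)."""
    rng = rng or random.Random(2)
    wins = 0
    for _ in range(trials):
        seen, other = (s, l) if rng.random() < 0.5 else (l, s)
        pick = seen if rng.random() <= logistic(seen) else other
        wins += pick == l
    return wins / trials

print(play_logistic(-1.0, 1.0))       # a solid advantage over 1/2
print(play_logistic(1000.0, 1000.5))  # near 0.5: both f-values round to 1.0
```

The second call illustrates the accuracy caveat: once f(S) and f(L) are indistinguishable at machine precision, the advantage evaporates in practice even though it is positive on paper.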

We can interpret the role of the function f above in a similar way as the role of the uniform distributions in the previous two puzzles. Recall that the intuition in the other puzzles was as follows: Bob chooses a random hand, and if he gets the smaller number S first there are “more” numbers between the number he sees and the highest number, 6, so by choosing a random number R between 1 and 6 and switching if R > S he guarantees that more of the time he will be switching to the larger number (and symmetrically if he gets the larger number first). The problem in the case of the real numbers is that if x is any real number, then there are infinitely many numbers greater than and less than x. The function f, by mapping the entire real line to (0, 1) monotonically, gives us a “measure” for S and L that allows us to re-use this intuition: there are still not more numbers greater than S than there are numbers greater than L, but the set of numbers greater than S has larger measure (according to f) than the corresponding set for L.

Infinity is Weird, by robert

Infinity is weird.

This post is about an odd little thing I learned about involving infinite sets quite recently. First, let’s introduce some notation. We let N = {0, 1, 2, … } denote the set of natural numbers, and Q denote the set of rational numbers (recall a number is rational if we can write it as a fraction of two integers. So 1/2 is rational while √2 is not). If A and B are sets then a function f mapping A to B is one-to-one if distinct elements of A are mapped to distinct elements of B (so, for every y in B there is at most one element x in A such that f(x) = y), and it is onto if everything in B is mapped to by something in A (so for every y in B there is some element x in A such that f(x) = y). Finally, f is bijective if it is one-to-one and onto. In other words, f is bijective if everything in B is mapped to by a unique element in A.

In elementary set theory we use bijections to define what we mean by the “size” of a set. In other words, two sets A and B have the “same size” (now called cardinality) if there is some bijection f mapping A to B. For example, if A = {1, 2, 3} and B = {a, b, c}, then we can say that A and B have the same size since the function f mapping f(1) = a, f(2) = b, f(3) = c is a bijection.

What do we get by defining “size” in this manner? Well, clearly we recover “size” in the “regular” sense. If A and B are finite sets with a bijection between them and A has 10 elements, then B certainly has 10 elements as well. The nice thing about this definition of size is that it generalizes to infinite sets in a clean way. Once you realize this you can get some remarkable observations. Here is a nice one:

Theorem 1: Let N be the set of natural numbers and Q be the set of rational numbers. Then there is a bijection f from N to Q.

So, even though there are “clearly more” rational numbers than natural numbers, really the two sets have the same size. Do all infinite sets have the same size? Cantor showed that this is false via, famously, the diagonal argument.

Theorem 2 (Cantor’s Theorem): Let N be the set of natural numbers and R be the set of real numbers. Then there is no bijection from N to R.

Now, Theorem 1 tells us that there is a bijection f from N to Q. This gives us a natural way of ordering the set of rational numbers: just order them according to f! That is, we can define Q by

Q = {f(0), f(1), f(2), …}.

In particular, for every natural number x, this gives us a finite set Q(x) defined by

Q(x) = {f(0), f(1), …, f(x-1), f(x)}.

Note that for any x < y we have Q(x) is strictly contained in Q(y), and the union of Q(x) over all natural numbers x gives us every rational number! In the language of order theory the sequence of sets

{}, Q(0), Q(1), …, Q(n), …

yields an infinite chain in the lattice of all subsets of rational numbers. Moreover, this infinite chain is countable: there is a bijection between it and the natural numbers (just map any natural x to the set Q(x)).
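For concreteness, here is a sketch (in Python, my choice, not the post’s) of one explicit enumeration of the positive rationals, the Calkin–Wilf sequence; a bijection f like the one in Theorem 1 can be built from something like this by interleaving 0 and the negatives:

```python
import itertools
from fractions import Fraction

def calkin_wilf():
    """Enumerate every positive rational exactly once, starting
    q(0) = 1 and q(n+1) = 1 / (2*floor(q(n)) - q(n) + 1)."""
    q = Fraction(1, 1)
    while True:
        yield q
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)

first = list(itertools.islice(calkin_wilf(), 8))
print([str(q) for q in first])  # 1, 1/2, 2, 1/3, 3/2, 2/3, 3, 1/4
```

With any such enumeration in hand, the sets Q(x) are literally just the first x+1 terms of the stream.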

Now, let us consider different subsets of rational numbers. For any real number t, define the set P(t) to be the collection of all rational numbers x < t.

Now things are getting interesting. Like before, for any pair of real numbers t, u with t < u we have that P(t) is strictly contained in P(u). In the language of order theory the collection

{P(t) : t >= 0}

is an infinite chain in the lattice of all subsets of rational numbers. Also, like before, if we take the union of every set P(t) for every positive real number t, we get Q again. So, we get that there is a natural bijection from non-negative real numbers to this new infinite chain (just map t to P(t)).

But, by combining Theorem 1 and Theorem 2, there is no bijection from the set of rational numbers to the set of real numbers. Why is this weird? Well, if we consider the set of all subsets of rational numbers, we get that

  • There is an infinite chain {{}, Q(0), Q(1), …} starting from the empty set that (in the limit) covers all rational numbers, which is also countable — it can be ordered (for all i, j with i < j, Q(i) is strictly contained in Q(j)), and the elements of the chain are bijective with the natural numbers.
  • There is another infinite chain {P(t) : t >= 0} that (in the limit) covers all rational numbers, which is provably not countable — it can be ordered (for all real numbers t, u with t < u we have P(t) is strictly contained in P(u)), but there is no bijection between the chain and the natural numbers.


It is not as paradoxical as it first seems, once you realize that the rational numbers are dense in the real numbers. Moreover, these sets P(t) can (in a loose sense) be used to define the real numbers (this uses the notion of a Dedekind cut). A similar construction was used by John Conway to define the surreal numbers, which is detailed in a fairly entertaining novel by Donald Knuth. But, it is late and I have already ranted for too long. Math is cool.


Codes and Gödel, by robert

SHORT POST TIME. Which means I thought of this off-hand (and it’s certainly not new information, but is kind of fun).

Coding theory is concerned with encoding messages so as to minimize their length for transmission over some sort of channel. The mathematical formalization of this goes all the way back to Shannon’s Information Theory, so I’ll give some basics and then mention the RANDOM CONNECTION.

Here’s the idea. We have two parties, Alice and Bob, who are trying to communicate over some sort of digital channel. (For convenience, let’s assume that the channel communicates every message that is sent across without corruption). Alice has a message M that she wants to send, and the message is drawn from some alphabet \Sigma. Concretely, let’s assume the message is drawn from the English alphabet

\displaystyle \Sigma = \{a, b, c, \ldots, x, y, z, \#\},

where we use # as a placeholder for a blank space. Let \Sigma^* denote the set of all messages we can compose out of the symbols in the alphabet \Sigma. For example,

what\#up\#dog \in \Sigma^*.

Now, suppose the channel is binary, so it can only send 0s and 1s. Obviously, Alice needs some way to encode her alphabet \Sigma into the alphabet \{0,1\} to send her pressing messages over to Bob.



To bring this about, let’s define a binary code to be a function

\displaystyle C: \Sigma \rightarrow \{0,1\}^*.

That is, a binary code is any map from our source alphabet to a sequence of bits. Note that if we have a binary code C we can easily extend it to messages (i.e. to elements of \Sigma^*) by defining, for any sequence of symbols \alpha_1 \alpha_2 \cdots \alpha_n \in \Sigma^*, the map

\displaystyle C(\alpha_1 \alpha_2 \cdots \alpha_n) = C(\alpha_1) C(\alpha_2) \cdots C(\alpha_n).

Now, most codes are useless. Indeed, under our above definition, the map C(\alpha) = 0 for every English letter \alpha is a code. Unfortunately, if Alice used this code over their channel Bob would have a tough time decoding it.

So, we need some condition that allows us to actually decode the bloody things! We’ll start with a useful type of code called a prefix-free code.

Definition: A binary code C: \Sigma \rightarrow \{0,1\}^* is prefix-free if, for every pair of distinct symbols \alpha, \beta \in \Sigma, neither C(\alpha) is a prefix of C(\beta) nor vice-versa.

An example of a prefix-free binary code (for the first four letters of the English alphabet) could be the following:

\displaystyle C(a) = 0, C(b) = 10, C(c) = 110, C(d) = 111.

Let’s encode a message with C: if Alice encoded the message badcab via C and sent it to Bob, Bob would receive

\displaystyle 100111110010 .
Now, the beautiful property of prefix-free codes is the following: Bob can actually decode this message online. That is, he can do the following: read the bits one at a time, appending each to a buffer. As soon as the buffer matches a codeword, he can decode that character, clear the buffer, and keep going!

To illustrate, Bob first reads a 1 off the string. He convinces himself that 1 is not the code for anything, so he reads the next bit, a 0. He now has the string “10”, which is a code for b. Now, is it possible that this could be the beginning of a code for another letter? NO! Because “10” is the code for b and is not the prefix of any other code. So Bob can translate the b, and move on.
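Bob’s online procedure is mechanical enough to sketch in a few lines of Python (the helper names are mine):

```python
# The example prefix-free code from above.
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(message):
    return ''.join(CODE[ch] for ch in message)

def decode_online(bits):
    """Read bit by bit; emit a symbol the moment the buffer matches a
    codeword. Prefix-freeness guarantees no match is ever premature."""
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ''
    if buf:
        raise ValueError('trailing bits that decode to nothing: ' + buf)
    return ''.join(out)

print(encode('badcab'))                  # 100111110010
print(decode_online(encode('badcab')))   # badcab
```

Notice the decoder never looks ahead and never backtracks, which is exactly the "online" property.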

We define nonsingular codes to be the set of codes that can actually be decoded. After seeing the above example, it’s clear that prefix-free codes are nonsingular. However, is it possible for there to be nonsingular codes that are not prefix-free? That is, are there codes that are decodable, but require us to read the entire message before we can decode them? (NOTE: These codes are practically useless, from an efficiency point of view. This is just a thought experiment to test the definition.)

The answer is YES, and a natural example is Gödel numbering! Here is how it works: for each letter \alpha in the alphabet \Sigma choose a distinct positive integer z_\alpha. Now, to encode a message

\displaystyle \alpha_1 \alpha_2 \cdots \alpha_n

let M be the positive integer defined as

\displaystyle M = 2^{z_{\alpha_1}}3^{z_{\alpha_2}}5^{z_{\alpha_3}}\cdots p_n^{z_{\alpha_n}}

where p_n is the nth prime number. We then send the binary expansion of M as our message.

How does Bob decode it? Easily: he reads ALL of M, factors it, and reads off the exponents: the order of the message is preserved if we read off from the lowest prime to the highest, where the exponent of the ith prime is the code of the ith symbol in the message. Bob has to read all of the message (and he has to make sure he’s transcribed it correctly), or else he cannot recover any of it! Marvelously useless.
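Here is a small Python sketch of the scheme (function names are mine; the naive prime generator is only meant for toy messages):

```python
def primes():
    """Naive prime generator; fine for short messages."""
    n, found = 2, []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_encode(message, z):
    """z maps each symbol to a distinct positive integer."""
    m = 1
    for p, ch in zip(primes(), message):
        m *= p ** z[ch]
    return m

def godel_decode(m, z):
    """Factor m by trial division over successive primes; the exponent
    of the i-th prime is the code of the i-th symbol."""
    inv = {v: k for k, v in z.items()}
    out = []
    for p in primes():
        if m == 1:
            return ''.join(out)
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        out.append(inv[e])

z = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
m = godel_encode('bad', z)
print(m)                   # 2**2 * 3**1 * 5**4 = 7500
print(godel_decode(m, z))  # bad
```

Of course, this toy decoder gets away with trial division precisely because the factors are tiny; the point of the post is that you cannot recover any symbol until you have the whole of M.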

OR IS IT USELESS? Similar ideas lurk under regular RSA encryption, which everyone uses a billion times a day without even realizing it (thank you blaggerwebs). (To be fair, this particular M is easy to factor, since all of its prime factors are small; the RSA connection is the broader theme of hiding information behind the difficulty of factoring.) If factoring large integers is as hard as complexity theorists believe it is, then that kind of scheme lets Alice send Bob a frustratingly uncrackable message.


Two Principles of Mathematics, by robert

I was explaining something in probability theory to somebody last night, and I offhandedly said the following remark:

You know, it’s interesting what sorts of mathematics come up. For example, a usual exercise in undergraduate probability is the following: Flip a coin repeatedly until a heads comes up. What’s the expected number of coin flips required?

The person asked me what the number was, and I realized that I actually didn’t know. I gave an offhand guess of three, since we’re asking about a very particular sequence of coin flips (which has exponentially small density in the measure of all sequences of coin flips, and so it should be small). I sat down to work on it before bed, and rather quickly derived the following expression.

Let X be the random variable in \{1,2, \ldots\} = \mathbb{N} with the interpretation that X = i if the ith coin flip in a sequence of flips is a head after i-1 tails. It’s straightforward to calculate \Pr[X = i] — assuming we’re flipping a fair coin, the probability of getting i-1 tails followed by a single head is (1/2)^{i}. This means our expected value will be E[X] = \sum_{i=1}^{\infty} i \Pr[X = i] = \sum_{i=1}^{\infty} i 2^{-i}.

And, wait a minute, but this sum is not trivial to evaluate! At first I did what any self-respecting mathematician/computer scientist would do (i.e. HIT IT WITH YOUR HARDEST SLEDGEHAMMERULTRATOOL AND DYNAMITE THE PROBLEM TO ACCESS ITS SWEET GOOEY INSIDES) and applied generating functions.


This (alas) didn’t work and I fell asleep dejected.

And I woke up with the cutest solution!

To begin, here’s a secret that your math teacher just won’t ever bloody tell you:

(1) Every inequality/sum/identity in the history of mathematics just comes from writing the same thing in two different ways.

Of course, with our friend hindsight bias this is obvious — once we have the identity x = y in front of us, it’s easy to say “oh, well of COURSE x = y, it’s so obvious, duh!”.

Now, here is a second secret that your math teacher won’t ever bloody tell you:

(2) Every result ever obtained in mathematics can be broken down to a sequence of tiny, local, or otherwise easy steps.

When you say something as simple as I did in these two principles the questions of mathematics suddenly become significantly less daunting. To illustrate both of these principles, I’ll use them to evaluate our sum \sum_{i=1}^{\infty} i2^{-i} from the probabilistic puzzle above. First, let’s recall what an infinite sum actually is, as it’s kind of easy to forget: the sum

\displaystyle \sum_{i=1}^\infty i2^{-i}

is really defined as a limit of partial sums

\displaystyle \lim_{n \rightarrow \infty} \sum_{i=1}^n i2^{-i}.

So, applying our first principle from above, we’re going to rewrite \sum_{i=1}^n i2^{-i} as another function f(n) so that we can actually evaluate the limit above.

Now, how do we do this? First, just to simplify notation for ourselves, let f(n) = \sum_{i=1}^n i2^{-i}. Let’s apply our second principle from above — what are some really stupendously obvious facts about the sum f(n) = \sum_{i=1}^n i2^{-i}? Well, since it’s a frigging sum, we know that

\displaystyle f(n+1) = \sum_{i=1}^{n+1} i2^{-i} = f(n) + (n+1)2^{-(n+1)}.

Alright, here is a start. If we can apply our first principle to the sum f(n+1) and write it down in another way then maybe we’ll end up somewhere interesting. Well, what about this sum? Let’s write it down explicitly, so that we can actually see some of the structure of the terms. I’m also going to make the substitution r = 1/2 and instead write

\displaystyle f(n) = \sum_{i=1}^{n} ir^i.

Time for a side rant. Now, a math teacher, jerks as they are, will tell you to do this kind of substitution because your result is more general (or, even worse, tell you nothing at all, leaving you swimming in a soup of variables/indeterminates with no flotation device).

Everyone in any math class, ever. THE OCEAN IS VARIABLES

As usual, this is the correct information but stated in a way so that humans can’t understand it. Another way to say this “generality” assumption is, simply, people hate magic numbers! Notice that NOTHING! about the sums we’ve considered so far has needed the 2 to be there (other than the fact that our problem happens to be about coins). Well, if there’s no reason for it to be there, then why should it be there? The sum \sum_{i=1}^n ir^i is even a bit easier to swallow visually. Anyways, side rant over.

Back on track, here are the sums f(n) and f(n+1), both written down explicitly:

\displaystyle f(n) = r + 2r^2 + 3r^3 + \cdots + nr^n

\displaystyle f(n+1) = r + 2r^2 + 3r^3 + \cdots + nr^n + (n+1)r^{n+1}.

Well, recall that I said that we were trying to rewrite f(n+1) in a way other than

\displaystyle f(n+1) = f(n) + (n+1)r^{n+1}.

Applying our first principle — and this is really the leap of intuition — let’s just transform f(n) into f(n+1) in another way! How? Well, multiply f(n) by r and compare it to f(n+1):

\displaystyle rf(n) = r^2 + 2r^3 + \cdots + (n-1)r^n + nr^{n+1}.

\displaystyle f(n+1) = r + 2r^2 + 3r^3 + \cdots + nr^n + (n+1)r^{n+1}.

We’ve almost got f(n+1)! The only thing that’s missing is a single copy of each term in the sum! Phrased mathematically, we now have the identity

\displaystyle f(n+1) = rf(n) + (r + r^2 + r^3 + \ldots + r^n + r^{n+1}).

Now, the sum \sum_{i=1}^n r^i is a geometric sum which has a simple formula (fact: this simple formula can be derived in a way similar to our current investigation):

\displaystyle \sum_{i=1}^n r^i = \frac{1 - r^{n+1}}{1 - r} - 1.

So, substituting in this new simple formula gives

\displaystyle f(n+1) = rf(n) + \frac{1 - r^{n+2}}{1 - r} - 1

and then, finally finishing our application of the first principle, we can apply our early “stupid” identity for f(n+1) and get

\displaystyle f(n) + (n+1)r^{n+1} = rf(n) + \frac{1 - r^{n+2}}{1 - r} - 1.

The rest is algebra/boilerplate. Collecting the f(n) terms on the left hand side, we get

\displaystyle (1- r)f(n) = \frac{1 - r^{n+2}}{1 - r} - 1 - (n+1)r^{n+1},

then dividing both sides by (1-r) finally gives

\displaystyle f(n) = (1-r)^{-1}\left(\frac{1 - r^{n+2}}{1 - r} - 1 - (n+1)r^{n+1}\right).

Taking the limit as n \rightarrow \infty and using our knowledge that r = 1/2 < 1, we see that the terms involving r^{n+1} will disappear. This leaves

\displaystyle \lim_{n \rightarrow \infty} f(n) = \frac{1}{1-r}\left(\frac{1}{1 - r} - 1\right).

Substituting in r = 1/2, we get

\displaystyle \lim_{n \rightarrow \infty} f(n) = 2(2 - 1) = 2.

And we’re done. In expectation, you will see a heads after 2 coin flips.
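As a quick sanity check (a Python sketch; the helper names are my own), we can compare the closed form derived above against the raw partial sums, and watch the limit approach 2:

```python
def f_closed(n, r=0.5):
    """The closed form for f(n) derived above."""
    return (1 / (1 - r)) * ((1 - r ** (n + 2)) / (1 - r)
                            - 1 - (n + 1) * r ** (n + 1))

def f_partial(n, r=0.5):
    """The raw partial sum: sum of i * r^i for i = 1..n."""
    return sum(i * r ** i for i in range(1, n + 1))

# The closed form agrees with the partial sums exactly (up to float error).
for n in (1, 5, 10, 50):
    assert abs(f_closed(n) - f_partial(n)) < 1e-12

print(f_partial(60))  # within 1e-9 of the limit, 2
```

Numerics are no substitute for the derivation, but they are a cheap way to catch an algebra slip before you publish one.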

You see, math is not mystical. Unless you’re a Newton or an Euler (viz. an absolute genius), math proceeds pretty much the same for everybody. There are underlying principles and heuristics that help you do math that every established mathematician actually uses — the secret is that no one ever tells you them. Of course, I have a sneaking suspicion that this is due to the fact that our high school math teachers don’t actually understand the principles themselves (while this may seem like a bit of an attack, I did graduate with people who were going to be math teachers. Most of those people should not have been math teachers).

Kobo Mini Protip, by robert

Here I was, minding my own business, when I picked up my Kobo Mini to do some reading and suddenly I heard a rattle.

Not in my house, I thought.

So, off comes the back, and here is what greeted me:

Considering that the Kobo Mini does not claim to have extensible memory, I popped out the SD card, tried turning the e-reader back on, and was (in retrospect, a little too) surprised to discover that nothing happened. I guess it’s cheaper to use an SD card for secondary memory and just not tell your customers that it’s there. I mean, most people would beat the crap out of this tiny plastic rectangle while trying to get the back off — not to mention the intense technical skill that goes into copying the OS files from one SD card to a slightly larger one.

And in case you’re wondering, the piece responsible for the rattle was this guy:

Honestly, what is this piece of trash.

It appears that the tiny piece of plastic which makes the Kobo’s nigh-unseeable LED slightly more visible had broken off and was tap-dancing on the motherboard. Sigh.

Pet Peeves and Inductive Horses, by robert

Put your math pants on, ladies and gentlemen, this one’s gonna be a vomitbuster.

This post will be about mathematical induction (if you aren’t familiar with induction, it’s a simple and powerful proof technique that is ubiquitous in mathematics. For the intuitive picture, go here). Well, it will sort of be about mathematical induction. This post will use mathematical induction to help express my hatred for incomplete explanations. We will prove the following theorem (anybody who has seen induction will have seen this theorem and the proof before.)

Theorem All horses are the same colour.

Proof. We will proceed by induction on the size of the set of horses. Suppose that we have a single horse. Then, clearly all (one) of our horses have the same colour. Assume inductively that for every collection of n horses, all the horses in the collection have the same colour. We will show that in every collection of n+1 horses, all the horses have the same colour.

Let H = {h(1), h(2), …, h(n+1)} be a collection of n+1 horses. Then we can choose two sub-collections of H: A = {h(1), h(2), …, h(n)} and B = {h(2), h(3), …, h(n+1)}. By our inductive assumption, all the horses in A and B have the same colour. It follows that the colour of h(1) is the same as the colour of h(2) which is the same as the colour of h(n+1), and so all the horses in H must have the same colour. Q.E.D.

Now, clearly all horses are not the same colour.

And so pretty!

Not the same colour.

Let’s name those two horses Flowerpot and Daffodil. Most explanations of why the above theorem is wrong go like this:

Teacher: Clearly, the set of horses P = {Flowerpot, Daffodil} is a contradiction to our theorem. So, class, what have we learned? When you’re doing induction, always check your base cases!

Right, but quite an unsatisfying explanation. This explains why the theorem is wrong. But why is the proof wrong? That’s actually a bit of a head scratcher.

Mmmmmm. Yellow.

Go on. Do it. Scratch that head. Get them lices out.

I mean, we followed all the steps of mathematical induction — we chose a base case and proved that the theorem was correct for it. We made an inductive assumption for some n, and using the assumption we proved the theorem holds for n+1. Where’s the problem?

Well, let’s look at our proof. We have a set H of size n+1, which clearly can be done. We choose two subsets of H, A and B, both of size n. Also can be done. Next, we apply the inductive hypothesis to A and B, and so all the horses in A have the same colour, and likewise for B. Now, since A and B have a non-empty intersection… Aha. There’s the problem. What if n = 1? Then H = {h(1), h(2)}, A = {h(1)}, and B = {h(2)}. But then A and B have an empty intersection. If A is the set containing Flowerpot and B is the set containing Daffodil, then we cannot conclude that Flowerpot is the same colour as Daffodil. Okay! Great!
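If you like, you can check this mechanically. Here’s a small Python sketch (names like `overlap` are mine, just for illustration) that builds A and B exactly as the proof does for a herd of n+1 horses and measures their overlap; the intersection is empty precisely when n = 1, which is where the chain of equal colours snaps:

```python
def overlap(n):
    """Build A and B as in the proof for a herd of n+1 horses
    and return the size of their intersection."""
    H = list(range(1, n + 2))   # horses h(1), ..., h(n+1)
    A = set(H[:n])              # {h(1), ..., h(n)}
    B = set(H[1:])              # {h(2), ..., h(n+1)}
    return len(A & B)

# The inductive step needs A and B to share at least one horse:
print(overlap(1))  # 0 -- no shared horse, so the step fails for n = 1
print(overlap(2))  # 1 -- h(2) links the two sets
print(overlap(5))  # 4
```

So the inductive step is perfectly valid for every n ≥ 2; it only breaks when you try to use it to get from one horse to two.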

I think that this is fundamentally different from the broad sweep of “You didn’t check all your base cases”. This is really saying that the logic used in the inductive step does not apply to all of the base cases. It’s subtle, but a very important difference. The pedagogical conclusion: teachers, think hard about what you’re presenting. When you think you understand something, it usually turns out that you don’t.

One more thing. In a way, the problem really comes from how I wrote H in the first place! Remember I wrote H = {h(1), h(2), …, h(n+1)}. When splitting H into A = {h(1), h(2), …, h(n)} and B = {h(2), h(3), …, h(n+1)}, the notation itself suggests that the intersection of A and B is non-empty. This is an interesting instance of the form of our text affecting the content of our text (at least to us). This is an idea I will hopefully return to in later posts.

Finally, a moment of silence. Searching for an image of a head scratcher led me deep into a rabbit hole from which I may never return.

And you can barely even see it!

A nose straightener.

Minecraft Tomfoolery by robert

As Soucy mentioned earlier, we are running a vanilla Minecraft server here on cshouse. Here are two instances of balls-out randomness that probably have something to do with the fact that we keep the server updated to the current snapshot (instability YAY!).

My personal favorite — squid in a minecart. We have a stupendously complicated minecart system on the server connecting the Cold Hell Lake to Jungle Prison.

The minecart runs through a low point before it reaches the Jungle Prison. For whatever reason, a squid spawned in the river and hopped into this minecart as it went by. Hilarious.

Next: The recent snapshot introduced pumpkin pie. For whatever reason, my version of Minecraft (snapshot 12w37b, running on Ubuntu Linux 12.04) couldn’t handle the pumpkin pie. Whenever I opened a box containing it, my client would crash. And when Soucy held one — well, just look at what happened:

Note the upside-down nametag, the invisible pumpkin pie in Soucy’s hand, and the floating, upside-down cows in the background. Other fun glitches included upside-down torch fire dancing by and upside-down jungle cats sprinting around!


MY .vimrc File by robert

Simon’s post intrigued me, so I decided to post MY .vimrc file. It may look a lot like a .emacs file, since vi is the root of all evil.

;; Add elisp directory and all subdirs
(if (fboundp 'normal-top-level-add-subdirs-to-load-path)
    (let* ((my-lisp-dir "~/work/scripts/elisp/")
           (default-directory my-lisp-dir))
      (setq load-path (cons my-lisp-dir load-path))
      (normal-top-level-add-subdirs-to-load-path)))

;; Updated org mode
(require 'org-install)
(global-set-key "\C-cl" 'org-store-link)
(global-set-key "\C-cc" 'org-capture)
(global-set-key "\C-ca" 'org-agenda)
(global-set-key "\C-cb" 'org-iswitchb)
;; Persist the clock history for a work timer
(setq org-clock-persist 'history)

;; Changing the colour theme
(require 'color-theme)

;; Removing the tool bar
(if (fboundp 'tool-bar-mode) (tool-bar-mode -1))

;; Auto syntax highlighting
(if (fboundp 'global-font-lock-mode)
    (global-font-lock-mode 1)
  (setq font-lock-auto-fontify t))

;; Make backups go to a separate directory
(add-to-list 'backup-directory-alist '("." . "~/archive/emacs_backup"))

;; Change the cursor when in overwrite, input, etc.
(require 'cursor-chg)  ; Load this library
(change-cursor-mode 1) ; On for overwrite/read-only/input mode
(toggle-cursor-type-when-idle 1) ; On when idle

;; Spotlight mode

(setq visible-bell t)

;; Uncomment region!
(defun uncomment-region (beg end)
  (interactive "r")
  (comment-region beg end -1))
(global-set-key (kbd "C-x C-;") 'comment-region)

;; Goto line
(global-set-key (kbd "M-g") 'goto-line)

;; Compile from emacs!!!
(global-set-key [f10] 'compile)

;; Disable auto save
(setq auto-save-default nil)

;; Disable menu bar and scroll bar
(menu-bar-mode -1)
(scroll-bar-mode -1)

;; custom-set-faces was added by Custom.
;; If you edit it by hand, you could mess it up, so be careful.
;; Your init file should contain only one such instance.
;; If there is more than one, they won't work right.
(custom-set-faces
 '(default ((t (:inherit nil :stipple nil :background "black" :foreground "white" :inverse-video nil :box nil :strike-through nil :overline nil :underline nil :slant normal :weight normal :height 93 :width normal :foundry "unknown" :family "DejaVu Sans Mono")))))

;; org-mode agenda locations
(setq org-agenda-files (list "~/work/research/"))

;; Sage mode
(add-to-list 'load-path (expand-file-name "/home/robere/downloads/sage-4.8-linux-32bit-ubuntu_10.04_lts-i686-Linux/data/emacs"))
(require 'sage "sage")
(setq sage-command "/home/robere/downloads/sage-4.8-linux-32bit-ubuntu_10.04_lts-i686-Linux/sage")

;; AUCTeX PDF mode by default
(setq TeX-PDF-mode t)

Elisp is cool. I wish I were a lispy nerd.

Google Hangout #1 by robert

Wasting Cool Software!


School Sucks by robert

Anyone want to solve this? I need lambda in terms of epsilon. Oh, and the answer is lambda = 1 - O(eps)/(1 + O(eps)). I just can’t derive it.