18.310 Assignment 3 Due Monday September 27, 2004

1. Here is a sequence of frequencies of occurrence of words in a given message.

Construct an optimal Huffman code from this data for these frequencies.

Construct an optimal order preserving code, (Hu-Tucker code) from the same data.

Compute the length of the message using each of these and also the “entropy of information defined by these frequencies. (easy to do on a spreadsheet.)

123,12,7,8,14,192, 10,546,257,497,121, 11,7,9,14,92

2. Suppose N mutually independent events each have probability p. What is the probability that exactly k of them occur?

Plot this on a spreadsheet as a function of k for p=.5 and N=100.

Sketch your answer. What happens when you now vary p? ( do it and report the result qualitatively.)

3. The expected value of the sum of several random variables is always the sum of their expected values. Suppose we divide a circle into three arcs by putting three points on it at random and cutting the circle at them. What is the expected value of the ratio of the length of one of these arcs to the circumference of the circle?

Suppose instead we choose two points at random on the interval between 0 and 1.

What is the expected distance between them? (hint: deduce the answer from the previous one)

Now suppose we choose two distinct integers at random between 1 and N (so that any integer in this range, including 1 and N has probability 1/N of being picked. What is the expected difference between them? (hint: discretize the previous question and you have the answer at once with no work.)

4.The largest term in the power series expansion of eN  is NN/N! so we can deduce that eN is given by W(N) multiplied by  NN/N!  where W(N) is the effective number of “largest terms” that contribute to that series.

Use a spreadsheet to compute W(N) for N from 1 up to a hundred (or more if you want to).

Try to figure out what W(N) approaches for large N from this data. Hint: try squaring W and plotting it. If you look at an appropriate slope you get something you will recognize.

Examine the difference between W and what you find as an approximate formula for it.

Multiply that by N and what do you get?