%=======================02-713 LaTeX template, following the 15-210 template==================
%
% You don't need to use LaTeX or this template, but you must turn your homework in as
% a typeset PDF somehow.
%
% How to use:
% 1. Update your information in section "A" below
% 2. Write your answers in section "B" below. Precede answers for all
% parts of a question with the command "\question{n}{desc}" where n is
% the question number and "desc" is a short, one-line description of
% the problem. There is no need to restate the problem.
% 3. If a question has multiple parts, precede the answer to part x with the
% command "\part{x}".
% 4. If a problem asks you to design an algorithm, use the commands
% \algorithm, \correctness, \runtime to precede your discussion of the
% description of the algorithm, its correctness, and its running time, respectively.
% 5. You can include graphics by using the command \includegraphics{FILENAME}
%
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{graphicx}
\usepackage[margin=1in]{geometry}
\usepackage{fancyhdr}
\setlength{\parindent}{0pt}
\setlength{\parskip}{5pt plus 1pt}
\setlength{\headheight}{13.6pt}
\newcommand\question[2]{\vspace{.25in}\hrule\textbf{#1: #2}\vspace{.5em}\hrule\vspace{.10in}}
\renewcommand\part[1]{\vspace{.10in}\textbf{(#1)}}
\newcommand\algorithm{\vspace{.10in}\textbf{Algorithm: }}
\newcommand\correctness{\vspace{.10in}\textbf{Correctness: }}
\newcommand\runtime{\vspace{.10in}\textbf{Running time: }}
\pagestyle{fancyplain}
\lhead{\textbf{\NAME\ }}
\chead{\textbf{Homework 1}} % for course ``Information theory in theoretical computer science"}}
\rhead{COMS6998, \today}
\newcommand\eps{\epsilon}
\newcommand\x{\mathbf{x}}
\newcommand\y{\mathbf{y}}
\newcommand\supp{\operatorname{Supp}}
\newcommand\E{\mathbb{E}}
\newcommand\cE{\mathcal{E}}
\newcommand\solution{\textbf{Solution} }
\begin{document}\raggedright
%Section A==============Change the values below to match your information==================
\newcommand\NAME{Omri Weinstein} % your name
\newcommand\HWNUM{1} % the homework number
%Section B==============Put your answers to the questions below here=======================
% no need to restate the problem --- the graders know which problem is which,
% but replacing "The First Problem" with a short phrase will help you remember
% which problem this is when you read over your homeworks to study.
\question{1}{Problem 1 [25 pts]}
\part{a} Two teams A and B play a best-of-five series that terminates as soon as one of the teams wins three games. Let $X$ be the random variable representing the outcome of the series, written as the string of winners of the individual games; possible values of $X$ are AAA, BAAA, ABABB, etc.
Let $Y$ be the number of games played before the series ends. Assuming that A and B are equally matched and that the outcomes of different games in the series are independent, calculate $H(X)$, $H(Y)$, $H(Y|X)$, and $H(X|Y)$.
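As a numerical sanity check (not a substitute for the entropy calculations), one can enumerate all possible series outcomes and compute these quantities directly; the code below is an illustrative sketch, with all names my own:

```python
from itertools import product
from math import log2

def series_outcomes():
    """Enumerate all valid best-of-five outcome strings with their probabilities."""
    outcomes = {}
    for n in (3, 4, 5):
        for t in product("AB", repeat=n):
            s = "".join(t)
            w = s[-1]
            # valid iff the winner reaches 3 wins exactly at the last game
            if s.count(w) == 3 and s[:-1].count(w) == 2:
                outcomes[s] = 0.5 ** n  # each game is a fair, independent coin flip
    return outcomes

def H(dist):
    """Shannon entropy (bits) of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

px = series_outcomes()
assert abs(sum(px.values()) - 1.0) < 1e-12

# Y = number of games played is a deterministic function of X
py = {}
for s, p in px.items():
    py[len(s)] = py.get(len(s), 0.0) + p

hx, hy = H(px), H(py)
print(hx)        # H(X)
print(hy)        # H(Y)
print(0.0)       # H(Y|X) = 0, since Y is determined by X
print(hx - hy)   # H(X|Y) = H(X,Y) - H(Y) = H(X) - H(Y)
```

Since $Y$ is a function of $X$, the chain rule gives $H(X|Y) = H(X) - H(Y)$ here, which the last line exploits.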
\part{b} Let $X, Y$ be integer-valued random variables and let $Z = X + Y$. Prove that $H(Z | X) = H(Y | X)$. (Hint: Expand $H(Z | X)$ using the definition of conditional entropy.)
\part{b.1} Let $X,Y,Z$ be as defined in (b). Prove that if $X, Y$ are independent, then $H(Z) \geq \max\{H(X), H(Y)\}$. That is, adding an independent random variable cannot decrease entropy.
\part{b.2} Let $X,Y,Z$ be as defined in (b). Give an example of random variables $X, Y$ for which $H(Z) < \min\{H(X), H(Y)\}$.
\part{b.3} State and prove a necessary and sufficient condition for when the entropy of the sum equals the sum of the entropies, i.e., $H(Z) = H(X) + H(Y)$.
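For intuition on part (b.1), here is a sketch that convolves the pmfs of two independent integer-valued variables and checks $H(Z) \geq \max\{H(X), H(Y)\}$ on one example. The distributions are arbitrary choices of mine; note that the convolution step itself assumes independence, so this does not model part (b.2), where dependence between $X$ and $Y$ is essential:

```python
from math import log2

def H(pmf):
    """Shannon entropy (bits) of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def sum_pmf(px, py):
    """pmf of Z = X + Y for INDEPENDENT X, Y (discrete convolution)."""
    pz = {}
    for x, p in px.items():
        for y, q in py.items():
            pz[x + y] = pz.get(x + y, 0.0) + p * q
    return pz

# Example: a heavily biased X plus a fair-coin Y
px = {0: 0.9, 1: 0.1}
py = {0: 0.5, 1: 0.5}
pz = sum_pmf(px, py)
print(H(px), H(py), H(pz))
assert H(pz) >= max(H(px), H(py)) - 1e-12
```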
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\question{2}{Problem 2 [20 pts]}
In this exercise, we will prove ``Fano's Inequality'', which informally states that a random
variable $\hat{X}$ that predicts $X$ with high probability must also ``sip'' almost all of the entropy out of $X$.
More formally, let $X$ be an arbitrary random variable that takes values in $[n]=\{1,2,\ldots,n\}$, and suppose that $\hat{X}$ is a random variable satisfying:
\[\Pr(\hat{X} = X) \geq 1-\epsilon\]
Prove that in this case,
$H(X|\hat{X}) \leq H(\epsilon) + \epsilon\cdot\log(n-1)$, where $H(\epsilon)$ is the binary entropy of $\epsilon$.
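The bound can be checked numerically on a concrete joint distribution. The error model below is my own illustrative choice: the predictor is correct with probability $1-\epsilon$ and the error mass is spread uniformly over the $n-1$ wrong symbols, which is the case where Fano's inequality is tight:

```python
from math import log2

def h2(e):
    """Binary entropy H(e) in bits."""
    return 0.0 if e in (0.0, 1.0) else -e * log2(e) - (1 - e) * log2(1 - e)

def cond_entropy(joint):
    """H(X | Xhat) from a joint pmf {(x, xhat): prob}."""
    marg = {}
    for (x, xh), p in joint.items():
        marg[xh] = marg.get(xh, 0.0) + p
    hc = 0.0
    for (x, xh), p in joint.items():
        if p > 0:
            hc -= p * log2(p / marg[xh])
    return hc

# Illustrative parameters: n = 4 symbols, prediction error eps = 0.1
n, eps = 4, 0.1
joint = {}
for x in range(n):
    for xh in range(n):
        if x == xh:
            joint[(x, xh)] = (1 - eps) / n                 # correct prediction
        else:
            joint[(x, xh)] = eps / (n * (n - 1))           # error spread uniformly

err = sum(p for (x, xh), p in joint.items() if x != xh)
lhs = cond_entropy(joint)
rhs = h2(err) + err * log2(n - 1)
print(lhs, rhs)   # Fano: lhs <= rhs, with equality for this error pattern
```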
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\question{3}{Problem 3 [20 pts]}
For $\tau\in(0,\frac12)$, define a subset $C \subset \{0, 1\}^n$ to be $\tau$-covering if every $\mathbf{r} \in \{0, 1\}^n$ is within Hamming distance $\tau n$ of some element of $C$.
\part{a} Prove, using the language of entropy and conditional entropy, that the size of such a $\tau$-covering must satisfy $|C|\geq 2^{(1-H(\tau))n}$, where $H(\tau)$ denotes the binary entropy function with parameter $\tau$. (Hint: Use the inequality we proved in class: $\sum_{j=0}^{\tau n}{n\choose j}\leq 2^{nH(\tau)}$).
\part{b} Prove that for large enough $n$, a random subset of $\{0,1\}^n$ of size $n^3\cdot 2^{(1-H(\tau))n}$ is $\tau$-covering with probability at least $1-2^{-\Omega(n)}$. (Hint: You may use without proof the inequality ${n\choose\tau n}\geq2^{H(\tau)n}/n$).
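The two hint inequalities used in parts (a) and (b) can be verified numerically for small parameters (a sanity check only, not part of either proof; the ranges of $n$ and $k$ below are arbitrary):

```python
from math import comb, log2

def h2(t):
    """Binary entropy in bits."""
    return -t * log2(t) - (1 - t) * log2(1 - t)

# Check, for k/n <= 1/2:
#   sum_{j=0}^{k} C(n, j) <= 2^{n H(k/n)}     (Hamming-ball volume bound)
#   C(n, k) >= 2^{n H(k/n)} / n               (lower bound on the central term)
for n in (20, 50, 100):
    for k in range(1, n // 2 + 1):
        t = k / n
        ball = sum(comb(n, j) for j in range(k + 1))
        assert ball <= 2 ** (n * h2(t)) * (1 + 1e-9)
        assert comb(n, k) >= 2 ** (n * h2(t)) / n * (1 - 1e-9)
print("hint inequalities verified")
```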
\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\question{4}{Problem 4 [35 pts]}
Let $X$ be a random variable taking values in an alphabet $\{a_1, a_2, \ldots, a_n\}$ with the probability of $X = a_i$ being $p_i$ for $i = 1, 2, \ldots, n$. Assume that the probabilities are sorted: $0 < p_1 \leq p_2 \leq \cdots \leq p_n$. Consider the following natural procedure to build a prefix-free code for these $n$ symbols:
Choose a $k \in \{1, 2, \ldots, n-1\}$ such that $|\sum^k_{i=1} p_i -\sum^n_{i=k+1} p_i|$ is minimized. Assign 0 as the first bit of the encoding for source symbols $a_1, \ldots, a_k$, and 1 as the first bit of the encoding for source symbols $a_{k+1}, \ldots, a_n$. Repeat the process recursively for each of the two subsets $\{a_1, \ldots, a_k\}$ and $\{a_{k+1}, \ldots, a_n\}$. By this recursive procedure, we obtain a prefix-free code for the
symbols $a_1, a_2, \ldots, a_n$.
The goal of this exercise is to prove the expected length $L$ of the resulting source code is close to $H(X)$. To this end, we will view the prefix-free code naturally as a binary tree, with the symbols at the $n$ leaves, as described in lecture.
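The recursive splitting procedure can be sketched as follows. This is an illustrative implementation only (the probability vector and the tie-breaking rule, taking the first minimizing $k$, are my own assumptions), meant as a sanity check that $L$ stays close to $H(X)$:

```python
from math import log2

def fano_code(probs):
    """Recursively split at the most balanced point and assign 0/1 prefixes.

    Returns a list of codewords (bit strings), one per symbol, in order.
    Ties in the balance criterion are broken by the first minimizing k.
    """
    n = len(probs)
    if n == 1:
        return [""]
    total = sum(probs)
    best_k, best_gap, left = 0, float("inf"), 0.0
    # choose k minimizing |sum_{i<=k} p_i - sum_{i>k} p_i|
    for k in range(n - 1):
        left += probs[k]
        gap = abs(left - (total - left))
        if gap < best_gap:
            best_k, best_gap = k, gap
    lo = fano_code(probs[: best_k + 1])
    hi = fano_code(probs[best_k + 1 :])
    return ["0" + w for w in lo] + ["1" + w for w in hi]

p = [0.05, 0.1, 0.15, 0.2, 0.5]      # sorted, as in the problem statement
code = fano_code(p)
L = sum(pi * len(w) for pi, w in zip(p, code))
HX = -sum(pi * log2(pi) for pi in p)
print(code, L, HX)   # a prefix-free code; L should be close to H(X)
```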
\part{a}
Argue that in the above construction, the leaves in the subtree rooted at any internal node will consist of a consecutive subset $\{a_i, a_{i+1}, \ldots, a_j\}$ of symbols for some $1 \leq i < j \leq n$. We will denote such an internal node by $[i, j]$, and use the shorthand $q_{[i,j]} = p_i +p_{i+1} +\cdots +p_j$ for the total probability of the leaves in its subtree. Note that $[i, i]$ is just the leaf with symbol $a_i$.
\part{b} Let $\mathcal I$ denote the set of internal nodes of the tree. Prove that the expected length $L$ of the above source code is
\[L=\sum_{[i,j]\in\mathcal I} q_{[i,j]}\]
\part{c} Prove that
\[H(X) = \sum_{[i,j]\in\mathcal I} q_{[i,j]}\,H\!\left(\frac{q_{[i,k]}}{q_{[i,j]}}\right)\]
where $k$, $i\leq k<j$, is the split index chosen by the procedure at the internal node $[i,j]$, and $H(\cdot)$ denotes the binary entropy function.
\end{document}