{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For these exercises, you may use a computer to help you work on a problem, \n",
    "but your solution should be self-contained without reference \n",
    "to computer output (unless stated otherwise). Use the Jupyter Notebook\n",
    "itself to typeset your solutions as markdown cells.\n",
    "\n",
    "## Exercise 3.1 \n",
    "\n",
    "The following parts are about the sample set of $n$ values ($n>2$) \n",
    "$$\n",
    "0, 0, 0, \\ldots, 0, 1000.\n",
    "$$\n",
    "\n",
    "(That is, there are $n-1$ copies of 0 and one copy of 1000.)\n",
    "\n",
    "1. Show that the sample mean is $1000/n$.\n",
    "2. Find the sample median when $n$ is odd. \n",
    "3. Show that the corrected sample variance $s_{n-1}^2$ is $10^6/n$.\n",
    "4. Find the sample z-scores of all the values."
   ]
  },
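  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reminder for Exercises 3.1 and 3.2, the quantities involved can be written out as follows. This is a sketch that assumes the standard definitions, with z-scores computed from the corrected sample variance; the symbol $\\bar{x}$ for the sample mean is notation introduced here, so defer to Chapter 3 where its conventions differ.\n",
    "$$\n",
    "\\bar{x} = \\frac{1}{n}\\sum_{i=1}^n x_i, \\qquad s_{n-1}^2 = \\frac{1}{n-1}\\sum_{i=1}^n (x_i-\\bar{x})^2, \\qquad z_i = \\frac{x_i-\\bar{x}}{s_{n-1}}.\n",
    "$$"
   ]
  },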
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.2\n",
    "\n",
    "Suppose given samples $x_1,\\ldots,x_n$ have the sample z-scores $z_1,\\ldots,z_n$. \n",
    "\n",
    "1. Show that $\\displaystyle \\sum_{i=1}^n z_i = 0.$ \n",
    "\n",
    "2. Show that $\\displaystyle \\sum_{i=1}^n z_i^2 = n-1.$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.3\n",
    "Define 8 points on an ellipse by $x_k=a\\cos(\\theta_k)$ and $y_k=b\\sin(\\theta_k)$, where $a$ and $b$ are positive and \n",
    "$$\n",
    "\\theta_1= \\frac{\\pi}{4}, \\theta_2 = \\frac{\\pi}{2}, \\theta_3 = \\frac{3\\pi}{4}, \\ldots, \\theta_8 = 2\\pi. \n",
    "$$\n",
    "Let $u_1,\\ldots,u_8$ and $v_1,\\ldots,v_8$ be the z-scores of the $x_k$ and the $y_k$, respectively. Show that the points $(u_k,v_k)$ for $k=1,\\ldots,8$ all lie on a single circle centered at the origin. (By extension, standardizing points into z-scores is sometimes called *sphering* them.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.4 \n",
    "Given a population of values $x_1,x_2,\\ldots,x_n$, define the function \n",
    "$$\n",
    "r_2(x) = \\sum_{i=1}^n (x_i-x)^2.\n",
    "$$\n",
    "\n",
    "Show using calculus that $r_2$ is minimized at $x=\\mu$, the population mean. (The idea is that minimizing $r_2$ is a way to find the \"most representative\" value for the dataset.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.5\n",
    "Suppose that $n=2k-1$ and a population has distinct values $x_1<x_2<\\cdots<x_{n}$, so that the median is equal to $x_k$. Define the function \n",
    "$$\n",
    "r_1(x) = \\sum_{i=1}^n |x_i - x|.\n",
    "$$\n",
    "\n",
    "(This function is called the *total absolute deviation* of $x$ from the population.) Show that $r_1$ has a global minimum at $x=x_k$ by way of the following steps. \n",
    "\n",
    "1. Explain why the derivative of $r_1$ is undefined at every $x_i$. Consequently, all of the $x_i$ are critical points of $r_1$. \n",
    "\n",
    "2. Determine $r_1'$ within each interval $(-\\infty,x_1),\\, (x_1,x_2),\\, (x_2,x_3),$ and so on. Explain why this shows that there cannot be any additional critical points to consider. \n",
    "\n",
    "   (Note: you can replace the absolute values with a piecewise definition of $r_1$, where the formula for the pieces changes as you cross over each $x_i$.) \n",
    "\n",
    "3. By considering the $r_1'$ values between the $x_i$, explain why it must be that\n",
    "$$\n",
    "r_1(x_1) > r_1(x_2) > \\cdots > r_1(x_k) < r_1(x_{k+1}) < \\cdots < r_1(x_n).\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.6\n",
    "This problem is about the dataset\n",
    "$$\n",
    "1, 3, 4, 5, 5, 6, 7, 8.\n",
    "$$\n",
    "\n",
    "1. Make a table of the values of the ECDF $\\hat{F}(t)$ at the values $t=0,1,2,\\ldots,10$. \n",
    "\n",
    "2. Carefully sketch the ECDF of the dataset over the interval $[0,10]$. \n",
    "\n",
    "3. Make a table of counts $c_k$ for the bins $(0,2],(2,4],(4,6],(6,8],(8,10]$.\n",
    "\n",
    "4. Sketch a histogram of the dataset using the bins from part 3.\n",
    "\n",
    "5. Verify the equation (3.6) in Chapter 3 for the bins from part 3."
   ]
  },
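  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For Exercise 3.6, it may help to have the quantities written out. The following is a sketch assuming the usual definition of the ECDF for a sample $x_1,\\ldots,x_n$, with a generic bin written as $(a_k,b_k]$; the endpoint names $a_k$ and $b_k$ are notation introduced here for illustration, so defer to Chapter 3 where its conventions differ.\n",
    "$$\n",
    "\\hat{F}(t) = \\frac{1}{n}\\,\\bigl(\\text{number of indices } i \\text{ with } x_i \\le t\\bigr), \\qquad c_k = \\bigl(\\text{number of indices } i \\text{ with } a_k < x_i \\le b_k\\bigr).\n",
    "$$\n",
    "Repeated values in the dataset are counted once for each occurrence."
   ]
  },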
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.7\n",
    "Suppose that a distribution has continuous PDF $f(t)$ and CDF $F(t)$, and that $F(a)=0$ and $F(b)=1$. Explain why \n",
    "$$\n",
    "\\int_a^b f(t)\\,dt = 1.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.8\n",
    "Suppose that a distribution has PDF\n",
    "$$\n",
    "f(t) = \\begin{cases}\n",
    "0, & |t| >  1, \\\\\n",
    "\\tfrac{1}{2}(1+t), & |t| \\le 1.\n",
    "\\end{cases}\n",
    "$$\n",
    "Find a formula for its CDF. (Hint: It's a piecewise formula. First find it for $t< -1,$ then for $-1 \\le t \\le 1,$ and finally for $t>1$.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.9\n",
    "What is the median of the normal distribution whose PDF is given by equation (3.12) in Chapter 3? The answer is probably intuitively clear, but you should make a mathematical argument (though it does not require difficult calculations). \n",
    "\n",
    "Note: there is no simple antiderivative formula for the PDF, and you do not need it anyway."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.10\n",
    "This exercise is about the same set of sample values as Exercise 3.1. Suppose the 2σ-outlier criterion is applied using the sample mean and sample variance. \n",
    "\n",
    "1. Show that regardless of $n$, the value 0 is never an outlier.\n",
    "\n",
    "2. Show that the value 1000 is an outlier if $n \\ge 6$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.11\n",
    "Define a population by\n",
    "$$\n",
    "x_i = \\begin{cases}\n",
    "1, & 1 \\le i \\le 11, \\\\ \n",
    "2, & 12 \\le i \\le 14,\\\\ \n",
    "4, & 15 \\le i \\le 22, \\\\ \n",
    "6, & 23 \\le i \\le 32.\n",
    "\\end{cases}\n",
    "$$\n",
    "(That is, there are 11 values of 1, 3 values of 2, 8 values of 4, and 10 values of 6.)\n",
    "\n",
    "1. Find the median of the population.\n",
    "\n",
    "2. Find the smallest interval containing all non-outlier values according to the 1.5 IQR criterion."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.12\n",
    "Prove that two sample sets have a Pearson correlation coefficient equal to 1 if they have identical z-scores. (Hint: Use the results of Exercise 3.2.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.13\n",
    "Suppose that two sample sets satisfy $y_i=-x_i$ for all $i$. Prove that the Pearson correlation coefficient between the sets equals $-1$."
   ]
  },
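  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For Exercises 3.12 and 3.13, the coefficient in question can be written as follows. This is a sketch assuming the standard definition of the Pearson correlation coefficient for samples $x_1,\\ldots,x_n$ and $y_1,\\ldots,y_n$; the symbols $r_{xy}$, $\\bar{x}$, and $\\bar{y}$ (the coefficient and the two sample means) are notation introduced here, so defer to Chapter 3 where its conventions differ.\n",
    "$$\n",
    "r_{xy} = \\frac{\\sum_{i=1}^n (x_i-\\bar{x})(y_i-\\bar{y})}{\\sqrt{\\sum_{i=1}^n (x_i-\\bar{x})^2}\\,\\sqrt{\\sum_{i=1}^n (y_i-\\bar{y})^2}}.\n",
    "$$\n",
    "Any common normalizing factor applied to the numerator and denominator, such as $1/(n-1)$, cancels out of this ratio."
   ]
  },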
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3.14\n",
    "\n",
    "Download and solve the sample test available under [Assessments](assessments.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
