{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "64b51d52",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Carefully modify the variables below. Ensure there are no typos.\n",
    "student_id = \"12345678\" # add your student ID\n",
    "student_mail = \"firstname.lastname@student.manchester.ac.uk\" # your email address"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eef390d5",
   "metadata": {},
   "source": [
    "# Sample test\n",
    "\n",
    "This sample test contains several Jupyter Notebook cells with the comment `# TODO`. This is where you should type the code for your solutions. Do not alter any of the other cells. \n",
    "\n",
    "It is good practice to include docstrings for each function, and to provide markdown cells explaining your work, but in this test they won't be marked.\n",
    "\n",
    "**Important: Do not alter the names of the predefined variables and functions,** such as `birds`, `ducks`, `duckmeans`, `numOutliers`, etc. The values of these variables will inform the marking; renaming them or failing to follow the problem description will result in a loss of marks. Ensure that any defined functions **return** their computed values (using the `return` keyword), not merely *print* them.\n",
    "\n",
    "This sample test will not be submitted, but the testing code provided at the end will give you an idea of how well your code solves the stated problems.\n",
    "\n",
    "## Note on independent work\n",
    "\n",
    "You must complete all coursework tests on your own, but you are allowed to use online resources and all course notes and exercise solutions. The course notes from chapters 1 to 3 contain all that is required to solve the problems below. You are not allowed to ask other humans for help. In particular, you are not allowed to send, give, or receive code or markdown content to/from classmates and others.\n",
    "\n",
    "The University Guidelines for Academic Malpractice apply: http://documents.manchester.ac.uk/display.aspx?DocID=2870\n",
    "\n",
    "**Important: Even if you are the originator of the work** (and not the one who copied), the University Guidelines hold you equally responsible for the academic malpractice, and you may lose all coursework marks (or even be assigned 0 marks for the course)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2295b152",
   "metadata": {},
   "source": [
    "# Start of test\n",
    "\n",
    "We will analyse counts of various waterfowl species observed during aerial surveys. We first load all the required modules and the dataset `Aerial_Waterfowl_Survey_Data.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdc2a253-d922-4124-931c-d4a90308d222",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-34389c5497ccc8a9",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
    "warnings.simplefilter(action='ignore', category=DeprecationWarning)\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "birds = pd.read_csv(\"_datasets/Aerial_Waterfowl_Survey_Data.csv\")\n",
    "birds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af4a10eb-93d7-44fa-bbf3-9c3de7025d2e",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-404456cc35ae4faa",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 1\n",
    "\n",
    "Make a data frame called `ducks` containing only the columns for the species *American Black Duck*, *Mallard*, *Ruddy Duck*, and *Ring-necked Duck*. Make a series called `duckmeans` containing the means of those four columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6b3dc6c4-7547-4614-b0a7-7bab6a2d05e6",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "ducks-mean",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ducks = None\n",
    "duckmeans = None\n",
    "\n",
    "# TODO: Provide your solution code here that defines `ducks` and `duckmeans`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2e09ba2-55fb-4968-a293-0a4bbec2f066",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-869d5bfc3921ff69",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 2\n",
    "\n",
    "For the data in the *Canada Goose* column of `birds`, assign `CGmu` the value of the mean, assign `CGsigma` the value of the standard deviation, and let `CGZ` be a series of the z-scores of the column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "142aa2e7-fea5-4098-825c-3078246a0f21",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "stats-answer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "CGmu = None\n",
    "CGsigma = None\n",
    "CGZ = None\n",
    "\n",
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e0bfca5-4238-4091-89a4-5926f5e35cd7",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-2a1f46fe80c008cc",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 3\n",
    "Let `IQR` be a series of the interquartile ranges of all the columns of `birds` after the 5th, keeping only the values greater than zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "cb279a41-3da0-457d-b835-919d3a9ddc65",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "iqr-answer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "IQR = None\n",
    "\n",
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c7d28cb-50de-424e-bdf0-ab70dd735bce",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-85ab8e5970a4c206",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 4\n",
    "Use the seaborn module (already imported above as `sns`) to plot the empirical CDF of the *Snow Goose* column using only the rows in which *Zone* equals 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c4b5e4fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cae4cf60-a3c1-45ed-8582-5a5098d2bcf5",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-84a1f89197c28222",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 5\n",
    "Plot a histogram with 25 bins for the **positive entries only** in the *Bufflehead* column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "342e47a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c062d3c8-8057-4249-88d4-6917e1dfa6c2",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-bc0fd65972e4a4c0",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "source": [
    "## Problem 6\n",
    "Make a series called `canada` of the total number of *Canada Goose* observed in each zone. That is, the index holds the zone numbers and the values are the total counts in each zone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "1c31428f-02b9-4b84-956c-0c902a52c69a",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "canada-answer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "canada = None\n",
    "\n",
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1560db93-299f-4ae4-9bd4-4d7bf8f08980",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-7da679c2325d5b19",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "source": [
    "## Problem 7\n",
    "\n",
    "Make a box plot, grouped by month, of the *Snow Goose* counts for all the years since 2000 (inclusive)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c51f8b36",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16538bde-40cf-46d9-aff6-29b1e40bf226",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-e6219011837cc64e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 8\n",
    "\n",
    "Write a function `numOutliers` that takes as input a pandas Series and returns the number of outliers according to the 1.5 IQR criterion (an integer value).\n",
    "\n",
    "Use the function to count the total number of outliers for December in the previous plot and assign the result to `numOutliersDec`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "93dd6052-a249-47a2-8cf1-1b811787946d",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "outliers-answer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "numOutliers = None\n",
    "numOutliersDec = None\n",
    "\n",
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86538277-dbe6-45b4-a75b-71d9572c965f",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-7e212e7d9a1b4e61",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 9\n",
    "\n",
    "Create a 4x4 grid of pairwise plots for the `ducks` frame from Problem 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "d7b1634f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d464991-0e0b-43cd-97ca-59bd98bd6861",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-dc810b6f07627b04",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 10\n",
    "\n",
    "Assign to `duckduck` a dataframe of the Pearson correlation coefficients between all pairings of the four species in `ducks`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "3e664218-29be-4333-b6e9-e6974acdb4b4",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "duckduck-answer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "duckduck = None\n",
    "\n",
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7523ac2-f4dd-4f33-84bc-04de34b8a44c",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-879a681e993aa58f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## Problem 11\n",
    "\n",
    "Repeat the previous problem, but using Spearman coefficients, and call the result `spearduck`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "4f69bd2a-2d0a-4f0e-9e25-80bd3f3ffdf5",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "spearduck-answer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "spearduck = None\n",
    "\n",
    "# TODO: Provide your solution code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "415a227a",
   "metadata": {},
   "source": [
    "# End of test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c297e21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TESTING CODE\n",
    "try: \n",
    "    assert len(ducks.columns) == 4\n",
    "    print(\"OKAY - ducks has 4 columns\")\n",
    "except:\n",
    "    print(\"FAIL - wrong number of columns in ducks\")\n",
    "\n",
    "dsp = ['American Black Duck',\"Mallard\",'Ruddy Duck',\"Ring-necked Duck\"]\n",
    "try:\n",
    "    assert np.all( np.isclose(ducks.sum()[dsp].values, [1815774, 1703682, 163729, 68426] ))\n",
    "    print(\"OKAY - selected the correct columns for ducks\")\n",
    "except:\n",
    "    print(\"FAIL - selected the wrong columns for ducks\")\n",
    "\n",
    "try: \n",
    "    assert np.isclose(duckmeans.sum(), 1834.528606356)\n",
    "    print(\"OKAY - duckmeans values are correct\")\n",
    "except:\n",
    "    print(\"FAIL - duckmeans values are incorrect\")\n",
    "\n",
    "try: \n",
    "    assert np.isclose(CGZ.mean(), 0)\n",
    "    print(\"OKAY - mean of CGZ should be near zero\")\n",
    "except:\n",
    "    print(\"FAIL - mean of CGZ should be near zero\")\n",
    "\n",
    "try: \n",
    "    assert np.isclose(CGZ.std(), 1)\n",
    "    print(\"OKAY - standard deviation of CGZ should be near 1\")\n",
    "except:\n",
    "    print(\"FAIL - standard deviation of CGZ should be near 1\")\n",
    "\n",
    "try: \n",
    "    assert np.isclose(CGZ.median(), -0.34007213204)\n",
    "    print(\"OKAY - CGZ values are correct\")\n",
    "except:\n",
    "    print(\"FAIL - CGZ values are incorrect\")\n",
    "\n",
    "try: \n",
    "    assert len(CGZ) == birds.shape[0]\n",
    "    print(\"OKAY - CGZ should have the same number of rows as birds\")\n",
    "except:\n",
    "    print(\"FAIL - CGZ should have the same number of rows as birds\")\n",
    "\n",
    "try: \n",
    "    assert type(IQR) == pd.Series\n",
    "    print(\"OKAY - IQR should be a pandas Series\")\n",
    "except:\n",
    "    print(\"FAIL - IQR should be a pandas Series\")\n",
    "\n",
    "try: \n",
    "    assert np.isclose(IQR.sum(), 14026)\n",
    "    print(\"OKAY - sum of IQR values is correct\")\n",
    "except:\n",
    "    print(\"FAIL - sum of IQR values is incorrect\")\n",
    "\n",
    "try: \n",
    "    assert np.isclose(IQR[\"Snow Goose\"], 5900)\n",
    "    print(\"OKAY - Snow Goose IQR value is correct\")\n",
    "except:\n",
    "    print(\"FAIL - Snow Goose IQR value is incorrect\")\n",
    "\n",
    "try: \n",
    "    assert type(canada) == pd.Series\n",
    "    print(\"OKAY - canada should be a pandas Series\")\n",
    "except:\n",
    "    print(\"FAIL - canada should be a pandas Series\")\n",
    "\n",
    "try: \n",
    "    assert len(canada) == 11\n",
    "    print(\"OKAY - canada should have 11 values\")\n",
    "except:\n",
    "    print(\"FAIL - canada should have 11 values\")\n",
    "\n",
    "try: \n",
    "    assert np.all(\n",
    "        canada[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]] == \n",
    "        [ 822475,  238779, 1968090, 1215365, 2653983, 1020605, 891267, 312775, 116060, 62120, 143104])\n",
    "    print(\"OKAY - canada has the correct values\")\n",
    "except:\n",
    "    print(\"FAIL - canada has incorrect values\")\n",
    "\n",
    "testseries = pd.Series([], dtype=float)  # explicit dtype so the empty series is numeric\n",
    "try: \n",
    "    assert type(numOutliers(testseries)) == int\n",
    "    print(\"OKAY - numOutliers should return int\")\n",
    "except:\n",
    "    print(\"FAIL - numOutliers should return int\")\n",
    "\n",
    "testseries = pd.Series([0,0,0.3]) \n",
    "try: \n",
    "    assert numOutliers(testseries) == 0\n",
    "    print(\"OKAY - numOutliers([0,0,0.3]) should return 0\")\n",
    "except:\n",
    "    print(\"FAIL - numOutliers([0,0,0.3]) should return 0\")\n",
    "\n",
    "try: \n",
    "    assert int(numOutliersDec) == 15\n",
    "    print(\"OKAY - numOutliersDec should be 15\")\n",
    "except:\n",
    "    print(\"FAIL - numOutliersDec should be 15\")\n",
    "\n",
    "try: \n",
    "    assert type(duckduck) == pd.DataFrame\n",
    "    print(\"OKAY - duckduck should be a DataFrame\")\n",
    "except:\n",
    "    print(\"FAIL - duckduck should be a DataFrame\")\n",
    "\n",
    "try: \n",
    "    assert duckduck.shape == (4, 4)\n",
    "    print(\"OKAY - duckduck should be a 4x4 DataFrame\")\n",
    "except:\n",
    "    print(\"FAIL - duckduck should be a 4x4 DataFrame\")\n",
    "\n",
    "idx = ['American Black Duck', 'Mallard', 'Ruddy Duck', 'Ring-necked Duck']\n",
    "duckC = [[ 1.      ,  0.67531969,  0.04783291,  0.1519835 ],\n",
    "       [ 0.67531969,  1.        , -0.02762058019125451,  0.00975596],\n",
    "       [ 0.04783291, -0.02762058019125451,  1.        ,  -0.00693157],\n",
    "       [ 0.1519835 ,  0.00975596,  -0.00693157,  1.        ]]\n",
    "\n",
    "try: \n",
    "    assert np.all( np.isclose( np.array(duckduck.loc[idx,idx]),duckC ) )\n",
    "    print(\"OKAY - duckduck correlations correct\")\n",
    "except:\n",
    "    print(\"FAIL - duckduck correlations incorrect\")\n",
    "\n",
    "try: \n",
    "    assert type(spearduck) == pd.DataFrame\n",
    "    print(\"OKAY - spearduck should be a DataFrame\")\n",
    "except:\n",
    "    print(\"FAIL - spearduck should be a DataFrame\")\n",
    "\n",
    "idx = ['American Black Duck', 'Mallard', 'Ruddy Duck', 'Ring-necked Duck']\n",
    "duckC = [[1.        , 0.722257  , 0.18224515, 0.23807152],\n",
    "       [0.722257  , 1.        , 0.06344931, 0.16281748],\n",
    "       [0.18224515, 0.06344931, 1.        , 0.14157596],\n",
    "       [0.23807152, 0.16281748, 0.14157596, 1.        ]]\n",
    "\n",
    "try: \n",
    "    assert np.all( np.isclose( np.array(spearduck.loc[idx,idx]),duckC ) )\n",
    "    print(\"OKAY - spearduck correlations correct\")\n",
    "except:\n",
    "    print(\"FAIL - spearduck correlations incorrect\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  },
  "toc-autonumbering": false,
  "toc-showcode": false,
  "toc-showmarkdowntxt": true,
  "vscode": {
   "interpreter": {
    "hash": "1fd682e605b20a63f2f232a48fa9edc200d6bd85c08b844e49eb3f157803234a"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
