Mathematics and Applications of Machine Learning
Week 11 review: We will discuss Chapter 8.
Week 11 labs: Work on Chapter 8 exercises.
Before Week 12: Complete Coursework 3.
1 Introduction
Welcome to this second semester part of MATH36160 Mathematics and Machine Learning and Applications, taught by Jonas Latz and Stefan Güttel. We will be concerned with many practical aspects of data science and machine learning, including hands-on experience using Python. See the course unit details for prerequisites and learning outcomes.
Acknowledgements. These materials are partially based on some excellent online resources, which are usually referenced in the chapters where they are needed. One of the most important sources is Toby Driscoll’s lecture notes for Math 219, Data Science 1, taught at the University of Delaware. Please refer to the original course notes and the corresponding license file for rights information.
1.1 Getting started
Learning in this part of the course is facilitated through interactive Python programs, and in order to participate you will need a working coding environment. We will be using the latest version of Python 3 together with the Visual Studio Code (VS Code) editor. VS Code is one of the most popular integrated development environments (IDEs), and getting familiar with it is useful in its own right. We will use it to write and execute code in Jupyter Notebooks. All coursework assignments will also be submitted as Jupyter Notebooks. VS Code is conveniently integrated with GitHub, so we will start by creating a GitHub account.
1.2 Setting up GitHub
There are several reasons for creating a GitHub account. First and foremost, GitHub is the world’s largest source code host built on Git, the most popular distributed Version Control System (VCS). Git is a useful and widely used tool for tracking changes to your code, collaborating, and sharing. With Git you always have a record of what code you’ve worked on and you can easily revert back to an older version if need be. It also makes working with others easier. Most software development teams use Git/GitHub.
GitHub lets you use Git entirely online through a user-friendly interface. Using the full functionality of Git and GitHub effectively involves a rather steep learning curve, and we will not cover it in this course. However, GitHub will still be useful for at least three other reasons:
- VS Code can sync its settings to a GitHub account so that your code editor will always look the same.
- GitHub Copilot provides you with generative AI code completion.
- GitHub Codespaces are a way to freely develop and execute code in the cloud, accessed simply by a browser-based version of VS Code.
In order to benefit from these, you will need to set up a GitHub account and get verified as a student user. Please follow the instructions below.
Step A: Create a GitHub account
If you do not already have a GitHub account:
- Go to https://github.com/signup
- Use your University email address firstname.lastname@student.manchester.ac.uk
- Choose a memorable and safe password. You will frequently use this account throughout the semester.
- You will be asked to choose a username. Keep in mind that you might want to use this GitHub account professionally in the future. For example, you might include a link to your GitHub profile on a CV. Usernames like firstname-lastname or lastname-2001 are probably better than daydreaming-sloth-128374921. You can change your username later but this might lead to unintended consequences.
- It’s normal for you to be asked to solve a puzzle to complete the sign up. This is to prevent bots from creating spam accounts.
If you already have a GitHub account: You can continue using it for this course. However, in order to qualify for GitHub’s Student Developer Pack, you’ll need to add your student email address under https://github.com/settings/emails and verify it.
Step B: Apply for the Student Developer Pack
GitHub’s Student Developer Pack provides you with additional benefits compared to the free GitHub account created in Step A (such as Codespaces and Copilot). In order to apply for the Student Developer Pack, go to https://education.github.com/pack/join and complete the application. Approval can take a little time, but you do not need to wait for it before following the next steps below.
1.3 Python and VS Code setup
There are several ways to access/install Python and VS Code, and we recommend essentially three options (ordered by ease of setup):
- Using VS Code in the computer labs. Python (from Anaconda) and VS Code are already installed on the computers in the labs in the Alan Turing Building. In principle, you just need to log in and type “VS Code” in the AppsAnywhere app to launch it. However, because of the way the lab computers are set up, any settings or extensions may need to be reinstalled every time you start up VS Code. It is therefore recommended to link VS Code to a GitHub account and to create a virtual environment. Follow the setup instructions below.
Create a folder called ‘MAML’ on your desktop
Launch Visual Studio Code (not Visual Studio!) using the AppsAnywhere application (linked on the Windows desktop)
If you already have a GitHub account, you can have all VS Code extensions and settings saved to your account. This way, VS Code will look the same when you restart it. To enable this,
- go to File -> Preferences -> Backup and Sync Settings
- click [Sign in] in the command palette shown at the top, then “Sign in with GitHub”
- a browser will open asking you for your GitHub login, complete the sign in process
- your browser may try to open Visual Studio Code, allow this by clicking [Open]
You should now be back in VS Code with the status bar on the bottom indicating the sync process.
Go to File -> Open Folder and select the MAML folder from your desktop. If prompted, click [Yes, I trust the authors].
Select the Explorer in the vertical activity bar on the left (it’s the first icon). This will list the content of the MAML folder, which is currently empty.
Select the Extensions icon in the vertical activity bar (it’s an icon with 4 square tiles). In the search bar, type ‘Python’, select the Microsoft Python extension (usually the first entry), and click [Install] in the window that opened in the editor area.
Open the command palette by using Ctrl + Shift + P.
- Type ‘Python create environment’ and select that command.
- Select the ‘Venv’ option
- Select ‘Enter interpreter path’, type in C:\Anaconda3\python.exe and then hit [Enter]
- Now wait a couple of minutes as the virtual environment is being created (you see the info box on the bottom right and the new MAML\.venv folder in the Explorer)
Select again the Extensions icon in the vertical activity bar. In the search bar, type ‘Jupyter’, select and install the Microsoft Jupyter extension (usually the first entry).
Go to File -> New File -> Jupyter Notebook, write print("hello world") in the first cell of that notebook, and save it as ‘test.ipynb’ in the MAML folder (File -> Save or Ctrl + S).
Try running that notebook. VS Code wants you to select a kernel source. Choose ‘Python Environments’ -> ‘.venv (Python…)’.
VS Code then wants to install the ipykernel package. Agree to that by clicking [Install]. This can also take a couple of minutes.
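To confirm that the notebook is really running on the new environment, a quick check can be run in a code cell. The following is a minimal sketch using only the standard library; the helper name in_virtual_environment is our own choice and not part of VS Code or Jupyter.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Environment folder:", sys.prefix)
print("Inside a virtual environment:", in_virtual_environment())
```

If the kernel was selected as described, the interpreter path should point into the MAML\.venv folder.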
Your simple hello world cell should now run successfully. However, the Python environment we have created does not yet contain packages such as NumPy, SciPy, Matplotlib, or Pandas. To install any of these, open the Terminal (Ctrl + `) and enter
- pip install numpy
- pip install matplotlib
- etc.
whenever you find a package is missing. Should you encounter any issues with missing administrator rights, try python -m pip install numpy.
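Rather than waiting for an import error, you can ask Python which packages are still missing from the environment. This is a small sketch, assuming the import name matches the name used with pip (true for the four packages listed here, but not for every package); the function name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    # find_spec returns None when no top-level module of that name can be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["numpy", "scipy", "matplotlib", "pandas"]
print("Still to install:", missing_packages(wanted))
```

Each name that is printed can then be installed from the Terminal with pip install as described above.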
- Installation on your own machine. If you prefer to work on your own desktop computer or laptop, you need to download and install Python, and download and install VS Code. On a Mac, the recommended way to install Python is via Homebrew.
The below video walks you through the installation of Python and VS Code on your own computer. Note that you may still need to install the Jupyter extension after the setup is completed.
- Use VS Code in the cloud with GitHub Codespaces (advanced). This requires a GitHub account with student credentials, and the creation of a repository to store your files. The repository can then be opened as a workspace in a browser-based version of VS Code. The computing environments are called Codespaces. This is only recommended for those who are already familiar with GitHub and VS Code.
1.4 VS Code
Please follow the below video to get an overview of VS Code and a quick intro to Jupyter Notebooks.
1.5 Jupyter Notebooks
We will be using Jupyter Notebooks for all documentation and Python code. Jupyter Notebooks are computational documents that contain text, mathematics, code, and code-generated graphics in one place. These lecture notes are essentially Jupyter Notebooks as well. The common file extension is .ipynb.
The primary building blocks of a Jupyter Notebook are cells. There are three main types of cells:
- Code Cells: Contain executable code (in our case, Python). When run, they produce output directly below the cell.
- Markdown Cells: Contain text formatted using Markdown, a lightweight markup language. These cells are used for documentation, explanations, and embedding images or links.
- Raw Cells: Contain raw text that is not processed by the notebook. These are rarely used.
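Behind the scenes, an .ipynb file is a JSON document in which each cell records its type and its source text. The sketch below builds a tiny notebook by hand with one cell of each type, assuming version 4 of the notebook file format; the cell contents are purely illustrative, and in practice VS Code writes this file for you.

```python
import json

# A hand-built notebook with one cell of each type (contents are illustrative).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A heading\n", "Some **bold** text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not yet executed
            "outputs": [],
            "source": ['print("hello world")'],
        },
        {
            "cell_type": "raw",
            "metadata": {},
            "source": ["left untouched by the notebook"],
        },
    ],
}

# Serialise to JSON text (what would be stored on disk) and read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```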
In order to execute the code cells in a notebook, a kernel has to be selected. For Python notebooks, the kernel is typically IPython. The kernel maintains the state of the notebook, including variables and imports.
The order of cells that you see in a notebook is not necessarily the order in which they were executed.
By far the greatest source of confusion and subtle problems in a notebook is the freedom it gives you to execute the cells in whatever order you want. As you experiment and add and delete cells to try things out, you will reach a point at which the code on the screen is no longer a recipe for reaching the current state of your workspace.
Before submitting a Jupyter Notebook (e.g., for a coursework assignment), it is highly recommended to take two steps: Restart the kernel and then run all cells, just to make sure that everything still works as it’s seen on screen.
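The effect of execution order can be imitated in plain Python. In this sketch, each string stands in for one code cell and a dictionary plays the role of the kernel's state; the cell contents and the helper run_cells are hypothetical, chosen only to illustrate the point.

```python
# Each string stands in for one notebook code cell (hypothetical contents).
cells = {
    1: "x = 10",
    2: "x = x * 2",
    3: "result = x + 1",
}


def run_cells(order):
    """Execute the cells in the given order in a fresh namespace (a stand-in for a kernel)."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["result"]


top_to_bottom = run_cells([1, 2, 3])   # restart the kernel, then run all cells
interactive = run_cells([1, 2, 2, 3])  # cell 2 was accidentally run twice
print(top_to_bottom, interactive)      # prints: 21 41
```

Both runs show exactly the same three cells on screen, yet they end in different states. Restarting the kernel and running all cells always reproduces the first result.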
1.6 Generative AI
Generative AI is a catchall term for models that can produce output similar to data they were trained on. Most notoriously of late, these include the large language models such as ChatGPT that can simulate conversation.
The ultimate utility of these models is yet to be realized in most domains. However, they are already quite useful for generating computer code, given careful use and supervision. If nothing else, they offer a sort of intelligent autocomplete that can assist you when you have forgotten the names of functions and the orders of their arguments.
For students, there is a sophisticated code-oriented AI that you can use for free: GitHub Copilot. It is an extension for VS Code that is well-developed for use with most computer programming languages, including Python, Matlab, and Julia, and it can work with Jupyter notebooks natively.
If you are using VS Code through the Codespaces environment, then Copilot should already be available for you. If you prefer to use VS Code locally with Copilot, you will need to follow these steps:
1.6.1 Getting started with GitHub Copilot
- Ensure you have followed the two-step process in Section 1.2 in order to have a GitHub account and be verified as a student.
- Open VS Code.
- Install the GitHub Copilot and the GitHub Copilot chat extensions for Visual Studio Code. You may need to restart or reload VS Code after installing these extensions.
- Sign in to your GitHub account in Visual Studio Code.
- Finally, open the Copilot chat window by clicking on the speech bubbles icon along the left side of the window. If it shows a button for starting a 30-day free trial, go ahead and click that, which takes you to a website. If your student status has been approved, you should not be asked for payment information before getting to a success screen. After one more restart of VS Code, you should be good to go.
1.6.2 Usage
There are now several ways you can interact with Copilot:
- Open or create a code file and start typing. You will soon start to see grayed-out suggestions from Copilot. You can accept these suggestions by pressing Tab or Enter. If you continue typing, the suggestions will change to match. Note that Copilot will use the open file as context for its suggestions, including comments and variable names.
- Within your code file, you can type Ctrl+i, or Cmd+i on a Mac, to open the Copilot panel. You can type in the panel to ask questions or create a prompt for it to respond to.
- Along the left side of the VS Code window are icons representing the extensions you have installed. The Copilot Chat icon is a pair of speech bubbles (though this can change). Clicking on this icon will open a chat window.
- You can use Ctrl+Shift+P, or Cmd+Shift+P on a Mac, to open the command palette and type “Copilot” to see a list of commands you can use to interact with Copilot. For example, you can ask Copilot to generate a function definition or a class definition, or to complete a line of code.
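Since Copilot draws on comments and names in the open file, a descriptive comment and a meaningful function name give it useful context. The sketch below shows the kind of prompt you might type; the function column_means is a made-up example, and the body was written by hand here. Whatever Copilot actually suggests may differ and should always be checked.

```python
# Compute the mean of each column in a table given as a list of equal-length rows.
# Example: column_means([[1, 2], [3, 4]]) should give [2.0, 3.0].
def column_means(rows):
    """Return the mean of every column of `rows` (a non-empty list of equal-length lists)."""
    n_rows = len(rows)
    # zip(*rows) regroups the row entries column by column.
    return [sum(column) / n_rows for column in zip(*rows)]


print(column_means([[1, 2], [3, 4]]))  # prints: [2.0, 3.0]
```

Including a small worked example in the comment, as done here, also gives you a ready-made test for any suggested code.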
1.7 Markdown
Markdown cells in Jupyter Notebooks use the Markdown language to format text. Here are some common Markdown features you can use:
- Headings: Use # for headings. For example, # Heading 1, ## Heading 2, ### Heading 3, etc.
- Bold and Italics: Use **bold** or __bold__ for bold text, and *italic* or _italic_ for italic text.
- Lists: Use - or * for unordered lists, and 1., 2., etc., for ordered lists.
- Links: Use [link text](URL) to create a hyperlink.
- Images: Use ![alt text](image URL) to embed an image.
- Code: Use backticks for inline code, and triple backticks for code blocks.
For example, to create a heading, bold text, and a list:
# Heading 1
**This is bold text**
- Item 1
- Item 2
- Item 3
1.8 Math notation
To make good-looking math in Jupyter Notebooks, you can use LaTeX notation. For example, to write the equation of a line as \(y=mx+b\), you can use the following within a text cell:
For example, to write the equation of a line as $y=mx+b$, you can use the following
The example above produces inline math, which is shown as part of the text. You can also use display math, which is shown on its own line.
For example, to write the equation of a line on its own line as
\[ y=mx+b, \]
you can use the following within a text cell:
$$
y=mx+b,
$$
Use _ for subscripts and ^ for superscripts. For example, to write \(x_1^2\), use $x_1^2$. If you have more than one character in the sub/superscript, use curly braces. For example, to write \(X_{i,j}=10^{100}\), use $X_{i,j}=10^{100}$.
Function names and Greek letters are written with a backslash. For example, to write \(\sin(\theta)\), use $\sin(\theta)$. To write \(\alpha+\ln(\beta)\), use $\alpha + \ln(\beta)$. Note the difference between \(\cos(\phi)\), which is written as $\cos(\phi)$, and \(cos(\phi)\), which is written as $cos(\phi)$. All the functions you know work like this, plus square roots, like \(\sqrt{2+z}\), which is written as $\sqrt{2+z}$. Occasionally you want a short bit of text to appear like function names (upright) rather than multiplied variables (italic). For instance, to get \(x_\text{init}\), use $x_\text{init}$.
You’ve seen that curly braces are used to indicate groupings. If you actually want curly braces to appear, you have to escape them with a backslash. For example, to write \(\{2, 3, 5, 7\}\), use $\{2, 3, 5, 7\}$.
To write a fraction, use \frac{numerator}{denominator}. For example, to write \(\frac{1}{2}\), use $\frac{1}{2}$. Derivatives are written as fractions, so to write \(f'(x) = \frac{dy}{dx}\), use $f'(x) = \frac{dy}{dx}$.
If you want to put fractions inside of parentheses, use \left( and \right) to make the parentheses the right size. For example, to write
\[ \left( x - \frac{1}{2} \right)^3, \]
use $\left(x - \frac{1}{2}\right)^3$. The same applies to \left[ and \right] for square brackets.
For the most part, the spaces you put in don’t matter. For example, $x+1$ and $x + 1$ both produce \(x+1\). Usually, the automatic spacing makes the best-looking result. But if you want to put a space in manually, you can use \, for a small space, \: for a medium space, \; for a large space, and \quad for a really large space.
Integrals and sums are produced using \int and \sum, with subscripts and superscripts put where you would expect them to go. For example, to write
\[ \int_0^1 x^2 \, dx, \]
use $$\int_0^1 x^2 \, dx$$. (Here is one legitimate use for a manual space!) To write
\[ \sum_{i=1}^n i^2, \]
use $$\sum_{i=1}^n i^2$$.
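The pieces above can be combined freely. As an illustrative sketch (the formula is our own choice of example, not one used elsewhere in these notes), the standard deviation of $n$ values $x_1, \dots, x_n$ with mean $\mu$ uses Greek letters, a square root, a fraction, a sum, sized parentheses, and sub- and superscripts all at once:

```latex
$$
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n \left( x_i - \mu \right)^2}
$$
```

Typing this into a Markdown cell and running the cell renders the formula on its own line.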
There are many web pages and blogs giving quick introductions to LaTeX, if you need to do something more complex than the above. Beware: it can be a deep rabbit hole to fall into.
1.9 Exercises and lab classes
The exercises are distributed as Jupyter Notebooks. Access the exercises for this chapter by downloading ch1-exercises.ipynb from the Exercises page, saving it to your MAML folder, and opening it in VS Code. Follow the tasks therein.
Useful links
- Driscoll’s free data science course
- GitHub starter course
- YouTube Git Tutorial for Beginners: Learn Git in 1 Hour - focusses on the command line, but good explanation of the Git Workflow
- YouTube Git Tutorial For Dummies
- Getting Started with Jupyter Notebooks in VS Code
- List of useful markdown
- SIAM News article on “Data Science: What Is It and How Is It Taught?”
