M6C Data Science I

*Magda Gregorova* *24/4/2019*

Ex4 part2: feature transformation

In this notebook you will explore the effects of polynomial feature transformation.

Download the Jupyter notebook: Ex4 part2: feature transformation

Set-up

Import the Python libraries to be used later and define two convenience functions: plot_xy for quick plotting and sample_data for sampling random input and output data.

In [ ]:
import numpy as np

import matplotlib.pyplot as plt 
# ipython magic for plots in notebook
%matplotlib inline

# x-y plotting function - quick and dirty!
def plot_xy(x_data, y_data, plt_title, **kwargs):
    plt.rcParams['figure.figsize'] = (10.0, 5) # set size of plot
    plt.plot(x_data, y_data, 'o', color='#1f77b4')
    if 'y_pred' in kwargs:
        plt.plot(x_data, kwargs['y_pred'], 'o', color='#d62728')
        plt.vlines(x_data, y_data, kwargs['y_pred'], color='#1fa5b4', linestyles='dotted')
    if 'x_test' in kwargs and 'y_test' in kwargs:
        plt.plot(kwargs['x_test'], kwargs['y_test'], color='#d62728')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.ylim((-3,3))
    plt.title(plt_title)

def sample_data(sample_size):
    x = np.random.uniform(-np.pi,np.pi,sample_size)
    y = np.sin(x) + np.random.normal(0,0.1,sample_size)
    return (x,y)

Generate random sample

In [ ]:
# generate random sample
sample_size = 15
np.random.seed(123) # fix random seed
x_smpl, y_smpl = sample_data(sample_size)

# plot data
plot_xy(x_smpl, y_smpl, 'Random sample data')

Regression models

We will use linear regression to learn a model (hypothesis $h$) that achieves the lowest mean squared error (MSE) over the sample data.
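
For reference, for a sample of $n$ points the MSE of a hypothesis $h$ is $$\mathrm{MSE}(h) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - h(x_i)\right)^2 .$$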

Simple linear regression

We will begin with a simple linear regression $$\hat{y} = h_1(x) = w_0 + w_1 \, x$$
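
If you are unsure how to set up the vector-matrix approach, here is a minimal sketch. It assumes the parameters are estimated by ordinary least squares via np.linalg.lstsq and uses placeholder names (phi_lin, w_lin) that are not part of the exercise; other equivalent solutions (e.g. the normal equations) are equally valid.

In [ ]:
# illustrative sketch only -- placeholder names, assuming np.linalg.lstsq for the least-squares fit
phi_lin = np.column_stack((np.ones_like(x_smpl), x_smpl))      # one row per sample point, columns [1, x]
w_lin, _, _, _ = np.linalg.lstsq(phi_lin, y_smpl, rcond=None)  # least-squares estimate of (w_0, w_1)
print(w_lin)                                                   # vector of size 2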

**Code:** Complete the code below where needed

In [ ]:
# get params of the linear regression model (use the vector-matrix approach)
# COMPLETE CODE HERE
phi =             # feature matrix
w =               # estimated parameter vector

######
# check w
try:
    if w.size != 2:
        print('Something wrong! Param vector w should have size 2 and currently has size', w.size)
except:
    print('Something wrong! Params w should be a vector of size 2!')
######

# predictions for the sample points using the feature matrix phi and the learned parameters w
y_pred = phi.dot(w) 

# get mean squared error over the sample data
# COMPLETE CODE HERE

mse = 
print('MSE over the sample data is', mse)

######
# check mse
try:
    if abs(mse - 0.21889301862450047) > 1e-08:
        print('Something wrong! Your mse is too far from my result!')
except:
    print('Something wrong! Your mse should be a simple scalar number!')
######

# generate test data evenly over whole X space
test_size = 100
x_test = np.linspace(-np.pi, np.pi, num=test_size)

# get predictions for test data
# COMPLETE CODE HERE
phi = 
y_test_pred = 

# plot data and regression function
plot_xy(x_smpl, y_smpl, 'Random sample and linear regression', y_pred = y_pred, x_test = x_test, y_test = y_test_pred)

Quadratic regression

As discussed in the class, simple linear regression may not be suitable for all problems. We will therefore next try quadratic regression of the form $$\hat{y} = h_2(x) = w_0 + w_1 \, x + w_2 \, x^2$$
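
The only thing that changes compared to the simple linear model is the feature map: each input $x$ is now represented by the features $(1, x, x^2)$. A minimal sketch of such a feature matrix, under the same assumptions and placeholder-name convention as the sketch above, could look like this.

In [ ]:
# illustrative sketch only -- quadratic feature map with placeholder names
phi_quad = np.column_stack((np.ones_like(x_smpl), x_smpl, x_smpl**2))  # columns [1, x, x^2]
w_quad, _, _, _ = np.linalg.lstsq(phi_quad, y_smpl, rcond=None)        # least-squares estimate of (w_0, w_1, w_2)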

**Code:** Complete the code below where needed

In [ ]:
# use appropriate feature transformations and get the parameters w of the linear model over the transformed features
# COMPLETE CODE HERE
phi =             # feature matrix
w =               # estimated parameter vector

######
# check w
try:
    if w.size != 3:
        print('Something wrong! Param vector w should have size 3 and currently has size', w.size)
except:
    print('Something wrong! Params w should be a vector of size 3!')
######

# predictions for the sample points using the feature matrix phi and the learned parameters w
y_pred = phi.dot(w) 

# get mean squared error over the sample data
# COMPLETE CODE HERE

mse = 
print('MSE over the sample data is', mse)

######
# check mse
try:
    if abs(mse - 0.21604325945831315) > 1e-08:
        print('Something wrong! Your mse is too far from my result!')
except:
    print('Something wrong! Your mse should be a simple scalar number!')
######

# generate test data evenly over whole X space
test_size = 100
x_test = np.linspace(-np.pi, np.pi, num=test_size)

# get predictions for test data
# COMPLETE CODE HERE
phi = 
y_test_pred = 

# plot data and regression function
plot_xy(x_smpl, y_smpl, 'Random sample and quadratic regression', y_pred = y_pred, x_test = x_test, y_test = y_test_pred)

Higher degree polynomials

Inspecting the code above you can see that the true generating function $f : \mathcal{X} \to \mathcal{Y}$ is $f(x) = \sin(x)$. In this homework, however, we will not use the $\sin$ function as a hypothesis (model); instead we will explore how well polynomials can approximate it.

Our hypothesis space $\mathcal{H}$ will be the space of polynomial functions $$\hat{y} = h_d(x) = w_0 + w_1 \, x + w_2 \, x^2 + \ldots + w_d \, x^d$$ and we will search for regression functions $h_d$ for different values of the polynomial degree $d$.
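
A convenient way to build the degree-$d$ feature matrix for all sample points at once is NumPy broadcasting: raising a column of inputs to a row of exponents $0, 1, \ldots, d$ yields all powers in one step, with no loop over the degrees. The sketch below is illustrative only (placeholder names, an arbitrary example degree).

In [ ]:
# illustrative sketch only -- polynomial features via broadcasting (placeholder names)
d_example = 3
phi_poly = x_smpl[:, np.newaxis] ** np.arange(d_example + 1)  # shape (sample_size, d+1), columns x^0 ... x^d
print(phi_poly.shape)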

**Code:** Complete the code below where needed

In [ ]:
# function to get regression parameters w, predictions y_pred, and mse for a polynomial of any degree
def poly_regr(x,y,degree):
    # COMPLETE FUNCTION
    ### THIS SHALL WORK FOR ANY POLYNOMIAL DEGREE, EVEN VERY LARGE, SO FEATURES CANNOT BE DEFINED MANUALLY
    ### AVOID FOR LOOPS, THINK ABOUT BROADCASTING (WHICH WORKS FOR POWERS AS WELL)

    return (w, y_pred, mse)

# test the function for polynomial of degree 5
degree = 5
w, y_pred, mse = poly_regr(x_smpl, y_smpl, degree)
print('MSE over the sample data is', mse)

######
# check mse
try:
    if abs(mse - 0.010205041774550573) > 1e-08:
        print('Something wrong! Your mse is too far from my result!')
except:
    print('Something wrong! Your mse should be a simple scalar number!')
######
  
# function to get predictions for generated test data
# generate test data evenly over whole X space
def test_data(test_size, w, degree):
    # COMPLETE FUNCTION
    # REUSE CODE FROM ABOVE TO GENERATE INPUT TEST DATA EVENLY
    return x, y

# get test predictions for the d=5 model
x_test, y_test_pred = test_data(100, w, degree)
# plot data and regression function
plot_xy(x_smpl, y_smpl, 'Random sample and d=5 regression', y_pred = y_pred, x_test = x_test, y_test = y_test_pred)

Best fitting polynomial

As you can see in the image above, the regression line resembles a $\sin$ function rather well. It also achieves a much lower MSE than the linear or quadratic regression.

**Code:** Use the function poly_regr to find the degree $d$ which yields the lowest MSE over the sample data
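
One straightforward approach, assuming poly_regr has been completed above, is to loop over a set of candidate degrees, fit each one, and keep the degree with the smallest MSE. The sketch below is illustrative only; the range of candidate degrees and the variable names are arbitrary choices.

In [ ]:
# illustrative sketch only -- search over candidate degrees (the range 1..14 is an arbitrary choice)
best = None
for d in range(1, 15):
    w_d, y_pred_d, mse_d = poly_regr(x_smpl, y_smpl, d)
    if best is None or mse_d < best[0]:
        best = (mse_d, d, w_d, y_pred_d)
print('Lowest MSE', best[0], 'found for degree', best[1])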

In [ ]:
# find the best fitting polynomial
# COMPLETE CODE HERE



degree_best = 
mse_best = 
w_best = 
y_pred_best = 

print('The lowest MSE is {} for a polynomial of degree {}'.format(mse_best, degree_best))

# plot data and best regression function
x_test, y_test_pred = test_data(100, w_best, degree_best)
# plot data and regression function
plot_xy(x_smpl, y_smpl, 'Random sample and best regression', y_pred = y_pred_best, x_test = x_test, y_test = y_test_pred)

Fitting a different sample

Now that you have found the best degree for the polynomial, we will use it to learn the regression function from a different data sample drawn from the same data distribution.

Compare the regression functions learned from the two different samples in a single plot.

In [ ]:
# plot data and best regression function
x_test, y_test_pred = test_data(100, w_best, degree_best)
# plot data and regression function
plot_xy(x_smpl, y_smpl, 'Random sample and best regression', y_pred = y_pred_best, x_test = x_test, y_test = y_test_pred)

# generate another random sample
x_smpl, y_smpl = sample_data(sample_size)
w, y_pred, mse = poly_regr(x_smpl, y_smpl, degree_best)

# plot data and best regression function
x_test, y_test_pred = test_data(100, w, degree_best)
# plot data and regression function
plot_xy(x_smpl, y_smpl, 'Random sample and best regression', y_pred = y_pred, x_test = x_test, y_test = y_test_pred)

**Code:** Write code similar to the above to plot into a single graph the regression lines for your best-degree polynomial learned from 10 different samples (10 different regression models). Calculate the average of the MSE over the sample data across the 10 samples.
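
A possible structure, assuming poly_regr and test_data have been completed above: draw a fresh sample in every iteration, fit the model, plot its regression line into the same figure, and collect the sample MSEs to average at the end. The sketch below is illustrative only; all names in it are placeholders.

In [ ]:
# illustrative sketch only -- 10 models from 10 samples drawn into a single plot (placeholder names)
mse_list = []
for i in range(10):
    x_i, y_i = sample_data(sample_size)                      # fresh sample
    w_i, y_pred_i, mse_i = poly_regr(x_i, y_i, degree_best)  # fit the best-degree polynomial
    mse_list.append(mse_i)
    x_t, y_t = test_data(100, w_i, degree_best)              # its regression line over the whole X space
    plt.plot(x_t, y_t)
plt.ylim((-3, 3))
plt.title('10 regression models of degree {}'.format(degree_best))
print('Average MSE over the 10 samples:', np.mean(mse_list))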

In [ ]:
# YOUR CODE HERE


mse_avg = 
print('The MSE average over the 10 samples for the best polynomial is', mse_avg)

**Code:** Reuse your code from above to plot into a single graph the regression lines for a 3rd degree polynomial learned from 10 different samples (10 different regression models). Calculate the average of the MSE over the sample data across the 10 samples.

In [ ]:
# YOUR CODE HERE
degree = 3

mse_avg = 
print('The MSE average over the 10 samples for the 3rd degree polynomial is', mse_avg)

**Think:** Compare the two results and plots above. When you receive a single random sample from our data (of the same size $n$ as we used above), which degree polynomial seems more reasonable to use to learn a regression function for prediction? Why?