How to use linear regression in practice in machine learning

The theory behind linear regression

Linear regression is a simple concept to grasp. Visually, it starts with data points plotted on a scatter chart; a function is then found that fits a straight line through those points as closely as possible. See the diagram below for an example of how this works:

Example scatter chart of diabetes data from the scikit-learn data set

Everything you learnt in high school about gradients applies to linear regression, because the task is to find the two parameters, the slope m and the intercept b, in the equation

y = mx + b

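The values of m and b are chosen so that the line fits your data as closely as possible and can then be used to predict future outcomes. Linear regression is rarely used to solve classification problems, because it predicts a continuous value rather than a category. Instead, it derives a model from previous data to predict new values.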
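To make the search for m and b concrete, here is a minimal sketch that computes them with the closed-form least squares formulas for a single feature. The toy values in x and y are made up for illustration and are not taken from the diabetes data set.

```python
import numpy as np

# Hypothetical toy data lying roughly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Closed-form least squares for one feature:
#   m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x)) ** 2)
#   b = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print(f"m = {m:.3f}, b = {b:.3f}")

# Cross-check: a degree-1 polynomial fit solves the same problem
m_check, b_check = np.polyfit(x, y, 1)
print(f"polyfit: m = {m_check:.3f}, b = {b_check:.3f}")
```

Both approaches should agree, because a degree-1 polynomial fit minimises the same squared error.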

Applications of linear regression in machine learning

Linear regression can be useful for solving problems such as the following:

  1. Predicting stock prices on the stock exchange, as well as things like interest rate hikes, price changes and even sentiment.
  2. Predicting the likelihood of certain cancers and other diseases occurring in patients with certain symptoms.
  3. Predicting user behaviour in games, such as player boredom and what players find rewarding, to keep players engaged.
  4. Predicting buyer behaviour on an e-commerce site and using that to improve the user experience for your buyers and ultimately increase sales.
  5. Predicting weather outcomes.

There are many more applications. Wherever you want to predict a numeric value, this is one of the most basic algorithms you can use to achieve that goal. Linear regression in machine learning is quite versatile and can be used with almost any numeric data set, as long as you can normalise your data so that the values are on a comparable scale.
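One common way to bring features onto a comparable scale is standardisation: subtracting each column's mean and dividing by its standard deviation. The sketch below uses scikit-learn's StandardScaler. The age and income figures are hypothetical, and standardisation is only one reading of the scaling step mentioned above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales:
# column 0 is age in years, column 1 is income in dollars
X = np.array([
    [25.0, 40_000.0],
    [32.0, 52_000.0],
    [47.0, 88_000.0],
    [51.0, 61_000.0],
])

# Standardise each column to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

When you later predict on new data, reuse the same fitted scaler with scaler.transform so the new values are scaled the same way as the training data.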

Let's write some code for the diabetes data set

This tutorial is aimed at readers with intermediate programming ability and a basic understanding of Python.

First you will need to install a few libraries.

Run:

pip3 install matplotlib
pip3 install scikit-learn
pip3 install numpy

A basic introduction: matplotlib is our graphing library, which we use to draw charts such as the scatter chart above.

Sklearn, better known as scikit-learn, is the machine learning and data science library we use to simplify our lives. This is especially true when generating our linear regression model, because it fits the model for us and also ships with the diabetes data set. NumPy provides array operations in Python, which lets us slice the data we get from the scikit-learn data set.
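To make the slicing concrete, here is a short sketch showing how NumPy indexing picks one feature out of the diabetes data and holds back the last 20 rows for testing. The names column_1d, column_2d, train and test are chosen for this illustration only.

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
print(diabetes.data.shape)  # rows are patients, columns are features

# Plain indexing returns a flat 1-D array of one feature
column_1d = diabetes.data[:, 2]

# np.newaxis keeps the result as a 2-D column,
# which is the shape scikit-learn estimators expect for X
column_2d = diabetes.data[:, np.newaxis, 2]
print(column_1d.shape, column_2d.shape)

# Negative slices: everything except the last 20 rows, then the last 20 rows
train, test = column_2d[:-20], column_2d[-20:]
print(train.shape, test.shape)
```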

Enough of that, here is our linear regression code


import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model

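# Load the diabetes data set that ships with scikit-learn
diabetes = datasets.load_diabetes()

# Use only one feature (column index 2, body mass index), kept as a 2-D column
dbX = diabetes.data[:, np.newaxis, 2]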

# Split the data into training/testing sets
dbX_train = dbX[:-20]
dbX_test = dbX[-20:]

diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create the linear regression model
regr = linear_model.LinearRegression()

# Train the model on the training set
regr.fit(dbX_train, diabetes_y_train)

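# The fitted slope (m in y = mx + b)
print('Coefficients: \n', regr.coef_)

# Mean squared error on the test set
print("Mean squared error: %.2f"
      % np.mean((regr.predict(dbX_test) - diabetes_y_test) ** 2))

# score() returns the coefficient of determination (R^2); 1 is a perfect prediction
print('R^2 score: %.2f' % regr.score(dbX_test, diabetes_y_test))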

# Plot the test points in black and the fitted line in red
plt.scatter(dbX_test, diabetes_y_test, color='black')
plt.plot(dbX_test, regr.predict(dbX_test), color='red',
         linewidth=2)

# Hide the axis ticks
plt.xticks(())
plt.yticks(())

plt.show()
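As a follow-up, here is a sketch of how the same metrics could be computed with the helper functions in sklearn.metrics, and how the fitted model could be used on a new value. It repeats the fitting steps so that it runs on its own. The shorter variable names and the input value 0.05 are assumptions made for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Same data preparation as the script above, with shorter names
diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = diabetes.target[:-20], diabetes.target[-20:]

model = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# The fitted line y = mx + b
m = model.coef_[0]
b = model.intercept_
print(f"m = {m:.2f}, b = {b:.2f}")

# Library helpers for the two metrics printed by the script above
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean squared error: {mse:.2f}")
print(f"R^2 score: {r2:.2f}")

# Predict the target for a hypothetical new feature value.
# The feature columns in this data set are already scaled,
# so 0.05 is a scaled value, not a raw measurement.
new_x = np.array([[0.05]])
print(model.predict(new_x))
```

The prediction for new_x is simply m * 0.05 + b, which ties the fitted model back to the equation from the theory section.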
