π·πππππ π½πππππππππ ππ πΈππππππ ππππππππ
Regression
The term regression is used when you try to find the relationship between variables.
In Machine Learning, and in statistical modeling, that relationship is used to predict the outcome of future events.
Table of Contents
- What is Linear Regression
- Example Explained
- Bad fit of Linear Regression
- Types of Linear Regression
Linear Regression
Linear regression uses the relationship between the data points to draw a straight line through them.
This line can be used to predict future values.
Example
Start by drawing a scatter plot:
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
Result:
Example
Import scipy and draw the line of linear regression:
import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result:
Example Explained
Import the modules you need.
import matplotlib.pyplot as plt
from scipy import stats
Create the arrays that represent the values of the x and y axis:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
Execute a method that returns the key values of the linear regression: the slope and intercept of the line, the correlation coefficient r, the p-value, and the standard error of the estimated slope:
slope, intercept, r, p, std_err = stats.linregress(x, y)
Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will be placed:
def myfunc(x):
    return slope * x + intercept
Run each value of the x array through the function. This will result in a new array with new values for the y-axis:
mymodel = list(map(myfunc, x))
Draw the original scatter plot:
plt.scatter(x, y)
Draw the line of linear regression:
plt.plot(x, mymodel)
Display the diagram:
plt.show()
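The fitted model can also predict y values for x positions that are not in the original data. As a small sketch (the choice of x = 10 below is purely illustrative, not part of the example above), call the same myfunc with a new x value:

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predict the y value for a new x value that does not appear in the data
prediction = myfunc(10)
print(prediction)
```

The prediction is simply the point on the fitted line at x = 10.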
Bad Fit?
Let us create an example where linear regression would not be the best method to predict future values.
Example
These values for the x- and y-axis should result in a very bad fit for linear regression:
import matplotlib.pyplot as plt
from scipy import stats
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
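One way to quantify how badly the line fits is the correlation coefficient r that linregress already returns: values near -1 or 1 indicate a strong linear relationship, values near 0 indicate almost none. A quick sketch comparing r for the two datasets used above:

```python
from scipy import stats

# First dataset (good fit for linear regression)
x1 = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y1 = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Second dataset (bad fit for linear regression)
x2 = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y2 = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

r1 = stats.linregress(x1, y1).rvalue
r2 = stats.linregress(x2, y2).rvalue

print(r1)  # strong negative linear relationship
print(r2)  # very weak linear relationship, close to 0
```

A near-zero r confirms that linear regression is a poor choice for the second dataset.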
Result:
Types of Linear Regression
There are two main types:
Simple regression
Simple linear regression uses the traditional slope-intercept form, where m and b are the variables our algorithm will try to "learn" to produce the most accurate predictions. x represents our input data and y represents our prediction.
y = mx + b
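As a hand-rolled sketch (not part of the original example), m and b can be computed directly with the classical least-squares formulas, and the result matches what stats.linregress returns for the same data:

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: covariance of x and y divided by the variance of x
m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)

# The least-squares line always passes through (mean_x, mean_y)
b = mean_y - m * mean_x

slope, intercept, r, p, std_err = stats.linregress(x, y)
print(m, slope)      # the two slopes agree
print(b, intercept)  # the two intercepts agree
```

This is exactly the fit linregress performs under the hood for a single input variable.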
Multivariable regression
A more complex, multi-variable linear equation might look like this, where w represents the coefficients, or weights, our model will try to learn.
f(x, y, z) = w1x + w2y + w3z
The variables x, y, z represent the attributes, or distinct pieces of information, we have about each observation. For sales predictions, these attributes might include a company's advertising spend on radio, TV, and newspapers.
Sales = w1Radio + w2TV + w3News
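A minimal sketch of such a multivariable prediction. The weights and spend figures below are made up purely for illustration; in a real model the weights w1, w2, w3 would be learned from data:

```python
def predict_sales(radio, tv, news, w1, w2, w3):
    # Weighted sum of the three advertising channels
    return w1 * radio + w2 * tv + w3 * news

# Hypothetical weights and advertising spends, for illustration only
sales = predict_sales(radio=37.8, tv=230.1, news=69.2,
                      w1=0.2, w2=0.05, w3=0.01)
print(sales)
```

Learning the weights means choosing w1, w2, w3 so that predictions like this one best match the observed sales.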