Linear Fitting

Linear Fitting#

Problem: Find a and b in the equation y=a+b×x, given the vectors x and y, using the least-squares method. Also, determine the corresponding R-squared value.

Test data:

x25
y10+5×x
]plot y x
Created by Causeway SVG engine - SharpPlot v3.71.0 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 10 20 30 40 50 60 70 80 90 100 110 120 130 140 for X-axis labels 0 2 4 6 8 10 12 14 16 18 20 22 24 26 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Reset to original origin

This data is too perfect. Let’s add some noise:

y+(5-10×?0y)
]plot y x
Created by Causeway SVG engine - SharpPlot v3.71.0 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 10 20 30 40 50 60 70 80 90 100 110 120 130 140 for X-axis labels 0 2 4 6 8 10 12 14 16 18 20 22 24 26 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Reset to original origin

The slope is easy to determine if we already know what is the value of a:

b(y-10)x
4.900895018

We can also use to fit several variables. To fit the value of a, we consider an additional variable with a constant value of 1, as explained in the docuentation for . We just need to modify our right argument from

x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

into

1,⍪x
1  1
1  2
1  3
1  4
1  5
1  6
1  7
1  8
1  9
1 10
1 11
1 12
1 13
1 14
1 15
1 16
1 17
1 18
1 19
1 20
1 21
1 22
1 23
1 24
1 25

(a vector is treated by as a column vector). Then:

(a b)y1,⍪x
8.968943213 4.961545417

Fitting function#

Let’s write a function to perform linear fittings. Our function will take as right argument the y x data and will be ambivalent (we use 0=⎕NC'⍺' to check if a left argument was passed to the function). If there is a left argument, it will be interpreted as the value of the ordinate at the origin. So, we will fit the slope of the line through the origin using the data ⍵-⍺ 0. If no argument is given, a and b will be fitted. In both cases, the function will return the values a b.

LF{0=⎕NC'⍺': ⊃⌹(1,⍪)/  ,⌹/- 0}   ⍝ linear fit. ⍵: y x data. ⍺ (optional): ordinate
⍝ eg
(ad bd)10 LF y x
(a  b )   LF y x
]plot ((ad+bd×x) (a+b×x) y y) x
10 4.900895018
8.968943213 4.961545417
Created by Causeway SVG engine - SharpPlot v3.71.0 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 10 20 30 40 50 60 70 80 90 100 110 120 130 140 for X-axis labels 0 2 4 6 8 10 12 14 16 18 20 22 24 26 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Points follow ... Line Points follow ... Line Points follow ... Line Reset to original origin

R-squared value#

We still need to determine the R-squared value, which can be calculated as one minus the ratio of the sums of squares (÷⍥(+/×⍨)) of differences between the real and estimated values and between the real values and their mean:

1-(y-a+b×x)÷(+/×)(⊢-+÷≢)y   ⍝ R-squared value
0.9971435494

The R-squared value is defined not only for linear fittings, but also for any other fitting we perform on some data. Therefore, we can define an operator that takes the fitted function as operand (in our case, {a+b×⍵} or a+b∘×) and the fitting data as arguments.

_R2{1-(-⍺⍺⍵)÷(+/×)(⊢-+/÷≢)}   ⍝ returns R-squared value for function ⍺⍺. ⍺ ⍵ is the y x data
⍝ eg
y (a+b×)_R2 x
0.9971435494

More generally, we can define a dyadic operator _R2_ that directly takes the fitting and fitted functions and returns the fitted parameters and R-squared value for the given data (as right argument) and optional parameters for the fitting function (as left argument). We then use this operator to define an LR2 function that performs a linear fitting and returns the fitted parameters and the R-squared value.

_R2_{  r ⍺⍺   r,r⍵⍵_R2/}   ⍝ returns fitted parameters and R-squared value
                                      ⍝ ⍺⍺: fitting function, ⍵⍵: fitted function
                                      ⍝ ⍵: fitting data, ⍺ (optional): left argument of fitting function
LR2LF _R2_{(a b)  (a+b×)}       ⍝ linear fitting with y x data ⍵ and (optional) ordinate ⍺
⍝ eg
ad bd _10 LR2 y x
az bz _ 0 LR2 y x
a  b  _   LR2 y x
]plot ((ad+bd×x) (az+bz×x) (a+b×x) y) x
10 4.900895018 0.996948701
0 5.489130312 0.9823995937
8.968943213 4.961545417 0.9971435494
Created by Causeway SVG engine - SharpPlot v3.71.0 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 for X-axis labels 0 2 4 6 8 10 12 14 16 18 20 22 24 26 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Points follow ... Line Points follow ... Line Points follow ... Line Reset to original origin

Obviously, the R-squared value is higher when we fit both values, since the fitted function can accomodate better the fitted data. However, it is useful to have an optional argument to impose already known values, as it will often be the case for functions that are known to pass by the origin.

Summary#

_R2{1-(-⍺⍺⍵)÷(+/×)(⊢-+/÷≢)}        ⍝ R-squared value for ⍺⍺ fitted function
_R2_{  r ⍺⍺   r,r⍵⍵_R2/}     ⍝ fit and R-squared value for fitting and fitted functions ⍺⍺ and ⍵⍵
LF{0=⎕NC'⍺': ⊃⌹(1,⍪)/  ,⌹/- 0}   ⍝ linear fitting function (⍵: y x data, ⍺: optional ordinate)
LR2LF _R2_{(a b)  (a+b×)}         ⍝ linear fitting and R-squared value (⍵: y x data, ⍺: optional ordinate)