
slope & intercept - what is it?

ValeryOchkov
24-Ruby IV


One picture from the article

Interpolation, extrapolation, fitting

or

Lies, damned lies, and statistics

(http://twt.mpei.ac.ru/ochkov/students-eng.pdf)

[Image: 7-slope-+intercept.png]

14 REPLIES

One question!

Why do we use ()^2 in the sum and not abs() or ()^4?

Valery Ochkov wrote:

One question!

Why do we use ()^2 in the sum and not abs() or ()^4?

Because then the method's name would be "least abs errors" or "least quads errors".

HARVEY HENSLEY wrote:

Valery Ochkov wrote:

One question!

Why do we use ()^2 in the sum and not abs() or ()^4?

Because then the method's name would be "least abs errors" or "least quads errors".

Sorry,

I did not ask about the name of the method - I asked: why ()^2 and not abs()?

Second question.

What is more correct?

[Image: 16-medfit.png]

Valery Ochkov wrote:

Second question.

What is more correct?

Correct??? MORE correct?? What would that mean?

We see two different regressions, derived using slightly different methods, that's all.

Correct? Hopefully, both. And then - correct with respect to ... what?

Which is more suitable? That depends on your needs (and probably even on your data, when I think about outliers). The question is "suitable for what?"

If you have a lot of outliers in your data set (there are algorithms to automatically detect outliers with high probability) it can be that using the MAE yields better results than the RMSE - but then: "better" results for what purpose?

So I guess your question "What is more correct?" cannot be answered, because it's either the wrong question or an incomplete one.

Valery Ochkov wrote:

HARVEY HENSLEY wrote:

Valery Ochkov wrote:

One question!

Why do we use ()^2 in the sum and not abs() or ()^4?

Because then the method's name would be "least abs errors" or "least quads errors".

Sorry,

I did not ask about the name of the method - I asked: why ()^2 and not abs()?

Don't blame Harvey for not answering a question you have not asked!

You showed the picture of a way to minimize the RMSE (root mean square error) and asked why we see the squaring of the distances and not the absolute value. The name of the method might give you a hint why.

If your question is why we see the squaring in the picture, the answer is even simpler - because you typed it that way. You could just as well have typed an absolute value instead.

The discussion of squared errors vs. absolute values is age old and may have its origin way back in 1809, when Gauss derived his eponymous distribution using squared errors.

Nowadays both RMSE and MAE (mean absolute error) have their place and are used - MAE a lot, e.g., in error analysis (model quality). Both have their pros and cons. With RMSE, larger differences are emphasized, which sometimes is a good thing and sometimes not (think of how outliers will influence the result). So least absolute deviation may be more robust to outliers, but it can be unstable and may not have a unique solution.
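To make the robustness point concrete, here is a small sketch (in Python rather than Mathcad, with made-up data) comparing the least-squares line with a least-absolute-deviations line when one outlier is present. The brute-force LAD fit relies on the fact that an optimal L1 line passes through at least two of the data points.

```python
# Compare ordinary least squares (squared errors) with least absolute
# deviations (LAD) on data containing a single outlier.
# Hypothetical data; pure-Python sketch, not Mathcad code.

def ols_fit(xs, ys):
    """Closed-form slope/intercept minimizing the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx  # slope, intercept

def lad_fit(xs, ys):
    """Brute-force LAD fit: an optimal L1 line passes through two data points."""
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] - b * xs[i]
            err = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, b, a)
    return best[1], best[2]  # slope, intercept

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 1, 2, 3, 4, 25]      # last point is an outlier off the line y = x

print(ols_fit(xs, ys))  # slope dragged well above 1 by the outlier
print(lad_fit(xs, ys))  # stays at slope 1, intercept 0
```

The LAD line ignores the single outlier completely here, while the squared-error line is pulled far off the trend of the other five points - exactly the emphasis-on-large-differences effect described above.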

One point in favor of the squared error is clearly shown in your picture. As soon as you set up the sum of squares (you omitted taking the root, which is usually done so that the deviation has the same unit as the data), you want to minimize it, and one obvious tool is calculus, especially differentiation. But absolute values are quite difficult to deal with in math, especially in calculus. Squaring is much easier to handle.
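That calculus step can be sketched as follows (in Python, with hypothetical data): setting the partial derivatives of S(a, b) = Σ (y_i - a - b·x_i)² to zero gives a pair of linear "normal equations", which is why squaring leads to a closed-form slope and intercept at all.

```python
# The calculus step made concrete: dS/da = 0 and dS/db = 0 for
# S(a, b) = sum (y_i - a - b*x_i)^2 give the linear normal equations
#   n*a       + (sum x)*b   = sum y
#   (sum x)*a + (sum x^2)*b = sum x*y
# which a 2x2 solve turns into intercept a and slope b.

def least_squares_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    a = (sy * sxx - sx * sxy) / det  # intercept
    b = (n * sxy - sx * sy) / det    # slope
    return a, b

# Points lying exactly on y = 1 + 2x recover a = 1, b = 2.
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

With an absolute value in S, the derivative is discontinuous at every point where a residual crosses zero, so no such closed form falls out - which is Werner's point about why the squared version is the textbook one.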

If you really are interested you will find a lot of material if you search the net.

Some starting points may be:

http://en.wikipedia.org/wiki/Least_absolute_deviations - scroll down until you read "Contrasting Least Squares with Least Absolute Deviations", and don't forget to follow the link to the applets here: http://www.math.wpi.edu/Course_Materials/SAS/lablets/7.3/73_choices.html

Maybe this ten-year-old paper is interesting for you, too: http://www.leeds.ac.uk/educol/documents/00003759.htm

or this http://www.bradthiessen.com/html5/docs/ols.pdf

Another question you may also ask is why we use the vertical distance as the error and not the horizontal one or, even better, the perpendicular one (I guess this would be fun when we include units). I think the answer is: because it's easier!

Thanks, Werner.

I will use it in the article.

And second.

I think a more correct name for the line function would be lsrfit.

Ok.

We can see the model and the formulas for a and b in the line - pardon, lsrfit - function; please see above.

Can we do it for the medfit function?
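One hedged sketch of an answer (in Python, not Mathcad): a classic median-based fit is Tukey's median-median (three-group resistant) line. Whether medfit uses exactly this algorithm is an assumption on my part, and the grouping below is simplified (n // 3 points per outer group), so treat this as an illustration of a median fit rather than as medfit's documented internals.

```python
# Tukey's median-median (resistant) line, a common median-based linear fit:
# split the sorted points into three groups, take the median x and median y
# of the outer groups as summary points, get the slope from those, and set
# the intercept from the median residual over all points.
from statistics import median

def median_median_line(xs, ys):
    pts = sorted(zip(xs, ys))
    k = len(pts) // 3                  # simplified equal-ish grouping
    left, right = pts[:k], pts[-k:]
    # Summary point of a group: median x and median y, taken separately.
    lx, ly = median(p[0] for p in left), median(p[1] for p in left)
    rx, ry = median(p[0] for p in right), median(p[1] for p in right)
    b = (ry - ly) / (rx - lx)                  # slope from the outer groups
    a = median(y - b * x for x, y in pts)      # intercept: median residual
    return b, a  # slope, intercept

# Points exactly on y = 2x + 1 give back slope 2, intercept 1.
print(median_median_line([1, 2, 3, 4, 5, 6, 7, 8, 9],
                         [3, 5, 7, 9, 11, 13, 15, 17, 19]))
```

Unlike the least-squares case, there is no single closed formula to write down for a and b - the medians make the fit piecewise and data-dependent, which is presumably why no tidy "model and formulas" picture exists for medfit.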

RichardJ
19-Tanzanite
(To:Werner_E)

Another question you may also ask is why we use the vertical distance as the error and not the horizontal one or, even better, the perpendicular one (I guess this would be fun when we include units). I think the answer is: because it's easier!

The assumption is that there is an independent variable and a dependent variable, and the independent variable has no error. By convention the independent variable is plotted on the abscissa, so the errors are "vertical".

If both variables have errors, it's still possible to do a least squares regression, but not by using any built-in function in Mathcad. It's also possible to do a least squares fit with the errors always perpendicular to the fitted curve. I have a nice example worksheet (not written by me) of a parametric least squares fit to an ellipse, in which the errors are always perpendicular to the ellipse.
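For a straight line in 2D the perpendicular-error case does in fact have a closed form: the orthogonal (total) least-squares slope comes from the principal axis of the centered data. A Python sketch with hypothetical data (it assumes sxy ≠ 0, and is of course not a Mathcad built-in):

```python
# Orthogonal (total) least squares for a line: minimize the sum of squared
# PERPENDICULAR distances. The best line passes through the centroid, and
# its slope is the direction of the first principal axis of the data.
import math

def orthogonal_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Slope of the first principal axis (assumes sxy != 0).
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return b, my - b * mx  # slope, intercept

# Points exactly on y = 0.5*x give back slope 0.5, intercept 0.
print(orthogonal_fit([0, 2, 4, 6], [0, 1, 2, 3]))
```

Note that this treats x and y symmetrically, which is why units make it awkward: the perpendicular distance mixes the two axes, so it only makes geometric sense when both variables are in comparable units (or have been scaled).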

Richard Jackson wrote:

Another question you may also ask is why we use the vertical distance as the error and not the horizontal one or, even better, the perpendicular one (I guess this would be fun when we include units). I think the answer is: because it's easier!

The assumption is that there is an independent variable and a dependent variable, and the independent variable has no error. By convention the independent variable is plotted on the abscissa, so the errors are "vertical".

Yes, it's a matter of what one would like to use the fitted line for, and the approach you sketch surely is the most common one in technical statistics.

I was drawn away by a purely geometrical point of view, where the perpendicular distance would be the only truly correct (whatever that should mean in this context) approach. It's hard to define anyway what "this line fits the data points best" should mean, and so we define it as exactly what we calculate (the sum of squared "distances", the sum of their absolute values, or whatever). Which result fits your needs best may be a completely different question and may depend on your application, your data itself, your willingness to manually remove outliers beforehand, etc.

I have a nice example worksheet (not written by me) of a parametric least squares fit to an ellipse, in which the errors are always perpendicular to the ellipse.

Sounds interesting.

Are you referring to this one?: http://communities.ptc.com/message/219565#219565

RichardJ
19-Tanzanite
(To:Werner_E)

Are you referring to this one?: http://communities.ptc.com/message/219565#219565

No. I did write that one. The one I am referring to is older, and was written by "Paul W", who was a very smart guy that disappeared well before the Collab did. I've attached it (IIRC, I did have to tweak this a little to get it to work with MC15. The original was written for MC11).

Thanks! That's a nice one.

Because for linear least squares, if the errors are normally distributed then the least squares estimate of the parameters is also the maximum likelihood estimate of the parameters.

Richard Jackson wrote:

Because for linear least squares, if the errors are normally distributed then the least squares estimate of the parameters is also the maximum likelihood estimate of the parameters.

That's surely a good point (and so we are back to 1809 and pass the buck to Gauß).
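Richard's maximum-likelihood point can be checked numerically: with Gaussian errors of fixed σ, the negative log-likelihood of a candidate line is a constant plus SSE/(2σ²), so ranking candidate lines by likelihood is the same as ranking them by their sum of squared errors. A small Python sketch with made-up data:

```python
# With Gaussian errors of fixed sigma, the negative log-likelihood of a
# candidate line (a, b) is
#   NLL = n/2 * log(2*pi*sigma^2) + SSE / (2*sigma^2),
# i.e. a constant plus a positive multiple of the sum of squared errors,
# so the (a, b) minimizing NLL is exactly the least-squares line.
import math

xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]   # hypothetical data, roughly y = 1 + 2x
sigma = 1.0

def sse(a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def nll(a, b):
    n = len(xs)
    return n / 2 * math.log(2 * math.pi * sigma ** 2) + sse(a, b) / (2 * sigma ** 2)

# Two candidate lines: the ranking by NLL matches the ranking by SSE.
good, bad = (1.0, 2.0), (0.0, 3.0)
print(sse(*good) < sse(*bad), nll(*good) < nll(*bad))  # True True
```

Nothing comparable holds for abs() errors under the Gaussian model - the absolute-value loss is instead the maximum-likelihood choice when the errors follow a Laplace (double-exponential) distribution, which ties the whole squared-vs-absolute debate back to what you assume about your errors.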
