Least Squares Line Regression: Fitting a Straight Line to Data, Slides of Statistics

This document shows how to fit a straight line to data using the least squares method. It covers the assumptions required for this statistical tool to be effective, the formulas for calculating the intercept and slope, and the concept of the correlation coefficient. It also discusses the interpretation of R² and provides worked examples with given data.

What you will learn

  • What are the assumptions required for a linear least squares fit?
  • What is the difference between the population and sample correlation coefficient?
  • How is the slope related to the correlation coefficient?
  • How does the least squares method handle non-constant variance?
  • What does it mean if the correlation coefficient is -1?

Typology: Slides

2021/2022

Uploaded on 09/12/2022 by eknathia 🇺🇸

Section 4.1: Fitting a Line by Least Squares

Often we want to fit a straight line to data.

For example from an experiment we might have the following data showing the relationship of density of specimens made from a ceramic compound at different pressures.

By fitting a line to the data we can predict what the average density would be for specimens made at any given pressure, even pressures we did not investigate experimentally.

For a straight line we assume a model which says that, on average in the whole population of possible specimens, the average density, y, is related to pressure, x, by the equation

y ≈ β₀ + β₁x

The population (true) intercept and slope are represented by Greek symbols just like μ and σ.

For the measured data we fit a straight line

ŷ = b₀ + b₁x

For the i-th point, the fitted or predicted value is

ŷᵢ = b₀ + b₁xᵢ

The fitted line is most often determined by the method of “least squares”.

This is the optimal method to use for fitting the line if

  • The relationship is in fact linear.
  • For a fixed value of x, the resulting values of y are
      o normally distributed, with
      o the same constant variance at all x values.

If these assumptions are not met, then we are not using the best tool for the job.

For any statistical tool, know when that tool is the right one to use.

ŷ − ȳ = b₁(x − x̄). When x = x̄, then ŷ = ȳ.

If we have average pressure, x̄, then we expect to get about average density, ȳ.
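The least squares estimates can be computed directly from the averages. A minimal sketch in Python, using made-up illustrative numbers rather than the ceramic data:

```python
def least_squares_fit(x, y):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # forces the fitted line through (x_bar, y_bar)
    return b0, b1

# Illustrative (made-up) data:
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.1, 8.0]
b0, b1 = least_squares_fit(x, y)
print(b0, b1)  # approximately 0.05 and 1.99
```

Note that b0 = ȳ − b1·x̄ is exactly the statement above: the fitted line always passes through the point of averages (x̄, ȳ).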

The sample (linear) correlation coefficient, r, is a measure of how "correlated" the x and y variables are. The correlation coefficient is between -1 and 1:

  • +1 means perfectly positively correlated
  • 0 means no correlation
  • -1 means perfectly negatively correlated

The correlation coefficient is computed by

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )
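A small sketch of this computation in Python (the data points are made up for illustration):

```python
from math import sqrt

def correlation(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(correlation([1, 2, 3], [2, 4, 6]))  # exactly increasing line: 1.0
print(correlation([1, 2, 3], [6, 4, 2]))  # exactly decreasing line: -1.0
```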

The slope, b₁, and the correlation coefficient are related by

b₁ = r · (SDy / SDx) = rise / run

  • SDy is the standard deviation of y values and
  • SDx is the standard deviation of x values.

For every SDx of run on the x axis, the fitted line rises r·SDy units on the y axis.

So if x is a certain number of standard deviations above average, x̄,

  • then y is, on average, the fraction r times that many standard deviations above average, ȳ.
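This relationship can be checked numerically. A sketch with made-up data, computing the slope both directly and from r and the standard deviations:

```python
from math import sqrt

# Illustrative data (not from the text):
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 3.5, 4.5, 4.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)

b1_direct = sxy / sxx              # least squares slope
r = sxy / sqrt(sxx * syy)          # sample correlation
sd_x = sqrt(sxx / (n - 1))         # sample standard deviations
sd_y = sqrt(syy / (n - 1))
b1_via_r = r * sd_y / sd_x         # slope recovered from r and the SDs

print(b1_direct, b1_via_r)  # both approximately 0.6
```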

For example, if

  • the correlation is r = 0.9, and
  • the pressure, x, is 2 standard deviations above average x̄,
  • then we expect the density, y, will be about
      o 0.9(2) = 1.8 standard deviations above average ȳ.

Interpretation of r² or R²

R² = fraction of variation accounted for (explained by) the fitted line.

[Figure: scatter plot of Density (y axis, 2450 to 2900) versus Pressure (x axis, 0 to 12000) for the ceramic data, with fitted line. Source: Ceramic Items, page 124]

Pressure    y = Density    y - mean    (y - mean)²
    2000           2486        -181          32761
    2000           2479        -188          35344
    2000           2472        -195          38025
    4000           2558        -109          11881
    4000           2570         -97           9409
    4000           2580         -87           7569
    6000           2646         -21            441
    6000           2657         -10            100
    6000           2653         -14            196
    8000           2724          57           3249
    8000           2774         107          11449
    8000           2808         141          19881
   10000           2861         194          37636
   10000           2879         212          44944
   10000           2858         191          36481

mean        6000      2667      sum: 0      sum: 289366
st dev      2927.7    143.8
correlation 0.991
correl²     0.982
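The table's computation can be reproduced from the raw pressure/density values. A Python sketch:

```python
from math import sqrt

# The ceramic pressure/density data (three specimens at each pressure):
pressure = [2000] * 3 + [4000] * 3 + [6000] * 3 + [8000] * 3 + [10000] * 3
density = [2486, 2479, 2472, 2558, 2570, 2580, 2646, 2657, 2653,
           2724, 2774, 2808, 2861, 2879, 2858]
n = len(density)
x_bar = sum(pressure) / n          # 6000
y_bar = sum(density) / n           # 2667
sxx = sum((x - x_bar) ** 2 for x in pressure)
syy = sum((y - y_bar) ** 2 for y in density)   # SS Total = 289366
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pressure, density))
r = sxy / sqrt(sxx * syy)
ss_regression = sxy ** 2 / sxx     # approximately 284213.33
print(round(r, 3), round(r ** 2, 3))  # 0.991 0.982
```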

The percent reduction in our error sum of squares is

R² = [ Σ(yᵢ − ȳ)² − Σ(yᵢ − ŷᵢ)² ] / Σ(yᵢ − ȳ)²

R² = 98.2%

Using x to predict y decreases the error sum of squares by 98.2%

The reduction in error sum of squares from using x to predict y is

  • Sum of squares explained by the regression equation
  • 284,213.33 = SS Regression in Excel

This is also the correlation squared.

r² = 0.991² ≈ 0.982

For a perfectly straight line

  • All residuals are zero.
      o The line fits the points exactly.
  • SS Residual = 0
  • SS Regression = SS Total
      o The regression equation explains all variation.
  • R² = 100%
  • r = ±1
      o r² = 1

If r = 0, then there is no linear relationship between x and y.

  • R^2 = 0%
  • Using x to predict y does not help at all.
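A quick numerical illustration with made-up data: a symmetric pattern gives Σ(xᵢ − x̄)(yᵢ − ȳ) = 0, so r = 0 and SS Regression = 0, even though y clearly depends on x in a nonlinear way.

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 7.0, 7.0, 5.0]   # symmetric pattern: no *linear* trend
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
ss_regression = sxy ** 2 / sxx
print(sxy, ss_regression)  # 0.0 0.0 -- the line does not help predict y
```

This is why "no linear relationship" is the careful wording: r = 0 rules out a useful straight-line fit, not every kind of dependence.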

Checking Model Adequacy

With only a single x variable, we can tell most of what we need from a plot with the fitted line.

[Figure: scatter plot of Y versus X on the original scale, X from 0 to 20, Y from 0 to 30, with fitted line]

Plotting residuals will be most crucial in section 4.2 with multiple x variables

  • But residual plots are still of use here.

Plot residuals:

  • versus predicted values ŷ
  • versus x
  • in run order
  • versus other potentially influential variables, e.g. technician
  • normal plot of residuals
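A sketch of computing the residuals to plot (Python, made-up data; the actual plotting calls are omitted):

```python
# Illustrative data, fit by the closed-form least squares formulas:
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.1, 8.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * a for a in x]                    # predicted values y-hat
residuals = [b - f for b, f in zip(y, fitted)]       # e_i = y_i - y-hat_i
# Residuals from a least squares fit with an intercept always sum to zero;
# plot them against fitted values, x, run order, etc. to check the model.
print(residuals)
print(abs(sum(residuals)) < 1e-9)  # True
```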

Some Study Questions

What does it mean to say that a line fit to data is the "least squares" line? Where do the terms least and squares come from?

We are fitting data with a straight line. What 3 assumptions (conditions) need to be true for a linear least squares fit to be the optimal way of fitting a line to the data?

What does it mean if the correlation between x and y is -1? What is the residual sum of squares in this situation?

If the correlation between x and y is 0, what is the regression sum of squares, SS Regression, in this situation?

If x is 2 standard deviations above the average x value and the correlation between x and y is -0.6, the expected corresponding y value is how many standard deviations above or below the average y value?

Consider the following data.

[Figure: scatter plot of Y versus X, X from 0 to 12, Y from 0 to 16]

ANOVA

                 df          SS          MS          F    Significance F
Regression        1    124.0333    124.0333      15.85             0.016
Residual          4     31.3000      7.8250
Total             5    155.3333

             Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%
Intercept          -0.5          2.79732     -0.1787     0.867     -8.2666      7.2666
X                   1.525        0.383039     3.9813     0.016      0.4615      2.5885

  • What is the value of R²?
  • What is the least squares regression equation?
  • How much does y increase on average if x is increased by 1.0?
  • What is the sum of squared residuals? Do not compute the residuals; find the answer in the Excel output.
  • What is the sum of squares of deviations of y from ȳ?
  • By how much is the sum of squared errors reduced by using x to predict y, compared to using only ȳ to predict y?
  • What is the residual for the point with x = 2?