Polynomial Regression Curve Fitting in R

   Polynomial regression models a nonlinear relationship between an independent variable x and a dependent variable y by fitting a polynomial in x. It is useful when the data follows a curved trend with one or more bends. In this post, I will show how to fit a polynomial regression curve to data and plot it. We use the lm() function for this model. Although lm() is a linear regression function, it works for polynomial models too: a polynomial is linear in its coefficients, so we only need to add the power terms of x to the model formula.
   A polynomial function of degree n can be written as:

$$ f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n $$
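Because f(x) is linear in the coefficients a_0 ... a_n, fitting it is an ordinary least-squares problem on the columns 1, x, x^2, ..., x^n. The sketch below assumes a cubic (n = 3), simulated data similar to this post's, and a fixed seed (set.seed(42), not part of the original). It builds that design matrix by hand, solves it with qr.solve(), and checks that lm() returns the same coefficients.

```r
set.seed(42)                                  # assumption: fixed seed for a repeatable example
x <- seq(-0.99, 1, by = 0.01)
y <- x^3 + 2 * x^2 + 5 + runif(length(x))

# Design matrix with columns 1, x, x^2, x^3: the model is linear in a0..a3
X <- cbind(1, x, x^2, x^3)

# Least-squares solution of X %*% a = y (qr.solve handles non-square X)
a_qr <- qr.solve(X, y)

# The same cubic model fitted with lm()
a_lm <- coef(lm(y ~ x + I(x^2) + I(x^3)))

print(round(cbind(qr = a_qr, lm = a_lm), 4))
```

Both columns should agree, which is why lm() can fit any polynomial once the power terms are in the formula.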


 Generating data

    To test the method we need data, which we generate below from the cubic function peq(x) plus uniform random noise. You may use your own observation data instead.
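peq <- function(x) x^3 + 2*x^2 + 5   # true curve
 
x <- seq(-0.99, 1, by = 0.01)        # 200 points from -0.99 to 1
y <- peq(x) + runif(length(x))       # uniform noise in [0, 1); no seed is set, so values vary per run
 
df <- data.frame(x = x, y = y)
head(df)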
      x        y
1 -0.99 6.635701
2 -0.98 6.290250
3 -0.97 6.063431
4 -0.96 6.632796
5 -0.95 6.634153
6 -0.94 6.896084
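Note that runif() draws noise uniformly from [0, 1), so the noise averages 0.5 and the data scatters around x^3 + 2x^2 + 5.5 rather than peq(x) itself. The sketch below regenerates the data reproducibly (set.seed(1) is an assumption, so its values will not match the head(df) output above) and checks that a cubic fit recovers coefficients close to 5.5, 0, 2 and 1.

```r
set.seed(1)   # assumption: not in the original; makes runif() repeatable

peq <- function(x) x^3 + 2 * x^2 + 5
x <- seq(-0.99, 1, by = 0.01)          # 200 points
y <- peq(x) + runif(length(x))         # uniform noise on [0, 1), mean 0.5
df <- data.frame(x = x, y = y)

# Raw (non-orthogonal) cubic fit, so the coefficients map directly to a0..a3
fit <- lm(y ~ x + I(x^2) + I(x^3), data = df)
print(round(coef(fit), 3))
# Expected to be near 5.5 (= 5 + mean noise), 0, 2 and 1
```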

Now we have the data frame df. We plot it to see its shape; our goal is to find the curve that fits it best.

plot(df$x, df$y, pch=20, col="gray")
 


Building the model

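   We build the model with the lm() function and a formula. In a formula, I(x^2) stands for x^2; the I() wrapper is needed because ^ otherwise has a special meaning in R formulas. The poly() function is an alternative, but it is not the same as I(x^2): poly(x, 2) adds both the degree-1 and degree-2 terms as orthogonal polynomials, while poly(x, 2, raw = TRUE) adds the raw x and x^2 columns. Both give the same fitted curve as y ~ x + I(x^2), with different coefficients. The summary below reports 96 residual degrees of freedom, so it was produced from 100 observations rather than the 200 generated above; your numbers will differ.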

> model <- lm(y~x+I(x^3)+I(x^2), data = df)
> summary(model)

Call:
lm(formula = y ~ x + I(x^3) + I(x^2), data = df)

Residuals:
        Min          1Q      Median          3Q         Max 
-0.49598082 -0.21488892 -0.01301059  0.18515573  0.58048188 

Coefficients:
              Estimate Std. Error  t value             Pr(>|t|)    
(Intercept)  4.3634157  0.1091087 39.99144 < 0.0000000000000002 ***
x           -0.1078152  0.9309088 -0.11582             0.908039    
I(x^3)      -0.5925309  1.3905638 -0.42611             0.670983    
I(x^2)       3.6462591  2.1359770  1.70707             0.091042 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2626079 on 96 degrees of freedom
Multiple R-squared:  0.9243076, Adjusted R-squared:  0.9219422 
F-statistic: 390.7635 on 3 and 96 DF,  p-value: < 0.00000000000000022204
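To make the difference between I() terms and poly() concrete, the sketch below fits the same cubic three ways on simulated data (an assumption; any data frame with x and y columns works): with explicit I() terms, with poly(..., raw = TRUE), and with the default orthogonal poly(). The fitted values agree; only the orthogonal basis changes the coefficients.

```r
set.seed(1)   # assumption: repeatable simulated data
x <- seq(-0.99, 1, by = 0.01)
df <- data.frame(x = x, y = x^3 + 2 * x^2 + 5 + runif(length(x)))

m_I    <- lm(y ~ x + I(x^2) + I(x^3), data = df)     # explicit powers
m_raw  <- lm(y ~ poly(x, 3, raw = TRUE), data = df)  # same raw basis
m_orth <- lm(y ~ poly(x, 3), data = df)              # orthogonal basis

# Same curve, different parameterizations
print(cbind(I = coef(m_I), raw = coef(m_raw), orth = coef(m_orth)))
```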

Next, we compute the model's predictions for the df data. Note that predict() takes new data through its newdata argument.

> pred <- predict(model, newdata = df)
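An argument named data is not recognized by predict() for lm models; it is silently absorbed by ..., and predict() returns the in-sample fitted values. With the training data that happens to give the right answer, but it hides mistakes when you want predictions at new points. A minimal sketch, assuming simulated data and a hypothetical grid of 5 new x values:

```r
set.seed(1)   # assumption: repeatable simulated data
x <- seq(-0.99, 1, by = 0.01)
df <- data.frame(x = x, y = x^3 + 2 * x^2 + 5 + runif(length(x)))
model <- lm(y ~ x + I(x^3) + I(x^2), data = df)

# 'data' is ignored: this returns the 200 in-sample fitted values
p_ignored <- predict(model, data = data.frame(x = 0))

# 'newdata' is used: one prediction per row of the new data frame
new_x <- data.frame(x = seq(-1, 1, length.out = 5))   # hypothetical grid
p_new <- predict(model, newdata = new_x)
print(p_new)
```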
 

Finding the best-fitting curve

   Choosing the right curve is important, so we compare several candidate formulas. In the plot below, I fit linear, quadratic, and cubic formulas to the data. The orange line (linear regression) and the yellow curve (y ~ x^3) are poor choices for this data. The pink curve (y ~ x^2) is close, but the blue curve (a full cubic, y ~ x + x^2 + x^3) matches the data trend best. Thus, I use the cubic formula y ~ x + I(x^3) + I(x^2) for our polynomial regression model.
   You can find a suitable formula for your own data by plotting the candidate fits in the same way.



The source code of the above plot is listed below.

dev.new(width = 8, height = 6)   # new 8 x 6 inch plot window; windows() works only on Windows
plot(x = df$x, y = df$y, pch = 20, col = "grey")

# Candidate fits (df$x is sorted, so lines() draws each curve left to right)
lines(df$x, predict(lm(y ~ x, data = df)), col = "orange1", lwd = 2)
lines(df$x, predict(lm(y ~ I(x^2), data = df)), col = "pink1", lwd = 2)
lines(df$x, predict(lm(y ~ I(x^3), data = df)), col = "yellow2", lwd = 2)
# poly(x, 3) already contains the degree-1, 2 and 3 terms
lines(df$x, predict(lm(y ~ poly(x, 3), data = df)), col = "blue", lwd = 2)

legend("topleft",
       legend = c("y ~ x (linear)", "y ~ x^2", "y ~ x^3", "y ~ x + x^2 + x^3"),
       col = c("orange1", "pink1", "yellow2", "blue"),
       lty = 1, lwd = 3)
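Judging the curves by eye works for a clear case like this one. When candidates look similar, nested models can also be compared numerically with anova(), an F-test for the added terms, and AIC(), where lower is better. The sketch below assumes simulated data like the post's and uses orthogonal poly() terms so the three models are nested.

```r
set.seed(1)   # assumption: repeatable simulated data
x <- seq(-0.99, 1, by = 0.01)
df <- data.frame(x = x, y = x^3 + 2 * x^2 + 5 + runif(length(x)))

m1 <- lm(y ~ poly(x, 1), data = df)   # linear
m2 <- lm(y ~ poly(x, 2), data = df)   # quadratic
m3 <- lm(y ~ poly(x, 3), data = df)   # cubic

# F-tests: does each extra degree reduce the residual sum of squares significantly?
print(anova(m1, m2, m3))

# Information criterion: smaller values indicate a better fit/complexity trade-off
print(AIC(m1, m2, m3))
```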


Plotting the result

1. Plotting with the plot() function.

plot(df$x, df$y, pch = 20, col = "gray")
pred <- predict(model, newdata = df)
lines(df$x, pred, lwd = 3, col = "blue")



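2. Plotting with ggplot().

With ggplot2, the polynomial curve can be fitted and plotted in one step: geom_smooth() with method = "lm" fits the given formula, written in terms of the x and y aesthetics.

library(ggplot2)
ggplot(data = df, aes(x, y)) +
       geom_point() +
       geom_smooth(method = "lm", formula = y ~ x + I(x^2) + I(x^3))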
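By default geom_smooth() also draws a 95% confidence band around the curve (se = TRUE). A similar band can be drawn with base graphics: predict(..., interval = "confidence") returns a matrix with columns fit, lwr and upr, and matlines() draws all three. A sketch assuming simulated data like the post's:

```r
set.seed(1)   # assumption: repeatable simulated data
x <- seq(-0.99, 1, by = 0.01)
df <- data.frame(x = x, y = x^3 + 2 * x^2 + 5 + runif(length(x)))
model <- lm(y ~ x + I(x^3) + I(x^2), data = df)

# Columns fit, lwr, upr: 95% confidence interval for the mean response
ci <- predict(model, newdata = df, interval = "confidence", level = 0.95)

plot(df$x, df$y, pch = 20, col = "gray")
# Solid fitted curve, dashed lower and upper bounds
matlines(df$x, ci, lty = c(1, 2, 2), col = "blue", lwd = c(3, 1, 1))
```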



   In this post, we have briefly learned how to fit a polynomial regression model in R and plot the result with the plot() and ggplot() functions. The full source code is listed below.


library(ggplot2)

# Generate test data: a cubic curve plus uniform noise
peq <- function(x) x^3 + 2*x^2 + 5
x <- seq(-0.99, 1, by = 0.01)
y <- peq(x) + runif(length(x))
df <- data.frame(x = x, y = y)
head(df)

plot(df$x, df$y, pch = 20, col = "gray")

# Fit the cubic polynomial model
model <- lm(y ~ x + I(x^3) + I(x^2), data = df)
summary(model)

pred <- predict(model, newdata = df)

# Compare candidate curves
dev.new(width = 8, height = 6)
plot(x = df$x, y = df$y, pch = 20, col = "grey")
lines(df$x, predict(lm(y ~ x, data = df)), col = "orange1", lwd = 2)
lines(df$x, predict(lm(y ~ I(x^2), data = df)), col = "pink1", lwd = 2)
lines(df$x, predict(lm(y ~ I(x^3), data = df)), col = "yellow2", lwd = 2)
lines(df$x, predict(lm(y ~ poly(x, 3), data = df)), col = "blue", lwd = 2)
legend("topleft",
       legend = c("y ~ x (linear)", "y ~ x^2", "y ~ x^3", "y ~ x + x^2 + x^3"),
       col = c("orange1", "pink1", "yellow2", "blue"),
       lty = 1, lwd = 3)

# Plot the fitted model with base graphics
plot(df$x, df$y, pch = 20, col = "gray")
lines(df$x, pred, lwd = 3, col = "blue")

# Plot the fitted model with ggplot2 (the + must end the line, not start it)
ggplot(data = df, aes(x, y)) +
       geom_point() +
       geom_smooth(method = "lm", formula = y ~ x + I(x^2) + I(x^3))

