# Problem

We have medical insurance data for a set of patients. We will use the independent variables to build a machine learning model that estimates the insurance charges. Because the medical charge is a numeric value, this is a regression problem.

charges is the dependent variable and all other columns are independent variables. The columns are:

age: An integer indicating the age of the patient.

sex: The patient's gender, either male or female.

bmi: Body Mass Index (BMI), equal to weight (in kilograms) divided by height (in meters) squared. It gives a sense of how overweight or underweight a person is relative to their height.

children: An integer indicating the number of children/dependents covered by the insurance plan.

smoker: Whether the patient regularly smokes tobacco (yes or no).

region: The patient's place of residence in the U.S.

charges: The yearly health insurance charge for the patient.
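The BMI definition above can be checked with a short calculation. This is only a sketch: `bmi_from()` is a hypothetical helper written for illustration, and the weight and height are made-up values, not taken from the insurance data.

```r
# Hypothetical helper (not part of the original analysis):
# BMI = weight in kilograms divided by height in meters squared.
bmi_from <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Illustrative values only: a 70 kg person who is 1.75 m tall.
example_bmi <- bmi_from(70, 1.75)
round(example_bmi, 2)  # about 22.86
```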

# Load Library and data

```
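library(dplyr)  # data manipulation
library(caret)  # model training and evaluation utilities

# Load the insurance data set saved earlier as an R object
insurance <- readRDS("insurance.rds")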
```

# Exploratory Data Analysis (EDA)

EDA was already carried out in the data analysis section, so here we only preview the data and its structure.

`knitr::kable(head(insurance))`

age | sex | bmi | children | smoker | region | charges |
---|---|---|---|---|---|---|
19 | female | 27.900 | 0 | yes | southwest | 16884.924 |
39 | male | 33.770 | 1 | no | southeast | 1725.552 |
28 | male | 33.000 | 3 | no | southeast | 4449.462 |
33 | male | 22.705 | 0 | no | northwest | 21984.471 |
32 | male | 28.880 | 0 | no | northwest | 3866.855 |
31 | female | 25.740 | 0 | no | southeast | 3756.622 |

`str(insurance)`

```
## 'data.frame': 1338 obs. of 7 variables:
## $ age : int 19 39 28 33 32 31 46 37 37 60 ...
## $ sex : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 2 1 ...
## $ bmi : num 27.9 33.8 33 22.7 28.9 ...
## $ children: int 0 1 3 0 0 0 1 3 2 0 ...
## $ smoker : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
## $ region : Factor w/ 4 levels "northeast","northwest",..: 4 3 3 2 2 3 3 2 1 2 ...
## $ charges : num 16885 1726 4449 21984 3867 ...
```

## Algorithm

Because this is a regression problem, we will use the multiple linear regression algorithm to build the medical insurance predictive model.

### Simple Linear Regression

Simple linear regression is a method for predicting a quantitative value and studying the relationship between two continuous variables, say X and Y. Mathematically, simple linear regression can be written as:

\[Y = a + bX + e\]

Where \(Y\) is the dependent variable, \(X\) is the independent variable, \(a\) is the intercept, \(b\) is the slope of \(X\) and \(e\) is the error term in the equation.

The main task of linear regression is to find the best-fitting straight line through the (X, Y) points.
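As a minimal sketch of fitting such a line in R, the example below uses simulated data rather than the insurance data. The true intercept (2) and slope (3) are assumptions chosen for illustration, and `lm()` estimates them by ordinary least squares.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated data (an assumption, not the insurance data):
# true intercept a = 2, true slope b = 3, plus random noise e
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Fit the straight line Y = a + b * X + e by ordinary least squares
fit <- lm(y ~ x)

# Estimated intercept (a) and slope (b); they should be close to 2 and 3
coef(fit)
```

The estimates differ slightly from the true values because of the noise term, which is what the error term \(e\) represents.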

### Multiple Linear Regression

Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data.

Multiple linear regression uses multiple predictors. The equation for multiple linear regression looks like:

\[Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + e\]

where:

\(Y\) is the response or dependent variable, \(\beta_0\) is the intercept, \(x_1\) and \(x_2\) are predictors or independent variables, \(\beta_1\) and \(\beta_2\) are the coefficients for \(x_1\) and \(x_2\) respectively, and \(e\) is the error term in the equation.
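The same idea with two predictors can be sketched as follows. The data frame `sim` and its coefficients (intercept 1, coefficients 2 and -0.5) are simulated assumptions for illustration only. They are not the insurance data and not the model fitted later in this analysis.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated data (an assumption): two predictors and a noisy response
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 0.5 * sim$x2 + rnorm(n, sd = 0.5)

# Fit a multiple linear regression with both predictors
fit_multi <- lm(y ~ x1 + x2, data = sim)

# Estimated intercept and coefficients for x1 and x2
coef(fit_multi)

# Predict the response for a new observation;
# the noise-free value would be 1 + 2 * 1 - 0.5 * 2 = 2
pred <- predict(fit_multi, newdata = data.frame(x1 = 1, x2 = 2))
pred
```

Each coefficient describes the expected change in the response for a one-unit change in that predictor while the other predictor is held fixed.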