Machine Learning Regression
1 Introduction
1.1 Definition of Machine Learning
Machine learning is the study, design, and development of models and algorithms that enable computers to learn from data and make predictions or decisions.
Machine learning uses algorithms and computational power to determine whether a relationship exists between data and labels.
In machine learning, a computer program is said to learn from experience (E), with respect to some task (T) and some performance measure (P), if its performance on (T), as measured by (P), improves with experience (E).
1.2 Different Tasks of Machine Learning
1.2.1 Supervised Learning
Supervised learning is a type of machine learning where the model is trained on a labeled dataset. The goal is to learn a mapping from inputs to outputs based on the provided labels. There are two primary tasks under supervised learning: classification and regression.
1.2.1.1 Classification
Classification is the task of predicting a category or class label for a given input. The model is trained on a dataset where each example is labeled with the correct class. The objective is to learn from these examples and accurately classify new, unseen data.
An example of classification is sorting socks by color. In this scenario, the model could be trained to recognize different colors of socks. Given a pile of socks, the model would sort them into categories such as red, blue, or green.
This task is useful in various applications, such as spam detection in emails (classifying emails as spam or not spam), handwriting recognition (classifying handwritten digits), and medical diagnosis (classifying whether a patient has a particular disease based on symptoms).
1.2.1.2 Regression
Regression is the task of predicting a continuous value for a given input. Unlike classification, which predicts discrete labels, regression predicts numerical values.
An example of regression is predicting the length of ties. In this case, the model could be trained to predict the length of a tie based on features such as brand, fabric, or style. Given a set of ties, the model would estimate their lengths and sort them accordingly.
Regression tasks are common in various fields, such as predicting house prices based on features like location, size, and number of bedrooms, forecasting stock prices based on historical data, and estimating the amount of rainfall based on weather conditions.
1.2.2 Unsupervised Learning
Unsupervised learning deals with unlabeled data, meaning the model tries to learn the patterns and structure from the data without any specific guidance on what the outputs should be. There are several tasks under unsupervised learning, including clustering, association, and dimension reduction.
1.2.2.1 Clustering
Clustering is a technique used to group similar data points based on their characteristics. This is useful in various scenarios where identifying natural groupings in data can provide insights or aid decision-making.
Suppose you have a pile of clothes, and you want to organize them by similarity, such as color, type, or fabric. Clustering algorithms can automatically group similar items together, making it easier to organize and manage your wardrobe. For example, all the red shirts might be grouped in one cluster, while all the blue jeans might be in another.
1.2.2.2 Association
Association learning identifies relationships between variables in a dataset. It is often used to find patterns or associations between different items that occur together.
For example, in the context of clothing, association can help find hidden dependencies, such as which items of clothing are often worn together. If you frequently wear a certain shirt with a specific pair of pants, an association learning algorithm can identify this pattern.
This information can be useful for making recommendations or optimizing your wardrobe choices. For instance, a system could suggest pairing items that you have not previously considered, based on the associations it has learned from your past choices.
1.2.2.3 Dimension Reduction
Dimension reduction is a process used to reduce the number of variables under consideration, making the data simpler to analyze and visualize. This is particularly helpful when dealing with high-dimensional data.
In the context of clothing, dimension reduction can help generalize and simplify the task of making outfits from a large collection of clothes. By reducing the complexity of the data, you can identify the most important features that contribute to making good outfits.
For example, you might reduce the data to a few key dimensions such as color, style, and occasion, which can help you mix and match clothing items more effectively to create the best outfits from your given clothes.
1.2.3 Reinforcement Learning
Reinforcement learning involves training models to make sequences of decisions by rewarding desired behaviors and penalizing undesired ones. This type of learning is inspired by behavioral psychology and is used in environments where an agent interacts with its surroundings.
An example of reinforcement learning is training a robot to navigate a maze. The robot receives positive rewards for making progress towards the exit and negative rewards for hitting walls or going in the wrong direction. Over time, the robot learns to find the most efficient path through the maze.
Another example is in gaming, where reinforcement learning agents can learn to play games like chess by receiving rewards for winning and penalties for losing. They can increase their performance levels by continuously improving their strategies based on feedback from their actions.
2 Linear Regression Analysis
The objective of this section is to build and evaluate a linear regression model that predicts the progression of diabetes using the attributes in the dataset.
2.1 Install Required Packages
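The original package-installation code is not shown here. A minimal sketch, assuming the rmse() function used later in this section comes from the Metrics package:
# Install once (the Metrics package is an assumption based on the rmse() call below)
install.packages("Metrics")
# Load for this session
library(Metrics)   # provides rmse()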
2.2 Load the Dataset
diabetes <- read.csv("Datasets/diabetes.csv")
2.3 Examine the Dataset
str(diabetes)
'data.frame': 442 obs. of 11 variables:
$ age : num 0.03808 -0.00188 0.0853 -0.08906 0.00538 ...
$ sex : num 0.0507 -0.0446 0.0507 -0.0446 -0.0446 ...
$ bmi : num 0.0617 -0.0515 0.0445 -0.0116 -0.0364 ...
$ bp : num 0.02187 -0.02633 -0.00567 -0.03666 0.02187 ...
$ s1 : num -0.04422 -0.00845 -0.0456 0.01219 0.00393 ...
$ s2 : num -0.0348 -0.0192 -0.0342 0.025 0.0156 ...
$ s3 : num -0.0434 0.07441 -0.03236 -0.03604 0.00814 ...
$ s4 : num -0.00259 -0.03949 -0.00259 0.03431 -0.00259 ...
$ s5 : num 0.01991 -0.06833 0.00286 0.02269 -0.03199 ...
$ s6 : num -0.01765 -0.0922 -0.02593 -0.00936 -0.04664 ...
$ diabetes: num 151 75 141 206 135 97 138 63 110 310 ...
2.4 Split the Data
Split the data into training and testing sets with a ratio of 70:30:
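The original splitting code is not shown. A minimal sketch of one common approach using base R (the seed value and the sample()-based split are assumptions; only the train1/test1 names are taken from the later code):
# Random 70:30 split (seed value is illustrative)
set.seed(123)
train_idx <- sample(seq_len(nrow(diabetes)), size = floor(0.7 * nrow(diabetes)))
train1 <- diabetes[train_idx, ]
test1  <- diabetes[-train_idx, ]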
2.5 Build the Linear Regression Model
Create the linear regression model using the training set:
lr.mod <- lm(diabetes ~ ., data = train1)
2.6 Summarize the Model
Display the summary of the model to understand the coefficients and residuals:
summary(lr.mod)
Call:
lm(formula = diabetes ~ ., data = train1)
Residuals:
Min 1Q Median 3Q Max
-158.16 -35.87 -2.03 34.74 150.31
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 154.183 3.071 50.198 < 2e-16 ***
age -32.436 71.563 -0.453 0.650704
sex -231.973 72.070 -3.219 0.001430 **
bmi 474.658 76.745 6.185 2.05e-09 ***
bp 304.318 76.540 3.976 8.80e-05 ***
s1 -701.242 493.006 -1.422 0.155962
s2 414.916 400.986 1.035 0.301630
s3 23.904 252.995 0.094 0.924789
s4 157.351 191.321 0.822 0.411482
s5 752.134 201.939 3.725 0.000234 ***
s6 107.649 80.012 1.345 0.179519
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53.77 on 298 degrees of freedom
Multiple R-squared: 0.5426, Adjusted R-squared: 0.5272
F-statistic: 35.34 on 10 and 298 DF, p-value: < 2.2e-16
2.7 Evaluate the Model
Perform predictions on the test set and evaluate the model’s performance using RMSE (Root Mean Squared Error):
test1$pred <- predict(lr.mod, newdata = test1)
rmse_value1 <- rmse(test1$diabetes, test1$pred)
print(rmse_value1)
[1] 55.47577
RMSE: The root mean squared error (RMSE) measures the average deviation of predictions from the actual values. A lower RMSE indicates a better fit of the model.
2.8 Performance Metrics
2.8.1 R-Squared
R-squared, also known as the coefficient of determination, represents the squared correlation between the observed outcome values and the predicted values by the model. It indicates how well the model explains the variability of the response data around its mean. The value of R-squared ranges from 0 to 1:
- Higher \(R^2\): Indicates a better fit of the model, meaning it explains a higher proportion of the variance in the outcome variable.
- Lower \(R^2\): Indicates a poorer fit, meaning it explains a lower proportion of the variance.
\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \]
- \(y_i\) are the observed values.
- \(\hat{y}_i\) are the predicted values from the regression model.
- \(\bar{y}\) is the mean of the observed values.
- \(\sum (y_i - \hat{y}_i)^2\) is the sum of the squared residuals (SSR).
- \(\sum (y_i - \bar{y})^2\) is the total sum of squares (SST).
2.8.1.1 Example of Calculation
Below is a graph showing how the number of lectures per day affects the number of hours spent at university per day. The regression line drawn on the graph has equation \(\hat{y} = 0.143 + 1.229x\). Calculate \(R^2\).
From the graph, we can see the observed data points are:
- (2, 2)
- (3, 4)
- (4, 6)
- (6, 7)
To calculate R-squared (R²), we need to understand how well the regression line predicts the actual data points. The given equation of the regression line is:
\[ \hat{y} = 0.143 + 1.229x \]
2.8.1.2 Calculate Predicted Values (\(\hat{y}\))
Use the regression line equation to calculate the predicted values for each \(x\):
For \(x = 2\):
\[ \hat{y} = 0.143 + 1.229 \times 2 = 2.601 \]
For \(x = 3\):
\[ \hat{y} = 0.143 + 1.229 \times 3 = 3.83 \]
For \(x = 4\):
\[ \hat{y} = 0.143 + 1.229 \times 4 = 5.059 \]
For \(x = 6\):
\[ \hat{y} = 0.143 + 1.229 \times 6 = 7.517 \]
2.8.1.3 Calculate Residuals (Errors)
The residual for each point is the difference between the actual value \(y\) and the predicted value \(\hat{y}\):
For (2, 2): \[ r_1 = y_1 - \hat{y}_1 = 2 - 2.601 = -0.601 \]
For (3, 4): \[ r_2 = y_2 - \hat{y}_2 = 4 - 3.83 = 0.17 \]
For (4, 6): \[ r_3 = y_3 - \hat{y}_3 = 6 - 5.059 = 0.941 \]
For (6, 7): \[ r_4 = y_4 - \hat{y}_4 = 7 - 7.517 = -0.517 \]
2.8.1.4 Sum of Squared Residuals (SSR)
Calculate the sum of the squared residuals:
\[ SSR = (-0.601)^2 + (0.17)^2 + (0.941)^2 + (-0.517)^2 = 1.542871 \]
2.8.1.5 Total Sum of Squares (SST)
Calculate the mean of the observed \(y\) values:
\[ \bar{y} = \frac{2 + 4 + 6 + 7}{4} = 4.75 \]
Calculate the total sum of squares:
\[ SST = (2 - 4.75)^2 + (4 - 4.75)^2 + (6 - 4.75)^2 + (7 - 4.75)^2 = 14.75 \]
2.8.1.6 Calculate R-squared (R²)
Finally, calculate R-squared using the formula:
\[ R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{1.542871}{14.75} = 1 - 0.105 = 0.895 \]
2.8.1.7 Conclusion
The R-squared value of 0.895 indicates that approximately 89.5% of the variance in the number of hours spent at the university per day can be explained by the number of lectures per day. This shows a strong relationship between the two variables in this example.
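As a quick illustration (not part of the original analysis), the same calculation can be reproduced in R:
# Verify the R-squared calculation for the worked example
x     <- c(2, 3, 4, 6)
y     <- c(2, 4, 6, 7)
y_hat <- 0.143 + 1.229 * x    # predictions from the given regression line
ssr   <- sum((y - y_hat)^2)   # sum of squared residuals (1.542871)
sst   <- sum((y - mean(y))^2) # total sum of squares (14.75)
1 - ssr / sst                 # approximately 0.895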
2.8.2 Root Mean Squared Error (RMSE)
RMSE measures the average prediction error made by the model in predicting the outcome for an observation. It is calculated as the square root of the average squared differences between the predicted values and the actual values. RMSE is sensitive to outliers because it squares the errors, giving more weight to larger errors. The formula for RMSE is:
\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y_i})^2} \]
- Lower RMSE: Indicates that the model’s predictions are closer to the actual values, meaning a better fit.
- Higher RMSE: Indicates larger differences between the predicted and actual values, meaning a poorer fit.
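As a small illustrative sketch (not part of the original analysis), RMSE can be computed directly from this formula and checked against the rmse() helper from the Metrics package:
library(Metrics)
actual    <- c(10, 12, 9, 11)       # illustrative values
predicted <- c(11, 11, 9, 12)
sqrt(mean((actual - predicted)^2))  # manual RMSE, approximately 0.866
rmse(actual, predicted)             # same result using Metrics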
2.8.3 Mean Absolute Error (MAE)
MAE measures the average magnitude of the errors in a set of predictions, without considering their direction (i.e., without squaring them). It is calculated as the average of the absolute differences between the predicted values and the actual values. MAE is less sensitive to outliers compared to RMSE. The formula for MAE is:
\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y_i}| \]
- Lower MAE: Indicates a better fit of the model, meaning the predictions are closer to the actual values.
- Higher MAE: Indicates a poorer fit, meaning the predictions are further from the actual values.
2.8.3.1 Steps to Calculate MAE
| Observation | Actual \(y_i\) | Predicted \(\hat{y}_i\) |
|---|---|---|
| 1 | 3 | 2.5 |
| 2 | 7 | 7.1 |
| 3 | 4 | 4.0 |
| 4 | 5 | 5.2 |
- Calculate the Absolute Differences
For each observation, subtract the predicted value from the actual value and take the absolute value of the result. This step ensures that all differences are positive.
- For the first observation: \(|3 - 2.5| = 0.5\)
- For the second observation: \(|7 - 7.1| = 0.1\)
- For the third observation: \(|4 - 4.0| = 0.0\)
- For the fourth observation: \(|5 - 5.2| = 0.2\)
- Sum the Absolute Differences
Add up all the absolute differences calculated in the previous step.
\[ 0.5 + 0.1 + 0.0 + 0.2 = 0.8 \]
- Compute the Mean
Divide the sum of the absolute differences by the number of observations. This gives the mean absolute error.
- Number of observations (\(n\)): 4
- MAE: \(\frac{0.8}{4} = 0.2\)
So, the Mean Absolute Error (MAE) for this example is 0.2. This value indicates that, on average, the model’s predictions are off by 0.2 units from the actual values.
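The same result can be checked in R (an illustrative sketch; the mae() helper is assumed to come from the Metrics package):
library(Metrics)
actual    <- c(3, 7, 4, 5)
predicted <- c(2.5, 7.1, 4.0, 5.2)
mean(abs(actual - predicted))   # manual MAE: 0.2
mae(actual, predicted)          # same result using Metrics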
3 Decision Tree
A decision tree is a machine learning technique that can be used to model and predict continuous target variables. It constructs a tree-like model of decisions and their possible consequences, allowing for interpretable analysis of complex relationships in the data.
3.1 Install and Load Required Packages
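The original code is not shown here. A minimal sketch, with the package list inferred from the functions used in this section (rpart for the tree, rpart.plot for plotting, Metrics for rmse()); the exact set is an assumption:
# Install once (package list inferred from the functions used below)
install.packages(c("rpart", "rpart.plot", "Metrics"))
# Load for this session
library(rpart)        # rpart(), printcp(), plotcp(), post()
library(rpart.plot)   # rpart.plot() for drawing the tree (assumed)
library(Metrics)      # rmse()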
3.2 Load the Dataset
diabetes <- read.csv("Datasets/diabetes.csv")
3.3 Examine the Dataset Structure
str(diabetes)
'data.frame': 442 obs. of 11 variables:
$ age : num 0.03808 -0.00188 0.0853 -0.08906 0.00538 ...
$ sex : num 0.0507 -0.0446 0.0507 -0.0446 -0.0446 ...
$ bmi : num 0.0617 -0.0515 0.0445 -0.0116 -0.0364 ...
$ bp : num 0.02187 -0.02633 -0.00567 -0.03666 0.02187 ...
$ s1 : num -0.04422 -0.00845 -0.0456 0.01219 0.00393 ...
$ s2 : num -0.0348 -0.0192 -0.0342 0.025 0.0156 ...
$ s3 : num -0.0434 0.07441 -0.03236 -0.03604 0.00814 ...
$ s4 : num -0.00259 -0.03949 -0.00259 0.03431 -0.00259 ...
$ s5 : num 0.01991 -0.06833 0.00286 0.02269 -0.03199 ...
$ s6 : num -0.01765 -0.0922 -0.02593 -0.00936 -0.04664 ...
$ diabetes: num 151 75 141 206 135 97 138 63 110 310 ...
3.4 Split the Dataset
Split the dataset into training (70%) and test (30%) sets. The training set is used to train and create the model, and the test set is used to evaluate its accuracy.
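The original splitting code is not shown. A minimal sketch using base R (the seed and the sample()-based approach are assumptions; the train2/test2 names come from the later code):
# Random 70:30 split (seed value is illustrative)
set.seed(123)
train_idx <- sample(seq_len(nrow(diabetes)), size = floor(0.7 * nrow(diabetes)))
train2 <- diabetes[train_idx, ]
test2  <- diabetes[-train_idx, ]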
3.5 Build the Regression Tree
reg_tree <- rpart(diabetes ~ .,
data = train2,
method = "anova",
control = rpart.control(cp = 0)
)
3.6 Plot the Decision Tree
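The plotting code is not shown here. One common way to draw the fitted tree is with the rpart.plot package (an assumption, not necessarily the original approach):
# Draw the fitted regression tree (rpart.plot is assumed)
rpart.plot(reg_tree, main = "Regression Tree for Diabetes Progression")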
3.7 Evaluate the Tree
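The evaluation code is not shown here. A minimal sketch consistent with the later steps is to predict on the test set and compute the RMSE (the pred1 column name is illustrative):
# Predict on the test set and compute RMSE for the unpruned tree
test2$pred1 <- predict(reg_tree, test2)
rmse(test2$diabetes, test2$pred1)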
3.8 Prune the Tree
Pruning is an essential technique used in decision tree algorithms to reduce the complexity of the model and prevent overfitting. Overfitting occurs when a decision tree model becomes too complex and captures noise in the training data rather than the underlying patterns.
Pre-pruning involves halting the growth of the decision tree early, based on certain criteria, before it becomes too complex. This method sets thresholds for the tree-building process, such as maximum tree depth, minimum number of samples required to split a node, or minimum number of samples in a leaf node.
For example, if the maximum depth is set to a specific value, the tree will stop growing when it reaches that depth, even if further splits could improve accuracy on the training data. This early stopping helps prevent the tree from growing excessively and overfitting the training data.
Post-pruning is the process of trimming a fully grown decision tree after it has been built. This technique involves first allowing the tree to grow to its full depth, capturing all possible splits, and then removing branches that do not add significant power to the model.
Post pruning can be done using various methods, such as reducing the number of leaf nodes or merging leaf nodes that do not add much predictive value. The goal is to simplify the tree by removing parts that may have been added due to noise in the training data, thus enhancing the model’s ability to generalize.
Cost complexity pruning is a specific type of post-pruning that evaluates the trade-off between the complexity of the tree and its performance on the training data. This method introduces a complexity parameter, often denoted \(\alpha\), which penalizes the tree’s complexity by adding a cost term that increases with the number of leaf nodes.
The pruning process involves finding the optimal value of \(\alpha\) that balances model accuracy and simplicity. Trees are pruned by removing branches that contribute less to reducing the cost complexity criterion, thus resulting in a more balanced and generalizable model.
3.8.1 Pre-pruning
reg_tree_es <- rpart(diabetes ~ .,
data = train2,
method = "anova",
control = rpart.control(cp = 0, maxdepth = 6, minsplit = 70)
)
test2$pred2 <- predict(reg_tree_es, test2)
rmse(test2$diabetes, test2$pred2)
[1] 62.7035
3.8.2 Post-pruning
Perform post-pruning based on the cost complexity parameter (cp):
- Display the cp table:
printcp(reg_tree)
Regression tree:
rpart(formula = diabetes ~ ., data = train2, method = "anova",
control = rpart.control(cp = 0))
Variables actually used in tree construction:
[1] age bmi bp s1 s2 s3 s4 s5 s6 sex
Root node error: 1891412/309 = 6121.1
n= 309
CP nsplit rel error xerror xstd
1 0.3260472 0 1.00000 1.00926 0.061821
2 0.1034149 1 0.67395 0.74517 0.059984
3 0.0501711 2 0.57054 0.66624 0.051547
4 0.0351445 3 0.52037 0.66388 0.052385
5 0.0295926 4 0.48522 0.66165 0.054984
6 0.0227354 5 0.45563 0.65430 0.055159
7 0.0200994 6 0.43289 0.65195 0.053732
8 0.0165288 7 0.41279 0.63124 0.051257
9 0.0092009 8 0.39627 0.64850 0.055388
10 0.0091487 9 0.38707 0.65271 0.058188
11 0.0088824 10 0.37792 0.66435 0.058791
12 0.0081130 11 0.36903 0.67396 0.058843
13 0.0072472 13 0.35281 0.68482 0.059191
14 0.0067896 14 0.34556 0.69053 0.060209
15 0.0066388 15 0.33877 0.69234 0.060212
16 0.0056527 16 0.33213 0.70058 0.061721
17 0.0055221 17 0.32648 0.69835 0.061896
18 0.0052208 18 0.32096 0.70052 0.061869
19 0.0052021 19 0.31574 0.70079 0.061862
20 0.0051200 20 0.31053 0.70351 0.062183
21 0.0030786 21 0.30541 0.70990 0.062364
22 0.0026958 22 0.30234 0.71556 0.063330
23 0.0026795 23 0.29964 0.71800 0.063371
24 0.0021099 24 0.29696 0.71580 0.063371
25 0.0000000 25 0.29485 0.71949 0.063576
- Plot the cp values against the cross-validated error:
plotcp(reg_tree)
- Find the best cp value that minimizes the cross-validated error:
bestcp <- reg_tree$cptable[which.min(reg_tree$cptable[, "xerror"]), "CP"]
- Build the pruned tree and evaluate it:
reg_tree_prune <- rpart(diabetes ~ ., data = train2, method = "anova", control = rpart.control(cp = bestcp))
test2$pred3 <- predict(reg_tree_prune, test2)
rmse(test2$diabetes, test2$pred3)
[1] 66.07529
3.9 Export the Tree
Create a postscript file of the tree for generating a PDF:
post(reg_tree_prune,
file = "reg_tree_prune.ps",
title = "Regression Tree for Diabetes Dataset")
4 Neural Network Regression
Neural Network Regression is a type of machine learning model used to predict continuous outcomes using artificial neural networks (ANNs). Neural networks, inspired by the structure and function of the human brain, consist of interconnected layers of nodes, or neurons, that process input data to produce a desired output.
The architecture of a neural network typically includes an input layer, one or more hidden layers, and an output layer. Each layer contains multiple neurons, and each neuron in a layer is connected to every neuron in the subsequent layer.
The neurons process information by applying weights to the inputs, summing them up, and passing the result through an activation function. This process, known as forward propagation, continues until the output layer produces a prediction.
In regression tasks, the output layer typically has a single neuron that provides the continuous numerical prediction.
4.1 Install and Load Required Packages
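The original code is not shown here. A minimal sketch, with the package list inferred from the functions used below (h2o for the model, Metrics for rmse()):
# Install once (package list inferred from the functions used below)
install.packages(c("h2o", "Metrics"))
# Load for this session
library(h2o)       # h2o.init(), h2o.deeplearning(), h2o.predict()
library(Metrics)   # rmse()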
4.2 Initialize H2O
Before building the model, initialize the H2O instance:
h2o.init(nthreads = -1, max_mem_size = "2G")
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
C:\Users\drsau\AppData\Local\Temp\Rtmp2dxlRZ\file6c447e3b25d6/h2o_drsau_started_from_r.out
C:\Users\drsau\AppData\Local\Temp\Rtmp2dxlRZ\file6c445bdf1dac/h2o_drsau_started_from_r.err
Starting H2O JVM and connecting: Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 3 seconds 662 milliseconds
H2O cluster timezone: Asia/Singapore
H2O data parsing timezone: UTC
H2O cluster version: 3.44.0.3
H2O cluster version age: 7 months
H2O cluster name: H2O_started_from_R_drsau_arx260
H2O cluster total nodes: 1
H2O cluster total memory: 1.76 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 4.4.0 (2024-04-24 ucrt)
4.3 Load the Dataset
Load the dataset into R and convert it to an H2O frame:
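The original loading code is not shown. A minimal sketch, assuming a CSV file in the same Datasets folder (the file name and path are assumptions):
# Read the consultation data and push it into the H2O cluster
consultation <- read.csv("Datasets/consultation.csv")
consultation.frame <- as.h2o(consultation)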
4.4 Examine the Data
Check the structure of the data:
str(consultation.frame)
Class 'H2OFrame' <environment: 0x000001f22836c548>
- attr(*, "op")= chr "Parse"
- attr(*, "id")= chr "consultation_sid_a7d3_1"
- attr(*, "eval")= logi FALSE
- attr(*, "nrow")= int 5962
- attr(*, "ncol")= int 6
- attr(*, "types")=List of 6
..$ : chr "string"
..$ : chr "real"
..$ : chr "int"
..$ : chr "string"
..$ : chr "string"
..$ : chr "real"
- attr(*, "data")='data.frame': 10 obs. of 6 variables:
..$ Qualification: chr "BHMS, MD - Homeopathy" "BAMS, MD - Ayurveda Medicine" "MBBS, MS - Otorhinolaryngology" "BSc - Zoology, BAMS" ...
..$ Experience : num 24 12 9 12 20 8 42 10 14 23
..$ Rating : num 100 98 95 95 100 95 95 99 95 95
..$ Place : chr "Kakkanad, Ernakulam" "Whitefield, Bangalore" "Mathikere - BEL, Bangalore" "Bannerghatta Road, Bangalore" ...
..$ Profile : chr "Homeopath" "Ayurveda" "ENT Specialist" "Ayurveda" ...
..$ Fees : num 100 350 300 250 250 100 200 200 100 100
4.5 Normalize the Data
Normalize the columns “Experience”, “Rating”, and “Fees”, since they are on different scales:
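The original normalization code is not shown. A minimal sketch using min-max scaling (the scaling method is an assumption); the key point is that the new .norm columns are appended after the original six, so Fees.norm becomes column 9, matching the x and y indices used when building the model:
# Min-max scale the numeric columns onto [0, 1] (scaling choice is assumed)
min_max <- function(col) (col - min(col)) / (max(col) - min(col))
consultation.frame$Experience.norm <- min_max(consultation.frame$Experience)
consultation.frame$Rating.norm     <- min_max(consultation.frame$Rating)
consultation.frame$Fees.norm       <- min_max(consultation.frame$Fees)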
4.6 Split the Data into Training and Test Sets
Split the data into training (70%) and test (30%) sets:
split <- h2o.splitFrame(consultation.frame, ratios = 0.7)
train3 <- split[[1]]
test3 <- split[[2]]
4.7 Build the Neural Network Model
Build the neural network model with default parameters:
nn <- h2o.deeplearning(
x = c(1, 4, 5, 7, 8),
y = 9,
training_frame = train3,
epochs = 500,
mini_batch_size = 32,
hidden = c(20, 20),
seed = 1)
4.8 Plot the Scoring History
Visualize the scoring history of the model:
plot(nn)
4.9 Make Predictions and Evaluate the Model
Perform predictions on the test data and evaluate the model’s performance using RMSE:
pred1 <- h2o.predict(nn, test3)
rmse(test3$Fees.norm, pred1)
[1] 0.2001071
4.10 Implement Early Stopping
Add early stopping parameters to the model:
nn <- h2o.deeplearning(
x = c(1, 4, 5, 7, 8),
y = 9,
training_frame = train3,
epochs = 500,
mini_batch_size = 32,
hidden = c(20, 20),
seed = 1,
stopping_metric = "rmse",
stopping_rounds = 3,
stopping_tolerance = 0.05,
score_interval = 1)
Warning in .h2o.processResponseWarnings(res): Dropping bad and constant columns: [Qualification, Place, Profile].
pred2 <- h2o.predict(nn, test3)
rmse(test3$Fees.norm, pred2)
[1] 0.2001832
4.11 Implement Dropout Regularization
Add dropout regularization to improve generalization:
nn <- h2o.deeplearning(
x = c(1, 4, 5, 7, 8),
y = 9,
training_frame = train3,
epochs = 500,
mini_batch_size = 32,
hidden = c(20, 20),
seed = 1,
stopping_metric = "rmse",
stopping_rounds = 3,
stopping_tolerance = 0.05,
score_interval = 1,
activation = "RectifierWithDropout",
hidden_dropout_ratios = c(0.5, 0.5),
input_dropout_ratio = 0.1)
Warning in .h2o.processResponseWarnings(res): Dropping bad and constant columns: [Qualification, Place, Profile].
pred3 <- h2o.predict(nn, test3)
rmse(test3$Fees.norm, pred3)
[1] 0.2016718