subject
Mathematics, 07.04.2020 23:21 GaudySky

Predicting Prices of Used Cars (Regression Trees). The file ToyotaCorolla. xls contains the data on used cars (Toyota Corolla) on sale during late summer of 2004 in The Netherlands. It has 1436 records containing details on 38 attributes, including Price, Age, Kilometers, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications. (The example in Section 9.8 is a subset of this dataset).

Data Preprocessing: Split the data into training (50%), validation (30%), and test (20%) datasets, using the Partitioning Variable in column AN (in the data partitioning menu choose "use partition variable" and select Partitioning Variable.).

a. (7 points) Run a regression tree (RT) using the Prediction menu, manual tree option, in Data Mining with the output variable Price and input variables Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfg_Guarantee, Guarantee_Period, Airco, Automatic_Airco, CD Player, Powered_Windows, Sport_Model, and Tow_Bar. Place Fuel_Type in "Categorical Variables" pane. Keep the Records in Terminal Nodes to 1, other limits set to 100, and the scoring option to Full Tree, to make the run least restrictive.

Which appear to be the three or four most important car specifications for predicting the car’s price? [HINT - use the "Feature Importance" option]
Make a note of the RMSE for the training data.
Now re-run the model, this time using the pruning option and Best Pruned Tree. Comment on (a) the error in the training and validation data and (b) the error in the training data compared to the full tree results.

b. (6 points) Let us explore the effect of turning the price variable into a categorical variable. First, create a new variable that categorizes price into 20 bins, leaving other options at their defaults (use Transform > Transform Continuous Data > Bin). Now repartition the data keeping Binned Price instead of Price. Run a classification tree (CT) using the Classification menu of XLMiner with the same set of input variables as in the RT, and with Binned Price as the output variable. Set the minimum number of records in a terminal node to 1, and the other limits at their maximums. Enable pruning, and choose Best Pruned Tree for display and scoring.

Compare the tree generated by the CT with the one generated by the RT. Are they different? (Look at structure, the top predictors, size of tree, etc.) Why?

ansver
Answers: 1

Other questions on the subject: Mathematics

image
Mathematics, 21.06.2019 23:30, adrianna2324
Dawn is selling her mp3 player for 3 4 of the original price. the original price for the mp3 player was $40. how much is she selling her mp3 player for?
Answers: 1
image
Mathematics, 22.06.2019 00:00, minecraftsam2018
What is the effect on the graph of the function f(x) = x2 when f(x) is changed to f(x) − 4?
Answers: 1
image
Mathematics, 22.06.2019 01:00, chandranewlon
Ellie spent $88.79 at the computer stote. she had $44.50 left to buy a cool hat. how much money did she originally have? write and solve an equation to answer the question.
Answers: 2
image
Mathematics, 22.06.2019 01:40, cowgyrlup124
The tree filled 3/4 of a cup in 1/2 and hour at what rate does syurup flow from the tree
Answers: 1
You know the right answer?
Predicting Prices of Used Cars (Regression Trees). The file ToyotaCorolla. xls contains the data on...

Questions in other subjects:

Konu
Mathematics, 06.03.2021 03:00
Konu
Mathematics, 06.03.2021 03:10
Konu
Biology, 06.03.2021 03:10
Konu
Mathematics, 06.03.2021 03:10