# DATA MINING Problem

10% WILL BE DEDUCTED IF YOU CREATE A NEW OR SEPARATE DOCUMENT.

10% WILL BE DEDUCTED IF YOU CREATE A “TITLE PAGE” TYPE OF DOCUMENT.

YOU MUST WRITE IN YOUR OWN WORDS. FAILING TO DO SO WILL RESULT IN ZERO POINTS.

Problem #1

Consider the four faces shown below. As before, darkness or number of dots represents density. Lines are used only to distinguish regions and do not represent points.

(a) For each figure, could you use single link to find the patterns represented by the nose, eyes, and mouth? Explain.

(b) For each figure, could you use K-means to find the patterns represented by the nose, eyes, and mouth? Explain.

(c) What limitation does clustering have in detecting all the patterns formed by the points in the figure?
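To make the difference between the two methods in (a) and (b) concrete, here is a minimal pure-Python sketch of single-link agglomerative clustering and of K-means (Lloyd's algorithm). The point set is an invented toy example (a long chain of points plus a small compact blob), not the faces in the figures, and the function names and the starting centroids are assumptions chosen only for illustration.

```python
import math

def single_link(points, k):
    """Merge the two closest clusters until k remain (naive sketch)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: cluster distance = distance of the closest pair.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

def k_means(points, centroids, iterations=20):
    """Lloyd's algorithm, started from caller-supplied centroids."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return groups

# Invented toy data: a chain of 10 points on a line and a 3-point blob above it.
chain = [(float(x), 0.0) for x in range(10)]
blob = [(4.0, 3.0), (5.0, 3.0), (4.5, 3.5)]
points = chain + blob

linked = single_link(points, 2)
groups = k_means(points, [(0.0, 0.0), (9.0, 0.0)])  # assumed starting centroids

print(sorted(len(c) for c in linked))
print([len(g) for g in groups])
```

On this toy data, single link keeps the chain together because it only looks at the nearest pair of points between clusters, while K-means with these starting centroids cuts the chain in half because it minimizes distance to a centroid. Other data or other starting centroids can behave differently.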

Problem #2

RapidMiner and the Correlation Matrix

Task: Determine how various factors in heating a home are correlated. You will use the Correlation Matrix operator in RapidMiner (RM) to find how strong the relationship is between the attributes in the example set. The attributes are:

a. Insulation density – thickness

b. Outdoor temperature – measured in degrees Fahrenheit

c. Number of Occupants – number of people living in the home

d. Home Age – years since the home was built

e. Home Size – number of square feet

f. Heating Oil – total units of heating oil purchased in the last month

Steps

Import the DataSet CSV file into RM

Add the Read CSV operator to the Process panel

Set the Import Configuration Wizard Parameter to the DataSet CSV file

Run the process

View the Statistics

Question 1: How many homes are in the data set?

Question 2: How many attributes have missing values?
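The two counts asked for in Questions 1 and 2 can also be cross-checked outside RM. The following is a minimal Python sketch, assuming the file has a header row and that a missing value appears as an empty field. The column names and the four rows in `SAMPLE` are invented stand-ins, not the actual DataSet CSV file, and `summarize` is a hypothetical helper name.

```python
import csv
import io

# Invented stand-in for the DataSet CSV file (names and values are assumptions).
SAMPLE = """Insulation,Temperature,Num_Occupants,Home_Age,Home_Size,Heating_Oil
6,74,4,23,4,132
10,43,,56,7,263
3,81,1,,1,145
9,50,2,8,6,
"""

def summarize(csv_text):
    """Return (number of rows, {attribute: count of missing values})."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            # Treat an absent or blank field as a missing value.
            if value is None or value.strip() == "":
                missing[name] += 1
    return len(rows), missing

n_rows, missing = summarize(SAMPLE)
attrs_with_missing = [name for name, count in missing.items() if count > 0]

print("rows:", n_rows)
print("attributes with missing values:", attrs_with_missing)
```

To run this against the real file, the text passed to `summarize` would be read from the CSV instead of from `SAMPLE`. The answers to the questions should still come from the Statistics view in RM.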

Steps (continued)

Click the Data tab and examine the data

Add the Correlation Matrix operator to the Process panel (to the right of the Read CSV operator)

Make sure the “exa” port is connected to the “res” port

Connect the “mat” port to the second “res” port

There should now be two connections from Correlation Matrix.

Run the process again

Correlation coefficients will now be displayed.
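For reference, a correlation matrix holds one coefficient for every pair of attributes. The sketch below assumes the coefficient is Pearson's r, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with values near 0 indicating little linear relationship. The three columns in `toy` are invented numbers chosen so the result is easy to verify by hand. They are not values from the data set, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {a: {b: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Invented toy columns (assumptions, not the homework data).
toy = {
    "Heating_Oil": [10.0, 20.0, 30.0, 40.0],
    "Home_Age": [1.0, 2.0, 3.0, 4.0],          # rises with Heating_Oil
    "Temperature": [80.0, 60.0, 40.0, 20.0],   # falls as Heating_Oil rises
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

The matrix is symmetric and its diagonal is 1, because every attribute is perfectly correlated with itself.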

Question 3: What is the correlation between Heating Oil Used and Insulation Rating?

Question 4: What does this correlation indicate?

Question 5: What is the correlation between Heating Oil Used and Home Age?

Question 6: What does this correlation indicate?