
 After completing the reading this week, answer the following questions on Chapter 3:

  1. Note the basic concepts in data classification.
  2. Discuss the general framework for classification.
  3. What are a decision tree and a decision tree modifier? Note their importance.
  4. What is a hyper-parameter?
  5. Note the pitfalls of model selection and evaluation.

 Read:

  1. Chapter 3 in textbook: Classification: Basic Concepts and Techniques

Watch: 

attached content


Dr. Oner Celepcikay

ITS 632

Week 4

Classification

Machine Learning Methods – Classification

Given a collection of records (the training set):

– Each record contains a set of attributes; one of the attributes is the class.

Find a model for the class attribute as a function of the values of the other attributes.

A test set is used to estimate the accuracy of the model.

Goal: previously unseen records (the test set) should be assigned a class as accurately as possible.
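The idea of learning a model on the training set and estimating its accuracy on a test set can be sketched as follows. This is only an illustration: the "model" is a deliberately trivial majority-class predictor, and the toy records and the names `train_majority_model` and `accuracy` are assumptions, not part of the slides.

```python
from collections import Counter


def train_majority_model(training_set):
    """Learn a trivial model: always predict the most common class label."""
    labels = [label for _, label in training_set]
    majority_label = Counter(labels).most_common(1)[0][0]
    return lambda attributes: majority_label


def accuracy(model, test_set):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(1 for attributes, label in test_set if model(attributes) == label)
    return correct / len(test_set)


# Hypothetical toy records: (attributes, class label).
training_set = [
    ({"refund": "Yes"}, "No"),
    ({"refund": "No"}, "No"),
    ({"refund": "No"}, "Yes"),
]
test_set = [
    ({"refund": "Yes"}, "No"),
    ({"refund": "No"}, "Yes"),
]

model = train_majority_model(training_set)
test_accuracy = accuracy(model, test_set)
print(test_accuracy)  # 0.5
```

The test records are kept out of training, so the reported accuracy reflects how the model behaves on records it has not seen.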

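Machine Learning – Classification Example

[Figure: a training set with the attributes Refund (categorical), Marital Status (categorical), and Taxable Income (continuous), plus the class attribute Cheat, is used to learn a classifier (the model), which is then applied to a test set.]

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes: NO
  No: MarSt?
    Single, Divorced: TaxInc?
      < 80K: NO
      > 80K: YES
    Married: NO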

Another Example of Decision Tree

[Figure: the same training set, with Refund (categorical), Marital Status (categorical), Taxable Income (continuous), and the class attribute.]

MarSt?
  Married: NO
  Single, Divorced: Refund?
    Yes: NO
    No: TaxInc?
      < 80K: NO
      > 80K: YES

There could be more than one tree that fits the same data!
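The claim that more than one tree can fit the same data can be checked directly against the ten training records in these slides. The sketch below encodes both trees (Refund at the root, and MarSt at the root) as plain functions. The function names are illustrative, and sending a Taxable Income of exactly 80K to the YES branch is an assumption, since the tree figures only show "< 80K" and "> 80K".

```python
# Training records from the slides:
# (Tid, Refund, Marital Status, Taxable Income in thousands, Cheat)
TRAINING_SET = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def income_leaf(income_k):
    # Assumption: exactly 80K is sent to the YES branch.
    return "No" if income_k < 80 else "Yes"


def tree_refund_first(refund, marital_status, income_k):
    """Tree with Refund at the root."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return income_leaf(income_k)


def tree_marst_first(refund, marital_status, income_k):
    """Tree with MarSt at the root."""
    if marital_status == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return income_leaf(income_k)


def training_errors(tree):
    """Tids of the training records that the tree misclassifies."""
    return [
        tid
        for tid, refund, marital_status, income_k, cheat in TRAINING_SET
        if tree(refund, marital_status, income_k) != cheat
    ]


errors_refund_first = training_errors(tree_refund_first)
errors_marst_first = training_errors(tree_marst_first)
print(errors_refund_first, errors_marst_first)  # [] []
```

Both lists are empty, so both trees classify every training record correctly even though they test the attributes in a different order.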

Apply Model to Test Data

Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree and, at each node, follow the branch that matches the record.

Refund?
  Yes: NO
  No: MarSt?
    Single, Divorced: TaxInc?
      < 80K: NO
      > 80K: YES
    Married: NO

Refund = No leads to the MarSt node, and Marital Status = Married leads to the leaf NO.

Assign “Cheat” No
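The walk from the root to a leaf can be sketched with the tree stored as nested dictionaries. The dictionary layout, the key names (`attribute`, `branches`, `threshold`, `below`, `at_or_above`), and the `classify` helper are illustrative assumptions. So is the choice to send exactly 80K to the YES branch, which this test record never reaches.

```python
# Continuous test on Taxable Income (values in thousands).
INCOME_NODE = {
    "attribute": "Taxable Income",
    "threshold": 80,
    "below": "No",
    "at_or_above": "Yes",  # assumption: 80K itself goes to YES
}

# Decision tree with Refund at the root; leaves are plain class labels.
TREE = {
    "attribute": "Refund",
    "branches": {
        "Yes": "No",
        "No": {
            "attribute": "Marital Status",
            "branches": {
                "Married": "No",
                "Single": INCOME_NODE,
                "Divorced": INCOME_NODE,
            },
        },
    },
}


def classify(node, record):
    """Start at the root and follow matching branches until a leaf is reached."""
    path = []
    while isinstance(node, dict):
        attribute = node["attribute"]
        value = record[attribute]
        path.append((attribute, value))
        if "threshold" in node:
            node = node["below"] if value < node["threshold"] else node["at_or_above"]
        else:
            node = node["branches"][value]
    return node, path


test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80}
predicted, path = classify(TREE, test_record)
print(predicted)  # No
print(path)       # [('Refund', 'No'), ('Marital Status', 'Married')]
```

The recorded path shows that only two tests are evaluated for this record; Taxable Income is never consulted.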

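Machine Learning – Classification Example

[Figure: a learning algorithm is applied to the training set (attributes: categorical, categorical, continuous; plus the class) to learn a model (induction); the model is then applied to the test set to predict the class (deduction).]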

General Structure of Hunt’s Algorithm

Let D_t be the set of training records that reach a node t.

General Procedure:

If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled as y_t.

If D_t is an empty set, then t is a leaf node labeled with the default class, y_d.

If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
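The three cases of the general procedure can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not the course's implementation: it handles categorical attributes only, and it picks the attribute test by simply taking the next attribute in the given list (a real implementation would choose the best split). When no attribute is left, it falls back to the majority class, a case the slide does not cover. The name `hunt` and the dictionary layout are hypothetical.

```python
from collections import Counter


def hunt(records, attributes, default_class):
    """records: list of (attribute_dict, class_label).

    Returns a class label (leaf) or a dict describing an internal node.
    """
    if not records:
        # D_t is empty: leaf labeled with the default class.
        return default_class
    labels = [label for _, label in records]
    if len(set(labels)) == 1:
        # All records belong to the same class: leaf labeled with that class.
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:
        # Assumption (not on the slide): no test left, so use the majority class.
        return majority
    # More than one class: split on an attribute test and recurse on each subset.
    attribute, remaining = attributes[0], attributes[1:]
    values = sorted({attrs[attribute] for attrs, _ in records})
    return {
        "attribute": attribute,
        "branches": {
            value: hunt(
                [r for r in records if r[0][attribute] == value],
                remaining,
                majority,
            )
            for value in values
        },
    }


# Categorical columns of the training set in these slides: (Refund, Marital Status, Cheat).
rows = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]
records = [({"Refund": r, "Marital Status": m}, c) for r, m, c in rows]
tree = hunt(records, ["Refund", "Marital Status"], default_class="No")
print(tree)
```

With only the two categorical attributes, the Refund = No, Single subset stays mixed, so the sketch labels it with its majority class. Separating those records requires a further test on Taxable Income.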


Hunt’s Algorithm

[Figure: the tree grown step by step on the training set.]

Step 1: a single leaf, Don’t Cheat.

Step 2: Refund? Yes: Don’t Cheat. No: Don’t Cheat.

Step 3: Refund? Yes: Don’t Cheat. No: Marital Status? Single, Divorced: Cheat. Married: Don’t Cheat.

Step 4: Refund? Yes: Don’t Cheat. No: Marital Status? Married: Don’t Cheat. Single, Divorced: Taxable Income? < 80K: Don’t Cheat. >= 80K: Cheat.

Decision Tree Application to Oil & Gas Data

British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system.

We will do a similar (but simpler) decision tree example towards the end of the semester.

Tree Induction

Greedy strategy.

Split the records based on an attribute test that optimizes a certain criterion.

Issues

Determine how to split the records

– How to specify the attribute test condition?

– How to determine the best split?

Determine when to stop splitting

How to Determine the Best Split

Before splitting: 10 records of class 0 and 10 records of class 1.

[Figure: three candidate test conditions: Own Car?, Car Type?, Student ID?]

Which test condition is the best?

Greedy approach:

– Nodes with a homogeneous class distribution are preferred.

Need a measure of node impurity:

– Non-homogeneous: high degree of impurity

– Homogeneous: low degree of impurity

Measures of Node Impurity

Gini Index

Entropy

Misclassification error
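A minimal sketch of the three impurity measures, each taking the class counts at a node. The Gini and misclassification-error definitions follow the formulas used in these slides. The entropy formula is the standard one, and using log base 2 is an assumption (the slides name entropy but do not give its formula). The function names are illustrative.

```python
import math


def class_probabilities(counts):
    """Relative frequency of each class at a node, given its class counts."""
    total = sum(counts)
    return [count / total for count in counts]


def gini(counts):
    """Gini index: 1 minus the sum of squared class frequencies."""
    return 1.0 - sum(p * p for p in class_probabilities(counts))


def entropy(counts):
    """Entropy in bits; classes with zero frequency contribute nothing."""
    return sum(-p * math.log2(p) for p in class_probabilities(counts) if p > 0)


def misclassification_error(counts):
    """1 minus the frequency of the most common class."""
    return 1.0 - max(class_probabilities(counts))


# A pure node and an evenly mixed two-class node.
for counts in [(0, 6), (5, 5)]:
    print(counts, gini(counts), entropy(counts), misclassification_error(counts))
```

All three measures are zero for a pure node and reach their maximum when the classes are evenly mixed.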

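How to Find the Best Split

Before splitting: the parent node has impurity M0.

Split on A? (Yes: Node N1 with impurity M1; No: Node N2 with impurity M2): the combined impurity of the children is M12.

Split on B? (Yes: Node N3 with impurity M3; No: Node N4 with impurity M4): the combined impurity of the children is M34.

Gain = M0 – M12 vs M0 – M34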
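The comparison of M0 – M12 against M0 – M34 can be sketched with Gini as the impurity measure. The class counts are those of the A? and B? Gini examples in these slides (parent: 6 and 6). Treating M12 and M34 as the children's impurities weighted by record count is an assumption taken from the Gini(Children) computation in those examples, and the function names are illustrative.

```python
def gini(counts):
    """Gini index of a node from its class counts."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)


def weighted_impurity(children):
    """Impurity of the children, each weighted by its share of the records."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


def gain(parent, children):
    """Impurity of the parent minus the weighted impurity of the children."""
    return gini(parent) - weighted_impurity(children)


parent = (6, 6)               # (C1, C2) before splitting
split_a = [(4, 3), (2, 3)]    # nodes N1 and N2 for test A?
split_b = [(1, 4), (5, 2)]    # nodes N1 and N2 for test B?

gain_a = gain(parent, split_a)
gain_b = gain(parent, split_b)
print(round(gain_a, 3), round(gain_b, 3))  # 0.014 0.129
```

Under these assumptions the split on B gives the larger gain, so a greedy tree builder would choose it.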

Measure of Impurity: GINI

Gini Index for a given node t:

$$GINI(t) = 1 - \sum_j [p(j \mid t)]^2$$

(NOTE: p(j | t) is the relative frequency of class j at node t.)

Maximum (1 – 1/n_c, where n_c is the number of classes; 0.5 for two classes) when records are equally distributed among all classes, implying least interesting information

Minimum (0.0) when all records belong to one class, implying most interesting information

Examples for computing GINI

P(C1) = 0/6 = 0, P(C2) = 6/6 = 1

Gini = 1 – P(C1)^2 – P(C2)^2 = 1 – 0 – 1 = 0

P(C1) = 1/6, P(C2) = 5/6

Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278

P(C1) = 2/6, P(C2) = 4/6

Gini = 1 – (2/6)^2 – (4/6)^2 = 0.444
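The example values can be reproduced with a few lines of Python. The helper name `gini` is illustrative. The (3, 3) case is taken from the Gini tables elsewhere in these slides.

```python
def gini(counts):
    """Gini index of a node from its class counts (C1, C2)."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)


nodes = [(0, 6), (1, 5), (2, 4), (3, 3)]
results = [round(gini(counts), 3) for counts in nodes]
print(results)  # [0.0, 0.278, 0.444, 0.5]
```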

Examples for computing GINI

Split on A? (Yes: Node N1 with C1 = 4, C2 = 3; No: Node N2 with C1 = 2, C2 = 3)

Gini(N1) = 1 – (4/7)^2 – (3/7)^2 = 0.4898

Gini(N2) = 1 – (2/5)^2 – (3/5)^2 = 0.48

Gini(Children) = 7/12 * 0.4898 + 5/12 * 0.48 = 0.486

Examples for computing GINI

Split on B? (Yes: Node N1 with C1 = 1, C2 = 4; No: Node N2 with C1 = 5, C2 = 2)

Gini(N1) = 1 – ( / )^2 – ( / )^2 = ?

Gini(N2) = 1 – ( / )^2 – ( / )^2 = ?

Gini(Children) = ?
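One way to work the exercise, assuming the class counts given for this split elsewhere in the slides (N1: C1 = 1, C2 = 4; N2: C1 = 5, C2 = 2) and the same record-weighted average used in the A? example:

```latex
\begin{align*}
\mathrm{Gini}(N_1) &= 1 - (1/5)^2 - (4/5)^2 = 0.32 \\
\mathrm{Gini}(N_2) &= 1 - (5/7)^2 - (2/7)^2 \approx 0.408 \\
\mathrm{Gini}(\text{Children}) &= \tfrac{5}{12} \cdot 0.32 + \tfrac{7}{12} \cdot 0.408 \approx 0.371
\end{align*}
```

Since 0.371 is lower than the 0.486 obtained for the split on A, the split on B leaves the children less impure.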

Splitting Criteria based on Classification Error

Classification error at a node t:

$$Error(t) = 1 - \max_i P(i \mid t)$$

Measures the misclassification error made by a node.

Maximum (1 – 1/n_c, where n_c is the number of classes; 0.5 for two classes) when records are equally distributed among all classes, implying least interesting information

Minimum (0) when all records belong to one class, implying most interesting information

P(C1) = 0/6 = 0, P(C2) = 6/6 = 1

Error = 1 – max (0, 1) = 1 – 1 = 0

P(C1) = 1/6, P(C2) = 5/6

Error = 1 – max (1/6, 5/6) = 1 – 5/6 = 1/6

P(C1) = 2/6, P(C2) = 4/6

Error = 1 – max (2/6, 4/6) = 1 – 4/6 = 1/3
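The three error values can be checked with exact fractions, which keeps the results in the same form as the slide (1/6 and 1/3). The helper name `classification_error` is illustrative.

```python
from fractions import Fraction


def classification_error(counts):
    """1 minus the relative frequency of the most common class at the node."""
    total = sum(counts)
    return 1 - max(Fraction(count, total) for count in counts)


errors = [classification_error(counts) for counts in [(0, 6), (1, 5), (2, 4)]]
print(errors)  # [Fraction(0, 1), Fraction(1, 6), Fraction(1, 3)]
```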

Tree Induction

Greedy strategy.

Split the records based on an attribute test that optimizes a certain criterion.

Issues

Determine how to split the records

– How to specify the attribute test condition?

– How to determine the best split?

Determine when to stop splitting (Next class!)

ANY IDEAS??

Classification Methods

Decision Tree based Methods

Rule-based Methods

Memory based reasoning

Neural Networks

Naïve Bayes and Bayesian Belief Networks

Support Vector Machines

Training set:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test set:

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?


Test data used in "Apply Model to Test Data": Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?


Candidate test conditions (figure for "How to determine the Best Split"):

Own Car?     Yes: C0 = 6, C1 = 4      No: C0 = 4, C1 = 6

Car Type?    Family: C0 = 1, C1 = 3   Sports: C0 = 8, C1 = 0   Luxury: C0 = 1, C1 = 7

Student ID?  c1: C0 = 1, C1 = 0   …   c10: C0 = 1, C1 = 0   c11: C0 = 0, C1 = 1   …   c20: C0 = 0, C1 = 1

Node impurity (figure): C0 = 5, C1 = 5 is non-homogeneous (high degree of impurity); C0 = 9, C1 = 1 is homogeneous (low degree of impurity).
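The three candidate test conditions can be compared by the record-weighted Gini index of their children. This is a sketch: the counts come from the figure, the elided Student ID branches are assumed to follow the pattern shown (c1 to c10 hold one class-0 record each, c11 to c20 one class-1 record each), and the function names are illustrative.

```python
def gini(counts):
    """Gini index of a node from its class counts (C0, C1)."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)


def weighted_gini(children):
    """Gini of the children, each weighted by its share of the records."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


own_car = [(6, 4), (4, 6)]              # Yes, No
car_type = [(1, 3), (8, 0), (1, 7)]     # Family, Sports, Luxury
# Assumption: the branches hidden behind "…" follow the visible pattern.
student_id = [(1, 0)] * 10 + [(0, 1)] * 10

scores = {
    "Own Car?": round(weighted_gini(own_car), 4),
    "Car Type?": round(weighted_gini(car_type), 4),
    "Student ID?": round(weighted_gini(student_id), 4),
}
print(scores)  # {'Own Car?': 0.48, 'Car Type?': 0.1625, 'Student ID?': 0.0}
```

Student ID reaches zero impurity only because every branch holds a single record, while Car Type gives much purer children than Own Car using just three branches.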

Class counts in the "How to Find the Best Split" figure: before splitting, C0 = N00 and C1 = N01; Node N1: C0 = N10, C1 = N11; Node N2: C0 = N20, C1 = N21; Node N3: C0 = N30, C1 = N31; Node N4: C0 = N40, C1 = N41.

$$GINI(t) = 1 - \sum_j [p(j \mid t)]^2$$

Gini index for example nodes:

C1  C2  Gini
0   6   0.000
1   5   0.278
2   4   0.444
3   3   0.500

Parent: C1 = 6, C2 = 6, Gini = 0.500

Split on A?

      N1  N2
C1    4   2
C2    3   3
Gini = 0.486

Split on B?

      N1  N2
C1    1   5
C2    4   2
Gini = ?

$$Error(t) = 1 - \max_i P(i \mid t)$$

