
How do I classify this value using a decision tree

Basically my decision tree can't classify a value using the normal algorithm.

I get to a node, and there are two options (say, sunny and windy), but at this node my value is different (for example, rainy).

Are there any methods to deal with this, e.g. change the tree or just estimate based on other data?

I was thinking of assigning the most common value at that node, but this is just a guess.
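The "most common value" fallback can be sketched as follows. This is a minimal illustration, assuming a hand-built multiway tree in which each node records how many training samples went down each branch; the `Node` class, the `classify` function and the labels "play"/"stay home" are hypothetical, not part of any library. When a value has no branch (e.g. rainy), the sample is sent down the branch that saw the most training samples.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A hypothetical multiway decision-tree node (illustration only)."""
    attribute: Optional[str] = None      # attribute tested here; None for a leaf
    label: Optional[str] = None          # class label if this node is a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)  # training samples per branch


def classify(node: Node, sample: Dict[str, str]) -> str:
    """Walk the tree; a value with no branch follows the most frequent training branch."""
    while node.label is None:
        value = sample.get(node.attribute)
        if value not in node.children:
            # Unseen value (e.g. "rainy"): fall back to the most common branch.
            value = max(node.counts, key=node.counts.get)
        node = node.children[value]
    return node.label


# Toy tree: 6 training samples were sunny, 3 were windy.
tree = Node(
    attribute="weather",
    children={"sunny": Node(label="play"), "windy": Node(label="stay home")},
    counts={"sunny": 6, "windy": 3},
)

print(classify(tree, {"weather": "windy"}))  # stay home
print(classify(tree, {"weather": "rainy"}))  # play (falls back to "sunny")
```

This is still a guess, as the question says: it assumes an unseen value behaves like the majority of the training data at that node.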

Answer1:

Have you considered fuzzy logic for the rich/poor continuum? As for things that can't be expressed as a continuum, I can't think of a way it can be done. Rainy weather, for example, is so fundamentally different from sunny and windy weather in how we experience and react to it, I'm not sure how you expect a computer (or whatever it is you're writing your decision tree for) to figure out what to do. (Aside from simply having an "I don't know what to do" output state, but I'm assuming you wanted something more meaningful than that.)

Answer2:

The whole point of decision trees is that the options are complete and (hopefully) mutually exclusive.

If they are not, you'll get into trouble. Redefine poor and rich to cover everything (all incomes, all states of mind...).

But honestly, interpret such weather examples as what they are: just examples of a concept, not the holy grail of meteorology.
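One way to make the options complete, in the spirit of this answer, is to map every value outside the known set to a catch-all category before both training and classification. The sketch below assumes a hypothetical `normalize_weather` helper and an `"other"` label; neither comes from the question.

```python
from typing import Iterable, List

# Values the tree is expected to branch on; anything else becomes "other".
KNOWN_WEATHER = {"sunny", "windy"}


def normalize_weather(value: str) -> str:
    """Map any value outside the known set to the catch-all 'other'."""
    return value if value in KNOWN_WEATHER else "other"


def normalize_all(values: Iterable[str]) -> List[str]:
    """Apply the same mapping to a whole column (for training AND prediction)."""
    return [normalize_weather(v) for v in values]


print(normalize_all(["sunny", "rainy", "windy", "foggy"]))
# ['sunny', 'other', 'windy', 'other']
```

The catch-all only helps if the training data contains at least some values that map to "other"; otherwise the tree still has no branch for it.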

Answer3:

The issue here is that you learned the decision tree from different data than the data you are using for classification. More specifically, your decision tree knows only two values (i.e., sunny and windy) for the attribute Weather, but your data for classification also allows the value rainy. Since your decision tree never observed rainy weather, this value is useless to it. In other words, you have to eliminate this value from your classification.

The only solution is to clean the data before using the decision tree as a classifier. You have these options:

1. Remove all observations/instances with Weather="rainy" from your data set, because you can't classify them. The disadvantage is that none of the instances with Weather="rainy" get classified.
2. For all observations/instances with Weather="rainy", remove the value, or rather set it to unknown/null. If your decision tree can handle null values, it can classify your whole data set. If not, you still have a problem, and you should go for option 3.
3. Relearn your decision tree with Weather={sunny, windy, rainy}.
4. Replace "rainy" with either "sunny" or "windy" (in your case this is not an option). There are different heuristics for that.
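Options 1 and 2 are plain data-cleaning steps. A minimal sketch, assuming rows are Python dicts and that the set of values seen during training is known; the names `TRAINED_WEATHER`, `drop_unseen` and `null_unseen` are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Values seen during training (taken from the example in the question).
TRAINED_WEATHER = {"sunny", "windy"}


def drop_unseen(rows: List[Row]) -> List[Row]:
    """Option 1: remove instances whose Weather value the tree never saw."""
    return [r for r in rows if r.get("weather") in TRAINED_WEATHER]


def null_unseen(rows: List[Row]) -> List[Row]:
    """Option 2: keep every instance but set unseen Weather values to None (unknown)."""
    cleaned = []
    for r in rows:
        r = dict(r)  # copy so the caller's data is not modified
        if r.get("weather") not in TRAINED_WEATHER:
            r["weather"] = None
        cleaned.append(r)
    return cleaned


data = [{"weather": "sunny"}, {"weather": "rainy"}, {"weather": "windy"}]
print(drop_unseen(data))  # [{'weather': 'sunny'}, {'weather': 'windy'}]
print(null_unseen(data))  # [{'weather': 'sunny'}, {'weather': None}, {'weather': 'windy'}]
```

Whether the rows with `None` can then be classified depends on whether the tree implementation supports missing values.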

Answer4:

You are talking about the "normal algorithm", which is a rather vague statement. I assume you are using a strictly binary rooted decision tree, where each internal node makes a binary split of the data. Thus, the condition evaluated at each internal node outputs a Boolean value, which sends the data to the left node (true) or the right node (false). In your case, you have a categorical variable weather with two possible values in the training data, which allows only two possible split conditions: weather==sunny or weather==windy. Hence, the rainy samples will always go to the right node, as they are neither sunny nor windy.

In the following picture, the rainy samples will be classified as not sunny, not windy.

<img src="https://i.stack.imgur.com/Kv531.png" alt="Binary decision tree in which rainy samples follow the not-sunny, not-windy branches">
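The routing described in this answer can be sketched with a hand-rolled binary tree. This is an illustration under the answer's assumption of one-vs-rest Boolean splits; the `Leaf`/`Split` classes and the labels "A", "B", "C" are hypothetical placeholders. Because each test only asks "is weather equal to X?", a value never seen in training simply fails every test and ends up in the rightmost leaf.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    """Binary node: go left if sample[attribute] == value, else right."""
    attribute: str
    value: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Split]


def classify(tree: Tree, sample: Dict[str, str]) -> str:
    while isinstance(tree, Split):
        # The test is Boolean, so every value, seen or unseen, has a branch.
        tree = tree.left if sample[tree.attribute] == tree.value else tree.right
    return tree.label


# weather == sunny ? A : (weather == windy ? B : C)
tree = Split("weather", "sunny",
             left=Leaf("A"),
             right=Split("weather", "windy", left=Leaf("B"), right=Leaf("C")))

print(classify(tree, {"weather": "rainy"}))  # C: not sunny, not windy
```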
