According to the documentation, it is possible to specify different loss functions for SGDClassifier. As far as I understand, the log loss is a cross-entropy loss function, which theoretically can handle soft labels, i.e. labels given as probabilities in [0, 1].

The question is: is it possible to use the log loss function out of the box for classification problems with soft labels? And if not, how can this task (linear classification on soft labels) be solved using scikit-learn?
The target is labeled with soft labels, and by the nature of the problem hard labels don't give good results. But it is still a classification problem (not regression), and I want to keep the probabilistic interpretation of the prediction, so regression doesn't work out of the box either. The cross-entropy loss function can handle soft labels in the target naturally, but it seems that all loss functions for linear classifiers in scikit-learn can only handle hard labels.
So the question is probably: how can I specify my own loss function for SGDClassifier, for example? It seems scikit-learn doesn't stick to a modular approach here, and changes would need to be made somewhere inside its sources.
<a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html" rel="nofollow">According to the docs</a>,<blockquote>
The ‘log’ loss gives logistic regression, a probabilistic classifier.</blockquote>
In general a loss function is of the form Loss(prediction, target), where prediction is the model's output and target is the ground-truth value. In the case of logistic regression, prediction is a value on (0,1) (i.e., a "soft label"), while target is 0 or 1 (i.e., a "hard label").
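As a numeric illustration, the binary cross-entropy form of this loss accepts a soft prediction against either a hard or a soft target (the function below is my own sketch, not a scikit-learn API):

```python
import numpy as np

def binary_cross_entropy(prediction, target):
    # Loss(prediction, target) = -[t*log(p) + (1 - t)*log(1 - p)]
    return -(target * np.log(prediction) + (1 - target) * np.log(1 - prediction))

# Soft prediction vs. hard target
print(binary_cross_entropy(0.9, 1))  # confident and correct -> small loss
print(binary_cross_entropy(0.9, 0))  # confident and wrong   -> large loss

# The same formula also accepts a soft target
print(binary_cross_entropy(0.9, 0.7))
```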
So in answer to your question, it depends on whether you are referring to the prediction or the target. Generally speaking, the form of the labels ("hard" or "soft") is given by the algorithm chosen for the prediction and by the data on hand for the target.
If your data has "hard" labels, and you desire a "soft" label output by your model (which can be thresholded to give a "hard" label), then yes, logistic regression is in this category.
If your data has "soft" labels, then you would have to choose a threshold to convert them to "hard" labels before using typical classification methods (e.g., logistic regression). Otherwise, you could use a regression method where the model is fit to predict the "soft" target. In this latter approach, your model could give values outside of (0,1), and this would have to be handled.
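A minimal sketch of the latter (regression) approach on made-up data, with a crude clip to keep the outputs interpretable as probabilities:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy 1-D data with soft targets in [0, 1]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y_soft = np.array([0.1, 0.3, 0.7, 0.95])

reg = SGDRegressor(random_state=0).fit(X, y_soft)

raw = reg.predict(X)            # may fall outside (0, 1)
proba = np.clip(raw, 0.0, 1.0)  # crude handling to keep a probabilistic reading
print(proba)
```

Clipping is only one way to handle out-of-range outputs; it loses the cross-entropy interpretation that a true soft-label classifier would have, which is the gap the question is pointing at.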