scikit-learn classification on soft labels


According to the documentation, it is possible to specify different loss functions for SGDClassifier. As far as I understand, log loss is a cross-entropy loss function, which in theory can handle soft labels, i.e. labels given as probabilities in [0, 1].
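For reference, here is the binary cross-entropy (log loss) written with a soft target p ∈ [0, 1] and a predicted probability q. The symbols p, q, w, b and σ are notation chosen here and are not taken from the scikit-learn docs:

```latex
\ell(p, q) = -\,p \log q - (1 - p) \log (1 - q),
\qquad q = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}

\ell(p, q) = p\,\ell(1, q) + (1 - p)\,\ell(0, q),
\qquad \frac{\partial \ell}{\partial q} = -\frac{p}{q} + \frac{1 - p}{1 - q} = 0 \iff q = p
```

For p ∈ {0, 1} one term vanishes, and this is the ordinary hard-label log loss. For fractional p it is a p-weighted mix of the two hard-label losses, and it is minimized when the predicted probability equals the soft label.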

The question is: is it possible to use SGDClassifier with the log loss function out of the box for classification problems with soft labels? And if not, how can this task (linear classification on soft labels) be solved using scikit-learn?


Given how the target is labeled and the nature of the problem, hard labels don't give good results. But it is still a classification problem (not regression), and I want to keep the probabilistic interpretation of the prediction, so regression doesn't work out of the box either. A cross-entropy loss function handles soft labels in the target naturally, yet it seems that all loss functions for linear classifiers in scikit-learn can only handle hard labels.
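To illustrate that cross-entropy needs no modification to accept soft targets, here is a minimal NumPy sketch (not scikit-learn code) of binary logistic regression trained by full-batch gradient descent directly on probabilities p ∈ [0, 1]. The helper names (sigmoid, soft_cross_entropy, fit_soft_logreg), the learning rate, the iteration count, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_cross_entropy(p, q, eps=1e-12):
    """Mean binary cross-entropy between soft targets p and predictions q."""
    q = np.clip(q, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-p * np.log(q) - (1 - p) * np.log(1 - q)))


def fit_soft_logreg(X, p, lr=0.5, n_iter=2000):
    """Full-batch gradient descent on the soft-label cross-entropy.

    Returns (w, b). The gradient has the same form as for hard labels:
    X^T (sigmoid(Xw + b) - p) / n.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        q = sigmoid(X @ w + b)
        residual = q - p
        w -= lr * (X.T @ residual) / n
        b -= lr * residual.mean()
    return w, b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
p = sigmoid(X @ true_w)  # synthetic soft labels in (0, 1)

w, b = fit_soft_logreg(X, p)
loss_start = soft_cross_entropy(p, np.full_like(p, 0.5))
loss_end = soft_cross_entropy(p, sigmoid(X @ w + b))
print(f"loss at init: {loss_start:.4f}, after fitting: {loss_end:.4f}")
```

Because the synthetic soft labels are generated by a logistic model, the fitted weights should end up close to true_w; on real data the point is only that the update rule is unchanged when the targets are fractional.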

So the question is probably:

How can I specify my own loss function for SGDClassifier, for example? It seems scikit-learn doesn't follow a modular approach here, and changes would need to be made somewhere inside its sources.
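One commonly used workaround, sketched here for the binary case, avoids a custom loss altogether. Since the soft-label cross-entropy is p times the log loss for label 1 plus (1 − p) times the log loss for label 0, each sample can be duplicated with both hard labels and weighted by p and 1 − p. The helper fit_with_soft_labels is hypothetical; it relies only on the sample_weight argument of LogisticRegression.fit. SGDClassifier.fit also accepts sample_weight, so the same stacking should carry over to SGDClassifier with the log loss (spelled loss="log_loss" in recent scikit-learn releases and "log" in older ones).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_with_soft_labels(X, p, **kwargs):
    """Fit a binary LogisticRegression on soft labels p in [0, 1].

    Each sample is stacked twice: once with label 1 and weight p, once with
    label 0 and weight 1 - p. The weighted log loss of the stacked data equals
    the soft-label cross-entropy p*(-log q) + (1-p)*(-log(1-q)).
    """
    p = np.asarray(p, dtype=float)
    X2 = np.vstack([X, X])
    y2 = np.concatenate([np.ones(len(p)), np.zeros(len(p))])
    w2 = np.concatenate([p, 1.0 - p])
    clf = LogisticRegression(**kwargs)
    clf.fit(X2, y2, sample_weight=w2)
    return clf


# Usage on synthetic soft labels (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
p = 1.0 / (1.0 + np.exp(-(X @ np.array([1.5, -2.0]))))

clf = fit_with_soft_labels(X, p, C=1e6)  # large C = very weak regularization
proba = clf.predict_proba(X)[:, 1]       # estimated P(label = 1 | x)
print(np.round(clf.coef_, 2), np.round(clf.intercept_, 2))
```

The fitted model is an ordinary LogisticRegression, so predict_proba keeps its probabilistic interpretation; the cost is doubling the training set.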


According to the docs (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html):


The ‘log’ loss gives logistic regression, a probabilistic classifier.


In general, a loss function has the form Loss(prediction, target), where prediction is the model's output and target is the ground-truth value. In the case of logistic regression, prediction is a value in (0, 1) (i.e., a "soft label"), while target is 0 or 1 (i.e., a "hard label").
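Concretely, for logistic regression this loss is the log loss. A minimal plain-Python sketch of Loss(prediction, target) for a single sample follows; the function name log_loss_single is hypothetical (scikit-learn's own sklearn.metrics.log_loss computes the averaged version over many samples).

```python
import math


def log_loss_single(prediction, target):
    """Log loss for one sample.

    prediction: model output in (0, 1), i.e. a "soft label"
    target: ground truth, 0 or 1, i.e. a "hard label"
    """
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))


print(round(log_loss_single(0.9, 1), 4))  # confident and correct: 0.1054
print(round(log_loss_single(0.9, 0), 4))  # confident and wrong: 2.3026
```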

So, in answer to your question, it depends on whether you are referring to the prediction or the target. Generally speaking, the form of the labels ("hard" or "soft") is determined by the chosen algorithm for the prediction and by the data at hand for the target.

If your data has "hard" labels, and you desire a "soft" label output by your model (which can be thresholded to give a "hard" label), then yes, logistic regression is in this category.
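A minimal sketch of that workflow with scikit-learn's LogisticRegression, on synthetic data assumed purely for illustration: fit on hard 0/1 labels, read the soft output from predict_proba, and threshold it to get hard labels back. The 0.5 threshold is an arbitrary choice here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # synthetic hard labels 0/1

clf = LogisticRegression().fit(X, y)
soft = clf.predict_proba(X)[:, 1]  # "soft" output: estimated P(y = 1 | x)
hard = (soft >= 0.5).astype(int)   # threshold to recover "hard" labels

print(soft[:5].round(3), hard[:5])
print("training accuracy:", (hard == y).mean())
```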

If your data has "soft" labels, then you would have to choose a threshold to convert them to "hard" labels before using typical classification methods (e.g., logistic regression). Otherwise, you could use a regression method where the model is fit to predict the "soft" target. In this latter approach, your model could give values outside of (0, 1), which would have to be handled, for example by clipping.
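Both options described above can be sketched as follows, on synthetic soft labels assumed for illustration. The 0.5 threshold is arbitrary, and clipping is only one simple way of handling regression outputs that fall outside (0, 1).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Synthetic soft labels in (0, 1), assumed for illustration only.
p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 2.0 * X[:, 1])))

# Option 1: threshold the soft targets, then use an ordinary classifier.
threshold = 0.5  # arbitrary; choose per problem
y_hard = (p >= threshold).astype(int)
clf = LogisticRegression().fit(X, y_hard)

# Option 2: regress the soft target directly. Raw predictions may leave
# (0, 1), so clip them back into [0, 1].
reg = LinearRegression().fit(X, p)
raw = reg.predict(X)
p_pred = np.clip(raw, 0.0, 1.0)

print("classifier accuracy on thresholded labels:", clf.score(X, y_hard))
print("raw regression range:", raw.min(), raw.max())
```

Option 1 discards how far each soft label was from the threshold; option 2 keeps it but gives up the probabilistic model, which is the trade-off the question is trying to avoid.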

