
calculate gradient output for Theta update rule

Since this uses a sigmoid function instead of a zero/one step activation function, I guess this is the right way to calculate the output for gradient descent. Is that right?

```java
static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
    double sum = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        sum += ( weights[i] * feature_matrix[file_index][i] );
    }
    //bias
    sum += weights[ globo_dict_size ];

    return sigmoid(sum);
}

private static double sigmoid(double x)
{
    return 1 / (1 + Math.exp(-x));
}
```
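To make sure I understand the arithmetic, here is a tiny self-contained example I worked through by hand; the two features, the weights, and the 0.5 decision threshold are made-up values, not from my real data:

```java
public class OutputDemo {
    private static double sigmoid(double x) { return 1 / (1 + Math.exp(-x)); }

    public static void main(String[] args) {
        // hypothetical two-feature instance; the bias weight is stored last, as in my code
        double[] weights  = { 0.4, -0.7, 0.1 };
        double[] features = { 1.0,  2.0 };

        double sum = 0.0;
        for (int i = 0; i < features.length; i++) {
            sum += weights[i] * features[i];      // 0.4*1.0 + (-0.7)*2.0 = -1.0
        }
        sum += weights[features.length];          // add bias: -1.0 + 0.1 = -0.9

        double output = sigmoid(sum);             // sigmoid(-0.9) ≈ 0.289
        int predictedClass = output >= 0.5 ? 1 : 0;
        System.out.println(output + " -> class " + predictedClass);
    }
}
```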

In the following code I'm trying to update my Θ values (they're equivalent to the weights in a perceptron, aren't they?). I was given the formula LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i] for that purpose in my **related question**. I commented out the weight update from my perceptron.

Is this new update rule the correct approach?

What is meant by output_gradient? Is that equivalent to the sum I calculate in my calculateOutput method?

```java
//LEARNING WEIGHTS
double localError, globalError;
int p, iteration, output;

iteration = 0;
do
{
    iteration++;
    globalError = 0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // calculate predicted class
        output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );

        // difference between predicted and actual class values
        localError = outputs__train[p] - output;

        //update weights and bias
        for (int i = 0; i < globo_dict_size; i++)
        {
            //weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );
            weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i];
        }
        weights[ globo_dict_size ] += ( LEARNING_RATE * localError );

        //summation of squared error (error value for all instances)
        globalError += (localError * localError);
    }

    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );

} while (globalError != 0 && iteration <= MAX_ITER);
```

**UPDATE** Now I've updated things, and it looks more like this:

```java
double loss, cost, hypothesis, gradient;
int p, iteration;

iteration = 0;
do
{
    iteration++;
    cost = 0.0;
    loss = 0.0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // 1. Calculate the hypothesis h = X * theta
        hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );

        // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
        loss = hypothesis - outputs__train[p];

        // 3. Calculate the gradient = X' * loss / m
        gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, loss );

        // 4. Update the parameters theta = theta - alpha * gradient
        for (int i = 0; i < globo_dict_size; i++)
        {
            theta[i] = theta[i] - (LEARNING_RATE * gradient);
        }
    }

    //summation of squared error (error value for all instances)
    cost += (loss * loss);

    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );

} while (cost != 0 && iteration <= MAX_ITER);
}

static double calculateHypothesis( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        hypothesis += ( theta[i] * feature_matrix[file_index][i] );
    }
    //bias
    hypothesis += theta[ globo_dict_size ];

    return hypothesis;
}

static double calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double loss )
{
    double gradient = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        gradient += ( feature_matrix[file_index][i] * loss );
    }
    return gradient;
}

public static double hingeLoss()
{
    // l(y, f(x)) = max(0, 1 − y · f(x))
    return HINGE;
}
```
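For comparison, here is how I understand the per-feature version of step 4 would look, with each theta[i] getting its own gradient term loss * x_i instead of one shared scalar. I'm not sure this is right, it's just a sketch and not meant as a drop-in fix:

```java
// Illustrative only: one stochastic update where each theta[i] gets its own
// gradient term, loss * x_i, instead of a single shared scalar gradient.
static void updateTheta(double[] theta, double[] x, double loss,
                        double LEARNING_RATE, int globo_dict_size) {
    for (int i = 0; i < globo_dict_size; i++) {
        theta[i] -= LEARNING_RATE * loss * x[i];    // d(loss^2 / 2)/d(theta_i) = loss * x_i
    }
    theta[globo_dict_size] -= LEARNING_RATE * loss; // bias term (its feature is the constant 1)
}
```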

Answer 1:

Your calculateOutput method looks correct. I don't really think your next piece of code is, though:

weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]

Look at the image you posted in your other question:

![Update rules for Theta](https://i.stack.imgur.com/UHgMr.png)

Let's try to identify each part of these rules in your code.

1. Theta0 and Theta1: look like weights[i] in your code; I hope globo_dict_size = 2.

2. alpha: seems to be your LEARNING_RATE.

3. 1 / m: I can't find this anywhere in your update rule. m is the number of training instances in Andrew Ng's videos. In your case, it should be 1 / number_of_files__train, I think. It's not very important though; things should work well even without it.

4. The sum: you do this with the calculateOutput function, whose result you make use of in the localError variable, which you multiply by feature_matrix__train[p][i] (equivalent to x(i) in Andrew Ng's notation).

   **This part is your partial derivative, and part of the gradient!**

   Why? Because the partial derivative of [h_theta(x(i)) - y(i)]^2 with respect to Theta0 is equal to:

       2 * [h_theta(x(i)) - y(i)] * derivative[h_theta(x(i)) - y(i)]

       derivative[h_theta(x(i)) - y(i)]
           = derivative[Theta0 * x(i, 1) + Theta1 * x(i, 2) - y(i)]
           = x(i, 1)

   Of course, you should differentiate the entire sum. This is also why Andrew Ng used 1 / (2m) for the cost function, so that the 2 cancels out with the 2 we get from the derivative.

   Remember that x(i, 1), or just x(1), should consist of all ones. In your code, you should make sure that:

       feature_matrix__train[p][0] == 1

   (A sketch right after this list shows how parts 1-4 fit together in a single update.)

5. That's it! I don't know what output_gradient[i] is supposed to be in your code; you don't define it anywhere.
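Putting parts 1-4 together, a batch version of the update rule from the image could look roughly like the sketch below, with the optional 1 / m factor and the all-ones bias column included. The array names are borrowed from your code, but this is only an illustration, not a drop-in replacement:

```java
// Illustrative batch gradient descent step, assuming:
//  - feature_matrix__train[p][0] == 1 for every instance (bias feature),
//  - weights.length == globo_dict_size (bias handled as weights[0]),
//  - outputs__train[p] is the target value for instance p.
static void batchGradientDescentStep(double[] weights,
                                     double[][] feature_matrix__train,
                                     double[] outputs__train,
                                     int number_of_files__train,
                                     int globo_dict_size,
                                     double LEARNING_RATE) {
    double[] gradient = new double[globo_dict_size];

    // accumulate the sum over all m training instances
    for (int p = 0; p < number_of_files__train; p++) {
        // hypothesis h_theta(x(p)) = weights . x(p)
        double hypothesis = 0.0;
        for (int i = 0; i < globo_dict_size; i++) {
            hypothesis += weights[i] * feature_matrix__train[p][i];
        }
        double error = hypothesis - outputs__train[p];

        // partial derivative contribution for each weight j: (h - y) * x(p, j)
        for (int j = 0; j < globo_dict_size; j++) {
            gradient[j] += error * feature_matrix__train[p][j];
        }
    }

    // theta_j := theta_j - alpha * (1/m) * sum_p (h - y) * x(p, j)
    for (int j = 0; j < globo_dict_size; j++) {
        weights[j] -= LEARNING_RATE * gradient[j] / number_of_files__train;
    }
}
```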

I suggest you take a look at this tutorial to get a better understanding of the algorithm you have used. Since you are using the sigmoid function, it seems like you want to do classification, but then you should use a different cost function. That document deals with logistic regression as well.
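If you do go the logistic regression route, the per-instance update ends up looking something like the rough sketch below: with the cross-entropy cost, the gradient for each weight is (hypothesis - y) * x_j, which has the same shape as the linear case. The names here are illustrative, not taken from your code:

```java
// Illustrative single-instance logistic regression update (cross-entropy cost).
// Assumes x[0] == 1 (bias feature) and y is 0 or 1.
static void logisticSgdStep(double[] theta, double[] x, double y, double alpha) {
    // hypothesis = sigmoid(theta . x)
    double z = 0.0;
    for (int j = 0; j < theta.length; j++) {
        z += theta[j] * x[j];
    }
    double hypothesis = 1.0 / (1.0 + Math.exp(-z));

    // with the cross-entropy cost, d(cost)/d(theta_j) = (hypothesis - y) * x_j
    double error = hypothesis - y;
    for (int j = 0; j < theta.length; j++) {
        theta[j] -= alpha * error * x[j];
    }
}
```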
