Since this uses a sigmoid function instead of a zero/one activation function, I guess this is the right way to calculate the output for gradient descent. Is that right?

```
static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
    double sum = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        sum += ( weights[i] * feature_matrix[file_index][i] );
    }
    //bias
    sum += weights[ globo_dict_size ];
    return sigmoid(sum);
}

private static double sigmoid(double x)
{
    return 1 / (1 + Math.exp(-x));
}
```

In the following code I'm trying to update my Θ values (equivalent to the weights in a perceptron, aren't they?). In my <strong>related question</strong> I was given the formula `LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]` for that purpose. I commented out the weight update from my perceptron.

Is this new update rule the correct approach?

What is meant by `output_gradient`? Is that equivalent to the sum I calculate in my `calculateOutput` method?

```
//LEARNING WEIGHTS
double localError, globalError;
int p, iteration, output;
iteration = 0;
do
{
    iteration++;
    globalError = 0;
    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // calculate predicted class
        output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
        // difference between predicted and actual class values
        localError = outputs__train[p] - output;
        //update weights and bias
        for (int i = 0; i < globo_dict_size; i++)
        {
            //weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );
            weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]
        }
        weights[ globo_dict_size ] += ( LEARNING_RATE * localError );
        //summation of squared error (error value for all instances)
        globalError += (localError*localError);
    }
    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );
}
while(globalError != 0 && iteration<=MAX_ITER);
```

<hr>
<strong>UPDATE</strong> I've since revised the code; it now looks more like this:

```
double loss, cost, hypothesis, gradient;
int p, iteration;
iteration = 0;
do
{
    iteration++;
    cost = 0.0;
    loss = 0.0;
    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // 1. Calculate the hypothesis h = X * theta
        hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );
        // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
        loss = hypothesis - outputs__train[p];
        // 3. Calculate the gradient = X' * loss / m
        gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, loss );
        // 4. Update the parameters theta = theta - alpha * gradient
        for (int i = 0; i < globo_dict_size; i++)
        {
            theta[i] = theta[i] - (LEARNING_RATE * gradient);
        }
    }
    //summation of squared error (error value for all instances)
    cost += (loss*loss);
    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );
}
while(cost != 0 && iteration<=MAX_ITER);
}

static double calculateHypothesis( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        hypothesis += ( theta[i] * feature_matrix[file_index][i] );
    }
    //bias
    hypothesis += theta[ globo_dict_size ];
    return hypothesis;
}

static double calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double loss )
{
    double gradient = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        gradient += ( feature_matrix[file_index][i] * loss);
    }
    return gradient;
}

public static double hingeLoss()
{
    // l(y, f(x)) = max(0, 1 − y · f(x))
    return HINGE;
}
```

### Answer1:

Your `calculateOutput` method looks correct. The next piece of code, however, does not:

```
weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]
```

Look at the image you posted in your other question:

<img src="https://i.stack.imgur.com/UHgMr.png" alt="Update rules for Theta">

Let's try to identify each part of these rules in your code.

<ol> <li>`Theta0` and `Theta1`: these look like `weights[i]` in your code; I hope `globo_dict_size = 2`.</li>
<li>`alpha`: seems to be your `LEARNING_RATE`.</li>
<li>`1 / m`: I can't find this anywhere in your update rule. `m` is the number of training instances in Andrew Ng's videos. In your case it should be `1 / number_of_files__train`, I think. It's not very important though; things should work well even without it, since a constant factor only rescales the effective learning rate.</li>
<li>The sum: you do this with the `calculateOutput` function, whose result you use in the `localError` variable, which you multiply by `feature_matrix__train[p][i]` (equivalent to `x(i)` in Andrew Ng's notation).

<strong>This part is your partial derivative, and part of the gradient!</strong>

Why? Because the partial derivative of `[h_theta(x(i)) - y(i)]^2` with respect to `Theta0` is equal to:

```
2*[h_theta(x(i)) - y(i)] * derivative[h_theta(x(i)) - y(i)]
derivative[h_theta(x(i)) - y(i)] =
derivative[Theta0 * x(i, 1) + Theta1*x(i, 2) - y(i)] =
x(i, 1)
```

Of course, you should differentiate the entire sum. This is also why Andrew Ng used `1 / (2m)` for the cost function, so that the `2` cancels out with the `2` we get from differentiation.

Remember that `x(i, 1)`, or just `x(1)`, should consist of all ones. In your code, you should make sure that:

```
feature_matrix__train[p][0] == 1
```

</li>
<li>That's it! I don't know what `output_gradient[i]` is supposed to be in your code; you don't define it anywhere.</li>
</ol>
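
To make the pieces identified in the list above concrete, here is a minimal sketch of one batch update for a linear hypothesis. It is an illustration, not your exact code: the class name `GradientStep`, its method names and the toy data are made up for this example, and it assumes column 0 of the feature matrix is all ones so that `theta[0]` plays the role of the intercept.

```java
// Illustrative sketch only: class name, method names and data are hypothetical.
class GradientStep {
    // Linear hypothesis h_theta(x) = sum_j theta[j] * x[j], with x[0] assumed to be 1.
    static double hypothesis(double[] theta, double[] x) {
        double h = 0.0;
        for (int j = 0; j < theta.length; j++) {
            h += theta[j] * x[j];
        }
        return h;
    }

    // One batch update: theta[j] -= alpha * (1/m) * sum_i (h(x_i) - y_i) * x_i[j].
    static double[] step(double[] theta, double[][] x, double[] y, double alpha) {
        int m = x.length;
        double[] gradient = new double[theta.length];
        for (int i = 0; i < m; i++) {
            double error = hypothesis(theta, x[i]) - y[i];
            for (int j = 0; j < theta.length; j++) {
                gradient[j] += error * x[i][j]; // partial derivative w.r.t. theta[j]
            }
        }
        // All parameters are updated together from the same gradient.
        double[] updated = new double[theta.length];
        for (int j = 0; j < theta.length; j++) {
            updated[j] = theta[j] - alpha * gradient[j] / m;
        }
        return updated;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 1 + 2 * x; column 0 is the all-ones column.
        double[][] x = {{1, 0}, {1, 1}, {1, 2}};
        double[] y = {1, 3, 5};
        double[] theta = {0, 0};
        for (int iteration = 0; iteration < 5000; iteration++) {
            theta = step(theta, x, y, 0.1);
        }
        System.out.println(java.util.Arrays.toString(theta));
    }
}
```

On this toy data the parameters should end up close to `[1.0, 2.0]`. Note that each `theta[j]` gets its own gradient component, and that the new values are computed from the old `theta` before any of them is overwritten.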

I suggest you take a look at this tutorial to get a better understanding of the algorithm you have used. Since you use the sigmoid function, it seems like you want to do classification, but then you should use a different cost function. That document deals with logistic regression as well.
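
For the classification case, a minimal sketch of logistic regression trained with the cross-entropy (log) cost could look like the following. The class name `LogisticSketch`, the toy data and the hyperparameters are assumptions chosen for illustration, not taken from your code, and column 0 is again assumed to be all ones. With this cost function the sigmoid's derivative cancels out, so the gradient is `(h - y) * x[j]`, the same shape as in the linear rule, only with `h` being the sigmoid output.

```java
// Illustrative sketch only: class name, data and hyperparameters are hypothetical.
class LogisticSketch {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // h_theta(x) = sigmoid(theta . x), with x[0] assumed to be 1.
    static double predict(double[] theta, double[] x) {
        double z = 0.0;
        for (int j = 0; j < theta.length; j++) {
            z += theta[j] * x[j];
        }
        return sigmoid(z);
    }

    // Cross-entropy cost: -(1/m) * sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ].
    static double cost(double[] theta, double[][] x, int[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double h = predict(theta, x[i]);
            sum += (y[i] == 1) ? Math.log(h) : Math.log(1.0 - h);
        }
        return -sum / x.length;
    }

    // Batch gradient descent; the gradient of the cost above is
    // (1/m) * sum_i (h_i - y_i) * x_i[j].
    static double[] train(double[][] x, int[] y, double alpha, int epochs) {
        int m = x.length;
        double[] theta = new double[x[0].length];
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradient = new double[theta.length];
            for (int i = 0; i < m; i++) {
                double error = predict(theta, x[i]) - y[i];
                for (int j = 0; j < theta.length; j++) {
                    gradient[j] += error * x[i][j];
                }
            }
            for (int j = 0; j < theta.length; j++) {
                theta[j] -= alpha * gradient[j] / m;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Toy, linearly separable data: small x values are class 0, large ones class 1.
        double[][] x = {{1, 0.0}, {1, 1.0}, {1, 3.0}, {1, 4.0}};
        int[] y = {0, 0, 1, 1};
        double[] theta = train(x, y, 0.5, 2000);
        for (double[] row : x) {
            System.out.printf("x = %.1f -> P(y = 1) = %.3f%n", row[1], predict(theta, row));
        }
    }
}
```

Thresholding the predicted probability at 0.5 then gives the class label, and tracking `cost` per epoch is a more meaningful progress measure here than RMSE.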