# Apache Spark aggregate function using min value

I tried one example found on http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html

```
val z = sc.parallelize(List("12","23","345","4567"), 2)
z.aggregate("")((x, y) => math.min(x.length, y.length).toString, (x, y) => x + y)
// res142: String = 11
```

Why is the min length 1? The first partition contains `["12", "23"]` and the second one `["345", "4567"]`. Comparing the min length from either partition with the initial value `""`, the min should be 0, so the result I expected is `00`.

```
val z = sc.parallelize(List("12","23","345",""), 2)
z.aggregate("")((x, y) => math.min(x.length, y.length).toString, (x, y) => x + y)
// res143: String = 10
```

For this one my reasoning is the same: the final result should be `00`.

Thanks in advance.

### Answer1:

First, let's see how `parallelize` splits your data between partitions:

```
val x = sc.parallelize(List("12","23","345","4567"), 2)
x.glom.collect
// Array[Array[String]] = Array(Array(12, 23), Array(345, 4567))

val y = sc.parallelize(List("12","23","345",""), 2)
y.glom.collect
// Array[Array[String]] = Array(Array(12, 23), Array(345, ""))
```
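If Spark is not at hand, the contiguous slicing that `parallelize` performs on a list can be sketched in plain Python. The slicing formula below is an assumption on my part; it merely reproduces the `glom` output above:

```python
def split_partitions(xs, n):
    # Assumed formula: partition i covers indices [i*len//n, (i+1)*len//n),
    # i.e. contiguous, roughly equal slices.
    length = len(xs)
    return [xs[i * length // n:(i + 1) * length // n] for i in range(n)]

print(split_partitions(["12", "23", "345", "4567"], 2))
# → [['12', '23'], ['345', '4567']]
print(split_partitions(["12", "23", "345", ""], 2))
# → [['12', '23'], ['345', '']]
```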

and define two helpers:

```
def seqOp(x: String, y: String) = math.min(x.length, y.length).toString
def combOp(x: String, y: String) = x + y
```
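To see these two helpers in action without a Spark cluster, here is a plain-Python sketch of `aggregate`'s contract: each partition is folded with `seqOp` starting from its own copy of the zero value, and the partial results are then merged with `combOp`, again starting from the zero value:

```python
def aggregate(partitions, zero, seq_op, comb_op):
    # Fold each partition independently, each starting from zero.
    partials = []
    for part in partitions:
        acc = zero
        for elem in part:
            acc = seq_op(acc, elem)
        partials.append(acc)
    # Merge the per-partition results, again starting from zero.
    result = zero
    for p in partials:
        result = comb_op(result, p)
    return result

seq_op = lambda x, y: str(min(len(x), len(y)))
comb_op = lambda x, y: x + y

print(aggregate([["12", "23"], ["345", "4567"]], "", seq_op, comb_op))  # → 11
print(aggregate([["12", "23"], ["345", ""]], "", seq_op, comb_op))      # → 10
```

This reproduces both REPL results, `11` and `10` (modulo partition ordering, which Spark does not guarantee in general).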

Now let's trace the execution for `x`. Ignoring parallelism, it can be represented as follows:

```
(combOp (seqOp (seqOp "" "12") "23") (seqOp (seqOp "" "345") "4567"))
(combOp (seqOp "0" "23") (seqOp (seqOp "" "345") "4567"))
(combOp "1" (seqOp (seqOp "" "345") "4567"))
(combOp "1" (seqOp "0" "4567"))
(combOp "1" "1")
"11"
```

Note that the first step in each partition, e.g. `(seqOp "" "12")`, returns the *string* `"0"`, which has length 1. Every subsequent comparison therefore sees a length of 1, not the value 0 — that is why each partition yields `"1"` rather than `"0"`.

The same thing for `y`:

```
(combOp (seqOp (seqOp "" "12") "23") (seqOp (seqOp "" "345") ""))
(combOp (seqOp "0" "23") (seqOp (seqOp "" "345") ""))
(combOp "1" (seqOp (seqOp "" "345") ""))
(combOp "1" (seqOp "0" ""))
(combOp "1" "0")
"10"
```

That being said, you shouldn't use `aggregate` like this in the first place. Since the operations you apply are not associative, the whole approach is simply wrong.
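A quick demonstration of the non-associativity (plain Python again; `seq_op` mirrors the Scala `seqOp`), followed by the minimum length computed directly. In Spark, something along the lines of `z.map(_.length).reduce(math.min)` should give the latter, but that snippet is untested here:

```python
seq_op = lambda x, y: str(min(len(x), len(y)))

# seq_op is not associative: regrouping the same operands changes the result.
a = seq_op(seq_op("", ""), "4567")  # seq_op("0", "4567") -> "1"
b = seq_op("", seq_op("", "4567"))  # seq_op("", "0")     -> "0"
print(a, b)  # → 1 0

# What was presumably intended: the minimum element length itself.
data = ["12", "23", "345", "4567"]
print(min(len(s) for s in data))  # → 2
```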