The reduceByKey example performs much better on a large dataset because Spark knows it can combine output that shares a key on each partition (a map-side combine) before shuffling the data across the network.
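A minimal sketch of that map-side combine, in plain Python rather than Spark (the partition data and counts are made up for illustration): each partition merges its own values per key first, so far fewer records would need to cross the network in the shuffle.

```python
from collections import defaultdict

# Hypothetical toy data: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]

def combine_partition(pairs):
    # Map-side combine: merge values that share a key locally,
    # before anything is shuffled.
    local = defaultdict(int)
    for k, v in pairs:
        local[k] += v
    return list(local.items())

combined = [combine_partition(p) for p in partitions]
shuffled_records = sum(len(p) for p in combined)

# Final merge, as the reduce side would do after the shuffle.
result = defaultdict(int)
for part in combined:
    for k, v in part:
        result[k] += v

print(dict(result))      # {'a': 4, 'b': 3}
print(shuffled_records)  # 4 records shuffled instead of the original 7
```

With groupByKey, all 7 raw records would have been shuffled; the local combine cuts that to 4 here, and the savings grow with the number of duplicate keys per partition.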
combineByKey can be used when you are combining elements but the type you combine into differs from the type of your input values.
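The classic illustration is a per-key average: the input values are ints, but the accumulator is a (sum, count) pair. This is a plain-Python sketch of combineByKey's three functions (createCombiner, mergeValue, mergeCombiners) over invented data:

```python
# Input values are ints; the combiner type is a (sum, count) tuple.
def create_combiner(v):
    return (v, 1)

def merge_value(acc, v):
    # Fold one more raw value into a partition-local accumulator.
    return (acc[0] + v, acc[1] + 1)

def merge_combiners(a, b):
    # Merge accumulators from different partitions after the shuffle.
    return (a[0] + b[0], a[1] + b[1])

partitions = [
    [("a", 10), ("a", 20)],
    [("a", 30), ("b", 4)],
]

# Run the combiner inside each partition, then merge across partitions.
merged = {}
for part in partitions:
    accs = {}
    for k, v in part:
        accs[k] = merge_value(accs[k], v) if k in accs else create_combiner(v)
    for k, acc in accs.items():
        merged[k] = merge_combiners(merged[k], acc) if k in merged else acc

averages = {k: s / c for k, (s, c) in merged.items()}
print(averages)  # {'a': 20.0, 'b': 4.0}
```

reduceByKey cannot express this directly because its merge function must return the same type as its inputs; combineByKey lifts each value into the accumulator type first.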
foldByKey merges the values for each key using an associative function and a neutral "zero value".
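A small sketch of that behavior outside Spark, with a hypothetical helper that seeds each key's fold with the zero value:

```python
from collections import defaultdict

# Sketch of foldByKey semantics: like reduceByKey, but each key's
# fold starts from a neutral "zero value".
def fold_by_key(pairs, zero, func):
    out = defaultdict(lambda: zero)
    for k, v in pairs:
        out[k] = func(out[k], v)
    return dict(out)

pairs = [("a", 2), ("b", 5), ("a", 3)]
totals = fold_by_key(pairs, 0, lambda acc, v: acc + v)
print(totals)  # {'a': 5, 'b': 5}
```

In Spark the zero value is applied once per partition, which is why it must be neutral for the function (0 for addition, 1 for multiplication); a non-neutral zero would be counted multiple times.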
Objects created outside the scope of a closure are not necessarily in the same state when accessed within the closure, because each worker operates on its own serialized copy.
1 - Why Did My Spark Job Fail with NotSerializableException?
Any object referenced inside the closure but created outside its scope must be serialized and shipped to the workers; if any such object is not serializable, the job fails with NotSerializableException.
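The same failure mode can be sketched in plain Python with pickle: a closure that captures an object holding a non-serializable resource (here, a thread lock, and the Handle class is invented for illustration) cannot be shipped to workers, analogous to a non-Serializable field triggering NotSerializableException in Spark.

```python
import pickle
import threading

class Handle:
    def __init__(self):
        # A thread lock is not picklable, standing in for any
        # non-serializable resource (sockets, connections, etc.).
        self.lock = threading.Lock()

captured = Handle()

def task(x):
    # This closure captures `captured`, so shipping the task to a
    # worker would require serializing the whole Handle.
    with captured.lock:
        return x * 2

try:
    pickle.dumps(captured)
except TypeError as e:
    print("serialization failed:", e)
```

The usual fixes mirror Spark's: create the resource inside the closure (so each worker builds its own), or mark fields that should not travel as transient/non-captured.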
2 - Why Is My Spark Job so Slow and Only Using a Single Thread?
References to singleton objects within the closure of a parallel operation will bottleneck the process, because those references resolve in the driver program rather than on the workers, forcing the work onto a single thread.
3 - Why Did My Spark Job Fail with java.lang.IllegalArgumentException: Shuffle ID nnnn Registered Twice?