Why for() loops solve the wrong problem

By for loops, I mean any construct which makes it easier to loop a fixed number of times, or over a datastructure.

They are a solution to a problem, make no mistake.  The problem they solve is that processing a datastructure with a "simple" while loop exhibits a pattern (which is a bad thing) in the loop counter or iterator, stepping it to the next value, and its comparison with the sentinel value.  But one should not be trying to process a datastructure with a loop in the first place for several resons.

One is that normally one wants to map, filter or fold the data.  These operations have a significant general structure (hence the term, duh) which need not be a loop, because the specific part is functionally pure, i.e. does not have side effects.  Let us look at a similar example, which already works this way: sorting.

First, if there were no built-in quicksort implementation, many programmers would write an O(N2) sort, because those are easier to implement.  Some might even write bubblesorts when in a hurry.  Not only would these be inefficient, but in the code repeated over and over, some bugs would happen.  Now, if there is one implementation of sort, and programmers supply the comparison function, there are numerous advantages:

The above is merely the conventional wisdom.  However, there is another factor of increasing importance: parallelism.  Writing bug-free parallel code is hard, even harder than writing an advanced sorting algorithm.  But since hardware is increasingly parallel (and possibly heterogenous), there is a growing pressure on software to use it.

And this is where having one higher-order function becomes even more important.  This central implementation can be written to take advantage of the hardware to its fullest, and any code to use it will be both faster and shorter.  Furthermore, a "free lunch" episode can happen again.  The original was when harware regularly got faster.  One only had to recompile a program (or not even that), and it ran significantly faster on newer hardware.  Now, newer CPUs regularly have more and more cores.  If sort, map and fold have a parallel implementation, a recompilation will make software run faster on newer, more parallel hardware, without needing a rewrite.