They are a solution to a problem, make no mistake.  The problem they solve is that processing a datastructure with a "simple" while loop exhibits a pattern (which is a bad thing) in the loop counter or iterator, stepping it to the next value, and its comparison with the sentinel value.  But one should not be trying to process a datastructure with a loop in the first place for several resons.
One is that normally one wants to map, filter or fold the data.  These operations have a significant general structure (hence the term, duh) which need not be a loop, because the specific part is functionally pure, i.e. does not have side effects.  Let us look at a similar example, which already works this way: sorting.
First, if there were no built-in quicksort implementation, many programmers would write an O(N2) sort, because those are easier to implement.  Some might even write bubblesorts when in a hurry.  Not only would these be inefficient, but in the code repeated over and over, some bugs would happen.  Now, if there is one implementation of sort, and programmers supply the comparison function, there are numerous advantages:
And this is where having one higher-order function becomes even more important.
This central implementation can be written to take advantage of the hardware to its fullest, and any code to use it will be both faster and shorter.
Furthermore, a "free lunch" episode can happen again.
The original was when harware regularly got faster.
One only had to recompile a program (or not even that), and it ran significantly faster on newer hardware.
Now, newer CPUs regularly have more and more cores.
fold have a parallel implementation, a recompilation will make software run faster on newer, more parallel hardware, without needing a rewrite.