tl;dr: The mathematical definition of an idempotent function is subtly different to the definition used in software engineering. In software engineering an idempotent function is one that has the same impact on state, no matter how many times it is run. In mathematics, an idempotent function is one where f(x) =f(f(x)).
The concept of idempotence came up recently at work, in the context of infrastructure. A statement along the lines of “When treating infrastructure as code, it’s often important to ensure that your functionality is idempotent.”
I asked what “idempotent” meant, and I was given the (incorrect in all contexts) answer, “for the same input, you will always get the same output.” I was also told (correct for maths, but not for software) that the following function is not idempotent:
f(x) = x + 1
…but the following function is idempotent:
f(x) = x * 1
This instantly got me asking questions, because the first two definitions I was given were at odds with one another. The first definition describes a deterministic function. For instance: Both f(x) = x + 1 and f(x) = x * 1 are deterministic, ie given the same input, you’ll always get the same output (3 + 1 will always be equal to 4, and 3 * 1 will always equal 3).
But then I was given a better example, more relevant to the original conversation: If you have a function that adds an entry to a hosts file, you want to know that no matter how many times you execute that function, you will only ever add one entry to the hosts file. You don’t want to add more than one entry.
For instance, we start with a hosts file that looks like this:
127.0.0.1 localhost
We run our function, and now it looks like this:
127.0.0.1 localhost
255.255.255.255 broadcasthost
We run our function again, and nothing changes. The broadcasthost entry has already been added. Our function has nothing to do.
And then I found a true (in mathematics) definition of idempotence:
A function f(x) is idempotent if f(x) = f(f(x)).
To reiterate: The originator made a mistake: It is NOT true that idempotence is defined as “the same input always gives the same output”.
That is to say, if a function is idempotent and you apply that function to x, then you apply the function again to the return value, you still get the same result. Keep taking the result of each pass and sending it back into the function, and you still get the same result.
At this point I didn’t know about the difference between mathematical idempotence and software idempotence, and I was happy with my new definition: In our hosts file example, if our function takes the file content as an input and outputs the transformed result as an output, you can keep reapplying the function and you will keep getting the same result.
Using our new definition, we can easily see how f(x) = x + 1 is not idempotent, but f(x) = x * 1 is idempotent.
So far so good. But then I found this article claiming that pure functions are always idempotent, and my head exploded.
A pure function is one that has no side effects and no hidden state. The example given was this one:
f(a,b) = a + b
My confusion stemmed from two sources: Firstly, how can I apply my definition above – f(x) = f(f(x)) to this new example? It takes two parameters, but only returns one result! But secondly, how can it possibly be idempotent? It’s deterministic, yes, but any way you can find of repeatedly applying the same function to a new output will surely produce a different result? And what on earth does its pureness have to do with anything??
Well. I asked a bunch of clever people, and discovered that I had been dealing with the mathematical definition of idempotence, which is subtly different to the software engineering definition.
In software engineering, it’s all about state. My hosts file example was flawed because I had assumed that the hosts file content was being passed in as an input and then returned as an output. In fact, we are talking about a function that acts on the hosts file. This function’s input may be the hosts file path. Its output may be some kind of success code. It is not a pure function, because it will have the side effect (sometimes) of altering the state of the hosts file.
BUT our idempotent hosts-file-editing function can be run several times, and its effect on state will always be the same. No matter how many times we run this function, the hosts file will always be impacted in the same way.
The article that confused me so much was in fact making a very simple point: Pure functions are idempotent because they do not alter state. Therefore state is always impacted in the same way by multiple calls to a pure function, because it is simply not impacted. So in reality, most idempotent functions are not pure, but all pure functions are idempotent.
One common example of idempotence in software engineering is the HTTP specification – which states that GET, PUT and DELETE requests should all be idempotent, but POST should not.
At this point I will quote my colleague Mouad (and the Stormpath blog), who between them say this:
“The HTTP RFC have a better definition which goes:
A request method is considered “idempotent” if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
The ‘intended effect’ as defined above is not the same thing as the returned value, example: calling PUT two times may return a different result in the second call (e.g. 409 conflict), but a PUT is still idempotent if the state and effect didn’t change by the second call, in other words, ‘HTTP idempotency only applies to server state – not client state’ ref. https://stormpath.com/blog/put-or-post.”
Hopefully your head has not exploded. Or if it did, I manage to unexplode it and return it to its former state. Hopefully also, no matter how many more times you read this post, your head will remain unexploded. And we have ourselves an idempotent blog post. Voila!