Archived from http://www.hacknot.info/hacknot/action/showEntry?eid=59, which
is sadly no longer available.
"Where else can one get such a marvelous return in conjecture from such a modest investment of fact?" - Mark Twain
Numerology is the study of the occult meanings of numbers and their influence on human life [1]. Numerologists specialize in finding numeric relationships between otherwise disparate figures, and attributing to them some greater significance.
For instance, some claim that by adding up the component numbers in your birth date, together with the numeric equivalent of your name (where A=1, B=2 etc.), you can derive a figure that, if properly interpreted, yields insight into your personality [1].
Others consider that the recurrence of the number 19 in Islamic texts is evidence of their authorship by a higher being [2]. The Koran has 114 (6 x 19) chapters, 6,346 (19 x 334) verses and 329,156 (19 x 17,324) letters. The word "Allah" appears 2,698 (19 x 142) times. The sum of the verse numbers that mention Allah is 118,123 (19 x 6,217).
Pyramids are a favorite topic for numerologists, and there are dozens of "meaningful" numeric relationships to be found in their dimensions. For instance, the base perimeter of the Great Pyramid of Cheops is about 36,524 inches - 100 times the number of days in the solar year. And so on.
We can laugh at such desperate searches for meaning, but before we laugh too hard we should consider that software development has its own brand of numerology, which we have given the grand name of Function Point Analysis (FPA).
FPs were proposed in 1979 as a way of determining the size of a piece of software given only its functional specification. The intent was that the FP count of an application would be independent of the technology, people and methods eventually used to implement it, since the count depends only on the functionality the application provides to the user. Broadly speaking, basic FPs are calculated by following these steps:
1. Identify the application's components of five types: external inputs, external outputs, external inquiries, internal logical files and external interface files.
2. Classify each component as simple, average or complex, according to the number of data elements and files it involves.
3. Multiply the number of components of each type and complexity by the corresponding weighting factor, and sum the results to give the Unadjusted Function Point count (UFP).
You can then multiply the UFP by a Value Adjustment Factor (VAF), derived from a consideration of 14 general system characteristics (each rated on a scale of 0 to 5), to yield the final Function Point count.
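For the curious, here is roughly what that arithmetic looks like in code. It is only a sketch: the weights are the commonly published IFPUG values, the VAF formula is the standard 0.65 + 0.01 x (sum of the 14 ratings), and the component counts and characteristic ratings below are invented for illustration.

    # A sketch of the basic FP arithmetic described above. The weights are the
    # commonly published IFPUG values; the component counts and the 14 general
    # system characteristic ratings are hypothetical.

    WEIGHTS = {  # weight per component, by assessed complexity
        "external_input":          {"simple": 3, "average": 4,  "complex": 6},
        "external_output":         {"simple": 4, "average": 5,  "complex": 7},
        "external_inquiry":        {"simple": 3, "average": 4,  "complex": 6},
        "internal_logical_file":   {"simple": 7, "average": 10, "complex": 15},
        "external_interface_file": {"simple": 5, "average": 7,  "complex": 10},
    }

    def unadjusted_fp(components):
        """Sum the weighted component counts to give the Unadjusted Function Points."""
        return sum(count * WEIGHTS[ctype][complexity]
                   for (ctype, complexity), count in components.items())

    def value_adjustment_factor(gsc_ratings):
        """VAF = 0.65 + 0.01 * (sum of the 14 characteristics, each rated 0-5)."""
        assert len(gsc_ratings) == 14 and all(0 <= r <= 5 for r in gsc_ratings)
        return 0.65 + 0.01 * sum(gsc_ratings)

    # Hypothetical counts. Note how much hinges on the subjective decision to
    # call a given component "simple", "average" or "complex".
    components = {
        ("external_input", "average"):         12,
        ("external_output", "complex"):         5,
        ("external_inquiry", "simple"):         8,
        ("internal_logical_file", "average"):   4,
        ("external_interface_file", "simple"):  2,
    }
    gsc = [3, 2, 4, 1, 0, 3, 2, 5, 1, 2, 3, 0, 1, 2]  # 14 subjective ratings, 0-5

    ufp = unadjusted_fp(components)
    print("UFP =", ufp, "adjusted FP =", round(ufp * value_adjustment_factor(gsc), 1))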
I won't bore you with the excruciating specifics of the component calculations. The above gives you some idea of the nature of FP counting and its reliance upon subjective judgments. Specifically, the placement of component boundaries and the values chosen for the many weighting factors and characteristics are all determined on a subjective basis. Some of that subjectivity has been embodied in the standardized FP counting rules issued by the International Function Point Users Group (IFPUG).
So lacking have FPs been found that there has been a steady stream of proposed improvements and alternatives to them since 1979. But none of these have challenged the basic FP ethos of modeling functional size as a weighted sum of arbitrarily selected attributes. They simply change the number and definition of those attributes, and the means by which they are mangled together into a final figure. The basic chronology of the FP family tree has been:
o 1979 - Function Points (Albrecht)
o mid-1980s - Feature Points (Jones)
o 1988 - Mark II Function Points (Symons)
o early 1990s - 3D Function Points (Boeing)
o late 1990s - Full Function Points and COSMIC-FFP
To understand why the FP and its many variants are fundamentally flawed, it is first necessary to understand the difference between measuring and rating.
To measure an attribute of something is to assign numbers to it on an objective and empirical basis, so that the relationships between the numbers preserve any intuitive notions and empirical observations about that attribute [5]. For example, the meter is a measure of length, which implies:
o An object 2 meters long is twice as long as an object 1 meter long.
o The difference between 3 meters and 4 meters is the same as the difference between 4 meters and 5 meters.
o Lengths in meters can be meaningfully added, subtracted and averaged.
To rate an attribute of something is to assign numbers to it on a subjective and intuitive basis. The relationships between the numbers do not preserve the intuitive and empirical observations about the attribute. In contrast to the above example, consider the rating out of 10 that a reviewer gives a movie:
o A movie rated 8 is not necessarily twice as good as a movie rated 4.
o The difference in quality between a 6 and a 7 is not necessarily the same as the difference between a 7 and an 8.
o Adding or averaging the ratings of different reviewers yields a number with no defined meaning.
To clarify, suppose a reviewer expresses their assessment of a movie in words rather than numbers. Instead of rating a movie from 1 - 10, they rate it from "abysmal" to "magnificent". We might be tempted to think a movie that gets an 8 is twice as good as a movie that gets a 4, but we would surely not conclude that "very good" is twice as good as "disappointing". We can express a rating using any symbols we want, but just because we choose numbers for our symbols does not mean that we confer the properties of those numbers upon the attribute we are rating.
In summary:
o A measurement is objective and empirical; a rating is subjective and intuitive.
o The arithmetic relationships between measurements mirror real relationships in the attribute being measured; the arithmetic relationships between ratings do not.
o Measurements can therefore be meaningfully added, subtracted, multiplied and divided; ratings cannot.
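The point can be made concrete with a small illustration (the numbers are invented). A length survives a change of units with its ratios intact, because those ratios reflect something real about the objects. A reviewer's scale can be re-labelled with any other order-preserving set of numbers without changing the opinions it expresses, and the ratios do not survive:

    # Lengths are measurements: converting meters to feet (an admissible change
    # of units) preserves the ratio between any two lengths.
    a_m, b_m = 4.0, 2.0
    print(a_m / b_m, (a_m * 3.28084) / (b_m * 3.28084))   # 2.0 and 2.0

    # Ratings are only an ordering. Re-labelling the 1-10 scale with any other
    # increasing sequence expresses the same opinions, but the "ratio" changes.
    ratings = {"film_x": 8, "film_y": 4}                   # hypothetical reviews
    relabel = {1: 1, 2: 2, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 9.5, 10: 10}
    print(ratings["film_x"] / ratings["film_y"])                     # 2.0
    print(relabel[ratings["film_x"]] / relabel[ratings["film_y"]])   # 1.8

No amount of arithmetic performed on the second pair of numbers tells you anything that the bare ordering did not already tell you.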
From the above, it is clear that FPs are a rating and not a measurement, due to the subjective manner in which they are derived. Hence, they cannot be manipulated mathematically. And yet the software literature is rife with examples of researchers attempting to do just that. Many researchers and reviewers continue to ignore the fundamental implications of the non-mathematical nature of the FP, such as:
o It is meaningless to add, subtract or average FP counts.
o "Function Points per staff-month" and "cost per Function Point" are not valid measures of productivity or cost.
o "Defects per Function Point" is not a valid measure of quality.
o FP counts obtained in different contexts, or by different counters, cannot be meaningfully compared as numbers.
Given such limitations, there are very few valid uses of an application's FP count. If the FP counts of two applications differ markedly, and their contexts are sufficiently similar, then you may be justified in saying that one is functionally bigger than the other, but not by how much [3]. The notion that FPs can participate in mathematical calculations, and thereby be used for scheduling, effort and productivity measures, is without theoretical or empirical basis.
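If FP counts are to be used at all, then, the only defensible operation on them is this kind of ordinal comparison. A sketch of that restricted use follows; the 30% threshold for what counts as "markedly" different is an arbitrary choice of mine, not part of any counting standard.

    # The only use the above argument permits: an ordinal comparison of the FP
    # counts of two applications from sufficiently similar contexts. The 30%
    # threshold for "markedly different" is arbitrary and hypothetical.
    def compare_functional_size(fp_a, fp_b, threshold=0.30):
        if abs(fp_a - fp_b) <= threshold * max(fp_a, fp_b):
            return "no meaningful difference"
        return "A is functionally bigger" if fp_a > fp_b else "B is functionally bigger"
        # Deliberately absent: ratios, differences, "A is 40% bigger", effort estimates.

    print(compare_functional_size(620, 410))   # "A is functionally bigger" - but not by how much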
Although their use may have declined in recent years, Function Points are still quite popular. There are several factors which might account for their continued usage, despite their essential invalidity:
o As long as FPs work, who cares what basis they have or don't have? - The problem is that in general, FPs don't work. Even FP adherents will admit to the numerous shortcomings of FPs, and the need to constrain large numbers of contextual factors when applying them. Witness the various mutations of FP that have arisen, each attempting to address some subset of the numerous failings of FPs.
o It doesn't matter if you're wrong, as long as you're wrong consistently [8] - Unfortunately, unless you know why you're wrong, you have no way of knowing whether you are indeed being consistently wrong. FPs are sensitive to a great many contextual factors. Unless you know what they are and the precise way they affect the resulting FP count, you have no way of knowing the extent to which your results have been influenced by those factors, let alone whether that influence has been consistent.
FPs have attracted their own league of True Believers - like many technical schools whose tenets, lacking an empirical basis, can only be defended by the emotional invective of their adherents. I encountered one such adherent recently in David Anderson, author of "Agile Project Management". Anderson made some rather pompous observations [9] on his blog as to how surprising it was that people should express disbelief regarding his claims of 5 and 10-fold increases in productivity using TDD, AM and FDD. I replied that their incredulity might stem from the boldness of his claims or the means by which he collected his data, rather than from an inherently obstreperous attitude. He indicated that his productivity data was expressed in FPs per unit time! I tried explaining to him that FPs cannot be used to measure productivity, because not all FPs are created equal, as explained above. He wasn't interested. That discussion has now been deleted from his blog. He also denied me permission to reproduce the portion of it that occurred in email.
Such is the attitude I typically encounter when dealing with self-styled gurus and experts. There is much talk of science and data, but as soon as you express doubt regarding their claims, there is a quick resort to insult and posturing. Ironic, given that doubt and criticism are the basic mechanisms that give science the credibility in which such charlatans seek to cloak themselves.
The appeal, and hence the popularity, of FPs is their reduction of the complex notion of software functional size to a single number. The simplicity is attractive. But what basis is there for believing that such a single-figure expression of functional size is even possible?
Consider this analogy [3]. When you walk into a clothing store, you characterize your size using several different measures: one figure for shirt size, another for trouser size, another for shoe size and another for hat size. What if, by way of misguided reductionism, we were to try to concoct a single measure of clothing size and call it Clothing Points? We could develop all sorts of rules and regulations for counting Clothing Points, including weighting factors accounting for age, diet, race, gender, disease and so on. If we sufficiently controlled the influence of external factors then, given the limited variations of the human form, we might eventually find some limited context in which Clothing Points were a semi-reasonable assessment of the size of all items of clothing. We could then walk into a clothing store and say "My size is 187 Clothing Points" and get a size 187 shirt, size 187 trousers, size 187 shoes and size 187 hat. The items might even fit, although we would likely sacrifice some comfort for the expediency and convenience of having reduced four dimensions down to a single dimensionless number.
The search for a grand unified "measure" of functional size may be just as foolhardy as the quest for uni-metric clothing.
The continued use and acceptance of Function Point Analysis in software development should be a source of acute embarrassment to us all. It is a prime example of muddle-headed, pseudo-scientific thinking that has persisted only because of the general ignorance of measurement theory and valid experimental methodology that exists in the development community. We need to stop fabricating and embellishing arbitrary sets of counting rules. In doing so, we are treating these formulae as if they were incantations whose magic can only manifest when precisely the correct wording has been discovered, but whose inner workings must forever remain a mystery. Rather, we need to go back to basics and work towards understanding the fundamental technical dimensions that contribute to the many and varied notions of an application's functional size. How can we hope to measure something when we can't even precisely define what that something is? Empiricism holds some promise as a means to improve software development practices, but the pseudo-empiricism of Function Point Analysis is little more than numerological voodoo.