This article was originally published in the Summer 2011 issue of Methods & Tools


Everything You Always Wanted to Know About Software Measurement

Sue Rule & P. Grant Rule, s.rule @ smsexemplar.com
Software Measurement Services Ltd., www.smsexemplar.com

What Does Functional Size Measurement Mean?

When Grant Rule was working with early pioneers of agile methods back in the late 1980s, estimating was a big issue. No one, it seemed, knew how to determine accurately in advance how big a project was, how long it would take, or how much it would cost. This was particularly a problem for a software house delivering fixed price contracts. Focusing on rapid, iterative delivery to ensure the user gets the right software product is good; but at what price?

While advocates of Lean-Agile rightly emphasise the importance of delivering the right thing, it is essential that professional software managers also keep reliable accounts of software process performance for estimating costs, auditing performance, and improving productivity. Effectiveness is delivering optimum value for the minimum cost.

The solution to Grant's estimating issue, and to many associated software project control issues, was the function point measurement method developed by Allan Albrecht in 1979 and now maintained by IFPUG. Yet for various reasons, functional size measurement has failed to become widely adopted as a software project management tool. Perhaps it seems arcane - difficult - part of that top-heavy bureaucracy development teams want to be rid of. But as a result, estimating remains a widespread problem for Agile and non-Agile developers alike. Process performance remains hugely variable and unpredictable, with measurement often still the most neglected area of process improvement. By stripping away much of the dysfunctional and redundant process control that can accumulate around the software process, the current interest in Agile is bringing many of these issues to the surface. Yet many IT professionals seem stuck in a rut, ploughing new land with a newly Agile team but using the same blunt plough and running into the same obstacles.

CIOs, bid teams, development teams, vendor managers and legal advisors seeking better ways of contracting - particularly for Lean-Agile software development - are still faced with the dilemmas Albrecht was seeking to tackle. How do you know your development teams - in-house or outsourced - are delivering best value for money? How does a supplier manage the risk of a fixed-price contract, and demonstrate competitive productivity levels and on-going improvement? How will Agile benefit the organisation? How does the business manage the risk if the requirements cannot be detailed up-front? How do you measure the benefits of outsourcing IT? Which technology should be chosen, and will a change of technology affect the value delivered? What are the relative merits of buying, building or outsourcing?

Allan Albrecht's answer to these issues was to measure software size in terms of the output delivered to the user - its functionality - rather than simply in terms of the time and effort put into the development. Deriving a software size from the user's requirements, which can then be mapped to the time, resources and budget needed to realise those requirements, gives you the objectivity needed to tackle scope, project and performance control issues. It also takes you at least part way to measuring the business value and benefit delivered by software. At least you know how much it will cost and how long it will take to deliver the specified functionality; the software should do what it "says on the tin" and it should be delivered on schedule, within budget. If the customer has bought beans instead of tomatoes, or needed a three-course meal and not a tin of beans at all, there is something amiss elsewhere in the process.

Functional size measurement means the ability to make objective performance comparisons across teams, across market sectors, and between different suppliers. It means reliable productivity and pricing benchmarks. It means accurate baselines and measures of improvement. It means early and accurate estimations of software project cost and duration. That was why Grant Rule first adopted the technique back in the 1980s - and remains a knowledgeable advocate of it today.

Estimating

The predictability of software projects remains a major issue for many business managers. Agile can exacerbate this if delivered software products or software product components have to be routinely re-factored. Customers considering outsourced Agile development will rightly require some assurance of value for money, and suppliers need a reliable oversight of costs to ensure prices remain competitive. The ‘Story Point’ measures used by many Agile teams to estimate and manage projects are little more than a new take on our old friend, the measure of input. Estimates using Story Points rely on team knowledge and capability, based on individual experience. The process of arriving at them is even known as Planning Poker, an indication of the level of personal skill and informed guesswork required. Story Points cannot be compared between one team and another (let alone one market sector or one supplier and another). Their use either as an effective and reliable estimating tool or as objective and comparable measures of productivity is therefore very limited.

In 2010, Grant Rule published a short paper on using the most modern of the functional size methods - COSMIC - to derive accurate estimates from User Stories (used by Agile teams to capture requirements). As measurement specialists, SMS recommend the use of COSMIC as the quickest, most reliable, and most easily learned method of introducing robust software estimating practice for developers using Agile or more traditional development practices.
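
To give a flavour of the approach (this example is invented for illustration and is not taken from the paper), a User Story is sized by identifying the functional process(es) it triggers and counting their data movements - Entry, eXit, Read and Write - at one COSMIC function point (cfp) each:

    # Illustrative COSMIC sizing of one invented user story:
    # "As a customer, I want to place an order and receive confirmation."
    # Each data movement (Entry, eXit, Read, Write) contributes 1 cfp.
    data_movements = [
        "Entry: order details from the customer",
        "Read:  product prices",
        "Write: store the order",
        "eXit:  confirmation or error message",
    ]
    size_cfp = len(data_movements)
    print(f"Functional size: {size_cfp} cfp")   # -> 4 cfp for this story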

Size Matters - Accurate Early Estimating

Project failure rates have been shown to relate strongly to the size of the project undertaken, with the largest projects accounting for the highest failure rate. Rule's Relative Size Scale is a table developed by Grant Rule to provide a quick and easy early range estimate of project size and cost in terms of clothing sizes - Small, Medium, Large, Extra Large. The scale uses benchmark data to provide a valuable and reliable early-stage feasibility check.

COCOMO II Profiling - Accounting for Non-Functional Requirements

The Constructive Cost Model (COCOMO) is an algorithmic software cost estimation model developed by Barry Boehm. The model uses a basic regression formula, with parameters that are derived from historical project data and current project characteristics. COCOMO enables more refined estimates to be derived from simple unadjusted function points by taking into account differences in non-functional project attributes (Cost Drivers). Detailed COCOMO can also account for the influence of individual project phases.

COCOMO II was released in 2000 to adapt the original model to estimating software projects under modern development conditions such as incremental development and the Rational Unified Process. COCOMO II continues to evolve to keep pace with changing software methods.
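
The shape of the model is easy to sketch. The fragment below is a simplified, illustrative rendering of the COCOMO II effort equation (effort = A x size^E x the product of the effort multipliers, with E = B + 0.01 x the sum of the scale factors); A = 2.94 and B = 0.91 are the published COCOMO II.2000 calibration constants, while the size, scale-factor and cost-driver values - and the 53-lines-per-function-point conversion ratio - are assumptions made up for this example.

    # Simplified sketch of the COCOMO II (Post-Architecture) effort equation.
    # A and B are the COCOMO II.2000 calibration constants; all other numbers
    # below are illustrative assumptions, not calibrated data.
    A, B = 2.94, 0.91

    def cocomo2_effort(size_ksloc, scale_factors, effort_multipliers):
        """Return estimated effort in person-months."""
        exponent = B + 0.01 * sum(scale_factors)
        effort = A * size_ksloc ** exponent
        for em in effort_multipliers:   # non-functional Cost Drivers (e.g. RELY, CPLX)
            effort *= em
        return effort

    # Example: 200 unadjusted function points, converted to KSLOC with an
    # assumed ratio of 53 source lines per function point.
    size_ksloc = 200 * 53 / 1000
    effort_pm = cocomo2_effort(size_ksloc,
                               scale_factors=[3.72, 3.04, 4.24, 3.29, 4.68],
                               effort_multipliers=[1.10])
    print(f"Estimated effort: {effort_pm:.1f} person-months")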

What are Function Points?

Function Points measure the end user's requirements in terms of data movements. This means that an early functional size estimate can be derived before any code has been written. Function Points are then also used to measure the actual cost of developing those requirements, which can be compared directly with the estimate and used to calibrate a scale for a particular development environment. If development productivity figures are known, the cost of developing new software can therefore be estimated early enough to make direct comparisons with the cost of buying a software package, or simply using a non-technical solution. Function Point measures are independent of technology, so they can be used to compare different suppliers, technologies and teams, giving an objective basis for a cost-benefit decision.
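
For example (all figures below are invented, purely to show the arithmetic), an early functional size combined with a team's historical delivery rate yields an effort and cost estimate that can be set against the price of a package:

    # Illustrative early estimate from a functional size (all figures invented).
    size_fp = 250                   # estimated functional size of the requirements
    productivity_fp_per_pm = 10     # delivery rate from historical project data
    cost_per_person_month = 8000    # fully loaded cost per person-month

    effort_pm = size_fp / productivity_fp_per_pm     # 25 person-months
    build_cost = effort_pm * cost_per_person_month   # 200,000
    print(f"Estimated effort: {effort_pm:.0f} person-months, "
          f"estimated cost: {build_cost:,.0f}")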

Because they encompass both cost (development effort consumed) and benefit (requirements fulfilled), Function Points unlock the door to measuring productivity; velocity; quality; and value in software development.

  • The capacity to measure the amount of output produced for the input provided gives an objective productivity figure.
  • Measured against time - e.g. FP delivered per elapsed month - function points provide objective and comparable measures of velocity.
  • The number of defects detected over the total size of the delivery - i.e. defects per FP delivered - gives a measure of quality (a simple sketch of these calculations follows this list).
  • By comparing costs, effort and time expended per function point delivered, we can also demonstrate the huge waste generated by ineffective software processes compared to the standards of measurable efficiency shown by effective, Right-shifted organisations. Although a long way from being a comprehensive measure of value delivered to the customer, this offers a measure of the component of value for which the software developer is solely responsible. Ensuring the functionality delivered is aligned to business need is the joint responsibility of the business user and the development team, and is outside the scope of functional size measurement.
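
A minimal sketch of these derived measures, using invented figures, might look like this:

    # Illustrative project-level metrics derived from functional size
    # (all figures invented).
    delivered_fp = 180       # functional size delivered
    effort_pm = 20           # person-months consumed
    elapsed_months = 5       # calendar duration
    defects_found = 27       # defects detected in the delivered software

    productivity = delivered_fp / effort_pm          # 9.0 fp per person-month
    velocity = delivered_fp / elapsed_months         # 36 fp per elapsed month
    defect_density = defects_found / delivered_fp    # 0.15 defects per fp

    print(f"Productivity: {productivity:.1f} fp/pm, velocity: {velocity:.0f} fp/month, "
          f"defect density: {defect_density:.2f} defects/fp")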

A Comparison of Functional Size Methods

The most widely used function point methods are IFPUG, NESMA, Mk II and COSMIC. Of these, IFPUG is based on the original method designed by Allan Albrecht. COSMIC is the newest method, developed by an international team of metrics experts to support modern software.

Where organisations have already made some investment in IFPUG capability, it may make sense to use this more widely known method. It is actively managed and maintained by the International Function Point User Group, with the latest version 4.3 released in January 2010. The Certified Function Point Specialist qualification is a recognised standard for IFPUG counting competency. There is considerable IFPUG benchmarking data available, although it is questionable how valuable some of the older data is in terms of benchmarking modern software performance.

While the older functional size measures - typically, IFPUG - can be used for Agile projects, there are significant problems. IFPUG counting is not the easiest process to learn or apply, and even with trained practitioners, time can be wasted debating fuzzy areas. Interpretation of the IFPUG counting rules for complex modern software environments can be tricky, sometimes resulting in differences of opinion between supplier and customer.

COSMIC offers all the advantages of functional size measurement, without some of the shortcomings of earlier methodologies which were not designed with 21st century software in mind. It will prove easier to introduce, easier for both business users and software developers to understand and apply, and a more effective communication mechanism for negotiating the delivery of value. It is an established standard with recognised training qualifications. There is a growing repository of benchmark data available.

See the table below for a more detailed comparison.

Types of measurement scale and permissible operations using them

The type of scale depends on the nature of the relationship between values on the scale. Four types of scale are commonly defined:

Nominal – arbitrary labels, classification data, no ordering – the measurement values are categorical but it makes no sense to state that one category is ‘greater than’ another. For example: Yes/No; Black/White/Yellow/Red; male/female, animal/vegetable/mineral; the classification of defects by their type.

Ordinal – ordered but differences between values are not important – the measurement values are rankings. For example: restaurant ‘star’ ratings; political parties on left to right of the spectrum are given labels Red, Orange, Blue; Likert scales that rank ‘user satisfaction’ on a scale of 1..5; the assignment of a severity level to defects.

Interval – ordered, constant scale, but no natural zero – the measurement values have equal distances corresponding to equal quantities of the attribute. For example: dates, or temperature on the Celsius or Fahrenheit scales – differences make sense, but ratios do not (e.g. 30° − 20° = 20° − 10°, but 20° is not twice as hot as 10°). Another example: cyclomatic complexity has a minimum value of one, but each additional path increments the count by one.

Ratio – ordered, constant scale, natural zero – the measurement values have equal distances corresponding to equal quantities of the attribute, and the value of zero corresponds to none of the attribute. For example: height; weight; age; length; temperature on the Kelvin scale (absolute zero = 0 K, and 200 K is twice as hot as 100 K); the size of a software source listing in terms of Non-Commentary Source Statements (or Source Lines Of Code).

The method of measurement usually affects the type of scale that can be used reliably with a given attribute. For example, subjective methods of measurement usually only support ordinal or nominal scales.

Only certain operations can be performed on certain scales of measurement. The following list summarizes which operations are legitimate for each scale. Note that you can always apply operations from a 'lesser scale' to any particular data, e.g. you may apply nominal, ordinal, or interval operations to an interval scaled datum.

Nominal Scale. You are only allowed to examine whether a nominal scale datum is equal to some particular value, or to count the number of occurrences of each value. For example, gender is a nominal scale variable: you can examine whether the gender of a person is F (female), or count the number of Ms (males) in a sample. Valid statistics: mode, chi square.

Ordinal Scale. You are also allowed to examine whether an ordinal scale datum is less than or greater than another value. Hence, you can 'rank' ordinal data, but you cannot 'quantify' differences between two ordinal values. For example, political party is an ordinal datum, with the Liberal Democratic Party to the left of the Conservative Party, but you can't quantify the difference. Another example is preference scores, e.g. ratings of eating establishments where 10 = good and 1 = poor: the difference between an establishment with a 10 rating and one with an 8 rating can't be quantified. Valid statistics: mode, chi square, median, percentile.

Interval Scale. You are also allowed to quantify the difference between two interval scale values, but there is no natural zero. For example, temperature scales are interval data: 25°C is warmer than 20°C, and a 5°C difference has some physical meaning. Note that 0°C is arbitrary, so it does not make sense to say that 20°C is twice as hot as 10°C. Valid statistics: mode, chi square, median, percentile, mean, standard deviation, correlation, regression, analysis of variance.

Ratio Scale. You are also allowed to take ratios among ratio scaled variables. Physical measurements of height, weight, and length are typically ratio variables. It is now meaningful to say that 10 metres is twice as long as 5 metres. This ratio holds true regardless of which scale the object is being measured in (e.g. metres or yards). This is because there is a natural zero. Valid statistics: mode, chi square, median, percentile, mean, standard deviation, correlation, regression, analysis of variance, geometric mean, harmonic mean, coefficient of variation, logarithms.
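
As a simple illustration of why the scale type matters, the sketch below applies only the statistics that are legitimate at each level; the data values are invented for the example.

    # Illustrative use of statistics appropriate to each scale type (invented data).
    from statistics import mode, median, mean
    from datetime import date

    # Nominal: defect categories - only counting and the mode are meaningful.
    defect_types = ["UI", "logic", "UI", "data", "UI"]
    print("Most common defect type:", mode(defect_types))

    # Ordinal: defect severity rankings (1 = cosmetic .. 5 = critical) -
    # median and percentiles are meaningful; the 'distance' between 2 and 3 is not.
    severities = [1, 3, 3, 4, 5, 2]
    print("Median severity:", median(severities))

    # Interval: dates (no natural zero) - differences are meaningful, ratios are not.
    start, end = date(2011, 1, 10), date(2011, 3, 1)
    print("Elapsed days:", (end - start).days)

    # Ratio: functional size in COSMIC function points (natural zero) -
    # means and ratios are both meaningful.
    sizes_cfp = [120, 240, 60]
    print("Mean size:", mean(sizes_cfp), "cfp; largest is",
          max(sizes_cfp) / min(sizes_cfp), "times the smallest")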

A comparison of the most common Functional Size Measurement (FSM) methods

General Information

The four methods compared below are IFPUG FPA r4.3, NESMA FPA v2.0, Mark II FPA r1.3.1 and COSMIC FSM r3.0.1 (abbreviated here as IFPUG, NESMA, MkII and COSMIC).

Origin

  • IFPUG: Created by Allan Albrecht at IBM in 1978; the latest release (January 2010) of the original method.
  • NESMA: Believed to have been created by NESMA (aka NEFPUG) in the mid-1980s; derived from IFPUG.
  • MkII: Created by Charles Symons at Nolan Norton in 1984 (put into the public domain in 1991); updated the method for use with DBMS, structured methods, CASE tools, etc.
  • COSMIC: Created by an international consortium of industry subject-matter experts and academics from 19 countries in 1997; updated the method for use with OOA/D, layered architectures, Web 2.0, lean/agile development, etc.

Counting Practices Manual

  • IFPUG: Available to IFPUG members.
  • NESMA: Available for sale.
  • MkII: Available – public domain.
  • COSMIC: Available – public domain.

Counting Practices Manual – languages available

  • IFPUG: English and some other language versions, available to members.
  • NESMA: Dutch-language and English-language versions.
  • MkII: English-language version.
  • COSMIC: 9 language versions: Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Spanish.

Used by

  • IFPUG: Public & private sector organisations, large & small, both customers & vendors, around the world. Mostly MIS users. Stable user base – international.
  • NESMA: Public & private sector organisations, large & small, both customers & vendors, primarily in The Netherlands. Mostly MIS users. Declining user base – mostly The Netherlands.
  • MkII: Originally HM Government's preferred method for sizing & estimating software; now used by a few public sector customers & their vendors. Declining user base – mostly the United Kingdom.
  • COSMIC: Public & private sector organisations, large & small, both customers & vendors, around the world. MIS and engineering users. Growing user base – international.

Terminology used

  • IFPUG: Terminology founded in the 1970s.
  • NESMA: Terminology founded in the 1970s.
  • MkII: Uses structured methods terminology.
  • COSMIC: Compatible with OOA/D & software engineering principles.

Availability

  • IFPUG: Available only to members of IFPUG (but the organisation is easy to join).
  • NESMA: Public domain – download from NESMA.
  • MkII: Public domain – download from UKSMA.
  • COSMIC: Public domain – download from COSMIC.

Design Authority (independent of vendors)

  • IFPUG: International Function Point Users Group (IFPUG), www.ifpug.org
  • NESMA: Netherlands Software Metrics Association (NESMA), www.nesma.nl
  • MkII: United Kingdom Software Metrics Association (UKSMA), www.uksma.co.uk
  • COSMIC: COmmon Software Measurement International Consortium (COSMIC), www.cosmicon.com

Common Features

1. Compliance

All four methods comply with ISO/IEC 14143, the international standard for Functional Size Measurement methods:

  • IFPUG: ISO/IEC 20926:2003. The ISO standard applies only to unadjusted FP.
  • NESMA: ISO/IEC 24570:2005. The ISO standard applies only to unadjusted FP.
  • MkII: ISO/IEC 20968:2002. Recommended method for HM Government (UK).
  • COSMIC: ISO/IEC 19761:2003/2010. BCS Technology Award winner in 2006; recognised as a national standard in Spain & Japan.

2. Certification

All four methods operate certification schemes for training measurement staff:

  • IFPUG: Yes – Certified Function Point Specialist (CFPS).
  • NESMA: Uses the IFPUG CFPS.
  • MkII: Yes – Certified Function Point Analyst (CFPA).
  • COSMIC: Yes – COSMIC Practitioner Certification.

3. Benchmarking Data

All four methods are supported by the International Software Benchmarking Standards Group (ISBSG). There are differences, however, in the size of the comparative data pool:

  • IFPUG: Large data set compiled over many years – the utility of antique data is questionable.
  • NESMA: Large; comparisons use IFPUG data.
  • MkII: Small; some native data; can be compared with IFPUG data if care is taken.
  • COSMIC: Moderate and growing; data since 1997; an ISBSG benchmark was released in 2009; can be compared with older data if care is taken.

All four methods share the following characteristics:

  • Oriented toward user-required functionality
  • Helps verify consistency & completeness of user-required functionality
  • Analyses can be used as basis for construction of tests independent of code & test activities
  • Measures functional size of dynamic (behavioural) aspects of system (expressed as e.g. use cases, conversational dialogues, user stories, epics & themes, etc)
  • Measures development of new requirements
  • Measures adaptive maintenance (enhancements)
  • Designed for MIS systems - flat & indexed files, batch systems, OLTP systems
  • Can be used to measure Functional User Requirements before design, code & test
  • Can be used to measure Functional User Requirements after design, code & test
  • Can be used to (re)estimate during product life-cycle
  • Size can be used as input into top-down software cost models such as COCOMO II.2000, SLIM, SEER, Price-S, etc
  • Can be used to construct product burndown charts, calculate takt time, #sprints, etc
  • Independent of product non-functional requirements
  • Independent of project constraints
  • Independent of developer experience
  • Independent of process, project management & development methods
  • Early estimates of functional size can be made based on incomplete knowledge of Functional User Requirements – enabling consistent use of one size scale for estimating & measurement throughout project:

      - IFPUG: Can produce early estimates using various methods, e.g. Fast Eddy, File-Based Approach, Transaction-Based Approach.
      - NESMA: Can produce early estimates using various methods, e.g. Fast Eddy, File-Based Approach, Transaction-Based Approach.
      - MkII: Can produce early estimates using various methods, e.g. Data Model Approach (CRUDL), Transaction-Based Approach.
      - COSMIC: Can produce early estimates using various methods, e.g. Event-Based Approach, Object-Based Approach, Story-Based Approach.

None of the four methods:

  • Measures corrective maintenance (fixes)
  • Measures perfective maintenance (refactoring for improved performance)
  • Measures algorithmic complexity
  • Measures reuse of code

Differences between the four main methods

Measures functional size of the static (data storage) aspects of a system (expressed as files, tables, entity types, classes, etc)

  • IFPUG: Yes.
  • NESMA: Yes.
  • MkII: Regarded as 'double accounting' – only information processing is measured.
  • COSMIC: Regarded as 'double accounting' – only information processing is measured.

Compatible with modern methods of requirements analysis

  • IFPUG: Partially (1975–85 concepts); requires a data model.
  • NESMA: Partially (1980–85 concepts); requires a data model.
  • MkII: Yes (1980–95 concepts); requires a data model.
  • COSMIC: Yes (1995–2010 concepts), including incremental development.

Designed for MIS systems – relational DBMS

  • IFPUG: No, but mapping rules have been developed.
  • NESMA: No, but mapping rules have been developed.
  • MkII: Yes.
  • COSMIC: Yes.

Designed to be applicable to real-time and/or embedded systems

  • IFPUG: No – MIS concepts only.
  • NESMA: No – MIS concepts only.
  • MkII: No – terminology can be re-interpreted for real-time.
  • COSMIC: Yes – one common model applicable across MIS, real-time & embedded systems.

Can be used to measure complex, layered architectures

  • IFPUG: No – rules assume a monolithic system; infrastructure & middleware are 'invisible'.
  • NESMA: No – rules assume a monolithic system; infrastructure & middleware are 'invisible'.
  • MkII: Yes, but limited – can recognise a 3-tier architecture.
  • COSMIC: Yes – designed to recognise 'layered architectures'; measures all functional requirements allocated to software systems.

Scale type (nominal – distinguishes members of sets, unordered; ordinal – relationship between sets, unequal intervals; interval – comparisons, equal intervals, arbitrary zero; ratio – comparisons, equal intervals, a natural zero; ref: ISO/IEC CD 15939)

  • IFPUG: 'Nominal/Ordinal' scale – unequal intervals between Low & Average, and between Average & High.
  • NESMA: 'Nominal/Ordinal' scale – unequal intervals between Low & Average, and between Average & High.
  • MkII: 'Ordinal/Interval' scale – weights were derived so that 1 MkII fp = 1 IFPUG fp, approximately, comparing functional processes.
  • COSMIC: Ratio scale – empirical data suggests 1 cfp = 1 IFPUG fp, approximately, comparing functional processes.

Permissible arithmetic & statistical operations

  • IFPUG: Categories are assigned relative weights; data can be 'ranked', but 'quantifying' differences between values is difficult due to the 'cut-off' (Low is c. half of High) – ratios are problematic.
  • NESMA: Categories are assigned relative weights; data can be 'ranked', but 'quantifying' differences between values is difficult due to the 'cut-off' (Low is c. half of High) – ratios are problematic.
  • MkII: Ordered, synthetic scale with a natural zero; data can be ranked; differences & ratios between values can be quantified within limits, but are problematic due to the use of weights.
  • COSMIC: Ordered, constant scale with a natural zero; data can be ranked; differences between values can be quantified; ratios make sense (i.e. 20 cfp is twice the size of 10 cfp, and 2000 cfp is twice 1000 cfp).

Accounts for information processing by

  • IFPUG: Sizing static data and dynamic behaviour.
  • NESMA: Sizing static data and dynamic behaviour.
  • MkII: Sizing dynamic behaviour – the use of data.
  • COSMIC: Sizing dynamic behaviour – the use of data.

Models the functional user requirements as

  • IFPUG: File Types and Elementary Processes (= Input-Process-Output).
  • NESMA: File Types and Elementary Processes (= Input-Process-Output).
  • MkII: Logical Transactions (= Input-Process-Output).
  • COSMIC: Functional Processes (= Input-Process-Output).

Equivalent of a stimulus/response message pair (i.e. a 'thread of control' with some input, related processing, and some output)

  • IFPUG: Elementary Process, classified as External Input (EI), External Output (EO) or External Query (EQ) depending on its 'primary intent'.
  • NESMA: Elementary Process, classified as External Input (EI), External Output (EO) or External Query (EQ) depending on its 'primary intent'.
  • MkII: Logical Transaction (LT) – all stimulus/response message pairs are regarded as LTs, irrespective of 'primary purpose'.
  • COSMIC: Functional Process (FP) – all stimulus/response message pairs are regarded as FPs, irrespective of 'primary purpose'.

Rules for measuring size

  • IFPUG: Different rules apply depending on the elementary process type.
  • NESMA: Different rules apply depending on the elementary process type.
  • MkII: The same rules apply to all logical transactions.
  • COSMIC: The same rules apply to all functional processes.

Base Functional Component(s)

  • IFPUG: Internal Logical File, External Interface File, External Input, External Output, External Query.
  • NESMA: Internal Logical File, External Interface File, External Input, External Output, External Query.
  • MkII: Input Data Element, Entity Reference, Output Data Element.
  • COSMIC: Data Movement (either Entry, eXit, Read or Write, depending on the direction of movement).

Contributors to functional size

  • IFPUG: Per File Type, the number of static Data Element Types & Record Element Types; per Transaction Type, the number of dynamic Data Element Types & File Type References.
  • NESMA: Per File Type, the number of static Data Element Types & Record Element Types; per Transaction Type, the number of dynamic Data Element Types & File Type References.
  • MkII: Per Logical Transaction, the number of Input Data Elements, Entity References and Output Data Elements.
  • COSMIC: Per Functional Process, the number of Data Movements, i.e. movements (Entry, eXit, Read or Write) of one Data Group.

Unit of measure

  • IFPUG: Different weights are assigned to the 5 function types depending on their relative 'complexity'. Unit = 1 fp (IFPUG).
  • NESMA: Different weights are assigned to the 5 function types depending on their relative 'complexity'. Unit = 1 fp (NESMA).
  • MkII: Weights assigned to the 'minimum size logical transaction' add up to 2.5, to establish comparability between MkII and IFPUG. Unit = 1 fp (MkII).
  • COSMIC: 1 Data Movement = 1 COSMIC Function Point. Unit = 1 cfp.

Sensitivity to small changes to requirements

  • IFPUG: Low (only detects changes at the boundaries between the Low, Average and High categories).
  • NESMA: Low (only detects changes at the boundaries between the Low, Average and High categories).
  • MkII: High (detects changes of single data element types and single entity references).
  • COSMIC: Moderate (detects changes to single data groups).

Integrity of measures (how well do the measures reflect the thing measured?)

  • IFPUG: Artificial limits (weights, thresholds, uneven intervals) limit the size of the function types measured. Integrity is limited.
  • NESMA: Artificial limits (weights, thresholds, uneven intervals) limit the size of the function types measured. Integrity is limited.
  • MkII: No artificial limits are imposed on the size of a functional process. Integrity is good.
  • COSMIC: No artificial limits are imposed on the size of a functional process. Integrity is excellent.

Sensitivity to variation in the functional size of the dynamic model of a system, i.e. functional processes

  • IFPUG: Stepped – minimum step 3 fp, maximum step 7 fp.
  • NESMA: Stepped – minimum step 3 fp, maximum step 7 fp.
  • MkII: Stepped – minimum step either 0.26, 0.58 or 1.66 fp; maximum step infinity.
  • COSMIC: Accommodates size variation from zero to infinity in steps of 1 cfp.

Sensitivity to variation in the functional size of the static model of a system, i.e. data stores

  • IFPUG: Stepped – minimum step 5 fp, maximum step 15 fp.
  • NESMA: Stepped – minimum step 5 fp, maximum step 15 fp.
  • MkII: Data stores are considered to deliver functionality only when the data is referenced in transactions.
  • COSMIC: Data stores are considered to deliver functionality only when the data is used in functional processes.

Smallest feasible functional process

  • IFPUG: 3 fp.
  • NESMA: 3 fp.
  • MkII: 2.5 fp.
  • COSMIC: 2 cfp.

Smallest feasible enhancement

  • IFPUG: 3 fp.
  • NESMA: 3 fp.
  • MkII: 0.26 fp.
  • COSMIC: 1 cfp.
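
To make the differences in granularity concrete, consider a deliberately simple, invented requirement: "record a new customer". Under IFPUG rules this would typically be counted as one low-complexity External Input (3 fp) plus, if not already counted, the low-complexity Internal Logical File it maintains (7 fp); under COSMIC it would typically be one functional process of 3 cfp (an Entry for the customer details, a Write to store them, and an eXit for the confirmation or error message). The sketch below simply tallies those counts – it illustrates the scoring mechanics and is no substitute for the counting manuals.

    # Illustrative tallies for one invented requirement: "record a new customer".
    # The weights shown are the standard values for low-complexity items;
    # real counts must follow the respective counting manuals.
    ifpug = {
        "External Input (low complexity)": 3,
        "Internal Logical File (low complexity)": 7,  # counted once per file, not per transaction
    }
    cosmic = {
        "Entry (customer details)": 1,
        "Write (store customer)": 1,
        "eXit (confirmation/error)": 1,
    }
    print("IFPUG size:", sum(ifpug.values()), "fp")     # -> 10 fp
    print("COSMIC size:", sum(cosmic.values()), "cfp")  # -> 3 cfp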


