For as long as I can remember, people have said that you can’t measure developer productivity.
Some incredibly experienced people, like Martin Fowler and Kent Beck, say so (bliki: Cannot Measure Productivity and Measuring developer productivity? A response to McKinsey, https://tidyfirst.substack.com/p/measuring-developer-productivity-440).
With Gitrevio, we are betting that we can measure the productivity of developers and development teams, and that we can do it in a fairly universal, all-encompassing way.
Why do they say you can’t measure productivity?
There are a number of common arguments.
Let’s go through a couple of them.
Lines of code and commits are not outputs
The most naive way to measure productivity is by lines of code added. Alice added 100 in a day and Bob added 200, so Bob is twice as productive.
He is not. He might be more productive or less productive; he might in fact be destroying value.
Lines of code is a good sub-metric when you control for other things, like:
- Does Bob write code with the same information density as Alice?
- Do the activities required to deliver the feature correspond to adding new code, deleting code, or modifying existing code?
In all long-lived systems, deleting code is an important part of long-term maintainability, and more lines are modified than added. Businesses change and users’ needs change, so the system has to change.
So if a system’s total LOC is likely to follow a logarithmic curve, does that mean developers become exponentially less productive over time? No, they just modify more and add less.
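To make the add-versus-modify point concrete, here is a minimal sketch that sums added and deleted lines from text in the tab-separated format printed by `git log --numstat` (added, deleted, path; binary files show `-`). The function name `churn` and both sample inputs are made up for illustration.

```python
def churn(numstat_text):
    """Sum added and deleted lines from `git log --numstat`-style text."""
    added = deleted = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit headers and blank lines
        a, d, _path = parts
        if not (a.isdigit() and d.isdigit()):
            continue  # binary files are reported as "-" and have no line counts
        added += int(a)
        deleted += int(d)
    return added, deleted


# Hypothetical samples: a young project mostly adds, a mature one mostly reworks.
young = "120\t3\tapp/models.py\n80\t0\tapp/views.py\n"
mature = "15\t60\tapp/models.py\n-\t-\tlogo.png\n10\t45\tapp/views.py\n"

print(churn(young))   # (200, 3)
print(churn(mature))  # (25, 105)
```

Judged by additions alone, the second sample looks eight times less productive, even though it may represent just as much delivered work.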
Cecil and David are building CRM systems: a bunch of screens that track contacts, leads and deals. Cecil is building his app in Java and Spring Framework. David is using Ruby on Rails. Cecil is adding 1000 lines of Java a day; David is adding just 300 LOC a day. Surely, Cecil is more than three times as productive as David.
He is not.
Java and Spring Framework often require more lines of code to express certain ideas than Ruby and Rails. At the same time, a big part of a Java codebase can be generated. Java + Spring might be more explicit and more extensible than a Rails codebase, so more lines of code might not be bad.
It is quite possible that Cecil and David are delivering exactly the same amount of features, new screens, new data types, new forms and data grids, and that their productivity looks exactly the same to an outside observer (for example, a system user or product owner).
Ella is doing 3 commits a day and Frank is doing 24 commits a day. Is Frank eight times more productive?
Nope. It’s just that Ella has recently moved from SVN to Git, and she is used to committing and immediately pushing every time the system is stable enough.
Frank is doing TDD, and every time his tests pass, he commits. He is not pushing 24 times per day.
Different VCS habits make a huge difference.
Still, there’s no doubt that zero commits is a signal.
Gilli was so much better than Herbert. She delivered those 10,000 lines in 1 month. She made 20 commits a day. She was adding new features every day. The customer was happy!
Herb needed twice as long to deliver the app. He wasn’t pushing new features daily, only twice per week. The customer was less happy.
Gilli is twice as good a developer! Except that after the system was handed over, it turned out to have many security flaws, it crashes when some tables have more than 255 items (she used recursion in a language that doesn’t support tail call optimization), and some pages require thousands of queries to render (the N+1 problem), while Herb’s version requires just 5.
Somehow, code quality wasn’t taken into account.
And to make it even worse, Gilli built her app in Yii Framework version 1 and Angular version 1, while Herb used Clojure and ClojureScript.
Yii Framework 1 and Angular 1 were superseded by dramatically different versions 2. The company had to invest heavily in practically rewriting the whole codebase, because no developer wanted to work on abandonware.
Herb’s code runs on a language that evolves slowly and doesn’t require developers to make any changes, whether for technical reasons or for HR or cultural reasons.
Was the company that rewrote its app from Yii + Angular 1 to version 2 significantly more productive than the company that didn’t have to change a thing? Of course not.
Story points can be faked and are not the whole thing delivered
IPointCorp implemented agile methodologies.
They had a trainer on site who changed their methodologies. He explained why and how to use story points instead of hours. Then the consultant left.
The company has a 25% employee attrition rate.
After 1 year, only 75% of employees have gone through the training and know how to use story points properly.
After 2 years, it’s 56%, then 42%, then 32%.
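Those percentages are plain compounding. Here is a small sketch, assuming a constant yearly attrition rate and that nobody hired later gets the original training; the function name `trained_share` is made up for illustration.

```python
def trained_share(attrition, years):
    """Share of staff who attended the original training after `years`.

    Assumes constant yearly attrition and no retraining of new hires.
    """
    return (1 - attrition) ** years


for year in range(1, 5):
    print(year, f"{trained_share(0.25, year):.0%}")
# 1 75%
# 2 56%
# 3 42%
# 4 32%
```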
IPointCorp is not an extremely high-tech company. They don’t hire people with lots of IT management or agile experience. The understanding of the underlying principles is getting worse and worse.
So there’s Ivan, a new manager who used to work in logistics. He is a smart chap, hardworking and always trying his best to deliver.
He gets to manage the IT team. It’s a new experience for him, but he has a BA in Management and he has been a manager for well over a decade. He can handle it.
One thing he notices is that the team uses an internal metric called story points. When the business gives the team its priorities, the team provides approximate delivery dates based on estimated points and delivered points per week. It works well.
Ivan notices that one team member delivers twice as many story points as anyone else.
By the end of the year, all four devs come to get a raise. But Ivan gives a raise only to the one team member who delivers all of those story points. In three months, two developers leave.
Ivan is happy that his star dev stays.
It turns out that the dev who delivers most of the features is extremely introverted and focused on adding new features. The two devs who left were often speaking with stakeholders within the company, clarifying unresolved questions. Also, the “star” dev works in a hurry and adds a lot of bugs to the repository. The two devs who left were fixing a lot of bugs. Fixing bugs and speaking with users didn’t earn any story points.
So Ivan was left with one fast developer who is a bit clumsy and doesn’t communicate well, and with a developer who was unproductive all along. The two devs who were great in other respects left.
He didn’t learn. He hired two new devs and told them how many story points must be delivered on top of all the activities that are not estimated.
The team rescaled its story points so that it delivers the expected amount. If Ivan asks for more story points, the team will just update its estimates.
Activities vs outcomes
JACME LLC is an innovative company. Just like Procter & Gamble or Dyson, it launches many new products per year. The only difference is that JACME releases mobile apps.
Some of its apps were very successful, attracting hundreds of thousands of users and bringing revenues in millions of dollars. The company has got 5 cash cows, 10 apps that paid off and about 40 failed apps.
It gives jobs to almost 100 developers, designers, testers, product managers, marketers and managers.
Right now, it has 3 teams building three new apps.
- Keg – Uber-like beer delivery service, rolling out in Southern California first
- Log – Uber-like wood delivery service, starting in Oregon
- Mug – Uber-like delivery service for sustainable, locally made, handmade pottery, rolled out across the whole USA
All apps have got their first version out already. These are minimal versions, the userbase is small, but growing.
The Keg team is hosting parties and working with influencers, and is burning $250k per month; a team of 5 devs, plus a few more people on the business side, delivers many new features. They have revenues of $10k per month, but a predicted YoY growth of 200% and a massive potential market.
The Log team is doing SEO and affiliate marketing, and is much smaller overall. The team of just 3 people, only one of whom is a developer, is spending just $30k per month. It has revenues of $10k per month too, and it expects growth of 200% as well.
The Mug team is very active on Instagram and Pinterest. It works with local artists and has a base of local scouts who look for talented small artists. The company is adding a lot of subscribers on social media.
Developers in all teams are great; they deliver everything marketing wants and needs. Everyone is happy.
Yet, in the next year:
- Keg delivers beer by mistake to a bunch of teenagers, who get drunk. One of them ends up in intensive care with alcohol poisoning. At the same time, Trader Joe’s creates a sub-brand, nails the marketing and starts growing aggressively. Keg’s growth stalls and the company still burns significantly more than it earns.
- Mug’s growth unexpectedly slows down, as Gen X is not as interested in locally made products as Millennials are. Mug breaks even and is slightly profitable, but it isn’t growing much.
- The Log team had an idea. They got the contacts of all firewood providers in Oregon, created a bidding system and secured the lowest prices in the whole state. Later, they started rolling out to other states, doing the same thing. The business has grown to $40M in revenues, then $100M, and it keeps growing. The most boring business has turned out to be the best in JACME’s portfolio.
Temporary skill vs need mismatch
Noemi and Omar work in a small company. They build e-shops in Django and jQuery. Yes, in jQuery. The company has existed for almost 3 decades and was already around when Amazon was selling books more than anything else.
Noemi has been with the company for ages. She remembers writing CGI scripts in Perl. She remembers Java servlets, Flash, ColdFusion, HTML table layouts, CSS Zen Garden, the first CSS resets, DHTML, the HTML 4 versus XHTML 1 war, the first release of Django, and moving from Prototype.JS to jQuery.
When Noemi was already writing code professionally, Omar wasn’t born yet. Yet he is clever and learns fast. He jumped on the technologies du jour and learned node.js, React and TypeScript.
And shortly after Omar started at the company, it announced that it was moving towards a node.js, React and TypeScript dev stack. Even though Omar has never read a single DTD file and has never heard of SGML, he is seen as the web dev star of the team.
He is teaching Noemi how to do things in node. And he is rewriting the app in node.js as fast as possible. At the same time, Noemi is trying to find out from the decision makers why the rewrite and the change of technologies are being done.
Omar is significantly more productive than Noemi, who is just old, slow and problematic, right?
Or maybe not. Noemi can see that many of the things being reimplemented are more low-level in node.js than they used to be in Django. They have more dependencies, which change more often and require more maintenance. Also, the e-shops the company builds have templates that users have loved for a very long time, and there isn’t that much JavaScript: a validation here, an animation there, a datepicker and address autocomplete sprinkled over forms. The e-shops are easy to find in Google and other search engines.
So how do you even evaluate productivity here?
Noemi is trying to protect the company from bad technical decisions. She is experienced and can learn everything Omar knows, too.
Omar is excited about new technologies, is working hard and wants to “replace that legacy system” with the dev stack everybody seems to be using.
Maybe Noemi is right, and she could save the company a lot of money. Or maybe not. This story can go both ways.
Uncertainty
And then, the last thing.
Software Development is an activity with uncertainty.
It might be closer to manufacturing, for example when you are building your 50th website on WordPress.
It might also be closer to research, for example when you are designing and building a new highly innovative programming language.
In the backlog, there is a series of tasks, all of them estimated to be 8 hours.
Peter gets 3 user stories and delivers them in 12 hours, 8 hours and 10 hours.
Quinlan gets 3 user stories too. She delivers them in 20 hours, 24 hours and 36 hours.
Quinlan needed 167% more time to deliver these features (80 hours versus 30). Is she almost three times slower?
No, she was just unlucky. The tasks were not estimated well, and some of them were returned by the Product Owner to be improved.
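The arithmetic behind that comparison, as a quick sketch. The hours and the 8-hour estimate come from the story above; the variable names are mine.

```python
estimate = 8  # hours estimated for every task in the backlog
peter = [12, 8, 10]
quinlan = [20, 24, 36]

ratio = sum(quinlan) / sum(peter)  # how many times longer Quinlan took
extra = ratio - 1                  # additional time relative to Peter

print(sum(peter), sum(quinlan))    # 30 80
print(f"{ratio:.2f}x as long, {extra:.0%} more time")  # 2.67x as long, 167% more time

# Both developers overran the estimates, which hints that the estimates were off too.
print(sum(peter) / (estimate * len(peter)))  # 1.25
```

Three tasks each is a tiny sample, so a gap of this size can easily come from which tasks happened to land on whose desk.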
So what is Gitrevio?
First, let me thank you. You are well over 2000 words into this article. Let’s take a moment to look at how exactly we want to address all of those challenges.
All of the arguments above are absolutely relevant and a daily part of every team lead’s, IT manager’s and CTO’s struggle to manage productivity. We are aware of this, and our attempt is to build a solution that is holistic.
Gitrevio is a business intelligence and AI automation platform for CTOs, team leads and IT managers.
The whole system is based on three parts:
2. AI tasks, that process data across various datasources. These can be used for auto-tagging, creating summaries, predictions and to detect outliers. We are not just feeding data to LLMs. We use other forms of machine learning too (classic statistics, Bayesian Networks and Markov Chains).
3. We visualise our findings in holistic dashboards that are not just showing numbers. We want to display the whole story. Who, since when, what has happened, why? In fact, helping you to find that something is happening and why is more important for us than showing you numbers.
How does Gitrevio address each of those counterarguments?
So let’s start with a fact. Software development is not research, and it’s not art. In some ways, it could be seen as a craft. But it is mainly engineering. A good software developer is a good engineer, whether or not they have an engineering degree.
Richard W. Hamming; The Art of Doing Science and Engineering
And let’s say a couple of things I consider axioms:
A senior-enough manager can compare the productivity of two people when they have the whole picture. So even when you don’t provide a score, just showing the full picture is enough to compare and decide. But the whole picture must show all the activities that are being done.
Development is uncertain, but it is repetitive and therefore subject to statistics. Task and project durations follow a log-normal distribution. ROI only adds an additional Return attribute.
Based on the level of uncertainty, you can have expectations of how long things will take. The people you hire, the teams you manage, and all the tasks they do are somewhere on the curve.
[Figure: log-normal distribution curve. Source: Wikipedia]
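As a rough illustration of what "somewhere on the curve" can mean, the sketch below fits a log-normal distribution to a handful of task durations by fitting a normal distribution to their logarithms, then reads off the median and the 90th percentile. It uses only the Python standard library. The sample hours are the six task durations from the Peter and Quinlan example; the helper names are made up, and this is not a description of how Gitrevio's models work.

```python
import math
from statistics import NormalDist


def fit_lognormal(durations):
    """Fit a normal distribution to log-durations (i.e. a log-normal to durations)."""
    return NormalDist.from_samples([math.log(d) for d in durations])


def quantile(log_dist, p):
    """Duration below which a fraction `p` of tasks is expected to finish."""
    return math.exp(log_dist.inv_cdf(p))


hours = [12, 8, 10, 20, 24, 36]  # observed durations of tasks estimated at 8 hours
dist = fit_lognormal(hours)

print(f"median: {quantile(dist, 0.5):.1f} h")
print(f"90th percentile: {quantile(dist, 0.9):.1f} h")
```

With real data you would want far more than six samples, and you would fit separate curves for different kinds of work, since a WordPress site and a new programming language sit at very different levels of uncertainty.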
Things that were not viable pre-LLMs and pre-machine learning can be viable now
- Nowadays, you CAN detect that a team is changing how it measures its metrics; curve manipulation can be detected automatically.
- And you CAN ask AI to monitor methodology adherence.
- And you CAN ask AI to tag and cluster all datapoints, no matter if these are related to individuals, teams, repositories, or business initiatives. All the data that would have been a complete nightmare for managers and team leads to collect can be collected automatically now.
There are significant differences in productivity; you don’t need high accuracy when the top 10% consistently delivers 10x the value of the bottom 10%.
By simply reallocating or replacing the bottom 20% and supporting and enabling the top 20%, you can achieve an extreme productivity gain. Sometimes the whole team is just purely awesome, but that’s extremely rare. A bit more often, but still rarely, the whole team is terribly mismanaged, misaligned with the company, demotivated and beyond the point of repair. In that case, the whole team should be replaced. You don’t need subtle signals to see where you are.
We are moving to remote settings where developers generate machine readable data showing their inputs, activities, outputs and business outcomes of those outputs.
Things like customer empathy and looking for simplifications can be detected.
Employees have different attitudes and attributes that change very slowly.
So if you know that 15 people in your team commit a lot but are not interested in PRs and code reviews, 5 are willing to do those, only 1 is willing to interview candidates, and nobody is willing to write documentation or tests for old code, you know what you need to do and where your bus factor (https://en.wikipedia.org/wiki/Bus_factor) is too low.
Lines of code
We show additions and deletions in the codebase. But we also show other activities, including non-development ones. We show code quality metrics. We don’t say “fire him”; we say: “Look, he hasn’t committed here for a full week, but he is at work. Ask him in a 1:1 what he is doing.”
You can click through each commit log directly into commit details.
We also have an on-premise worker that collects data about your codebase. That means we don’t need to clone your repositories. We don’t have your source code on our servers, just analytic data and links to your GitHub, GitLab or Azure DevOps.
Commits, PRs, Issues
The same. We show commits. We also show PRs and finished tasks. We don’t say somebody is better because they make more commits.
But we visualise everything together.
You get a picture of everything one or more individuals are doing.
You get a picture of what the whole team is doing.
You can see what’s happening in one repository.
Or what is being done in one project, that spans multiple repositories.
Also, since we tag issues and read estimates, we can monitor whether a constant or a multiplier is being added to estimates, and whether it is changing (that is, whether somebody is manipulating the estimation).
We can also provide a statistical estimate of how long tasks will really take.
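One simple way to picture the multiplier check is a toy sketch like the one below; it is not our production logic. It assumes each finished task carries its story points and the hours actually worked, computes the median points credited per hour in each sprint, and flags sprints that drift from the first sprint by more than a tolerance. The function names, the 25% tolerance and the sample data are all made up.

```python
from statistics import median


def points_per_hour(tasks):
    """Median story points credited per hour actually worked."""
    return median(points / hours for points, hours in tasks)


def inflation_flags(sprints, tolerance=0.25):
    """Flag sprints whose points-per-hour drifts from the first sprint's baseline."""
    baseline = points_per_hour(sprints[0])
    return [abs(points_per_hour(s) / baseline - 1) > tolerance for s in sprints]


# Each task is (story_points, hours_worked); hypothetical data.
sprints = [
    [(3, 12), (5, 20), (2, 8)],  # baseline: 0.25 points per hour
    [(3, 11), (5, 21), (2, 8)],  # roughly unchanged
    [(5, 12), (8, 20), (3, 8)],  # similar hours, noticeably larger estimates
]

print(inflation_flags(sprints))  # [False, False, True]
```

A flag is only a prompt for a conversation: the work may have become genuinely harder, or the team may have rescaled its points to satisfy someone like Ivan.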
Business outcomes
Even when developers often don’t have a direct responsibility to deliver particular business outcomes, it helps to know what people do and what activities they do to support business.
So we don’t focus purely on development activities. We show how developers help with tech interviews and how they support HR.
We show how much time your senior devs spend coaching junior devs and newly onboarded people.
Addressing uncertainty
And we are aware of the log-normal distribution of your tasks. We train models to tell you how long tasks or whole projects will take, so you can steer them or stop earlier and save money.
And more
Gitrevio is not just this. We have Code Review dashboards where you can see how people do in their PRs, and also who your best and worst reviewers are.
We have a release risk dashboard that predicts and explains risks related to releases.
And we have two screens that deserve a fuller explanation:
- Termination impact dashboard
- Onboarding dashboard
Termination Impact Dashboard
What happens if somebody leaves? What activities are going to be affected? Which repositories and folders in your repositories are going to be unmaintained or maintained significantly less?
Termination Impact Dashboard will tell you exactly that.
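To give a feel for the underlying idea, here is a deliberately naive sketch: for each folder, it computes the share of commits made by the person who is leaving. It assumes you already have (author, folder) pairs extracted from commit history; the names and data are invented, and the real dashboard looks at more than commit counts.

```python
from collections import defaultdict


def termination_impact(commits, leaver):
    """Return {folder: share of that folder's commits made by `leaver`}."""
    total = defaultdict(int)
    by_leaver = defaultdict(int)
    for author, folder in commits:
        total[folder] += 1
        if author == leaver:
            by_leaver[folder] += 1
    return {folder: by_leaver[folder] / total[folder] for folder in total}


# Hypothetical (author, folder) pairs taken from commit history.
commits = [
    ("alice", "billing/"), ("alice", "billing/"), ("alice", "billing/"), ("bob", "billing/"),
    ("bob", "api/"), ("carol", "api/"), ("alice", "api/"), ("carol", "api/"),
]

print(termination_impact(commits, "alice"))  # {'billing/': 0.75, 'api/': 0.25}
```

In this toy data, losing Alice would leave the billing folder with a single occasional contributor, which is exactly the kind of gap you want to know about before the resignation letter arrives.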
Onboarding Dashboard
We are early into this dashboard, but it has such strong potential that it could be a standalone product and still pay off.
The Onboarding dashboard lets you create three groups:
- People currently being onboarded
- People who were onboarded successfully
- People who were not onboarded successfully
In this dashboard, you can compare the commits, PRs, tasks and overall outcomes of all individuals, based on the onboarding periods of your (ex-)colleagues.
You can recognise much earlier whether people will work out for the company.
Nowadays, with the price of hiring, onboarding, coaching, offices and devices, plus lost revenues and development, it easily costs a company the equivalent of 12 monthly salaries to recognise a bad hire.
We can cut this by 50-75%.
This dashboard alone, which you get for free with Gitrevio, probably has a positive ROI large enough to pay for licenses for all people in the company.
So does Gitrevio have a positive ROI?
Yes, definitely!
At $45/individual contributor/month, we believe we deliver an ROI much higher than 10:1.
The following three features alone are likely to have a positive ROI and would be a good investment, even if no other feature worked for you:
1. Optimisation of Activities, Outcomes and team – our KPI is to improve your productivity by 30% while costing you only 1% of your development expenses; an ROI of 30:1 is great.
2. Onboarding optimisation – 50-75% reduction in expenses for bad hires; at 25% employee attrition and 33% bad hire rate, just this has a positive ROI for the whole company.
3. Project budget overrun detection – more than half of projects go over budget and/or are stopped and/or never used. We help you to recognise probable delays months earlier.
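Here is the onboarding claim as back-of-the-envelope arithmetic. The attrition rate, bad hire rate, 12-salary cost, 50% reduction and $45 price come from this article; the headcount of 100 and the $8,000 average monthly salary are assumptions picked purely for illustration, as is the simplification that every leaver is replaced.

```python
def onboarding_roi(headcount, monthly_salary, attrition=0.25, bad_hire_rate=0.33,
                   bad_hire_cost_months=12, reduction=0.5, price_per_seat=45):
    """Yearly savings on bad hires versus yearly license cost (both in dollars)."""
    hires = headcount * attrition        # assumes every leaver is replaced
    bad_hires = hires * bad_hire_rate
    savings = bad_hires * bad_hire_cost_months * monthly_salary * reduction
    license_cost = headcount * price_per_seat * 12
    return savings, license_cost


# Assumed company: 100 contributors at an average of $8,000 per month.
savings, cost = onboarding_roi(headcount=100, monthly_salary=8000)
print(f"savings ${savings:,.0f}, licenses ${cost:,.0f}, ratio {savings / cost:.1f}:1")
# savings $396,000, licenses $54,000, ratio 7.3:1
```

Plug in your own headcount and salary level; the ratio moves with salaries, because the license price is flat while the cost of a bad hire is not.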
On top of that, our second KPI is to automate up to 50% of the repetitive tasks of all leading roles (from team leads and IT managers to CTOs and tech CEOs), so you can free the hands of your most important people for more important and creative work.
How can you get Gitrevio?
Contact me at jiri@gitrev.io
We will import the data that is available from APIs (like GitHub, GitLab, time trackers, CI servers and HR apps) into the database.
This will bring all the dashboards to life.
You will run a local Git repository scanner. This small Python app is the only app that can read your repositories. It can be hosted on-premise, so we don’t have access to your source code. You can audit its source code, and you control when it runs.
You help us map important things, like adding individuals to teams and repositories to projects.
We will attach AI bots (we use GPT, but if you prefer another provider, including an isolated, self-hosted model so that your data is not shared, we can do that), train neural networks and start populating the dashboards.
We give you training on how to use Gitrevio, and we stay in touch with you to ensure you are getting the most from it.
And if you decide to leave one day, we can remove your data. You can commit for a month or annually (for a lower price).