Matrix of Services
Matrix of Services, which I will call MAXOS, is the secret weapon that Silicon Valley is using to disrupt and destroy competitors.
I have seen the MAXOS structure in stories from Google, HubSpot, and Amazon. None of these companies uses traditional product management or textbook “Agile” methods. They have found something more powerful and more scalable.
To see the power of this method, we can compare a traditional online retailer with Amazon. The traditional retailer offers products and custom B2B ordering services through a monolithic application that is produced by a well-managed team of about 150 people. They use lean/agile techniques to produce a new release every six weeks. The operations team needs to be sure that their Web site, which brings in millions of dollars per day, doesn't have any reliability problems. Because it takes a long time to fully test a big app, they hold each release for three weeks of testing before they deploy it. So, the minimum time from idea to release is 9 weeks.
Amazon has more than 1,000 service teams, each building and continuously releasing software. They release changes to a service, on average, once every 11.6 seconds. At any given time, 10,000 servers are being updated with new code. In the six weeks that it takes the traditional retailer to produce one somewhat modified version of a monolithic application, Amazon has made more than 300,000 different improvements and adjustments.
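The comparison can be checked with a few lines of arithmetic. This is only a sketch: the 11.6-second interval and the six-week and nine-week periods come from the text above, while the function name is made up for illustration.

```python
# Figures taken from the text; the conversion itself is plain arithmetic.
SECONDS_PER_RELEASE = 11.6        # average interval between Amazon releases
SECONDS_PER_DAY = 24 * 60 * 60


def releases_in_weeks(weeks: float) -> int:
    """Whole number of releases that fit in the given number of weeks."""
    return int(weeks * 7 * SECONDS_PER_DAY / SECONDS_PER_RELEASE)


# One six-week release cycle of the traditional retailer.
six_week_cycle = releases_in_weeks(6)
# Six weeks of development plus the three-week testing hold.
idea_to_release = releases_in_weeks(9)

print(six_week_cycle, idea_to_release)
```

One six-week cycle works out to roughly 312,000 releases, which is where "more than 300,000" comes from. Over the full nine weeks from idea to release, the figure is closer to 469,000.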
I recently received a call from an Amazon Web Services competitor, who described his cycle time disadvantage as "an emergency."
How it works
Here is how MAXOS works. Software applications and infrastructure are divided into small services. For example, one service might render a Web page, and call a different service to get information about a product. HubSpot divided their online marketing service into 200 different services. Each service is built, tested, and operated by a small team. HubSpot has about 90 programmers, divided into 30 service teams. So, at HubSpot, each service team has three people, who take responsibility for 6 or 7 services.
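As a rough sketch of that division, here are two tiny services written as plain Python functions. Everything in it is hypothetical (the names, the product data, the HTML). In a real matrix of services the call between them would go over the network, not through a direct function call.

```python
# Hypothetical product data standing in for a real data store.
PRODUCTS = {"sku-1": {"name": "Desk lamp", "price": 19.99}}


def product_info_service(sku: str) -> dict:
    """Owned by one team: returns information about a product."""
    return PRODUCTS[sku]


def page_render_service(sku: str) -> str:
    """Owned by another team: renders a page and calls the product service."""
    product = product_info_service(sku)  # a network call in a real system
    return f"<h1>{product['name']}</h1><p>${product['price']:.2f}</p>"


print(page_render_service("sku-1"))
```

The point of the split is that each function could be changed, tested, and released by its own team, as long as the call between them keeps working.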
Each team takes complete responsibility for design, programming, testing, and release. They can release changes at any time. HubSpot reports that if you add up releases for all services, they typically release about 100 changes every day.
HubSpot teams also take responsibility for operations. HubSpot runs their matrix of services on 2,000 Amazon servers, which they monitor for problems. If the monitoring system sees a problem with a particular service, it doesn't only notify an operations specialist; it also notifies every member of the service development team. That's a good thing, because with so many services, the operations specialist will not be able to understand and debug every one. This is a full implementation of the "DevOps" concept. HubSpot has a small number of operations specialists who build tools that the service teams can use to deploy, control, and monitor.
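The alert routing described above might look something like the following. This is a minimal sketch, not HubSpot's actual tooling: the class names, the recipients, and the in-memory `sent` list (a stand-in for a pager or chat integration) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    team: list[str]  # members of the owning service team


@dataclass
class AlertRouter:
    ops_on_call: str
    sent: list[tuple[str, str]] = field(default_factory=list)

    def notify(self, recipient: str, message: str) -> None:
        # Stand-in for a real pager or chat integration.
        self.sent.append((recipient, message))

    def route(self, service: Service, problem: str) -> list[str]:
        """Alert the operations specialist and every member of the service team."""
        message = f"{service.name}: {problem}"
        recipients = [self.ops_on_call, *service.team]
        for recipient in recipients:
            self.notify(recipient, message)
        return recipients


router = AlertRouter(ops_on_call="ops-1")
email_service = Service(name="email-sender", team=["ana", "ben", "chi"])
print(router.route(email_service, "error rate above threshold"))
```

The key design choice is that the service team is always on the recipient list, so the people who wrote the code see the problem at the same time as operations.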
Each team receives a constant stream of feedback from their services about performance, errors, usage, and user-reported problems. If there is a problem, they see it and fix it.
This structure has some remarkable self-organizing properties.
In a large project, it is often difficult to coordinate dependencies: the things that each team needs from other teams in order to fix something or add a complete feature. Waterfall planners try to figure out all of the dependencies in advance on a big project chart, which never works. Scrum practitioners recommend a "Scrum of Scrums", a big meeting in which teams ask each other about their dependencies, which nobody wants to go to.
Service teams coordinate more happily through continuous integration. They release their changes into a shared test system. This system runs automated test scripts that uncover problems inside one service, or in the calls from dependent services. It can then notify the related service teams, and tell them who they need to talk to in order to resolve the problems.
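The notification step can be sketched as follows. This is an illustration under stated assumptions, not a description of any real continuous integration product: the ownership table, the call graph, and the function names are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical ownership and dependency data.
OWNERS = {"web-page": "team-web", "product-info": "team-catalog"}
CALLS = {"web-page": ["product-info"]}  # caller -> services it calls


def teams_to_notify(failed_service: str) -> set[str]:
    """The owner of the failing service plus the owners of services that call it."""
    teams = {OWNERS[failed_service]}
    for caller, callees in CALLS.items():
        if failed_service in callees:
            teams.add(OWNERS[caller])
    return teams


def run_integration(tests: dict[str, Callable[[], bool]]) -> dict[str, set[str]]:
    """Run each service's test and map every failing service to the teams involved."""
    report = {}
    for service, test in tests.items():
        if not test():
            report[service] = teams_to_notify(service)
    return report


# Simulated run: the product-info test fails, the web-page test passes.
report = run_integration({"web-page": lambda: True, "product-info": lambda: False})
print(report)
```

Because the report names every team touched by a failure, each team learns who it needs to talk to without a coordination meeting.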
The continuous integration machine replaces a substantial amount of human planning and project management.
A service team plans a large percentage of its own work, without any loop through a management or prioritization system.
Service teams know what they need to do because they are constantly receiving feedback from their services. They can plan their own immediate response to problems with reliability, speed, and quality. They do hands-on deployment and DevOps. They can plan medium-term responses to feedback from end users and from their colleagues who are service consumers.
This system is more efficient than a system where each task has to go in a list for a manager to look at and prioritize, since it does not consume management time. It also has a faster reaction time. Some Scrum teams create an extreme version of planning lag by selecting a set of tasks in an iteration planning meeting, and then rejecting all new work for the next two weeks because it will disrupt their time estimates. Service teams can respond immediately to urgent requests.
Because service teams plan so much of their own work, they have a limited capacity to accept new requests. It is a harsh reality of software development that we spend much of our time making fixes, and it is unusual for any development team to have more than half of their time available for the development of truly new features and architectures. Self-managed teams turn this into an advantage. Service teams reduce work for managers and product owners, and free them up to focus on getting the most impact out of those new requests.
You can scale this organization by adding services and adding service teams. Service teams have a simple structure built around a tech lead and programmers. There is a reduced need to plan and recruit for functions like QA, operations, and project management. A MAXOS-style organization can work for ten people, and it can work for 10,000.