Hacker News Clone

How to maintain engineering velocity as you scale

by sandslash on 10/25/2022, 4:00:08 PM with 107 comments

by falcolas on 10/25/2022, 4:26:18 PM
I'm going to be contrary and say "that's not possible". At least, it's impossible to maintain feature velocity, even if you maintain development velocity.
The reason is simple: in the beginning you're starting with no legacy code. Features are simple to add, because there is no history to work around. However, even a month in, features are going to start blending together, and new features will have to work around existing features.
Your databases will get new tables, and the existing tables will get more columns. The response code is going to fork as the API versions increment. The API space is going to grow.
Technology which handled 1-1000 requests per minute will fall over when it starts getting hit with 10,000 requests per minute, and will require optimizations, indexes, caches, and other complexities in the stack which aren't new features.
Startups are fun to work for early on, when complications like these don't exist. It requires more than "the best engineers" to keep up development velocity, let alone feature velocity.
by bentlegen on 10/25/2022, 5:33:31 PM
Genuine question that isn't answered in the article: what is it that Faire is doing that suggests they're maintaining engineering velocity as they scale in a way that is equal to or better than anyone else out there? Alternatively, what are the engineering challenges of running Faire's storefront/marketplace that make them more qualified to write about their experiences scaling than other organizations?
This article opens with an unsupported assumption, "we are really good at this", and doesn't really elaborate on what that means. I'd genuinely like to know what great engineering at scale looks like, not just some suggested ways to do it.
by lifeisstillgood on 10/25/2022, 7:26:29 PM
- Have devleads as the central crux for every major decision. Screw "senior" management - make sure that the people who are at the codeface every day are part of the major discussions - that those people need to be persuaded of the need for product X or pivot Y. Because they do whether you treat them like it or not.
- The above means you are "surfacing" politics. Do not keep the backbiting and infighting among a cabal of senior managers who talk to each other a lot. Make it public.
- Have one place to have these discussions - an email list probably, and the only way to get your project signed off is to get agreement on this list (well, Ok the CTO can have some veto power - this is not a democracy of course)
- Analysis really works. Publish one-page analysis of the proposed project. Watch it get destroyed with evidence and laughter and then watch even better ideas get proposed.
In short scaling is a political problem - treat it like one. And engineer the horror out of politics. Democracy and transparency are pretty good at that.
Edit: Building a business IMO has strategic, operational and tactical levels. Strategy should be obvious, be PMF and be well known in the company (a PC on every Desktop). Most of the article is tactics - hiring, metrics, stack etc. The hard part is often operational - and that is almost always organisation design, and that is about communication, alignment of resources, trade-offs, etc etc. That's hard. Dysfunctional organisations blow this. Open politics fights dysfunction.
by sitkack on 10/25/2022, 7:19:36 PM
Doesn't mention Conway's Law or the Scientific Method, Continuous Feedback or Composition.
Scaling an organization uses the same techniques as creating scalable (computer) systems. Systems are systems.
https://en.wikipedia.org/wiki/Conway%27s_law
If your system relies on "hiring the best engineers", it is operating Open Loop. All Open Loop systems will suffer catastrophic failure at some point.
Grit is a dog whistle for grind. You can be tough and resilient and flexible w/o being gritty.
by maerF0x0 on 10/26/2022, 12:09:08 AM
My tips for maintaining velocity and sanity:
1. Ensure your devs are making choices that do not mortgage tomorrow. Anything that gives O(1) value (one new feature) and requires O(n) investment (forever maintaining) is a bad deal.
2. Make as much of your code someone else's problem through FOSS. (This is kinda the same idea as innovation tokens)
3. Solve the generic problem not the specific one. For example, I could write a program that reads a csv row by row, converts it to my internal model, then emits those data into an ndjson file. Or I could just write a program that converts csv to ndjson, agnostic of the data. The former costs you every time the model changes. The latter is useful for a variety of system architectures again in the future.
4. Let the computer do as much as possible for you. Automation is compounding returns. O(1) value from manual testing, vs O(n) value for automated testing, there's a clear winner in my mind. Typed languages prevent a class of bugs, so does TLA+.
5. Most customers won't tell you it's broken, most engineers won't stay to clean up someone else's mess. Best to avoid those issues or resolve them immediately.
6. Remove broken incentives where individuals leverage everyone else's resources for their own gain. An example of the pattern is "pleased to announce" emails sent to all@. (along with the litany of reply-all posturing). These kinds of emails cost you for every employee, but only bring value to a very small subset of folks. At its worst you have N^2 cost for 0 value. Ensure individuals can be recognized and promoted without such emails. Disincentivize those who bring that behavior from previous broken cultures.
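Tip 3 above (convert csv to ndjson without knowing the data) could look roughly like the sketch below. This is only an illustration, assuming the CSV has a header row and that leaving every value as a string is acceptable; the function name `csv_to_ndjson` is made up for the example.

```python
import csv
import io
import json
from typing import Iterable, TextIO


def csv_to_ndjson(src: Iterable[str], dst: TextIO) -> int:
    """Convert CSV text to NDJSON, using the header row as JSON keys.

    The converter knows nothing about any internal model: every value is
    passed through as a string. Returns the number of records written.
    """
    count = 0
    for row in csv.DictReader(src):
        # One JSON object per line is all that NDJSON requires.
        dst.write(json.dumps(row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real system.
    sample = io.StringIO("id,name\n1,ada\n2,grace\n")
    out = io.StringIO()
    csv_to_ndjson(sample, out)
    print(out.getvalue(), end="")
```

Because nothing here depends on the column names, a change to the internal model does not touch this code; any typing or validation would live in whatever consumes the ndjson.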
by claudiulodro on 10/25/2022, 4:52:14 PM
I would say it's not necessarily ideal to keep shipping a bunch of features and changes every week as you scale. Once you have an established customer base, change needs to be managed and customers need a heads-up on things that will affect their workflow: you'll want to do deprecation periods, beta periods, have backwards-compatibility, etc. No customer is grumpier than the one whose critical workflow you just totally changed without warning!
I suppose as long as those sorts of tasks count as maintaining engineering velocity, it's all good though.
by 0xbadcafebee on 10/25/2022, 6:42:42 PM
"Hiring the best engineers"
Every company can't have the best engineers. They cost too much and they're a finite resource. I would argue that the quality of engineers is just going down, too, as the hiring pool is flooded with people who've had zero tech experience, and then took a bootcamp and "churned on algorithms" to finally land a tech job. You probably will end up with poor to average engineers, and you'll have to deal with that.
"Building solid long-term foundations from day one"
You have average engineers. The foundations they make are not going to be solid. And even if they could make solid foundations, your founders won't care. They just want to ship ship ship ship ship. So your foundations aren't going to be solid... and you'll have to deal with that.
"Tracking metrics to guide decision-making"
Never seen it done well. If you have the time and staff to properly do this, you're either crazy lucky or are swimming in cash. Those days are probably gone... and you'll have to deal with that.
"Keeping teams small and independent"
Remember Conway's Law? The more teams you have, the more fractured your architecture becomes. Obviously you can't have one infinitely big team, so the key is to have as few teams as you can get away with. Integrate them tightly - same standards, same methods of communication and work. Eliminate monopolies of knowledge. Enforce workflows that naturally lead to everyone being more informed. (For example, every team having stand-up on a video call means none of the teams have any idea what the other teams are doing)
There's actually a lot of evidence-based research and practical experience out there, that's been around for decades, that shows how to maintain the productivity of organizations, regardless of whether they're growing or not. But they're not trendy, so nobody working today gives a crap. Actual productivity gets ignored while managers pat themselves on the back and start-ups churn out banal blog posts with the same lame advice that thousands of startups have used and still completely failed to maintain velocity with. We all know how startups today really get by: burning through staff and money, and luck.
by winphone1974 on 10/25/2022, 10:04:12 PM
So they value "grit" which is defined as the ability to code and push features in near real time, as told to the CEO at a multi day trade show, then follow that up with explaining the importance of building a solid foundation.
Pick a lane.
by whakim on 10/25/2022, 9:30:03 PM
I can't express how much I dislike the advice to be "data-driven" and collect all the data you possibly can because it could be useful someday. While this may or may not be sound business advice, it's deeply unsettling to see such profound disregard for user privacy trumpeted as a key to scaling quickly.
by reillyse on 10/25/2022, 6:22:17 PM
It's interesting to see CI wait times as a core engineering metric. I 100% agree, so much so that I'm building a product specifically to speed up CI. We have redesigned how hosted CI works focusing on speed as our north star. We don't have CI wait time - your test starts immediately and we run your tests super fast. How do we do it? We have many workers with your test environment pre-built so we can split your tests and start running your tests in seconds. If anyone is interested you can check it out at https://brisktest.com/, I'd love to hear any feedback from the community.
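For readers wondering what "split your tests" across pre-built workers could mean in practice, here is one generic way to do it. This is not a description of how Brisk works; it is a sketch of a common greedy heuristic (longest test first, always onto the least-loaded worker), and the test names and durations are invented for illustration.

```python
import heapq
from typing import Dict, List


def split_tests(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Assign tests to workers so total run time per worker is roughly even.

    durations maps a test name to its estimated run time in seconds
    (for example, taken from a previous CI run).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Min-heap of (seconds assigned so far, worker index).
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(workers)]
    # Place the longest tests first; ties are broken by name for determinism.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return buckets


if __name__ == "__main__":
    # Hypothetical timings.
    timings = {"test_api": 8, "test_db": 5, "test_ui": 4, "test_auth": 3}
    for i, bucket in enumerate(split_tests(timings, workers=2)):
        print(i, bucket)
```

The heuristic does not guarantee an optimal split, but it is simple and keeps the slowest worker close to the average when estimates are reasonable.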
by andsoitis on 10/26/2022, 12:55:40 AM
> cultural issues do not emerge as you scale
Naïve and unavoidable. As you add more people, the people issues become more complex and require more sophisticated approaches. A 10,000 employee company is very very different from a 2,000 employee company. A 100,000 employee company is again very different.
The art then is how do you evolve your culture to adapt, with intent, based on solid principles but that might express themselves differently at these different sizes.
by hrpnk on 10/25/2022, 8:51:45 PM
> Pods should operate like small startups
This only works if a pod is equal to a domain that's fairly independent of the other domains, technically, and business-wise.
If there are N pods per domain, each being their own startup without additional coordination results in chaos and duplicated work. Business complexity not included (two pods from different domains can unknowingly work against each other due to having conflicting goals/targets).
by ThalesX on 10/25/2022, 8:10:51 PM
I wish I'd see advice where they prioritize having a roadmap, milestones, an actual plan of execution, after Product Market Fit (PMF) has been found. And before you find PMF, any concern for hiring the 'best engineers', building a 'solid foundation' is moot.
I feel I'm going insane expecting product people to put the time in to define the requirements and context; I get weird looks asking startups about the plan for the next week. "That's how startups do it" is just the most bullshit excuse I keep hearing constantly regarding lack of planning.
by itsmemattchung on 10/25/2022, 4:48:22 PM
> For us, pods generally include 5 to 7 engineers (including an engineering manager), a product manager, a designer, and a data scientist.
Similar approach to Amazon's internal two pizza team philosophy[0]: every internal team should be small enough that it can be fed with two pizzas.
[0] https://www.theguardian.com/technology/2018/apr/24/the-two-p...
by iovrthoughtthis on 10/26/2022, 7:20:27 AM
the number one principle of a high performing engineering team needs to be that unblocking your peers is the most important thing you can do
if it's not, your peers will just invent work to feel useful while they wait for you to unblock them
teams that do this go fast, teams that don't go slow
by cjblomqvist on 10/25/2022, 8:01:32 PM
> Each pod should have a clear leader. We have an engineering manager and a product manager co-lead each pod.
So, 2 leaders is "a clear leader"? How does that work? Sounds contradictory?
by mouly on 10/26/2022, 1:07:07 AM
Scaling engineering velocity is also dependent on the domain and strategy. If the strategy is to throw darts at the wall and see what sticks, one can scale independent teams. If the strategy is to leverage what we have to build new features, then teams have to communicate with each other, and this doesn't scale linearly.
by rubyist5eva on 10/25/2022, 7:18:29 PM
lol this feels like exactly the opposite of what's going on at my company now:
1. Hire the best engineers: fire half the dev team and replace them with offshore devs for pennies on the dollar
2. build solid foundations: cut every corner possible to get whatever crazy deal our sales team made yesterday
3. tracking metrics: uptime? who cares, CI taking too long? whatever
well, I suppose our teams are small when they literally fired everyone and made the "team" so small it literally couldn't be smaller or it wouldn't be a team (there is 2 of us now). Hardly independent though since we're shackled to the whims of clueless sales drones that have zero clue how building software works.
by WirelessGigabit on 10/25/2022, 11:43:28 PM
Don’t take away admin rights for devs. I gotta file paperwork every time I want to elevate to admin.
I develop. I iterate incredibly fast. I don’t have time to wait 2 weeks for security to sign off on a tool that I want to use.
by diceduckmonk on 10/25/2022, 9:00:11 PM
> We use Redshift to store data. As user events are happening, our relational database (MySQL) replicates them in Redshift. Within minutes, the data is available for queries and reports.
Is data engineering really this simple?
by jlarocco on 10/26/2022, 1:13:55 AM
Honestly, I'm so tired of arbitrary UI and behavior changes, I'd actually say more companies need to back off their engineering velocity as they scale. Get it right, then leave it alone.
by WarOnPrivacy on 10/25/2022, 5:42:33 PM
> Faire’s engineering team grew from five to over 100 engineers in three years.
If I'm reading the (below) image correctly: They did this by staying as far away from older engineers as they possibly could.
ref: https://www.ycombinator.com/blog/content/images/2022/10/3.jp...
by hayst4ck on 10/25/2022, 8:27:51 PM
Here is what I think are several root causes of poor velocity:
```
  1. too much focus on hiring
  2. lack of clear responsibilities
  3. lack of management <-> line worker interaction
  4. bad mentor <-> new grad ratios
  5. bad product development to infra (build infra/infra infra/dev tools etc) ratios
  6. mistaking prolific senior engineers for good senior engineers
  7. letting senior engineers off the hook for maintenance
  8. lack of some process
  9. hiring specialists
```
One can ask what sacrifices you make to hire good engineers. You might choose to make exciting infrastructure investments rather than a necessary investment. You might promise that the "good engineer" won't have to do incredibly boring work. You might hire people who have made a career out of avoiding the real high risk pain centers of a company and instead working on high visibility low risk problems. How much of which engineers' days will be sacrificed to interviews? The engineering concessions made towards the goal of hiring are likely an underrated root cause of poor velocity.
I watched the most festering pile of code at a company be hot potato-d between the VP of infra and VP of product. The CTO was checked out and not in touch with what was happening enough to know this was a problem. Neither VP brought it up as a problem, because neither wanted responsibility and therefore the likely black mark by their name for the uphill battle that would result. The company deeply suffered because there was no advocate for the company's highest pain area, because everyone with power, clout, or authority avoided responsibility for it.
When management gets insular, and management fails to solicit direct feedback from line workers, they can't be sure the picture they have in their head matches reality. This creates management level delusions about the state of their engineering org. We can see this played out in US vs Russian military structure. In the Russian model, management sends goals down and expects them adhered to. Failure results in punishment. This creates rigid planning and low agility. The US military instead gives lower levels large leeway to achieve higher level goals. It is the lower levels' responsibility to communicate feasibility or feedback, and more importantly it is upper management's responsibility to adapt plans based on feedback. I was absolutely part of an "e-4 mafia" (https://www.urbandictionary.com/define.php?term=E4-Mafia) and I knew much better than my superiors what was happening, why it was happening, who was doing it, who could help doing it, and its likelihood of success because I was in the weeds. When I laughed directly at managers who told me their plans, they thought it was something wrong with me, not something wrong with their plans. That was half management failing, and half my inexperience in leading upwards.
Every new grad needs one mentor to prevent them from doing absolutely insane overly complicated things. If you do not have a level of oversight, complexity will bloom until it festers. A good mentor preventing new grad overcomplications can save an incredible amount of headaches. New grads should not be doing other new grads' code reviews (for substantial work). Teams should not be comprised entirely of new grads and an inexperienced tech lead. New grads are consistently the largest generators of complexity.
I worked at a place where there was 1 person working on build infra. 0.2% of the company was devoted to making sure we had clean reliable builds. I estimate 5-15% of the engineering org quit due to pain developing software, which meant there was a lot of time spent interviewing people and catching them up rather than fixing problems. I don't know what the right ratio is, but I can say for sure that if you don't invest in dev tools/build infra etc. early enough, you will hit a wall and it will be damaging if not a mortal wound.
There are lots of engineers who code things to be interesting. They write overly complex code. They lay down traps in their code. It's rare for there to be a senior engineer who writes boring, effective, and, most importantly, simple code. Some of the code I've seen senior engineers write violates every principle of low coupling, no global state, being easy to test, etc. These people are then given new grads who learn to cargo cult their complexity until it gets to the point where someone says 'we have to re-write this from scratch.'
There is an anti-pattern where senior engineers get to create a service with no oversight, then give it to other people to maintain and build upon or "finish." Those teams seem to have low morale and high turnover. The people left on those teams aren't impressive and so it gets harder to hire good engineers for those teams. If a team is the lowest rung on the ladder, clearly evidenced by being given problems and being told to "deal with it," that will show to new hires only exacerbating the problem.
Some people hate process, it slows them down. Bureaucracy is (debatably) terrible. One design doc with a review can save quarters of work. Some process slows progress down now, for fewer roadblocks later. If process is not growing at a rate of O(log(n)), or is growing at a rate greater than O(log(n)), then there's probably gonna be some problems.
Lastly, while it's important to hire good people, it's also important to hire some specialists. Databases, infra, dev tools, build infra, platform/framework infra, various front-end things, traffic infrastructure. There are all types of specializations, and if you have a good "full stack" product engineer in the room without say a platform/framework specialist, you will get the product development perspective without the product maintenance perspective, and that has exactly the consequences you might expect. The earlier you get an advocate for say "build infrastructure," the more you are able to address future problems before they are major problems.
by danielrhodes on 10/26/2022, 12:42:47 AM
This article dances around some of the important stuff.
Maintaining engineering velocity is about adapting your culture to focus on the right things. At the beginning, engineering velocity is driven by low communication barriers and fast decision making and lack of existing tech debt. But this doesn't scale.
For engineering velocity at scale you need the following:
- Maintaining a high quality bar: You need to properly manage and prioritize tech debt and infrastructure complexity. This also means keeping dependencies low. The more engineers you have, the more code you're going to have. If this catches up with you, your velocity can be severely impacted.
- You need to have good change management: things like database migrations or big code changes should carry the least amount of risk possible. Documentation is important. You quickly get into hot water if you get stuck here.
- A culture of continuous improvement: teams should be data driven in measuring their performance and motivated to maintain and improve that performance over time. That means, for example, tracking sprint completions, bugs, etc. Each team needs to own this. The goal: ship high quality code faster.
- A close connection with the business and customers: When you focus on what the business needs and what the customer needs, it prioritizes things in such a way that your teams are dialed in to work on only what is needed. You can waste a lot of engineering time on things which don't matter.
- A culture of coaching and personal improvement: hiring good people is key, but ultimately it's how they play together as a team which is the most important. People need guidance on how they can be the most useful in that context. Sometimes people don't work out, so having fast feedback and showing people the door when they don't work out is necessary. Not doing this is a great way to demotivate other team members, at the very least.
- High amounts of ownership. This has sort of become a trendy topic in management philosophy, but ultimately it comes down to: can teams make autonomous decisions and own outcomes in the greatest way possible. This is all aimed at reducing decision making and communications overhead. It is also about making sure people with the greatest context can make decisions rather than somebody higher up with less context.
I'm sure I'm missing other stuff, but if you apply these principles things start aligning themselves in a direction where velocity is constantly improving.
However, I think talking about engineering velocity can sometimes miss the mark. What you really want is consistency.
Engineering lives within a broader organizational structure and other parts of the business need to be able to rely on you for certain things. When you can say: "Yes we can ship this feature by this date" and then hit that, you enable a lot of things. So yes, being fast is great. But you won't get there without being consistent.
by vasco on 10/25/2022, 6:31:34 PM
This is bullshit advice. Hire the best engineers? Might as well say "don't make mistakes" in a list of advice about how to minimize mistakes. Same thing with building solid foundations from day one, maybe you do it, but more likely is that the person that comes in to make sure velocity can keep up is different from the core group that was just iterating to find PMF.
The other two points are more relevant, but I feel like half of this advice is not really useful. Way more to learn from descriptions of turning around a codebase or CI/CD pipelines that were struggling with slowness, flakiness, contention and how to dig yourself out. Those stories at least you can learn from and adapt to your situation.
If "hire the best engineers" is your advice for anything, that is only a tenable strategy for VC backed startups willing to burn 200k/person from day 0, but I guess this is on a VC blog, so what can you expect. More useful advice is "how to do X with normal people".
by jahewson on 10/25/2022, 5:22:14 PM
> From the beginning, we built a simple but solid foundation that allowed us to maintain both velocity and quality. When we found product-market fit later that year and started bringing on lots of new customers, instead of spending engineering resources on re-architecturing our platform to scale, we were able to double down on product engineering to accelerate the growth.
This is the gold standard. It takes exceptional talent to put together an organization like this while searching for product-market fit. Bravo!
by thedudeabides5 on 10/25/2022, 8:16:31 PM
Step 1. Set up data pipelines that feed into a data warehouse.
Amendment if you are using financial data...
Step 1a. Rather than building this stuff yourself, go to rose.ai and use our financial data warehouse, pipelines*, and pre-built models to save yourself months.
*If you are in tradfi, we have Bloomberg, Refinitiv, FRED, CapIQ, FactSet etc.
If you are in crypto, we have integrations with the blockchain, Dune, coinmarketcap, coingecko etc.