Technical Debt: What Engineers Wish Managers Understood
What Engineers Actually Mean When They Say "Technical Debt"
Let's start with what technical debt really is, because I've been on both sides of this conversation and there's often a fundamental misunderstanding happening.
From the Engineer's Perspective:
Technical debt is every shortcut, workaround, temporary fix, and "we'll come back to this later" decision that's still in production. It's the documentation that was never written. The monitoring that was never implemented. The network architecture that was designed for 50 users and is now serving 500.
It's not laziness. It's not engineers being perfectionists. It's the accumulated cost of decisions made under time pressure, resource constraints, or incomplete information.
What It's NOT:
It's not "I want to rewrite this in my favorite programming language because it's cooler."
It's not "I want to refactor code that's working fine just to make it prettier."
It's not an excuse to avoid new work.
What It Actually IS:
Technical debt is the gap between where your infrastructure is and where it needs to be to support current (and near-future) business needs reliably, securely, and efficiently.
Real Examples: What Technical Debt Looks Like in Network Engineering
Let me get specific because abstract definitions don't capture the reality engineers are dealing with daily.
Example 1: The Undocumented Network
The Debt: Your network has grown organically over 8 years. VLANs were added as needed. Routes were patched in during emergencies. No one documented changes because there was never time.
How It Started: Five years ago, during a crisis implementation, an engineer added a static route with a note: "temporary fix for weekend cutover - will remove Monday." That Monday never came. Now that route is critical and nobody remembers why it exists.
The Compounding:
A new engineer joining the team takes 3 months to understand the network topology instead of 2 weeks
Troubleshooting takes 3x longer because you're reverse-engineering intent from running configs
Every change carries risk because you don't fully understand dependencies
Knowledge lives in one senior engineer's head, creating a single point of failure
The Real Cost: You're not just losing time - you're losing the ability to move fast safely. Every project takes longer because you're discovering surprises along the way.
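If you're staring at this kind of debt, even a crude audit helps. Here's a minimal sketch, assuming you can export IOS-style configs to text files - the regex and the reliance on the "name" keyword are illustrative conventions, not a universal standard:

```python
import re
from pathlib import Path

# Illustrative audit: flag IOS-style static routes that carry no "name" tag,
# i.e. routes whose intent was never written down anywhere.
ROUTE_RE = re.compile(r"^ip route \S+ \S+ \S+(?P<rest>.*)$")

def undocumented_static_routes(config_path: str) -> list[str]:
    flagged = []
    for line in Path(config_path).read_text().splitlines():
        match = ROUTE_RE.match(line.strip())
        if match and " name " not in match.group("rest"):
            flagged.append(line.strip())
    return flagged

if __name__ == "__main__":
    # "core-router.cfg" is a placeholder path to an exported running config.
    for route in undocumented_static_routes("core-router.cfg"):
        print("No documented intent:", route)
```

It won't tell you why a route exists, but it gives you a list of the routes nobody bothered to explain - which is usually where the surprises live.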
Example 2: The "Temporary" Workaround That Became Permanent
The Debt: You have a core switch that's supposed to be redundant, but one of the pair is running ancient code with known bugs. The workaround: all traffic routes through the other switch, defeating redundancy.
How It Started: 18 months ago, during a code upgrade that went badly, you rolled back one switch and decided to "test the upgrade more thoroughly before trying again." That testing never happened because other priorities took over.
The Compounding:
You've lost redundancy that your architecture depends on
The working switch is now carrying double the intended load
You can't do maintenance on the working switch without downtime
The failed upgrade created fear of touching the environment
New features require that code version you can't upgrade to
The Real Cost: Your infrastructure is one failure away from a major outage. But from the outside, everything "works fine." Until it doesn't.
Example 3: The Monitoring Blind Spots
The Debt: You monitor core infrastructure but not branch offices. You monitor device availability but not performance metrics. You get alerts but they're not actionable.
How It Started: You implemented basic monitoring 3 years ago to "get something in place quickly." The plan was to expand monitoring coverage over time. But "over time" never had budget allocation.
The Compounding:
Problems in unmonitored areas go undetected until users complain
You're flying blind on capacity planning
Troubleshooting requires manual data collection
You discover failures hours or days after they occur
Your MTTR (Mean Time to Resolution) is inflated by detection lag before troubleshooting even starts
The Real Cost: You're reactive instead of proactive. Your team spends time firefighting problems that should have been caught early. Your reputation suffers because users experience issues before you know about them.
Monitoring gaps contribute directly to alert fatigue and burnout, something I explored in "Both Sides of the Desk: Burnout." Engineers who deal with constant fire drills without proper tools tend to wear out quickly.
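Even before budget shows up, you can buy yourself a little visibility. Here's a minimal sketch of a stopgap reachability and latency check for branch sites - the site names, addresses, and port are placeholders, and this is no substitute for real monitoring:

```python
import socket
import time

# Placeholder branch sites; in reality these would come from your inventory system.
BRANCH_SITES = {
    "branch-01": ("10.10.1.1", 443),
    "branch-02": ("10.10.2.1", 443),
}

def probe(host: str, port: int, timeout: float = 2.0):
    """Return round-trip time in ms for a TCP connect, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000
    except OSError:
        return None

if __name__ == "__main__":
    for site, (host, port) in BRANCH_SITES.items():
        rtt = probe(host, port)
        status = f"{rtt:.1f} ms" if rtt is not None else "UNREACHABLE"
        print(f"{site}: {status}")
```

Run it on a schedule and log the output, and you at least find out about a down branch before the first user calls.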
Example 4: The Security Patches Nobody Has Time For
The Debt: Your firewalls, switches, and routers are running code versions with known CVEs. You know about them. You've read the security bulletins. But patching requires maintenance windows, testing, and risk - all things you don't have bandwidth for.
How It Started: You prioritized feature rollouts and new projects over maintenance. Leadership approved new initiatives, but not maintenance windows. Security patching kept getting pushed "one more quarter."
The Compounding:
Your vulnerability surface grows with each new CVE
Compliance audits flag your outdated versions
Insurance and security teams escalate concerns
Each quarter you delay makes the patch delta larger and riskier
Eventually, you need emergency patching during a crisis instead of planned maintenance
The Real Cost: You're one exploit away from a breach. And when (not if) that happens, explaining why you knew about vulnerabilities for 18 months but didn't patch them is a career-limiting conversation.
Example 5: The Scaling Problems You're Ignoring
The Debt: Your network was designed for 200 users. You're now at 1,500 users and growing 20% annually. Performance is degrading but not catastrophically - yet.
How It Started: Business grew faster than infrastructure planning cycles. Each time you hit capacity limits, you added temporary fixes: another VLAN here, another subnet there, some traffic prioritization tweaks.
The Compounding:
Your architecture no longer matches your scale
Band-aid fixes create complexity that slows future changes
Performance degradation is gradual so it doesn't trigger urgency
By the time failure occurs, you're in crisis mode with no good options
The "proper fix" gets more expensive and disruptive the longer you wait
The Real Cost: You'll eventually be forced to do a major infrastructure overhaul during a crisis instead of a planned migration. Crisis projects are expensive, risky, and career-damaging.
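The math behind that forced overhaul is simple enough to sketch. Assuming the 1,500 users and 20% annual growth from the example, and a hypothetical effective capacity of 2,000 users after all the band-aids:

```python
# Rough growth projection for the scenario above: 1,500 users growing 20% per year,
# against a hypothetical effective capacity of 2,000 users (an assumption, not a measurement).
users = 1_500
growth_rate = 0.20
effective_capacity = 2_000

for year in range(1, 6):
    users *= 1 + growth_rate
    headroom = effective_capacity - users
    print(f"Year {year}: ~{users:,.0f} users, headroom {headroom:,.0f}")
    if users > effective_capacity:
        print(f"Capacity exceeded in year {year} -- the 'proper fix' is now an emergency.")
        break
```

Under those assumptions you blow through capacity in about two years, which is shorter than most planned infrastructure refresh cycles.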
How Technical Debt Compounds: The Interest You're Paying
Here's what engineers wish managers understood: technical debt isn't static. It gets worse over time, just like financial debt accrues interest.
The Compounding Effects
1. Slowing Velocity
Year 1: Implement feature with shortcut. Saves 2 weeks. Ship faster.
Year 2: New feature requires changes to that area. Extra work to understand and work around shortcut. Adds 1 week to project.
Year 3: Another feature touches that area. Now you're working around multiple shortcuts and accumulated complexity. Adds 3 weeks to project.
Year 4: Complexity is so high that simple changes take weeks of planning. Your velocity has dropped 50% but you can't point to one cause - it's the accumulated weight of hundreds of shortcuts.
2. Increasing Risk
Early Stage: You have one workaround. You remember it exists. Risk is manageable.
Mid Stage: You have dozens of workarounds, some documented, many not. New engineers don't know they exist. Changes sometimes break unexpected things.
Late Stage: Your infrastructure is a house of cards. Every change carries significant risk. You're afraid to touch things. Innovation stops.
The Tipping Point: Eventually you have a failure and discover multiple workarounds that were interacting in ways nobody understood. The outage is long, the root cause is complex, and the post-mortem reveals a system held together with hope.
3. Knowledge Degradation
Initially: The person who implemented the shortcut knows why it exists and how it works.
6 Months Later: That person still works here but has moved on to other things. They vaguely remember it but would need to review configs to recall details.
2 Years Later: That person left the company. Documentation was never written. The workaround is now mystery infrastructure that nobody understands but everyone's afraid to change.
5 Years Later: You have entire network segments that nobody fully understands. Institutional knowledge is gone. You're reverse-engineering your own infrastructure.
4. Opportunity Cost
What You're Not Doing Because of Technical Debt:
You can't implement new technologies because your foundation is too fragile
You can't move fast on business opportunities because infrastructure is brittle
You can't experiment and innovate because you're maintaining complexity
You can't adopt modern practices because you're locked into old patterns
Your talented engineers leave because they're tired of fighting broken systems
The Hidden Cost: Every hour spent working around technical debt is an hour not spent on value-creating work. But this cost is invisible on project timelines and budget sheets.
5. The Morale Drain
What Engineers Feel When Drowning in Technical Debt:
"My job isn't engineering anymore - it's archeology and crisis management."
"I'm embarrassed to show our infrastructure to peers. We're not doing good work."
"Management doesn't care about technical excellence. They only care about shipping features."
"I'm being set up to fail. When this house of cards collapses, I'll be blamed."
"Why bother doing things right? It'll just get cut for time anyway."
The Result: Your best engineers leave. The ones who stay become cynical. Your ability to attract talent decreases. Technical debt creates a cultural problem that compounds organizational dysfunction.
What Engineers Wish Managers Understood
I've been the engineer frustrated by management's dismissal of technical debt. Now I'm the manager trying to balance debt with delivery. Here's what I wish I could communicate to every manager:
"It's Not About Perfectionism"
What Engineers Say: "We need to refactor this network segment."
What Managers Hear: "Engineers want to waste time making things perfect when they work fine now."
What Engineers Mean: "This infrastructure is fragile, risky, and slowing us down. Fixing it now prevents future crises and accelerates future work."
The Disconnect: Managers often conflate "technical debt paydown" with "gold-plating" or "over-engineering." They're not the same. Engineers usually aren't asking for perfection - they're asking to reach a sustainable baseline.
"We're Not Crying Wolf"
What Engineers Say: "This is getting critical. We need to address this soon."
What Happens: Nothing. Another quarter passes. Engineers escalate again. Still nothing.
What Engineers Learn: "My technical concerns don't matter. Management only cares when things are on fire."
The Result: Engineers stop raising concerns early. They wait until crisis forces action. Your early warning system breaks down because engineers learn that warnings are ignored anyway.
What Managers Miss: The engineers who are raising concerns are often your most experienced, most conscientious people. They're trying to prevent problems before they explode. Ignoring them doesn't make technical debt go away - it just ensures you'll deal with it during a crisis instead of proactively.
"The Temporary Fix Is Never Temporary"
What Engineers Say: "If we do this temporary workaround, we need to schedule time to fix it properly."
What Managers Say: "Okay, we'll come back to it."
What Actually Happens: You never come back to it. The temporary fix becomes permanent. Every temporary fix becomes permanent.
What Engineers Learn: "There's no point planning to do things right. Management will always choose fast over right."
The Compounding: Engineers stop suggesting proper solutions. They implement workarounds because that's what always gets approved anyway. Technical debt accumulates faster because preventative maintenance never happens.
"You Can't See the Problem Until It's Too Late"
From Outside: Everything looks fine. Services are running. Users are happy (mostly). Metrics look okay.
From Inside: Engineers see the cracks. Performance is degrading slowly. Complexity is increasing. Risk is growing. But it's gradual, so it doesn't trigger alarms.
The Analogy: It's like ignoring car maintenance. For months, everything seems fine - until your engine seizes on the highway. The warning signs were there, but they were gradual enough to ignore.
What Managers Miss: By the time technical debt becomes visible to non-technical stakeholders, it's usually a crisis. The time to fix it was months or years ago when engineers were raising concerns that seemed abstract.
"We're Making Tradeoffs, Not Excuses"
What Engineers Say: "This will take longer than expected because of our technical debt."
What Managers Hear: "Engineers are making excuses for poor estimates."
What Engineers Mean: "Our infrastructure is complex and fragile. Making changes safely requires working around accumulated shortcuts. This isn't about us being slow - it's about the environment we're working in."
The Missing Context: When managers only see project timelines extending, they don't see the hours engineers spend understanding undocumented systems, working around limitations, or preventing issues caused by past shortcuts.
The Manager's Dilemma: Balancing Debt with Delivery
Now let me flip to the manager's perspective, because having been in this role for a while now, I understand why this is legitimately hard.
The Pressure You're Under
From Above:
Leadership wants features that drive revenue and growth
Every quarter has aggressive targets
Strategic initiatives get prioritized
"Maintenance" sounds like overhead to be minimized
Your performance is measured on delivery, not infrastructure health
From Below:
Engineers say everything is urgent technical debt
Hard to distinguish critical debt from nice-to-have improvements
Engineers want to rebuild things that appear to be working
Debt paydown doesn't have a visible business impact
The Resources:
You never have enough people or time
Every choice means something else doesn't get done
Budget for new initiatives exists; budget for maintenance doesn't
The Reality: You're caught between delivering for the business and maintaining infrastructure health. Both are important. You can't do both fully. Something has to give.
How to Prioritize Technical Debt
Here's the framework I'm using to make these decisions:
Tier 1: Critical Debt (Fix Now)
These have immediate business impact or high risk:
Security vulnerabilities with active exploits
Single points of failure for critical systems
Performance issues affecting users or revenue
Compliance requirements with deadlines
Infrastructure at imminent risk of failure
Action: Prioritize these ahead of most feature work. This is "stop the bleeding" debt.
Tier 2: Compounding Debt (Schedule Soon)
These slow you down or increase future risk:
Undocumented systems that increase change risk
Monitoring gaps that hide problems
Technical limitations blocking planned initiatives
Accumulating workarounds that increase complexity
Action: Allocate dedicated capacity each quarter. Don't wait for a crisis. Budget 20-30% of team capacity for this tier.
Tier 3: Strategic Debt (Plan For)
These improve efficiency but aren't urgent:
Architecture improvements that enable future scale
Automation that would save time long-term
Consolidation of redundant systems
Modernization of stable but dated infrastructure
Action: Plan these during slower periods or as part of broader initiatives. Don't ignore completely, but schedule deliberately.
Tier 4: Nice-to-Have (Defer)
These are improvements without clear business benefit:
Refactoring that's purely aesthetic
Technology changes driven by preference rather than need
Over-engineering for scale you won't reach for years
Action: Defer indefinitely unless circumstances change.
The Key: Be explicit about prioritization. Don't pretend everything will get done. Make conscious choices about what you're deferring and why.
This prioritization framework connects to broader resource allocation decisions I discussed in making the business case for network modernization - you need clear criteria for where to invest limited resources.
Communicating Technical Debt to Leadership
This is where many managers fail. Your job is to translate technical reality into business impact language that leadership understands.
Don't Say: "We have technical debt in our network architecture that needs refactoring."
Do Say: "Our network was designed for 200 users and we're at 1,500. We're seeing performance degradation that's impacting user experience. Without infrastructure investment, we'll hit hard limits in 6-9 months that will require emergency spending and potential outages."
Don't Say: "Our code is outdated and needs security patching."
Do Say: "We're running software with known vulnerabilities that are actively being exploited in the wild. Our cyber insurance policy requires patching within 90 days. We're at 120 days and at risk of coverage denial if we're breached. We need a maintenance window within 30 days."
Don't Say: "We need to document our infrastructure."
Do Say: "Key network knowledge lives in one senior engineer's head. When they're on vacation, our response time increases 3x and risk increases significantly. If they leave, we lose 8 years of institutional knowledge. Investing in documentation reduces our business continuity risk and decreases onboarding time for new engineers by 50%."
The Formula That Works
Impact + Risk + Cost of Delay = Business Case
Impact: What's the business effect? (Revenue, user experience, security, compliance)
Risk: What's the probability and severity of failure? (Quantify if possible)
Cost of Delay: What does waiting cost? (Compounding complexity, increasing risk, higher future cost)
Example: "Our WAN links are running at 85% average utilization with peaks at 95%. In networking, 70% is the recommended threshold before performance degrades. Users are experiencing slowness during peak hours.
If we wait until we max out capacity, the solution will require emergency spending at 30% premium pricing and cause business disruption during implementation. Addressing this now during our next maintenance window costs $50K. Addressing during an emergency will cost $75K plus downtime impact.
Waiting costs us money, user experience, and increases implementation risk. ROI on addressing now is clear."
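If it helps, the same argument fits in a back-of-the-envelope script. The dollar figures are the illustrative ones above; the downtime numbers are placeholders you'd replace with your own:

```python
# Back-of-the-envelope cost-of-delay comparison using the illustrative figures above.
planned_cost = 50_000           # upgrade during a scheduled maintenance window
emergency_cost = 75_000         # same work done under crisis pricing
downtime_hours = 4              # placeholder: expected disruption if we wait for failure
cost_per_downtime_hour = 5_000  # placeholder: revenue / productivity impact per hour

waiting_total = emergency_cost + downtime_hours * cost_per_downtime_hour
cost_of_delay = waiting_total - planned_cost

print(f"Acting now:    ${planned_cost:,}")
print(f"Waiting:       ${waiting_total:,}")
print(f"Cost of delay: ${cost_of_delay:,}")
```

It's not a rigorous financial model, but it puts the tradeoff in terms leadership can compare against any other line item.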
Building effective business cases for infrastructure investment is something I explored in depth in Managing Up as a Technical Manager - your ability to advocate for technical debt paydown depends on speaking leadership's language.
Creating Space for Debt Paydown
The 70/20/10 Rule I'm Trying:
70% of capacity: Feature work and business initiatives
20% of capacity: Technical debt and infrastructure maintenance
10% of capacity: Learning, experimentation, innovation
Why This Works:
Business gets the majority of the team capacity for growth initiatives
Infrastructure health gets consistent attention before a crisis
Team gets space to learn and improve practices
Why This Is Hard:
When everything is urgent, that 20% gets consumed by features
Requires discipline to protect maintenance capacity
Requires leadership buy-in that 20% is non-negotiable
What I Tell My Leadership: "We need to operate sustainably. Operating at 100% feature capacity means we're paying technical debt interest daily, and one day the debt comes due during a crisis. The 20% maintenance capacity is our insurance policy against future crises."
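To make the split concrete, here's what it looks like in hours for a hypothetical six-person team over a 12-week quarter - the headcount and hours are assumptions, not part of the rule:

```python
# What the 70/20/10 split means in hours for a hypothetical 6-person team
# over a 12-week quarter (all numbers below are assumptions for illustration).
engineers = 6
weeks_per_quarter = 12
hours_per_week = 36  # leaves slack for meetings and interrupts

total = engineers * weeks_per_quarter * hours_per_week
split = {"features": 0.70, "debt & maintenance": 0.20, "learning & experiments": 0.10}

for bucket, share in split.items():
    print(f"{bucket:>22}: {total * share:,.0f} hours")
```

Seeing the maintenance bucket as a concrete number of hours makes it much harder to quietly raid it for feature work.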
Balancing Features and Maintenance
The Wrong Approach: "We'll do features now and catch up on maintenance later."
Later never comes. Maintenance keeps getting deferred. Technical debt compounds. Eventually, you hit a crisis.
The Right Approach: "Every quarter includes both feature delivery and infrastructure maintenance. The ratio adjusts based on current debt level, but maintenance never goes to zero."
When to Shift the Balance:
More Feature-Heavy (80/20):
Infrastructure is relatively healthy
Competitive pressure requires fast feature delivery
New market opportunities need quick response
More Maintenance-Heavy (50/50 or even 40/60):
Technical debt has reached critical levels
Recent outages or near-misses indicate fragility
Team velocity has slowed significantly due to accumulated complexity
Compliance or security issues require immediate attention
The Key: Make conscious decisions. Don't let feature work automatically consume all capacity by default.
What I'm Learning as a Manager (Four Months In)
Engineers Are Usually Right About Technical Debt
When experienced engineers raise concerns about infrastructure fragility, they're almost always correct. They see things I don't. Their concerns might seem abstract, but they're based on deep system knowledge.
My job isn't to question whether the debt exists - it's to help prioritize which debt matters most and build business cases for addressing it.
You Can't Defer Maintenance Forever
I've seen the temptation to always prioritize features over maintenance. Short-term, it feels productive. Long-term, it's devastating.
Systems that don't get maintained fail. The question isn't "if" but "when" and "how badly."
Crisis-Driven Maintenance Is Expensive
Every time we address technical debt during a crisis instead of proactively, we pay a premium:
Higher cost (emergency vendors, premium pricing, overtime)
Higher risk (no time for proper testing)
Higher stress (team working under pressure)
Higher impact (users affected by urgency)
That "expensive" technical debt work engineers wanted to do 6 months ago? It's now 2-3x more expensive because we're doing it during a crisis.
Communication Is Everything
When engineers feel like management doesn't understand or care about technical debt, they disengage. When leadership feels like engineers are always asking for time to "fix things that aren't broken," they stop listening.
My job is to bridge that gap. Translate technical reality into business impact. Translate business pressure into a prioritization context that engineers understand.
Sustainable Pace Requires Maintenance Capacity
Teams that operate at 100% feature capacity eventually break. Infrastructure degrades. Team morale suffers. Velocity drops.
The teams that sustainably deliver year after year protect time for maintenance, learning, and infrastructure health.
Practical Strategies That Are Working
Here's what I'm actually implementing to balance technical debt with feature delivery:
1. Visible Technical Debt Backlog
What We Did: Created a dedicated technical debt backlog separate from feature work. Every item includes:
Description of the debt
Business impact if not addressed
Risk level (critical/high/medium/low)
Estimated effort to resolve
Cost of delay (what happens if we wait)
Why It Helps: Makes technical debt visible to leadership. Provides data for prioritization discussions. Keeps debt from existing only as invisible engineer complaints.
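Here's a minimal sketch of what one backlog entry might look like as structured data - the field names mirror the list above, and the example entry is fictional:

```python
from dataclasses import dataclass

# One way to make each backlog entry structured data instead of a ticket comment.
@dataclass
class DebtItem:
    description: str
    business_impact: str
    risk: str              # "critical" | "high" | "medium" | "low"
    effort_days: int
    cost_of_delay: str

backlog = [
    DebtItem(
        description="Core switch pair running mismatched code; redundancy disabled",
        business_impact="Single failure causes site-wide outage",
        risk="critical",
        effort_days=5,
        cost_of_delay="Risk grows each quarter; emergency upgrade costs 2-3x",
    ),
]

# Sort by risk so the quarterly review starts with the scariest items.
RISK_ORDER = ["critical", "high", "medium", "low"]
for item in sorted(backlog, key=lambda i: RISK_ORDER.index(i.risk)):
    print(f"[{item.risk.upper()}] {item.description} ({item.effort_days}d)")
```

The specific tool doesn't matter - a spreadsheet works fine - as long as every entry carries the same fields and leadership can see the list.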
2. Quarterly Debt Review
What We Do: Every quarter, we review the technical debt backlog with the engineering team and leadership. Discuss:
What debt was paid down last quarter
What new debt was created
What debt is becoming critical
What capacity we're allocating next quarter
Why It Helps: Creates a forcing function for debt discussion. Prevents debt from being perpetually deferred. Gives leadership visibility into debt trends.
3. "Tech Debt Fridays"
What We're Testing: Every other Friday afternoon, engineers can work on technical debt items without justification or approval. Choose from the backlog or identify new debt.
Why It Might Help: Gives engineers autonomy to address issues they see. Creates a dedicated space for maintenance. Prevents debt from competing with features for prioritization.
The Challenge: When deadlines loom, these Fridays get consumed by feature work. Still figuring out how to protect this time.
4. Debt Tax on New Features
What This Means: Every new feature includes time allocation for:
Proper documentation
Monitoring implementation
Security review
Performance testing
These aren't optional. They're built into estimates.
Why It Helps: Prevents creating new technical debt while building features. Makes the true cost of features visible. Changes culture from "ship fast, fix later" to "ship right."
5. Post-Mortems Include Debt Analysis
What We Do: After outages or incidents, the post-mortem includes:
What technical debt contributed to this issue?
What debt should be prioritized to prevent recurrence?
What's the cost-benefit of addressing that debt?
Why It Helps: Connects technical debt to business impact. Makes debt concrete rather than abstract. Justifies debt paydown work.
The Bottom Line: We're All on the Same Team
Technical debt isn't an engineer problem or a management problem. It's a shared challenge that requires both perspectives.
What Engineers Need to Understand:
Managers aren't ignoring technical debt because they don't care. They're juggling competing priorities with limited resources. Help them by:
Quantifying business impact, not just technical concerns
Prioritizing what truly matters versus nice-to-haves
Proposing solutions, not just identifying problems
Understanding business pressures and constraints
What Managers Need to Understand:
Engineers aren't being perfectionists or making excuses. They're seeing real risks and real slowdowns from accumulated debt. Help them by:
Taking technical concerns seriously before they become crises
Protecting capacity for maintenance and debt paydown
Communicating the business context that drives prioritization
Being honest about what you can and can't prioritize
The Truth:
Technical debt is inevitable. You can't prevent it entirely. What matters is how you manage it:
Make conscious decisions about what debt you're accepting
Consistently allocate capacity to pay down critical debt
Don't defer maintenance until crisis forces action
Communicate clearly between technical and business perspectives
For Engineers: Your manager isn't the enemy. They're trying to balance more constraints than you see. Help them understand why debt matters in language they can take to leadership.
For Managers: Your engineers aren't being difficult. They're trying to prevent future crises and maintain your infrastructure's health. Listen to them before crisis proves them right.
We're all trying to build systems that work reliably while delivering value to the business. That requires managing technical debt deliberately, not pretending it doesn't exist or deferring it indefinitely.
The organizations that do this well don't eliminate technical debt - they manage it consciously as part of sustainable engineering practice.
📧 Managing technical debt or trying to balance delivery with infrastructure health? Subscribe to my monthly newsletter for practical perspectives on network engineering management, technical leadership, and building sustainable engineering practices. Sign up below!
What's your experience with technical debt? Engineers - what do you wish your manager understood? Managers - how are you balancing debt with delivery? Share your experiences in the comments or connect with me on LinkedIn.

