Technical Debt: What Engineers Wish Managers Understood

What Engineers Actually Mean When They Say "Technical Debt"

Let's start with what technical debt really is, because I've been on both sides of this conversation and there's often a fundamental misunderstanding at the heart of it.

From the Engineer's Perspective:

Technical debt is every shortcut, workaround, temporary fix, and "we'll come back to this later" decision that's still in production. It's the documentation that was never written. The monitoring that was never implemented. The network architecture that was designed for 50 users and is now serving 500.

It's not laziness. It's not engineers being perfectionists. It's the accumulated cost of decisions made under time pressure, resource constraints, or incomplete information.

What It's NOT:

It's not "I want to rewrite this in my favorite programming language because it's cooler."

It's not "I want to refactor code that's working fine just to make it prettier."

It's not an excuse to avoid new work.

What It Actually IS:

Technical debt is the gap between where your infrastructure is and where it needs to be to support current (and near-future) business needs reliably, securely, and efficiently.

Real Examples: What Technical Debt Looks Like in Network Engineering

Let me get specific because abstract definitions don't capture the reality engineers are dealing with daily.

Example 1: The Undocumented Network

The Debt: Your network has grown organically over 8 years. VLANs were added as needed. Routes were patched in during emergencies. No one documented changes because there was never time.

How It Started: Five years ago, during a crisis implementation, an engineer added a static route with a note: "temporary fix for weekend cutover - will remove Monday." That Monday never came. Now that route is critical and nobody remembers why it exists.

The Compounding:

  • New engineer joins team: takes 3 months to understand network topology instead of 2 weeks

  • Troubleshooting takes 3x longer because you're reverse-engineering intent from running configs

  • Every change carries risk because you don't fully understand dependencies

  • Knowledge lives in one senior engineer's head, creating a single point of failure

The Real Cost: You're not just losing time - you're losing the ability to move fast safely. Every project takes longer because you're discovering surprises along the way.

Example 2: The "Temporary" Workaround That Became Permanent

The Debt: You have a core switch that's supposed to be redundant, but one of the pair is running ancient code with known bugs. The workaround: all traffic routes through the other switch, defeating redundancy.

How It Started: 18 months ago, during a code upgrade that went badly, you rolled back one switch and decided to "test the upgrade more thoroughly before trying again." That testing never happened because other priorities took over.

The Compounding:

  • You've lost redundancy that your architecture depends on

  • The working switch is now carrying double the intended load

  • You can't do maintenance on the working switch without downtime

  • The failed upgrade created fear of touching the environment

  • New features require the newer code version you still haven't been able to upgrade to

The Real Cost: Your infrastructure is one failure away from a major outage. But from the outside, everything "works fine." Until it doesn't.

Example 3: The Monitoring Blind Spots

The Debt: You monitor core infrastructure but not branch offices. You monitor device availability but not performance metrics. You get alerts but they're not actionable.

How It Started: You implemented basic monitoring 3 years ago to "get something in place quickly." The plan was to expand monitoring coverage over time. But "over time" never had budget allocation.

The Compounding:

  • Problems in unmonitored areas go undetected until users complain

  • You're flying blind on capacity planning

  • Troubleshooting requires manual data collection

  • You discover failures hours or days after they occur

  • Your MTTR (Mean Time to Resolution) is inflated by detection lag, not just repair time (see the sketch below)

The Real Cost: You're reactive instead of proactive. Your team spends time firefighting problems that should have been caught early. Your reputation suffers because users experience issues before you know about them.

Monitoring gaps contribute directly to alert fatigue and burnout, something I explored in "Both Sides of the Desk: Burnout." Engineers who deal with constant fire drills without proper tools tend to wear out quickly.
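To make that MTTR point concrete: a big chunk of the number is often detection lag, not actual fix time. Here's a minimal sketch in Python with made-up incident timestamps that splits the two apart; the data and numbers are illustrative, not pulled from any real monitoring system.

```python
from datetime import datetime, timedelta

# Hypothetical incidents: when the fault actually started, when we found out
# (often a user complaint from an unmonitored branch), and when we resolved it.
incidents = [
    {"occurred": datetime(2024, 3, 1, 2, 0),
     "detected": datetime(2024, 3, 1, 8, 30),
     "resolved": datetime(2024, 3, 1, 9, 15)},
    {"occurred": datetime(2024, 3, 9, 14, 0),
     "detected": datetime(2024, 3, 9, 14, 5),
     "resolved": datetime(2024, 3, 9, 15, 0)},
]

def mean(deltas):
    return sum(deltas, timedelta()) / len(deltas)

detection_lag = mean([i["detected"] - i["occurred"] for i in incidents])
repair_time = mean([i["resolved"] - i["detected"] for i in incidents])
mttr = mean([i["resolved"] - i["occurred"] for i in incidents])

print(f"Mean detection lag: {detection_lag}")  # the part monitoring gaps inflate
print(f"Mean repair time:   {repair_time}")
print(f"MTTR:               {mttr}")
```

When the first incident sits undetected for six and a half hours, no amount of fast troubleshooting afterward makes the MTTR look good.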

Example 4: The Security Patches Nobody Has Time For

The Debt: Your firewalls, switches, and routers are running code versions with known CVEs. You know about them. You've read the security bulletins. But patching requires maintenance windows, testing, and risk - all things you don't have bandwidth for.

How It Started: You prioritized feature rollouts and new projects over maintenance. Leadership approved new initiatives, but not maintenance windows. Security patching kept getting pushed "one more quarter."

The Compounding:

  • Your vulnerability surface grows with each new CVE

  • Compliance audits flag your outdated versions

  • Insurance and security teams escalate concerns

  • Each quarter you delay makes the patch delta larger and riskier

  • Eventually, you need emergency patching during a crisis instead of planned maintenance

The Real Cost: You're one exploit away from a breach. And when (not if) that happens, explaining why you knew about vulnerabilities for 18 months but didn't patch them is a career-limiting conversation.

Example 5: The Scaling Problems You're Ignoring

The Debt: Your network was designed for 200 users. You're now at 1,500 users and growing 20% annually. Performance is degrading but not catastrophically - yet.

How It Started: Business grew faster than infrastructure planning cycles. Each time you hit capacity limits, you added temporary fixes: another VLAN here, another subnet there, some traffic prioritization tweaks.

The Compounding:

  • Your architecture no longer matches your scale

  • Band-aid fixes create complexity that slows future changes

  • Performance degradation is gradual so it doesn't trigger urgency

  • By the time failure occurs, you're in crisis mode with no good options

  • The "proper fix" gets more expensive and disruptive the longer you wait

The Real Cost: You'll eventually be forced to do a major infrastructure overhaul during a crisis instead of a planned migration. Crisis projects are expensive, risky, and career-damaging.

How Technical Debt Compounds: The Interest You're Paying

Here's what engineers wish managers understood: technical debt isn't static. It gets worse over time, just like financial debt accrues interest.

The Compounding Effects

1. Slowing Velocity

Year 1: Implement feature with shortcut. Saves 2 weeks. Ship faster.

Year 2: New feature requires changes to that area. Extra work to understand and work around shortcut. Adds 1 week to project.

Year 3: Another feature touches that area. Now you're working around multiple shortcuts and accumulated complexity. Adds 3 weeks to project.

Year 4: Complexity is so high that simple changes take weeks of planning. Your velocity has dropped 50% but you can't point to one cause - it's the accumulated weight of hundreds of shortcuts.
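If you want to see why this feels like compound interest, here's a rough model with made-up numbers: every shortcut saves time once, but charges a small recurring tax on every later project that touches the same area. The constants below are assumptions for illustration only.

```python
# Illustrative only: each shortcut saves time once, then taxes every later
# project that has to work around it. All numbers are assumptions.
SAVED_PER_SHORTCUT = 2.0       # weeks saved up front by taking the shortcut
INTEREST_PER_SHORTCUT = 0.5    # extra weeks per later project, per existing shortcut
PROJECTS_PER_YEAR = 6
NEW_SHORTCUTS_PER_YEAR = 3

existing_shortcuts = 0
for year in range(1, 5):
    interest_paid = existing_shortcuts * INTEREST_PER_SHORTCUT * PROJECTS_PER_YEAR
    saved_up_front = NEW_SHORTCUTS_PER_YEAR * SAVED_PER_SHORTCUT
    existing_shortcuts += NEW_SHORTCUTS_PER_YEAR
    print(f"Year {year}: saved {saved_up_front:.0f} weeks up front, "
          f"paid {interest_paid:.0f} weeks working around old shortcuts")
```

By year four the team is paying far more in workaround time than the shortcuts ever saved - which is exactly the 50% velocity drop described above, just made visible.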

2. Increasing Risk

Early Stage: You have one workaround. You remember it exists. Risk is manageable.

Mid Stage: You have dozens of workarounds, some documented, many not. New engineers don't know they exist. Changes sometimes break unexpected things.

Late Stage: Your infrastructure is a house of cards. Every change carries significant risk. You're afraid to touch things. Innovation stops.

The Tipping Point: Eventually you have a failure and discover multiple workarounds that were interacting in ways nobody understood. The outage is long, the root cause is complex, and the post-mortem reveals a system held together with hope.

3. Knowledge Degradation

Initially: The person who implemented the shortcut knows why it exists and how it works.

6 Months Later: That person still works here but has moved on to other things. They vaguely remember it but would need to review configs to recall details.

2 Years Later: That person left the company. Documentation was never written. The workaround is now mystery infrastructure that nobody understands but everyone's afraid to change.

5 Years Later: You have entire network segments that nobody fully understands. Institutional knowledge is gone. You're reverse-engineering your own infrastructure.

4. Opportunity Cost

What You're Not Doing Because of Technical Debt:

  • You can't implement new technologies because your foundation is too fragile

  • You can't move fast on business opportunities because infrastructure is brittle

  • You can't experiment and innovate because you're maintaining complexity

  • You can't adopt modern practices because you're locked into old patterns

  • Your talented engineers leave because they're tired of fighting broken systems

The Hidden Cost: Every hour spent working around technical debt is an hour not spent on value-creating work. But this cost is invisible on project timelines and budget sheets.

5. The Morale Drain

What Engineers Feel When Drowning in Technical Debt:

"My job isn't engineering anymore - it's archeology and crisis management."

"I'm embarrassed to show our infrastructure to peers. We're not doing good work."

"Management doesn't care about technical excellence. They only care about shipping features."

"I'm being set up to fail. When this house of cards collapses, I'll be blamed."

"Why bother doing things right? It'll just get cut for time anyway."

The Result: Your best engineers leave. The ones who stay become cynical. Your ability to attract talent decreases. Technical debt creates a cultural problem that compounds organizational dysfunction.

What Engineers Wish Managers Understood

I've been the engineer frustrated by management's dismissal of technical debt. Now I'm the manager trying to balance debt with delivery. Here's what I wish I could communicate to every manager:

"It's Not About Perfectionism"

What Engineers Say: "We need to refactor this network segment."

What Managers Hear: "Engineers want to waste time making things perfect when they work fine now."

What Engineers Mean: "This infrastructure is fragile, risky, and slowing us down. Fixing it now prevents future crises and accelerates future work."

The Disconnect: Managers often conflate "technical debt paydown" with "gold-plating" or "over-engineering." They're not the same. Engineers usually aren't asking for perfection - they're asking to reach a sustainable baseline.

"We're Not Crying Wolf"

What Engineers Say: "This is getting critical. We need to address this soon."

What Happens: Nothing. Another quarter passes. Engineers escalate again. Still nothing.

What Engineers Learn: "My technical concerns don't matter. Management only cares when things are on fire."

The Result: Engineers stop raising concerns early. They wait until crisis forces action. Your early warning system breaks down because engineers learn that warnings are ignored anyway.

What Managers Miss: The engineers who are raising concerns are often your most experienced, most conscientious people. They're trying to prevent problems before they explode. Ignoring them doesn't make technical debt go away - it just ensures you'll deal with it during a crisis instead of proactively.

"The Temporary Fix Is Never Temporary"

What Engineers Say: "If we do this temporary workaround, we need to schedule time to fix it properly."

What Managers Say: "Okay, we'll come back to it."

What Actually Happens: You never come back to it. The temporary fix becomes permanent. Every temporary fix becomes permanent.

What Engineers Learn: "There's no point planning to do things right. Management will always choose fast over right."

The Compounding: Engineers stop suggesting proper solutions. They implement workarounds because that's what always gets approved anyway. Technical debt accumulates faster because preventative maintenance never happens.

"You Can't See the Problem Until It's Too Late"

From Outside: Everything looks fine. Services are running. Users are happy (mostly). Metrics look okay.

From Inside: Engineers see the cracks. Performance is degrading slowly. Complexity is increasing. Risk is growing. But it's gradual, so it doesn't trigger alarms.

The Analogy: It's like ignoring car maintenance. For months, everything seems fine - until your engine seizes on the highway. The warning signs were there, but they were gradual enough to ignore.

What Managers Miss: By the time technical debt becomes visible to non-technical stakeholders, it's usually a crisis. The time to fix it was months or years ago when engineers were raising concerns that seemed abstract.

"We're Making Tradeoffs, Not Excuses"

What Engineers Say: "This will take longer than expected because of our technical debt."

What Managers Hear: "Engineers are making excuses for poor estimates."

What Engineers Mean: "Our infrastructure is complex and fragile. Making changes safely requires working around accumulated shortcuts. This isn't about us being slow - it's about the environment we're working in."

The Missing Context: When managers only see project timelines extending, they don't see the hours engineers spend understanding undocumented systems, working around limitations, or preventing issues caused by past shortcuts.

The Manager's Dilemma: Balancing Debt with Delivery

Now let me flip to the manager's perspective, because having been in this role for a while now, I understand why this is legitimately hard.

The Pressure You're Under

From Above:

  • Leadership wants features that drive revenue and growth

  • Every quarter has aggressive targets

  • Strategic initiatives get prioritized

  • "Maintenance" sounds like overhead to be minimized

  • Your performance is measured on delivery, not infrastructure health

From Below:

  • Engineers say everything is urgent technical debt

  • Hard to distinguish critical debt from nice-to-have improvements

  • Engineers want to rebuild things that appear to be working

  • Debt paydown doesn't have a visible business impact

The Resources:

  • You never have enough people or time

  • Every choice means something else doesn't get done

  • Budget for new initiatives exists; budget for maintenance doesn't

The Reality: You're caught between delivering for the business and maintaining infrastructure health. Both are important. You can't do both fully. Something has to give.

How to Prioritize Technical Debt

Here's the framework I'm using to make these decisions:

Tier 1: Critical Debt (Fix Now)

These have immediate business impact or high risk:

  • Security vulnerabilities with active exploits

  • Single points of failure for critical systems

  • Performance issues affecting users or revenue

  • Compliance requirements with deadlines

  • Infrastructure at imminent risk of failure

Action: Prioritize these ahead of most feature work. This is "stop the bleeding" debt.

Tier 2: Compounding Debt (Schedule Soon)

These slow you down or increase future risk:

  • Undocumented systems that increase change risk

  • Monitoring gaps that hide problems

  • Technical limitations that block planned initiatives

  • Accumulating workarounds that increase complexity

Action: Allocate dedicated capacity each quarter. Don't wait for a crisis. Budget 20-30% of team capacity for this tier.

Tier 3: Strategic Debt (Plan For)

These improve efficiency but aren't urgent:

  • Architecture improvements that enable future scale

  • Automation that would save time long-term

  • Consolidation of redundant systems

  • Modernization of stable but dated infrastructure

Action: Plan these during slower periods or as part of broader initiatives. Don't ignore completely, but schedule deliberately.

Tier 4: Nice-to-Have (Defer)

These are improvements without clear business benefit:

  • Refactoring that's purely aesthetic

  • Technology changes driven by preference rather than need

  • Over-engineering for scale you won't reach for years

Action: Defer indefinitely unless circumstances change.

The Key: Be explicit about prioritization. Don't pretend everything will get done. Make conscious choices about what you're deferring and why.
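One way I keep this tiering from being purely gut feel is to score each backlog item on a few axes and map the scores to tiers. The sketch below is just how I'd express that in Python; the field names, scales, and thresholds are my own assumptions, not a standard framework.

```python
from dataclasses import dataclass

@dataclass
class DebtItem:
    name: str
    risk: int             # 1 (low) .. 5 (active exploit / imminent failure)
    business_impact: int  # 1 (cosmetic) .. 5 (revenue, compliance, outage)
    drag: int             # 1 (rarely touched) .. 5 (slows most changes)

def tier(item: DebtItem) -> str:
    """Rough mapping onto the four tiers above; thresholds are illustrative."""
    if item.risk >= 5 or item.business_impact >= 5:
        return "Tier 1: fix now"
    if item.risk + item.drag >= 6:
        return "Tier 2: schedule soon"
    if item.business_impact >= 2 or item.drag >= 2:
        return "Tier 3: plan for"
    return "Tier 4: defer"

backlog = [
    DebtItem("Core switch pair running on one switch", risk=5, business_impact=4, drag=3),
    DebtItem("Undocumented static routes from old cutovers", risk=3, business_impact=3, drag=4),
    DebtItem("Cosmetic cleanup of legacy VLAN names", risk=1, business_impact=1, drag=1),
]

for item in sorted(backlog, key=lambda i: i.risk + i.business_impact + i.drag, reverse=True):
    print(f"{tier(item):22} {item.name}")
```

The scoring isn't the point - the conversation it forces is. It's much harder to perpetually defer an item that's been explicitly scored as high risk and high drag.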

This prioritization framework connects to broader resource allocation decisions I discussed in making the business case for network modernization - you need clear criteria for where to invest limited resources.

Communicating Technical Debt to Leadership

This is where many managers fail. Your job is to translate technical reality into business impact language that leadership understands.

Don't Say: "We have technical debt in our network architecture that needs refactoring."

Do Say: "Our network was designed for 200 users and we're at 1,500. We're seeing performance degradation that's impacting user experience. Without infrastructure investment, we'll hit hard limits in 6-9 months that will require emergency spending and potential outages."

Don't Say: "Our code is outdated and needs security patching."

Do Say: "We're running software with known vulnerabilities that are actively being exploited in the wild. Our cyber insurance policy requires patching within 90 days. We're at 120 days and at risk of coverage denial if we're breached. We need a maintenance window within 30 days."

Don't Say: "We need to document our infrastructure."

Do Say: "Key network knowledge lives in one senior engineer's head. When they're on vacation, our response time increases 3x and risk increases significantly. If they leave, we lose 8 years of institutional knowledge. Investing in documentation reduces our business continuity risk and decreases onboarding time for new engineers by 50%."

The Formula That Works

Impact + Risk + Cost of Delay = Business Case

Impact: What's the business effect? (Revenue, user experience, security, compliance)

Risk: What's the probability and severity of failure? (Quantify if possible)

Cost of Delay: What does waiting cost? (Compounding complexity, increasing risk, higher future cost)

Example: "Our WAN links are running at 85% average utilization with peaks at 95%. In networking, 70% is the recommended threshold before performance degrades. Users are experiencing slowness during peak hours.

If we wait until we max out capacity, the solution will require emergency spending at 30% premium pricing and cause business disruption during implementation. Addressing this now during our next maintenance window costs $50K. Addressing during an emergency will cost $75K plus downtime impact.

Waiting costs us money, user experience, and increases implementation risk. ROI on addressing now is clear."
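The math behind that pitch fits in a few lines. Here's a back-of-the-envelope sketch using the numbers from the example; the downtime figures are assumptions I've added to show how to fold business impact into the comparison.

```python
# Cost-of-delay math for the WAN example above. The $50K planned cost and
# ~50% emergency premium come from the example; downtime figures are assumed.
planned_cost = 50_000            # upgrade during a scheduled maintenance window
emergency_premium = 0.50         # rush hardware, expedited vendor work, overtime
downtime_hours = 4               # assumed outage if forced into an unplanned cutover
downtime_cost_per_hour = 5_000   # assumed business impact per hour of degraded service

emergency_cost = planned_cost * (1 + emergency_premium)  # ~$75K, as in the example
emergency_total = emergency_cost + downtime_hours * downtime_cost_per_hour

print(f"Planned now:      ${planned_cost:>9,.0f}")
print(f"Emergency later:  ${emergency_total:>9,.0f}")
print(f"Cost of waiting:  ${emergency_total - planned_cost:>9,.0f}")
```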

Building effective business cases for infrastructure investment is something I explored in depth in Managing Up as a Technical Manager - your ability to advocate for technical debt paydown depends on speaking leadership's language.

Creating Space for Debt Paydown

The 70/20/10 Rule I'm Trying:

  • 70% of capacity: Feature work and business initiatives

  • 20% of capacity: Technical debt and infrastructure maintenance

  • 10% of capacity: Learning, experimentation, innovation

Why This Works:

  • Business gets the majority of the team capacity for growth initiatives

  • Infrastructure health gets consistent attention before a crisis

  • Team gets space to learn and improve practices

Why This Is Hard:

  • When everything is urgent, that 20% gets consumed by features

  • Requires discipline to protect maintenance capacity

  • Requires leadership buy-in that 20% is non-negotiable

What I Tell My Leadership: "We need to operate sustainably. Operating at 100% feature capacity means we're paying technical debt interest daily, and one day the debt comes due during a crisis. The 20% maintenance capacity is our insurance policy against future crises."
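To keep the 20% from being theoretical, I translate the split into concrete engineer-weeks at the start of each quarter. A minimal sketch, assuming a simple capacity model (the team size, quarter length, and overhead figures are illustrative):

```python
# Turn the 70/20/10 split into concrete engineer-weeks for the quarter.
engineers = 6
weeks_in_quarter = 12
overhead = 0.20  # assumed: meetings, on-call, PTO

capacity = engineers * weeks_in_quarter * (1 - overhead)  # usable engineer-weeks

allocation = {
    "Features and business initiatives": 0.70,
    "Technical debt and maintenance":    0.20,
    "Learning and experimentation":      0.10,
}

for bucket, share in allocation.items():
    print(f"{bucket:<36} {capacity * share:5.1f} engineer-weeks")
```

Writing the maintenance slice down as a specific number of engineer-weeks makes it much harder to quietly reassign than a percentage on a slide.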

Balancing Features and Maintenance

The Wrong Approach: "We'll do features now and catch up on maintenance later."

Later never comes. Maintenance keeps getting deferred. Technical debt compounds. Eventually, you hit a crisis.

The Right Approach: "Every quarter includes both feature delivery and infrastructure maintenance. The ratio adjusts based on current debt level, but maintenance never goes to zero."

When to Shift the Balance:

More Feature-Heavy (80/20):

  • Infrastructure is relatively healthy

  • Competitive pressure requires fast feature delivery

  • New market opportunities need quick response

More Maintenance-Heavy (50/50 or even 40/60):

  • Technical debt has reached critical levels

  • Recent outages or near-misses indicate fragility

  • Team velocity has slowed significantly due to accumulated complexity

  • Compliance or security issues require immediate attention

The Key: Make conscious decisions. Don't let feature work automatically consume all capacity by default.

What I'm Learning as a Manager (Four Months In)

Engineers Are Usually Right About Technical Debt

When experienced engineers raise concerns about infrastructure fragility, they're almost always correct. They see things I don't. Their concerns might seem abstract, but they're based on deep system knowledge.

My job isn't to question whether the debt exists - it's to help prioritize which debt matters most and build business cases for addressing it.

You Can't Defer Maintenance Forever

I've seen the temptation to always prioritize features over maintenance. Short-term, it feels productive. Long-term, it's devastating.

Systems that don't get maintained fail. The question isn't "if" but "when" and "how badly."

Crisis-Driven Maintenance Is Expensive

Every time we address technical debt during a crisis instead of proactively, we pay a premium:

  • Higher cost (emergency vendors, premium pricing, overtime)

  • Higher risk (no time for proper testing)

  • Higher stress (team working under pressure)

  • Higher impact (users affected by urgency)

That "expensive" technical debt work engineers wanted to do 6 months ago? It's now 2-3x more expensive because we're doing it during a crisis.

Communication Is Everything

When engineers feel like management doesn't understand or care about technical debt, they disengage. When leadership feels like engineers are always asking for time to "fix things that aren't broken," they stop listening.

My job is to bridge that gap. Translate technical reality into business impact. Translate business pressure into a prioritization context that engineers understand.

Sustainable Pace Requires Maintenance Capacity

Teams that operate at 100% feature capacity eventually break. Infrastructure degrades. Team morale suffers. Velocity drops.

The teams that sustainably deliver year after year protect time for maintenance, learning, and infrastructure health.

Practical Strategies That Are Working

Here's what I'm actually implementing to balance technical debt with feature delivery:

1. Visible Technical Debt Backlog

What We Did: Created a dedicated technical debt backlog separate from feature work. Every item includes:

  • Description of the debt

  • Business impact if not addressed

  • Risk level (critical/high/medium/low)

  • Estimated effort to resolve

  • Cost of delay (what happens if we wait)

Why It Helps: Makes technical debt visible to leadership. Provides data for prioritization discussions. Prevents debt from being invisible or dismissed as "the engineers complaining."
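The tool matters far less than capturing the same fields for every item. If it helps, here's the shape of the record I have in mind, sketched as a Python dataclass; the field names are mine, not from any particular tracker.

```python
from dataclasses import dataclass, field
from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

@dataclass
class TechDebtItem:
    title: str
    description: str        # what the debt actually is, in plain language
    business_impact: str    # what happens to users, revenue, or compliance if ignored
    risk: Risk
    effort_weeks: float     # rough estimate to resolve
    cost_of_delay: str      # what waiting costs: growing risk, bigger patch delta, etc.
    tags: list[str] = field(default_factory=list)

item = TechDebtItem(
    title="Restore redundancy on the core switch pair",
    description="Secondary core rolled back 18 months ago; all traffic rides one switch.",
    business_impact="A single failure takes down the site; no safe maintenance window.",
    risk=Risk.CRITICAL,
    effort_weeks=3,
    cost_of_delay="Every month on old code adds CVEs and blocks features needing new code.",
    tags=["core", "redundancy", "upgrade"],
)
print(f"[{item.risk.name}] {item.title} (~{item.effort_weeks:g} weeks)")
```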

2. Quarterly Debt Review

What We Do: Every quarter, we review the technical debt backlog with the engineering team and leadership. Discuss:

  • What debt was paid down last quarter

  • What new debt was created

  • What debt is becoming critical

  • What capacity we're allocating next quarter

Why It Helps: Creates a forcing function for debt discussion. Prevents debt from being perpetually deferred. Gives leadership visibility into debt trends.

3. "Tech Debt Fridays"

What We're Testing: Every other Friday afternoon, engineers can work on technical debt items without justification or approval. Choose from the backlog or identify new debt.

Why It Might Help: Gives engineers autonomy to address issues they see. Creates a dedicated space for maintenance. Prevents debt from competing with features for prioritization.

The Challenge: When deadlines loom, these Fridays get consumed by feature work. Still figuring out how to protect this time.

4. Debt Tax on New Features

What This Means: Every new feature includes time allocation for:

  • Proper documentation

  • Monitoring implementation

  • Security review

  • Performance testing

These aren't optional. They're built into estimates.

Why It Helps: Prevents creating new technical debt while building features. Makes the true cost of features visible. Changes culture from "ship fast, fix later" to "ship right."
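One way to enforce this is to bake the tax into the estimate arithmetic rather than relying on people to remember a checklist. A rough sketch with made-up percentages:

```python
# Fold the "debt tax" into every feature estimate so the maintenance work
# can't be silently dropped. Percentages are assumptions, not a standard.
DEBT_TAX = {
    "documentation":         0.10,
    "monitoring and alerts": 0.10,
    "security review":       0.05,
    "performance testing":   0.05,
}

def full_estimate(build_weeks: float) -> float:
    """Feature estimate including the non-optional maintenance work."""
    return build_weeks * (1 + sum(DEBT_TAX.values()))

build = 4.0
print(f"Build only:    {build:.1f} weeks")
for task, share in DEBT_TAX.items():
    print(f"  + {task:<21} {build * share:.1f} weeks")
print(f"Full estimate: {full_estimate(build):.1f} weeks")
```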

5. Post-Mortems Include Debt Analysis

What We Do: After outages or incidents, the post-mortem includes:

  • What technical debt contributed to this issue?

  • What debt should be prioritized to prevent recurrence?

  • What's the cost-benefit of addressing that debt?

Why It Helps: Connects technical debt to business impact. Makes debt concrete rather than abstract. Justifies debt paydown work.

The Bottom Line: We're All on the Same Team

Technical debt isn't an engineer problem or a management problem. It's a shared challenge that requires both perspectives.

What Engineers Need to Understand:

Managers aren't ignoring technical debt because they don't care. They're juggling competing priorities with limited resources. Help them by:

  • Quantifying business impact, not just technical concerns

  • Prioritizing what truly matters versus nice-to-haves

  • Proposing solutions, not just identifying problems

  • Understanding business pressures and constraints

What Managers Need to Understand:

Engineers aren't being perfectionists or making excuses. They're seeing real risks and real slowdowns from accumulated debt. Help them by:

  • Taking technical concerns seriously before they become crises

  • Protecting capacity for maintenance and debt paydown

  • Communicating the business context that drives prioritization

  • Being honest about what you can and can't prioritize

The Truth:

Technical debt is inevitable. You can't prevent it entirely. What matters is how you manage it:

  • Make conscious decisions about what debt you're accepting

  • Consistently allocate capacity to pay down critical debt

  • Don't defer maintenance until crisis forces action

  • Communicate clearly between technical and business perspectives

For Engineers: Your manager isn't the enemy. They're trying to balance more constraints than you see. Help them understand why debt matters in language they can take to leadership.

For Managers: Your engineers aren't being difficult. They're trying to prevent future crises and maintain your infrastructure's health. Listen to them before crisis proves them right.

We're all trying to build systems that work reliably while delivering value to the business. That requires managing technical debt deliberately, not pretending it doesn't exist or deferring it indefinitely.

The organizations that do this well don't eliminate technical debt - they manage it consciously as part of sustainable engineering practice.


📧 Managing technical debt or trying to balance delivery with infrastructure health? Subscribe to my monthly newsletter for practical perspectives on network engineering management, technical leadership, and building sustainable engineering practices. Sign up below!

What's your experience with technical debt? Engineers - what do you wish your manager understood? Managers - how are you balancing debt with delivery? Share your experiences in the comments or connect with me on LinkedIn.
