Automation Debt: The Graveyard of Good Intentions (And Why Your Automation Keeps Failing)

Mar 3

The Graveyard Tour

Let me paint a picture you'll recognize.

There's a GitHub repository with automation scripts. Last commit: 18 months ago. Author: someone who left the company a year ago. Nobody else on the team knows how the scripts work. Nobody's sure if they even still work, but nobody wants to delete them "just in case."

There's an Ansible Tower license. Purchased three years ago after a conference demo. Annual cost: $15,000. Actual usage: One engineer created a few playbooks during the first month. They no longer work because the infrastructure has changed. Nobody else learned Ansible. The license auto-renews every year.

There's a folder of Python scripts on a shared drive. Documentation: none. Code comments: sparse. Functionality: unknown. Everyone's afraid to touch them because "that's how the backup system works," but nobody remembers exactly how.

There's a Terraform repository. Started with enthusiasm. Contains configurations for about 30% of the infrastructure. The other 70% is still manual. The team gave up on getting the rest into code. Now there are two sources of truth: Terraform for some things, manual configs for everything else. Nobody's sure which is which.

Welcome to the automation graveyard.

Every network engineering organization has one. It's where automation initiatives go to die - not with a bang, but with a slow fade into abandonment, cynicism, and wasted investment.

This isn't technical debt: work you know needs to be done but haven't gotten to yet. This is automation debt: failed automation attempts that leave behind organizational baggage, financial waste, and cynicism that block future automation efforts.

Let's talk about how automation debt accumulates, why it's more damaging than you think, and what actually works to avoid creating more of it.

What Automation Debt Actually Is

Technical debt is deferred maintenance. You know what needs to be done. You've chosen to defer it, and it's in the backlog. It's accumulating interest.

Automation debt is different. It's:

Failed automation investments: Money spent on tools that never got adopted. Time invested in building automation that never got used. Skills developed by one person that never spread to the team.

Abandoned half-solutions: Automation that covers 30% of the use case. Scripts that work for one scenario but break in edge cases. Tools that automate one part of the process while the rest stay manual.

Knowledge locked in one person: The automation champion who built everything leaves, and nobody else can maintain it. The automation dies with their departure - or worse, keeps running without anyone understanding how it works.

Organizational cynicism: "We tried automation before, and it didn't work." Past failures create resistance to future attempts. The graveyard becomes evidence that "automation doesn't work here."

Competing abandoned tools: Someone started with Ansible. Someone else preferred Python. Someone tried Terraform. Nothing integrated. Everything abandoned. Now you have three different approaches to automation, none of them complete.

Why It's Worse Than Technical Debt

Technical debt has a path forward:

You know what needs to be done. You allocate resources. You chip away at it, and progress is visible.

Automation debt creates a downward spiral:

Failed attempt → wasted money and time
Wasted investment → organizational skepticism
Skepticism → resistance to new automation initiatives
Resistance → next automation attempt gets inadequate support
Inadequate support → next attempt fails
Cycle repeats, getting worse each time

The graveyard grows. The cynicism deepens. The manual processes persist.

The way automation debt accumulates and compounds is similar to technical debt patterns explored in Technical Debt: What Engineers Wish Managers Understood, but with an additional layer of organizational psychology.

How Automation Debt Accumulates: The Predictable Pattern

The pattern is remarkably consistent across organizations:

Phase 1: The Enthusiastic Champion

What it looks like:

One engineer gets excited about automation. Maybe they attended a conference. Maybe they're frustrated with manual processes. Maybe they just like to code.

They start building automation on their own time. Ansible playbooks. Python scripts. Terraform configurations. It's working. They're solving real problems.

What's happening:

Individual initiative solving individual pain points. This actually works - for that one person.

The warning sign:

It's one person. Nobody else is involved. No organizational support. No allocated time. Just one engineer's side project.

Phase 2: The Demo That Impresses

What it looks like:

The champion demos their automation to the team or management. It's impressive. It saves time, and it reduces errors. Everyone agrees "we should be doing more of this."

Management gets excited. "Let's automate everything!" Budget gets allocated. Maybe a tool gets purchased.

What's happening:

Enthusiasm without understanding the investment required to make automation succeed organization-wide.

The warning sign:

"We should automate more," without clarity on who will do it, when they'll do it, or how they'll learn.

Phase 3: The Reality of Adoption

What it looks like:

The champion tries to get the team to use the automation. It's met with resistance:

"The manual way works fine." "I don't know Python/Ansible/Terraform." "Learning this seems complicated." "I don't have time for this right now." "What if the automation breaks something?"

What's happening:

The gap between one person knowing automation and the whole team adopting it is enormous. The champion didn't realize how big that gap was.

The warning sign:

Adoption rate after 3 months: maybe 20%. The champion uses it. Everyone else does things manually.

Phase 4: The Champion's Burnout

What it looks like:

The champion is maintaining all the automation alone. Adding new capabilities and fixing bugs. Fielding questions. Updating it when infrastructure changes.

It's becoming a second job. They're frustrated that nobody else is helping. The rest of the team is frustrated that the automation is becoming a dependency on one person.

What's happening:

Unsustainable single-point-of-failure dynamic. The champion burns out or leaves.

The warning sign:

"Only [champion] knows how that works" becomes the explanation for everything automation-related.

Phase 5: The Abandonment

What it looks like:

The champion leaves the company or moves to a different role. The automation stops being maintained. It breaks when infrastructure changes. Nobody else knows how to fix it.

Eventually, people route around it. Back to manual processes. The automation sits there, unused, adding to the graveyard.

What's happening:

Automation that never achieved organizational adoption dies when the individual champion is gone.

The graveyard grows by one more failed initiative.

The Different Flavors of Automation Debt

Automation debt manifests in different ways:

The Expensive Shelf-ware

What it is:

Commercial automation platform purchased with enthusiasm. Annual license cost: significant. Actual usage: minimal or zero.

Common examples:

Network automation platforms with capabilities nobody uses
Infrastructure-as-code tools that never got past proof-of-concept
Orchestration systems that turned out to be too complex for the team's needs

The cost:

Not just the license fee. The opportunity cost of what that budget could have funded instead.

Why it happens:

Bought the tool before developing the skills or culture to use it. "If we buy the tool, we'll be forced to use it." Except that's not how it works.

The Fragile Scripts Only One Person Understands

What it is:

Critical automation that works but nobody except the original author understands. No documentation. Minimal comments. Specific to infrastructure that's changed.

Common examples:

Python scripts for configuration management
Bash scripts that handle backups or deployments
Custom tools built to solve specific problems

The cost:

Risk that it breaks and nobody can fix it. Knowledge is locked in one person who becomes irreplaceable and eventually leaves.

Why it happens:

An individual engineer solving problems without thinking about knowledge transfer or maintainability.

The Partial Solution That's Worse Than Nothing

What it is:

Automation that covers 30% of a process. The other 70% is still manual. Now you have to remember which parts are automated and which aren't.

Common examples:

Infrastructure-as-code for some devices but not others
Automated deployment that still requires manual verification steps
Scripts that work for the happy path but fail on edge cases

The cost:

Cognitive overhead of the hybrid process. Risk of assuming something is automated when it isn't. Complexity without the full benefit.

Why it happens:

Started strong but lost momentum before completing. Got the easy parts automated, gave up on the hard parts.
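One way to shrink the "which parts are automated?" overhead is to measure it. The sketch below is a minimal illustration, not a prescribed method: it assumes you can export two plain lists, one of every device in your inventory and one of the devices actually managed in code. The device names and the function name are hypothetical.

```python
def automation_coverage(inventory, managed_in_code):
    """Compare the full inventory with what is actually managed in code."""
    inventory = set(inventory)
    # Ignore entries in code that no longer exist in the inventory.
    managed = set(managed_in_code) & inventory
    manual = inventory - managed
    percent = 100 * len(managed) / len(inventory) if inventory else 0.0
    return {"managed": sorted(managed), "manual": sorted(manual), "percent": percent}


# Hypothetical device names, for illustration only.
inventory = [
    "core-sw-01", "core-sw-02", "edge-rtr-01", "edge-rtr-02", "fw-01",
    "access-sw-01", "access-sw-02", "access-sw-03", "wlc-01", "lb-01",
]
managed_in_code = ["core-sw-01", "core-sw-02", "fw-01"]

report = automation_coverage(inventory, managed_in_code)
print(f"{report['percent']:.0f}% of devices are managed in code")
print("Still manual:", ", ".join(report["manual"]))
```

Publishing the "still manual" list somewhere visible means nobody has to guess which source of truth applies to a given device.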

The Competing Standards Chaos

What it is:

Multiple automation approaches coexist. Ansible for some things, Python for others, Terraform for a third set. Nothing integrated. No standard approach.

The cost:

Team members need to learn multiple tools. Nothing works together. Duplication of effort. Impossible to build on previous work.

Why it happens:

Different people championing different tools. No organizational decision on standards. "Let a thousand flowers bloom" approach to automation.

The GitHub Graveyard

What it is:

Repositories full of abandoned code. Last commits from engineers who left. No README explaining what it does. Nobody is willing to delete it "just in case."

The cost:

Confusion about what's still relevant. Wasted time investigating dead code. False sense that automation exists when it doesn't.

Why it happens:

Nobody wants responsibility for deleting someone else's work. Nobody knows if it's still needed.
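A first step out of the graveyard is simply knowing which repositories have gone quiet. The sketch below assumes you have already collected each repository's last commit date (for example from your Git hosting platform's UI or export); the repository names, dates, and the one-year threshold are hypothetical choices for illustration.

```python
from datetime import date


def find_stale_repos(last_commits, today, max_age_days=365):
    """Return (name, age_in_days) for repos with no recent commit, oldest first."""
    stale = []
    for name, last_commit in last_commits.items():
        age = (today - last_commit).days
        if age > max_age_days:
            stale.append((name, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical repository names and dates, for illustration only.
last_commits = {
    "backup-scripts": date(2023, 1, 10),
    "ansible-playbooks": date(2024, 11, 2),
    "vlan-provisioning": date(2022, 6, 30),
}

for name, age in find_stale_repos(last_commits, today=date(2025, 1, 1)):
    print(f"{name}: last commit {age} days ago - assign an owner or archive it")
```

The output is a prompt for a decision, not the decision itself: each flagged repository still needs a human to say "keep and own" or "archive."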

Why Automation Initiatives Fail: The Root Causes

Understanding why automation fails helps prevent creating more automation debt:

Root Cause 1: The Lone Champion Problem

The pattern:

One person drives automation. They're passionate, skilled, and productive. Everyone else is passive or resistant. When that person leaves, everything collapses.

Why it happens:

The organization mistakes individual initiative for organizational capability. "We have automation" actually means "we have one person who does automation."

The fix:

Automation needs team-wide ownership from the start. Multiple people need to be involved. If only one person is working on it, you don't have automation - you have a dependency.

Root Cause 2: Tools Before Skills

The pattern:

Buy the automation platform first. Figure out how to use it later. Except "later" never comes because nobody has time to learn.

Why it happens:

Tools are easier to budget than training and time. Buying a tool feels like progress. Building skills is slower and harder to measure.

The fix:

Develop skills before buying tools. Start with free/open-source tools that teach the concepts. Prove value before investing in commercial platforms.

Root Cause 3: No Time Allocated

The pattern:

"In your spare time, work on automation." Except there is no spare time. Automation becomes a side project that never gets priority.

Why it happens:

Leadership wants automation but won't deprioritize anything else to create space for it. "Just automate stuff" without removing other work.

The fix:

Explicitly allocate time. "20% of your time is dedicated to automation work." Or "this quarter, automation is a higher priority than that project."

The resource allocation challenge connects to budget and priority decisions explored in Your First IT Budget - automation requires real investment, not just good intentions.

Root Cause 4: Automation Anxiety

The pattern:

Team members are afraid of automation. "What if it breaks something?" "What if I don't understand what it's doing?" Fear prevents adoption.

Why it happens:

Manual processes are known and safe. Automation is unknown and risky. People default to what they know works.

The fix:

Start with low-risk automation. Demonstrate that it works reliably. Build confidence through success, not by forcing adoption.

Root Cause 5: The All-or-Nothing Mentality

The pattern:

"We're going to automate everything!" Three months later: automated almost nothing, team is overwhelmed, initiative abandoned.

Why it happens:

Ambitious vision without realistic scoping. Trying to change everything at once creates change fatigue.

The fix:

Start small. Automate one painful thing well. Build from success. Incremental automation that actually gets used beats ambitious automation that gets abandoned.

Root Cause 6: No Clear "Why"

The pattern:

"We should automate because everyone's doing it." Or "automation is best practice." Without a clear understanding of what problem it solves.

Why it happens:

Following trends rather than solving actual problems. Automation for automation's sake.

The fix:

Identify the specific pain point automation will address. "Manual deployments take 6 hours and have a 20% error rate. Automation should reduce both."

What Actually Works: Avoiding Automation Debt

After seeing automation fail repeatedly and understanding why, patterns emerge around what actually works:

Strategy 1: Start With the Painful Manual Process

Don't automate because you should. Automate because something is painful.

The approach:

Identify the team's biggest manual pain point. The task that:

Takes too much time
Gets done incorrectly often
Everyone hates doing
Blocks other work

Then ask: "If we could automate this one thing, would it meaningfully improve our lives?"

If yes, start there. If no, find a different problem.

Why this works:

Automation that solves real pain gets adopted. Automation that's "best practice" but doesn't solve felt pain gets ignored.

Example:

"Generating config backups takes 2 hours every week, and someone always forgets a device. Automating this saves time and reduces errors." Clear value.

NOT: "We should have infrastructure as code because that's modern." ← No clear value.
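To make the backup example concrete, here is a minimal sketch of the "someone always forgets a device" check. It assumes backups are saved as one file per device, named after the device with a .cfg suffix; that naming scheme, the device names, and the function name are all assumptions for illustration. The demo writes placeholder files to a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def missing_backups(inventory, backup_dir, suffix=".cfg"):
    """Return inventory devices that have no backup file in backup_dir."""
    backed_up = {path.stem for path in Path(backup_dir).glob(f"*{suffix}")}
    return sorted(set(inventory) - backed_up)


# Demo with placeholder files; device names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for device in ["core-sw-01", "fw-01"]:
        (Path(tmp) / f"{device}.cfg").write_text("! placeholder config\n")
    forgotten = missing_backups(["core-sw-01", "edge-rtr-01", "fw-01"], tmp)
    print("Devices with no backup this week:", forgotten or "none")
```

Even before the backup collection itself is automated, a check like this removes the "did we forget one?" question from the weekly routine.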

Strategy 2: Team Ownership From Day One

Don't let automation be one person's project.

The approach:

From the start, involve multiple people and agree on:

Who's working on it (at least 2-3 people)
Who's learning the technology
Who will maintain it
Who will advocate for adoption

The pairing approach:

Pair people on automation projects. One person knows the manual process well. Another knows automation tools. They work together.

Why this works:

Knowledge distribution prevents a single point of failure. Multiple people invested in success means a better chance of sustained adoption.

The sign it's working:

Three months in, you can ask, "Who knows how this automation works?" and get multiple names.

Strategy 3: Skills First, Tools Later

Don't buy the expensive platform until you've proven the concept with free tools.

The approach:

Start with Python, Ansible, or whatever free tool teaches the concepts. Build something that works. Prove the value.

Then consider whether commercial tools add enough value to justify the cost.

Why this works:

Free tools force you to learn the fundamentals. Commercial tools let you skip learning, which means when something breaks, nobody knows how to fix it.

The progression:

Hand-written scripts (prove the concept)
Open-source tools (build team skills)
Commercial platforms (scale what's working)

Example:

Write Python scripts to automate backups. When that's working reliably, and multiple people understand it, then consider whether a commercial backup solution adds value.

Strategy 4: Allocate Real Time

"Work on automation in your spare time" guarantees failure.

The approach:

Explicitly protect time for automation:

"Fridays are automation days. No meetings. Focus on automation projects."

"This quarter, 20% of team time goes to automation. That means we're deferring [specific project]."

Why this works:

Without protected time, operational urgency always wins. Automation never gets priority unless it's explicitly protected.

What to say when leadership pushes back:

"We can do automation, or we can do [other project]. We can't do both well with the current capacity. Which is the priority?"

Strategy 5: One Thing at a Time

Automate one thing completely rather than ten things partially.

The approach:

Pick one process. Automate it completely. Make it reliable and get everyone using it. Document it.

Then move to the next thing.

Why this works:

Complete automation of one thing delivers value and builds confidence. Partial automation of everything delivers frustration.

The test:

Can you point to one manual process that's now fully automated and being used by the whole team? If no, don't start automating a second thing yet.

Strategy 6: Make It Less Scary Than Manual

Automation has to be obviously better than the manual process.

The approach:

Make automation:

Easier to use than the manual process
Clearly more reliable
Obviously faster
Well-documented
Easy to verify results

Example:

Manual process: "SSH into 25 devices, run commands, copy/paste output into spreadsheet, verify by eye."

Automated process: "Run script, get report showing all devices and any issues, review in 5 minutes."

Why this works:

If automation is harder or scarier than manual, people won't use it. Automation has to be obviously better to overcome inertia.
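The "report showing all devices and any issues" can be very small. The sketch below covers only the reporting half and assumes the per-device checks have already been collected by whatever means your team uses; the device names, issue strings, and function name are hypothetical.

```python
def build_report(results):
    """results maps device name -> list of issue strings (empty list = healthy)."""
    problems = {device: issues for device, issues in results.items() if issues}
    lines = [f"Checked {len(results)} devices, {len(problems)} with issues"]
    for device in sorted(problems):
        for issue in problems[device]:
            lines.append(f"  {device}: {issue}")
    return "\n".join(lines)


# Hypothetical check results, for illustration only.
results = {
    "core-sw-01": [],
    "edge-rtr-01": ["NTP not synchronized", "config differs from last backup"],
    "fw-01": [],
}
print(build_report(results))
```

A one-line summary followed by only the exceptions is what makes the five-minute review realistic: healthy devices take no reading time at all.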

Strategy 7: Document Like You're Going to Leave

Assume the person who built it will be gone in six months.

The approach:

Every automation project needs:

README explaining what it does and why
How to run it
What can go wrong and how to fix it
Who to ask if it breaks
Where the source of truth configuration lives

The test:

Can a team member who wasn't involved in building it figure out how to use it and fix it from the documentation alone?

If not, the documentation isn't good enough.
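The checklist above can be enforced mechanically, at least in a crude way. The sketch below checks a README for a set of section titles; the titles are my own assumed mapping of the checklist items, not a standard, and a passing check only proves the headings exist, not that the content is any good.

```python
# Assumed section titles, loosely mapped to the checklist items above.
REQUIRED_SECTIONS = (
    "What it does",
    "How to run it",
    "Troubleshooting",
    "Who to ask",
    "Source of truth",
)


def missing_readme_sections(readme_text, required=REQUIRED_SECTIONS):
    """Return the required section titles that do not appear in the README."""
    text = readme_text.lower()
    return [section for section in required if section.lower() not in text]


# Hypothetical README contents, for illustration only.
readme = """
Config Backup Automation

What it does
Pulls running configs from every device in the inventory once a week.

How to run it
Run the backup script from the automation host.
"""
print("Missing sections:", missing_readme_sections(readme))
```

A check like this could run in code review so that undocumented automation is caught before it ships, rather than after the author has left.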

Why this works:

Documentation ensures automation outlives the champion. It becomes organizational knowledge, not individual knowledge.

Documentation as organizational knowledge connects to the inherited network challenge explored in Inheriting Someone Else's Network - undocumented automation becomes unmaintainable automation.

The Manager's Role in Preventing Automation Debt

From a management perspective, here's what actually helps teams avoid automation debt:

Set Realistic Expectations

Don't say: "Let's automate everything this quarter!"

Do say: "This quarter, let's fully automate our configuration backup process. Next quarter, we'll automate deployment if that goes well."

Why it matters:

Unrealistic expectations lead to abandoned half-solutions. Realistic expectations lead to completed automation that gets used.

Protect Time

Don't say: "Work on automation when you can."

Do say: "Automation is 20% of team time. Here's what we're deferring to create that space."

Why it matters:

Without protected time, automation never happens. Or it happens as unpaid overtime, which creates resentment.

Invest in Skills Before Tools

Don't do this: Buy an expensive automation platform and hope the team figures it out.

Do this: Send people to training. Give them time to experiment. Start with free tools. Buy the platform only after proving value.

Why it matters:

Tools without skills create expensive shelf-ware. Skills with basic tools create actual automation.

Celebrate Incremental Progress

Don't wait for: "We've automated everything!"

Celebrate: "We fully automated the backup process, and everyone's using it. That's a win."

Why it matters:

Small wins build momentum. Waiting for perfect automation means never celebrating progress.

Address Resistance Directly

When team members resist automation:

"I hear you're concerned about automation. What specifically worries you?"

Then address the actual concern - whether it's fear of breaking things, lack of skills, or not seeing the value.

Why it matters:

Unaddressed resistance kills adoption. Addressing concerns directly can turn resisters into advocates.

When to Kill Automation (And How)

Sometimes the right answer is to acknowledge automation isn't working and kill it cleanly.

Signs Automation Should Be Retired

It's not being used:

Six months in, the adoption rate is under 30% and not growing.

Nobody understands it:

The person who built it left. Nobody else can maintain it. It's effectively unmaintained code running in production.

It's creating more work than it saves:

The automation requires so much maintenance and troubleshooting that the manual process would be faster.

It's partial, and nobody's completing it:

Started strong, stalled at 30% coverage, been stagnant for months.

It's been superseded:

A new tool or approach makes this automation obsolete.
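The first sign, low and flat adoption, is easier to act on when it is measured rather than guessed. The sketch below assumes the team keeps some record of each time a task was performed and whether it was done with the automation or by hand; the record format, the function names, and the exact "not growing" rule are assumptions for illustration.

```python
def adoption_by_month(runs):
    """runs: iterable of (month, method), where method is 'automated' or 'manual'."""
    totals, automated = {}, {}
    for month, method in runs:
        totals[month] = totals.get(month, 0) + 1
        if method == "automated":
            automated[month] = automated.get(month, 0) + 1
    # Months are "YYYY-MM" strings, so sorting them is chronological.
    return {m: automated.get(m, 0) / totals[m] for m in sorted(totals)}


def should_review_for_retirement(rates, threshold=0.30):
    """Flag when the latest month is under the threshold and not growing."""
    values = list(rates.values())
    if len(values) < 2:
        return False
    return values[-1] < threshold and values[-1] <= values[-2]


# Hypothetical usage records, for illustration only.
runs = (
    [("2025-05", "automated")] * 2 + [("2025-05", "manual")] * 8
    + [("2025-06", "automated")] * 2 + [("2025-06", "manual")] * 8
)
rates = adoption_by_month(runs)
for month, rate in rates.items():
    print(f"{month}: {rate:.0%} of runs used the automation")
print("Review for retirement:", should_review_for_retirement(rates))
```

The flag is a trigger for a conversation about why adoption stalled, not an automatic verdict.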

How to Kill Automation Cleanly

Document what it did and why it's being retired:

"This Ansible automation was an attempt to automate deployments. It covered 30% of use cases but stalled. We're retiring it because [specific reasons]. The manual process is documented here."

Formally deprecate rather than letting it rot:

Don't just stop using it and leave it there. Officially retire it. Move it to an archive. Update documentation to reflect it's no longer supported.

Extract lessons learned:

"What did we learn from this attempt? What would we do differently next time?"

Don't blame people:

Automation failing isn't a personal failure. It's a learning opportunity.

What Success Looks Like

Successful automation doesn't happen overnight. Here's the realistic timeline:

Months 1-2: Building and Learning

The team is learning the tool. Building the automation. Testing it. Documenting it.

Progress is slow. That's normal.

Months 3-4: Early Adoption

A few team members start using it. Others still prefer manual. Usage is maybe 30-40%.

This is fine. Don't panic.

Months 5-6: Majority Adoption

Most team members use automation most of the time. Manual process is the exception, not the rule.

This is success.

Months 6-12: Expansion

The automation is reliable. The team has confidence in it. Now you can build on it - adding capabilities, automating adjacent processes, tackling the next automation target.

The key indicator:

New team members are taught the automation approach as "how we do things here" rather than the manual process being the default.

The Bottom Line: Automation Isn't Magic

Here's what becomes clear when you've seen automation succeed and fail:

Automation isn't a technology problem - it's a people and process problem.

You can have the best tools and still fail if:

Only one person drives it
No time is protected for it
The team isn't invested in learning it
You're automating things that don't need automating
You try to automate everything at once

Successful automation requires:

Solving real pain points
Team-wide ownership
Skills development before tool purchasing
Protected time to build and learn
Incremental progress celebrated
Documentation that outlives the champion

The graveyard exists because organizations treat automation as:

A technology purchase rather than a cultural change
An individual's side project rather than the team's capability
Something that happens "when there's time" rather than a real priority
All-or-nothing rather than incremental progress

What actually works:

Start small. Pick one painful process. Get multiple people involved. Allocate real time. Build it completely. Document it thoroughly. Get everyone using it. Then move to the next thing.

The automation graveyard grows when:

Organizations skip these steps and jump straight to "we should automate everything with [expensive tool]."

The alternative:

Slow, steady automation that actually gets adopted, maintained, and expanded. One well-automated process is worth ten abandoned attempts.

Your graveyard might already exist. The question is: are you going to add to it, or start building automation that actually lasts?

What's in your automation graveyard? What automation attempts have failed and why? What's actually worked for you? Share your experiences in the comments or connect with me on LinkedIn - we're all learning from these failures together.

Disclaimer: The views and experiences shared in this blog are based on common patterns observed across the network engineering community and do not represent any specific company, team, or individual.

Pat Allen
