Refactoring a monolith without blowing up production

There's a lie LinkedIn loves to repeat:

"Migrate your monolith to microservices."

No.

In 90% of cases, you should refactor the monolith.

Not split it.

And refactoring a big monolith is a specific art — one nobody teaches because "it's not sexy".

First: why refactor?

Refactoring isn't "making it pretty".

Refactoring is:

reducing onboarding time for new devs
reducing bug rate
increasing delivery speed
increasing confidence in changes

If there's no DELIVERY pain, you might not need to refactor right now.

Refactoring without real pain is technical vanity.

Rule number 1: tests before anything

You don't refactor untested code.

You rewrite it. And rewriting breaks things.

If the area you want to touch has no coverage:

Stop. Write tests for current behavior.
Make sure they pass.
Then refactor.
Run the tests on every small change.

It's the "boy scout rule" in reverse: instead of leaving the code cleaner after you touch it, make sure it's tested before you touch it.
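A minimal sketch of what "write tests for current behavior" can look like, often called a characterization or golden-master test. It assumes a hypothetical legacy_shipping_cost method and a hypothetical refactored shipping_cost: record today's outputs for a set of sample inputs, then compare the refactored code against that record after every small change. Plain Ruby keeps it self-contained; in a real codebase the same check would live in your test suite.

```ruby
# Hypothetical legacy method. Ugly, but it runs in production,
# so its current output IS the specification.
def legacy_shipping_cost(weight_kg, express)
  cost = 0
  if weight_kg <= 1
    cost = 500
  else
    if weight_kg <= 5
      cost = 500 + (weight_kg - 1) * 200
    else
      cost = 1300 + (weight_kg - 5) * 100
    end
  end
  if express == true
    cost = cost * 2
  end
  cost
end

# Step 1: record what the code does today, before touching it.
SAMPLE_INPUTS = [0, 0.5, 1, 2, 5, 5.5, 10, 42].product([true, false])
GOLDEN = SAMPLE_INPUTS.to_h do |weight, express|
  [[weight, express], legacy_shipping_cost(weight, express)]
end

# Step 2: the refactored version (hypothetical name and signature).
def shipping_cost(weight_kg, express:)
  cost =
    if weight_kg <= 1 then 500
    elsif weight_kg <= 5 then 500 + (weight_kg - 1) * 200
    else 1300 + (weight_kg - 5) * 100
    end
  express ? cost * 2 : cost
end

# Step 3: after every small change, compare against the recorded behavior.
def characterization_failures(golden)
  golden.reject { |(weight, express), expected| shipping_cost(weight, express: express) == expected }
end

failures = characterization_failures(GOLDEN)
puts(failures.empty? ? "behavior preserved" : "MISMATCH: #{failures.inspect}")
```

The record only protects the inputs it contains, so include the boundaries on purpose (1 and 5 kg here). Note that the legacy code checks express == true while the refactor uses truthiness; a sample with only true and false would not catch that difference.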

Strangler Fig — the strategy that works

Term coined by Martin Fowler.

Idea: you don't replace the whole system at once. You grow something new around the old, and migrate route by route until the old is just a husk.

[Client]
   ↓
[Router]
   ├── route A → new code
   ├── route B → new code
   ├── route C → old code
   └── route D → old code

Over time, all routes migrate. You delete the old.

Opposite strategy: big bang rewrite. Fails in 90% of cases.
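The router in the diagram, reduced to a plain-Ruby sketch. The two lambdas are hypothetical stand-ins for the old and new code; in production this job usually falls to a reverse proxy, a load balancer rule, or your framework's routing table rather than a hand-written class.

```ruby
# Hypothetical handlers standing in for the old and new codebases.
LEGACY_APP = ->(path) { [200, "legacy handled #{path}"] }
NEW_APP    = ->(path) { [200, "new handled #{path}"] }

# Sits in front of both systems and decides, per route, who answers.
class StranglerRouter
  def initialize(legacy:, replacement:, migrated_prefixes: [])
    @legacy = legacy
    @replacement = replacement
    @migrated_prefixes = migrated_prefixes.dup
  end

  # Migrating one more route is a one-line change.
  def migrate!(prefix)
    @migrated_prefixes << prefix
    self
  end

  def call(path)
    migrated = @migrated_prefixes.any? { |prefix| path == prefix || path.start_with?("#{prefix}/") }
    (migrated ? @replacement : @legacy).call(path)
  end
end

router = StranglerRouter.new(legacy: LEGACY_APP, replacement: NEW_APP, migrated_prefixes: ["/catalog"])
router.call("/catalog/42") # => [200, "new handled /catalog/42"]
router.call("/billing/7")  # => [200, "legacy handled /billing/7"]
router.migrate!("/billing")
router.call("/billing/7")  # => [200, "new handled /billing/7"]
```

Matching on "prefix or prefix/" rather than a bare start_with? keeps /catalog from accidentally capturing /catalog-old. When the legacy handler no longer receives any traffic, it can be deleted.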

Inventory before action

Before touching code, map out:

which modules exist
which depend on which
which have test coverage
which change most often (git log frequency)
which generate the most bugs (issue tracker)

Cross those last three: high change + high bugs + low coverage = start here.

That's the highest ROI.
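One way to cross those three signals in code, as a sketch: count how often each file changed from git log output, then combine that with bug counts and coverage. The log text is inlined so the example runs without a repository, the bug and coverage numbers are made up for illustration, and the scoring formula (a plain product) is an assumption, not an industry standard.

```ruby
# Sample output of:  git log --since="6 months ago" --name-only --format=""
# (inlined so the sketch runs without a repository)
GIT_LOG = <<~LOG
  app/billing/invoice.rb
  app/billing/payment.rb

  app/billing/invoice.rb

  app/catalog/product.rb
  app/billing/invoice.rb
LOG

# Hypothetical numbers: bugs per file from the issue tracker, line coverage 0.0..1.0.
BUGS     = { "app/billing/invoice.rb" => 4, "app/billing/payment.rb" => 1, "app/catalog/product.rb" => 2 }
COVERAGE = { "app/billing/invoice.rb" => 0.2, "app/billing/payment.rb" => 0.9, "app/catalog/product.rb" => 0.5 }

# How many commits touched each file.
def change_counts(git_log)
  git_log.lines.map(&:strip).reject(&:empty?).tally
end

# High change x high bugs x low coverage = high score.
def hotspots(changes, bugs, coverage)
  changes.map do |file, count|
    score = count * bugs.fetch(file, 0) * (1.0 - coverage.fetch(file, 0.0))
    [file, score.round(2)]
  end.sort_by { |_file, score| -score }
end

hotspots(change_counts(GIT_LOG), BUGS, COVERAGE).each do |file, score|
  puts format("%-28s %6.2f", file, score)
end
```

With these numbers, invoice.rb comes out on top: it changes often, breaks often, and is barely covered.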

Patterns to extract context

1. Domain object

Replace the catch-all "service class" with a domain object that has its own identity.

# before
UserService.process(user, params)

# after
PaymentRequest.new(user: user, amount: params[:amount]).submit

It's not just renaming. It's moving behavior to where it belongs.
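A sketch of what "moving behavior to where it belongs" might look like: PaymentRequest validates its own amount and tracks its own status. The gateway: keyword, FakeGateway, and the error classes are hypothetical additions so the example runs on its own; amount is assumed to be in cents.

```ruby
# Hypothetical domain object. The rules about a payment request live with it,
# instead of inside a generic UserService.
class PaymentRequest
  class InvalidAmount < StandardError; end
  class AlreadySubmitted < StandardError; end

  attr_reader :user, :amount, :status

  # amount is in cents; gateway is any object that responds to #charge(user, amount).
  def initialize(user:, amount:, gateway:)
    unless amount.is_a?(Integer) && amount.positive?
      raise InvalidAmount, "amount must be a positive number of cents"
    end

    @user = user
    @amount = amount
    @gateway = gateway
    @status = :pending
  end

  # A request can only be submitted once; the result is recorded on the object.
  def submit
    raise AlreadySubmitted, "payment request was already submitted" unless @status == :pending

    @status = @gateway.charge(user, amount) ? :paid : :failed
    self
  end
end

# A stand-in gateway for the example: approves or declines everything.
FakeGateway = Struct.new(:approves) do
  def charge(_user, _amount)
    approves
  end
end

request = PaymentRequest.new(user: "ana", amount: 1500, gateway: FakeGateway.new(true))
request.submit.status # => :paid
```

Injecting the gateway keeps the domain object testable without touching a real payment provider.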

2. Value object

Anything that's "primitive surrounded by rules" deserves to be an object:

# before — Money scattered as integer
total_in_cents = price * quantity
formatted = "R$ #{total_in_cents / 100.0}"

# after — Money class
total = Money.new(price) * quantity
formatted = total.format(:brl)

Encapsulates the rule. Reduces bugs. Allows reuse.
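A minimal hand-rolled Money class, to make "encapsulates the rule" concrete. This is a hypothetical sketch, not the money gem's API: it stores integer cents, is immutable, compares by value, refuses to mix currencies, and formats in the Brazilian style (only :brl is implemented).

```ruby
# Minimal hand-rolled Money value object (hypothetical; not the `money` gem).
# Amounts are integer cents, so there is no float rounding.
class Money
  include Comparable

  attr_reader :cents, :currency

  def initialize(cents, currency = "BRL")
    raise ArgumentError, "cents must be an Integer" unless cents.is_a?(Integer)

    @cents = cents
    @currency = currency
    freeze # value objects are immutable: operations return new instances
  end

  def +(other)
    raise ArgumentError, "currency mismatch" unless other.currency == currency

    Money.new(cents + other.cents, currency)
  end

  def *(quantity)
    raise ArgumentError, "quantity must be an Integer" unless quantity.is_a?(Integer)

    Money.new(cents * quantity, currency)
  end

  # Equality and ordering by value, only within the same currency.
  def <=>(other)
    return nil unless other.is_a?(Money) && other.currency == currency

    cents <=> other.cents
  end

  # Brazilian convention: "." groups thousands, "," separates the centavos.
  def format(style = :brl)
    raise ArgumentError, "only :brl is sketched here" unless style == :brl

    units, centavos = cents.abs.divmod(100)
    grouped = units.to_s.reverse.scan(/\d{1,3}/).join(".").reverse
    "#{'-' if cents.negative?}R$ #{grouped},#{centavos.to_s.rjust(2, '0')}"
  end
end

total = Money.new(1050) * 3
total.format(:brl) # => "R$ 31,50"
```

For the same total, the "before" code would print R$ 31.5; the formatting rule now lives in one place.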

3. Bounded context

Inspired by DDD. Doesn't need to become a microservice — can be just strict namespacing:

app/
├── billing/
│   ├── invoice.rb
│   ├── payment.rb
│   └── subscription.rb
├── catalog/
│   ├── product.rb
│   └── category.rb
└── identity/
    ├── user.rb
    └── session.rb

Each folder is a context. Contexts communicate through an explicit interface, not by reaching directly into each other's classes.

That's all a "modular monolith" means.
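One way to make "explicit interface" stick in plain Ruby, sketched with hypothetical Billing and Catalog modules: Module#private_constant hides the context's internal class, so Catalog can only go through Billing's public module methods. Storage is an in-memory array purely for illustration.

```ruby
# app/billing: the billing context. Hypothetical, in-memory for the sketch.
module Billing
  # Internal model. Other contexts are not supposed to touch it directly.
  class Invoice
    attr_reader :user_id, :amount_cents

    def initialize(user_id, amount_cents)
      @user_id = user_id
      @amount_cents = amount_cents
    end
  end
  private_constant :Invoice # Billing::Invoice from outside raises NameError

  @invoices = []

  # The explicit interface: plain data in, plain data out.
  def self.issue_invoice(user_id:, amount_cents:)
    invoice = Invoice.new(user_id, amount_cents)
    @invoices << invoice
    { user_id: invoice.user_id, amount_cents: invoice.amount_cents }
  end

  def self.total_billed(user_id)
    @invoices.select { |invoice| invoice.user_id == user_id }.sum(&:amount_cents)
  end
end

# app/catalog: talks to billing only through its public methods.
module Catalog
  def self.purchase(user_id:, price_cents:)
    Billing.issue_invoice(user_id: user_id, amount_cents: price_cents)
  end
end

Catalog.purchase(user_id: 1, price_cents: 500)
Catalog.purchase(user_id: 1, price_cents: 250)
Billing.total_billed(1) # => 750
```

Returning a plain hash instead of the Invoice object is deliberate: callers get data, not a handle into billing's internals.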

Feature flags are your second-best friend

Want to migrate a critical route? Don't swap everything at once.

if Flag.enabled?(:new_checkout, user)
  NewCheckout.process(order)
else
  OldCheckout.process(order)
end

Roll out to 1% of users. Measure. Increase. Measure. Increase.

When 100% has been running smoothly for X weeks, delete the old.

Without feature flags: either you launch and pray, or you create a long-lived branch that becomes a merge nightmare.
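A sketch of how "roll out to 1%, then increase" can work, using a hypothetical in-memory Flag with the same enabled?(name, user) shape as the snippet above. Each user is hashed into a stable bucket from 0 to 99 with Zlib.crc32, so raising the percentage only adds users and never flips someone back to the old path. Tools like Flipper provide percentage rollouts for real; this only illustrates the idea.

```ruby
require "zlib"

# Hypothetical flag store with percentage rollout.
class Flag
  @rollouts = {}

  class << self
    def rollout(name, percent)
      raise ArgumentError, "percent must be between 0 and 100" unless (0..100).cover?(percent)

      @rollouts[name] = percent
    end

    def enabled?(name, user)
      # Hash flag name + user id into a stable bucket from 0 to 99. The same user
      # always lands in the same bucket, so raising the percentage only adds users.
      bucket = Zlib.crc32("#{name}:#{user.id}") % 100
      bucket < @rollouts.fetch(name, 0)
    end
  end
end

User = Struct.new(:id)

Flag.rollout(:new_checkout, 1)
early_users = (1..1_000).map { |id| User.new(id) }.select { |u| Flag.enabled?(:new_checkout, u) }

Flag.rollout(:new_checkout, 10)
early_users.all? { |u| Flag.enabled?(:new_checkout, u) } # => true
```

Including the flag name in the hash means the same 1% of users doesn't end up as guinea pigs for every flag.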

Database migrations need triple care

Refactoring code is one thing. Migrating schema is another.

Rules:

1. Breaking change = several deploys.

Step 1: add new column (compatible with current code)
Step 2: app starts writing to old AND new column
Step 3: backfill the new column
Step 4: app reads from the new column
Step 5: stop writing to the old
Step 6: drop the old

Each step is a separate deploy.
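The six steps above, sketched for a hypothetical rename of users.name to users.full_name. Rows are plain hashes standing in for the table, and phase is the step number currently deployed; a real app would express the same logic in its models and a background job.

```ruby
# Hypothetical rename users.name -> users.full_name; rows are hashes standing in for the table.
# phase = the step currently deployed (1..6).

def write_name(row, value, phase)
  row[:name] = value if phase < 5        # steps 1-4 keep writing the old column
  row[:full_name] = value if phase >= 2  # from step 2 on: dual write
end

def read_name(row, phase)
  phase >= 4 ? row[:full_name] : row[:name] # step 4 switches reads to the new column
end

# Step 3: backfill in small batches so no single statement holds locks for long.
# Idempotent: rows already written by the dual-write path are skipped.
def backfill(rows, batch_size: 1_000)
  rows.each_slice(batch_size).sum do |batch|
    pending = batch.select { |row| row[:full_name].nil? }
    pending.each { |row| row[:full_name] = row[:name] }
    pending.size # a real job would commit each batch and maybe pause between them
  end
end

users = [{ name: "Ana", full_name: nil }, { name: "Bruno", full_name: nil }]
write_name(users[0], "Ana Souza", 2) # dual write: both columns updated
backfill(users, batch_size: 1)       # => 1 (only Bruno still needed it)
read_name(users[1], 4)               # => "Bruno"
```

Because every step is compatible with the one before it, any single deploy can be rolled back without losing data.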

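2. Never add NOT NULL to a big table in a single step.

Depending on the database, the whole table can be locked while every row is checked or rewritten. App freezes. In PostgreSQL, for example, ALTER COLUMN ... SET NOT NULL holds an ACCESS EXCLUSIVE lock while it scans the entire table, and before version 11 even adding a column with a default rewrote the whole table.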

3. Watch out for ALTER TABLE on tables with millions of rows.

Use pg_repack, strong_migrations gem, or strategies specific to your database.

The code graveyard

Before refactoring, find out what can die.

Tools:

rubocop cops such as Lint/UselessAssignment and Lint/UnreachableCode (dead code inside methods)
coverage running in production (coverband gem)
endpoint usage logs (if nobody's called it in 6 months, consider deleting)

Refactoring dead code is wasted effort.

Delete first.
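The "endpoint usage logs" idea as a sketch. It assumes a hypothetical access-log format of "<ISO 8601 timestamp> <HTTP verb> <path>" and a list of routes exported from your router; the output is a list of deletion candidates, not routes that are guaranteed safe to delete.

```ruby
require "time"

# Hypothetical access-log format: "<ISO 8601 timestamp> <HTTP verb> <path>".
def last_seen_by_route(log_lines)
  log_lines.each_with_object({}) do |line, seen|
    timestamp, verb, path = line.split(" ", 3)
    route = "#{verb} #{path.strip}"
    time = Time.iso8601(timestamp)
    seen[route] = time if seen[route].nil? || time > seen[route]
  end
end

# Routes never hit, or not hit within max_idle_days, are deletion candidates.
def stale_routes(known_routes, log_lines, now:, max_idle_days: 180)
  seen = last_seen_by_route(log_lines)
  cutoff = now - max_idle_days * 86_400
  known_routes.select { |route| seen[route].nil? || seen[route] < cutoff }
end

routes = ["GET /orders", "GET /legacy_report", "POST /coupons"]
logs = [
  "2024-06-01T10:00:00Z GET /orders",
  "2023-01-10T08:00:00Z GET /legacy_report"
]
stale_routes(routes, logs, now: Time.iso8601("2024-06-30T00:00:00Z"))
# => ["GET /legacy_report", "POST /coupons"]
```

Cross-check candidates against production coverage before deleting: a route can be idle in the logs and still be called by something the logs don't capture.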

What not to do

Rewrite from scratch without understanding the current system.
Refactor multiple areas in parallel (each becomes an infernal branch).
Refactor and add a feature in the same PR.
"Clean up" without regression tests.
Step outside the PR scope to "fix more things".

Refactoring is surgery. Not a cleanup.

Metrics to track

Before the refactor:

lead time (PR opened → production)
bug rate per area
onboarding time
coverage per context

After:

same metrics
compare
show leadership it was worth it

Without metrics, refactoring becomes "they keep messing with old code instead of shipping features".

And then you lose the political budget to continue.
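Lead time is easy to compute once you can export when each PR was opened and when it reached production. A sketch with made-up timestamps: it reports the median rather than the mean, because a handful of PRs stuck in review for weeks would otherwise dominate the number.

```ruby
require "time"

# Hypothetical export of merged PRs: when each was opened and when it hit production.
def median_lead_time_hours(prs)
  raise ArgumentError, "no PRs" if prs.empty?

  hours = prs.map do |pr|
    (Time.iso8601(pr[:deployed_at]) - Time.iso8601(pr[:opened_at])) / 3600.0
  end.sort
  mid = hours.size / 2
  hours.size.odd? ? hours[mid] : (hours[mid - 1] + hours[mid]) / 2.0
end

before = [
  { opened_at: "2024-03-01T00:00:00Z", deployed_at: "2024-03-03T00:00:00Z" }, # 48h
  { opened_at: "2024-03-01T00:00:00Z", deployed_at: "2024-03-04T00:00:00Z" }, # 72h
  { opened_at: "2024-03-01T00:00:00Z", deployed_at: "2024-03-06T00:00:00Z" }  # 120h
]
after = [
  { opened_at: "2024-09-01T00:00:00Z", deployed_at: "2024-09-01T06:00:00Z" }, # 6h
  { opened_at: "2024-09-01T00:00:00Z", deployed_at: "2024-09-01T20:00:00Z" }, # 20h
  { opened_at: "2024-09-01T00:00:00Z", deployed_at: "2024-09-02T06:00:00Z" }, # 30h
  { opened_at: "2024-09-01T00:00:00Z", deployed_at: "2024-09-03T02:00:00Z" }  # 50h
]

median_lead_time_hours(before) # => 72.0
median_lead_time_hours(after)  # => 25.0
```

Run the same calculation per context (billing, catalog, identity) and the before/after comparison points at where the refactor actually paid off.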

The big shift

Refactoring a monolith isn't destroying and rebuilding.

It is:

choosing where there's real ROI
protecting with tests
changing in small pieces
using feature flags to de-risk
measuring before and after

People who follow this refactor huge monoliths with confidence.

People who ignore it rewrite thinking "it'll be fast" — and take 2 years on what should've been 3 months.

Conclusion

The monolith isn't the villain.

A poorly maintained monolith is.

Microservices aren't the solution. They're a trade-off with enormous costs — operational, complexity, latency, debugging.

Technical maturity is understanding that:

a well-built modular monolith beats badly-built microservices
consistent refactoring beats revolutionary rewrites
well-designed context beats service crap scattered around

Refactoring well is less about code and more about process discipline.

Whoever learns this changes their whole career.