Refactoring a monolith without blowing up production
Microservices aren't the answer. Refactoring a monolith is an art — strangler fig, feature flags, stepwise schema migration. What nobody tells you about the process.
Refactoring a monolith without blowing up production
There's a lie LinkedIn loves to repeat:
"Migrate your monolith to microservices."
No.
90% of cases: you should refactor the monolith.
Not split it.
And refactoring a big monolith is a specific art — one nobody teaches because "it's not sexy".
First: why refactor?
Refactoring isn't "making it pretty".
Refactoring is:
- reducing onboarding time for new devs
- reducing bug rate
- increasing delivery speed
- increasing confidence in changes
If there's no DELIVERY pain, you might not need to refactor right now.
Refactoring without real pain is technical vanity.
Rule number 1: tests before anything
You don't refactor untested code.
You rewrite it. And rewriting breaks things.
If the area you want to touch has no coverage:
- Stop. Write tests for current behavior.
- Make sure they pass.
- Then refactor.
- Run the tests on every small change.
This is the reverse "boy scout rule": leave the base tested before taking action.
Strangler Fig — the strategy that works
Term coined by Martin Fowler.
Idea: you don't replace the whole system at once. You grow something new around the old, and migrate route by route until the old is just a husk.
[Client]
↓
[Router]
├── route A → new code
├── route B → new code
├── route C → old code
└── route D → old code
Over time, all routes migrate. You delete the old.
Opposite strategy: big bang rewrite. Fails in 90% of cases.
Inventory before action
Before touching code, map out:
- which modules exist
- which depend on which
- which have test coverage
- which change most often (git log frequency)
- which generate the most bugs (issue tracker)
Cross those last three: high change + high bugs + low coverage = start here.
That's the highest ROI.
Patterns to extract context
1. Domain object
Replace "cute service class" with a domain object that has its own identity.
# before
UserService.process(user, params)
# after
PaymentRequest.new(user: user, amount: amount).submit
It's not just renaming. It's moving behavior to where it belongs.
2. Value object
Anything that's "primitive surrounded by rules" deserves to be an object:
# before — Money scattered as integer
total_in_cents = price * quantity
formatted = "R$ #{total_in_cents / 100.0}"
# after — Money class
total = Money.new(price) * quantity
formatted = total.format(:brl)
Encapsulates the rule. Reduces bugs. Allows reuse.
3. Bounded context
Inspired by DDD. Doesn't need to become a microservice — can be just strict namespacing:
app/
├── billing/
│ ├── invoice.rb
│ ├── payment.rb
│ └── subscription.rb
├── catalog/
│ ├── product.rb
│ └── category.rb
└── identity/
├── user.rb
└── session.rb
Each folder = a context. Communication between them through explicit interface, not direct cross-reference.
That's all "modular monolith" means.
Feature flag is your second best friend
Want to migrate a critical route? Don't swap everything at once.
if Flag.enabled?(:new_checkout, user)
NewCheckout.process(order)
else
OldCheckout.process(order)
end
Roll out to 1% of users. Measure. Increase. Measure. Increase.
When 100% has been running smoothly for X weeks, delete the old.
Without feature flags: either you launch and pray, or you create a long-lived branch that becomes a merge nightmare.
Database migrations need triple care
Refactoring code is one thing. Migrating schema is another.
Rules:
1. Breaking change = several deploys.
Step 1: add new column (compatible with current code)
Step 2: app starts writing to old AND new column
Step 3: backfill the new column
Step 4: app reads from the new column
Step 5: stop writing to the old
Step 6: drop the old
Each step is a separate deploy.
2. Never add NOT NULL without a default to a big table.
Total lock. App freezes.
3. Watch out for ALTER TABLE on tables with millions of rows.
Use pg_repack, strong_migrations gem, or strategies specific to your database.
The code graveyard
Before refactoring, find out what can die.
Tools:
rubocopwithLint/UselessAssignmentcoveragerunning in production (coverbandgem)- endpoint usage logs (if nobody's called it in 6 months, consider deleting)
Refactoring dead code is wasted effort.
Delete first.
What not to do
- Rewrite from scratch without understanding the current system.
- Refactor multiple areas in parallel (each becomes an infernal branch).
- Refactor and add a feature in the same PR.
- "Clean up" without regression tests.
- Step outside the PR scope to "fix more things".
Refactoring is surgery. Not a cleanup.
Metrics to track
Before the refactor:
- lead time (PR opened → production)
- bug rate per area
- onboarding time
- coverage per context
After:
- same metrics
- compare
- show leadership it was worth it
Without metrics, refactoring becomes "they keep messing with old code instead of shipping features".
And then you lose the political budget to continue.
The big shift
Refactoring a monolith isn't destroying and rebuilding.
It is:
- choosing where there's real ROI
- protecting with tests
- changing in small pieces
- using feature flags to de-risk
- measuring before and after
People who follow this refactor huge monoliths with confidence.
People who ignore it rewrite thinking "it'll be fast" — and take 2 years on what should've been 3 months.
Conclusion
The monolith isn't the villain.
A poorly maintained monolith is.
Microservices aren't the solution. They're a trade-off with enormous costs — operational, complexity, latency, debugging.
Technical maturity is understanding that:
- a well-built modular monolith beats badly-built microservices
- consistent refactoring beats revolutionary rewrites
- well-designed context beats service crap scattered around
Refactoring well is less about code and more about process discipline.
Whoever learns this changes their whole career.
