uWSGI and Flask gotchas that show up only in production
The operational and architectural problems teams hit when Flask apps move from local simplicity to uWSGI-backed production environments.
Flask feels deceptively simple in development. That is part of its appeal. The trouble begins when the production stack is treated as if it is just the development server with more CPU.
Once Flask sits behind uWSGI, a reverse proxy, process management, timeouts, and real traffic, the problems change. Many outages are caused not by Flask itself, but by incorrect assumptions about process behavior, buffering, worker lifecycle, and request ownership.
These are the gotchas I would review first.
1. The development server teaches the wrong lessons #
If the team learned the app through flask run, they probably internalized unsafe assumptions:
- one process model
- no serious timeout handling
- no realistic concurrency behavior
- simplistic startup and reload behavior
By the time the app reaches uWSGI, those assumptions show up as bugs in request globals, connection handling, and deployment expectations.
A production Flask app should be designed with its serving model in mind from the start.
2. Worker count is not a performance strategy by itself #
One of the most common uWSGI anti-patterns is increasing worker count to solve every problem.
That can actually make things worse:
- more DB connections
- higher memory footprint
- more context switching
- more lock contention in shared dependencies
- more painful overload behavior
If latency is caused by blocking I/O, slow queries, or external API waits, multiplying workers may only multiply the backlog.
The right question is not “how many workers can we fit?” It is:
What does each request spend time doing, and where is concurrency actually limited?
3. Request timeouts need to line up across the stack #
Production outages often happen because timeouts disagree:
- load balancer timeout
- reverse proxy timeout
- uWSGI harakiri or socket timeout
- app-level timeout or retry logic
- client retry policy
If those values are inconsistent, one layer may give up while another keeps working. That produces ghost work, duplicate requests, or confusing partial failures.
Timeout design is architectural, not just configuration.
For example, if the proxy times out first but uWSGI keeps processing, you may still execute a state-changing request after the client has already retried.
4. Database connections are often the real bottleneck #
Teams tune uWSGI workers before they understand database capacity.
A simple Flask app with too many workers can overwhelm MySQL or PostgreSQL faster than expected. The result may look like app slowness, but the root cause is connection saturation and lock contention downstream.
This is why worker sizing should be tied to:
- database pool size
- query latency
- downstream API concurrency limits
- request mix
Without that model, worker tuning becomes blind overcommitment.
5. Memory behavior matters more than many Flask teams expect #
Python process memory is easy to underestimate in production.
With uWSGI, memory growth can come from:
- large imports at worker start
- in-process caches
- request-local objects that linger longer than expected
- serialization overhead for large responses
- accidental buffering of request or response bodies
Then teams add more workers and wonder why the host is unstable.
Memory tuning is not just about “per worker” memory. It is about total resident cost across:
- master process
- worker processes
- threads if enabled
- loaded modules
- buffer behavior during peak traffic
The wrong process model can turn a modest Flask service into a host-level memory problem.
6. Threads, processes, and monkey-patching are easy to combine badly #
uWSGI is flexible enough to let teams create very confusing concurrency behavior.
Examples:
- processes plus threads without understanding thread-safety in the app
- gevent or evented patterns mixed with blocking libraries
- background work started inside the web process
- globals assumed to be shared when they are per-process
The danger is not just wrong performance. It is unpredictable correctness.
A production service should have a clearly chosen concurrency model. If the team cannot describe that model simply, there is already too much ambiguity.
7. Background work inside Flask processes is a trap #
This is a major operational gotcha.
Teams often let Flask trigger background work in-process because it is convenient:
- sending email after request return
- polling external systems
- starting threads
- scheduling jobs in the app container
That becomes fragile quickly:
- work disappears on worker restart
- duplicate work appears during scaling
- deployment cycles interrupt long tasks
- failures are poorly observed
Web workers should serve requests. Long-running or retryable work belongs in a queue-backed worker system.
8. Logging and request correlation are frequently underbuilt #
In many Flask/uWSGI deployments, logs exist but are not operationally useful.
Common issues:
- application logs and uWSGI logs are disconnected
- request IDs are not propagated
- proxy headers are not trusted or normalized correctly
- timeout and disconnect behavior is unclear in logs
That means when a request fails, operators cannot tell whether:
- the proxy dropped it
- uWSGI killed it
- the app raised
- the downstream dependency stalled
A production web stack needs unified request-level observability, not just lines printed from multiple layers.
9. Static file and request buffering assumptions matter #
Flask should not be responsible for every byte of front-door traffic. If static delivery, upload buffering, or large response handling is not shaped correctly at the proxy layer, the Python app pays the cost.
That can show up as:
- memory spikes
- slow worker turnover
- request stalls
- poor tail latency
The clean separation is:
- proxy handles what it is good at
- uWSGI manages app process behavior
- Flask focuses on application logic
When those boundaries blur, throughput suffers.
10. Graceful reloads are only graceful if dependencies cooperate #
uWSGI can reload workers cleanly in theory. In practice, long requests, hanging downstream calls, or badly managed connections can make reloads risky.
Questions I ask during deployment review:
- what happens to in-flight requests during reload
- how long can a request legitimately run
- do workers drain before restart
- are DB and external connections released predictably
- does readiness actually reflect app health
A deployment process is part of runtime architecture. If reload semantics are vague, incident risk rises.
11. Flask simplicity can hide application-structure problems #
Because Flask does not force much structure, teams often let architecture drift:
- route handlers with business logic and SQL mixed together
- hidden shared state
- import cycles
- configuration spread across environment, code, and startup scripts
This works until production behavior becomes hard to reason about. Then uWSGI gets blamed for problems that are really application-boundary issues.
Minimal frameworks demand stronger discipline from the team, not less.
12. Health checks are often too shallow #
A /healthz endpoint that always returns 200 is not enough.
You need to decide what health means:
- process alive
- app booted
- DB reachable
- migrations applied
- critical dependencies responsive
Different checks should answer different questions. Otherwise, orchestrators and load balancers make bad decisions based on incomplete signals.
What I would review in a production Flask/uWSGI stack #
My first-pass review checklist is:
- worker model and worker count
- timeout alignment across proxy, uWSGI, and app
- DB pool size versus concurrency model
- memory profile per worker and under peak load
- request logging and correlation IDs
- background work separation
- graceful deployment and reload behavior
- health check semantics
This usually surfaces the highest-risk production mistakes quickly.
Closing view #
The big Flask/uWSGI gotchas are not about obscure syntax. They are about operational assumptions.
Production reliability depends on whether the team understands:
- process boundaries
- concurrency behavior
- timeout ownership
- dependency saturation
- deployment semantics
Flask stays productive in production when the architecture around it is explicit. Without that, “simple” becomes fragile very quickly.