Database optimization is the difference between a web app that loads in 200ms and one that crawls at 5 seconds. As your application grows from hundreds to millions of rows, queries that ran fine in development start choking production servers. The good news: most database optimization techniques don’t require a PhD in computer science — they require understanding how your database engine thinks.

Table of Contents

At GTWebs, we’ve tuned databases for high-traffic web applications across MySQL, PostgreSQL, and MongoDB. These are the strategies that deliver the biggest performance gains with the least disruption.

database optimization - A close-up view of a laptop displaying a search engine page. — Photo by cottonbro studio on Unsplash

1. Index the Right Columns (And Stop There)

Indexing is the single most impactful database optimization technique. An index lets the database find rows without scanning the entire table — the difference between reading 10 rows and reading 10 million.

But more indexes aren’t always better. Every index slows down INSERT, UPDATE, and DELETE operations because the database must update each index alongside the data. Index columns that appear in WHERE clauses, JOIN conditions, and ORDER BY expressions. Don’t index columns with low cardinality (like a boolean `is_active` field on its own) or columns that are rarely queried.

Composite indexes are where the real wins hide. A query filtering on `WHERE user_id = ? AND status = ‘active’ AND created_at > ?` benefits from a single composite index on `(user_id, status, created_at)` — not three separate indexes. The column order matters: put equality conditions first, then range conditions.

“`sql — Creates a composite index optimized for the common query pattern CREATE INDEX idx_user_status_date ON orders (user_id, status, created_at); “`

2. Use EXPLAIN to Understand Query Execution

Stop guessing why a query is slow. Run `EXPLAIN` (MySQL) or `EXPLAIN ANALYZE` (PostgreSQL) to see exactly how the database executes your query. Look for:

Full table scans (`type: ALL` in MySQL) — the database is reading every row
Filesort — sorting without an index, which spills to disk on large datasets
Temporary tables — created in memory or on disk for GROUP BY or DISTINCT operations
High row estimates — the optimizer expects to examine too many rows

The PostgreSQL EXPLAIN documentation is one of the best resources for understanding query plans. Every developer working with databases should read it.

3. Eliminate N+1 Queries

The N+1 query problem is the most common database optimization issue in web applications using ORMs. It happens when your code fetches a list of records, then makes a separate query for each record’s related data.

“`python

N+1 problem: 1 query for posts + N queries for authors

Q: How many indexes should a database table have?

There's no fixed number, but a practical guideline is 3-5 indexes per table for OLTP workloads. Each index speeds up reads but slows writes. Create indexes that match your actual query patterns (check slow query logs), and regularly audit for unused indexes using tools like pg_stat_user_indexes or MySQL's sys.schema_unused_indexes.

Q: When should I switch from a relational database to NoSQL?

Consider NoSQL when your data is naturally document-shaped (nested, variable schema), when you need horizontal scaling beyond what a single relational server handles, or when your access patterns are simple key-value lookups at massive scale. For most web applications with complex relationships, transactions, and reporting needs, a well-optimized relational database outperforms NoSQL.

Q: How do I fix slow queries caused by JOINs on large tables?

First, ensure join columns are indexed on both tables. Use EXPLAIN to verify the optimizer is using the indexes. Consider denormalizing the most expensive joins into pre-computed columns or summary tables. For analytical queries joining multiple large tables, move them to a read replica or data warehouse instead of running them on your production database.

posts = Post.objects.all() # 1 query for post in posts: print(post.author.name) # N queries “`

The fix is eager loading. In Django, use `select_related()` for foreign keys and `prefetch_related()` for many-to-many relationships. In Laravel, use `with()`. In ActiveRecord, use `includes()`. This collapses N+1 queries into 1-2 queries regardless of the dataset size.

“`python

Fixed: 2 queries total regardless of post count

posts = Post.objects.select_related(‘author’).all() “`

4. Denormalize Strategically

Fully normalized databases minimize data redundancy, but they can require complex JOINs that become expensive at scale. Strategic denormalization — storing pre-computed or duplicated data — trades storage space for query speed.

Common denormalization patterns include:

Counter caches: Store `comments_count` on the post table instead of running `COUNT(*)` on every page load
Materialized columns: Store `full_name` alongside `first_name` and `last_name` to avoid concatenation in queries
Summary tables: Pre-aggregate daily/weekly/monthly stats into dedicated tables

database optimization - Close-up of a computer monitor displaying cyber security data and code, indicative of system hacking or programming. — Photo by Tima Miroshnichenko on Unsplash

The key is to denormalize read-heavy data and keep your write operations updating the denormalized values. Database triggers or application-level observers can maintain consistency.

5. Partition Large Tables

When a table grows beyond tens of millions of rows, even well-indexed queries slow down because the index itself becomes large. Table partitioning splits a logical table into smaller physical segments based on a partition key.

Range partitioning by date is the most common strategy for time-series data. An `orders` table partitioned by month lets the database skip entire partitions when your query only needs recent data. Older partitions can be archived or dropped without affecting current queries.

“`sql — PostgreSQL range partitioning by month CREATE TABLE orders ( id BIGSERIAL, created_at TIMESTAMPTZ NOT NULL, total DECIMAL(10,2) ) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2026_03 PARTITION OF orders FOR VALUES FROM (‘2026-03-01’) TO (‘2026-04-01’); “`

6. Optimize Your Connection Pooling

Every database connection consumes memory (typically 5-10MB per connection in PostgreSQL). Applications that open a new connection per request waste time on TCP handshakes and authentication. Connection pooling reuses a fixed number of connections across all requests.

Use PgBouncer for PostgreSQL or ProxySQL for MySQL as external connection poolers. Configure your pool size based on the formula: `connections = (CPU cores * 2) + effective_spindle_count`. For a typical 4-core server, 10-15 connections is often optimal. More connections means more context switching and worse performance — PostgreSQL’s wiki confirms that fewer connections often yield higher throughput.

7. Cache Query Results at the Application Layer

Not every query needs to hit the database. Database optimization includes knowing when to avoid the database entirely. Cache frequently-read, rarely-changed data in Redis or Memcached.

Good cache candidates include:

User session data and permissions
Configuration settings
Product catalog data (with invalidation on updates)
Aggregated dashboard statistics

Implement cache invalidation carefully. Time-based expiration (TTL) is simplest. Event-based invalidation (clear the cache when the underlying data changes) is more accurate but more complex. Most applications benefit from a combination of both.

8. Write Efficient Queries

Small query changes yield big results:

SELECT only the columns you need — `SELECT id, name, email` instead of `SELECT *`. This reduces memory usage, network transfer, and allows covering indexes to work.
Use EXISTS instead of IN for subqueries — `EXISTS` short-circuits when it finds the first match; `IN` materializes the entire subquery result.
Avoid functions on indexed columns — `WHERE YEAR(created_at) = 2026` can’t use an index on `created_at`. Use `WHERE created_at >= ‘2026-01-01’ AND created_at < '2027-01-01'` instead.
Batch your INSERTs — inserting 1,000 rows in a single statement is 10-50x faster than 1,000 individual INSERT statements.

9. Monitor Slow Queries Continuously

Database optimization is not a one-time activity. Enable slow query logging (set the threshold to 100-500ms) and review the log weekly. Use tools like `pg_stat_statements` (PostgreSQL) or the Performance Schema (MySQL) to identify your most expensive queries by total execution time, not just individual duration.

A query that takes 50ms but runs 100,000 times per day is a bigger optimization target than a query that takes 5 seconds but runs once daily. Monitoring helps you focus on what actually matters.

We’ve helped many teams optimize their database layer — you can find more backend performance tips on the GTWebs blog.

Start with Measurement

Before applying any database optimization technique, measure your baseline. Know your current query times, connection counts, cache hit rates, and I/O throughput. Apply one change at a time and measure the impact. The fastest path to a fast database is a systematic one.

Frequently Asked Questions

How many indexes should a database table have?

There’s no fixed number, but a practical guideline is 3-5 indexes per table for OLTP workloads. Each index speeds up reads but slows writes. Create indexes that match your actual query patterns (check slow query logs), and regularly audit for unused indexes using tools like pg_stat_user_indexes or MySQL’s sys.schema_unused_indexes.

When should I switch from a relational database to NoSQL?