MSSQL to PostgreSQL: Tools, Scripts, and Performance Tuning
Migrating from Microsoft SQL Server (MSSQL) to PostgreSQL can reduce licensing costs, increase portability, and leverage PostgreSQL’s extensibility. This guide covers the tools to use, essential scripts for schema and data conversion, and performance tuning steps to ensure a smooth migration and production-ready PostgreSQL deployment.
1. Migration tools — when to use them
- pgloader — Best for straightforward bulk migrations. Handles schema creation, data copy, and basic type mapping with good speed. Use when you can tolerate some manual fixes after automated conversion.
- AWS SCT (Schema Conversion Tool) — Useful if migrating into AWS-managed RDS/Aurora PostgreSQL; converts schema and offers assessment reports. Requires AWS environment for full features.
- ora2pg — Although designed for Oracle, it can help with complex migrations via an extensible rule set; less common for MSSQL.
- SQL Server Integration Services (SSIS) — Use for complex ETL workflows, incremental loads, and transformations when staying in Microsoft tooling.
- Custom scripts (Python, Go, Node) — Required for complex transformations, stored procedure translation, or bespoke data cleaning.
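For the pgloader route, a small command file is usually enough to drive the whole bulk copy. The sketch below is minimal and illustrative; hosts, databases, and credentials are placeholders, and real migrations typically add CAST rules to override the default type mappings.

```
LOAD DATABASE
     FROM mssql://migrator:secret@mssql-host/sourcedb
     INTO postgresql://migrator:secret@pg-host/targetdb

WITH include drop, create tables, create indexes, reset sequences;
```

Run it with `pgloader migrate.load`; the `reset sequences` option saves a manual setval pass after the copy.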
2. Schema conversion: common differences and mapping
Data type mapping (common)
| MSSQL | PostgreSQL | Notes |
|---|---|---|
| INT, BIGINT | INTEGER, BIGINT | Direct mapping |
| VARCHAR(n) | VARCHAR(n) | Same; consider TEXT for unconstrained lengths |
| NVARCHAR(n) | VARCHAR(n) or TEXT | PostgreSQL uses UTF-8 by default; no separate NVARCHAR |
| DATETIME, SMALLDATETIME | TIMESTAMP WITHOUT TIME ZONE | Consider TIMESTAMP WITH TIME ZONE if storing UTC |
| DATETIME2 | TIMESTAMP | TIMESTAMP stores microseconds; DATETIME2's 100 ns precision is truncated |
| BIT | BOOLEAN | Map 0/1 to false/true |
| MONEY, SMALLMONEY | NUMERIC(19,4) | Prefer NUMERIC for exactness |
| UNIQUEIDENTIFIER | UUID | Use the uuid type; generate with gen_random_uuid() (built in since PostgreSQL 13, pgcrypto before that) |
| IMAGE, VARBINARY | BYTEA | Use BYTEA for binary data |
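Type mapping is mechanical enough to automate for a first pass. The helper below is a sketch based on the table above; the function name `map_mssql_type` and its fall-through behavior are illustrative, not part of any migration tool.

```python
# Sketch of an automated MSSQL -> PostgreSQL type mapper following the
# table above. Unmapped types fall through unchanged for manual review.
import re

TYPE_MAP = {
    "int": "integer",
    "bigint": "bigint",
    "bit": "boolean",
    "datetime": "timestamp without time zone",
    "smalldatetime": "timestamp without time zone",
    "datetime2": "timestamp without time zone",
    "money": "numeric(19,4)",
    "smallmoney": "numeric(19,4)",
    "uniqueidentifier": "uuid",
    "image": "bytea",
    "varbinary": "bytea",
}

def map_mssql_type(mssql_type: str) -> str:
    """Return a PostgreSQL type for an MSSQL column type string."""
    t = mssql_type.strip().lower()
    # Parameterized character types keep their length; NVARCHAR folds into
    # VARCHAR because PostgreSQL text is UTF-8 already; (MAX) becomes TEXT.
    m = re.match(r"n?(varchar|char)\((\d+|max)\)", t)
    if m:
        base, length = m.groups()
        return "text" if length == "max" else f"{base}({length})"
    return TYPE_MAP.get(t, t)

print(map_mssql_type("NVARCHAR(255)"))       # varchar(255)
print(map_mssql_type("UNIQUEIDENTIFIER"))    # uuid
```

Running the mapper over the output of `INFORMATION_SCHEMA.COLUMNS` gives a draft DDL that you then review by hand.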
Constraints, indexes, and sequences
- MSSQL IDENTITY columns -> PostgreSQL sequences with SERIAL or IDENTITY. Prefer GENERATED BY DEFAULT AS IDENTITY for modern PostgreSQL.
- Primary/foreign keys and unique constraints map directly.
- Filtered indexes in MSSQL require partial indexes in PostgreSQL (CREATE INDEX … WHERE …).
- INCLUDE columns in MSSQL nonclustered indexes map to the INCLUDE clause of CREATE INDEX in PostgreSQL 11+; on older versions, add the columns to the index key or accept planner differences.
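The bullets above translate into DDL along these lines; the table and column names are hypothetical.

```sql
-- IDENTITY replaces MSSQL's IDENTITY(1,1):
CREATE TABLE orders (
    id      BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    status  TEXT NOT NULL,
    total   NUMERIC(19,4)
);

-- MSSQL filtered index -> PostgreSQL partial index:
CREATE INDEX idx_orders_open ON orders (id) WHERE status = 'open';

-- MSSQL INCLUDE columns -> INCLUDE clause (PostgreSQL 11+):
CREATE INDEX idx_orders_status ON orders (status) INCLUDE (total);
```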
Collation and case sensitivity
- PostgreSQL collations are set per column or database; add citext extension for case-insensitive text.
- Consider migrating to lowercased values or using functional indexes (LOWER(column)).
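Both case-insensitivity approaches look like this in practice (illustrative names):

```sql
-- Option 1: citext compares case-insensitively everywhere the column is used.
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users (email CITEXT UNIQUE);

-- Option 2: keep TEXT and index a lowercased expression.
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Queries must use the same expression to hit the index:
-- SELECT * FROM users WHERE LOWER(email) = LOWER($1);
```

citext is the lower-friction choice when many queries touch the column; the expression index keeps the stored values untouched.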
3. Translating T-SQL to PL/pgSQL
- Stored procedures and functions must be rewritten: T-SQL control flow, TRY/CATCH, and error handling differ.
- Replace functions like ISNULL(a,b) with COALESCE(a,b).
- String functions: REPLACE and SUBSTRING carry over; CHARINDEX(sub, str) becomes POSITION(sub IN str) (note the different argument syntax).
- Temporary tables: MSSQL's #temp tables map to CREATE TEMP TABLE; UNLOGGED tables are a separate option for persistent but crash-unsafe data.
- Transactions: PostgreSQL uses explicit BEGIN/COMMIT; nested transaction patterns (MSSQL's nested BEGIN TRAN) must be rewritten with SAVEPOINT/ROLLBACK TO SAVEPOINT.
Example: simple stored procedure conversion
MSSQL (T-SQL)
```sql
CREATE PROCEDURE dbo.IncrementCounter @id INT
AS
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = @id;
    SELECT value FROM counters WHERE id = @id;
END
```
PostgreSQL (PL/pgSQL)
```sql
CREATE OR REPLACE FUNCTION increment_counter(p_id INTEGER)
RETURNS INTEGER AS $$
DECLARE
    v_value INTEGER;
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = p_id;
    SELECT value INTO v_value FROM counters WHERE id = p_id;
    RETURN v_value;
END;
$$ LANGUAGE plpgsql;
```
4. Data migration scripts and patterns
- Use pgloader for fast bulk load:
- Create target schema (with manual adjustments).
- Run pgloader with a connection string and mapping rules to transform types.
- For complex ETL, use Python with psycopg2 and pyodbc:
- Stream rows in batches (e.g., 10k) to avoid memory spikes.
- Use COPY FROM STDIN for bulk inserts into PostgreSQL.
- Preserve transactionality: for large tables, migrate in consistent batches and use application-level quiesce or snapshot isolation where possible.
- Validate row counts, checksums, and key distributions after migration.
Example Python pattern (simplified)
```python
# Simplified pattern: src_cursor is a pyodbc cursor, pg_cursor/pg_conn are
# psycopg2 objects; transform_row and format_for_copy are user-defined helpers.
import io

src_cursor.execute("SELECT id, col1, col2 FROM table")
while rows := src_cursor.fetchmany(10000):
    transformed = [transform_row(r) for r in rows]
    # COPY FROM STDIN via copy_from is far faster than row-by-row INSERTs.
    pg_cursor.copy_from(io.StringIO(format_for_copy(transformed)), "table", sep="\t")
    pg_conn.commit()
```
5. Handling identity, sequences, and foreign keys
- After loading data, sync sequences:
- SELECT setval(pg_get_serial_sequence('table', 'id'), MAX(id)) FROM table;
- To speed loading, create tables without foreign keys, load the data, then add constraints NOT VALID and run VALIDATE CONSTRAINT afterwards so existing rows are checked outside the load path.
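The load-then-validate pattern looks like this (table and constraint names are hypothetical):

```sql
-- Added NOT VALID: new rows are checked, existing rows are not (yet).
ALTER TABLE order_items
    ADD CONSTRAINT fk_order_items_order
    FOREIGN KEY (order_id) REFERENCES orders (id) NOT VALID;

-- Later, check existing rows with only a light lock on the table:
ALTER TABLE order_items VALIDATE CONSTRAINT fk_order_items_order;
```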
6. Performance tuning after migration
PostgreSQL configuration highlights
| Setting | Recommendation | Notes |
|---|---|---|
| shared_buffers | 25% of RAM | For dedicated DB servers |
| effective_cache_size | 50-75% of RAM | Helps planner estimate available cache |
| work_mem | 16MB–256MB | Allocated per sort/hash operation, not per connection; a single query may use several. Increase for OLAP |
| maintenance_work_mem | 512MB–2GB | For CREATE INDEX and VACUUM operations |
| max_wal_size | 1–4GB (or higher) | Reduce checkpoint frequency by increasing |
| wal_level | replica | Default since PostgreSQL 10; use minimal only if you need neither replication nor PITR |
| synchronous_commit | on (or off for async needs) | Off can improve write performance at durability cost |
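As a worked example, the table above might translate into the following postgresql.conf fragment for a dedicated 64 GB server; the values are illustrative starting points, not a recommendation for your hardware.

```
shared_buffers = 16GB            # ~25% of RAM
effective_cache_size = 48GB      # ~75% of RAM; planner hint, not an allocation
work_mem = 64MB                  # per sort/hash operation, per connection
maintenance_work_mem = 1GB       # CREATE INDEX, VACUUM
max_wal_size = 4GB               # fewer, larger checkpoints
wal_level = replica
```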
Schema and query tuning
- Use EXPLAIN (ANALYZE, BUFFERS) to profile slow queries and adapt indexes.
- Replace scalar subqueries with JOINs where appropriate.
- Use BRIN indexes for very large append-only tables.
- Leverage partial and expression indexes for selective filters.
- Normalize vs denormalize decisions: PostgreSQL handles joins well but consider materialized views for heavy aggregations.
- VACUUM and ANALYZE: run VACUUM FULL sparingly; use autovacuum tuning to prevent bloat.
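Two of the tuning ideas above in SQL form, with hypothetical table names:

```sql
-- BRIN index for a very large append-only table: tiny index, range scans.
CREATE INDEX idx_events_created_brin ON events USING brin (created_at);

-- Materialized view for a heavy aggregation:
CREATE MATERIALIZED VIEW daily_totals AS
    SELECT date_trunc('day', created_at) AS day, sum(amount) AS total
    FROM events
    GROUP BY 1;

-- A unique index lets the refresh run without blocking readers:
CREATE UNIQUE INDEX idx_daily_totals_day ON daily_totals (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_totals;
```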
Concurrency and connection pooling
- Use a connection pooler (pgbouncer in transaction mode) to avoid too many active connections.
- Tune max_connections considering RAM and work_mem.
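A minimal pgbouncer.ini sketch for transaction pooling; database name, host, and pool sizes are placeholders to adjust for your workload.

```
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Applications connect to port 6432; pgbouncer multiplexes those clients onto at most 20 real server connections per database/user pair.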
Index maintenance
- Rebuild bloated indexes with REINDEX or CREATE INDEX CONCURRENTLY to avoid downtime.
- Use pg_repack for online table reorganization.
7. Testing, validation, and cutover strategy
- Staging run: perform a full dry-run migration to a staging cluster; validate schema, query plans, and application behavior.
- Performance baselines: capture query latencies and throughput in MSSQL and compare in PostgreSQL.
- Data validation: row counts, checksums (e.g., md5 concatenated columns), spot-check business-critical queries.
- Cutover options:
- Big-bang: short downtime, full final sync and switch.
- Phased: replicate changes (logical replication or triggers) and switch when ready.
- Rollback plan: keep MSSQL read-only fallback for a defined period after cutover.
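The checksum validation mentioned above can be sketched as follows. The function names are illustrative; the idea is to hash each row's columns and combine them into an order-independent per-table digest so the MSSQL source and PostgreSQL target can be compared without sorting.

```python
# Order-independent table checksum: md5 per row, XOR across rows.
import hashlib

def row_checksum(row):
    """md5 of the row's columns joined with '|'; NULL is rendered as \\N
    (as in COPY) so it stays distinct from the empty string."""
    joined = "|".join("\\N" if v is None else str(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def table_checksum(rows):
    """XOR the per-row digests so row order does not matter."""
    acc = 0
    for row in rows:
        acc ^= int(row_checksum(row), 16)
    return f"{acc:032x}"

mssql_rows = [(1, "alice", None), (2, "bob", "x")]
pg_rows = [(2, "bob", "x"), (1, "alice", None)]  # same data, different order
print(table_checksum(mssql_rows) == table_checksum(pg_rows))  # True
```

In practice you stream rows from both databases in batches and compare only the final digests; any mismatch triggers a finer-grained (e.g. per-key-range) comparison.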
8. Common pitfalls and fixes
- Unexpected type mismatches: proactively map types and run automated checks.
- Collation/case-sensitivity differences: use citext or functional indexes.
- Transaction semantics differences: test stored proc and transaction behavior under load.
- Sequence mismatches causing unique violations: set sequences after load.
- Missing indexes leading to slow queries: run EXPLAIN and re-add appropriate indexes.
9. Checklist (pre-migration to post-cutover)
- Inventory schemas, procedures, and ETL jobs.
- Map data types and collations.
- Convert stored procedures and functions.
- Choose migration tool(s) and test on staging.
- Migrate schema, then data in batches; sync sequences.
- Validate data integrity and query correctness.
- Tune PostgreSQL settings and rebuild indexes.
- Execute cutover, monitor performance, and validate.
- Post-cutover: enable autovacuum tuning, backups, monitoring, and set maintenance routines.
10. Resources and commands (quick reference)
- pgloader: https://pgloader.io
- psql COPY example:
```sql
COPY mytable (col1, col2) FROM STDIN WITH (FORMAT csv);
```
- Set sequence:
```sql
SELECT setval(pg_get_serial_sequence('mytable', 'id'), (SELECT MAX(id) FROM mytable));
```
- Analyze slow query:
```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
```
Follow these steps to move from MSSQL to PostgreSQL with minimal disruption, keeping a strong emphasis on testing, validation, and iterative performance tuning.