Converting MSSQL Schemas and Queries for PostgreSQL Compatibility

MSSQL to PostgreSQL: Tools, Scripts, and Performance Tuning

Migrating from Microsoft SQL Server (MSSQL) to PostgreSQL can reduce licensing costs, increase portability, and leverage PostgreSQL’s extensibility. This guide covers the tools to use, essential scripts for schema and data conversion, and performance tuning steps to ensure a smooth migration and production-ready PostgreSQL deployment.

1. Migration tools — when to use them

  • pgloader — Best for straightforward bulk migrations. Handles schema creation, data copy, and basic type mapping with good speed. Use when you can tolerate some manual fixes after automated conversion.
  • AWS SCT (Schema Conversion Tool) — Useful if migrating into AWS-managed RDS/Aurora PostgreSQL; converts schema and offers assessment reports. Requires AWS environment for full features.
  • ora2pg — Designed for Oracle, but recent releases add SQL Server as a source; its extensible rule set can help with complex migrations, though it is less commonly used for MSSQL.
  • SQL Server Integration Services (SSIS) — Use for complex ETL workflows, incremental loads, and transformations when staying in Microsoft tooling.
  • Custom scripts (Python, Go, Node) — Required for complex transformations, stored procedure translation, or bespoke data cleaning.
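As a sketch, a minimal pgloader command file for a direct MSSQL-to-PostgreSQL copy might look like the following; the connection strings are placeholders, and the WITH options shown are a typical starting set rather than a complete recipe:

```
LOAD DATABASE
     FROM mssql://user:pass@mssql-host/sourcedb
     INTO postgresql://user:pass@pg-host/targetdb
WITH include drop, create tables, create indexes, reset sequences;
```

Run it with `pgloader migrate.load`; review the summary report it prints for rows rejected or types it could not map automatically.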

2. Schema conversion: common differences and mapping

Data type mapping (common)

MSSQL | PostgreSQL | Notes
INT, BIGINT | INTEGER, BIGINT | Direct mapping
VARCHAR(n) | VARCHAR(n) | Same; consider TEXT for unconstrained lengths
NVARCHAR(n) | VARCHAR(n) or TEXT | PostgreSQL uses UTF-8 by default; no separate NVARCHAR
DATETIME, SMALLDATETIME | TIMESTAMP WITHOUT TIME ZONE | Consider TIMESTAMP WITH TIME ZONE if storing UTC
DATETIME2 | TIMESTAMP | High precision on both sides
BIT | BOOLEAN | Map 0/1 to false/true
MONEY, SMALLMONEY | NUMERIC(19,4) | Prefer NUMERIC for exactness
UNIQUEIDENTIFIER | UUID | Use the uuid type with gen_random_uuid() (built in since PostgreSQL 13; pgcrypto earlier)
IMAGE, VARBINARY | BYTEA | Use BYTEA for binary data

Constraints, indexes, and sequences

  • MSSQL IDENTITY columns -> PostgreSQL sequences with SERIAL or IDENTITY. Prefer GENERATED BY DEFAULT AS IDENTITY for modern PostgreSQL.
  • Primary/foreign keys and unique constraints map directly.
  • Filtered indexes in MSSQL require partial indexes in PostgreSQL (CREATE INDEX … WHERE …).
  • INCLUDE columns in MSSQL nonclustered indexes map to CREATE INDEX … INCLUDE (…) in PostgreSQL 11+; on older versions, append the columns to the index key or accept planner differences.
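Putting those mappings together, a converted table definition might look like this (the schema and index names are illustrative, not from a real migration):

```sql
CREATE TABLE orders (
    id     BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,  -- replaces IDENTITY(1,1)
    guid   UUID NOT NULL DEFAULT gen_random_uuid(),              -- replaces UNIQUEIDENTIFIER
    status TEXT NOT NULL,
    total  NUMERIC(19,4)                                         -- replaces MONEY
);

-- MSSQL filtered index -> PostgreSQL partial index
CREATE INDEX idx_orders_open ON orders (status) WHERE status = 'open';

-- MSSQL INCLUDE columns -> INCLUDE (PostgreSQL 11+)
CREATE INDEX idx_orders_status_incl ON orders (status) INCLUDE (total);
```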

Collation and case sensitivity

  • PostgreSQL collations are set per column or database; add citext extension for case-insensitive text.
  • Consider migrating to lowercased values or using functional indexes (LOWER(column)).
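For instance (table and column names hypothetical; citext ships in PostgreSQL's contrib extensions):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- Option 1: make comparisons case-insensitive via the column type
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Option 2: keep TEXT and index the lowercased expression
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Queries must use the same expression to hit the index:
SELECT id FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');
```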

3. Translating T-SQL to PL/pgSQL

  • Stored procedures and functions must be rewritten: T-SQL control flow, TRY/CATCH, and error handling differ.
  • Replace functions like ISNULL(a,b) with COALESCE(a,b).
  • String functions: REPLACE and SUBSTRING carry over largely unchanged, but CHARINDEX(sub, str) becomes POSITION(sub IN str) or strpos(str, sub); check argument order and 1-based indexing.
  • Temporary tables: MSSQL #temp tables map to CREATE TEMP TABLE (session-scoped, dropped automatically); UNLOGGED tables suit persistent staging data that can be rebuilt after a crash.
  • Transactions: PostgreSQL uses explicit BEGIN/COMMIT and has no nested transactions; emulate nested patterns with SAVEPOINT and ROLLBACK TO SAVEPOINT.

Example: simple stored procedure conversion

MSSQL (T-SQL)

sql

CREATE PROCEDURE dbo.IncrementCounter @id INT
AS
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = @id;
    SELECT value FROM counters WHERE id = @id;
END

PostgreSQL (PL/pgSQL)

sql

CREATE OR REPLACE FUNCTION increment_counter(p_id INTEGER)
RETURNS INTEGER AS $$
DECLARE
    v_value INTEGER;
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = p_id;
    SELECT value INTO v_value FROM counters WHERE id = p_id;
    RETURN v_value;
END;
$$ LANGUAGE plpgsql;
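T-SQL TRY/CATCH has no direct equivalent; a PL/pgSQL EXCEPTION block is the usual translation. A hedged sketch reusing the counters table (the function name and NULL sentinel are our own choices):

```sql
CREATE OR REPLACE FUNCTION safe_increment(p_id INTEGER)
RETURNS INTEGER AS $$
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = p_id;
    RETURN (SELECT value FROM counters WHERE id = p_id);
EXCEPTION
    WHEN OTHERS THEN
        -- Roughly the BEGIN CATCH role: log and return a sentinel
        RAISE NOTICE 'increment failed: %', SQLERRM;
        RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

Note that entering an EXCEPTION handler rolls the function's work back to an implicit savepoint, which differs from T-SQL's default behavior; test error paths under load.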

4. Data migration scripts and patterns

  • Use pgloader for fast bulk load:
    • Create target schema (with manual adjustments).
    • Run pgloader with a connection string and mapping rules to transform types.
  • For complex ETL, use Python with psycopg2 and pyodbc:
    • Stream rows in batches (e.g., 10k) to avoid memory spikes.
    • Use COPY FROM STDIN for bulk inserts into PostgreSQL.
  • Preserve transactionality: for large tables, migrate in consistent batches and use application-level quiesce or snapshot isolation where possible.
  • Validate row counts, checksums, and key distributions after migration.

Example Python pattern (simplified)

python

# Simplified pattern: stream from MSSQL (pyodbc) into PostgreSQL (psycopg2) via COPY
import io

src_cursor.execute("SELECT id, col1, col2 FROM source_table")
while rows := src_cursor.fetchmany(10000):  # batch to cap memory use
    buf = io.StringIO()
    for row in rows:
        values = transform_row(row)  # app-specific cleaning / type coercion
        buf.write("\t".join(r"\N" if v is None else str(v) for v in values) + "\n")
    buf.seek(0)
    pg_cursor.copy_from(buf, "target_table", sep="\t")  # COPY ... FROM STDIN
    pg_conn.commit()

5. Handling identity, sequences, and foreign keys

  • After loading data, sync sequences:
    • SELECT setval(pg_get_serial_sequence('table', 'id'), MAX(id)) FROM table;
  • To speed loading, create tables without foreign keys, load the data, then add the constraints afterward — optionally as NOT VALID, followed by a later VALIDATE CONSTRAINT.
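The NOT VALID pattern looks like this (table and constraint names hypothetical):

```sql
-- Attach the constraint without scanning existing rows
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Later, check existing rows under a weaker lock than a plain ADD CONSTRAINT
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer;
```

New and updated rows are checked immediately even while the constraint is NOT VALID; only the backfill check is deferred.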

6. Performance tuning after migration

PostgreSQL configuration highlights

Setting | Recommendation | Notes
shared_buffers | 25% of RAM | For dedicated DB servers
effective_cache_size | 50-75% of RAM | Helps planner estimate available cache
work_mem | 16MB–256MB per connection | Tune for complex sorts/joins; increase for OLAP
maintenance_work_mem | 512MB–2GB | For CREATE INDEX and VACUUM operations
max_wal_size | 1–4GB (or higher) | Increase to reduce checkpoint frequency
wal_level | replica | If using replication; otherwise minimal
synchronous_commit | on (or off for async needs) | Off can improve write performance at durability cost
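These settings can be applied with ALTER SYSTEM rather than hand-editing postgresql.conf. The values below are illustrative for a dedicated 32 GB server, not recommendations for your workload:

```sql
ALTER SYSTEM SET shared_buffers = '8GB';          -- ~25% of RAM; requires a restart
ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint only, no allocation
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();                          -- applies reloadable settings
```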

Schema and query tuning

  • Use EXPLAIN (ANALYZE, BUFFERS) to profile slow queries and adapt indexes.
  • Replace scalar subqueries with JOINs where appropriate.
  • Use BRIN indexes for very large append-only tables.
  • Leverage partial and expression indexes for selective filters.
  • Normalize vs denormalize decisions: PostgreSQL handles joins well but consider materialized views for heavy aggregations.
  • VACUUM and ANALYZE: run VACUUM FULL sparingly; use autovacuum tuning to prevent bloat.
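To illustrate two of the points above with hypothetical tables: a BRIN index on an append-only events table, and a materialized view for a heavy aggregation:

```sql
-- BRIN: tiny index, effective when created_at correlates with physical row order
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);

-- Precompute a heavy aggregation
CREATE MATERIALIZED VIEW daily_sales AS
SELECT created_at::date AS day, SUM(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index lets REFRESH ... CONCURRENTLY avoid locking readers
CREATE UNIQUE INDEX ON daily_sales (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```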

Concurrency and connection pooling

  • Use a connection pooler (pgbouncer in transaction mode) to avoid too many active connections.
  • Tune max_connections considering RAM and work_mem.
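As a sketch, a minimal pgbouncer configuration in transaction mode might look like this; the host, pool sizes, and database name are placeholders to tune for your environment:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; session and statement modes also exist
max_client_conn = 500        ; clients pgbouncer will accept
default_pool_size = 20       ; server connections per database/user pair
```

Transaction mode multiplexes many client connections over few server connections, but it is incompatible with session-level features such as prepared statements held across transactions; verify your driver's behavior.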

Index maintenance

  • Rebuild bloated indexes with REINDEX CONCURRENTLY (PostgreSQL 12+), or build a replacement with CREATE INDEX CONCURRENTLY and swap it in, to avoid downtime; plain REINDEX blocks writes.
  • Use pg_repack for online table reorganization.
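For example (index and table names hypothetical):

```sql
-- PostgreSQL 12+: rebuild in place without blocking writes
REINDEX INDEX CONCURRENTLY idx_orders_status;

-- Pre-12 pattern: build a replacement online, then swap it in
CREATE INDEX CONCURRENTLY idx_orders_status_new ON orders (status);
DROP INDEX CONCURRENTLY idx_orders_status;
ALTER INDEX idx_orders_status_new RENAME TO idx_orders_status;
```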

7. Testing, validation, and cutover strategy

  • Staging run: perform a full dry-run migration to a staging cluster; validate schema, query plans, and application behavior.
  • Performance baselines: capture query latencies and throughput in MSSQL and compare in PostgreSQL.
  • Data validation: row counts, checksums (e.g., md5 concatenated columns), spot-check business-critical queries.
  • Cutover options:
    • Big-bang: short downtime, full final sync and switch.
    • Phased: replicate changes (logical replication or triggers) and switch when ready.
  • Rollback plan: keep MSSQL read-only fallback for a defined period after cutover.
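To make the checksum comparison concrete, here is a small client-side sketch: an order-independent table checksum (the function name and row format are our own; in practice you would feed it rows fetched from each database and compare the results):

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum: hash each row, then XOR the digests.

    Row order can differ between MSSQL and PostgreSQL result sets, so an
    XOR accumulator avoids needing an identical ORDER BY on both sides.
    Caveat: identical duplicate rows cancel out under XOR, so pair this
    with row-count validation.
    """
    acc = 0
    for row in rows:
        digest = hashlib.md5("|".join(map(str, row)).encode("utf-8")).hexdigest()
        acc ^= int(digest, 16)
    return format(acc, "032x")

# Same rows in a different order produce the same checksum
assert table_checksum([(1, "alice"), (2, "bob")]) == \
       table_checksum([(2, "bob"), (1, "alice")])
```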

8. Common pitfalls and fixes

  • Unexpected type mismatches: proactively map types and run automated checks.
  • Collation/case-sensitivity differences: use citext or functional indexes.
  • Transaction semantics differences: test stored proc and transaction behavior under load.
  • Sequence mismatches causing unique violations: set sequences after load.
  • Missing indexes leading to slow queries: run EXPLAIN and re-add appropriate indexes.

9. Checklist (pre-migration to post-cutover)

  1. Inventory schemas, procedures, and ETL jobs.
  2. Map data types and collations.
  3. Convert stored procedures and functions.
  4. Choose migration tool(s) and test on staging.
  5. Migrate schema, then data in batches; sync sequences.
  6. Validate data integrity and query correctness.
  7. Tune PostgreSQL settings and rebuild indexes.
  8. Execute cutover, monitor performance, and validate.
  9. Post-cutover: enable autovacuum tuning, backups, monitoring, and set maintenance routines.

10. Resources and commands (quick reference)

  • Bulk load:

sql

COPY mytable (col1, col2) FROM STDIN WITH (FORMAT csv);
  • Set sequence:

sql

SELECT setval(pg_get_serial_sequence('mytable', 'id'), (SELECT MAX(id) FROM mytable));
  • Analyze slow query:

sql

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Follow these steps to move from MSSQL to PostgreSQL with minimal disruption, keeping a strong emphasis on testing, validation, and iterative performance tuning.
