Converting MSSQL Schemas and Queries for PostgreSQL Compatibility

MSSQL to PostgreSQL: Tools, Scripts, and Performance Tuning

Migrating from Microsoft SQL Server (MSSQL) to PostgreSQL can reduce licensing costs, increase portability, and leverage PostgreSQL’s extensibility. This guide covers the tools to use, essential scripts for schema and data conversion, and performance tuning steps to ensure a smooth migration and production-ready PostgreSQL deployment.

1. Migration tools — when to use them

  • pgloader — Best for straightforward bulk migrations. Handles schema creation, data copy, and basic type mapping with good speed. Use when you can tolerate some manual fixes after automated conversion.
  • AWS SCT (Schema Conversion Tool) — Useful if migrating into AWS-managed RDS/Aurora PostgreSQL; converts schema and offers assessment reports. Requires AWS environment for full features.
  • ora2pg — Designed for Oracle, but recent releases add SQL Server as a source; its extensible rule set can help with complex migrations, though it is less commonly used for MSSQL.
  • SQL Server Integration Services (SSIS) — Use for complex ETL workflows, incremental loads, and transformations when staying in Microsoft tooling.
  • Custom scripts (Python, Go, Node) — Required for complex transformations, stored procedure translation, or bespoke data cleaning.
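As a sketch, a minimal pgloader command file for a direct MSSQL-to-PostgreSQL copy might look like the following; the connection strings are placeholders, and the WITH options shown are a typical starting set rather than a complete recipe:

```
LOAD DATABASE
     FROM mssql://user:pass@mssql-host/sourcedb
     INTO postgresql://user:pass@pg-host/targetdb
WITH include drop, create tables, create indexes, reset sequences;
```

Run it with `pgloader migrate.load`; review the summary report it prints for rows rejected or types it could not map automatically.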

2. Schema conversion: common differences and mapping

Data type mapping (common)

MSSQL | PostgreSQL | Notes
INT, BIGINT | INTEGER, BIGINT | Direct mapping
VARCHAR(n) | VARCHAR(n) | Same; consider TEXT for unconstrained lengths
NVARCHAR(n) | VARCHAR(n) or TEXT | PostgreSQL uses UTF-8 by default; no separate NVARCHAR
DATETIME, SMALLDATETIME | TIMESTAMP WITHOUT TIME ZONE | Consider TIMESTAMP WITH TIME ZONE if storing UTC
DATETIME2 | TIMESTAMP | High precision on both sides
BIT | BOOLEAN | Map 0/1 to false/true
MONEY, SMALLMONEY | NUMERIC(19,4) | Prefer NUMERIC for exactness
UNIQUEIDENTIFIER | UUID | Use the uuid type with gen_random_uuid() (built in since PostgreSQL 13; pgcrypto earlier)
IMAGE, VARBINARY | BYTEA | Use BYTEA for binary data

Constraints, indexes, and sequences

  • MSSQL IDENTITY columns -> PostgreSQL sequences with SERIAL or IDENTITY. Prefer GENERATED BY DEFAULT AS IDENTITY for modern PostgreSQL.
  • Primary/foreign keys and unique constraints map directly.
  • Filtered indexes in MSSQL require partial indexes in PostgreSQL (CREATE INDEX … WHERE …).
  • INCLUDE columns in MSSQL nonclustered indexes map to CREATE INDEX … INCLUDE (…) in PostgreSQL 11+; on older versions, append the columns to the index key or accept planner differences.
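Putting those mappings together, a converted table definition might look like this (the schema and index names are illustrative, not from a real migration):

```sql
CREATE TABLE orders (
    id     BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,  -- replaces IDENTITY(1,1)
    guid   UUID NOT NULL DEFAULT gen_random_uuid(),              -- replaces UNIQUEIDENTIFIER
    status TEXT NOT NULL,
    total  NUMERIC(19,4)                                         -- replaces MONEY
);

-- MSSQL filtered index -> PostgreSQL partial index
CREATE INDEX idx_orders_open ON orders (status) WHERE status = 'open';

-- MSSQL INCLUDE columns -> INCLUDE (PostgreSQL 11+)
CREATE INDEX idx_orders_status_incl ON orders (status) INCLUDE (total);
```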

Collation and case sensitivity

  • PostgreSQL collations are set per column or database; add citext extension for case-insensitive text.
  • Consider migrating to lowercased values or using functional indexes (LOWER(column)).
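For instance (table and column names hypothetical; citext ships in PostgreSQL's contrib extensions):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- Option 1: make comparisons case-insensitive via the column type
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Option 2: keep TEXT and index the lowercased expression
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Queries must use the same expression to hit the index:
SELECT id FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');
```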

3. Translating T-SQL to PL/pgSQL

  • Stored procedures and functions must be rewritten: T-SQL control flow, TRY/CATCH, and error handling differ.
  • Replace functions like ISNULL(a,b) with COALESCE(a,b).
  • String functions: REPLACE and SUBSTRING carry over largely unchanged, but CHARINDEX(sub, str) becomes POSITION(sub IN str) or strpos(str, sub); check argument order and 1-based indexing.
  • Temporary tables: MSSQL #temp tables map to CREATE TEMP TABLE (session-scoped, dropped automatically); UNLOGGED tables suit persistent staging data that can be rebuilt after a crash.
  • Transactions: PostgreSQL uses explicit BEGIN/COMMIT and has no nested transactions; emulate nested patterns with SAVEPOINT and ROLLBACK TO SAVEPOINT.

Example: simple stored procedure conversion

MSSQL (T-SQL)

sql

CREATE PROCEDURE dbo.IncrementCounter @id INT
AS
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = @id;
    SELECT value FROM counters WHERE id = @id;
END

PostgreSQL (PL/pgSQL)

sql

CREATE OR REPLACE FUNCTION increment_counter(p_id INTEGER)
RETURNS INTEGER AS $$
DECLARE
    v_value INTEGER;
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = p_id;
    SELECT value INTO v_value FROM counters WHERE id = p_id;
    RETURN v_value;
END;
$$ LANGUAGE plpgsql;
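T-SQL TRY/CATCH has no direct equivalent; a PL/pgSQL EXCEPTION block is the usual translation. A hedged sketch reusing the counters table (the function name and NULL sentinel are our own choices):

```sql
CREATE OR REPLACE FUNCTION safe_increment(p_id INTEGER)
RETURNS INTEGER AS $$
BEGIN
    UPDATE counters SET value = value + 1 WHERE id = p_id;
    RETURN (SELECT value FROM counters WHERE id = p_id);
EXCEPTION
    WHEN OTHERS THEN
        -- Roughly the BEGIN CATCH role: log and return a sentinel
        RAISE NOTICE 'increment failed: %', SQLERRM;
        RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

Note that entering an EXCEPTION handler rolls the function's work back to an implicit savepoint, which differs from T-SQL's default behavior; test error paths under load.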

4. Data migration scripts and patterns

  • Use pgloader for fast bulk load:
    • Create target schema (with manual adjustments).
    • Run pgloader with a connection string and mapping rules to transform types.
  • For complex ETL, use Python with psycopg2 and pyodbc:
    • Stream rows in batches (e.g., 10k) to avoid memory spikes.
    • Use COPY FROM STDIN for bulk inserts into PostgreSQL.
  • Preserve transactionality: for large tables, migrate in consistent batches and use application-level quiesce or snapshot isolation where possible.
  • Validate row counts, checksums, and key distributions after migration.

Example Python pattern (simplified)

python

# Simplified pattern: stream from MSSQL (pyodbc) into PostgreSQL (psycopg2) via COPY
import io

src_cursor.execute("SELECT id, col1, col2 FROM source_table")
while rows := src_cursor.fetchmany(10000):  # batch to cap memory use
    buf = io.StringIO()
    for row in rows:
        values = transform_row(row)  # app-specific cleaning / type coercion
        buf.write("\t".join(r"\N" if v is None else str(v) for v in values) + "\n")
    buf.seek(0)
    pg_cursor.copy_from(buf, "target_table", sep="\t")  # COPY ... FROM STDIN
    pg_conn.commit()

5. Handling identity, sequences, and foreign keys

  • After loading data, sync sequences:
    • SELECT setval(pg_get_serial_sequence('table', 'id'), MAX(id)) FROM table;
  • To speed loading, create tables without foreign keys, load the data, then add the constraints afterward — optionally as NOT VALID, followed by a later VALIDATE CONSTRAINT.
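The NOT VALID pattern looks like this (table and constraint names hypothetical):

```sql
-- Attach the constraint without scanning existing rows
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Later, check existing rows under a weaker lock than a plain ADD CONSTRAINT
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer;
```

New and updated rows are checked immediately even while the constraint is NOT VALID; only the backfill check is deferred.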

6. Performance tuning after migration

PostgreSQL configuration highlights

Setting | Recommendation | Notes
shared_buffers | 25% of RAM | For dedicated DB servers
effective_cache_size | 50-75% of RAM | Helps planner estimate available cache
work_mem | 16MB–256MB per connection | Tune for complex sorts/joins; increase for OLAP
maintenance_work_mem | 512MB–2GB | For CREATE INDEX and VACUUM operations
max_wal_size | 1–4GB (or higher) | Increase to reduce checkpoint frequency
wal_level | replica | If using replication; otherwise minimal
synchronous_commit | on (or off for async needs) | Off can improve write performance at durability cost
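These settings can be applied with ALTER SYSTEM rather than hand-editing postgresql.conf. The values below are illustrative for a dedicated 32 GB server, not recommendations for your workload:

```sql
ALTER SYSTEM SET shared_buffers = '8GB';          -- ~25% of RAM; requires a restart
ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint only, no allocation
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();                          -- applies reloadable settings
```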

Schema and query tuning

  • Use EXPLAIN (ANALYZE, BUFFERS) to profile slow queries and adapt indexes.
  • Replace scalar subqueries with JOINs where appropriate.
  • Use BRIN indexes for very large append-only tables.
  • Leverage partial and expression indexes for selective filters.
  • Normalize vs denormalize decisions: PostgreSQL handles joins well but consider materialized views for heavy aggregations.
  • VACUUM and ANALYZE: run VACUUM FULL sparingly; use autovacuum tuning to prevent bloat.
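To illustrate two of the points above with hypothetical tables: a BRIN index on an append-only events table, and a materialized view for a heavy aggregation:

```sql
-- BRIN: tiny index, effective when created_at correlates with physical row order
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);

-- Precompute a heavy aggregation
CREATE MATERIALIZED VIEW daily_sales AS
SELECT created_at::date AS day, SUM(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index lets REFRESH ... CONCURRENTLY avoid locking readers
CREATE UNIQUE INDEX ON daily_sales (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```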

Concurrency and connection pooling

  • Use a connection pooler (pgbouncer in transaction mode) to avoid too many active connections.
  • Tune max_connections considering RAM and work_mem.
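As a sketch, a minimal pgbouncer configuration in transaction mode might look like this; the host, pool sizes, and database name are placeholders to tune for your environment:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; session and statement modes also exist
max_client_conn = 500        ; clients pgbouncer will accept
default_pool_size = 20       ; server connections per database/user pair
```

Transaction mode multiplexes many client connections over few server connections, but it is incompatible with session-level features such as prepared statements held across transactions; verify your driver's behavior.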

Index maintenance

  • Rebuild bloated indexes with REINDEX CONCURRENTLY (PostgreSQL 12+), or build a replacement with CREATE INDEX CONCURRENTLY and swap it in, to avoid downtime; plain REINDEX blocks writes.
  • Use pg_repack for online table reorganization.
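For example (index and table names hypothetical):

```sql
-- PostgreSQL 12+: rebuild in place without blocking writes
REINDEX INDEX CONCURRENTLY idx_orders_status;

-- Pre-12 pattern: build a replacement online, then swap it in
CREATE INDEX CONCURRENTLY idx_orders_status_new ON orders (status);
DROP INDEX CONCURRENTLY idx_orders_status;
ALTER INDEX idx_orders_status_new RENAME TO idx_orders_status;
```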

7. Testing, validation, and cutover strategy

  • Staging run: perform a full dry-run migration to a staging cluster; validate schema, query plans, and application behavior.
  • Performance baselines: capture query latencies and throughput in MSSQL and compare in PostgreSQL.
  • Data validation: row counts, checksums (e.g., md5 concatenated columns), spot-check business-critical queries.
  • Cutover options:
    • Big-bang: short downtime, full final sync and switch.
    • Phased: replicate changes (logical replication or triggers) and switch when ready.
  • Rollback plan: keep MSSQL read-only fallback for a defined period after cutover.
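To make the checksum comparison concrete, here is a small client-side sketch: an order-independent table checksum (the function name and row format are our own; in practice you would feed it rows fetched from each database and compare the results):

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum: hash each row, then XOR the digests.

    Row order can differ between MSSQL and PostgreSQL result sets, so an
    XOR accumulator avoids needing an identical ORDER BY on both sides.
    Caveat: identical duplicate rows cancel out under XOR, so pair this
    with row-count validation.
    """
    acc = 0
    for row in rows:
        digest = hashlib.md5("|".join(map(str, row)).encode("utf-8")).hexdigest()
        acc ^= int(digest, 16)
    return format(acc, "032x")

# Same rows in a different order produce the same checksum
assert table_checksum([(1, "alice"), (2, "bob")]) == \
       table_checksum([(2, "bob"), (1, "alice")])
```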

8. Common pitfalls and fixes

  • Unexpected type mismatches: proactively map types and run automated checks.
  • Collation/case-sensitivity differences: use citext or functional indexes.
  • Transaction semantics differences: test stored proc and transaction behavior under load.
  • Sequence mismatches causing unique violations: set sequences after load.
  • Missing indexes leading to slow queries: run EXPLAIN and re-add appropriate indexes.

9. Checklist (pre-migration to post-cutover)

  1. Inventory schemas, procedures, and ETL jobs.
  2. Map data types and collations.
  3. Convert stored procedures and functions.
  4. Choose migration tool(s) and test on staging.
  5. Migrate schema, then data in batches; sync sequences.
  6. Validate data integrity and query correctness.
  7. Tune PostgreSQL settings and rebuild indexes.
  8. Execute cutover, monitor performance, and validate.
  9. Post-cutover: enable autovacuum tuning, backups, monitoring, and set maintenance routines.

10. Resources and commands (quick reference)

  • Bulk load:

sql

COPY mytable (col1, col2) FROM STDIN WITH (FORMAT csv);
  • Set sequence:

sql

SELECT setval(pg_get_serial_sequence('mytable', 'id'), (SELECT MAX(id) FROM mytable));
  • Analyze slow query:

sql

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Follow these steps to move from MSSQL to PostgreSQL with minimal disruption, keeping a strong emphasis on testing, validation, and iterative performance tuning.
