Struggling with PostgreSQL Encoding Problems on Windows + Docker
Introduction
When running PostgreSQL with Docker on Windows, you can sometimes have trouble handling Japanese data.
“Garbled characters,” “encoding errors,” “Illegal multibyte sequence”
I encountered these errors and spent several hours before resolving them.
This article is a record of those pain points and their solutions.
Problems Encountered
Problem 1: Garbled Characters When Importing CSV
During the Japanese localization work for dvdrental, I created a CSV with movie titles converted to Japanese and tried to import it into PostgreSQL.
COPY film(title) FROM '/tmp/titles_ja.csv' WITH (FORMAT CSV, HEADER TRUE, ENCODING 'UTF8');
Error:
ERROR: invalid byte sequence for encoding "UTF8": 0x83
The cause was that the CSV saved by Excel on Windows was in “Shift-JIS (CP932).”
Problem 2: Question Marks in SQL Execution from PowerShell
$sql = "UPDATE film SET title = '進撃の巨人' WHERE film_id = 1;"
docker exec -i postgres psql -U postgres -d dvdrental -c $sql
Result: title becomes ???
PowerShell’s default encoding is not UTF-8, and encoding conversion occurs when passing through the pipe to PostgreSQL.
Solutions
Solution 1: Save CSV as UTF-8 (without BOM)
Even when saving from Excel as “CSV UTF-8,” on Windows it may become BOM-attached UTF-8 (UTF-8 with BOM).
PostgreSQL’s COPY command may not handle BOM correctly.
Method A: Open the CSV in VS Code and save as “UTF-8” (without BOM)
- File → Save As → Select encoding “UTF-8”
Method B: Convert with PowerShell
# Convert from BOM-attached to BOM-free
$content = Get-Content -Path "titles_ja_bom.csv" -Encoding UTF8
$content | Set-Content -Path "titles_ja_nobom.csv" -Encoding UTF8NoBOM
Solution 2: Copy to the PostgreSQL Container and Import
# Copy file to container
docker cp .\sql\titles_ja.csv postgres:/tmp/titles_ja.csv
# Execute COPY inside the container (less prone to encoding issues)
docker exec postgres psql -U postgres -d dvdrental -c `
"COPY film(title) FROM '/tmp/titles_ja.csv' WITH (FORMAT CSV, HEADER TRUE, ENCODING 'UTF8')"
Since it doesn’t go through the local filesystem, there’s no OS encoding conversion.
Solution 3: PowerShell Encoding Settings
# Add to the top of the script
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8NoBOM'
# Or set as environment variable
$env:PYTHONUTF8 = "1" # When using Python scripts
PowerShell 7.x (pwsh) defaults to UTF-8, but Windows PowerShell 5.1 defaults to Shift-JIS.
How to check:
[System.Text.Encoding]::Default
If Shift-JIS or Code page 932 is displayed, settings need to be changed.
Solution 4: Explicitly Set Locale in Docker Compose
# compose.yml
services:
postgres:
image: postgres:16
environment:
POSTGRES_DB: dvdrental
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
LANG: ja_JP.UTF-8
LC_ALL: ja_JP.UTF-8
volumes:
- pgdata:/var/lib/postgresql/data
The default locale when creating a PostgreSQL database will be UTF-8.
Checking PostgreSQL Encoding
-- Check encoding of the connected DB
SELECT pg_encoding_to_char(encoding), datcollate FROM pg_database WHERE datname = current_database();
-- Check server-wide encoding
SHOW server_encoding;
SHOW client_encoding;
Confirm that server_encoding is UTF8.
How to Check CSV File Encoding
# Check the first bytes of a file with PowerShell
$bytes = [System.IO.File]::ReadAllBytes(".\titles_ja.csv") | Select-Object -First 4
$bytes | ForEach-Object { "0x{0:X2}" -f $_ }
# BOM-attached UTF-8: 0xEF 0xBB 0xBF ...
# BOM-free UTF-8: Data bytes come directly at the start
Fixing Garbled Data
For fixing data that has already been garbled:
-- Check garbled data
SELECT film_id, title FROM film WHERE title ~ '[^\x00-\x7F]' IS FALSE;
-- Overwrite with correct Japanese data (re-import from CSV)
TRUNCATE TABLE film_title_ja; -- If using a working table
-- Or fix directly with UPDATE
UPDATE film SET title = 'Correct Japanese Title' WHERE film_id = 1;
Summary
| Problem | Cause | Solution |
|---|---|---|
| CSV import fails | Shift-JIS or UTF-8 with BOM | Re-save as UTF-8 (without BOM) |
| PowerShell garbled characters | Console encoding is CP932 | [Console]::OutputEncoding = UTF8 |
| Can’t pass directly to container | PowerShell pipe converts encoding | Use docker cp to copy file first |
| DB default is not UTF-8 | LANG not set in Docker Compose | Explicitly set LANG: ja_JP.UTF-8 |
In Windows development environments, there are many situations where the assumption “UTF-8 is obvious” doesn’t hold.
Developing the habit of checking encoding every time you handle CSV files can dramatically reduce wasted time.