Tech Blog

Struggling with PostgreSQL Encoding Problems on Windows + Docker

by Tech Writer
PostgreSQL Docker Windows Encoding

Introduction

When running PostgreSQL with Docker on Windows, you can sometimes have trouble handling Japanese data.

“Garbled characters,” “encoding errors,” “Illegal multibyte sequence”

I encountered these errors and spent several hours before resolving them.
This article is a record of those pain points and their solutions.


Problems Encountered

Problem 1: Garbled Characters When Importing CSV

During the Japanese localization work for dvdrental, I created a CSV with movie titles converted to Japanese and tried to import it into PostgreSQL.

COPY film(title) FROM '/tmp/titles_ja.csv' WITH (FORMAT CSV, HEADER TRUE, ENCODING 'UTF8');

Error:

ERROR:  invalid byte sequence for encoding "UTF8": 0x83

The cause was that the CSV saved by Excel on Windows was in “Shift-JIS (CP932).”

Problem 2: Question Marks in SQL Execution from PowerShell

$sql = "UPDATE film SET title = '進撃の巨人' WHERE film_id = 1;"
docker exec -i postgres psql -U postgres -d dvdrental -c $sql

Result: title becomes ???

PowerShell’s default encoding is not UTF-8, and encoding conversion occurs when passing through the pipe to PostgreSQL.


Solutions

Solution 1: Save CSV as UTF-8 (without BOM)

Even when saving from Excel as “CSV UTF-8,” on Windows it may become BOM-attached UTF-8 (UTF-8 with BOM).
PostgreSQL’s COPY command may not handle BOM correctly.

Method A: Open the CSV in VS Code and save as “UTF-8” (without BOM)

  • File → Save As → Select encoding “UTF-8”

Method B: Convert with PowerShell

# Convert from BOM-attached to BOM-free
$content = Get-Content -Path "titles_ja_bom.csv" -Encoding UTF8
$content | Set-Content -Path "titles_ja_nobom.csv" -Encoding UTF8NoBOM

Solution 2: Copy to the PostgreSQL Container and Import

# Copy file to container
docker cp .\sql\titles_ja.csv postgres:/tmp/titles_ja.csv

# Execute COPY inside the container (less prone to encoding issues)
docker exec postgres psql -U postgres -d dvdrental -c `
    "COPY film(title) FROM '/tmp/titles_ja.csv' WITH (FORMAT CSV, HEADER TRUE, ENCODING 'UTF8')"

Since it doesn’t go through the local filesystem, there’s no OS encoding conversion.

Solution 3: PowerShell Encoding Settings

# Add to the top of the script
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8NoBOM'

# Or set as environment variable
$env:PYTHONUTF8 = "1"  # When using Python scripts

PowerShell 7.x (pwsh) defaults to UTF-8, but Windows PowerShell 5.1 defaults to Shift-JIS.
How to check:

[System.Text.Encoding]::Default

If Shift-JIS or Code page 932 is displayed, settings need to be changed.

Solution 4: Explicitly Set Locale in Docker Compose

# compose.yml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: dvdrental
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      LANG: ja_JP.UTF-8
      LC_ALL: ja_JP.UTF-8
    volumes:
      - pgdata:/var/lib/postgresql/data

The default locale when creating a PostgreSQL database will be UTF-8.


Checking PostgreSQL Encoding

-- Check encoding of the connected DB
SELECT pg_encoding_to_char(encoding), datcollate FROM pg_database WHERE datname = current_database();

-- Check server-wide encoding
SHOW server_encoding;
SHOW client_encoding;

Confirm that server_encoding is UTF8.


How to Check CSV File Encoding

# Check the first bytes of a file with PowerShell
$bytes = [System.IO.File]::ReadAllBytes(".\titles_ja.csv") | Select-Object -First 4
$bytes | ForEach-Object { "0x{0:X2}" -f $_ }

# BOM-attached UTF-8: 0xEF 0xBB 0xBF ...
# BOM-free UTF-8: Data bytes come directly at the start

Fixing Garbled Data

For fixing data that has already been garbled:

-- Check garbled data
SELECT film_id, title FROM film WHERE title ~ '[^\x00-\x7F]' IS FALSE;

-- Overwrite with correct Japanese data (re-import from CSV)
TRUNCATE TABLE film_title_ja;  -- If using a working table

-- Or fix directly with UPDATE
UPDATE film SET title = 'Correct Japanese Title' WHERE film_id = 1;

Summary

ProblemCauseSolution
CSV import failsShift-JIS or UTF-8 with BOMRe-save as UTF-8 (without BOM)
PowerShell garbled charactersConsole encoding is CP932[Console]::OutputEncoding = UTF8
Can’t pass directly to containerPowerShell pipe converts encodingUse docker cp to copy file first
DB default is not UTF-8LANG not set in Docker ComposeExplicitly set LANG: ja_JP.UTF-8

In Windows development environments, there are many situations where the assumption “UTF-8 is obvious” doesn’t hold.
Developing the habit of checking encoding every time you handle CSV files can dramatically reduce wasted time.


Article Map for This Series

Building a DVD Rental End-User App Alongside the Admin Dashboard — Vue 3 + Spring Boot Architecture Overview

Feel free to send a message

Please send a message if you have any technical questions, feedback, or inquiries.