Tech Blog

How I Localized the dvdrental Sample Database into Japanese: Using SQL and CSV Together to Create Admin Screen Data

by y104
PostgreSQL SQL Localization

To make the DVD rental admin app comfortable to use in Japanese, I gradually localized the data displayed on screen rather than using the PostgreSQL sample DB dvdrental as-is.

What I wanted wasn’t a simple bulk replacement. I replaced actor names, customer names, staff names, addresses, city names, and film descriptions shown on the admin screen step by step to match the Japanese UI.

Additionally, since the amounts in dvdrental are sample data based on dollars, adjusting for a Japanese UI meant the currency needed to be converted to yen. Converting the amount columns from dollars to yen—not just translating strings—was also an important point in this work.

Therefore, I took a two-stage approach: using SQL CASE expressions for master data with few records and fixed translations, and exporting large volumes of people’s names, addresses, and film descriptions to CSV for translation, then importing back with \copy.

This article is for those looking for how to localize the PostgreSQL sample DB dvdrental into Japanese, how to translate PostgreSQL data using SQL and CSV, and how to replace dvdrental data with Japanese data for admin screens.

The PostgreSQL sample DB used as a base here can be found at:

PostgreSQL Sample Database

What You’ll Learn

  • Which columns were targeted for translation when localizing dvdrental
  • How to translate fixed master data using only SQL
  • How to export data requiring human translation to CSV and process it
  • How to change dvdrental amounts from dollar-based to yen
  • How to safely apply changes using \copy and temporary tables
  • How the localized data was reflected into the Docker full dataset

First Decision

The first decision was the policy: “translate only natural language shown on screen.”

The primary targets were:

  • Actor names
  • Customer names
  • Staff names
  • Addresses
  • District names
  • City names
  • Country names
  • Category names
  • Language names
  • Film titles
  • Film descriptions

Deciding this up front makes it easier to prioritize and localize the natural language you want to show on screen.

Additionally, amount columns were treated as localization targets separate from strings. film.rental_rate, film.replacement_cost, and payment.amount are dollar-based sample data, so rather than just displaying next to them, the policy was to convert to yen values before displaying on screen.

The Policy: Not Completing Everything with Just SQL

In practice, there were two types of translation targets:

  1. Those with few records where translations are mostly fixed
  2. Those with many records where you want to translate while checking context

The former is faster to write directly in SQL, while the latter is overwhelmingly easier to handle by exporting to CSV for translation.

So in practice, I divided it like this:

  • Category names, language names, and country names: updated with SQL CASE expressions
  • Actor names, customer names, addresses, city names, district names, film descriptions: exported to CSV for translation
  • Staff names had few records, so instead of separate CSV management, they were included as translation targets in sql/ja_localization_patch.sql

The overall flow:

flowchart TD
    A[Select translation target columns in dvdrental] --> B[Localize fixed master data with SQL]
    A --> C[Export large data to CSV]
    C --> D[Fill in Japanese in CSV]
    D --> E[\copy to temporary table]
    B --> F[Apply to local DB]
    E --> F
    F --> G[Verify on screen]
    G --> H[Update full dataset with pg_dump]
    H --> I[Reuse for Docker and AWS initialization]

First: Translate Fixed Master Data with SQL

For things like category names and language names with few records and stable translations, I used sql/ja_localization_stage1.sql.

It’s simple—just replacing English values with Japanese using CASE expressions.

update public.category
set name = case name
    when 'Action' then 'アクション'
    when 'Animation' then 'アニメーション'
    when 'Children' then '子ども向け'
    when 'Comedy' then 'コメディ'
    else name
end;

Language names were handled the same way, replacing English with 英語, Japanese with 日本語, etc.

This stage is closer to “defining names” than “translation,” so managing it in a SQL file rather than escaping to CSV is clearer.

Country Names: Separated into a Different SQL File

Country names have many records and the CASE statement gets long, so I separated them into sql/ja_localization_stage2_country.sql.

update public.country
set country = case country
    when 'Japan' then '日本'
    when 'United States' then 'アメリカ合衆国'
    when 'France' then 'フランス'
    else country
end;

Separating categories/languages from country names into different files gives better visibility and makes later reviews easier than cramming everything into one file.

High-Volume Data: Export to CSV for Translation

For data with many records like actor names, customer names, staff names, addresses, city names, and film descriptions, I didn’t write translations directly inside SQL.

The flow here was:

  1. Export target data with copy (...) to stdout with csv header
  2. Fill in Japanese in the empty translation columns of the CSV
  3. Save as UTF-8 CSV
  4. Import to a temporary table with \copy
  5. Update only rows with the translated column filled in

Actor Names, Customer Names, Staff Names: Handle with IDs

For example, actor names are exported to CSV via sql/export_actor_name_translation_targets.sql, outputting actor_id and English names.

copy (
    select a.actor_id,
           a.first_name as source_first_name,
           a.last_name as source_last_name,
           '' as translated_first_name,
           '' as translated_last_name
    from public.actor a
    order by a.actor_id
) to stdout with csv header;

With this setup, the translation work only requires filling in translated_first_name and translated_last_name.

When applying, sql/apply_actor_name_translation_from_csv.sql creates a temporary table and updates using actor_id as the key.

\copy tmp_actor_name_translation (actor_id, source_first_name, source_last_name, translated_first_name, translated_last_name) from 'c:/Users/y_104/git/dvd-rental-admin/sql/actor_name_translation_work_utf8.csv' with (format csv, header true, encoding 'UTF8');

update public.actor a
set first_name = case
                     when t.translated_first_name is not null and btrim(t.translated_first_name) <> '' then t.translated_first_name
                     else a.first_name
                 end,
    last_name = case
                    when t.translated_last_name is not null and btrim(t.translated_last_name) <> '' then t.translated_last_name
                    else a.last_name
                end
from tmp_actor_name_translation t
where a.actor_id = t.actor_id;

Columns left empty are not updated, so even a partially translated CSV can be applied safely.

Note that staff.first_name and staff.last_name follow the same approach. Since there were only 2 staff members, separate CSV files weren’t created like for actor/customer names, but they’re included as update targets in sql/ja_localization_patch.sql.

Addresses and City Names: Translate with Context

Since translating address strings alone can be dangerous, sql/export_address_translation_targets.sql and sql/export_city_translation_targets.sql include context columns.

The address CSV included:

  • address_id
  • source_address
  • source_address2
  • source_district
  • source_city
  • source_country
copy (
    select a.address_id,
           a.address as source_address,
           coalesce(a.address2, '') as source_address2,
           a.district as source_district,
           ci.city as source_city,
           co.country as source_country,
           '' as translated_address,
           '' as translated_address2,
           '' as translated_district
    from public.address a
    join public.city ci on ci.city_id = a.city_id
    join public.country co on co.country_id = ci.country_id
    order by a.address_id
) to stdout with csv header;

City names are also exported with country names. This makes it easier to handle cities with duplicate names or similar notations.

District names were handled with deduplicated translations rather than per-row in the address table, using source_district as the primary key for updates.

Film Descriptions: Split CSV for Translation

The highest volume is film.description. Working with one file was unwieldy, so the CSV was split.

In the sql directory, you’ll find split files like:

  • film_description_translation_chunk1_utf8.csv
  • film_description_translation_chunk2_utf8.csv
  • film_description_translation_chunk3_utf8.csv
  • ...
  • film_description_translation_chunk8_utf8.csv

Translation was done in chunks, then consolidated into film_description_translation_work_utf8.csv for final application.

The application side is very simple—just inserting film_id and translated descriptions into a temporary table and updating.

\copy tmp_film_description_translation (film_id, translated_description) from 'c:/Users/y_104/git/dvd-rental-admin/sql/film_description_translation_work_utf8.csv' with (format csv, header true, encoding 'UTF8');

update public.film f
set description = t.translated_description
from tmp_film_description_translation t
where f.film_id = t.film_id
  and t.translated_description is not null
  and btrim(t.translated_description) <> '';

Free-form text inevitably requires human review, so splitting into chunks from the start was the right call.

Amounts: Not Left as Dollar-Based

In this localization, handling currency was just as important as translating strings.

dvdrental is originally a dollar-based sample DB, so film.rental_rate, film.replacement_cost, payment.amount, etc. have dollar-based values. Just adding next to 4.99 or 19.99 on the admin screen looks unnatural.

Therefore, this repository uses src/main/resources/db/migration/postgresql/V4__convert_currency_to_jpy.sql to convert amount columns to yen.

alter table film
    alter column rental_rate type numeric(10,0) using ceil(rental_rate * 150),
    alter column replacement_cost type numeric(10,0) using ceil(replacement_cost * 150),
    alter column rental_rate set default 749,
    alter column replacement_cost set default 2999;

alter table payment
    alter column amount type numeric(10,0) using ceil(amount * 150);

Converting at 1 dollar = 150 yen with ceiling makes it possible to use the sample DB’s price feel naturally on a Japan-oriented admin screen. Adding the dollar-to-yen conversion alongside string localization makes the appearance of list screens, detail screens, and aggregate screens much more natural.

A Batch Localization Patch Was Also Prepared

When I wanted to check translation targets all together, I prepared sql/ja_localization_patch.sql.

This SQL creates a temporary table tmp_ja_translation_map to extract all English strings targeted for translation.

create temporary table tmp_ja_translation_map (
    table_name text not null,
    column_name text not null,
    source_text text not null,
    translated_text text,
    primary key (table_name, column_name, source_text)
);

It then updates categories and languages with initial translations and finally reflects them to the base tables.

This file served more as a foundation for reviewing “what’s targeted for translation” comprehensively, rather than as the center of final operations.

Precautions When Applying

During this localization, I was more conscious of “being able to revert without breaking” than of the translations themselves.

So I aligned these points:

  • Don’t write directly to the original tables; go through temporary tables
  • Don’t update if the translation column is empty
  • Standardize CSVs to UTF-8
  • For things with IDs, update by ID
  • For things needing context like addresses and cities, include helper columns

With this approach, updates are less likely to break even mid-translation, and redoing is easy.

Finally: Made the Localized DB into a Full Dataset

The localized data wasn’t just left in the local DB—it was ultimately reflected into docker/postgres/init/01-dvdrental-full.sql.

In this repository, Docker loads this full dataset SQL at startup, allowing the localized state to be used directly as the app’s initial data.

The flow:

  1. Organize translation targets based on dvdrental
  2. Localize using SQL and CSV
  3. Apply to local DB and verify
  4. Recreate the full dataset SQL with pg_dump
  5. Reuse the same data for Docker and AWS initialization

With this setup, localization doesn’t end as a one-off process—it can be reused for both the development environment and distributed data.

Summary

Localizing dvdrental wasn’t about forcing everything through one SQL file.

In practice, I divided it like this:

  • Categories, languages, country names: translated with SQL CASE expressions
  • Actor names, customer names, staff names, addresses, city names, district names, film descriptions: organized as translation targets
  • Amount columns: converted from dollar-based values to yen
  • When applying: safely updated using temporary tables and \copy
  • Finally: reused the localized DB as a full dataset

For those wanting to localize the PostgreSQL sample DB dvdrental into Japanese for an admin screen app, the approach of “fixed master data in SQL, high-volume data requiring human work in CSV” was very manageable.

For how the app itself was built, see this article:

Building a DVD Rental Admin App with Spring Boot + Thymeleaf Based on the PostgreSQL Sample DB dvdrental

For the configuration and approach when deploying to AWS, see this article:

Configuration, Operations, and Security When Deploying a Spring Boot + Thymeleaf + PostgreSQL Admin App to AWS ECS/Fargate + RDS

Feel free to send a message

Please send a message if you have any technical questions, feedback, or inquiries.