How I Localized the dvdrental Sample Database into Japanese: Using SQL and CSV Together to Create Admin Screen Data
To make the DVD rental admin app comfortable to use in Japanese, I gradually localized the data displayed on screen rather than using the PostgreSQL sample DB dvdrental as-is.
What I wanted wasn’t a simple bulk replacement. I replaced actor names, customer names, staff names, addresses, city names, and film descriptions shown on the admin screen step by step to match the Japanese UI.
Additionally, since the amounts in dvdrental are sample data based on dollars, adjusting for a Japanese UI meant the currency needed to be converted to yen. Converting the amount columns from dollars to yen—not just translating strings—was also an important point in this work.
Therefore, I took a two-stage approach: using SQL CASE expressions for master data with few records and fixed translations, and exporting large volumes of people’s names, addresses, and film descriptions to CSV for translation, then importing back with \copy.
This article is for those looking for how to localize the PostgreSQL sample DB dvdrental into Japanese, how to translate PostgreSQL data using SQL and CSV, and how to replace dvdrental data with Japanese data for admin screens.
The PostgreSQL sample DB used as a base here can be found at:
What You’ll Learn
- Which columns were targeted for translation when localizing
dvdrental - How to translate fixed master data using only SQL
- How to export data requiring human translation to CSV and process it
- How to change
dvdrentalamounts from dollar-based to yen - How to safely apply changes using
\copyand temporary tables - How the localized data was reflected into the Docker full dataset
First Decision
The first decision was the policy: “translate only natural language shown on screen.”
The primary targets were:
- Actor names
- Customer names
- Staff names
- Addresses
- District names
- City names
- Country names
- Category names
- Language names
- Film titles
- Film descriptions
Deciding this up front makes it easier to prioritize and localize the natural language you want to show on screen.
Additionally, amount columns were treated as localization targets separate from strings. film.rental_rate, film.replacement_cost, and payment.amount are dollar-based sample data, so rather than just displaying 円 next to them, the policy was to convert to yen values before displaying on screen.
The Policy: Not Completing Everything with Just SQL
In practice, there were two types of translation targets:
- Those with few records where translations are mostly fixed
- Those with many records where you want to translate while checking context
The former is faster to write directly in SQL, while the latter is overwhelmingly easier to handle by exporting to CSV for translation.
So in practice, I divided it like this:
- Category names, language names, and country names: updated with SQL
CASEexpressions - Actor names, customer names, addresses, city names, district names, film descriptions: exported to CSV for translation
- Staff names had few records, so instead of separate CSV management, they were included as translation targets in
sql/ja_localization_patch.sql
The overall flow:
flowchart TD
A[Select translation target columns in dvdrental] --> B[Localize fixed master data with SQL]
A --> C[Export large data to CSV]
C --> D[Fill in Japanese in CSV]
D --> E[\copy to temporary table]
B --> F[Apply to local DB]
E --> F
F --> G[Verify on screen]
G --> H[Update full dataset with pg_dump]
H --> I[Reuse for Docker and AWS initialization]
First: Translate Fixed Master Data with SQL
For things like category names and language names with few records and stable translations, I used sql/ja_localization_stage1.sql.
It’s simple—just replacing English values with Japanese using CASE expressions.
update public.category
set name = case name
when 'Action' then 'アクション'
when 'Animation' then 'アニメーション'
when 'Children' then '子ども向け'
when 'Comedy' then 'コメディ'
else name
end;
Language names were handled the same way, replacing English with 英語, Japanese with 日本語, etc.
This stage is closer to “defining names” than “translation,” so managing it in a SQL file rather than escaping to CSV is clearer.
Country Names: Separated into a Different SQL File
Country names have many records and the CASE statement gets long, so I separated them into sql/ja_localization_stage2_country.sql.
update public.country
set country = case country
when 'Japan' then '日本'
when 'United States' then 'アメリカ合衆国'
when 'France' then 'フランス'
else country
end;
Separating categories/languages from country names into different files gives better visibility and makes later reviews easier than cramming everything into one file.
High-Volume Data: Export to CSV for Translation
For data with many records like actor names, customer names, staff names, addresses, city names, and film descriptions, I didn’t write translations directly inside SQL.
The flow here was:
- Export target data with
copy (...) to stdout with csv header - Fill in Japanese in the empty translation columns of the CSV
- Save as UTF-8 CSV
- Import to a temporary table with
\copy - Update only rows with the translated column filled in
Actor Names, Customer Names, Staff Names: Handle with IDs
For example, actor names are exported to CSV via sql/export_actor_name_translation_targets.sql, outputting actor_id and English names.
copy (
select a.actor_id,
a.first_name as source_first_name,
a.last_name as source_last_name,
'' as translated_first_name,
'' as translated_last_name
from public.actor a
order by a.actor_id
) to stdout with csv header;
With this setup, the translation work only requires filling in translated_first_name and translated_last_name.
When applying, sql/apply_actor_name_translation_from_csv.sql creates a temporary table and updates using actor_id as the key.
\copy tmp_actor_name_translation (actor_id, source_first_name, source_last_name, translated_first_name, translated_last_name) from 'c:/Users/y_104/git/dvd-rental-admin/sql/actor_name_translation_work_utf8.csv' with (format csv, header true, encoding 'UTF8');
update public.actor a
set first_name = case
when t.translated_first_name is not null and btrim(t.translated_first_name) <> '' then t.translated_first_name
else a.first_name
end,
last_name = case
when t.translated_last_name is not null and btrim(t.translated_last_name) <> '' then t.translated_last_name
else a.last_name
end
from tmp_actor_name_translation t
where a.actor_id = t.actor_id;
Columns left empty are not updated, so even a partially translated CSV can be applied safely.
Note that staff.first_name and staff.last_name follow the same approach. Since there were only 2 staff members, separate CSV files weren’t created like for actor/customer names, but they’re included as update targets in sql/ja_localization_patch.sql.
Addresses and City Names: Translate with Context
Since translating address strings alone can be dangerous, sql/export_address_translation_targets.sql and sql/export_city_translation_targets.sql include context columns.
The address CSV included:
address_idsource_addresssource_address2source_districtsource_citysource_country
copy (
select a.address_id,
a.address as source_address,
coalesce(a.address2, '') as source_address2,
a.district as source_district,
ci.city as source_city,
co.country as source_country,
'' as translated_address,
'' as translated_address2,
'' as translated_district
from public.address a
join public.city ci on ci.city_id = a.city_id
join public.country co on co.country_id = ci.country_id
order by a.address_id
) to stdout with csv header;
City names are also exported with country names. This makes it easier to handle cities with duplicate names or similar notations.
District names were handled with deduplicated translations rather than per-row in the address table, using source_district as the primary key for updates.
Film Descriptions: Split CSV for Translation
The highest volume is film.description. Working with one file was unwieldy, so the CSV was split.
In the sql directory, you’ll find split files like:
film_description_translation_chunk1_utf8.csvfilm_description_translation_chunk2_utf8.csvfilm_description_translation_chunk3_utf8.csv...film_description_translation_chunk8_utf8.csv
Translation was done in chunks, then consolidated into film_description_translation_work_utf8.csv for final application.
The application side is very simple—just inserting film_id and translated descriptions into a temporary table and updating.
\copy tmp_film_description_translation (film_id, translated_description) from 'c:/Users/y_104/git/dvd-rental-admin/sql/film_description_translation_work_utf8.csv' with (format csv, header true, encoding 'UTF8');
update public.film f
set description = t.translated_description
from tmp_film_description_translation t
where f.film_id = t.film_id
and t.translated_description is not null
and btrim(t.translated_description) <> '';
Free-form text inevitably requires human review, so splitting into chunks from the start was the right call.
Amounts: Not Left as Dollar-Based
In this localization, handling currency was just as important as translating strings.
dvdrental is originally a dollar-based sample DB, so film.rental_rate, film.replacement_cost, payment.amount, etc. have dollar-based values. Just adding 円 next to 4.99 or 19.99 on the admin screen looks unnatural.
Therefore, this repository uses src/main/resources/db/migration/postgresql/V4__convert_currency_to_jpy.sql to convert amount columns to yen.
alter table film
alter column rental_rate type numeric(10,0) using ceil(rental_rate * 150),
alter column replacement_cost type numeric(10,0) using ceil(replacement_cost * 150),
alter column rental_rate set default 749,
alter column replacement_cost set default 2999;
alter table payment
alter column amount type numeric(10,0) using ceil(amount * 150);
Converting at 1 dollar = 150 yen with ceiling makes it possible to use the sample DB’s price feel naturally on a Japan-oriented admin screen. Adding the dollar-to-yen conversion alongside string localization makes the appearance of list screens, detail screens, and aggregate screens much more natural.
A Batch Localization Patch Was Also Prepared
When I wanted to check translation targets all together, I prepared sql/ja_localization_patch.sql.
This SQL creates a temporary table tmp_ja_translation_map to extract all English strings targeted for translation.
create temporary table tmp_ja_translation_map (
table_name text not null,
column_name text not null,
source_text text not null,
translated_text text,
primary key (table_name, column_name, source_text)
);
It then updates categories and languages with initial translations and finally reflects them to the base tables.
This file served more as a foundation for reviewing “what’s targeted for translation” comprehensively, rather than as the center of final operations.
Precautions When Applying
During this localization, I was more conscious of “being able to revert without breaking” than of the translations themselves.
So I aligned these points:
- Don’t write directly to the original tables; go through temporary tables
- Don’t update if the translation column is empty
- Standardize CSVs to UTF-8
- For things with IDs, update by ID
- For things needing context like addresses and cities, include helper columns
With this approach, updates are less likely to break even mid-translation, and redoing is easy.
Finally: Made the Localized DB into a Full Dataset
The localized data wasn’t just left in the local DB—it was ultimately reflected into docker/postgres/init/01-dvdrental-full.sql.
In this repository, Docker loads this full dataset SQL at startup, allowing the localized state to be used directly as the app’s initial data.
The flow:
- Organize translation targets based on
dvdrental - Localize using SQL and CSV
- Apply to local DB and verify
- Recreate the full dataset SQL with
pg_dump - Reuse the same data for Docker and AWS initialization
With this setup, localization doesn’t end as a one-off process—it can be reused for both the development environment and distributed data.
Summary
Localizing dvdrental wasn’t about forcing everything through one SQL file.
In practice, I divided it like this:
- Categories, languages, country names: translated with SQL
CASEexpressions - Actor names, customer names, staff names, addresses, city names, district names, film descriptions: organized as translation targets
- Amount columns: converted from dollar-based values to yen
- When applying: safely updated using temporary tables and
\copy - Finally: reused the localized DB as a full dataset
For those wanting to localize the PostgreSQL sample DB dvdrental into Japanese for an admin screen app, the approach of “fixed master data in SQL, high-volume data requiring human work in CSV” was very manageable.
For how the app itself was built, see this article:
For the configuration and approach when deploying to AWS, see this article: