Integrating LLM Batch Processing (Ollama/OpenAI) into Spring Boot to Auto-Generate Film Tags

Introduction

In the DVD rental app, there was a requirement to add “mood tags” to films.

These are emotional tags that are finer-grained than categories, for example “Action,” “Heartwarming,” “Family-friendly,” and “Horror.”
With about 1,000 films, tagging by hand is not practical.

So we built a batch process that sends each film's title and description to an LLM to auto-generate the tags.

Overall Architecture

Spring Boot Batch Process
  ↓ Fetch untagged films from the film table
  ↓ Build a prompt from title + description
  ↓ Send it to the LLM API (Ollama or OpenAI)
  ↓ Extract tags from the response
  ↓ Save them to the taste_tags column of the film table

Choosing an LLM Provider

For local development we use Ollama (free, privacy-safe), and for production we use the OpenAI API.

Provider	Pros	Cons
Ollama (llama3)	Free, offline, privacy-safe	Lower accuracy than OpenAI, slower
OpenAI (GPT-4o-mini)	High accuracy, fast	API cost, requires internet access

Integrating the LLM Client into Spring Boot

Dependencies

<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

<!-- or Ollama -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

Configuration

# application.yml (when using Ollama)
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3

---
# application-prod.yml (when using OpenAI)
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini

Implementing Batch Processing

Tag Generation Service

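import java.util.Arrays;
import java.util.List;

import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service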
@Slf4j
public class FilmTagGenerationService {
    
    private final ChatClient chatClient;
    private final FilmTagMapper filmTagMapper;
    
    public FilmTagGenerationService(ChatClient.Builder builder, FilmTagMapper mapper) {
        this.chatClient = builder.build();
        this.filmTagMapper = mapper;
    }
    
    public void generateTagsForAllFilms(int batchSize) {
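        // Fetch films that have not been tagged yet
        List<FilmForTagging> films = filmTagMapper.findFilmsWithoutTags(batchSize);
        log.info("Films targeted for tag generation: {}", films.size());
        
        for (FilmForTagging film : films) {
            try {
                String[] tags = generateTags(film);
                filmTagMapper.updateTasteTags(film.getFilmId(), tags);
                log.info("Tags applied: {} → {}", film.getTitle(), Arrays.toString(tags));
                
                // Pause between calls as a simple guard against API rate limits
                Thread.sleep(500);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop the batch instead of swallowing it
                Thread.currentThread().interrupt();
                log.warn("Tag generation interrupted: filmId={}", film.getFilmId());
                return;
            } catch (Exception e) {
                // A failed film stays untagged, so the next run picks it up again
                log.error("Tag generation failed: filmId={}", film.getFilmId(), e);
            }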
        }
    }
    
    private String[] generateTags(FilmForTagging film) {
        String prompt = buildPrompt(film);
        
        String response = chatClient.prompt()
            .user(prompt)
            .call()
            .content();
        
        return parseTagsFromResponse(response);
    }
    
    private String buildPrompt(FilmForTagging film) {
        return String.format("""
            Generate up to 5 tags that describe the mood of the following film.
            Return the tags separated by commas, with no additional explanation.
            
            Title: %s
            Description: %s
            
            Example output format: Action,Thrilling,Family-friendly,Exhilarating,Battle
            """,
            film.getTitle(),
            film.getDescription()
        );
    }
    
    private String[] parseTagsFromResponse(String response) {
        return Arrays.stream(response.split("[,、]"))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .limit(5)
            .toArray(String[]::new);
    }
}

How to Trigger the Batch

Manual Execution Endpoint (called from the Admin screen)

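import java.util.Map;

import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/admin/api/batch")
@PreAuthorize("hasRole('ADMIN')")
public class BatchController {
    
    private final FilmTagGenerationService tagGenerationService;
    
    // Constructor injection; the final field must be assigned here
    public BatchController(FilmTagGenerationService tagGenerationService) {
        this.tagGenerationService = tagGenerationService;
    }
    
    @PostMapping("/generate-tags")
    public ResponseEntity<Map<String, Object>> generateTags(
            @RequestParam(defaultValue = "50") int batchSize) {
        
        // Runs synchronously: the response is sent only after the batch finishes
        tagGenerationService.generateTagsForAllFilms(batchSize);
        return ResponseEntity.ok(Map.of(
            "message", "Tag generation complete",
            // Requested upper bound; the service does not report how many films it tagged
            "requestedBatchSize", batchSize
        ));
    }
}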

Scheduled Execution (Nightly Batch)

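import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class TagGenerationScheduler {
    
    private final FilmTagGenerationService tagGenerationService;
    
    // Constructor injection; the final field must be assigned here
    public TagGenerationScheduler(FilmTagGenerationService tagGenerationService) {
        this.tagGenerationService = tagGenerationService;
    }
    
    @Scheduled(cron = "0 0 2 * * ?")  // Every day at 2:00 AM
    public void scheduledTagGeneration() {
        tagGenerationService.generateTagsForAllFilms(100);
    }
}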

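import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;

// @Scheduled methods only run once scheduling is enabled with @EnableScheduling;
// Spring Boot has no spring.task.scheduling.enabled property.
// SchedulingConfig is a placeholder class name.
@Configuration
@EnableScheduling
public class SchedulingConfig {
}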

Examples of Actually Generated Results

Film title	Generated tags
ACADEMY DINOSAUR	Educational, Adventure, Family-friendly, Heartwarming, Fun
ACE GOLDFINGER	Spy, Thrilling, Action, Mystery, Adult
AFFAIR PREJUDICE	Romance, Drama, Emotional, Deep, Adult

Tagging ~1,000 films finished in about 8 minutes (using Ollama llama3 in a local environment).

Sticking Points

① The LLM Returns Output Outside the Specified Format

The LLM sometimes replies with a preamble such as:

"Here are the generated tags: Action, Thrilling, Family-friendly"

Either extract only the tag portion with a regex, or pin down the output format with few-shot prompting.
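The regex route can be sketched as follows. This is a minimal illustration, not the code used in the app: the class name TagExtractor, the method extractTags, and the preamble rule (any comma-free text before the first colon) are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TagExtractor {

    // Assumption: a preamble is any text before the first colon that contains no comma,
    // e.g. "Here are the generated tags:". A bare tag list never matches this.
    private static final Pattern PREAMBLE = Pattern.compile("^[^,、\\n]*[:：]\\s*");

    static List<String> extractTags(String response) {
        if (response == null) {
            return List.of();
        }
        // Drop surrounding whitespace and a leading or trailing double quote
        String cleaned = response.strip().replaceAll("^\"|\"$", "");
        // Remove the preamble, if one is present
        cleaned = PREAMBLE.matcher(cleaned).replaceFirst("");
        return Arrays.stream(cleaned.split("[,、]"))
            .map(String::strip)
            .map(tag -> tag.replaceAll("[.。]+$", ""))  // strip a trailing full stop
            .filter(tag -> !tag.isEmpty())
            .limit(5)
            .collect(Collectors.toList());
    }
}
```

Calling TagExtractor.extractTags on the reply quoted above yields the three tags without the preamble. The rule is deliberately simple: a first tag that itself contains a colon would be cut off, so a stricter pattern may be needed for other models.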

// Few-shot prompt example
String fewShotPrompt = """
    Instruction: Return mood tags for the film, separated by commas.
    
    Example 1:
    Input: Die Hard (detective/action)
    Output: Action,Thrilling,Suspenseful,Adult,Exciting

    Example 2:
    Input: Toy Story (animated/children)
    Output: Family,Heartwarming,Adventure,Fun,Moving

    Input: %s (%s)
    Output:
    """.formatted(film.getTitle(), film.getDescription());

② Managing API Cost

The OpenAI API bills by the number of input and output tokens, so every call adds to the cost.
1,000 films × 1 call each comes to a few hundred yen at most (GPT-4o-mini), but the cost balloons if the same films are unintentionally sent again.
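The arithmetic behind that estimate can be written down as a small helper. This is a sketch under stated assumptions: the class TaggingCostEstimator is hypothetical, and the token counts and per-million-token rates are parameters you supply yourself, since none of them appear in this article.

```java
final class TaggingCostEstimator {

    /**
     * Estimated cost in USD of sending every film to the API.
     * All rates and token counts are caller-supplied assumptions.
     */
    static double estimateUsd(int films, int runs,
                              int inputTokensPerFilm, int outputTokensPerFilm,
                              double usdPerMillionInput, double usdPerMillionOutput) {
        // Each run makes one call per film
        double calls = (double) films * runs;
        double inputCost = calls * inputTokensPerFilm / 1_000_000 * usdPerMillionInput;
        double outputCost = calls * outputTokensPerFilm / 1_000_000 * usdPerMillionOutput;
        return inputCost + outputCost;
    }
}
```

For example, with placeholder values of 200 input tokens and 20 output tokens per film, at illustrative rates of 0.15 and 0.60 USD per million tokens, estimateUsd(1000, 1, 200, 20, 0.15, 0.60) comes to about 0.042 USD. Check the provider's current price list before relying on any figure. The runs parameter makes the risk visible: three accidental full reruns triple the total.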

// Skip films that already have tags
filmTagMapper.findFilmsWithoutTags(batchSize)
// → A query that fetches only films where taste_tags IS NULL
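To see why this selection keeps reruns cheap, here is a minimal in-memory sketch. The class IdempotentTaggingDemo and its map are hypothetical stand-ins for the film table and mapper, the LLM is replaced by a plain function, and the LIMIT clause mentioned in the comment is an assumption about how batchSize is applied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class IdempotentTaggingDemo {

    // Stand-in for the film table: title -> taste_tags (null means untagged)
    private final Map<String, String> tasteTagsByTitle = new LinkedHashMap<>();

    // Counts how often the (stubbed) LLM was called
    int llmCalls = 0;

    IdempotentTaggingDemo(List<String> titles) {
        for (String title : titles) {
            tasteTagsByTitle.put(title, null);
        }
    }

    // Mirrors a query like: WHERE taste_tags IS NULL LIMIT ? (the LIMIT is an assumption)
    List<String> findFilmsWithoutTags(int limit) {
        List<String> untagged = new ArrayList<>();
        for (Map.Entry<String, String> row : tasteTagsByTitle.entrySet()) {
            if (untagged.size() == limit) {
                break;
            }
            if (row.getValue() == null) {
                untagged.add(row.getKey());
            }
        }
        return untagged;
    }

    // Tags only the films selected above and returns how many were processed
    int runBatch(int batchSize, Function<String, String> llm) {
        List<String> pending = findFilmsWithoutTags(batchSize);
        for (String title : pending) {
            llmCalls++;
            tasteTagsByTitle.put(title, llm.apply(title));
        }
        return pending.size();
    }
}
```

Running runBatch twice over the same two films processes both on the first run and none on the second, so llmCalls stays at 2. A film whose call fails is simply left as null and picked up by the next run.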

Summary

Spring AI makes it easy to integrate an LLM client into Spring Boot.
Switching between Ollama (free) for local development and the OpenAI API for production can be handled with Spring profiles (@Profile, or profile-specific files such as application-prod.yml).
Pinning the output format with few-shot prompting in the tag generation prompt makes the results stable.
Have each batch target only unprocessed films so that it stays idempotent (the result is the same no matter how many times it runs).

Applying an LLM to data at a scale that manual work cannot handle lets you enrich content in a short time.

Article Map for This Series

→ Building a DVD Rental App for End Users: a Vue 3 + Spring Boot structure paired with the admin app, with an article map