Llama3.1 軽量版 8B の日本語能力を高める方法 (Dify の DSL あります)

ローカル LLM で満足のいくレベルの日本語を使うことは大きなチャレンジであり、研究機関や企業、高校生などによって、日々研究開発が行われています。海外の大手企業などが新たな LLM をリリースするたびに、日本語に対応していると正式に謳っているか、実際に使い物になるか、等という情報が LLM 界隈に行き交い、様々なテストが行われます (X とかの SNS をやってないので想像含む)。同じ質問を投げかけてどのような回答があるか、日本語は正しいか、英語や中国語が混じっていないか、回答は正しいか、内容は充実しているか、というあたりはブログなどの結果を見ると判断できますが、実際に使って思うのは、文法や言葉遣いは二の次で、本当に欲しいのは内容の充実度とその応答速度である、というところです。また、長文を翻訳させたいとか、LLM を切り替えるときの読み込み時間がイヤということもあり、ボクがここしばらくチャットで使っているローカル LLM は、llama3.1:8b-instruct-fp16 一本です。実サイズ 16GB なので、32GB しかメモリがない Mac でコンテキスト長を 3万トークンにしても高速で動いてくれます。

Contents

1 日本語能力の高い LLM を調べる
2 結論、英語で推論してもらえば良い
3 紹介する方法の使い方
4 日本語能力を高めたチャットボットの紹介
5 まとめ
6 (蛇足) センシティブな情報と情報発信

日本語能力の高い LLM を調べる

さて、LLM の日本語能力を機械的に多角的に評価され、信頼性が高そうなサイトに「Nejumi LLMリーダーボード3」さん (リンクは下記) があります。更新頻度は高く、商用/オープン・パラメータ数・リリースのタイミング・instruct/chat/text 等のバージョン違いを含めた非常に多くの LLM を評価されているようで、細かい内容まではわかりませんが、とにかく信頼してよさそうな情報量です。

リンク: https://wandb.ai/wandb-japan/llm-leaderboard3/reports/Nejumi-LLM-3–Vmlldzo3OTg2NjM2?accessToken=wpnwc9whr96pxm40dfe4k3xq513f9jc4yhj7q6pnvj4jtayoefbc77qhzbsrztgz

こちらの Nejumi さんでの Llama3.1 の順位はというと、2024/09/28 現在 8位で、オープンなモデルでは Alibaba 社の Qwen 2.5 に次いで 2番目です。すばらしいですね。ただし、パラメータ数は Llama3.1 最上位の 405B なので、ボクの持っている 32GB RAM の Mac では動かせません。8B モデルの順位はというと、次のページの真ん中よりちょっと上あたりにやっと登場。他の選択肢の多さを考えたら選ぶ理由は無い、というレベルです。

でも、個人の感想ですが、実際に使っているとそんなに大きな問題は感じません。元々学習に使ったソースの質が高かったのでしょう、405B を 8B に蒸留してもある程度の質が高い状態で保たれていると感じます。Dify でコンテキスト長を 32,768 トークンにしてもトータルで 22GB に収まるため、32GB RAM でも処理は 100% GPU で行えます (つまり速い。14TPS 以上)。問題となるのは単純な知識の量ですので、必要なら英語で質問すればより多い知識から回答が得られます。そもそも対応言語に日本語が無いんですから、日本語でチャットができても、日本語での知識は乏しくても仕方ないんです。

HuggingFace: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

結論、英語で推論してもらえば良い

前置きが長くなりました。というわけで、この記事で紹介するのは、Llama3.1 に英語で推論して日本語で回答してもらう方法です。知識は英語で蓄えているのですから、知識を引き出すところだけ英語にしてもらえば、日本語の回答の精度や内容の質が高まるはずです。他の日本語対応を正式には謳っていない LLM でも同様の手法は有効だと思います。下記する方法を試したところ、プロトタイプ二つを経て、最終的には極めて簡単なプロンプトでほぼ期待したような結果が得られるようになりました。むしろ、20B 以下の少ないパラメータ数の LLM の場合、多言語対応だと浅く広くの知識にならざるを得ないでしょうから、「日本語での会話」がそれなりにでき、知識は英語やらフランス語やらの限られた言語で多く・深く収集された LLM の方が利用価値は高いと思います。

紹介する方法の使い方

Mac で Dify と Ollama で作ったので、同環境でしかテストしていません。ただ、最適化バージョンと、見える化バージョンは、実質システムプロンプトのみなので、LM Studio 等の System Prompt を設定できる AI ツールを使っても簡単に利用できると思います。Ollama 単体でも template を書き換えるか、毎回質問するときに記入すれば同様のことは実現できるかもしれません。

Dify と Ollama を別々の Mac で動かすローカル LLM 環境

以下に Dify 用 DSL と System プロンプトを貼っておきます。Dify なら適当な名前.ymlとしてファイルに保存してから「DSL をインポート」で読み込んでください。LM Studio 等のツールで試すなら、DSL の後に貼ってある System prompt の内容をコピペしてください。LLM にはモデルプロバイダ Ollama でダウンロードした llama3.1:8b-instruct-fp16 を使用しています。Dify で別のものを使用する際には適宜変更してください (使用する LLM によっては期待した効果は得られない可能性があります)。

効果を見るには、System prompt を何も指定していない (デフォルト) の状態と、最適化したものを使った場合とで出力を比較してみてください。日本人なら当然知っているであろうこと (「漫画家鳥山明の代表作は」等) も日本語で聞くとハルシネーションを起こしがちですが、英語で推論されると正しい答えが得られるケースが多いです。また、海外から日本を見た第三者的解釈による回答が得られやすいのも良い側面だと思います。対して難点としては、英訳・和訳の処理が挟まるため微妙なニュアンスが変わってしまったり、固有名詞が間違った漢字やローマ時表記になることがあります。

まずは評価を簡単にしたかったので、回答内容に大きなブレや遊びが出ないように、Temperature と Top_P のそれぞれを 0.2 にしています。もっと厳密に評価・比較する場合は数値を下げたり、実際に運用フェーズで多様な生成を行いたいという場合には 1に近い値を採用するなど、目的に応じて工夫して使ってください。

また、Size of context window は 32GB RAM Mac で最大になるように32768にしてしまっています。GPU 使用率がマックスにならない場合はこの値が大きすぎる可能性があるので、チェックを外すなり小さな値にするなり、使用する LLM やご自身の RAM サイズに合わせて調整してください。参考:

Ollama 単体では速い LLM が、なぜか Dify や Continue から使うと遅い、という時の解決方法

日本語能力を高めたチャットボットの紹介

適当な名前.ymlとして保存した DSL を Dify の「DSL ファイルをインポート」でインポートすると現れるチャットボット

最適化バージョン

最終バージョンです。日本語で質問すると、内部的に英語で推論し、日本語で回答してくれます。一行のシステムプロンプトのみなので、レスポンスタイムに影響はありません。

★ Dify 用 DSL (クリックして表示する) ★:

app:
  description: Llama3.1 の持つ豊富な英語の知識を日本語で回答してくれる。日本語の固有名詞の変換が苦手なところは、日本語ペラペラなアメリカ人インテリも漢字はニガテ、みたいでかわいい
  icon: male-student
  icon_background: '#FFE4E8'
  mode: advanced-chat
  name: Llama3.1 最適化 (脳内処理バージョン)
  use_icon_as_answer_icon: true
kind: app
version: 0.1.1
workflow:
  conversation_variables: []
  environment_variables: []
  features:
    file_upload:
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: true
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        sourceType: start
        targetType: llm
      id: 1727272665783-llm
      source: '1727272665783'
      sourceHandle: source
      target: llm
      targetHandle: target
      type: custom
    - data:
        sourceType: llm
        targetType: answer
      id: llm-answer
      source: llm
      sourceHandle: source
      target: answer
      targetHandle: target
      type: custom
    nodes:
    - data:
        desc: ''
        selected: false
        title: 開始
        type: start
        variables: []
      height: 54
      id: '1727272665783'
      position:
        x: 80
        y: 282
      positionAbsolute:
        x: 80
        y: 282
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        memory:
          role_prefix:
            assistant: ''
            user: ''
          window:
            enabled: false
            size: 10
        model:
          completion_params:
            keep_alive: 30m
            num_ctx: 32768
            temperature: 0.2
            top_p: 0.2
          mode: chat
          name: llama3.1:8b-instruct-fp16
          provider: ollama
        prompt_template:
        - id: d80ef4de-35f3-4106-87af-ff57023b2649
          role: system
          text: Infer question in English, generate rich answer, and output in the
            language used to ask.
        selected: true
        title: LLM
        type: llm
        variables: []
        vision:
          enabled: false
      height: 98
      id: llm
      position:
        x: 379
        y: 282
      positionAbsolute:
        x: 379
        y: 282
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        answer: '{{#llm.text#}}'
        desc: ''
        selected: false
        title: 回答
        type: answer
        variables: []
      height: 107
      id: answer
      position:
        x: 680
        y: 282
      positionAbsolute:
        x: 680
        y: 282
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: -111
      y: 197
      zoom: 1

System prompt (LM Studio 等で使用する場合はこちらをどうぞ):

Infer question in English, generate rich answer, and output in the language used to ask.

見える化バージョン

質問の英訳、英語での推論、最終的な和訳、の全てが表示されるバージョンです。それぞれの処理は一つの LLM ブロックで実施しています。上の最終バージョンでは本当に英語で推論したのかわかりませんが、こちらは途中経過もはっきり見て取れます。英語の勉強にも使えるかもしれません。

★ Dify 用 DSL (クリックして表示する) ★:

app:
  description: Llama3.1 が、質問を英訳し、推論し、日本語で返す、全てのプロセスを透明化したバージョン。冗長だが、英語の知識を使って回答していることがわかる。英語の勉強にもなるかも？
  icon: two_men_holding_hands
  icon_background: '#E4FBCC'
  mode: advanced-chat
  name: Llama3.1 英語で推論 (見える化バージョン)
  use_icon_as_answer_icon: true
kind: app
version: 0.1.1
workflow:
  conversation_variables: []
  environment_variables: []
  features:
    file_upload:
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: true
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        sourceType: start
        targetType: llm
      id: 1727270833994-llm
      source: '1727270833994'
      sourceHandle: source
      target: llm
      targetHandle: target
      type: custom
    - data:
        sourceType: llm
        targetType: answer
      id: llm-answer
      source: llm
      sourceHandle: source
      target: answer
      targetHandle: target
      type: custom
    nodes:
    - data:
        desc: ''
        selected: false
        title: 開始
        type: start
        variables: []
      height: 54
      id: '1727270833994'
      position:
        x: 80
        y: 282
      positionAbsolute:
        x: 80
        y: 282
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        memory:
          role_prefix:
            assistant: ''
            user: ''
          window:
            enabled: false
            size: 10
        model:
          completion_params:
            keep_alive: 30m
            num_ctx: 32768
            temperature: 0.2
            top_p: 0.2
          mode: chat
          name: llama3.1:8b-instruct-fp16
          provider: ollama
        prompt_template:
        - id: 342e3642-d8b5-42c8-b003-7816a8ec7f3a
          role: system
          text: 'You are a skilled AI translator. Since your knowledge is best in
            English, translate any question into English for inference and generate
            answer. Follow the steps described below.


            ### Steps:

            1. You translate {{#sys.query#}}directly into English. Try maintaining
            the original format without omitting or adding any information.

            2. Generate response in English.

            3. Translate the response literary back into the language originally used
            by the user and output.'
        selected: true
        title: LLM
        type: llm
        variables: []
        vision:
          enabled: false
      height: 98
      id: llm
      position:
        x: 381
        y: 282
      positionAbsolute:
        x: 381
        y: 282
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        answer: '{{#llm.text#}}'
        desc: ''
        selected: false
        title: 回答
        type: answer
        variables: []
      height: 107
      id: answer
      position:
        x: 680
        y: 282
      positionAbsolute:
        x: 680
        y: 282
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: -205
      y: 134
      zoom: 1

System prompt (LM Studio 等で使用する場合はこちらをどうぞ):

You are a skilled AI translator. Since your knowledge is best in English, translate any question into English for inference and generate answer. Follow the steps described below.

### Steps:
1. You translate the query directly into English. Try maintaining the original format without omitting or adding any information.
2. Generate response in English.
3. Translate the response literary back into the language originally used by the user and output.

ステップバイステップバージョン

質問の英訳、英語での推論、最終的な和訳、のそれぞれを個別の LLM ブロックで実行しています。最終的にユーザに返ってくるのは和訳されたものだけなので余計な情報は含まれませんが、ブロック毎にどういう処理をしたのかが調べられるのでデバッグ向きです (最初に作った PoC バージョン)。回答の質は高い傾向があります。これが最終バージョンでは無いのは、3回LLM が動くので単純に時間がかかるからです。Dify のフローを利用しているため、こちらに限っては LM Studio にシステムプロンプトに指示を投げるという使い方では実現できません。

★ Dify 用 DSL (クリックして表示する) ★:

app:
  description: 質問の英訳、推論、和訳、と 3つのパートそれぞれに Llama3.1 を使用したバージョン。無駄に時間がかかる、PoC 版
  icon: family
  icon_background: '#FEF7C3'
  mode: advanced-chat
  name: Llama3.1 開発途上 (3倍労力バージョン)
  use_icon_as_answer_icon: true
kind: app
version: 0.1.1
workflow:
  conversation_variables: []
  environment_variables: []
  features:
    file_upload:
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: true
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        isInIteration: false
        sourceType: start
        targetType: llm
      id: 1726927256338-source-1727251929525-target
      source: '1726927256338'
      sourceHandle: source
      target: '1727251929525'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: answer
      id: 1727252264986-source-answer-target
      source: '1727252264986'
      sourceHandle: source
      target: answer
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: llm
      id: 1727251929525-source-1727252519406-target
      source: '1727251929525'
      sourceHandle: source
      target: '1727252519406'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: llm
      id: 1727252519406-source-1727252264986-target
      source: '1727252519406'
      sourceHandle: source
      target: '1727252264986'
      targetHandle: target
      type: custom
      zIndex: 0
    nodes:
    - data:
        desc: ''
        selected: false
        title: 開始
        type: start
        variables: []
      height: 54
      id: '1726927256338'
      position:
        x: 80
        y: 282
      positionAbsolute:
        x: 80
        y: 282
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        answer: '{{#1727252264986.text#}}'
        desc: ''
        selected: false
        title: 回答
        type: answer
        variables: []
      height: 107
      id: answer
      position:
        x: 1278.680492089227
        y: 282
      positionAbsolute:
        x: 1278.680492089227
        y: 282
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: true
          variable_selector:
          - sys
          - query
        desc: ''
        memory:
          query_prompt_template: ''
          role_prefix:
            assistant: ''
            user: ''
          window:
            enabled: false
            size: 50
        model:
          completion_params:
            keep_alive: 30m
            num_ctx: 32768
            temperature: 0.2
            top_p: 0.2
          mode: chat
          name: llama3.1:8b-instruct-fp16
          provider: ollama
        prompt_template:
        - id: 7e4cffec-808f-4f8b-968d-945955648c2b
          role: system
          text: You are a skilled translator in English. You translate {{#sys.query#}}directly
            into English so the entire output can be used as a replacement of the
            original text. Based on the content, maintaining the original format without
            omitting or adding any information.
        - id: d152eb91-5a38-4d31-97d9-c57c4b18ed51
          role: user
          text: '{{#context#}}'
        selected: false
        title: Translation to English
        type: llm
        variables: []
        vision:
          enabled: false
      height: 98
      id: '1727251929525'
      position:
        x: 380
        y: 275.68049208922713
      positionAbsolute:
        x: 380
        y: 275.68049208922713
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        memory:
          query_prompt_template: ''
          role_prefix:
            assistant: ''
            user: ''
          window:
            enabled: false
            size: 50
        model:
          completion_params:
            keep_alive: 30m
            num_ctx: 32768
            temperature: 0.2
            top_p: 0.2
          mode: chat
          name: llama3.1:8b-instruct-fp16
          provider: ollama
        prompt_template:
        - id: 9e13c0fb-e680-48da-8270-2d8ed2f8b676
          role: system
          text: You are a skilled translator in Japanese. You translate {{#1727252519406.text#}}directly
            into Japanese so the entire output can be used as a replacement of the
            original text. Based on the content, maintaining the original format without
            omitting or adding any information.
        - id: be046c8d-1d48-47f4-91d0-1c6bb7fc3909
          role: user
          text: '{{#1727252519406.text#}}'
        selected: true
        title: translation to japanese
        type: llm
        variables: []
        vision:
          enabled: false
      height: 98
      id: '1727252264986'
      position:
        x: 980
        y: 282
      positionAbsolute:
        x: 980
        y: 282
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        memory:
          query_prompt_template: ''
          role_prefix:
            assistant: ''
            user: ''
          window:
            enabled: false
            size: 50
        model:
          completion_params:
            keep_alive: 30m
            num_ctx: 32768
            temperature: 0.2
            top_p: 0.2
          mode: chat
          name: llama3.1:8b-instruct-fp16
          provider: ollama
        prompt_template:
        - id: 5a722de8-e6f9-48df-81cb-86cfa183f447
          role: system
          text: Generate rich answer in English.
        - id: a3587e3e-84f3-41c9-a45d-ea5fb8392c73
          role: user
          text: '{{#1727251929525.text#}}'
        selected: false
        title: inference and generation
        type: llm
        variables: []
        vision:
          enabled: false
      height: 98
      id: '1727252519406'
      position:
        x: 684
        y: 275.68049208922713
      positionAbsolute:
        x: 684
        y: 275.68049208922713
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: -270.21689933279276
      y: 120.80412637177352
      zoom: 0.8083334534373365

まとめ

プロンプトには改善の余地はあるでしょうし、対象とする LLM によって最適な方法も変わると思います。ただいずれにせよ、英語の情報が豊富な代わりに日本語で学習した情報が少ないという LLM には有効な手法だと思います。すぐに試せるはずなので、具体的な実行例は載せませんでした。そのまま使わなくても、アイディアは流用できるでしょう。ぜひお気に入りの LLM の知識を最大限に活用できる方法を生み出してください。そして、ぜひボクや世の中に共有してください。

(蛇足) センシティブな情報と情報発信

アメリカの大手企業が作った LLM である Llama (Meta 社) や Phi (Microsoft 社)、Gemma (Google 社) は、ローカルにダウンロードして利用できるオープンソース・オープンウェイトではありますが、それぞれの企業のコンプライアンスポリシーに従ってセンシティブな情報は回答してくれません。つまり、エロいこと、差別的なこと、暴力的なこと、犯罪に関わること、等を答えさせようとしても、ことごとく、かたくなに、徹底的に拒否されてしまいます。ところが、中国産の LLM は大手がリリースしたものであっても制限がゆるいようです。特にパラメータ数の小さいモデルほど、質問の仕方によってはかなりの部分まで回答してくれます。例えば Nejumi さんのリーダーボードで 20位以内に入っている Qwen 2.5 14B Instruct はかなり寛容な感じです。

これは、「危ないことを教えてくれるぜ、えへへ」ということではなく、どのような LLM を使ってどのような文章 (絵や映像、音声も同じです) が得られたとしても、それを使用して外部に何かを発信する前には注意が必要、ということです。注意すべきは LLM 自体のライセンスだけではありません。文法、誤字脱字、ファクトチェックも重要ですが、誰かを傷つけたり、権利を侵害するようなことが無いよう、十分気をつけましょう。

本ブログでは LLM が生成した文章をそのまま注釈無しで使用しないようにしていますが、何か不備がございました、ご指摘いただけると幸いです。

また、気に入った記事には文末のイイネボタンをクリックしていただけるとうれしいです。

Image by Stable Diffusion

アメリカ人大学教授が寿司の作り方を教えている授業の風景を想定してお願いしてみました。

Date:
2024年9月29日 0:21:23

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
white male american college professor in formal suit teaching how to make sushi

Exclude from Image:

Seed:
1340386485

Steps:
20

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & GPU

日本語能力の高い LLM を調べる

結論、英語で推論してもらえば良い

紹介する方法の使い方

日本語能力を高めたチャットボットの紹介

最適化バージョン

見える化バージョン

ステップバイステップバージョン

まとめ

(蛇足) センシティブな情報と情報発信

コメントを残す コメントをキャンセル

コメントを残すコメントをキャンセル