In general, the response matches that shape, but this isn't guaranteed. We therefore need to be a little defensive and validate the model's output before using it. If a response fails validation, we send it to an error collection. In this sample, those values are simply left there. For a production pipeline, you might want to give the LLM a second attempt: run the error collection through RunInference again, then use Flatten to merge the new responses with the results collection. Because Beam pipelines are directed acyclic graphs (DAGs), this retry can't be written as a loop. It has to be an additional step in the graph.

We now take the results collection and process the LLM output. To process the results of RunInference, we create a new DoFn, `SentimentAnalysis`, and a function, `extract_model_reply`. The DoFn receives the objects of type `PredictionResult` that RunInference returns.

It's worth spending a few minutes on why `extract_model_reply()` is needed. Because the model is self-hosted, we can't guarantee that the generated text will be valid JSON, so we need to run a couple of checks before parsing it. One benefit of using the Gemini API is that it includes a feature, known as constrained decoding, that ensures the output is always JSON.

We then use these functions in the pipeline. Calling `with_outputs` makes multiple collections accessible from `filtered_results`. The main collection holds the sentiments and summaries for positive and neutral reviews, while `error` contains any LLM responses that couldn't be parsed. You can send these collections to sinks such as BigQuery with a write transform. This example doesn't demonstrate that step. However, the negative collection is one we want to do more with inside this pipeline.

Making sure customers are happy is critical for retention. We've used a light-hearted example with our pineapple-on-pizza debate, but direct interactions with a customer should always aim for empathy and a positive response from every part of an organization.
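
The checks described above can be sketched in plain Python. This is a minimal sketch, not the original `extract_model_reply()`. It assumes the model was asked for a JSON object with a `sentiment` key (an integer such as -1, 0, or 1) and a `summary` key. The `route_reply` helper and its `main`/`negative`/`error` tags are hypothetical names, chosen to mirror the collections described above.

```python
import json
from typing import Any, Dict, Tuple


def extract_model_reply(text: str) -> Dict[str, Any]:
    """Pull a JSON object out of free-form model output and validate it.

    Raises ValueError if no usable object with the expected keys is found.
    """
    # Self-hosted models may wrap the JSON in extra prose or markdown fences,
    # so parse only the span between the first "{" and the last "}".
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    try:
        reply = json.loads(text[start : end + 1])
    except json.JSONDecodeError as err:
        raise ValueError(f"unparsable JSON: {err}") from err
    # Check that the object has the shape the prompt asked for (assumed keys).
    if "sentiment" not in reply or "summary" not in reply:
        raise ValueError("missing 'sentiment' or 'summary' key")
    sentiment = reply["sentiment"]
    if isinstance(sentiment, bool) or not isinstance(sentiment, int):
        raise ValueError("'sentiment' must be an integer")
    return reply


def route_reply(text: str) -> Tuple[str, Any]:
    """Return (tag, payload) where tag is 'error', 'negative', or 'main'."""
    try:
        reply = extract_model_reply(text)
    except ValueError:
        # Unparsable or malformed responses go to the error collection as-is.
        return "error", text
    if reply["sentiment"] < 0:
        return "negative", reply
    # Positive and neutral reviews stay in the main collection.
    return "main", reply
```

Inside a Beam DoFn, the non-main tags would typically be emitted with `beam.pvalue.TaggedOutput(tag, payload)`. Calling `beam.ParDo(...).with_outputs('negative', 'error', main='main')` would then expose them as separate collections.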

At this stage, we pass the chat on to one of our trained support representatives. We can still check whether the LLM can help that person reduce the time to resolution.

For this step, we call the model and ask it to draft a response. The code again uses the Gemma 2B model for this call.

Prompt-creation code is usually wrapped in a DoFn, but a simple lambda in the pipeline code itself also works. Here we generate a prompt that contains the original chat message, which was extracted in the `SentimentAnalysis` function.

For local runs and testing, simple print statements are enough to inspect the outputs of the various PCollections. In real usage, these outputs would be sent to sinks such as Pub/Sub and BigQuery.

Let's see how the model does with the previous JSON message.

Step 1: Sentiment analysis and summarization

{"sentiment": -1, "summary": "User 221 is very unhappy about the presence of pineapple on pizza."}

The responses that the 2B model generated aren't bad. The sentiment is correct. The summary is more subjective, so whether it is good enough depends on how this information is used downstream.

Step 2: Generated response
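
The prompt-creation step described in this section can be sketched as a plain function. The instruction wording, the `RESPONSE_INSTRUCTIONS` constant, and the `chat_message` field name are hypothetical, because the original prompt isn't reproduced here.

```python
from typing import Any, Dict

# Hypothetical instructions; the original prompt wording is not shown here.
RESPONSE_INSTRUCTIONS = (
    "You are a customer support assistant for a food chain. "
    "Draft a short, empathetic reply to the customer in the chat below. "
    "A human support representative will review it before it is sent."
)


def build_response_prompt(record: Dict[str, Any]) -> str:
    """Build a reply-drafting prompt from a negative-sentiment record.

    Assumes the original chat text is stored under the hypothetical
    'chat_message' key by the sentiment-analysis step.
    """
    chat = record["chat_message"].strip()
    return f"{RESPONSE_INSTRUCTIONS}\n\nChat:\n{chat}\n\nReply:"
```

Because it is an ordinary callable, it could be applied to the negative collection with `beam.Map(build_response_prompt)`. That is the lightweight alternative to a dedicated DoFn.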

Author: Okumura, Google commentary writer
Published: November 6, 2024
Category: Google

The Potential of Streaming ML with Gemma 2

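In August 2024, Google announced Gemma 2, the latest version of its lightweight yet state-of-the-art open models. Gemma 2 is built with the same research and technology as the Gemini models. This large language model (LLM) is remarkably versatile and can be integrated into business processes in many ways. In this article, we use Gemma to assess the sentiment of a conversation and summarize its content. For difficult conversations, we also use it to generate replies that a human can approve.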

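Why Streaming Data Pipelines Matter

When a customer expresses negative sentiment, their needs have to be addressed in near real time. That means using a **streaming data pipeline powered by an LLM**. Gemma 2 offers an unmatched balance of performance and size, and Gemma models have achieved benchmark results that surpass some larger models. Their small size allows the model to be deployed or embedded directly in the streaming data processing pipeline. This architecture offers the following benefits:

Data locality: the model runs on the pipeline's own workers, so there is no call out to a separate serving system.
Autoscaling is handled by a single system.
Monitoring in production is simpler, because there is only one system to observe.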

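Streaming Data Processing with Dataflow

Google Cloud's **Dataflow** is a scalable, unified batch and streaming processing platform. With Dataflow, you can develop streaming data and event processing pipelines using the Apache Beam Python SDK. Dataflow's main benefits are:

It is a fully managed environment that autoscales according to demand.
Apache Beam provides a set of low-code, turnkey transforms that save you from writing generic boilerplate code.
Dataflow ML installs the required drivers and provides access to a variety of GPU devices.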

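Case Study: Customer Support Analysis for a Food Chain

In the following scenario, a busy food chain has to analyze and store a high volume of customer support requests arriving through various chat channels. These interactions include chats generated by automated chatbots as well as nuanced conversations that require the attention of live support staff.

To address this challenge, **two goals were set**:

Efficiently manage and store chat data, and summarize positive interactions so they can be easily referenced for future analysis.
Implement real-time issue detection and resolution: use sentiment analysis to quickly identify dissatisfied customers and generate tailored replies that address their concerns.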

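Building the Streaming Pipeline with Gemma

In this scenario, Gemma performs sentiment analysis and summarizes chats. For chats with negative sentiment, Gemma also generates a reply that fits the context of that customer's conversation. Support staff then review the reply before it is sent to the dissatisfied customer.

Specifically, each chat is analyzed through the following process:

Read the review data from Pub/Sub and process the JSON payload containing the chat history.
Pass the text to Gemma and request a sentiment score and a summary.
Branch the chats based on the sentiment score.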
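
As a rough illustration of the first two steps, the sketch below decodes a Pub/Sub message body and builds the sentiment-and-summary prompt. The message schema (`id` and `chat_message` fields), the prompt text, and the requested JSON shape are all assumptions made for this example.

```python
import json
from typing import Any, Dict

# Hypothetical prompt asking for a sentiment score and a one-line summary.
SENTIMENT_INSTRUCTIONS = (
    "Analyze the customer support chat below. Respond only with a JSON object "
    'of the form {"sentiment": <-1, 0, or 1>, "summary": "<one sentence>"}.'
)

REQUIRED_FIELDS = ("id", "chat_message")  # assumed message schema


def parse_chat_payload(data: bytes) -> Dict[str, Any]:
    """Decode a Pub/Sub message body (assumed UTF-8 JSON) into a dict."""
    record = json.loads(data.decode("utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"payload missing fields: {missing}")
    return record


def build_sentiment_prompt(record: Dict[str, Any]) -> str:
    """Combine the instructions with the chat history for the model."""
    return f"{SENTIMENT_INSTRUCTIONS}\n\nChat:\n{record['chat_message']}\n\nJSON:"
```

Beam's `ReadFromPubSub` yields message bodies as `bytes` by default. These two functions could therefore be applied in sequence with `beam.Map` ahead of the RunInference step.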

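Overall Dataflow Code and Pipeline Design

To set up the infrastructure for Gemma, you need to download the Gemma model in advance and use the Dataflow service. You can test the pipeline on CPUs, but GPUs are recommended given the inference times.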
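
For illustration, the launch flags for a GPU-backed streaming job might look like the following. This sketch assumes an NVIDIA L4 GPU on a `g2-standard-4` worker and uses placeholder project, region, and bucket names. It doesn't cover staging the Gemma weights or building a custom container image, which a real deployment may also need.

```python
from typing import List


def dataflow_gpu_args(project: str, region: str, bucket: str) -> List[str]:
    """Build flags for a streaming Dataflow job with one NVIDIA L4 GPU per worker.

    The machine type and GPU choice are example values, not requirements.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location=gs://{bucket}/tmp",
        "--streaming",
        # G2 is the Compute Engine machine family that pairs with L4 GPUs.
        "--machine_type=g2-standard-4",
        # Attach one L4 GPU per worker and have Dataflow install the NVIDIA driver.
        "--dataflow_service_options="
        "worker_accelerator=type:nvidia-l4;count:1;install-nvidia-driver",
    ]
```

The resulting list could be passed to `PipelineOptions` from `apache_beam.options.pipeline_options`, which accepts command-line style flags.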

The **RunInference** transform, with Gemma embedded in it, is at the heart of this solution and keeps boilerplate code out of the user's way.
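
To show what RunInference takes off your hands, here is a stdlib-only sketch of the handler shape it works with. The handler has a `load_model()` method and a `run_inference()` method that turns a batch of prompts into prediction results. `PredictionResult` here is a local stand-in that mirrors the fields of Beam's class. `GemmaHandlerSketch` and `run_batches` are hypothetical, and a plain function stands in for Gemma. In an actual pipeline you would instead subclass `apache_beam.ml.inference.base.ModelHandler` and pass it to `RunInference`, which handles model loading and batching for you.

```python
from typing import Any, Callable, Dict, Iterable, List, NamedTuple, Optional, Sequence


class PredictionResult(NamedTuple):
    """Local stand-in mirroring the fields of Beam's PredictionResult."""
    example: Any
    inference: Any
    model_id: Optional[str] = None


class GemmaHandlerSketch:
    """Hypothetical handler with the load_model/run_inference shape.

    A plain callable (prompt -> text) stands in for the Gemma model.
    """

    def __init__(self, load_fn: Callable[[], Callable[[str], str]], model_id: str):
        self._load_fn = load_fn
        self._model_id = model_id

    def load_model(self) -> Callable[[str], str]:
        # Loading is the expensive part, so callers should do it once and reuse it.
        return self._load_fn()

    def run_inference(
        self,
        batch: Sequence[str],
        model: Callable[[str], str],
        inference_args: Optional[Dict[str, Any]] = None,
    ) -> Iterable[PredictionResult]:
        # Pair each prompt with the text the model generated for it.
        for prompt in batch:
            yield PredictionResult(prompt, model(prompt), self._model_id)


def run_batches(
    handler: GemmaHandlerSketch, prompts: List[str], batch_size: int
) -> List[PredictionResult]:
    """Tiny driver imitating what RunInference automates: load once, then batch."""
    model = handler.load_model()
    results: List[PredictionResult] = []
    for i in range(0, len(prompts), batch_size):
        results.extend(handler.run_inference(prompts[i : i + batch_size], model))
    return results
```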

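Summary and Outlook

With Gemma 2, you can build a system that processes customer sentiment data arriving at high velocity and with high variability. This can be a powerful way for companies to improve the customer experience. Going forward, you could also consider moving to a model with more parameters, or fine-tuning the model based on feedback. A/B testing can also be used to compare the responses of different models.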
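
One common way to run such an A/B test in a streaming pipeline is to split traffic deterministically by chat ID, so that every message in a conversation goes to the same model. This approach is an assumption offered here, not something described in the source. The variant names used with it are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(chat_id: str, variants: Sequence[str]) -> str:
    """Deterministically map a chat ID to one model variant."""
    if not variants:
        raise ValueError("need at least one variant")
    # Hash the ID so the same chat always lands in the same bucket.
    digest = hashlib.sha256(chat_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

A hash is used instead of a random choice so that the assignment stays stable across workers and pipeline restarts.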

Gemma 2's capabilities let you build flexible systems that meet a wide range of business needs.
