I have been using a Pixel 8 Pro as a test device for development,
so I took the opportunity to look into Gemini Nano, which comes built into it.

What is Gemini Nano?

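Gemini Nano is a compact version of Gemini, sized to run entirely on the device itself.
Of the three Gemini models (Ultra, Pro, and Nano), Google bills it as the most efficient one, built for on-device AI.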

What can it do?

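It can run on-device tasks that need efficient AI processing, such as suggesting replies in chat applications or turning audio into a text summary, without connecting to an external server.

Specifically, the following features have been announced: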

  • Automatic summaries in the Recorder app
  • Proofreading and Smart Reply in the Gboard keyboard

Supported devices

Google Pixel 8 Pro

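At the time of writing, the Pixel 8 Pro was the only supported device,
but Google has reportedly announced that Gemini Nano will also become available on the Pixel 8 in an upcoming update.
The mid-range Pixel 8 is more affordable than the high-end Pixel 8 Pro and is more widely used, so this should help Gemini Nano reach more users.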

Feature overview

Automatic summaries in the Recorder app

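The Recorder app can instantly summarize recordings of conversations, interviews, and presentations on the device.
It runs entirely on-device, so it works even without a network connection.
Note: At the time of writing, the summary feature supports English only.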

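Proofreading and Smart Reply in the Gboard keyboard

Gboard's Smart Reply runs on the on-device LLM.
(Smart Reply suggests high-quality candidate replies that take the conversation in the chat app into account.)

At the time of writing, it reportedly supports the following apps: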

  • WhatsApp
  • Google Hangouts
  • WeChat
  • Snapchat

Trying it out

This time, I tried out the summary feature in the Recorder app.

Launching the Recorder app

Starting the recording

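I recorded a presentation titled "Developers" from Google I/O '23.
While recording, the app transcribes automatically. It could tell speech apart from noise, music, and applause,
and the transcription of the actual speech was also quite accurate.

Screen during recording

Real-time transcription

Transcribed text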

That's so much more to come. Next, Josh is here to show you exactly how we're

making it easy and scalable for every developer to innovate with Ai and palm 2.

Thanks Thomas. Our work is enabling businesses and it's also empowering

developers. Pawn 2, our most capable language model, that Sundar talked about

Powers, the Palm API. Since March, we've been running a private preview with

our Palm API and it's been amazing to see how quickly developers have used it in

their applications.

Like chapter who are generating stories. So you can choose your own adventure

forever. Changing story time. Or game on technology a company that makes chat apps

for sports fans and Retail Brands to connect with their audiences. And there's also

wind news. They're using the Palm API to help customers place that correct order

for the junior bacon.

Cheeseburger, they talked about and they talked to many feature But I'm most excited

about the response, we've gotten from the developer tools. Developers want Choice

when it comes to language models. And we're working with leading developer tools

companies like Lane chain, chroma, and many more to support the palm API.

We've also integrated it into Google developer tools, like Firebase and colab.

You can hear a lot more about the Palm API in the developer keynote and sign up today.

Now to show you just how powerful the Palm API is, I want to share one concept

that five Engineers at Google put together over the last few weeks. The idea is called

project Tailwind and we think of it as an AI.

First notebook, that helps you learn faster. Like a real notebook, your notes and

your sources power Tailwind. How it works is, you can simply pick the files from

Google Drive and it effectively creates a personalized and private AI model

that has expertise in the information that you give it.

We've been developing this idea with authors like, Stephen, Johnson and testing it

at universities like, Arizona State and the University of Oklahoma where I went

to school, do you want to see how it works? Let's do a live demo. Now, imagine

that I'm a student taking a computer science history class, I'll open up Tailwind

and I can quickly see in Google Drive, all my different notes and assignments

and readings.

I can insert them. And what will happen when Tailwind loads up. As you can see

my different notes and articles on the side, Sure. They are in the middle and it instantly

creates a study guide on the right to give me bearings. You can see it's pulling out

key Concepts and questions grounded in the materials that I've given it.

Now, I can come over here and quickly, change it to go across all the different sources

and type something like create glossary for Hopper. And what's going to happen behind

the scenes is it'll automatically compile a glossary associated with all the different notes,

and articles relating to Grace Hopper, the computer science history Pioneer.

Look at this flomatic, Cobalt compiler. All created based on my notes. Now, let's try

one more. I'm gonna try something else, called different viewpoints on Dyna book.

So the Dyna book. This was a concept from Alan K. Again, Tailwind going out. Finding all

the different things. You can see how quick it comes back.

There it is. And what's interesting here is it's helping me think through the concept.

So it's giving me different viewpoints. It was a Visionary product, it was a missed

opportunity but my favorite part is it shows its work. You can see the citations here

when I hover over, here's something from my class notes.

Here's something from an article, the teacher has signed. It's all right here,

grounded in my sources.

Now project Tailwind is still in its early days, but we've had so much fun making

this prototype and we realized it's not just for students. It's helpful for anyone,

synthesizing information from many different sources that you choose. Like, writers

researching an article or analysts going through earnings calls or even lawyers,

preparing for a case.

Imagine collaborating with an AI that's grounded in what you've read in all of your notes.

We want to make it available to try it out if you want to see it.

There's a lot more you can do with palm 2. And we can't wait to see what you build

using the Palm API. Generative AI is changing what it means to develop new products.

At Google, we offer the best ml infrastructure. With powerful models, including those in vertex

and the apis and tools to quickly generate your own applications.

Building bold AI requires a responsible approach, so let me hand it over to James to share more.

Thanks.

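Running the summary

Summary result

A talk of about five minutes was condensed into a very short summary!
(Personally, I would have liked the summary to be a little more detailed.)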

Palm API is a powerful language model that is being used by developers to

create new applications.

Palm API is also being used to power developer tools and to create

AI-powered tools.

Project Tailwind is a prototype of an AI-powered notebook that is

being developed by Google.

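Impressions

Summarizing recordings entirely on the device, with no internet connection, is useful in many situations, and I found it a very convenient feature. On the other hand, the quality of the summaries still seems to have room for improvement.
I also felt that if each user could tune the summaries (for example, how detailed they are), they could adapt the feature to different uses, which would make it more convenient.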