Building Apps with AWS Machine Learning Services! Taking on ChatGPT's Recommended Projects ③

Hi, M10i here.
This is round two of the series where I build whatever app ChatGPT tells me to. What kind of app will come out this time? Let's find out!

Click here for the previous post

Recap
Rules:
Build an app by randomly combining two AWS machine learning services.
Ask ChatGPT to recommend app ideas and pick one of them.
The six candidate services are listed below.

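1.Comprehend	NLP: sentiment analysis, language detection
2.Rekognition	Image and video analysis, object detection
3.Lex	Conversational interfaces such as bots
4.Polly	Text-to-speech
5.Translate	Machine translation
6.Transcribe	Speech-to-text transcription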

So, time to roll the dice!!!!
(Note: the two dice happened to differ this time, but if they ever match I'll roll again.)
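
The dice rule above (two different services, re-roll on a duplicate) could be sketched in Python roughly like this. The function name `pick_two_services` is made up for illustration; since `random.sample` draws without replacement, no explicit re-roll is needed.

```python
import random

# The six candidate services, numbered as in the list above
SERVICES = {
    1: "Comprehend",
    2: "Rekognition",
    3: "Lex",
    4: "Polly",
    5: "Translate",
    6: "Transcribe",
}


def pick_two_services(rng=random):
    """Pick two different services, like rolling two dice and re-rolling duplicates."""
    # sample() never returns the same element twice, so the pair is always distinct
    first, second = rng.sample(sorted(SERVICES), 2)
    return (first, SERVICES[first]), (second, SERVICES[second])


if __name__ == "__main__":
    a, b = pick_two_services()
    print(f"{a[0]}.Amazon {a[1]} x {b[0]}.Amazon {b[1]}")
```

Passing a seeded `random.Random` instance as `rng` makes the roll reproducible, which is handy if you want to replay a given combination.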

[Image: dice2]
This time it's 1.Amazon Comprehend × 6.Amazon Transcribe!!!
Let's hand the whole thing over to ChatGPT right away.
[Image: recommend2]

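Ooh, they all sound plausible. Which one should I pick...

Hmmmm... Gathering data for the others looks hard, so I'm going with the simple-looking
"4. Interview analysis platform"!
For the interview audio, I'll target YouTube.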

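Now let's write some code.
Both Comprehend and Transcribe only need to be called through the SDK, so I'm using Python + Django again.

Note: ChatGPT wrote about 70% of this, but pytube didn't work, so I made a few fixes such as switching the YouTube download part to yt_dlp.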

views.py

from django.shortcuts import render
import yt_dlp
import boto3
import os
import time
import requests
from datetime import datetime
from django.views.decorators.csrf import csrf_exempt

# Placeholder: the post does not show where bucket_name is defined.
# Replace this with the name of your own S3 bucket.
bucket_name = 'your-bucket-name'

# ① Download the mp3 from YouTube and upload it to S3
def download_audio(youtube_url, output_path='path_to_save_audio'):
    if not os.path.exists(output_path):
        os.makedirs(output_path)

    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': os.path.join(output_path, '%(title)s.%(ext)s'),
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }

    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info_dict = ydl.extract_info(youtube_url, download=True)
            downloaded_file = ydl.prepare_filename(info_dict)
        # The postprocessor converts the download to mp3, so swap the extension
        # (the source may be webm, m4a, and so on)
        local_file = os.path.splitext(downloaded_file)[0] + '.mp3'
        # Use only the file name, without the directory, in the S3 key
        audio_file = os.path.basename(local_file)
        s3_client = boto3.client('s3')
        with open(local_file, 'rb') as body_file:
            s3_client.put_object(Bucket=bucket_name, Key=f"transcribe/{audio_file}", Body=body_file)
        return audio_file

    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# ② Run the transcription job and return the transcript URI
def transcbrie_audio(audio_file_path):
    transcribe = boto3.client('transcribe')
    job_name = datetime.now().strftime('%Y%m%d%H%M%S')
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f's3://{bucket_name}/transcribe/{audio_file_path}'},
        MediaFormat='mp3',
        LanguageCode='en-US'
    )
    # Poll until the job finishes, pausing between calls so we don't hammer the API
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        job = status['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)
    # A failed job has no transcript URI, so stop here instead of hitting a KeyError
    if job['TranscriptionJobStatus'] == 'FAILED':
        raise RuntimeError(f"Transcription job failed: {job.get('FailureReason')}")
    return job['Transcript']['TranscriptFileUri']

# ③ Fetch the transcription result
def fetch_transcription_text(transcript_uri):
    response = requests.get(transcript_uri)
    transcript_data = response.json()
    transcript_text = ""

    for item in transcript_data['results']['transcripts']:
        transcript_text += item['transcript'] + " "

    return transcript_text.strip()

# ④ Sentiment analysis
def analyze_text_with_comprehend(text):
    comprehend = boto3.client('comprehend')
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode='en')
    key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
    return sentiment, key_phrases

# Run transcription + analysis
@csrf_exempt
def transcribe_and_analyze(request):
    if request.method == 'POST':
        youtube_url = request.POST.get('youtube_url')
        audio_file_path = download_audio(youtube_url)
        if audio_file_path is None:
            # The download or upload failed, so go back to the input form
            return render(request, 'test/index.html')
        transcript_uri = transcbrie_audio(audio_file_path)
        transcript_text = fetch_transcription_text(transcript_uri)
        sentiment, key_phrases = analyze_text_with_comprehend(transcript_text)

        context = {
            'youtube_embed_url': youtube_url.replace("watch?v=", "embed/"),
            'transcript': transcript_text,
            'sentiment': sentiment['Sentiment'],
            'key_phrases': key_phrases
        }
        return render(request, 'test/video_player.html', context)

    return render(request, 'test/index.html')

def index(req):
    return render(req, 'test/index.html')

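① Download the mp3 from YouTube and upload it to S3
Transcribe needs its input to be in S3, so yt_dlp extracts the audio from the given YouTube URL as an mp3 and saves it locally as a temporary file, and that file is then uploaded to S3.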
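
Working out the mp3 file name and the S3 key from the path that yt_dlp reports is the fiddly part of this step. Here is a minimal sketch using `os.path`, which should work regardless of the source extension or the OS path separator. The helper name `mp3_path_and_s3_key` and the sample file name are hypothetical; the `transcribe` prefix matches the key used in views.py.

```python
import os


def mp3_path_and_s3_key(downloaded_path, prefix="transcribe"):
    """Return (local mp3 path, S3 key) for a download that was converted to mp3."""
    # Swap whatever extension the download had (webm, m4a, ...) for .mp3
    local_mp3 = os.path.splitext(downloaded_path)[0] + ".mp3"
    # S3 keys always use forward slashes; only the file name goes after the prefix
    s3_key = f"{prefix}/{os.path.basename(local_mp3)}"
    return local_mp3, s3_key


if __name__ == "__main__":
    path = os.path.join("path_to_save_audio", "interview.webm")
    print(mp3_path_and_s3_key(path))
```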
② Generate the transcript URI
Pass the media to start_transcription_job to start the transcription.
Once get_transcription_job shows that the job has completed, return the URL.
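
Waiting for the job is a plain polling loop. As a rough sketch, it could be pulled into a helper that sleeps between checks, gives up after a timeout, and reports failures. The helper name `wait_for_transcription` and the `poll_seconds` / `timeout_seconds` defaults are my own assumptions; `client` is anything with a `get_transcription_job` method, such as `boto3.client('transcribe')`.

```python
import time


def wait_for_transcription(client, job_name, poll_seconds=5, timeout_seconds=600):
    """Poll get_transcription_job until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(f"Transcription failed: {job.get('FailureReason')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_name} did not finish in {timeout_seconds}s")
        # Pause between checks so the API is not called in a tight loop
        time.sleep(poll_seconds)
```

Taking the client as a parameter also makes the loop easy to try out with a stub object, without touching AWS.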
③ Fetch the transcription result
Accessing the URL returns the result as JSON.
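
The part of that JSON used here is `results.transcripts[].transcript`. A small sketch of pulling the text out, with a made-up sample payload (the helper name `extract_transcript_text` and the sample sentence are illustrative only):

```python
def extract_transcript_text(transcript_data):
    """Join every transcript entry of a Transcribe result JSON into one string."""
    # Fall back to empty containers so a missing key yields "" instead of a KeyError
    transcripts = transcript_data.get("results", {}).get("transcripts", [])
    return " ".join(item["transcript"] for item in transcripts).strip()


if __name__ == "__main__":
    # Made-up payload containing only the fields this function reads
    sample = {"results": {"transcripts": [{"transcript": "Hello world."}]}}
    print(extract_transcript_text(sample))
```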
④ Sentiment analysis
Comprehend's detect_sentiment performs sentiment analysis, and detect_key_phrases extracts key phrases.
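
One thing to watch with longer interviews: Comprehend's synchronous APIs limit how much text a single call accepts (see the AWS documentation for the current limits). A hedged sketch of splitting a transcript into chunks by UTF-8 size is below. The helper name `split_by_bytes` and the 4500-byte default are assumptions of mine, not values from this post, and how to combine the per-chunk results is left open.

```python
def split_by_bytes(text, max_bytes=4500):
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes.

    A single word longer than max_bytes is kept whole as its own chunk.
    """
    chunks, current, current_size = [], [], 0
    for word in text.split():
        size = len(word.encode("utf-8"))
        # +1 accounts for the space joining this word to the previous one
        extra = size + (1 if current else 0)
        if current and current_size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [word], size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in split_by_bytes("aaaa bbbb cccc", max_bytes=9):
        print(repr(chunk))
```

Each chunk could then be passed to `detect_sentiment` or `detect_key_phrases` separately.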

video_player.html

<!DOCTYPE html>
<html>
<head>
    <title>Video Player</title>
</head>
<body>
    <h1>Video Player</h1>
    <div>
        <iframe width="1000" height="315" src="{{ youtube_embed_url }}" frameborder="0" allowfullscreen></iframe>
    </div>

    <h2>Transcription:</h2>
    <div id="transcription-display">
        {{ transcript }}
    </div>

    <h2>Sentiment Analysis:</h2>
    <p>{{ sentiment }}</p>

    <h2>Key Phrases:</h2>
    <ul>

    </ul>
</body>
</html>

Now, let's try it. The code expects English audio, so I picked
Noah Lyles's victory interview after the 100m at the Paris Olympics (YouTube).
This is what we'll analyze.

Enter the URL and click "Start Transcription" to call transcribe_and_analyze in views.py.
[Image: cnp_trnscribe_index]
And the result is...

[Image: cnp_trnscribe1]
Wow! The transcription came out really well!
The sentiment analysis result is POSITIVE, too. (No key phrases showing, though... come on, GPT...)

After a small fix, I got it to show the key phrases, limited to the 10 with the highest scores.
[Image: cnp_trnscribe2]
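
Picking the top 10 boils down to sorting the `KeyPhrases` list by `Score`. A minimal sketch, with made-up sample phrases and scores (the helper name `top_key_phrases` is illustrative):

```python
def top_key_phrases(key_phrases, n=10):
    """Return the n highest-scoring key phrases, best first."""
    return sorted(key_phrases, key=lambda p: p["Score"], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample in the shape of a detect_key_phrases response
    response = {"KeyPhrases": [
        {"Text": "the final", "Score": 0.91},
        {"Text": "a gold medal", "Score": 0.99},
        {"Text": "my coach", "Score": 0.95},
    ]}
    for phrase in top_key_phrases(response["KeyPhrases"], n=2):
        print(f"{phrase['Text']} (Score: {phrase['Score']:.2f})")
```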
I wanted a bit more detail, so
I displayed the contents of SentimentScore from the analysis result and highlighted the key phrases in the transcript.

[Image: cnp_trnscribe3]
Oh, now it looks the part lol
Seeing the proportions in SentimentScore makes it feel like real analysis.
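
If you want those proportions to read as percentages, something like the sketch below would do. The helper name `format_sentiment_score` and the sample values are made up; the four keys are the ones in Comprehend's `SentimentScore`.

```python
def format_sentiment_score(sentiment_score):
    """Format a SentimentScore dict as percentages, highest first."""
    ordered = sorted(sentiment_score.items(), key=lambda kv: kv[1], reverse=True)
    return ", ".join(f"{label}: {value:.1%}" for label, value in ordered)


if __name__ == "__main__":
    # Made-up scores, not the values from the actual interview
    sample = {"Positive": 0.9, "Negative": 0.02, "Neutral": 0.07, "Mixed": 0.01}
    print(format_sentiment_score(sample))
```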
Videos where one person talks on their own are probably a better fit than ones with several speakers.
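
As for the key-phrase highlighting mentioned above, here is a hedged sketch that marks every phrase in a single pass and HTML-escapes the rest of the transcript, so text coming from the video cannot inject markup. The helper name `highlight_phrases` is hypothetical; the span style is the same one used in views.py.

```python
import html
import re


def highlight_phrases(text, phrases):
    """HTML-escape text and wrap each occurrence of the given phrases in a span."""
    # Longest phrases first, so "gold medal" wins over a shorter overlap like "gold"
    unique = sorted({p for p in phrases if p}, key=len, reverse=True)
    if not unique:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(p) for p in unique))
    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then add the highlighted match
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            f"<span style='background-color:yellow;'>{html.escape(match.group())}</span>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(highlight_phrases("I won a gold medal <today>", ["gold medal", "gold"]))
```

Because the result is already escaped, a Django template would need to render it with the `|safe` filter for the spans to show up as highlights.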

The final source is below.

views.py (added and changed code)

# ④ Sentiment analysis, replacing the earlier version.
# Assumption: the post does not show this change. The signature and return value
# are inferred from how transcribe_and_analyze calls it below.
def analyze_text_with_comprehend(comprehend, text):
    return comprehend.detect_sentiment(Text=text, LanguageCode='en')

# Narrow the key phrases down to the top 10 (and build the HTML while we're at it)
def key_pherasestop10(comprehend, text):
    key_phrases_result = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
    top_key_phrases = sorted(key_phrases_result['KeyPhrases'],
                             key=lambda x: x['Score'], reverse=True)[:10]
    # Build the HTML
    html_output = "<div><ul>"

    for phrase in top_key_phrases:
        key_phrase = phrase['Text']
        score = phrase['Score']
        html_output += f"<li><strong>{key_phrase}</strong> (Score: {score:.2f})</li>"

    html_output += "</ul></div>"

    return top_key_phrases, html_output

# Run transcription + analysis
@csrf_exempt
def transcribe_and_analyze(request):
    if request.method == 'POST':
        comprehend = boto3.client('comprehend')
        youtube_url = request.POST.get('youtube_url')
        audio_file_path = download_audio(youtube_url)
        transcript_uri = transcbrie_audio(audio_file_path)
        transcript_text = fetch_transcription_text(transcript_uri)
        sentiment = analyze_text_with_comprehend(comprehend, transcript_text)
        key_phrases, phraseshtml = key_pherasestop10(comprehend, transcript_text)

        # Highlight the key phrases.
        # Start from the full transcript and accumulate the replacements so every
        # phrase is highlighted, not only the last one in the loop.
        trans_text = transcript_text
        for phrase in key_phrases:
            key_phrase = phrase['Text']
            trans_text = trans_text.replace(key_phrase, f"<span style='background-color:yellow;'>{key_phrase}</span>")

        # Flatten SentimentScore into a single line
        sentiment_score = sentiment['SentimentScore']
        sentiment_string = f"Positive: {sentiment_score['Positive']}, " \
                           f"Negative: {sentiment_score['Negative']}, " \
                           f"Neutral: {sentiment_score['Neutral']}, " \
                           f"Mixed: {sentiment_score['Mixed']}"

        # 'transcript' and 'key_phrases' contain HTML, so the template has to
        # render them with the |safe filter
        context = {
            'youtube_embed_url': youtube_url.replace("watch?v=", "embed/"),
            'transcript': trans_text,
            'sentiment': sentiment['Sentiment'],
            'sentiment_score': sentiment_string,
            'key_phrases': phraseshtml
        }
        return render(request, 'test/video_player.html', context)

    # Non-POST requests get the input form
    return render(request, 'test/index.html')

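And with that, the interview analysis platform is done!
The code is bare-bones again this time, but it's great that something like this can be implemented in a few hours.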

What kind of app will the next one be???
This was M10i!

References

The dice used this time
https://stopwatchtimer.yokochou.com/dice.html

What to do when yt-dlp throws an error
https://denshigomi.com/%E3%80%90yt-dlp%E3%80%91error-postprocessing-ffprobe-and-ffmpeg-not-found%E3%81%AE%E8%A7%A3%E6%B1%BA%E6%B3%95