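☁️ Google Cloud * Getting to Know Speech to Text

The examples in this post use Speech To Text version 1.9X.XX with the v1p1beta1 workspace.

Features differ from version to version, so be sure to check this before you run anything!

You don't have to pay right away: you get $300 in free credit to use over 12 months, and Google says you are not billed automatically when the free trial ends.

You do have to register a credit card, but this is to prevent automated sign-ups. According to Google, you are not charged unless you upgrade to a paid account yourself.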

Cloud Speech-to-Text

Speech-to-text (STT) conversion powered by machine learning, available for short-form or long-form audio.

View the documentation for this product.

Powerful speech recognition

Google Cloud Speech-to-Text lets developers convert audio to text by applying powerful neural network models through an easy-to-use API. The API recognizes 120 languages and variants to support a global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio using Google's machine learning technology.
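To make the "easy-to-use API" part a bit more concrete, here is a minimal sketch that only builds the JSON body for a synchronous `speech:recognize` call on the v1p1beta1 REST endpoint. The sample rate, the language code, and the placeholder audio bytes are my assumptions for illustration, and actually sending the request (authentication, HTTP client) is left out on purpose.

```python
import base64
import json

# REST endpoint for synchronous recognition in the v1p1beta1 API.
RECOGNIZE_URL = "https://speech.googleapis.com/v1p1beta1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="ko-KR", sample_rate_hertz=16000):
    """Return a JSON-serializable body for a speech:recognize request."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Placeholder bytes standing in for real 16-bit PCM audio (assumption).
body = build_recognize_request(b"\x00\x01" * 8)
print(json.dumps(body["config"], indent=2))
```

The body would be POSTed to `RECOGNIZE_URL` with valid credentials; the transcript comes back under `results[].alternatives[].transcript`.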

Cloud Speech-to-Text features

1. Automatic Speech Recognition

Automatic Speech Recognition (ASR), powered by deep learning neural networks, drives applications such as voice search and speech transcription.

2. Noise Robustness

Handles noisy audio from many environments without requiring additional noise cancellation.

3. Global Vocabulary

Recognizes 120 languages and variants with an extensive vocabulary.

4. Inappropriate Content Filtering

Filters inappropriate content out of the text results for some languages.

5. Phrase Hints

Speech recognition can be customized to a specific context by providing a set of words and phrases that are likely to be spoken. This is especially useful for adding custom words and names to the vocabulary and in voice-control use cases.
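As a small sketch of how phrase hints are attached, the REST config takes a `speechContexts` list whose entries carry `phrases`. The helper name and the example phrases below are hypothetical; only the field names come from the API.

```python
def with_phrase_hints(config, phrases):
    """Return a copy of a RecognitionConfig dict with phrase hints attached."""
    updated = dict(config)
    updated["speechContexts"] = [{"phrases": list(phrases)}]
    return updated


base_config = {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US",
}

# Hypothetical custom words and a voice-control command.
hinted = with_phrase_hints(base_config, ["Tistory", "turn on the lights"])
print(hinted["speechContexts"])
```

Copying the config instead of mutating it keeps one base config reusable across requests that need different hints.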

6. Real-time Streaming or Prerecorded Audio Support

Audio input can be streamed from an application's microphone or sent from a prerecorded audio file (inline or through Google Cloud Storage). Multiple audio encodings are supported, including FLAC, AMR, PCMU, and Linear-16.
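For prerecorded audio, the two delivery options map to two shapes of the `audio` field: `content` (inline, base64) or `uri` (Cloud Storage). The sketch below also maps the encoding labels used above to the API's enum names; as far as I know PCMU corresponds to `MULAW` and Linear-16 to `LINEAR16`. The bucket and file names are made up.

```python
import base64

# Labels used in this post mapped to the API's AudioEncoding enum names.
ENCODINGS = {"FLAC": "FLAC", "AMR": "AMR", "PCMU": "MULAW", "Linear-16": "LINEAR16"}


def inline_audio(data):
    """Audio sent inside the request body, base64-encoded."""
    return {"content": base64.b64encode(data).decode("ascii")}


def storage_audio(uri):
    """Audio referenced by a Google Cloud Storage URI."""
    if not uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    return {"uri": uri}


def recognition_config(encoding_label, sample_rate_hertz, language_code="en-US"):
    """Build a RecognitionConfig dict from one of the labels in ENCODINGS."""
    return {
        "encoding": ENCODINGS[encoding_label],
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
    }


# Hypothetical bucket and object names.
request = {
    "config": recognition_config("PCMU", 8000),
    "audio": storage_audio("gs://my-example-bucket/call.wav"),
}
print(request)
```

Real-time streaming uses a different (gRPC) call and is not covered by this sketch.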

7. Automatic Punctuation BETA

Accurately punctuates transcriptions (e.g., commas, question marks, and periods) with machine learning.

9. Speaker Diarization BETA

Know who said what: you can now get automatic predictions about which of the speakers in a conversation spoke each utterance.
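Here is a rough sketch of the diarization round trip. The config fields (`diarizationConfig`, `enableSpeakerDiarization`, `minSpeakerCount`, `maxSpeakerCount`) are the v1p1beta1 REST names as I understand them, so check them against the version you run. The word list is hand-written sample data, not real API output; it only mimics the `word` and `speakerTag` fields that come back per word.

```python
def diarization_config(min_speakers=2, max_speakers=2):
    """Config fragment that turns on speaker diarization."""
    return {
        "diarizationConfig": {
            "enableSpeakerDiarization": True,
            "minSpeakerCount": min_speakers,
            "maxSpeakerCount": max_speakers,
        }
    }


def group_by_speaker(words):
    """Merge consecutive words with the same speakerTag into (tag, text) turns."""
    turns = []
    for w in words:
        tag = w["speakerTag"]
        if turns and turns[-1][0] == tag:
            turns[-1] = (tag, turns[-1][1] + " " + w["word"])
        else:
            turns.append((tag, w["word"]))
    return turns


# Hand-written sample shaped like the per-word results (assumption).
sample_words = [
    {"word": "hello", "speakerTag": 1},
    {"word": "there", "speakerTag": 1},
    {"word": "hi", "speakerTag": 2},
]
turns = group_by_speaker(sample_words)
print(turns)  # [(1, 'hello there'), (2, 'hi')]
```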

10. Auto-Detect Language BETA

When you need to support multilingual scenarios, you can now specify two to four language codes, and Cloud Speech-to-Text will identify the language actually spoken and provide the transcript.
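In the REST config, the "two to four language codes" are split into one primary `languageCode` plus an `alternativeLanguageCodes` list. A minimal sketch, where the helper name and the chosen languages are only examples:

```python
def multilingual_config(language_codes):
    """Split 2-4 language codes into a primary code and alternatives."""
    codes = list(language_codes)
    if not 2 <= len(codes) <= 4:
        raise ValueError("expected between two and four language codes")
    return {
        "languageCode": codes[0],
        "alternativeLanguageCodes": codes[1:],
    }


config = multilingual_config(["ko-KR", "en-US", "ja-JP"])
print(config)
```

Each result in the response then reports which language was detected, so the caller does not need to know it in advance.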

11. Multichannel Recognition BETA

In multiparticipant recordings where each participant is recorded in a separate channel (e.g., a phone call with two channels or a video conference with four channels), Cloud Speech-to-Text recognizes each channel separately and then annotates the transcripts so that they follow the same order as in real life.
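A sketch of the multichannel setup, assuming the REST field names `audioChannelCount` and `enableSeparateRecognitionPerChannel` and a `channelTag` on each result. The sample results are hand-written to show the shape, not actual output.

```python
def multichannel_config(channel_count):
    """Config fragment asking for one transcript per audio channel."""
    return {
        "audioChannelCount": channel_count,
        "enableSeparateRecognitionPerChannel": True,
    }


def label_by_channel(results):
    """Pair each result's channel tag with its top transcript, keeping order."""
    return [(r["channelTag"], r["alternatives"][0]["transcript"]) for r in results]


# Hand-written sample shaped like a two-channel phone call (assumption).
sample_results = [
    {"channelTag": 1, "alternatives": [{"transcript": "hello, how can I help"}]},
    {"channelTag": 2, "alternatives": [{"transcript": "I have a billing question"}]},
]
for channel, text in label_by_channel(sample_results):
    print(f"channel {channel}: {text}")
```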

Pricing

FEATURE	0-60 MINUTES	OVER 60 MINUTES, UP TO 1 MILLION MINUTES
Speech Recognition (all models except video)	Free	$0.006 USD / 15 seconds*
Video Speech Recognition	$0.006	$0.012 USD / 15 seconds*

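- For Speech Recognition, the first 60 minutes are free. Beyond 60 minutes, usage is charged at about 6.67 KRW ($0.006 USD) per 15 seconds, up to a monthly total of 1 million minutes (about 16,666.67 hours).

- Each request is metered in 15-second units.

- The prices in the table above apply to personal systems; for business use, check the pricing guide and contact Google.

- Each request is rounded up to the nearest 15-second increment.

For example, three separate requests that each contain 7 seconds of audio are billed as 45 seconds (3 x 15) of audio.

Likewise, requests of 15 seconds and 14 seconds are billed as 30 seconds: the 14-second request is rounded up to 15, so 15 + 15 = 30.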
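The rounding rule is easy to check with a few lines of Python. This is only a sketch of the arithmetic described above, using the standard (non-video) rate from the table and assuming the 60 free minutes are simply subtracted from the month's billed total.

```python
import math

INCREMENT_SECONDS = 15
PRICE_PER_INCREMENT_USD = 0.006  # standard models, from the table above
FREE_SECONDS = 60 * 60           # first 60 minutes are free


def billed_seconds(request_durations):
    """Round every request up to a multiple of 15 seconds, then add them up."""
    return sum(
        math.ceil(d / INCREMENT_SECONDS) * INCREMENT_SECONDS
        for d in request_durations
    )


def monthly_cost_usd(total_billed_seconds):
    """Cost of a month's billed seconds after the free tier is subtracted."""
    billable = max(0, total_billed_seconds - FREE_SECONDS)
    return math.ceil(billable / INCREMENT_SECONDS) * PRICE_PER_INCREMENT_USD


print(billed_seconds([7, 7, 7]))  # 45
print(billed_seconds([15, 14]))   # 30
```

Because rounding happens per request, many very short requests cost more than one long request carrying the same audio.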

Thanks for

Google Cloud/speech-to-text

License: Attribution, NonCommercial, NoDerivatives


Programmer Leni 🤪
