The content and examples in this post use Speech To Text version 1.9X.XX and the v1p1beta1 workspace.
The available features differ from version to version, so be sure to keep this in mind when you run the examples!
You can use $300 of free credit for 12 months without paying anything up front, and reportedly you are not billed automatically when the free trial ends.
You do have to register a credit card, but this is to prevent automated sign-ups; the sign-up page states that no charges are made unless you manually upgrade to a paid account.
Cloud Speech-to-Text
Speech-to-text (STT) conversion is powered by machine learning and is available for short-form or long-form audio.
View the documentation for this product.
Powerful speech recognition
Google Cloud Speech-to-Text lets developers convert audio to text by applying powerful neural network models through an easy-to-use API. The API recognizes 120 languages and variants to support a global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio using Google's machine learning technology.
Cloud Speech-to-Text features
1. Automatic Speech Recognition
Automatic Speech Recognition (ASR), powered by deep learning neural networks, drives applications such as voice search and speech transcription.
2. Noise Robustness
Handles noisy audio from many environments without requiring additional noise cancellation.
3. Global Vocabulary
Recognizes 120 languages and variants with an extensive vocabulary.
4. Inappropriate Content Filtering
Filters inappropriate content out of the text results for some languages.
5. ꡬ문 ννΈ Phrase Hints
μμ± μΈμμ λ§νκΈ° μ¬μ΄ λ¨μ΄μ ꡬλ₯Ό μ 곡ν¨μΌλ‘μ¨ νΉλ³ν λ§₯λ½μ μν΄ μ¬μ©μν λ μ μλ€.
μ΄κ²μ νΉν μ¬μ©μκ° μΆκ°ν λ¨μ΄λ μ΄λ¦μ μ΄νμ μμ± μ μ΄ μ¬μ© μ¬λ‘μ μΆκ°ν λ μ μ©νλ€.
Speech recognition can be customized to a specific context by providing a set of words and phrases that are likely to be spoken. This is especially useful for adding custom words and names to the vocabulary and in voice-control use cases.
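As a rough illustration of phrase hints, the sketch below builds the configuration part of a recognition request as a plain Python dict. It assumes the REST-style field names `languageCode` and `speechContexts` (with a `phrases` list); the helper name `build_config_with_phrase_hints` and the sample phrases are made up for this example, and nothing is sent to the API.

```python
import json


def build_config_with_phrase_hints(phrases, language_code="en-US"):
    """Return a RecognitionConfig-style dict that carries phrase hints.

    Assumption: field names follow the REST API (languageCode, speechContexts).
    """
    # Trim whitespace, then drop empty entries and duplicates, keeping order.
    cleaned = []
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase and phrase not in cleaned:
            cleaned.append(phrase)
    return {
        "languageCode": language_code,
        "speechContexts": [{"phrases": cleaned}],
    }


# Hypothetical hints for a voice-control use case.
config = build_config_with_phrase_hints(
    ["Cloud Speech", "  diarization ", "", "Cloud Speech"]
)
print(json.dumps(config, indent=2))
```

Keeping the hint list short and specific to the current context is the point of the feature, so the helper only cleans the list and does not try to expand it.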
6. Real-time Streaming or Prerecorded Audio Support
Audio input can be streamed from an application's microphone or sent from a prerecorded audio file (inline or through Google Cloud Storage). Multiple audio encodings are supported, including FLAC, AMR, PCMU, and Linear-16.
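The two ways of sending prerecorded audio (inline or through Google Cloud Storage) can be sketched as follows. This is a minimal example that only builds the request body; it assumes the REST-style layout of a `config` object plus an `audio` object holding either base64 `content` or a `gs://` `uri`. The helper name, bucket, and file name are hypothetical.

```python
import base64


def build_recognize_request(config, audio_bytes=None, gcs_uri=None):
    """Build a recognize-style request body for inline or Cloud Storage audio."""
    # Exactly one audio source must be given.
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of audio_bytes or gcs_uri")
    if gcs_uri is not None:
        if not gcs_uri.startswith("gs://"):
            raise ValueError("Cloud Storage URIs must start with gs://")
        audio = {"uri": gcs_uri}
    else:
        # Inline audio travels as base64 text inside the JSON body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    return {"config": config, "audio": audio}


config = {"encoding": "LINEAR16", "sampleRateHertz": 16000, "languageCode": "en-US"}

# Inline: a few placeholder bytes stand in for real audio samples.
inline_request = build_recognize_request(config, audio_bytes=b"\x00\x01\x02\x03")

# Cloud Storage: the bucket and object name are hypothetical.
gcs_request = build_recognize_request(config, gcs_uri="gs://example-bucket/recording.wav")
```

Inline content suits short clips, while a Cloud Storage URI avoids pushing large files through the request body.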
7. Automatic Punctuation (BETA)
Accurately punctuates transcriptions (e.g., commas, question marks, and periods) using machine learning.
9. Speaker Diarization (BETA)
Know who said what: you can get automatic predictions about which speaker in a conversation spoke each utterance.
10. Auto-Detect Language (BETA)
When you need to support multilingual scenarios, you can specify two to four language codes, and Cloud Speech-to-Text will identify the language actually spoken and provide the transcript.
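A small sketch of the two-to-four language rule is shown below. It assumes the v1p1beta1 REST-style fields `languageCode` (the primary language) and `alternativeLanguageCodes` (the remaining candidates); the helper name `build_multilingual_config` is invented for this example.

```python
def build_multilingual_config(language_codes):
    """Split candidate languages into a primary code and alternatives.

    Assumption: the first code is treated as the primary language and the
    rest go into alternativeLanguageCodes.
    """
    # De-duplicate while preserving the caller's order.
    codes = list(dict.fromkeys(language_codes))
    if not 2 <= len(codes) <= 4:
        raise ValueError("specify between two and four distinct language codes")
    return {
        "languageCode": codes[0],
        "alternativeLanguageCodes": codes[1:],
    }


config = build_multilingual_config(["ko-KR", "en-US", "ja-JP"])
print(config)
```

Validating the count before the request is built keeps an out-of-range list from ever reaching the API.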
11. Multichannel Recognition (BETA)
In recordings with multiple participants where each participant is recorded on a separate channel (e.g., a phone call with two channels or a video conference with four channels), Cloud Speech-to-Text recognizes each channel separately and then annotates the transcripts so that they follow the same order as the real conversation.
Pricing
FEATURE | 0-60 MINUTES | OVER 60 MINUTES, UP TO 1 MILLION MINUTES |
---|---|---|
Speech Recognition (all models except video) | Free | $0.006 USD / 15 seconds* |
Video Speech Recognition | $0.006 | $0.012 USD / 15 seconds* |
- Speech Recognition is free for the first 60 minutes; beyond that, each 15 seconds costs $0.006 USD (about 6.67 won). This pricing covers up to 1 million minutes per month, which is roughly 16,666.67 hours.
- Each request is metered in units of 15 seconds.
- The prices in the table above apply to personal systems; for enterprise pricing, consult the pricing guide and contact Google.
- Each request is rounded up to the nearest 15-second increment.
For example, if you make three separate requests that each contain 7 seconds of audio, you are billed for 45 seconds (3 x 15) of audio.
Likewise, two requests of 15 seconds and 14 seconds (29 seconds of audio in total) are billed as 30 seconds.
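The rounding rule above can be checked with a short calculation. This is only a sketch: the price is the non-video rate from the table, and subtracting the free 60 minutes from the rounded-up total (rather than from raw audio time) is an assumption of this example, not something the pricing notes confirm.

```python
import math

BILLING_INCREMENT_SECONDS = 15
FREE_SECONDS_PER_MONTH = 60 * 60   # first 60 minutes are free
PRICE_PER_INCREMENT_USD = 0.006    # non-video models, per 15 seconds


def billed_seconds(request_durations):
    """Round each request up to the next 15-second increment and sum."""
    return sum(
        math.ceil(duration / BILLING_INCREMENT_SECONDS) * BILLING_INCREMENT_SECONDS
        for duration in request_durations
    )


def monthly_cost_usd(request_durations):
    """Estimate the monthly charge (assumes the free tier applies to billed seconds)."""
    billable = max(0, billed_seconds(request_durations) - FREE_SECONDS_PER_MONTH)
    return billable / BILLING_INCREMENT_SECONDS * PRICE_PER_INCREMENT_USD


print(billed_seconds([7, 7, 7]))   # 45
print(billed_seconds([15, 14]))    # 30
```

Because rounding happens per request, many very short requests cost more than the same audio sent in fewer, longer requests.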