API reference

Arabic speech-to-text and text-to-speech behind a single API key. One key works for both.

Base URL

http://sa-gateway.tryhamsa.com

Authentication

Pass your API key on every request. Either header works:

x-api-key: YOUR_KEY
# or
Authorization: Bearer YOUR_KEY

Keys are issued and revoked from the console. Missing, invalid, or revoked keys return 401.
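The two header styles above can be wrapped in a small helper. This is a minimal sketch: `auth_headers` is a hypothetical name, not part of the API, and reading the key from a `HAMSA_KEY` environment variable is an assumption borrowed from the curl examples.

```python
import os


def auth_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Build one of the two accepted auth headers (hypothetical helper)."""
    if not api_key:
        # An empty key would be rejected by the gateway with 401 anyway.
        raise ValueError("API key is empty")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}


# Assumption: the key is stored in HAMSA_KEY; "YOUR_KEY" is only a placeholder.
key = os.environ.get("HAMSA_KEY", "YOUR_KEY")
print(auth_headers(key))
print(auth_headers(key, use_bearer=True))
```

Pass the returned dict as the `headers` argument of whichever HTTP client you use.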

Billing

Usage is metered in audio-seconds — the same unit for both services. STT bills the duration of audio you send; TTS bills the duration of audio generated. Track it in the console.
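Because STT bills the duration of the audio you send, you can estimate the cost of a clip before uploading it. The sketch below reads the duration from a WAV header with Python's standard `wave` module. Whether the gateway rounds durations or applies a minimum is not stated here, so treat the number as an estimate. `wav_seconds` and `make_silence` are illustrative names.

```python
import io
import wave


def wav_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV payload in seconds, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Generate a mono 16-bit silent WAV, only so the example is self-contained."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


clip = make_silence(3.5)
print(wav_seconds(clip))  # 3.5
```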

GET /v1/voices

Lists the 113 voices and dialect codes. Use a voice id as the TTS speaker.

curl http://sa-gateway.tryhamsa.com/v1/voices -H "x-api-key: $HAMSA_KEY"
# -> { "count": 113, "dialects": [...],
#      "data": [ {"id": "Layla", "dialect": "egy"}, ... ] }
dialects
code  dialect                  code  dialect
msa   Modern Standard Arabic   irq   Iraqi
egy   Egyptian                 jor   Jordanian
ksa   Saudi                    pls   Palestinian
uae   Emirati                  leb   Lebanese
kuw   Kuwaiti                  syr   Syrian
qat   Qatari                   bah   Bahraini
oma   Omani                    en    English
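To pick a TTS speaker for a given dialect, you can group the `data` array of the voices response by dialect code. The sketch below works on a hand-written sample shaped like the response above. Only `Layla` comes from this page; the other voice ids are placeholders.

```python
from collections import defaultdict


def voices_by_dialect(payload: dict) -> dict:
    """Map each dialect code to the list of voice ids that speak it."""
    grouped = defaultdict(list)
    for voice in payload.get("data", []):
        grouped[voice["dialect"]].append(voice["id"])
    return dict(grouped)


# Sample shaped like the /v1/voices response; "voice_b" and "voice_c" are made up.
sample = {
    "count": 3,
    "dialects": ["egy", "ksa"],
    "data": [
        {"id": "Layla", "dialect": "egy"},
        {"id": "voice_b", "dialect": "egy"},
        {"id": "voice_c", "dialect": "ksa"},
    ],
}

print(voices_by_dialect(sample)["egy"])
```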

POST /v1/stt

Transcribe audio. Send a base64-encoded WAV; get back text and the billed duration. Add the flags below for gender, diarization, or turn detection.

request body
field                   type    description
audio                   string  base64-encoded WAV (required)
lang                    string  ar, en, or omit to auto-detect
prompt                  string  optional decoding hint
gender_detection        bool    also return speaker gender
speaker_identification  bool    also return a speaker embedding (diarization)
eos_enabled             bool    also return end-of-speech / turn detection
curl
curl -X POST http://sa-gateway.tryhamsa.com/v1/stt \
  -H "x-api-key: $HAMSA_KEY" \
  -H "content-type: application/json" \
  -d '{"audio":"<base64-wav>","lang":"ar"}'
python
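import base64, os, requests
KEY = os.environ["HAMSA_KEY"]   # same variable the curl examples use
with open("clip.wav", "rb") as f:
    audio = base64.b64encode(f.read()).decode()
r = requests.post("http://sa-gateway.tryhamsa.com/v1/stt",
    headers={"x-api-key": KEY},
    json={"audio": audio, "lang": "ar"})
r.raise_for_status()   # surface 401 / 422 instead of printing an error body
print(r.json())   # {"text": "...", "duration": 3.5}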
response
{"text": "مرحبا بك في همسة", "duration": 3.5}

# with gender_detection / speaker_identification / eos_enabled:
{"text": "...", "duration": 3.5, "gender": "Female",
 "eos": {"prediction": 1, "probability": 0.82}, "speaker": "<embedding>"}

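The extra keys only appear when the matching flag was set, so client code should treat them as optional. A minimal parsing sketch follows. `SttResult` and `parse_stt` are illustrative names, and reading `prediction == 1` as "end of speech detected" is an assumption; this page does not define the value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SttResult:
    text: str
    duration: float                         # billed audio-seconds
    gender: Optional[str] = None            # only with gender_detection
    end_of_speech: Optional[bool] = None    # only with eos_enabled
    eos_probability: Optional[float] = None
    speaker: Optional[str] = None           # only with speaker_identification


def parse_stt(body: dict) -> SttResult:
    """Turn a /v1/stt JSON body into a typed result, tolerating missing keys."""
    eos = body.get("eos") or {}
    return SttResult(
        text=body["text"],
        duration=body["duration"],
        gender=body.get("gender"),
        # Assumption: prediction 1 means the speaker has finished the turn.
        end_of_speech=(eos["prediction"] == 1) if "prediction" in eos else None,
        eos_probability=eos.get("probability"),
        speaker=body.get("speaker"),
    )


plain = parse_stt({"text": "...", "duration": 3.5})
full = parse_stt({"text": "...", "duration": 3.5, "gender": "Female",
                  "eos": {"prediction": 1, "probability": 0.82},
                  "speaker": "<embedding>"})
print(plain.gender, full.gender, full.end_of_speech)
```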
POST /v1/tts

Synthesize speech. Returns a stream of audio bytes (wav, or 8 kHz µ-law when mulaw is true).

request body
field           type    description
text            string  text to synthesize (required); supports <pause_short> / <pause_long>
speaker         string  voice id, case-insensitive (see /v1/voices) (required)
language_id     string  voice language id (required)
dialect         string  dialect code (egy, ksa, uae, …); auto if omitted
expressiveness  float   0.0–2.0, default 1.0 (neutral → emotional)
speed           float   0.5–2.0, default 1.0
mulaw           bool    8 kHz µ-law output, default false (16 kHz pcm16)
curl
curl -X POST http://sa-gateway.tryhamsa.com/v1/tts \
  -H "x-api-key: $HAMSA_KEY" \
  -H "content-type: application/json" \
  -d '{"text":"مرحبا بالعالم","speaker":"layla","language_id":"ar"}' \
  --output out.wav
python (stream to file)
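import os, requests
KEY = os.environ["HAMSA_KEY"]   # same variable the curl examples use
with requests.post("http://sa-gateway.tryhamsa.com/v1/tts",
        headers={"x-api-key": KEY},
        json={"text":"مرحبا بالعالم","speaker":"layla","language_id":"ar"},
        stream=True) as r:
    r.raise_for_status()   # avoid writing an error body into out.wav
    with open("out.wav","wb") as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)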
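The documented ranges for `expressiveness` and `speed` can be checked on the client before a request is sent. The sketch below builds the JSON body and rejects out-of-range values locally. How the native endpoint itself treats out-of-range values is not stated here, so the local `ValueError` is a design choice of this example, and `tts_payload` is a hypothetical helper name.

```python
from typing import Optional


def tts_payload(text: str, speaker: str, language_id: str,
                dialect: Optional[str] = None,
                expressiveness: float = 1.0,
                speed: float = 1.0,
                mulaw: bool = False) -> dict:
    """Build a /v1/tts request body, validating the documented ranges."""
    if not text:
        raise ValueError("text is required")
    if not 0.0 <= expressiveness <= 2.0:
        raise ValueError("expressiveness must be within 0.0-2.0")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    payload = {
        "text": text,
        "speaker": speaker,
        "language_id": language_id,
        "expressiveness": expressiveness,
        "speed": speed,
        "mulaw": mulaw,
    }
    if dialect is not None:
        # Omitting the key leaves dialect selection to the service ("auto").
        payload["dialect"] = dialect
    return payload


body = tts_payload("مرحبا <pause_short> بالعالم", "layla", "ar", dialect="egy", speed=1.2)
print(body)
```

Send the result as the `json=` argument of the streaming request shown above.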

Batch transcription — async, with diarization

For full recordings (calls, meetings, media): submit a job, poll for the result. The completed job returns the full transcript, speaker diarization with word-level timestamps, and the detected language. Billed once, on completion, by audio duration.

POST /v1/transcriptions

request body
field              type    description
audio_url          string  public / pre-signed URL of the media file (send one of audio_url / audio_base64)
audio_base64       string  base64-encoded audio (send one of audio_url / audio_base64)
language           string  hint, e.g. ar; auto-detected if omitted
return_srt_format  bool    also return subtitles (SRT)
srt_options        object  max_lines_per_subtitle, max_chars_per_line, …
curl
curl -X POST http://sa-gateway.tryhamsa.com/v1/transcriptions \
  -H "x-api-key: $HAMSA_KEY" -H "content-type: application/json" \
  -d '{"audio_url":"https://.../call.mp3","language":"ar"}'
# -> 202 {"job_id":"28b3ff340f58...","status":"IN_QUEUE"}
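A job body needs one of the two audio sources, which is easy to get wrong when the source is chosen at runtime. The sketch below builds the body and enforces that rule locally. It assumes "one of these two" means exactly one; `transcription_job_body` is a hypothetical helper, the `example.com` URL is a placeholder, and the `srt_options` values are illustrative.

```python
from typing import Optional


def transcription_job_body(audio_url: Optional[str] = None,
                           audio_base64: Optional[str] = None,
                           language: Optional[str] = None,
                           return_srt_format: bool = False,
                           srt_options: Optional[dict] = None) -> dict:
    """Build the JSON body for POST /v1/transcriptions."""
    # Assumption: exactly one audio source may be given.
    if (audio_url is None) == (audio_base64 is None):
        raise ValueError("provide exactly one of audio_url or audio_base64")
    body = {"audio_url": audio_url} if audio_url is not None else {"audio_base64": audio_base64}
    if language is not None:
        body["language"] = language      # omit to let the service auto-detect
    if return_srt_format:
        body["return_srt_format"] = True
        if srt_options:
            body["srt_options"] = srt_options
    return body


job_body = transcription_job_body(
    audio_url="https://example.com/call.mp3",   # placeholder URL
    language="ar",
    return_srt_format=True,
    srt_options={"max_lines_per_subtitle": 2, "max_chars_per_line": 42},
)
print(job_body)
```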

GET /v1/transcriptions/{job_id}

Poll until status is COMPLETED (or FAILED / CANCELLED / TIMED_OUT). Cancel with DELETE on the same path.

completed response
{
  "job_id": "28b3ff340f58...", "status": "COMPLETED", "audio_seconds": 0.52,
  "result": {
    "detected_language": "ar",
    "transcription": "مرحبا",
    "diarization": [
      { "speaker": "speaker_0", "start": 0.07, "end": 0.49, "text": "مرحبا",
        "words": [ {"word": "مرحبا", "start": 0.07, "end": 0.49, "score": 0.95, "speaker": "speaker_0"} ] }
    ],
    "usage": { "audio_length": 0.0087 }
  }
}
python (submit + poll)
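import os, time, requests
H = {"x-api-key": os.environ["HAMSA_KEY"]}   # same variable the curl examples use
MEDIA_URL = "https://.../call.mp3"           # public / pre-signed URL of your media file
job = requests.post("http://sa-gateway.tryhamsa.com/v1/transcriptions",
    headers=H, json={"audio_url": MEDIA_URL, "language": "ar"}).json()
while True:
    r = requests.get(f"http://sa-gateway.tryhamsa.com/v1/transcriptions/{job['job_id']}", headers=H).json()
    if r["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"): break
    time.sleep(3)
if r["status"] != "COMPLETED":               # only a completed job is read for its result here
    raise RuntimeError(f"job ended with status {r['status']}")
for seg in r["result"]["diarization"]:
    print(seg["speaker"], seg["start"], seg["end"], seg["text"])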
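Once a job has completed, the diarization segments can be aggregated, for example into talk time per speaker. The sketch below runs on a sample copied from the completed response above. `talk_time` is an illustrative helper, not something the API returns.

```python
from collections import defaultdict


def talk_time(result: dict) -> dict:
    """Sum segment durations (end - start, in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in result.get("diarization", []):
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    # Round to hide floating-point noise from the subtraction.
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}


# The "result" object from the completed response shown above.
sample_result = {
    "detected_language": "ar",
    "transcription": "مرحبا",
    "diarization": [
        {"speaker": "speaker_0", "start": 0.07, "end": 0.49, "text": "مرحبا",
         "words": [{"word": "مرحبا", "start": 0.07, "end": 0.49,
                    "score": 0.95, "speaker": "speaker_0"}]},
    ],
}

print(talk_time(sample_result))
```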

Real-time streaming (WebSocket)

For live transcription, open a WebSocket to /v1/stt/stream — use ws:// (wss:// over TLS). Authenticate with ?api_key=YOUR_KEY or the x-api-key header.

connect  ws://sa-gateway.tryhamsa.com/v1/stt/stream?api_key=YOUR_KEY      (wss:// over TLS)

-> {"type":"handshake","options":{"gender_detection":true,"eos_enabled":true}}
<- {"type":"handshake_ack","status":"authenticated"}
-> {"type":"media","payload":"<base64 audio>"}           # or raw binary frames
<- {"type":"transcription","data":{"transcription":"...","gender":"Female"},"duration_ms":500}

Handshake options: audio_type (PCM/MULAW), sample_rate, vad_threshold, min_silence_duration_ms, eos_enabled, eos_threshold, gender_detection, speaker_identification. Billed by streamed speech-seconds.
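The message exchange above is plain JSON, so the frames can be built and parsed with the standard library alone. The sketch below covers only that part and leaves the socket to whichever WebSocket client you prefer. The helper names are illustrative, and the handling of a non-"authenticated" acknowledgement is an assumption, since only the success case is shown on this page.

```python
import base64
import json
from typing import Optional


def handshake_message(**options) -> str:
    """First frame to send after connecting; options as listed above."""
    return json.dumps({"type": "handshake", "options": options})


def media_message(chunk: bytes) -> str:
    """Wrap one audio chunk as a base64 'media' frame."""
    return json.dumps({"type": "media",
                       "payload": base64.b64encode(chunk).decode("ascii")})


def handle_server_message(raw: str) -> Optional[str]:
    """Return the transcript text for 'transcription' frames, else None."""
    msg = json.loads(raw)
    if msg.get("type") == "handshake_ack":
        # Assumption: any status other than "authenticated" is a failure.
        if msg.get("status") != "authenticated":
            raise PermissionError(f"handshake rejected: {msg}")
        return None
    if msg.get("type") == "transcription":
        return msg["data"]["transcription"]
    return None


print(handshake_message(gender_detection=True, eos_enabled=True))
print(handle_server_message(
    '{"type":"transcription","data":{"transcription":"...","gender":"Female"},"duration_ms":500}'))
```

Send the handshake string first, wait for the acknowledgement, then send one `media_message` per audio chunk and pass every received frame to `handle_server_message`.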

OpenAI-compatible APIs

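Already on the OpenAI SDK? Point its base_url at http://sa-gateway.tryhamsa.com/v1 and pass your Hamsa key as the API key; the audio transcription and speech calls then work as they do against OpenAI. Usage is metered per key in audio-seconds, the same as on the native endpoints.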

field mapping
OpenAI                           → Hamsa model
file (multipart)                 base64 → /transcribe audio
language                         lang
input                            /tts/stream text
voice                            speaker (your Hamsa speaker id)
speed                            speed (clamped 0.5–2.0)
dialect, expressiveness (extra)  passed through to the model

response_format for speech: wav (default), pcm, or mulaw; any other value returns 400. For transcription: json (default), text, verbose_json, srt, or vtt. Use model="hamsa". Limits: TTS input ≤ 2000 chars, upload ≤ 25 MB.

python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(api_key="YOUR_HAMSA_KEY", base_url="http://sa-gateway.tryhamsa.com/v1")

# transcription
t = client.audio.transcriptions.create(
    model="whisper-1", language="ar", file=open("clip.wav", "rb"))
print(t.text)

# speech (voice = your Hamsa speaker id)
with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice="layla", input="مرحبا بالعالم") as r:
    r.stream_to_file("out.wav")
curl — POST /v1/audio/transcriptions
curl http://sa-gateway.tryhamsa.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $HAMSA_KEY" \
  -F file=@clip.wav -F model=whisper-1 -F language=ar
# -> {"text": "..."}   (or response_format=text | verbose_json)
curl — POST /v1/audio/speech
curl http://sa-gateway.tryhamsa.com/v1/audio/speech \
  -H "Authorization: Bearer $HAMSA_KEY" \
  -H "content-type: application/json" \
  -d '{"model":"tts-1","voice":"layla","input":"مرحبا بالعالم"}' \
  --output out.wav
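Text longer than the 2000-character TTS input limit has to be split before it is sent to /v1/audio/speech. The sketch below splits on whitespace where it can and falls back to a hard cut. It assumes the limit counts Python string characters (code points), which this page does not specify, and `split_for_tts` is a hypothetical helper.

```python
def split_for_tts(text: str, limit: int = 2000) -> list:
    """Split text into pieces of at most `limit` characters, preferring spaces."""
    pieces = []
    rest = text.strip()
    while len(rest) > limit:
        # Last space at or before the limit, so the piece never exceeds it.
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: hard cut mid-word
        pieces.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        pieces.append(rest)
    return pieces


long_text = "مرحبا بالعالم " * 400
chunks = split_for_tts(long_text)
print(len(chunks), [len(c) for c in chunks])
```

Each piece can then be passed as `input` in its own speech request, and the resulting audio files concatenated afterwards.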

Errors

status     meaning
401        missing, invalid, or revoked API key
422        invalid request body (missing required field)
4xx / 5xx  error from the model; body is passed through
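Client code can map these statuses onto a single exception type so callers handle them in one place. A minimal sketch: `GatewayError`, `classify`, and `raise_for_gateway` are illustrative names, and the error body is kept as an opaque string because its shape is not documented here.

```python
class GatewayError(Exception):
    """Raised for any non-success status; keeps the raw body for inspection."""

    def __init__(self, status: int, reason: str, body: str):
        super().__init__(f"{status} {reason}: {body}")
        self.status = status
        self.reason = reason
        self.body = body


def classify(status: int) -> str:
    """Map an HTTP status to the categories in the table above."""
    if status == 401:
        return "auth"             # missing, invalid, or revoked key
    if status == 422:
        return "invalid_request"  # e.g. a required field is missing
    if 400 <= status < 600:
        return "model_error"      # body is passed through from the model
    return "ok"


def raise_for_gateway(status: int, body: str) -> None:
    reason = classify(status)
    if reason != "ok":
        raise GatewayError(status, reason, body)


try:
    raise_for_gateway(422, "<error body>")
except GatewayError as err:
    print(err.status, err.reason)
```

With `requests`, call it as `raise_for_gateway(r.status_code, r.text)` right after the request returns.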

Prefer to click around? The interactive docs let you try every endpoint in the browser.