Hamsa API Reference

API reference

Arabic speech-to-text (STT) and text-to-speech (TTS) behind a single API: one key works for both.

Base URL

http://sa-gateway.tryhamsa.com

Authentication

Pass your API key on every request, in either of these headers:

x-api-key: YOUR_KEY
# or
Authorization: Bearer YOUR_KEY

Keys are issued and revoked from the console. Missing, invalid, or revoked keys return 401.

Billing

Usage is metered in audio-seconds, the same unit for both services: STT bills the duration of the audio you send, and TTS bills the duration of the audio it generates. Track usage in the console.
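Because STT bills the duration of the audio you send, you can estimate the metered amount for a clip before uploading it. A minimal sketch using Python's standard `wave` module, assuming the billed duration equals the WAV header's frame count divided by its sample rate (any rounding the service applies is not documented here); both helper names are hypothetical:

```python
import io
import wave

def wav_audio_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV file in seconds: frame count / sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()

def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()

# Under the assumption above, a 3.5 s clip is metered as 3.5 audio-seconds.
print(wav_audio_seconds(make_silent_wav(3.5)))  # 3.5
```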

GET /v1/voices

Lists the 113 voices and dialect codes. Use a voice id as the TTS speaker.

curl http://sa-gateway.tryhamsa.com/v1/voices -H "x-api-key: $HAMSA_KEY"
# -> { "count": 113, "dialects": [...],
#      "data": [ {"id": "Layla", "dialect": "egy"}, ... ] }
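To pick a speaker for a given dialect, you can filter the `data` array on the client. A minimal sketch using only the standard library; it assumes each entry carries the `id` and `dialect` keys shown in the sample response, and `fetch_voices` / `voices_for_dialect` are hypothetical helper names, not part of the API:

```python
import json
import urllib.request

BASE_URL = "http://sa-gateway.tryhamsa.com"

def fetch_voices(api_key: str) -> dict:
    """GET /v1/voices and return the decoded JSON body."""
    req = urllib.request.Request(f"{BASE_URL}/v1/voices", headers={"x-api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def voices_for_dialect(voices: dict, dialect: str) -> list:
    """Return the ids of voices whose `dialect` matches the given code."""
    return [v["id"] for v in voices["data"] if v.get("dialect") == dialect]

# Usage (needs network access and a key):
#   voices = fetch_voices(os.environ["HAMSA_KEY"])
#   print(voices_for_dialect(voices, "egy"))
```

Any id returned this way can be passed as `speaker` to /v1/tts.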

dialects

code	dialect
`msa`	Standard Arabic
`egy`	Egyptian
`ksa`	Saudi
`uae`	Emirati
`kuw`	Kuwaiti
`qat`	Qatari
`oma`	Omani
`irq`	Iraqi
`jor`	Jordanian
`pls`	Palestinian
`leb`	Lebanese
`syr`	Syrian
`bah`	Bahraini
`en`	English

POST /v1/stt

Transcribe audio. Send a base64-encoded WAV; get back text and the billed duration. Add the flags below for gender, diarization, or turn detection.

request body

field	type	description
`audio`	string	required; base64-encoded WAV
`lang`	string	`ar`, `en`, or omit to auto-detect
`prompt`	string	optional decoding hint
`gender_detection`	bool	also return speaker gender
`speaker_identification`	bool	also return a speaker embedding (diarization)
`eos_enabled`	bool	also return end-of-speech / turn detection
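When building the body programmatically, it is cleaner to omit optional fields than to send nulls. A minimal sketch; `build_stt_body` is a hypothetical helper, and leaving unset flags out (rather than sending `false`) is a choice, not a documented requirement:

```python
import base64
from typing import Optional

ALLOWED_LANGS = {"ar", "en"}  # omit `lang` entirely to auto-detect

def build_stt_body(wav_bytes: bytes, lang: Optional[str] = None,
                   prompt: Optional[str] = None, gender_detection: bool = False,
                   speaker_identification: bool = False,
                   eos_enabled: bool = False) -> dict:
    """Assemble a /v1/stt JSON body; optional fields are left out when unset."""
    if lang is not None and lang not in ALLOWED_LANGS:
        raise ValueError(f"lang must be 'ar', 'en', or None, got {lang!r}")
    body = {"audio": base64.b64encode(wav_bytes).decode("ascii")}
    if lang is not None:
        body["lang"] = lang
    if prompt:
        body["prompt"] = prompt
    # Only send the flags you enable; each one adds a field to the response.
    for name, enabled in (("gender_detection", gender_detection),
                          ("speaker_identification", speaker_identification),
                          ("eos_enabled", eos_enabled)):
        if enabled:
            body[name] = True
    return body
```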

curl

curl -X POST http://sa-gateway.tryhamsa.com/v1/stt \
  -H "x-api-key: $HAMSA_KEY" \
  -H "content-type: application/json" \
  -d '{"audio":"<base64-wav>","lang":"ar"}'

python

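import base64, os, requests
KEY = os.environ["HAMSA_KEY"]  # same key as $HAMSA_KEY in the curl example
with open("clip.wav", "rb") as f:
    audio = base64.b64encode(f.read()).decode()
r = requests.post("http://sa-gateway.tryhamsa.com/v1/stt",
    headers={"x-api-key": KEY},
    json={"audio": audio, "lang": "ar"})
r.raise_for_status()  # 401 / 422 / model errors raise here
print(r.json())   # {"text": "...", "duration": 3.5}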

response

{"text": "مرحبا بك في همسة", "duration": 3.5}

# with gender_detection / speaker_identification / eos_enabled:
{"text": "...", "duration": 3.5, "gender": "Female",
 "eos": {"prediction": 1, "probability": 0.82}, "speaker": "<embedding>"}
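The extra fields only appear when their flag is set, so client code should treat them as optional. A minimal parsing sketch; the `SttResult` type is hypothetical, reading `eos.prediction == 1` as "the speaker's turn ended" is an assumption based on the sample, and the format of the speaker embedding is not specified here:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SttResult:
    text: str
    duration: float                        # billed audio-seconds
    gender: Optional[str] = None           # only with gender_detection
    end_of_speech: Optional[bool] = None   # only with eos_enabled
    eos_probability: Optional[float] = None
    speaker: Optional[object] = None       # embedding, only with speaker_identification

def parse_stt_response(body: dict) -> SttResult:
    eos = body.get("eos") or {}
    prediction = eos.get("prediction")
    return SttResult(
        text=body["text"],
        duration=body["duration"],
        gender=body.get("gender"),
        # Assumption: prediction 1 means "turn ended", 0 means "still speaking".
        end_of_speech=None if prediction is None else prediction == 1,
        eos_probability=eos.get("probability"),
        speaker=body.get("speaker"),
    )
```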

POST /v1/tts

Synthesize speech. Returns a stream of audio bytes: 16 kHz PCM16 WAV by default, or 8 kHz µ-law when mulaw is true.

request body

field	type	description
`text`	string	required; text to synthesize, supports `<pause_short>` / `<pause_long>` tags
`speaker`	string	required; voice id, case-insensitive (see /v1/voices)
`language_id`	string	required; voice language id, e.g. `ar`
`dialect`	string	dialect code (egy, ksa, uae, …); auto if omitted
`expressiveness`	float	0.0–2.0, default 1.0 (neutral → emotional)
`speed`	float	0.5–2.0, default 1.0
`mulaw`	bool	8 kHz µ-law output, default false (16 kHz pcm16)
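Checking the documented ranges on the client turns a bad value into an immediate local error. A sketch; `build_tts_body` is a hypothetical helper, and whether /v1/tts rejects or clamps out-of-range values is not specified here. Pause tags are plain text inside `text`:

```python
from typing import Optional

def build_tts_body(text: str, speaker: str, language_id: str,
                   dialect: Optional[str] = None, expressiveness: float = 1.0,
                   speed: float = 1.0, mulaw: bool = False) -> dict:
    """Validate the documented ranges locally before calling /v1/tts."""
    if not text or not speaker or not language_id:
        raise ValueError("text, speaker and language_id are required")
    if not 0.0 <= expressiveness <= 2.0:
        raise ValueError("expressiveness must be within 0.0 to 2.0")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5 to 2.0")
    body = {"text": text, "speaker": speaker, "language_id": language_id,
            "expressiveness": expressiveness, "speed": speed, "mulaw": mulaw}
    if dialect is not None:  # omit to let the service pick the dialect
        body["dialect"] = dialect
    return body
```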

curl

curl -X POST http://sa-gateway.tryhamsa.com/v1/tts \
  -H "x-api-key: $HAMSA_KEY" \
  -H "content-type: application/json" \
  -d '{"text":"مرحبا بالعالم","speaker":"layla","language_id":"ar"}' \
  --output out.wav

python (stream to file)

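import os, requests
KEY = os.environ["HAMSA_KEY"]
with requests.post("http://sa-gateway.tryhamsa.com/v1/tts",
        headers={"x-api-key": KEY},
        json={"text": "مرحبا بالعالم", "speaker": "layla", "language_id": "ar"},
        stream=True) as r:
    r.raise_for_status()  # avoid writing an error body into out.wav
    with open("out.wav", "wb") as f:
        for chunk in r.iter_content(8192):  # write 8 KiB at a time as audio arrives
            f.write(chunk)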
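With mulaw set to true the stream is 8 kHz µ-law, which many players only accept inside a container. A sketch that decodes G.711 µ-law bytes to 16-bit PCM and wraps them in a WAV with the standard `wave` module; it assumes the response is headerless mono µ-law (a `RIFF` prefix would mean a WAV container is already present), and the helper names are hypothetical:

```python
import io
import struct
import wave

def ulaw_to_pcm16(data: bytes) -> list:
    """Decode G.711 µ-law bytes to signed 16-bit samples."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF                  # µ-law bytes are stored bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias 132
        samples.append(-magnitude if sign else magnitude)
    return samples

def ulaw_to_wav(data: bytes, rate: int = 8000) -> bytes:
    """Wrap decoded µ-law audio in a mono 16-bit PCM WAV container."""
    samples = ulaw_to_pcm16(data)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()
```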

Batch transcription — async, with diarization

For full recordings (calls, meetings, media), submit a job and poll for the result. The completed job returns the full transcript, speaker diarization with word-level timestamps, and the detected language. Billed once, on completion, by audio duration.

POST /v1/transcriptions

request body

field	type	description
`audio_url`	string	public or pre-signed URL of the media file; required: one of `audio_url` / `audio_base64`
`audio_base64`	string	base64-encoded audio; required: one of `audio_url` / `audio_base64`
`language`	string	hint, e.g. `ar`; auto-detected if omitted
`return_srt_format`	bool	also return subtitles (SRT)
`srt_options`	object	`max_lines_per_subtitle`, `max_chars_per_line`, …
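Since the two audio fields are alternatives, a client can check the "one of" rule before submitting. A sketch; `build_transcription_body` is hypothetical, rejecting a body with both sources is a conservative reading of the table, and `srt_options` is left out because its full key list is not given above:

```python
import base64
from typing import Optional

def build_transcription_body(audio_url: Optional[str] = None,
                             audio_bytes: Optional[bytes] = None,
                             language: Optional[str] = None,
                             srt: bool = False) -> dict:
    """Body for POST /v1/transcriptions with exactly one audio source."""
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of audio_url or audio_bytes")
    if audio_url is not None:
        body = {"audio_url": audio_url}
    else:
        body = {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    if language:
        body["language"] = language  # a hint only; omit to auto-detect
    if srt:
        body["return_srt_format"] = True
    return body
```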

curl

curl -X POST http://sa-gateway.tryhamsa.com/v1/transcriptions \
  -H "x-api-key: $HAMSA_KEY" -H "content-type: application/json" \
  -d '{"audio_url":"https://.../call.mp3","language":"ar"}'
# -> 202 {"job_id":"28b3ff340f58...","status":"IN_QUEUE"}

GET /v1/transcriptions/{job_id}

Poll until status is COMPLETED (or FAILED / CANCELLED / TIMED_OUT). Cancel with DELETE on the same path.

completed response

{
  "job_id": "28b3ff340f58...", "status": "COMPLETED", "audio_seconds": 0.52,
  "result": {
    "detected_language": "ar",
    "transcription": "مرحبا",
    "diarization": [
      { "speaker": "speaker_0", "start": 0.07, "end": 0.49, "text": "مرحبا",
        "words": [ {"word": "مرحبا", "start": 0.07, "end": 0.49, "score": 0.95, "speaker": "speaker_0"} ] }
    ],
    "usage": { "audio_length": 0.0087 }
  }
}
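Segment `start` / `end` values appear to be in seconds (the last segment ends at 0.49 in a 0.52 s job), so per-speaker talk time is a simple sum. A sketch over a `result` trimmed from the sample above; `talk_time_by_speaker` is a hypothetical helper. Note also that in this sample `usage.audio_length` (0.0087) equals `audio_seconds` (0.52) divided by 60, which suggests it is expressed in minutes; that is an inference from one example, not documented behavior.

```python
from collections import defaultdict

def talk_time_by_speaker(result: dict) -> dict:
    """Sum segment durations (end - start) for each diarized speaker."""
    totals = defaultdict(float)
    for seg in result["diarization"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    # Round to milliseconds to hide floating-point noise such as 0.42000000000000004.
    return {speaker: round(t, 3) for speaker, t in totals.items()}

result = {"diarization": [
    {"speaker": "speaker_0", "start": 0.07, "end": 0.49, "text": "مرحبا"},
]}
print(talk_time_by_speaker(result))  # {'speaker_0': 0.42}
```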

python (submit + poll)

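import os, time, requests
H = {"x-api-key": os.environ["HAMSA_KEY"]}
MEDIA_URL = "https://.../call.mp3"  # replace with your public or pre-signed media URL
resp = requests.post("http://sa-gateway.tryhamsa.com/v1/transcriptions",
    headers=H, json={"audio_url": MEDIA_URL, "language": "ar"})
resp.raise_for_status()  # expect 202 with a job_id
job = resp.json()
while True:
    r = requests.get(f"http://sa-gateway.tryhamsa.com/v1/transcriptions/{job['job_id']}", headers=H).json()
    if r["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(3)  # still queued or running
if r["status"] != "COMPLETED":  # only completed jobs carry a result
    raise RuntimeError(f"job {job['job_id']} ended with status {r['status']}")
for seg in r["result"]["diarization"]:
    print(seg["speaker"], seg["start"], seg["end"], seg["text"])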
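The loop above polls indefinitely at a fixed 3 s interval. A variant with a deadline and exponential backoff, sketched with an injectable `fetch` callable so it can be tested without the network; `wait_for_job` is a hypothetical helper and the timeout and delay values are illustrative, not service recommendations:

```python
import time
from typing import Callable

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def wait_for_job(fetch: Callable[[], dict], timeout: float = 600.0,
                 first_delay: float = 1.0, max_delay: float = 15.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the job reaches a terminal status or the timeout expires.

    The delay doubles after each poll (1 s, 2 s, 4 s, ...) up to max_delay.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        job = fetch()
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Wired to the GET endpoint, `fetch` could be `lambda: requests.get(f"http://sa-gateway.tryhamsa.com/v1/transcriptions/{job['job_id']}", headers=H).json()`. Check the returned status before reading `result`.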

Real-time streaming (WebSocket)

For live transcription, open a WebSocket to /v1/stt/stream using ws:// (or wss:// over TLS). Authenticate with the ?api_key=YOUR_KEY query parameter or the x-api-key header.

connect  ws://sa-gateway.tryhamsa.com/v1/stt/stream?api_key=YOUR_KEY      (wss:// over TLS)

-> {"type":"handshake","options":{"gender_detection":true,"eos_enabled":true}}
<- {"type":"handshake_ack","status":"authenticated"}
-> {"type":"media","payload":"<base64 audio>"}           # or raw binary frames
<- {"type":"transcription","data":{"transcription":"...","gender":"Female"},"duration_ms":500}

Handshake options: audio_type (PCM/MULAW), sample_rate, vad_threshold, min_silence_duration_ms, eos_enabled, eos_threshold, gender_detection, speaker_identification. Billed by streamed speech-seconds.
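The handshake and media frames are plain JSON, so they can be produced independently of whichever WebSocket client you use. A sketch; the helper names are hypothetical, it assumes 16-bit mono PCM input (`audio_type` "PCM") and an integer `sample_rate`, and the 500 ms chunk size is an illustrative choice matching the `duration_ms` in the sample exchange:

```python
import base64
import json
from typing import Iterator

def handshake_message(sample_rate: int = 16000, audio_type: str = "PCM", **options) -> str:
    """First frame after connecting; extra keyword args become handshake options."""
    opts = {"audio_type": audio_type, "sample_rate": sample_rate, **options}
    return json.dumps({"type": "handshake", "options": opts})

def media_messages(pcm: bytes, sample_rate: int = 16000,
                   chunk_ms: int = 500) -> Iterator[str]:
    """Split 16-bit mono PCM into chunk_ms slices, each sent as a base64 `media` frame."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 2 bytes per sample
    for i in range(0, len(pcm), bytes_per_chunk):
        payload = base64.b64encode(pcm[i:i + bytes_per_chunk]).decode("ascii")
        yield json.dumps({"type": "media", "payload": payload})
```

Send `handshake_message(eos_enabled=True)` (or other options) first, wait for `handshake_ack`, then send each media frame and read `transcription` messages as they arrive.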

OpenAI-compatible APIs

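Already on the OpenAI SDK? Point its base_url at http://sa-gateway.tryhamsa.com/v1 and pass your Hamsa key as the API key; the audio transcription and speech calls then work without other code changes. Usage is metered per key in audio-seconds, the same way as the native endpoints.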

field mapping

OpenAI field	→ Hamsa model field
`file` (multipart)	base64 → `/transcribe` `audio`
`language`	`lang`
`input`	`/tts/stream` `text`
`voice`	`speaker` (your Hamsa speaker id)
`speed`	`speed` (clamped 0.5–2.0)
`dialect`, `expressiveness` (extra)	passed through to the model

response_format: speech accepts wav (default), pcm, or mulaw (any other value returns 400); transcription accepts json (default), text, verbose_json, srt, or vtt. Use model="hamsa". Limits: TTS input ≤ 2000 characters, upload ≤ 25 MB.
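Because TTS `input` is capped at 2000 characters, longer text has to be synthesized in pieces. A sketch of one way to split it, breaking after sentence-ending punctuation (including the Arabic question mark ؟) and hard-cutting only a single over-long sentence; `split_tts_input` is a hypothetical helper, and each piece would be sent as its own `input`:

```python
import re

MAX_TTS_CHARS = 2000  # documented limit for `input`

def split_tts_input(text: str, limit: int = MAX_TTS_CHARS) -> list:
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    # Split on whitespace that follows ., !, ? or ؟, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?؟])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        while len(s) > limit:  # one sentence longer than the limit: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(s[:limit])
            s = s[limit:]
        candidate = f"{current} {s}" if current else s
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = s
    if current:
        chunks.append(current)
    return chunks
```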

python (OpenAI SDK)

import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["HAMSA_KEY"], base_url="http://sa-gateway.tryhamsa.com/v1")

# transcription (the with-block closes the upload file afterwards)
with open("clip.wav", "rb") as f:
    t = client.audio.transcriptions.create(model="whisper-1", language="ar", file=f)
print(t.text)

# speech (voice = your Hamsa speaker id)
with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice="layla", input="مرحبا بالعالم") as r:
    r.stream_to_file("out.wav")

curl — POST /v1/audio/transcriptions

curl http://sa-gateway.tryhamsa.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $HAMSA_KEY" \
  -F file=@clip.wav -F model=whisper-1 -F language=ar
# -> {"text": "..."}   (or response_format=text | verbose_json)

curl — POST /v1/audio/speech

curl http://sa-gateway.tryhamsa.com/v1/audio/speech \
  -H "Authorization: Bearer $HAMSA_KEY" \
  -H "content-type: application/json" \
  -d '{"model":"tts-1","voice":"layla","input":"مرحبا بالعالم"}' \
  --output out.wav

Errors

status	meaning
`401`	missing, invalid, or revoked API key
`422`	invalid request body (missing required field)
`4xx / 5xx`	any other error, returned by the model; its response body is passed through unchanged
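Since 401 and 422 point at the key or the request body, retrying them will not help; a client can retry only server-side (5xx) failures. A sketch with an injectable `send` callable returning `(status, body)`; the helper names and the policy (5xx only, three attempts, doubling delay) are assumptions, not something the service specifies. Retry only calls that are safe to repeat: re-sending POST /v1/transcriptions could create a second job.

```python
import time
from typing import Callable, Tuple

def should_retry(status: int) -> bool:
    """401 (bad key) and 422 (bad body) will fail again; 5xx might not."""
    return status >= 500

def call_with_retry(send: Callable[[], Tuple[int, bytes]], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Run send() -> (status, body); retry 5xx with doubling delays, raise otherwise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status < 400:
            return body
        if not should_retry(status) or attempt == attempts - 1:
            # Model errors pass their body through, so include a prefix of it.
            raise RuntimeError(f"HTTP {status}: {body[:200]!r}")
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```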

Prefer to click around? The interactive docs let you try every endpoint in the browser.