Arabic speech-to-text (STT) and text-to-speech (TTS) behind a single API key.
Base URL: http://sa-gateway.tryhamsa.com
Pass your API key on every request. Either header works:
x-api-key: YOUR_KEY
# or
Authorization: Bearer YOUR_KEY
Keys are issued and revoked from the console. Missing, invalid, or revoked keys return 401.
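A minimal sketch of building either header form in Python. The helper name `auth_headers` is illustrative, and reading the key from a `HAMSA_KEY` environment variable is an assumption borrowed from the curl examples; neither is part of the API.

```python
import os


def auth_headers(key: str, bearer: bool = False) -> dict:
    """Return one of the two accepted auth headers (hypothetical helper)."""
    if not key:
        # An empty key would be answered with 401, so fail early instead.
        raise ValueError("API key is empty")
    if bearer:
        return {"Authorization": f"Bearer {key}"}
    return {"x-api-key": key}


# Read the key from the environment rather than hard-coding it.
KEY = os.environ.get("HAMSA_KEY", "YOUR_KEY")
print(auth_headers(KEY))
print(auth_headers(KEY, bearer=True))
```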
Usage is metered in audio-seconds — the same unit for both services. STT bills the duration of audio you send; TTS bills the duration of audio generated. Track it in the console.
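Since STT bills the duration of the audio you send, it can be useful to estimate that duration locally before uploading. The sketch below uses only the standard library; `wav_seconds` and `make_silence` are illustrative names, and the `duration` reported by the API remains the authoritative figure.

```python
import io
import wave


def wav_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV payload in seconds: frame count / sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, only for the demo below."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(wav_seconds(make_silence(3.5)))  # 3.5
```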
GET /v1/voices lists the 113 voices and the dialect codes. Use a voice id as the TTS speaker.
curl http://sa-gateway.tryhamsa.com/v1/voices -H "x-api-key: $HAMSA_KEY"
# -> { "count": 113, "dialects": [...],
# "data": [ {"id": "Layla", "dialect": "egy"}, ... ] }
| code | dialect | code | dialect |
|---|---|---|---|
| msa | Std Arabic | irq | Iraqi |
| egy | Egyptian | jor | Jordanian |
| ksa | Saudi | pls | Palestinian |
| uae | Emirati | leb | Lebanese |
| kuw | Kuwaiti | syr | Syrian |
| qat | Qatari | bah | Bahraini |
| oma | Omani | en | English |
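To pick a speaker for a given dialect, the listing can be filtered on the client. This is a sketch against the response shape shown above; `voices_for_dialect` is a hypothetical helper and "ExampleVoice" is a made-up id used only to fill the sample.

```python
def voices_for_dialect(listing: dict, dialect: str) -> list:
    """Return the voice ids whose dialect code matches (case-insensitive)."""
    wanted = dialect.lower()
    return [
        voice["id"]
        for voice in listing.get("data", [])
        if voice.get("dialect", "").lower() == wanted
    ]


# Sample shaped like the /v1/voices response; "ExampleVoice" is invented.
listing = {
    "count": 2,
    "dialects": ["egy", "ksa"],
    "data": [
        {"id": "Layla", "dialect": "egy"},
        {"id": "ExampleVoice", "dialect": "ksa"},
    ],
}
print(voices_for_dialect(listing, "egy"))  # ['Layla']
```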
POST /v1/stt transcribes audio. Send a base64-encoded WAV and get back the text and the billed duration. Add the flags below for gender, diarization, or turn detection.
| field | type | description |
|---|---|---|
| audio | string | base64-encoded WAV (required) |
| lang | string | ar, en, or omit to auto-detect |
| prompt | string | optional decoding hint |
| gender_detection | bool | also return speaker gender |
| speaker_identification | bool | also return a speaker embedding (diarization) |
| eos_enabled | bool | also return end-of-speech / turn detection |
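The fields above could be assembled into a request body along these lines. This is a sketch, not the official client: `build_stt_payload` is a hypothetical helper, and rejecting unknown flags on the client side is a design choice of the example rather than documented server behaviour.

```python
import base64

OPTIONAL_FLAGS = ("gender_detection", "speaker_identification", "eos_enabled")


def build_stt_payload(wav_bytes: bytes, lang=None, prompt=None, **flags) -> dict:
    """Build the JSON body for /v1/stt from raw WAV bytes."""
    unknown = sorted(set(flags) - set(OPTIONAL_FLAGS))
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    payload = {"audio": base64.b64encode(wav_bytes).decode("ascii")}
    if lang is not None:  # leave lang out to let the service auto-detect
        payload["lang"] = lang
    if prompt is not None:
        payload["prompt"] = prompt
    payload.update({name: bool(value) for name, value in flags.items()})
    return payload


# Placeholder bytes stand in for a real WAV file here.
body = build_stt_payload(b"RIFF....", lang="ar", gender_detection=True)
print(sorted(body))  # ['audio', 'gender_detection', 'lang']
```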
curl -X POST http://sa-gateway.tryhamsa.com/v1/stt \
-H "x-api-key: $HAMSA_KEY" \
-H "content-type: application/json" \
-d '{"audio":"<base64-wav>","lang":"ar"}'
import base64, os, requests

KEY = os.environ["HAMSA_KEY"]  # same variable the curl examples use
with open("clip.wav", "rb") as f:
    audio = base64.b64encode(f.read()).decode()
r = requests.post("http://sa-gateway.tryhamsa.com/v1/stt",
                  headers={"x-api-key": KEY},
                  json={"audio": audio, "lang": "ar"})
print(r.json())  # {"text": "...", "duration": 3.5}
{"text": "مرحبا بك في همسة", "duration": 3.5}
# with gender_detection / speaker_identification / eos_enabled:
{"text": "...", "duration": 3.5, "gender": "Female",
"eos": {"prediction": 1, "probability": 0.82}, "speaker": "<embedding>"}
POST /v1/tts synthesizes speech and returns a stream of audio bytes: WAV by default, or 8 kHz µ-law when mulaw is true.
| field | type | description |
|---|---|---|
| text | string | text to synthesize (required); supports <pause_short> / <pause_long> |
| speaker | string | voice id, case-insensitive (see /v1/voices) (required) |
| language_id | string | voice language id (required) |
| dialect | string | dialect code (egy, ksa, uae, …); auto if omitted |
| expressiveness | float | 0.0–2.0, default 1.0 (neutral → emotional) |
| speed | float | 0.5–2.0, default 1.0 |
| mulaw | bool | 8 kHz µ-law output, default false (16 kHz pcm16) |
curl -X POST http://sa-gateway.tryhamsa.com/v1/tts \
-H "x-api-key: $HAMSA_KEY" \
-H "content-type: application/json" \
-d '{"text":"مرحبا بالعالم","speaker":"layla","language_id":"ar"}' \
--output out.wav
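import os, requests

KEY = os.environ["HAMSA_KEY"]  # same variable the curl examples use
with requests.post("http://sa-gateway.tryhamsa.com/v1/tts",
                   headers={"x-api-key": KEY},
                   json={"text": "مرحبا بالعالم", "speaker": "layla", "language_id": "ar"},
                   stream=True) as r:
    # Write the audio to disk as it arrives instead of buffering it all.
    with open("out.wav", "wb") as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)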
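The documented ranges for expressiveness and speed can be checked before a request is sent. A sketch under stated assumptions: `build_tts_payload` is a hypothetical helper, and how the native endpoint itself treats out-of-range values is not specified here, so the example simply refuses them locally.

```python
def build_tts_payload(text, speaker, language_id, dialect=None,
                      expressiveness=1.0, speed=1.0, mulaw=False) -> dict:
    """Build the JSON body for /v1/tts, enforcing the documented ranges."""
    required = (("text", text), ("speaker", speaker), ("language_id", language_id))
    for name, value in required:
        if not value:
            raise ValueError(f"{name} is required")
    if not 0.0 <= expressiveness <= 2.0:
        raise ValueError("expressiveness must be within 0.0 to 2.0")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5 to 2.0")
    payload = {
        "text": text,
        "speaker": speaker,
        "language_id": language_id,
        "expressiveness": expressiveness,
        "speed": speed,
        "mulaw": mulaw,
    }
    if dialect is not None:  # omitted dialect means "auto"
        payload["dialect"] = dialect
    return payload


# Pause tags go inline in the text field.
body = build_tts_payload("مرحبا <pause_short> بالعالم", "layla", "ar",
                         dialect="egy", speed=1.2)
print(body["speed"], body["dialect"])  # 1.2 egy
```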
For full recordings (calls, meetings, media), submit a job to POST /v1/transcriptions and poll for the result. The completed job returns the full transcript, speaker diarization with word-level timestamps, and the detected language. Billed once, on completion, by audio duration.
| field | type | description |
|---|---|---|
| audio_url | string | public / pre-signed URL of the media file (send either audio_url or audio_base64) |
| audio_base64 | string | base64-encoded audio (send either audio_url or audio_base64) |
| language | string | hint, e.g. ar; auto-detected if omitted |
| return_srt_format | bool | also return subtitles (SRT) |
| srt_options | object | max_lines_per_subtitle, max_chars_per_line, … |
curl -X POST http://sa-gateway.tryhamsa.com/v1/transcriptions \
-H "x-api-key: $HAMSA_KEY" -H "content-type: application/json" \
-d '{"audio_url":"https://.../call.mp3","language":"ar"}'
# -> 202 {"job_id":"28b3ff340f58...","status":"IN_QUEUE"}
Poll GET /v1/transcriptions/{job_id} until status is COMPLETED (or FAILED / CANCELLED / TIMED_OUT). Cancel with DELETE on the same path.
{
  "job_id": "28b3ff340f58...",
  "status": "COMPLETED",
  "audio_seconds": 0.52,
  "result": {
    "detected_language": "ar",
    "transcription": "مرحبا",
    "diarization": [
      {
        "speaker": "speaker_0", "start": 0.07, "end": 0.49, "text": "مرحبا",
        "words": [
          {"word": "مرحبا", "start": 0.07, "end": 0.49, "score": 0.95, "speaker": "speaker_0"}
        ]
      }
    ],
    "usage": { "audio_length": 0.0087 }
  }
}
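As one way to work with a completed job, the diarization segments can be reduced to talk time per speaker and a flat list of timed words. The sketch assumes the response shape shown above; `talk_time` and `timed_words` are illustrative names.

```python
from collections import defaultdict


def talk_time(job: dict) -> dict:
    """Total seconds spoken per speaker, summed over diarization segments."""
    totals = defaultdict(float)
    for seg in job["result"]["diarization"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}


def timed_words(job: dict) -> list:
    """Flatten word-level timestamps into (word, start, end) tuples."""
    return [
        (w["word"], w["start"], w["end"])
        for seg in job["result"]["diarization"]
        for w in seg["words"]
    ]


# Trimmed copy of the sample response above.
job = {
    "status": "COMPLETED",
    "result": {
        "diarization": [
            {"speaker": "speaker_0", "start": 0.07, "end": 0.49, "text": "مرحبا",
             "words": [{"word": "مرحبا", "start": 0.07, "end": 0.49,
                        "score": 0.95, "speaker": "speaker_0"}]}
        ]
    },
}
print(talk_time(job))  # {'speaker_0': 0.42}
```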
import os, time, requests

KEY = os.environ["HAMSA_KEY"]  # same variable the curl examples use
MEDIA_URL = "https://.../call.mp3"  # placeholder, as in the curl example
H = {"x-api-key": KEY}
job = requests.post("http://sa-gateway.tryhamsa.com/v1/transcriptions",
                    headers=H, json={"audio_url": MEDIA_URL, "language": "ar"}).json()
while True:
    r = requests.get(f"http://sa-gateway.tryhamsa.com/v1/transcriptions/{job['job_id']}", headers=H).json()
    if r["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(3)
# Only read the result when the job actually completed.
if r["status"] == "COMPLETED":
    for seg in r["result"]["diarization"]:
        print(seg["speaker"], seg["start"], seg["end"], seg["text"])
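A polling loop like the one above runs forever if a job never reaches a terminal status, so a client-side deadline is a reasonable addition. The sketch below keeps the HTTP call out of the helper so it can be exercised without network access; `wait_for_job`, its parameters, and the 600-second default are assumptions of this example, not part of the API.

```python
import time

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(fetch_status, interval=3.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until the job is terminal or the deadline passes.

    fetch_status is any callable returning the job JSON as a dict, for
    example a function wrapping GET /v1/transcriptions/{job_id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the client deadline")
        sleep(interval)


# Demo with a fake status source instead of real HTTP calls.
statuses = iter(["IN_QUEUE", "IN_QUEUE", "COMPLETED"])
job = wait_for_job(lambda: {"status": next(statuses)}, sleep=lambda s: None)
print(job["status"])  # COMPLETED
```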
For live transcription, open a WebSocket to /v1/stt/stream — use ws:// (wss:// over TLS). Authenticate with ?api_key=YOUR_KEY or the x-api-key header.
connect ws://sa-gateway.tryhamsa.com/v1/stt/stream?api_key=YOUR_KEY (or wss:// over TLS)
-> {"type":"handshake","options":{"gender_detection":true,"eos_enabled":true}}
<- {"type":"handshake_ack","status":"authenticated"}
-> {"type":"media","payload":"<base64 audio>"} # or raw binary frames
<- {"type":"transcription","data":{"transcription":"...","gender":"Female"},"duration_ms":500}
Handshake options: audio_type (PCM/MULAW), sample_rate, vad_threshold, min_silence_duration_ms, eos_enabled, eos_threshold, gender_detection, speaker_identification. Billed by streamed speech-seconds.
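The message exchange above can be sketched as a few JSON helpers that any WebSocket client could send and receive. Only the message shapes and option names come from this page; the helper names are hypothetical, and the option values in the demo (such as a 16000 Hz sample rate) are assumptions.

```python
import base64
import json

HANDSHAKE_OPTIONS = {
    "audio_type", "sample_rate", "vad_threshold", "min_silence_duration_ms",
    "eos_enabled", "eos_threshold", "gender_detection", "speaker_identification",
}


def handshake_message(**options) -> str:
    """First client message: declares the stream options."""
    unknown = sorted(set(options) - HANDSHAKE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown handshake options: {unknown}")
    return json.dumps({"type": "handshake", "options": options})


def media_message(chunk: bytes) -> str:
    """Wrap one audio chunk as a base64 'media' message."""
    return json.dumps({"type": "media",
                       "payload": base64.b64encode(chunk).decode("ascii")})


def read_transcription(raw: str):
    """Return the transcript text, or None for non-transcription messages."""
    message = json.loads(raw)
    if message.get("type") != "transcription":
        return None
    return message["data"]["transcription"]


print(handshake_message(audio_type="PCM", sample_rate=16000, eos_enabled=True))
print(read_transcription('{"type":"handshake_ack","status":"authenticated"}'))  # None
```

Raw binary frames are also accepted for audio, in which case `media_message` is not needed.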
Already on the OpenAI SDK? Point its base_url at http://sa-gateway.tryhamsa.com/v1 and pass your Hamsa key as api_key; the audio transcription and speech calls then go through the gateway. Usage is metered per key in audio-seconds, the same as the native endpoints.
| OpenAI | → Hamsa model |
|---|---|
| file (multipart) | base64 → /transcribe audio |
| language | lang |
| input | /tts/stream text |
| voice | speaker (your Hamsa speaker id) |
| speed | speed (clamped 0.5–2.0) |
| dialect, expressiveness (extra) | passed through to the model |
response_format for speech: wav (default), pcm, or mulaw; other values return 400. response_format for transcription: json (default), text, verbose_json, srt, or vtt. Use model="hamsa". TTS input is limited to 2000 characters and uploads to 25 MB.
from openai import OpenAI
client = OpenAI(api_key="YOUR_HAMSA_KEY", base_url="http://sa-gateway.tryhamsa.com/v1")
# transcription
t = client.audio.transcriptions.create(
    model="whisper-1", language="ar", file=open("clip.wav", "rb"))
print(t.text)
# speech (voice = your Hamsa speaker id)
with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice="layla", input="مرحبا بالعالم") as r:
    r.stream_to_file("out.wav")
curl http://sa-gateway.tryhamsa.com/v1/audio/transcriptions \
-H "Authorization: Bearer $HAMSA_KEY" \
-F file=@clip.wav -F model=whisper-1 -F language=ar
# -> {"text": "..."} (or response_format=text | verbose_json)
curl http://sa-gateway.tryhamsa.com/v1/audio/speech \
-H "Authorization: Bearer $HAMSA_KEY" \
-H "content-type: application/json" \
-d '{"model":"tts-1","voice":"layla","input":"مرحبا بالعالم"}' \
--output out.wav
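The field mapping for speech requests can be illustrated in a few lines. This is a reading of the mapping table and limits on this page, not the gateway's actual code: `to_hamsa_tts` is a hypothetical function, and raising `ValueError` stands in for the error responses the gateway would send.

```python
SPEECH_FORMATS = {"wav", "pcm", "mulaw"}
MAX_INPUT_CHARS = 2000


def to_hamsa_tts(request: dict) -> dict:
    """Translate an OpenAI-style speech request into Hamsa TTS fields."""
    fmt = request.get("response_format", "wav")
    if fmt not in SPEECH_FORMATS:
        raise ValueError(f"response_format {fmt!r} would be rejected with 400")
    if len(request["input"]) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds 2000 characters")
    # speed is clamped to the supported range rather than rejected
    speed = min(2.0, max(0.5, float(request.get("speed", 1.0))))
    payload = {"text": request["input"], "speaker": request["voice"], "speed": speed}
    for extra in ("dialect", "expressiveness"):  # passed through untouched
        if extra in request:
            payload[extra] = request[extra]
    return payload


mapped = to_hamsa_tts({"voice": "layla", "input": "مرحبا", "speed": 3.0, "dialect": "egy"})
print(mapped["speed"], mapped["dialect"])  # 2.0 egy
```

With the OpenAI Python SDK, non-standard fields such as dialect are normally sent through the `extra_body` argument; that the gateway picks them up from the request body is an assumption here.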
| status | meaning |
|---|---|
| 401 | missing, invalid, or revoked API key |
| 422 | invalid request body (missing required field) |
| 4xx / 5xx | error from the model (body is passed through) |
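On the client, the table above can be turned into a small check that runs after every response. A sketch only: `HamsaError` and `check_status` are hypothetical names, and the wording of the messages simply mirrors the table.

```python
class HamsaError(Exception):
    """Raised for any non-success status (hypothetical client-side type)."""

    def __init__(self, status: int, reason: str, body: str = ""):
        super().__init__(f"{status}: {reason}")
        self.status = status
        self.reason = reason
        self.body = body  # for model errors this is the passed-through body


def check_status(status: int, body: str = "") -> None:
    """Raise HamsaError for the documented error statuses, else return."""
    if status == 401:
        raise HamsaError(status, "missing, invalid, or revoked API key", body)
    if status == 422:
        raise HamsaError(status, "invalid request body", body)
    if 400 <= status <= 599:
        raise HamsaError(status, "error from the model", body)


check_status(200)  # success: nothing happens
try:
    check_status(401)
except HamsaError as err:
    print(err)  # 401: missing, invalid, or revoked API key
```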
Prefer to click around? The interactive docs let you try every endpoint in the browser.