
GPT-2 Medium Somali β€” Text Generation Only

Short description: A Somali text generation model based on GPT‑2 Medium (β‰ˆ345M parameters). Optimized for general Somali generation: headlines, news‑style sentences, short stories, and assistant‑like completions. This repository is intended only for generation use cases.

Model ID: FatihJimale/gpt2-medium-somali

πŸ”‘ Key facts

  • Architecture: GPT‑2 Medium (24 layers × 1024 hidden size × 16 heads, ~345M params)
  • Objective: causal language modeling (next‑token prediction)
  • Context length: 1024 tokens
  • Tokenizer: GPT‑2 BPE (fast)
  • Framework: πŸ€— Transformers
  • Precision: FP16/BF16 compatible at inference
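
To double‑check these numbers against the uploaded checkpoint, you can inspect the config directly. A minimal sketch, assuming the repo uses the standard GPT2Config field names (n_layer, n_embd, n_head, n_positions):

from transformers import AutoConfig

# Fetch the config from the Hub and confirm the GPT-2 Medium shape.
config = AutoConfig.from_pretrained("FatihJimale/gpt2-medium-somali")
print(config.n_layer, config.n_embd, config.n_head)  # expected: 24 1024 16
print(config.n_positions)                            # expected context length: 1024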

βœ… Intended use

  • Somali text generation (stories, headlines, news‑style sentences, prompts)
  • Assistant‑style completions in Somali

⚠️ Limitations

  • May generate inaccurate, offensive, or biased content.
  • Not suitable for factual QA without verification.
  • Avoid safety‑critical usage.

πŸš€ Quick start (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "FatihJimale/gpt2-medium-somali"
tok = AutoTokenizer.from_pretrained(model_id)
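# Load in half precision and let the weights be placed automatically
# (device_map="auto" requires the accelerate package).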
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "qarax xoogan ayaa ka dhacay magaalada"
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
    repetition_penalty=1.08,
)
print(tok.decode(outputs[0], skip_special_tokens=True))

Inference tips

  • If repetition appears, increase repetition_penalty to 1.1–1.2 or lower temperature (0.7–0.9).
  • For more focused generations, reduce max_new_tokens and set top_p around 0.9.
  • For deterministic output, set do_sample=False (greedy decoding); sampling parameters such as temperature, top_p, and top_k are then ignored. See the sketch below.
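
A minimal greedy‑decoding sketch, reusing tok, model, and inputs from the quick start above:

# Greedy decoding is deterministic for a fixed prompt and fixed weights;
# sampling knobs (temperature, top_p, top_k) are ignored when do_sample=False.
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tok.decode(outputs[0], skip_special_tokens=True))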

πŸ”§ Model details

  • Training steps: β‰ˆ14,850 (completed at ~epoch 2.00)
  • Epochs: 2
  • Effective batch size: 64
  • Learning rate & schedule: final logged LR β‰ˆ 8.998e-10
  • Optimizer: AdamW (Ξ²1=0.9, Ξ²2=0.999)
  • Weight decay: 0.01
  • Mixed precision: bf16
  • Hardware: AWS ml.g5.24xlarge — 4× NVIDIA A10G (24 GB each), 96 vCPUs, 384 GiB RAM; data‑parallel across 4 GPUs
  • Context length: 1024 tokens
  • Tokenizer: GPT‑2 BPE (fast) (no custom Somali tokenizer in this version)
  • Train date: 2025‑09‑25
  • Runtime: evaluation runtime β‰ˆ 1652.22 s (~27.5 min); overall training wall‑clock β‰ˆ 1.337 days (β‰ˆ 32 h 05 m 17 s)
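
The training script itself is not published. As a rough sketch, the hyperparameters above would map onto 🤗 TrainingArguments roughly as follows; the split of the effective batch size of 64 into per‑device batch size and gradient accumulation is an assumption:

from transformers import TrainingArguments

# Illustrative only: 8 per device × 2 accumulation steps × 4 GPUs = 64 effective.
# The actual split used for this run is not published.
args = TrainingArguments(
    output_dir="gpt2-medium-somali",
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    bf16=True,
)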

Note: Dataset specifics and cleaning steps are intentionally not disclosed here, per the author's request. This card focuses on model size, parameters, and usage.

πŸ“Š Evaluation (please populate)

  • Train loss (last logged): 1.8449 @ step 14850 (~epoch 2.00)
  • Eval/validation loss: 1.78604
  • Perplexity (valid/test): 5.9658 (final recorded value @ 2025‑09‑25 09:06:42)
  • Eval runtime: 1652.22 s, 72.272 samples/s, 9.035 steps/s
  • Human eval notes: TBD (fluency, coherence)
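
The reported perplexity is the exponentiated validation loss (perplexity of a causal LM is exp of the mean per‑token cross‑entropy):

import math

# exp(eval loss) reproduces the reported perplexity
print(math.exp(1.78604))  # ≈ 5.9658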

πŸ“ Repo layout

config.json
pytorch_model.bin  (or model.safetensors)
merges.txt
vocab.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json (if any)
README.md (this file)
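
The same layout loads straight from disk if you clone the repository; from_pretrained accepts a directory path (the local path below is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the repo has been cloned to ./gpt2-medium-somali (hypothetical path)
tok = AutoTokenizer.from_pretrained("./gpt2-medium-somali")
model = AutoModelForCausalLM.from_pretrained("./gpt2-medium-somali")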

πŸ“£ Citation

@software{gpt2_medium_somali_2025,
  title        = {GPT-2 Medium Somali},
  author       = {Mohamed Abdirizak Ahmed},
  year         = {2025},
  url          = {https://huggingface.co/FatihJimale/gpt2-medium-somali}
}

πŸ” Safety

This model can produce hallucinations and harmful content. Use with content filters and human review. Do not use for medical, legal, or financial advice.
