Writer

Enterprise

company

Verified

https://writer.com/

Get_Writer

writer

Activity Feed

AI & ML interests

AGI, LLMs, Knowledge Graph, Palmyra, Domain Specific LLM

Recent Activity

sanderland authored a paper about 2 months ago

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

kiranr published a model 2 months ago

Writer/Palmyra-Fin-70B-32K

kiranr published a model 2 months ago

Writer/palmyra-vision


Articles

Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason!

Sep 11

•

tperes

posted an update 3 months ago

Post

228

Introducing Palmyra-mini: Compact AI Models for Efficient Inference

The Palmyra-mini family from Writer includes three lightweight models designed for high performance and efficient inference. These models are ideal for developers looking to integrate AI capabilities without excessive computational overhead.

Model Variants

* palmyra-mini: A base model for general-purpose generative tasks, achieving 52.6% on Big Bench Hard (exact match).

* palmyra-mini-thinking-a: Optimized for complex logical reasoning with a Chain of Thought (CoT) approach, scoring 82.87% on GSM8K (strict match).

* palmyra-mini-thinking-b: Specialized for mathematical reasoning, achieving 92.5% on AMC23.

Technical Details

* All models are based on the Qwen architecture, compatible with popular inference frameworks like vLLM, SGLang, and TGI.

* "Thinking" models utilize CoT training for enhanced reasoning capabilities.

* GGUF and MLX quantizations are available for optimized performance.

For more information, including benchmark methodologies and detailed performance metrics, refer to our blog post: (https://huggingface.co/blog/Writer/announcing-palmyra-mini).

Model repos can be found here:
* Writer/palmyra-mini
* Writer/palmyra-mini-thinking-a
* Writer/palmyra-mini-thinking-b
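
As a rough illustration of how one of these repos might be used, here is a minimal sketch with Hugging Face `transformers`. It assumes the checkpoints load through the standard `AutoModelForCausalLM` and `AutoTokenizer` classes and that plain-text prompting is acceptable; neither is confirmed in this post, and the thinking variants may expect a chat template. The helper name `generate` and its defaults are hypothetical.

```python
# Repo IDs as listed in the post above.
MODEL_IDS = [
    "Writer/palmyra-mini",
    "Writer/palmyra-mini-thinking-a",
    "Writer/palmyra-mini-thinking-b",
]


def generate(prompt: str, model_id: str = MODEL_IDS[0], max_new_tokens: int = 128) -> str:
    """Hypothetical helper: load `model_id` and return one completion for `prompt`."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Both calls download the weights from the Hub the first time they run.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Assumption: plain-text prompting, with no chat template applied.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain chain-of-thought prompting in one sentence.")` would download the base model on first use, so it needs network access and enough memory for the checkpoint.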

Also check out a mobile implementation of palmyra-mini on iOS to see a working example of how inference can be incorporated on-device: https://github.com/tsperes/palmyra-mini-mobile/

dmytro-writer

authored a paper 7 months ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 277

wassemgtk

authored a paper 7 months ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 277

melisa

authored a paper 7 months ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 277

wassemgtk

posted an update 9 months ago

Post

3246

I’ve been diving into the iRoPE architecture from Llama 4, a game-changer for long-context models! It interleaves local attention (with RoPE) for short contexts and global attention (with inference-time temperature scaling) for long-range reasoning, aiming for infinite context. I’m going to try writing iRoPE. Who wants to help?

Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
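
To make the interleaving concrete, here is a small sketch of the idea in plain Python. It is not taken from the linked notebook. The 3:1 local-to-global ratio, the chunk size, and the log-shaped `query_scale` with its constants are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def layer_schedule(num_layers: int, global_every: int = 4) -> list:
    """Label each layer 'local' (RoPE, chunked) or 'global' (no positional encoding).

    Assumption: every `global_every`-th layer is global.
    """
    return [
        "global" if (i + 1) % global_every == 0 else "local"
        for i in range(num_layers)
    ]


def attention_mask(seq_len: int, kind: str, chunk: int = 4) -> list:
    """Causal mask; local layers also restrict attention to the query's own chunk."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q
            same_chunk = (q // chunk) == (k // chunk)
            row.append(causal and (same_chunk or kind == "global"))
        mask.append(row)
    return mask


def query_scale(position: int, floor_scale: int = 8192, attn_scale: float = 0.1) -> float:
    """Hypothetical inference-time temperature: grows slowly with token position."""
    return math.log(math.floor((position + 1) / floor_scale) + 1) * attn_scale + 1.0
```

In this sketch a local layer at position 5 with `chunk=4` can only see positions 4 and 5, while a global layer sees positions 0 through 5, which is why the global layers carry the long-range information.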

1 reply

wassemgtk

posted an update 9 months ago

Post

2136

For fun, a new project: SuperTokenizer! A BPE tokenizer trained on C4 to beat GPT-4. Byte-level, A100-powered, and open-source. Messing around with tokens!
https://github.com/wassemgtk/SuperTokenizer
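
For readers new to byte-level BPE, the sketch below shows the basic training loop on a toy string. It is a generic illustration of the technique, not the SuperTokenizer code, and all function names are hypothetical.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # byte-level: start from the 256 byte values
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        if pairs[best] < 2:
            break  # no pair repeats, so another merge would not compress anything
        ids = merge(ids, best, 256 + len(merges))
        merges.append(best)
    return merges


def encode(text, merges):
    """Apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for rank, pair in enumerate(merges):
        ids = merge(ids, pair, 256 + rank)
    return ids


def decode(ids, merges):
    """Expand merged tokens back to bytes and decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for rank, (a, b) in enumerate(merges):
        vocab[256 + rank] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

A real tokenizer trained on C4 would add pre-tokenization, special tokens, and a much faster pair-counting strategy; the loop above only shows the merge rule itself.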

1 reply

melisa

authored a paper 9 months ago

Expect the Unexpected: FailSafe Long Context QA for Finance

Paper • 2502.06329 • Published Feb 10 • 133

wassemgtk

posted an update 10 months ago

Post

1921

# GESAL: Real-Time Adaptation for LLMs

We’re excited to unveil **Graph-Enhanced Singular Adaptive Learning (GESAL)**, a framework that lets LLMs like meta-llama/Llama-3.2-1B adapt in real time using user feedback. Check out the code and white paper on GitHub!

🔗 **Code**: [https://github.com/writer/AI-Adaptive-Learning-GESAL](https://github.com/writer/AI-Adaptive-Learning-GESAL)

---

## Why GESAL?

Static LLMs struggle to adapt without heavy retraining. GESAL solves this with:
- **SVF**: Adapts weights via \( W' = U (\Sigma \cdot z) V^T \), using few parameters.
- **Graph Memory**: Stores adaptations in nodes for scalability.
- **RL**: Updates via \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \) based on feedback.
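
The SVF formula can be checked by hand on a tiny matrix. The sketch below reads \( \Sigma \cdot z \) as elementwise scaling of the singular values by the vector \( z \); that reading, the hand-picked 2×2 factors, and the helper names are assumptions, not code from the GESAL repository.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def svf_adapt(u, sigma, v, z):
    """Return W' = U (Sigma * z) V^T, with `sigma` and `z` given as vectors."""
    scaled = [s * zi for s, zi in zip(sigma, z)]
    n = len(scaled)
    diag = [[scaled[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(u, diag), transpose(v))


# A hand-built SVD: U is a 90-degree rotation, V is the identity.
U = [[0.0, -1.0], [1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
SIGMA = [3.0, 1.0]

W = svf_adapt(U, SIGMA, V, [1.0, 1.0])          # z = 1 leaves the weights unchanged
W_adapted = svf_adapt(U, SIGMA, V, [0.5, 2.0])  # only two numbers were "trained"
```

Only `z` changes between the two calls, which is where the parameter saving comes from: a weight matrix with `n` singular values needs `n` trainable scalars rather than a full update.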

---

## How It Works

Ask "How many R’s in ‘strawberry’?" If it says "2" and you say "no," GESAL learns to say "3" next time, avoiding repeats.

---

## Try It

Built with Hugging Face’s transformers:

pip install transformers torch numpy
python "Adaptive_Learning_(GESAL).py"

Needs a Hugging Face token for Llama-3.2-1B.

---

## Results

GESAL hits 95% accuracy after 5 feedbacks vs. LoRA’s 70%. It’s efficient (~0.5M params) and scalable.

15 replies

wassemgtk

authored a paper 10 months ago

Expect the Unexpected: FailSafe Long Context QA for Finance

Paper • 2502.06329 • Published Feb 10 • 133

dmytro-writer

authored 2 papers 10 months ago

Comparative Analysis of Retrieval Systems in the Real World

Paper • 2405.02048 • Published May 3, 2024

Expect the Unexpected: FailSafe Long Context QA for Finance

Paper • 2502.06329 • Published Feb 10 • 133

melisa

posted an update over 1 year ago

Post

3269

🔥 Introducing "Writing in the Margins (WiM)": a better inference pattern for long-context LLMs that solves the Lost-in-the-Middle problem 🔥

Paper page: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)

TL;DR
Make your model write "margin notes" as you chunk prefill the KV cache. Then ask it to reread all notes before it speaks up.
Works with humans, works with AI 🤖

WiM leverages the chunked prefill of the key-value cache, which concurrently generates query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs “margins”, drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs' long-context reasoning capabilities.

Think: Every chunk has a chance to be attended to/ be at the end of the context at least once. 🎉
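
The control flow can be sketched at the text level as follows. This is a simplification: it passes the growing prefix to ordinary callables instead of reusing a real KV cache, it drops empty notes as a stand-in for any relevance filtering, and the names `write_margin` and `answer` are hypothetical placeholders for model calls rather than functions from the project.

```python
from typing import Callable, List


def chunk_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split the context into consecutive chunks of at most `chunk_size` tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def writing_in_the_margins(
    context: List[str],
    query: str,
    chunk_size: int,
    write_margin: Callable[[List[str], str], str],
    answer: Callable[[List[str], List[str], str], str],
) -> str:
    prefix: List[str] = []   # stands in for the KV cache built up by chunked prefill
    margins: List[str] = []
    for chunk in chunk_tokens(context, chunk_size):
        prefix.extend(chunk)                # "prefill" the next chunk
        note = write_margin(prefix, query)  # query-based note on the context so far
        if note:                            # assumption: keep only non-empty notes
            margins.append(note)
    # The collected notes are handed back to the model together with the query.
    return answer(prefix, margins, query)


# Toy stand-ins for the model: note any number at the end of the prefix.
def toy_margin(prefix: List[str], query: str) -> str:
    return prefix[-1] if prefix[-1].isdigit() else ""


def toy_answer(prefix: List[str], margins: List[str], query: str) -> str:
    return "+".join(margins)


result = writing_in_the_margins(
    "a b 7 c d e f 9".split(), "which numbers appear?", 3, toy_margin, toy_answer
)
```

Because a note is produced after every chunk, the loop is also a natural place to emit progress updates to a user interface.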

📊 Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent, a stark contrast to feeding 1 million tokens to an LLM and waiting 6 minutes for the first token. 🤯

👩‍💻🧑‍💻 Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill

2 replies

axel-writer

authored a paper over 1 year ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144

melisa

authored a paper over 1 year ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144

wassemgtk

authored a paper over 1 year ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144

melisa

authored 2 papers over 1 year ago

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Paper • 2402.17553 • Published Feb 27, 2024 • 25

Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

Paper • 2307.03692 • Published Jul 5, 2023 • 26

wassemgtk

authored a paper over 1 year ago

Comparative Analysis of Retrieval Systems in the Real World

Paper • 2405.02048 • Published May 3, 2024

wassemgtk

posted an update over 1 year ago

Post

3656

The Writer team had the opportunity to run an eval on Mixtral-8x22b, and the results were interesting.

| Benchmark | Score |
| --- | --- |
| #mmlu | 77.26 |
| #hellaswag | 88.81 |
| #truthfulqa | 52.05 |
| #arc_challenge | 70.31 |
| #winogrande | 84.93 |
| #gsm8k | 76.65 |

2 replies

wassemgtk

posted an update almost 2 years ago

Post

We are thrilled to announce the release of the OmniACT dataset! This dataset and benchmark focuses on pushing the limits of how virtual agents can automate our computer tasks. Imagine less clicking and typing, and more observation as your computer takes care of tasks such as organizing schedules or making travel arrangements on its own.

Check it out ➡️ [OmniACT Dataset on Hugging Face](Writer/omniact)

For a deep dive, here’s the paper: [OmniACT Paper](https://arxiv.org/abs/2402.17553)


Team members 176
