Projects
Anka: A Domain-Specific Language for Reliable LLM Code Generation
Anka is a solo undergraduate research project, published on arXiv (arXiv:2512.23214) in December 2025, in which I designed a novel domain-specific language (DSL) for data transformation pipelines, purpose-built to improve the reliability of code generated by large language models (LLMs). The core insight is that the flexibility of general-purpose languages like Python, while powerful for human programmers, introduces ambiguity that leads LLMs to make systematic errors on complex, multi-step tasks. Anka addresses this with an explicit, constrained syntax in which each operation has exactly one canonical form, leaving less room for error during code generation. Despite having zero prior training exposure to the language, Claude 3.5 Haiku achieved 99.9% parse success and 95.8% overall task accuracy across a benchmark suite of 100 data transformation tasks. Most notably, Anka showed a 40 percentage point accuracy advantage over Python on multi-step pipeline tasks (100% vs. 60%), a result validated across both Claude 3.5 Haiku and GPT-4o-mini. These results show that a DSL designed specifically for LLM generation can outperform general-purpose languages that the model has seen extensively in training. I released the complete language implementation, benchmark suite, and evaluation framework as open source to support further research in this emerging area of AI-assisted programming.
Paper: https://arxiv.org/abs/2512.23214
Code: https://github.com/BleBlo/Anka
MANSHUR: A Prediction Market Platform Powered by Collective Intelligence
MANSHUR (manshur.ai) is a prediction market platform that I am building as founder of my venture, 6265labs. Unlike conventional prediction markets, which display only probabilities, MANSHUR captures the reasoning behind every trade, creating a dataset of human forecasting intelligence. The platform features over 100 active markets spanning economics, politics, sports, crypto, tech, and world events. Its "Intelligence Capture" system requires traders to explain why they are making a prediction, using quick tags and detailed theses. MANSHUR also introduces Forecaster IQ, a reputation system that ranks users on the quality of their reasoning rather than on returns alone, rewarding how people think and not only how accurate they are. The platform is designed from the ground up for UAE GCGRA regulatory compliance, positioning it as the first prediction intelligence platform in the MENA region. Its full-stack architecture includes a real-time market engine, an intelligence rankings leaderboard, an ideas feed, and a public API. MANSHUR represents my work at the intersection of AI, quantitative finance, and platform design. It draws directly on my experience at ADIA and my research in multi-agent AI trading systems, combining my technical skills in Python, API development, and data engineering with my understanding of financial markets and regulatory frameworks in the Gulf region.
Website: https://manshur.ai