• Even the most advanced AI

    From Dumas Walker@DIGDIST/CAPCITY2 to All on Monday, March 23, 2026 08:23:12
    Even the most advanced AI models fail more often than you think on structured outputs, raising doubts about the effectiveness of coding assistants

    Date:
    Sun, 22 Mar 2026 17:05:00 +0000

    Description:
    Large language models struggle with structured outputs, achieving only 75% accuracy on complex tasks, leaving reliability concerns for developers.

    Report finds AI coding assistants regularly fail one in four structured-output tasks
    Even advanced proprietary models only reach approximately 75% accuracy
    Open source AI models perform worse, averaging closer to 65% reliability

    The promise of artificial intelligence as a tireless coding assistant has encountered a significant roadblock after new research claimed such tools can experience a range of issues.

    A recent study from the University of Waterloo found AI struggles with software development, with even the most advanced models failing on one in four structured-output tasks. The research evaluated 11 large language models across 18 different structured formats and 44 tasks to test how well the systems could follow predefined rules, finding a clear disparity between performance on text-based tasks and outputs involving multimedia or complex structures.

    Benchmarking reveals a troubling reliability gap

    While text-related tasks were generally handled with moderate success, tasks requiring image, video, or website generation proved far more problematic.

    Accuracy in these areas dropped sharply, raising questions about how these AI tools can be integrated safely into professional workflows.

    "With this kind of study, we want to measure not only the syntax of the code, that is, whether it's following the set rules, but also whether the outputs produced for various tasks were accurate," said Dongfu Jiang, a PhD student and co-first author of the study.

    Structured outputs, designed to impose format consistency through JSON, XML, or Markdown, were intended to make AI responses more reliable for developers. AI companies, including OpenAI, Google, and Anthropic, introduced structured outputs to force responses into predictable formats.

    The Waterloo research suggests this approach has not yet delivered the level
    of dependability developers require.
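
    As a rough illustration of the two checks Jiang describes (does the output follow the format rules, and is it actually correct), here is a minimal Python sketch. It is not the study's benchmark code. The field names, the expected answer, and the sample responses are all hypothetical, made up for the example.

```python
import json

# Hypothetical required fields and types; not taken from the Waterloo benchmark.
REQUIRED_FIELDS = {"name": str, "port": int}


def check_output(raw, expected):
    """Return (format_ok, content_ok) for one model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return (False, False)  # not even parseable JSON
    if not isinstance(data, dict):
        return (False, False)  # valid JSON, but not an object
    # Format check: every required field is present with the right type.
    format_ok = all(
        key in data and isinstance(data[key], typ)
        for key, typ in REQUIRED_FIELDS.items()
    )
    # Content check: the values also match the expected answer.
    content_ok = format_ok and all(data[k] == v for k, v in expected.items())
    return (format_ok, content_ok)


def accuracy(results):
    """Fraction of responses that pass both the format and content checks."""
    if not results:
        return 0.0
    return sum(1 for fmt, ok in results if fmt and ok) / len(results)


# Hypothetical model responses for a task whose correct answer is `expected`.
expected = {"name": "web", "port": 8080}
samples = [
    '{"name": "web", "port": 8080}',    # well-formed and correct
    '{"name": "web", "port": "8080"}',  # parses, but port has the wrong type
    '{"name": "web", "port": 80}',      # well-formed, wrong value
    '{"name": "web", "port": 8080',     # truncated, not valid JSON
]
results = [check_output(s, expected) for s in samples]
print(results)
print(accuracy(results))
```

    The point of keeping the two results separate is that a response can pass the format check and still be wrong, which is the gap the article is describing.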

    Waterloo's benchmarking revealed even the most advanced proprietary models reached only about 75% accuracy, while open source alternatives performed closer to 65%.

    These results suggest that, despite improvements, AI systems still make significant errors that cannot be ignored in professional development environments.

    The report emphasized the need for human oversight, noting, "Developers might have these agents working for them, but they still need significant human supervision."

    Although structured outputs are a step forward from free-form natural
    language responses, errors remain common.

    The technology is not yet robust enough to operate independently in complex development scenarios.

    One might reasonably question whether the industry's enthusiasm for AI and vibe coding assistants has outpaced the actual capabilities of the underlying technology.

    Even the most advanced models demonstrate a significant failure rate on structured tasks, revealing a wide gap between marketing claims and actual performance.

    Therefore, for now, developers should treat these tools as experimental aids rather than autonomous colleagues.

    Link to news story: https://www.techradar.com/pro/even-the-most-advanced-ai-models-fail-more-often-than-you-think-on-structured-outputs-raising-doubts-about-the-effectiveness-of-coding-assistants

    $$
    ---
    ■ Synchronet ■ CAPCITY2 * Capitol City Online
  • From Lonewolf@DIGDIST/BINARYDR to Dumas Walker on Monday, March 23, 2026 20:54:42
    Re: Even the most advanced AI
    By: Dumas Walker to All on Mon Mar 23 2026 08:23 am

    Even the most advanced AI models fail more often than you think on structured outputs, raising doubts about the effectiveness of coding assistants

    Thank goodness I became a senior software engineer on my own merit long before and without AI. At least when I create something new now, I can keep my dignity about it, even if I use SOME AI help in the process. ;)

    Lonewolf
    ---
    ■ Synchronet ■ Fireside BBS - AI-WX - firesidebbs.com:23231
  • From Nightfox@DIGDIST to Lonewolf on Tuesday, March 24, 2026 17:22:44
    Re: Even the most advanced AI
    By: Lonewolf to Dumas Walker on Mon Mar 23 2026 08:54 pm

    Thank goodness I became a senior software engineer on my own merit long before and without AI. At least when I create something new now, I can keep my dignity about it, even if I use SOME AI help in the process. ;)

    I did as well. And I'm just recently starting to use AI more with software projects. I think it really helps to know how to do it "manually".

    Nightfox

    ---
    ■ Synchronet ■ Digital Distortion: digitaldistortionbbs.com