Hello SundAI - Our World Through the Lens of AI

https://anchor.fm:443/s/a222338/podcast/rss
5 Followers · 51 Episodes
"Hello SundAI - Our World Through the Lens of AI," is your twice-weekly dive into how artificial intelligence shapes our digital landscape. Hosted by Roger and SundAI the AI, this podcast brings you practical tips, cutting-edge tools, and insightful interviews every Sunday and Wednesday morning. Whether you're a seasoned tech enthusiast or just starting to explore the digital domain, tune in to discover innovative ways to get things done and propel yourself forward in a world increasingly...

Episode List

AI Cannot Think: When AI Reasoning Models Hit Their Limit

Jun 9th, 2025 5:43 PM

Join us as we dive into a groundbreaking study that systematically investigates the strengths and fundamental limitations of Large Reasoning Models (LRMs), the cutting-edge AI systems behind advanced "thinking" mechanisms like Chain-of-Thought with self-reflection.

Moving beyond traditional, often contaminated, mathematical and coding benchmarks, this research uses controllable puzzle environments like the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World to precisely manipulate problem complexity and offer unprecedented insights into how LRMs "think".

You'll discover surprising findings, including:

  • Three distinct performance regimes: standard Large Language Models (LLMs) surprisingly outperform LRMs on low-complexity tasks; LRMs demonstrate an advantage on medium-complexity tasks due to their additional "thinking" processes; but crucially, both model types experience a complete accuracy collapse on high-complexity tasks.
  • A counter-intuitive scaling limit: LRMs' reasoning effort, measured by token usage, increases up to a certain complexity point, then paradoxically declines despite an adequate token budget. This suggests a fundamental inference-time scaling limitation in their reasoning capabilities relative to problem complexity.
  • Inconsistencies and limitations in exact computation: LRMs struggle to benefit from being explicitly given algorithms, failing to improve even when provided with step-by-step instructions for puzzles like the Tower of Hanoi. They also exhibit inconsistent reasoning across puzzle types, performing many correct moves in one scenario (e.g., Tower of Hanoi) but failing much earlier in another (e.g., River Crossing), pointing to issues with generalizable reasoning rather than just problem-solving strategy discovery.
  • An "overthinking" phenomenon: for simpler problems, LRMs often find correct solutions early in their reasoning trace but then continue to inefficiently explore incorrect alternatives, wasting computational effort.

This episode challenges prevailing assumptions about LRM capabilities and raises crucial questions about their true reasoning potential, paving the way for future investigations into more robust AI reasoning.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) through the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. https://rogerbasler.ch/en/contact/
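To make the "explicit algorithm" finding more concrete, here is a minimal sketch (our illustration, not code from the study) of the classic recursive Tower of Hanoi procedure. The episode's point is that even when a model is handed step-by-step instructions equivalent to this, its accuracy still collapses as the number of disks grows; the function and variable names below are ours.

```python
# Illustrative sketch of the deterministic Tower of Hanoi algorithm.
# This is the kind of explicit, step-by-step procedure the study reportedly
# gave to reasoning models; names here are ours, not the paper's.

def hanoi(n, source="A", target="C", auxiliary="B", moves=None):
    """Return the optimal move list for n disks (2**n - 1 moves)."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))              # move the smallest disk directly
        return moves
    hanoi(n - 1, source, auxiliary, target, moves)  # clear the way onto the spare peg
    moves.append((source, target))                  # move the largest disk
    hanoi(n - 1, auxiliary, target, source, moves)  # rebuild the smaller disks on top
    return moves

if __name__ == "__main__":
    for n in (3, 7, 10):
        print(n, "disks ->", len(hanoi(n)), "moves")  # 7, 127, 1023
```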

The Art and Science of Prompt Engineering by Google

Apr 27th, 2025 4:46 AM

In this show, we break down the art of crafting prompts that help AI deliver precise, useful, and reliable results. Whether you're summarising text, answering questions, generating code, or translating content, we'll show you how to guide LLMs effectively.

We explore real-world techniques, from simple zero-shot prompts to advanced strategies like Chain of Thought, Tree of Thoughts, and ReAct, which combines reasoning with external tools. We'll also dive into how to control AI output, tweaking settings such as temperature, token limits, and sampling, to shape your results.

Plus, we'll share best practices for writing, testing, and refining prompts, including tips on examples, formatting, and structured outputs like JSON. Whether you're just getting started or already deep into advanced prompting, this podcast will help you sharpen your skills and stay ahead of the curve. Let's unlock the full potential of AI, one prompt at a time.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) through the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. https://rogerbasler.ch/en/contact/
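As a concrete illustration of the techniques mentioned above, here is a minimal sketch contrasting a zero-shot prompt, a chain-of-thought prompt, and a structured-output instruction, together with typical sampling settings. The prompt wording and the configuration keys are illustrative assumptions, not taken from the Google guide; adapt them to whichever model API you use.

```python
# Illustrative prompt-engineering sketch (wording and config keys are assumptions).

review = '"The battery lasts two days, but the screen scratches easily."'

# Zero-shot: ask for the answer directly, no examples or reasoning steps.
zero_shot = (
    "Classify the sentiment of this review as positive, negative, or neutral:\n"
    + review
)

# Chain of Thought: ask the model to reason step by step before answering.
chain_of_thought = (
    "Classify the sentiment of this review as positive, negative, or neutral.\n"
    "Think step by step: list the positive points, list the negative points,\n"
    "then give the final label on its own line.\n"
    + review
)

# Structured output: constrain the response to machine-readable JSON.
structured_output = (
    "Extract the product attributes mentioned in the review and return ONLY valid JSON\n"
    "as a list of objects with the keys: attribute, sentiment.\n"
    + review
)

# Typical sampling knobs discussed in the episode; exact parameter names vary by provider.
generation_config = {
    "temperature": 0.2,       # lower temperature -> more deterministic output
    "top_p": 0.9,             # nucleus sampling cutoff
    "max_output_tokens": 256, # cap on response length
}
```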

AI finally passed the Turing Test

Apr 20th, 2025 4:00 AM

Has AI finally passed the Turing Test? Dive into the groundbreaking news from UC San Diego, where research published in March 2025 claims that GPT-4.5 convinced human judges it was a real person 73% of the time, even more often than actual humans in the same test. But what does this historic moment truly signify for the future of artificial intelligence?

This podcast explores the original concept of the Turing Test, proposed by Alan Turing in 1950 as a practical measure of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human through conversation. We'll examine the rigorous controlled study that led to GPT-4.5's alleged success, involving 284 participants and five-minute conversations.

We'll delve into what passing the Turing Test actually means, and, crucially, what it doesn't. Is this the dawn of true AI consciousness or Artificial General Intelligence (AGI)? The sources clarify that the Turing Test specifically measures conversational ability and human likeness in dialogue, not sentience or general intelligence.

Discover the key factors that contributed to this breakthrough, including massive increases in model parameters and training data, sophisticated prompting (especially the use of a "persona prompt"), learning from human feedback, and models designed for conversation. We will also discuss the intriguing finding that human judges often identified someone as human when they lacked knowledge or made mistakes, showing a shift in our perception of AI.

However, the podcast will also address the criticisms and limitations of the Turing Test. We'll explore the argument that it is merely a test of functionality and does not necessarily indicate genuine human-like thinking. We'll also touch on alternative tests for AI that aim to assess creativity, problem-solving, and other aspects of intelligence beyond conversation, such as the Metzinger Test and the Lovelace 2.0 Test.

Finally, we will consider the profound implications of AI systems convincingly simulating human conversation, including the economic impact on roles requiring human-like interaction, the potential effects on social relationships, and the ethical considerations around deception and manipulation. Join us to unpack this milestone in computing history and discuss what the blurring lines between human and machine communication mean for our society, economy, and lives.

Source: https://theconversation.com/chatgpt-just-passed-the-turing-test-but-that-doesnt-mean-ai-is-now-as-smart-as-humans-253946

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) through the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. https://rogerbasler.ch/en/contact/

Google's approach to AGI - artificial general intelligence

Apr 15th, 2025 12:36 PM

This episode unpacks a 145-page paper from Google DeepMind, outlining their strategic approach to managing the risks and responsibilities of AGI development.

1. Defining AGI and 'Exceptional AGI'
We begin by clarifying what DeepMind means by AGI: an AI system capable of performing any task a human can. More specifically, they introduce the notion of 'Exceptional AGI', a system whose performance matches or exceeds that of the top 1% of professionals across a wide range of non-physical tasks. (Note: DeepMind is a British AI company, founded in 2010 and acquired by Google in 2014.)

2. Understanding the Risk Landscape
AGI, while full of potential, also presents serious risks, from systemic harm to outright existential threats. DeepMind identifies four core areas of concern:
  • Abuse (intentional misuse by actors with harmful intent)
  • Misconduct (reckless or unethical use)
  • Errors (unexpected failures or flaws in design)
  • Structural risks (long-term unintended societal or economic consequences)
Among these, abuse and misconduct are given particular attention due to their immediacy and severity.

3. Mitigating AGI Threats: DeepMind's Technical Strategy
To counter these dangers, DeepMind proposes a multi-layered technical safety strategy. The goal is twofold: to prevent bad actors from accessing powerful capabilities, and to better understand and predict AI behaviour as systems grow in autonomy and complexity. This approach integrates mechanisms for oversight, constraint, and continual evaluation.

4. Debate Within the AI Field
However, the path is far from settled. Within the AI research community, there is ongoing skepticism regarding both the feasibility of AGI and the assumptions underlying safety interventions. Critics argue that AGI remains too vaguely defined to justify such extensive safeguards, while others warn that dismissing the risks could be equally shortsighted.

5. Timelines and Trajectories
When might we see AGI? DeepMind's report considers the emergence of 'Exceptional AGI' plausible before the end of this decade, that is, before 2030. While no exact date is predicted, the implication is clear: preparation cannot wait.

This episode offers a rare look behind the scenes at how a leading AI lab is thinking about, and preparing for, the future of artificial general intelligence. It also raises the broader question: how should societies respond when technology begins to exceed traditional human limits?

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) through the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. https://rogerbasler.ch/en/contact/

The Anthropic Economic Index: Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations

Mar 30th, 2025 4:38 PM

This academic paper from Anthropic provides an empirical analysis of how artificial intelligence, specifically their Claude model, is being used across the economy. The researchers developed a novel method to analyse millions of Claude conversations and map them to tasks and occupations listed in the US Department of Labor's O*NET database. Their findings indicate that AI usage is currently concentrated in areas like software development and writing, with a notable portion of occupations showing AI use for some of their tasks. The study also distinguishes between AI being used to automate tasks versus augment human capabilities, and examines usage patterns across different Claude models, providing early, data-driven insights into AI's evolving role in the labour market.

Source: https://www.anthropic.com/news/the-anthropic-economic-index

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) through the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material, as it is for educational purposes only. https://rogerbasler.ch/en/contact/
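To make the mapping idea tangible, here is a toy sketch of assigning a conversation snippet to the closest O*NET-style task description and tagging it as automation or augmentation. This is purely our illustration under simplifying assumptions (keyword overlap and a crude heuristic); Anthropic's actual pipeline is model-based, and the task descriptions below are invented examples.

```python
# Toy illustration of mapping a conversation to an occupation-level task label
# and an automation/augmentation tag. Not Anthropic's pipeline; the task
# descriptions and cue words are made up for demonstration.

TASKS = {
    "Software Developers": "write modify and debug computer programs and scripts",
    "Writers and Authors": "write and edit articles stories and marketing copy",
    "Tutors": "explain concepts and help students practice problems",
}

def closest_task(conversation: str) -> str:
    """Pick the occupation whose task description shares the most words with the text."""
    words = set(conversation.lower().split())
    scores = {
        occupation: len(words & set(description.split()))
        for occupation, description in TASKS.items()
    }
    return max(scores, key=scores.get)

def interaction_mode(conversation: str) -> str:
    """Crude heuristic: 'do it for me' requests lean automation, otherwise augmentation."""
    automation_cues = ("write this", "fix this", "generate", "do it for me")
    return "automation" if any(cue in conversation.lower() for cue in automation_cues) else "augmentation"

if __name__ == "__main__":
    snippet = "Can you fix this Python function? It should debug the scripts faster."
    print(closest_task(snippet), "/", interaction_mode(snippet))  # Software Developers / automation
```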
