This semi-monthly column highlights news, government documents, NGO/IGO papers, conferences, industry white papers and reports, academic papers and speeches, and central bank actions on the subject of AI’s fast-paced impact on the banking and finance sectors.
NEWS:
Import AI 429: Eval the world economy; singularity economics; and Swiss sovereign AI – If you’re measuring how well your system performs against the world economy, it’s probably because you expect to deploy your system into the entire world economy. OpenAI builds an eval that could be to the broad economy as SWE-Bench is to code: …GDPval is a very good benchmark with extremely significant implications… OpenAI has built and released GDPval, an extremely well put together benchmark for testing out how well AI systems do on the kinds of tasks people do in the real world economy. GDPval may end up being to broad real world economic impact as SWE-Bench is to coding impact, as far as evals go – which is a big deal!
Defining The Big Picture Framework When It Comes To The Economics Of Transformative AI, Forbes, September 29, 2025. In today’s column, I examine a newly released research paper that tackles an important topic, namely, the need to formulate and promulgate a big picture perspective regarding the economic and societal impacts of transformative AI. The paper was recently posted by the esteemed National Bureau of Economic Research (NBER) and does a yeoman’s job in laying out an engaging and foundational big picture or framework that deserves keen consideration. I will walk you through the key aspects and aim to whet your appetite on the altogether weighty matter. We definitely need more work of this kind. The economic upheaval that might very well coincide with the rise of artificial general intelligence (AGI) and someday artificial superintelligence (ASI) requires rapt attention now. We can’t put off these crucial analyses. The usual refrain by high-tech is that we should mindlessly move fast and break things. But misguidedly breaking our economies and economic formations carries enormously adverse consequences, especially if we aren’t preparing ourselves for the consequences…
Measuring the performance of our models on real-world tasks (OpenAI). Our mission is to ensure that artificial general intelligence benefits all of humanity. As part of our mission, we want to transparently communicate progress on how AI models can help people in the real world. That’s why we’re introducing GDPval: a new evaluation designed to help us track how well our models and others perform on economically valuable, real-world tasks. We call this evaluation GDPval because we started with the concept of Gross Domestic Product (GDP) as a key economic indicator and drew tasks from the key occupations in the industries that contribute most to GDP.
People often speculate about AI’s broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing. History shows that major technologies—from the internet to smartphones—took more than a decade to go from invention to widespread adoption. Evaluations like GDPval help ground conversations about future AI improvements in evidence rather than guesswork, and can help us track model improvement over time.
Previous AI evaluations like challenging academic tests and competitive coding challenges have been essential in pushing the boundaries of model reasoning capabilities, but they often fall short of the kind of tasks that many people handle in their everyday work.
To bridge this gap, we’ve been developing evaluations that measure increasingly realistic and economically relevant capabilities. This progression has moved from classic academic benchmarks like MMLU (exam-style questions across dozens of subjects), to more applied evaluations like SWE-Bench (software engineering bug-fixing tasks), MLE-Bench (machine learning engineering tasks such as model training and analysis), and Paper-Bench (scientific reasoning and critique on research papers), and more recently to market-based evaluations like SWE-Lancer (freelance software engineering projects based on real payouts).
GDPval is the next step in that progression. It measures model performance on tasks drawn directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks. Evaluating models on realistic occupational tasks helps us understand not just how well they perform in the lab, but how they might support people in the work they do every day…
PAPERS:
GDPval: Evaluating AI Model Performance On Real-World Economically Valuable Tasks (OpenAI, PDF). We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and that the current best frontier models are approaching industry experts in deliverable quality. We analyze the potential for frontier models, when paired with human oversight, to perform GDPval tasks cheaper and faster than unaided experts. We also demonstrate that increased reasoning effort, increased task context, and increased scaffolding improves model performance on GDPval. Finally, we open-source a gold subset of 220 tasks and provide a public automated grading service at evals.openai.com to facilitate future research in understanding real-world model capabilities.
Economists: If transformative AI arrives soon, we need to radically rethink economics:…Taxes! Altered economic growth! Geoeconomics! Oh my!…
Researchers with Stanford, the University of Virginia, and the University of Toronto have written a position paper arguing that the potential arrival of powerful AI systems in the coming years poses a major challenge to society, and economists need to get off their proverbial butts and start doing research on the assumption that technologists are right about timelines.
Definitions: For the purpose of the paper, they define transformative AI as an “artificial intelligence that enables a sustained increase in total factor productivity growth of at least 3 – 5x historical averages.”
Such a system would generate vast wealth and vast changes to the social order – and it could arrive in the next few years.
The importance of economic analysis: “Our agenda is relevant to all researchers and policymakers interested in the broader effects of AI on society,” they write. “Unlike technical analyses that focus on capabilities, economic analysis emphasizes societal outcomes: who benefits, what trade-offs emerge, and how institutions might adapt to technological change.”
21 key questions: The paper outlines 21 key questions which people should study to get their arms around this problem, grouped into nine distinct categories:
- Economic Growth: How can TAI change the rate and determinants of economic growth? What will be the main bottlenecks for growth? How can TAI affect the relative scarcity of inputs including labor, capital and compute? How will the role of knowledge and human capital change? What new types of business processes and organizational capital will emerge?
- Invention, Discovery and Innovation: For what processes and techniques will TAI boost the rate and direction of invention, discovery, and innovation? Which fields of innovation and discovery will be most affected and what breakthroughs could be achieved?
- Income Distribution: How could TAI exacerbate or reduce income and wealth inequality? How could TAI affect labor markets, wages and employment? How might TAI interact with social safety nets?
- Concentration of Decision-making and Power: What are the risks of AI-driven economic power becoming concentrated in the hands of a few companies, countries or other entities? How might AI shift political power dynamics?
- Geoeconomics: How could AI redefine the structure of international relations, including trade, global security, economic power and inequality, political stability, and global governance?
- Information, Communication, and Knowledge: How can truth vs. misinformation, cooperation vs. polarization, and insight vs. confusion be amplified or dampened? How can TAI affect the spread of information and knowledge?
- AI Safety & Alignment: How can we balance the economic benefits of TAI with its risks, including catastrophic and existential risks? What can economists contribute to help align TAI with social preferences and welfare?
- Meaning and Well-being: How can people retain their sense of meaning and worth if “the economic problem is solved” as Keynes predicted? What objectives should we direct TAI to help us maximize?
- Transition Dynamics: How does the speed mismatch between TAI and complementary factors affect the rollout of TAI and how can adjustment costs be minimized? How can societies prepare for and respond to potential transition crises, e.g., sudden mass unemployment, system failures, or conflicts triggered by TAI developments?
Why this matters – this research agenda speaks to an utterly changed world: Often, the questions people ask are a leading indicator of what they think they’re about to need to do. If economists start asking the kinds of questions outlined here, then it suggests they expect we may need radical changes to society, the like of which we haven’t seen since the social reformations following the second world war in England, or the general slew of changes that arrived with and followed the industrial revolution.
The fundamental question this is all pointing at is “how to equitably share the benefits and how to reform taxation systems in a world where traditional labor may be significantly diminished”. How, indeed? Read more: A Research Agenda for the Economics of Transformative AI (NBER).
PAPERS – NBER:
A Research Agenda for the Economics of Transformative AI, Working Paper 34256. DOI 10.3386/w34256. Issue Date September 2025. As we approach Transformative Artificial Intelligence (TAI), there is an urgent need to advance our understanding of how it could reshape our economic models, institutions and policies. We propose a research agenda for the economics of TAI by identifying nine Grand Challenges: economic growth, innovation, income distribution, decision-making power, geoeconomics, information flows, safety risks, human well-being, and transition dynamics. By accelerating work in these areas, researchers can develop insights and tools to help fulfill the economic potential of TAI.
Do Markets Believe in Transformative AI? Working Paper 34243. DOI 10.3386/w34243. Issue Date September 2025. Economic theory predicts that transformative technologies may influence interest rates by changing growth expectations, increasing uncertainty about growth, or raising concerns about existential risk. Examining US bond yields around major AI model releases in 2023-4, we find economically large and statistically significant movements concentrated at longer maturities. The median and mean yield responses across releases in our sample are negative: long-term Treasury, TIPS, and corporate yields fall and remain lower for weeks. Viewed through the lens of a simple, representative agent consumption-based asset pricing model, these declines correspond to downward revisions in expected consumption growth and/or a reduction in the perceived probability of extreme outcomes such as existential risk or arrival of a post-scarcity economy. By contrast, changes in consumption growth uncertainty do not appear to drive our results.
AI and Task Efficiency, Working Paper 34295. DOI 10.3386/w34295. Issue Date September 2025. We model several ways in which AI may improve decisions, raise the productivity of firms, and raise human capital growth. Each focuses on activities that involve problem solving, with solutions being guided by signals. If AI raises the accuracy of the signals, humans will then make better decisions — individually and in groups.
How People Use ChatGPT, Working Paper 34255. DOI 10.3386/w34255. Issue Date September 2025. Despite the rapid adoption of LLM chatbots, little is known about how they are used. We document the growth of ChatGPT’s consumer product from its launch in November 2022 through July 2025, when it had been adopted by around 10% of the world’s adult population. Early adopters were disproportionately male but the gender gap has narrowed dramatically, and we find higher growth rates in lower-income countries. Using a privacy-preserving automated pipeline, we classify usage patterns within a representative sample of ChatGPT conversations. We find steady growth in work-related messages but even faster growth in non-work-related messages, which have grown from 53% to more than 70% of all usage. Work usage is more common for educated users in highly-paid professional occupations. We classify messages by conversation topic and find that “Practical Guidance,” “Seeking Information,” and “Writing” are the three most common topics and collectively account for nearly 80% of all conversations. Writing dominates work-related tasks, highlighting chatbots’ unique ability to generate digital outputs compared to traditional search engines. Computer programming and self-expression both represent relatively small shares of use. Overall, we find that ChatGPT provides economic value through decision support, which is especially important in knowledge-intensive jobs.
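For readers who want a feel for the event-study logic summarized in "Do Markets Believe in Transformative AI?" above, the following is a minimal Python sketch of measuring yield changes around release dates and summarizing them by mean and median. It is not the paper's method or data: the function names, the one-day-before baseline, the horizon parameter, and the yield numbers are all assumptions made up for illustration.

```python
from statistics import mean, median


def yield_response(yields, event_idx, horizon):
    """Change in yield from the day before the event to `horizon` days after it."""
    return yields[event_idx + horizon] - yields[event_idx - 1]


def summarize_responses(yields, event_indices, horizon):
    """Mean and median yield response across events that fit inside the sample."""
    responses = [
        yield_response(yields, i, horizon)
        for i in event_indices
        if i >= 1 and i + horizon < len(yields)
    ]
    return {"mean": mean(responses), "median": median(responses), "n": len(responses)}


# Hypothetical daily long-maturity yields in percent (illustrative only).
toy_yields = [4.50, 4.52, 4.48, 4.45, 4.47, 4.46, 4.40, 4.41]
# Hypothetical positions of two "model release" days in the series.
toy_events = [2, 5]

summary = summarize_responses(toy_yields, toy_events, horizon=1)
print(summary)
```

A negative mean and median in such a summary would correspond to the direction of the result the abstract reports. A real study would also need a benchmark for normal yield volatility before calling any move significant.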
PAPERS – Bank for International Settlements (BIS):
Harnessing artificial intelligence for monitoring financial markets. BIS Working Papers | No 1291 | 24 September 2025 PDF full text – We study how artificial intelligence can help monitor financial markets. We build a two-step tool that forecasts market stress and explains the reasons behind its forecast. First, a recurrent neural network learns from over one hundred daily market indicators. It predicts the average size of gaps between euro–yen traded directly and euro–dollar–yen traded via the US dollar. These “triangular arbitrage parity” gaps should vanish within seconds in normal times, and big or persistent gaps signal that market frictions are rising. Second, the model shows, day by day, which market indicators matter most for its signal. This information can then direct a large language model to search recent news about those high-importance indicators to add timely context. Forecasting stress is hard. Severe events are rare, links across markets are non-linear and standard early warning tools often miss new risks. Our approach joins statistical power with reasoning. The network’s data-driven weights make its decisions transparent. Their movement over time is itself an early signal that market dynamics may be shifting. The language model then turns these signals into short narratives that point supervisors and analysts to the right topics at the right moment. This helps close the gap between “a number went up” and “here is why it may be rising”.
Findings – Using more than one hundred daily indicators, the system flags periods of likely dysfunction up to 60 business days ahead. In tests on data not used for training from 2021–24, it correctly highlights episodes later linked to real events, including the March 2023 banking strains. When the model raises an alert, its highest-weight indicators guide targeted news searches. In case studies, those searches pointed to discussions of the relevant drivers days before turbulence. In short, the tool detects risk early and explains it in accessible terms, helping authorities focus their surveillance and prepare responses.
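The "triangular arbitrage parity" gap in the BIS summary above can be illustrated with a little arithmetic. The Python sketch below compares a direct euro–yen quote with the rate implied by trading through the US dollar. It assumes mid quotes with no bid-ask spread, and expressing the gap in basis points and averaging absolute gaps are choices made here for illustration. The function names and sample quotes are hypothetical, and the paper's exact gap definition may differ.

```python
def implied_cross_rate(eur_usd, usd_jpy):
    """EUR/JPY rate implied by going through the dollar: EUR/USD times USD/JPY."""
    return eur_usd * usd_jpy


def parity_gap_bps(eur_jpy_direct, eur_usd, usd_jpy):
    """Gap between the direct and implied EUR/JPY rates, in basis points of the implied rate."""
    implied = implied_cross_rate(eur_usd, usd_jpy)
    return (eur_jpy_direct - implied) / implied * 1e4


def mean_abs_gap_bps(quotes):
    """Average absolute gap over (eur_jpy_direct, eur_usd, usd_jpy) quote triples."""
    gaps = [abs(parity_gap_bps(*quote)) for quote in quotes]
    return sum(gaps) / len(gaps)


# Hypothetical quotes (illustrative only): the first is at parity, the second is off by about 10 bps.
toy_quotes = [
    (160.0, 1.0, 160.0),
    (165.165, 1.10, 150.0),
]
print(mean_abs_gap_bps(toy_quotes))
```

In normal times the gap should sit close to zero because arbitrageurs trade it away within seconds, which is why the summary treats large or persistent values as a sign of rising market frictions.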
