The Week in Green Software: AI Energy Scores & Leaderboards
Host Chris Adams is joined by Asim Hussain to explore the latest news from The Week in Green Software. They look at Hugging Face’s AI energy tools, Mistral’s lifecycle analysis, and the push for better data disclosure in the pursuit of AI sustainability. They discuss how prompt design, context windows, and model choice impact emissions, as well as the role of emerging standards like the Software Carbon Intensity for AI, and new research on website energy use.

Learn more about our people:
Chris Adams: LinkedIn | GitHub | Website
Asim Hussain: LinkedIn | Website

Find out more about the GSF:
The Green Software Foundation Website
Sign up to the Green Software Foundation Newsletter

News:
A Gift from Hugging Face on Earth Day: ChatUI-Energy Lets You See Your AI Chat’s Energy Impact Live [04:02]
Our contribution to a global environmental standard for AI | Mistral AI [19:47]
AI Energy Score Leaderboard - a Hugging Face Space by AIEnergyScore [30:42]
Challenges Related to Approximating the Energy Consumption of a Website | IEEE [55:14]
National Drought Group meets to address “nationally significant” water shortfall - GOV.UK

Resources:
GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app [07:47]
General policy framework for the ecodesign of digital services version 2024 [29:37]
Software Carbon Intensity (SCI) Specification Project | GSF [37:35]
Neural scaling law - Wikipedia [45:26]
Software Carbon Intensity for Artificial Intelligence | GSF [52:25]

Announcement:
Green Software Movement | GSF [01:01:45]

If you enjoyed this episode then please either:
Follow, rate, and review on Apple Podcasts
Follow and rate on Spotify
Watch our videos on The Green Software Foundation YouTube Channel!
Connect with us on Twitter, Github and LinkedIn!

TRANSCRIPT BELOW:

Asim Hussain: ChatGPT, they're all like working towards a space of how do we build a tool where people can literally pour junk into it, and it will figure something out. Whereas what we should be doing is how do you use that context window very carefully. And it is like programming.
Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software. I'm your host, Chris Adams.
Hello and welcome to This Week in Green Software, where we look at the latest news in sustainable software development. I am joined once again by my friend and partner in crime, or occasionally crimes, Asim Hussain of the Green Software Foundation. My name is Chris Adams. I am the Director of Policy and Technology at the Green Web Foundation, no longer the executive director there, as we've moved to a co-leadership model. And, Asim, really lovely to see you again, and I believe this is the first time we've been on a video podcast together, right?
Asim Hussain: Yeah. I have to put clothes on now, so, so that's,
Chris Adams: That raises all kinds of questions as to how intimate our podcast discussions were before. Maybe they had a different meaning to you than they did to me, actually.
Asim Hussain: Maybe you didn't know I was naked, but anyway.
Chris Adams: No, and that makes it fine. That's what, that's what matters.
I also have to say, this is the first time we get to, I like the kind of rocking the Galactus style headphones that you've got on here.Asim Hussain: These are my, yeah, no, these are old ones that I posted recently. I actually repaired them. I got my soldering iron and I repaired the jack at the end there. So, I'm very proud of myself for having repaired. I had the right to repair. Chris. I had the right to repair it.Chris Adams: Yeah. This is why policy matters.Asim Hussain: I also have the capability.Chris Adams: Good. So you can get, so, good on you for saving a bunch of embodied carbon and, how that's calculated is something we might touch on. So, yes. So if you are new to this podcast, my friends, we're just gonna be reviewing some of the news and stories that are kinda showed up on our respective radars as we work in our kind of corresponding roles in both the Green Software Foundation and the Green Web Foundation.And hopefully this will be somewhat interesting or at least diverting to people as they wash their dishes whilst listening to us. So that's the plan. Asim, should I give you a chance to just briefly introduce what you do at the Green Software Foundation before I go into this?'Cause I realized, I've just assumed that everyone knows who you are. And I know who you are, but maybe there's people who are listening for the first time, for example.Asim Hussain: Oh yeah. So, yeah. So my name's Asim Hussain. I am a technologist by trade. I've been building software for several decades now. I formed the green software, yeah, Green Software Foundation, you know, four years ago. And, now I'm the executive director and I'm basically in charge of, yeah, just running the foundation and making sure we deliver against our vision of a future where software has zero harmful environmental impacts.Chris Adams: That's a noble goal to be working for. And Asim, I wanted to check. How long is it now? Is it three years or four years? 'Cause we've been doing this a while.Asim Hussain: We, yeah. So we just fin, well, four years was May, so yeah, four years. So next birthday's the fifth birthday.Chris Adams: Wow. Time flies when the world is burning, I suppose. Alright, so anyway, as per usual, what we'll do, we share all the show notes and any links that we discuss or projects we discuss, we'll do our damnedest to make sure that they're available for anyone who wants to continue their quest and learning more about sustainability in the field of software.And I suppose, Asim, it looks like you're sitting comfortably now. Should we start looking at some of the news stories?Asim Hussain: Let's go for it.Chris Adams: Alright. Okay. The first one we have, is a story from Hugging Face. This is actually a few months back, but it's one to be aware of if it missed you the first time. So, Hugging Face released a new tool called Chat UI Energy that essentially lets you see, the energy impact live from using a kind of chat session,a bit like ChatGPT or something like that. Asim, I think we both had a chance to play around with this, and we'll share a link to the actual story around this as well as the actual repo that's online. What do you think of this? what's your immediate take when you see this and have a little poke around with this? Asim Hussain: Well, it's good. I wanna make sure. It's a really nice addition to a chat interface. So just so the audience who's not seeing it, every time you do a prompt, it tells you the energy in, well, in watt hours, what I'm seeing right now. 
But then also, you know, some other stats as well. And then also kind of how much of a phone charge it is. And that's probably the most surprising one. I just did a prompt, which was 5.7% of a phone charge, which was, that's pretty significant. Actually, I dunno, is that significant? So, one of the things is, what I'm trying to find out from it is how does that calculation work, 'cause that's my world, it's like, what do you really mean by a calculation? Is it cumulative? Is it session based? Is it just, you know, what have you calculated in terms of the energy emissions? The little info on the side is just the energy of the GPU during inference. So it's not the energy of kind of anything else in the entire user journey of me using a UI to ask a prompt. But we also know that's probably the most significant. And I'm kind of quite interested in figuring out, as I'm prompting it, one of the things I'm seeing is that every single prompt is actually, the emissions are bigger than the previous prompt. Oh no, it's not actually, that's not true. Yeah, it is.
Chris Adams: Ah, this is the thing you've been mentioning about cumulative,
Asim Hussain: Cumulative. Yeah. Which is a confusing one. 'Cause I've had a lot of people who are really very good AI engineers go, "Asim, no, that's not true." And other people going, "yeah, it kind of is true." But they've just optimized it to the point where the point at which you get hit with that is at a much larger number. But the idea is that it used to be an n-squared issue for your prompt and your prompt session history. So every time you put a new prompt in, all of your past session history was sent with your next prompt. And if you are actually building your own chat solution for your company or wherever, that is typically how you would implement it as a very toy solution to begin with: just, you know, take all the text that was previous and the new text and send it in the next request. But I think what they were explaining to me was that in the more advanced solutions, you know, the ones from Claude or ChatGPT, there's a lot of optimization that happens behind the scenes. So it doesn't really happen that way, but I was trying to figure out whether it happens with this interface and I haven't quite figured it out yet.
Chris Adams: Oh, okay. So I think what you might be referring to is the fact that when you have like a GPU card or something like that, there's like new tokens and kind of cached tokens, which are priced somewhat differently now. And this is one of the things that we've seen. 'Cause it's using maybe a slightly different kind of memory, which might be slightly faster or slightly lower cost to serve in that sense. Yeah. Okay. So this is one thing that we don't see. The good news is we can share a link to this, for anyone listening; this source code is all on GitHub, so we can have a look at some of this. And one of the key things you'll see actually is, well, this is sending a message. When you see the actual numbers update, it's not actually, what it's actually doing is calculating all this stuff client-side based on how big each model is likely to be.
'Cause when you look at this, you can A,Asim Hussain: It's a model.Chris Adams: You can work out the, I mean, so when people talk about should I be using the word please or thank you, and am I making the things worse by treating this like a human or should I just be prompting the machine like a machine, is there a carbon footprint to that? This will display some numbers that you can see there, but this has all been calculated inside your browser rather than actually on the server.So like you said, Asim, there is a bit of a model that's taking place here, but as a kind of way to like mess around and kind of have a way into this. This is quite interesting and even now it's kind of telling that there are so few providers that make any of this available, right now. We're still struggling even in like the third quarter of 2025,to have a commercial service that will expose these numbers to you in a way that you can actually meaningfully change the environmental footprint of through either your prompting behavior or well maybe model choice. But that's one of the key things that I see. I can't think, I can't think of any large commercial service that's doing this.The only one is possibly GreenPT,which is basically put a front end on Scaleway's, inference service and I'm not sure how much is being exposed there for them to make some assumptions as well.Asim Hussain: Do you know how bad, do you know how,I feel very uncomfortable with the idea of a future where a whole bunch of people are not saying please or thank you, and the reason for it is they're proudly saying, "well, I care about, I care about sustainability, so I'm not gonna say please or thank you anymore 'cause it's costing too many, too much carbon." I find that very uncomfortable. I personally, I don't wanna, we could, choose not to say please or thank you in all of our communications because it causes, emissions no matter what you do. I don't know.Chris Adams: I'm glad you weren't there, Asim. 'Cause I was thinking about that too. There's a carbon cost to breathing out and if, you, I guess maybe that's 'cause we're both English and it's kinda hardwired into us. It's like the same way that, you know, if you were to step on my toe, I would apologize to you stepping on my toe because I'm just English and I, and it's a muscle memory, kind of like impulsing.Okay.Asim Hussain: Yeah.Chris Adams: That's, what we found. We will share some couple, a couple of links to both the news article, the project on Hugging Face, and I believe it's also on GitHub, so we can like, check this out and possibly make a PR to account for the different kinds of caching that we just discussed to see if that does actually make a meaningful difference on this.For other people who are just looking, curious about this, this is one of the tools which also allows you to look at a, basically not only through weird etiquette, how etiquette can of impact the carbon footprint of using a tool, but also your choice of model. So some models might be, say 10 times the size of something, but if they're 10, if they're not 10 times as good, then there's an open question about whether it's really worth using them, for example.And I guess that might be a nice segue to the next story that we touch on. But Asim, I'll let you, you gotta say something. IAsim Hussain: No, I was gonna say, because I, this is, 'cause I've been diving into this like a lot recently, which is, you know, how do you efficiently use AI? 
Because I think a lot of the, a lot of the content that's out there about, you know, oh, AI's emissions and what to do to reduce AI's emissions, there are all the choices that as a consumer of AI, you have absolutely no ability to affect. I mean, unless you are somebody who's quite comfortable, you know, taking an open source model and rolling out your own infrastructure or this or that or the other. If you're just like an everyday, not even an everyday person, but just somebody who works in a company who's, you know, the company bought Claude, you know, you're using Claude,end of story, what are you, like, what do you do? And I think that's really, it is a really interesting area. I might just derail our whole conversation to talk about this, but I think it's a really interesting area because, what it's really boiling down to is your use of the context window.And so you have a certain number of tokens in a chat before that chat implodes, and you can't use that chat anymore. And historically, those number of tokens were quite low. Relative to, because of all the caching stuff hadn't been invented yet and this and that and the other. So the tokens were quite low.What, didn't mean they didn't mean they were, the prompts were cheaper before. I think they were still causing a lot of emissions. But because they've improved the efficiency and rather than just said, I've improved the efficiency, leave it at that, I've improved the efficiency, Jevons paradox, I've improved the efficiency,let's just give people more tokens to play around with before we lock them out. So the game that we're always playing is how to actually efficiently use that context. And the please or thank you question is actually, see this is, I don't think it's that good one. 'Cause it's two tokens in a context window of a million now, is what's coming down the pipeline.The whole game. And I think this is where we're coming from as you know, if you wanna be in the green software space and actually have something positive to say about how to actually have a relationship with AI, it's all about managing that context. 'Cause the way context works is you're just trying to, it's like you've got this intern and if you flash a document at this intern, you can't then say, "oh, ignore that.Forget it I didn't mean to show you that." It's too late. They've got it and it's in their memory and you can't get rid of it. the only solution is to literally execute that intern and bury their body and get a new intern and then make sure they see the information in the order and only the information they need to see so that when you finally ask 'em that question, they give you the right answer. And so what a lot of people do is they just, because there's a very limited understanding of how to play, how to understand, how to play with this context space, what people end up doing is they're just going, "listen, here's my entire fricking document. It's actually 50,000 words long. You've got it, and now I'm gonna ask you, you know, what did I do last Thursday?"So it's, and all of that context is wasted. 
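To make the cost of that "resend the whole history" pattern concrete, here is a toy sketch in Python of the naive approach Asim describes, compared with only resending the last few turns. The 0.75-words-per-token ratio and the message sizes are assumptions made purely for illustration; real tokenizers and real chat backends, with their caching optimizations, will behave differently.

```python
# A toy sketch (not the ChatUI-Energy code) of the naive "resend the whole
# history" pattern, versus only resending the last few turns.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: assume ~0.75 words per token."""
    return max(1, int(len(text.split()) / 0.75))


def naive_chat_prompt_tokens(turns: list[str]) -> int:
    """Prompt tokens processed if every turn resends the full history so far."""
    history = ""
    total = 0
    for message in turns:
        history += message + "\n"
        total += estimate_tokens(history)  # the whole history, every single turn
    return total


def trimmed_chat_prompt_tokens(turns: list[str], keep_last: int = 2) -> int:
    """Same chat, but only the last `keep_last` turns are resent each time."""
    total = 0
    for i in range(len(turns)):
        window = "\n".join(turns[max(0, i - keep_last + 1): i + 1])
        total += estimate_tokens(window)
    return total


big_document = "lorem " * 50_000           # the "entire fricking document"
follow_ups = ["What did I do last Thursday?"] * 20
turns = [big_document] + follow_ups

print("naive resend:   ", naive_chat_prompt_tokens(turns), "prompt tokens")
print("trimmed context:", trimmed_chat_prompt_tokens(turns), "prompt tokens")
```

Running it shows the naive total dominated by re-counting the large first message on every turn, which is exactly the wasted context being described.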
And I think that's, and it's also like a very simplistic way of using an AI, which is why like a lot of companies are, kind of moving towards that space because they know that it means their end user doesn't have to be very well versed in the use of the tool in order to get benefit out of it.So that's why ChatGPT, they're all like working towards a space of how do we build a tool where people can literally pour junk into it, and It will figure something out. Whereas what we should be doing and what I'm like, and I think it's not only what we should be doing, it's, what the people who are like really looking at how to actually get real benefit from AI,is how do you use that context window very carefully. And it is like programming. It is really like program. That's what, that's my experience with it so far. It's like, I want this, I need to feed this AI information. It's gonna get fed in an order that matters. It's gonna get fed in a format that matters.I need to make sure that the context I'm giving it is exactly right and minimal. Minimal for the question that I wanna answer, get it answered at the end of it. So we're kind of in this like space of abundance where, because every AI provider's like, "well do what you want. Here's a million tokens.Do what you want, do what you want."And they're all, we're all just chucking money. These we're just chucking all our context tokens at it. They're burning money on the other side because they're not about making a profit at the moment. They're just about becoming the winner. So they don't really care about kind of profitability to that level.So what us It's all about, I'm just getting back to it again. I think, we need to eventually be telling that story of like, how do you actually use the context window very carefully? And again, it's annoyed me that the conversation has landed at please and thank you. 'Cause the actual conversation should be, you know, turning that Excel file into a CSV because it knows how to parse a CSV and it uses fewer tokens to parse a CSV than an Excel file. Don't dump the whole Excel file, export the sheet that you need in order for it to, answer that question. If you f up, don't just kill the session and start a new session.This is, there's this advice that we need to be giving that I don't even know yet.Chris Adams: MVP. Minimal viable prompt.Asim Hussain: Minimal viable prompt! Yeah. What is the minimal viable prompt and the, what's frustrating me is that like one of the things that we use Claude and I use Claude a lot, and Claude's got a very limited context window and I love that.It was like Twitter when you had to, remember Twitter when you had to like have 160 characters?It was beautiful.Chris Adams: to 280, and then you're prepared to be on that website, you can be as, you can monologue as much as you wantAsim Hussain: Yeah. You can now monologue, but it was beautiful having to express an idea in this short, like short, I love that whole, how do I express this complex thing in a tweet? And so with the short context windows, were kind of forced to do that, and now I'm really scared because now everybody, Claude literally two days ago has now gone, right, you've got a million context window, and I'm like, oh, damn it.Now I don't even, now I don't have personallyChris Adams: That's a million token context window when you say that. Right. So that's enough for a small book basically. I can dump entire book into it, then ask questions about it. Okay. 
Well, I guess it depends on the size of your book really, but yeah, so that's, what you're referring to when you talk about a million context window there.Asim Hussain: Yeah, yeah. And it's kind of an energy question, but the energy doesn't really, kind of, knowing how much, like I've just looked at chat UI window and I've checked a couple of prompts and it's told me the energy, and it's kinda that same world.It's just it's just there to make me feel guilty, whereas the actual advice you should be getting is well, actually no, I, what do I do? How am I supposed to prompt this thing to actually make it consume less energy? And that's the,Chris Adams: Oh, I see. So this is basically, so this is, you're showing me the thing and now you're making me feel bad. And this may be why various providers have hosted chat tools who want people to use them more, don't automatically ship the features that make people feel bad without giving 'em a thing they can actually do to improve that experience.And it may be that it's harder to share some of the guidance like you've just shared about making minimum viable prompt or kind of clear prompt. I mean, to be honest, in defence of Anthropic, they do actually have some pretty good guidance now, but I'm not aware of any of it that actually talks about in terms of here's how to do it for the lowest amount of potential tokens, for example.Asim Hussain: No, I don't see them. I don't see them. I mean, they, yeah, they do have like stuff, which is how to optimize your context window, but at the same time, they're living in this world where everybody's now working to a bigger, that's what they have to do.And I don't know, it's kinda like, where do we, because we, 'cause the AI advice we would typically have given in the past, or we would typically give is listen, just run your AI in a cleaner region. And you are like, well, I can't bloody do that with Anthropic, can I? It's just, it's whatever it is, it's, you know.Chris Adams: That's a soluble problem though. Like,Asim Hussain: Like what I'm just saying or,Chris Adams: Yeah. You know, but like the idea they're saying, "Hey, I want to use the service. And I want to have some control over where this is actually served from."That is a thing that you can plausibly do. And that's maybe a thing that's not exposed by end users, but that is something that is doable.And, I mean, we can touch on, we actually did speak about, we've got Mistral's LCA reporting as one of the things, where they do offer some kind of control, not directly, but basically by saying, "well, because we run our stuff in France, we're already using a low carbon grid."So it's almost like by default you're choosing this rather than you explicitly opting in to have like the kind of greener one by, the greener one through an active choice,I suppose.Asim Hussain: They're building some data centers over there as well, aren't they? So it's a big, it's a big advantage for Mistral to be in France, to be honest with you. It's yeah, they're inChris Adams: this definitely does help, there's, I mean, okay. Well, we had this on our list, actually, so maybe this is something we can talk about for our next story, because another one on our list since we last spoke was actually a blog post from Mistral.ai talking about, they refer to, in a rather grandiose terms, our contribution to a global environmental standard for AI.And this is them sharing for the first time something like a lifecycle analysis data about using their models. 
And it's actually one that has, it's not just them who've been sharing this. They actually did work with a number of organizations, including France's environment agency, ADEME. They were following a methodology specifically set out by AFNOR, the French standards body: the frugal AI methodology. And they were working with, I think, two other organizations. I think it's Sopra Steria, and I forget the name of the other one who was mentioned here, but it's not just like a kind of throwaway quote from, say, Sam Altman. It's actually, yeah, here they are working with Hubblo, which is a nonprofit consultancy based in Paris, and Resilio, who are a Swiss organization, who are very well respected and peer reviewed inside this. So you had some things to share about this one as well. 'Cause this felt like it was a real step forward from commercial operators, but still falling somewhat short of where we kind of need to be. So, Asim, when you read this, what were the first things that occurred to you, I suppose, were there any real takeaways for you?
Asim Hussain: Well, I'd heard about this on the grapevine last year, because I think one of the researchers from Resilio was at GreenIO, yeah, in Singapore. And I was there and he gave a little sneak preview. They didn't say who it was gonna be, they didn't say it was Mistral, but they said, we are working on one. And he had like enough to tease some of the aspects of it. I suspect some of the actual detail work has not come out, I think, unless there's a paper I'm missing. But yeah, there is kind of more work I think here that didn't end up actually getting released once it got announced, but it was a large piece of work. It's good. It's the first AI company in the world of this, you know, size that has done any work in this space and released it. Other than like a flippant comment from Sam Altman, "I heard some people seem to care about the emission, energy consumption of AI." So, so that's good. And I think we're gonna use this, it's gonna be used as, I'd say, a proxy or an analog for kind of many other situations. I think it is lacking a little bit in the detail. But that's okay. We should celebrate every organization that leads forward with some of this stuff. When you're inside these organizations, it's always a very hard headwind to push against. 'Cause there's a lot of negative reasons to release stuff like this, especially when you're in a very competitive space like AI. So they took the lead, we just celebrate that. I think there's some data here that we can use as models for others, as, you know, when we now want to look at what are the emissions of Anthropic or OpenAI or Gemini or something like that, there's some more, you know, analogs that we can use. But also not a huge amount of surprise, I'd say, it's kind of a training and inference,
Chris Adams: Yep. That turns out to be where the environmental footprint is.
Asim Hussain: Yeah. Training and inference, which is kind of, which is good. I mean, I think obviously hardware and embodied impacts is, they kind of separate kind of the two together. I suspect the data center construction is probably gonna be, I don't know, that is quite low.
Yeah, yeah,
Chris Adams: I looked at this, I mean, it's been very difficult to actually find any kind of meaningful numbers to see what share this might actually make up. 'Cause as the energy gets cleaner, it's likely that this will be a larger share of emissions. But one thing that was surprising here was, like, this is, you know, France, which is a relatively clean grid, like maybe between 40 and say 60 grams of CO2 per kilowatt hour, which is, that's 10 times better than the global average, right? Or maybe 9, between 8 and 10 times cleaner than the global average. And even then, so with the industry being that clean, you would expect the embodied emissions from like data centers and stuff to represent a larger one. But the kind of high level, kind of pretty looking graphic that we see here shows that it's less than 2% across all these different kind of impact criteria like carbon emissions or water consumption or materials, for example. This is one thing that I was expecting to be larger, to be honest. The other thing that I noticed when I looked at this is that, dude, there's no energy numbers.
Asim Hussain: Oh, yeah.
Chris Adams: Yeah. And this is the thing that it feels like, this is the thing that everyone's continually asking for.
Asim Hussain: It's an LCA. So they use the LCA specification, so
Chris Adams: That's a very good point. You're right, that's a valid response, I suppose. 'Cause energy by itself doesn't have a carbon footprint, but the result of generating that energy, electricity, does have that impact. So yeah. Okay. Maybe that's,
Asim Hussain: For the audience, they use like a well known, well respected, standardized way of reporting the lifecycle emissions using the LCA, lifecycle analysis, methodology, which is like an ISO certified standard of doing it. So they adhere to a standard.
Chris Adams: So this actually made me realize, if this is basically here and you are a customer of an AI provider, 'cause we were looking at this ourselves trying to figure out, okay, well, when people speak to us about AI policies, we realized, well, what would you want to have inside one? The fact that you have a provider here who's actually done this work does suggest that it's possible to actually request this information if you're a customer, under NDAs. In the same way that if you're speaking to Amazon or probably any of the large providers, if you're spending enough money with them, you can have information that is disclosed to you directly under NDA. So it may not be great for the world to see, but if you are an organization and you are using, say, Mistral, for example, or Mistral services, this would make me think that they're probably more able to provide much more detailed information so that you can at least make some informed decisions in a way that you might not be able to get from some of the other competing providers. So maybe that's one thing that we actually do see that is a kind of, not really a published benefit in this sense, but it's something that you're able to do if you are in a decision making position yourself and you're looking to choose a particular provider, for example.
Asim Hussain: I mean, you should always be picking the providers who've actually got some, you know,
Chris Adams: optimize for disclosure,
Asim Hussain: optimize for disclosure. Yeah. Always be picking the providers who optimize for disclosure.
I mean, if we, the people listening to this, that is the thing that you can do. And Mistral, they're also, they have some arguments in here as well, which is kind of, they did kind of also surface that it is like a pretty linear relationship between your emissions and the size of the model, which is a very useful piece of information for us to know as a consumer. Because then we can go, well, actually, I've heard all these stories about use smaller models, use smaller models, and now you actually have some data behind it, which is supporting the fact that, yeah, using a smaller model hasn't got some weird non-linearity to it where a half size model is only like 10% less emissions. A half size model is half the emissions. So that's pretty, that's a pretty good thing to know. It helps Mistral, the fact that they have a lot of small models that you can pick and choose from, so a lot of this stuff really benefits Mistral. They are the kind of organization which has a product offering which does benefit a sustainability community. So they have like small models you can use. I think, I wonder actually, Chris, 'cause they do say that they're building their own data center in France, but they've never said where, until now, they've been running their AI. So that might be the reason for, they might have been running it in East Coast US or something like,
Chris Adams: I think that would be quite unlikely, wouldn't be very likely, given that most of their customers are probably based in Western Europe still. Right. There is very much a kinda like Gallic kind of flavor to the tooling. And I've, I mean, actually Mistral's tools are ones which I've been using myself personally over the last like few months, for example. And it's also worth bearing in mind that they took on a significant amount of investment from Microsoft a few years back, and I would be very surprised if they weren't using a French data center serving French customers. 'Cause if you were to choose between two countries, okay, France actually has, since 2021, I believe, had a law specifically about measuring the environmental footprint of digital services. So they've got things that they, I think it's called, I'm just gonna share a link to the name of the law because I'm gonna butcher the French pronunciation, but it basically translates to the Reduce the Environmental Footprint of Digital Services law. That's pretty much it. And as a follow on from that, that's what the RGESN is, the kind of general guidance that it shares across kind of government websites in general for France. They've already got a bunch of this stuff out there for like how to do greener IT. I suspect that France is probably gonna be, well, probably the premier country, if you'd be running a startup, to see something like this happening, much more so than, well, probably the US right now, especially given the current kind of push with its current kind of federal approach, which is basically calling into doubt climate change in the wider sense, basically. We were talking about disclosure, right? And we said an optimization for disclosure. And that's probably a nice segue to talk about another link we had here, which was the energy score leaderboard. Because this is one thing that we frequently point to.
And this is one thing that we've suggested in my line of work, that if you are looking to find some particular models, one of the places to look would be the AI Energy Score Leaderboard, which is actually maintained by Hugging Face.And, I share this 'cause it's one of the few places where you can say, I'm looking for a model to help me maybe do something like image generation or captioning text or generating text or doing various things like this. And you can get an idea of how much power these use on a standardized setup.Plus, how satisfied, you know, what the kind of satisfaction score might be, based on these tools and based on a kind of standardized set of like tests, I suppose. The thing is though, this looks like it hasn't been updated since February. So for a while I was thinking, oh, Jesus, does this mean we actually need to, do we have to be careful about who we, how we recommend this?But it turns out that there's a new release that will be coming out in September. It's updated every six months. And, now that I do have to know about AI, this is one thing that I'm looking forward to seeing some of the releases on because if you look at the leaderboard for various slices, you'll see things like Microsoft Phi 1 or Google Gemma 2 or something like that.Asim Hussain: That quite old?Chris Adams: yeah, these are old now, it's six months in generative AI land is quite a long time. There's Phi 4 now, for example, and there's a bunch of these out there. So I do hope that we'll see this actually. And if you feel the same way, then yeah, go on.Asim Hussain: Is it, 'cause, is I always assume this was like a, live leaderboard. So as soon as a model, I suppose once a model, like the emissions of a model are linked to the model and the version of it. So once you've computed that and put on the leaderboard, it's not gonna change. So then it's just the case of as new models come out, you just measure and it just sees how it goes on the leaderboard.Because I'm seeing something here. I'm, I thought open, I'm seeing OpenAI, GPT. Isn't that the one they just released?Chris Adams: No, you're thinking GPT-OSS, perhapsAsim Hussain: Oh.Chris Adams: One thing they had from a while ago. So that one, for example, came out less than two weeks ago, I believe. That isn't showing up here.Asim Hussain: That isn't showing upChris Adams: The, I'm, I was actually looking at this thinking, oh, hang on, it's six months, something being updated, six months,that's, it'd be nice if there was a way, a faster way to expedite kind of getting things disclosed to this. For example, let's say I'm working in a company and I've, someone's written in a policy that says only choose models that disclose in the public somewhere. This is one of the logical places where you might be looking for this stuff right now, for example, and there's a six month lag, and I can totally see a bunch of people saying, no, I don't wanna do that.But right now there's a six month kind of update process for this.Asim Hussain: In the AI realm is an eternity. Yeah.Chris Adams: Yeah. But at the same time, this is, it feels like a thing that this is a thing that should be funded, right? 
I mean, it's, it feels :I wish there was a mechanism by which organizations that do want to list the things, how to make them to kind of pay for something like that so they can actually get this updated so that you've actually got some kind of meaningful, centralized way to see this.Because whether we like it or not, people are basically rolling this stuff out, whether we like it or not, and I feel In the absence of any kind of meaningful information or very patchy disclosure, you do need something. And like this is one of the best resources I've seen so far, but it would be nice to have it updated.So this is why I'm looking forward to seeing what happens in September. And if you think, if you too realize that like models and timely access to information models might be useful, it's worth getting in touch with these folks here because, I asked 'em about this when I was trying to see when they were, what the update cycle was.And basically the thing they said was like, yeah, we're, really open to people speaking to us to figure out a way to actually create a faster funded mechanism for actually getting things listed so that you can have this stuff visible. Because as I'm aware, as I understand it, this is a labor of love by various people, you know, between their day jobs, basically.So it's not like they've got two or three FTE all day long working on this, but it's something that is used by hundreds of people. It's the same kind of open source problem that we see again and again. But this is like one of the pivotal data sources that you could probably cite in the public domain right now.So this is something that would be really nice to actually have resolved.Asim Hussain: Because there is actually, 'cause the way Hugging Face works is, they have a lab and they have their own infrastructure. Is that how it works? Yeah. So that'sChris Adams: this would, that was be, that was either, that was physically theirs, or it was just some space. Asim Hussain: Spin up. But yeah. But yeah, but they have to effectively like to get the score here. It's not self certified, I presume, but there's a, you know, each of these things has got to get run against the benchmark. So there's basically, if I remember, there was a way of like self certifying.There was literally a way forChris Adams: You could upload your stuff.Asim Hussain: Yeah. OpenAI could disclose to the Hugging Face to the, what the emissions of, you know, what the energy of it was. But most of it is, there's actually, you gotta run against the H100 and there's a benchmarkChris Adams: Yep, exactly. So there's a bit of manual. There's a bit of manual steps to do that, and this is precisely the thing that you'd expect that really, it's not like an insoluble problem to have some way to actually expedite this so that people across the industry have some mechanism to do this. 'cause right now it's really hard to make informed decisions about either model choice or anything like that.Even if you were to architect a more responsibly designed system, particularly in terms of environmental impact here.Asim Hussain: Because if you were to release a new model and you wanted it listed in the leaderboard, you would have to run every other model against. Why would you need to do that? You need toChris Adams: You wouldn't need to do that. You just need to, you, because you don't have control over when it's released, you have to wait six months until the people who are working in that get round to doing that.Asim Hussain: Just the time. It's just a time. Yeah. 
Someone's,
Chris Adams: If you're gonna spend like millions of dollars on something like this, it feels like this is not, even if it was to cost maybe say a figure in the low thousands to do something like this, just to get that listed and get that visible, that would be worth it. So that you've actually got like a functioning way for people to actually disclose this information, to inform decisions. 'Cause right now there's nothing that's easy to find. This is probably the easiest option I've seen so far, and we've only just seen like the AI code of practice that's actually kind of been published, that came into effect in August in Europe, for example. But even then, you still don't really have that much in the way of like public ways to filter or look for something based on the particular task you're trying to achieve. I wanted to ask you actually, Asim, so I think, I can't remember last time if I was speaking to you, if this came up, I know that, with your GSF hat on, there's been some work to create a Software Carbon Intensity for AI spec, right? Now, I know that there's a thing where, like court cases, you don't wanna kind of prejudice the discussions too much by having things internally. Although you're probably not, there isn't like an AI court you can be in contempt of, but I mean, yeah, not yet, but, who knows? Give it another six months. Is there anything, any juicy gossip or anything you can share that people have been learning? 'Cause like you folks have been diving into this with a bunch of domain experts so far, and this isn't my, like, while I do some of this, I'm not involved in those discussions. So I mean, I'm aware that there has been a bunch of work trying to figure out, okay, how do you standardize around this? What do you measure? You know, do you count tokens? Do you count like a prompt? What's the thing? Is there anything that you can share that you're allowed to talk about before it goes?
Asim Hussain: Yeah. I think what we've landed on is that as long as I'm not discussing stuff which is in, you know, active discussion, and it's kind of made its way into the spec and there's been, you know, broad consensus over it, I think it's pretty safe to talk about it. If there's something that's kind of, and what we do, we do everything in GitHub. So I won't discuss anything which has only been discussed in like an issue or a discussion or comment thread or something. If it's actually made its way into the actual spec, that's pretty safe. So yeah, the way it's really landed is that there was a lot of conversations at the start. There was a lot of conversations and I was very confused. I didn't really know where things were gonna end up. But you know, at the start there was a lot of conversations around, well, how do we deal with training? How do we deal with training? There's this thing called inference. And it's interesting 'cause when we look at a lot of other specs that have been created, even the way the Mistral LCA was done, so they gave a per inference, or per request. I've forgotten what they did. They didn't do per token.
So per,
Chris Adams: they do per chat session or per task, right? I think it's something along those lines. Yeah.
Asim Hussain: Something along that, it wasn't a per token thing. But even then they added the training cost to it.
And like those, some of the questions we were asking: can you add, is there a way of adding, like, the training? The training happened like ages ago. Is there a function that you can use to amortize that training to, like, future inference runs? And we explored, like, lots of conversations. There's like a decay function. So if you were the first person to use a new model, the emissions per token would be higher because you are amortizing more of the training cost, and the older models, the, so we explored like a decay function, we explored, yeah. There's lots of ideas.
Chris Adams: Similar to the embodied usage, essentially, like what we have with embodied carbon versus like use time carbon. You're essentially doing the same thing, with training being like the embodied bit and inference being the usage. And if you had training and you had three inferences, each of those inferences is massive. Like in terms of the embodied carbon, if there's like a billion, it's gonna be much lower for each one.
Asim Hussain: But then you get into really weird problems, because, I mean, we do that with the embodied carbon of hardware, but we do that by saying, do you know what? The lifespan's gonna be four years and that's it. And we're just gonna pretend it's an equal weighting every single day for four years.
Chris Adams: Not with the GHG Protocol. You can't do it with the GHG Protocol. You can't amortize it out like that. You have to do it the same year, so your emissions look awful one year.
Asim Hussain: Ah, the year that you bought it.
Chris Adams: So this is actually one of the problems with the kind of default way of measuring embodied carbon versus other things inside this. Like Facebook, for example, they've proposed another way of measuring it, which does that, this kind of amortization approach, which is quite a bit closer to how you might do, I guess, like typical amortization of capital, capital
Asim Hussain: Cap, yeah.
Chris Adams: So that's the difference in the models. And these are some of the kind of honestly sometimes tedious details that actually have quite a significant impact. Because that's gonna have totally different incentives. Especially at the beginning of something, if you said, well, if you pay the full cost, then you are incentivized not to use this shiny new model, 'cause it makes you look awful compared to using an existing one, for example.
Asim Hussain: And that's one of the other questions, like, how do you, I mean, a lot of these questions were coming up. We never, we didn't pick that solution. And we also didn't pick the solution of, we actually had the conversation of, you amortize it over a year, and then there's a cliff. And then that was like, we're gonna incentivize people to use older models, with this idea that older models were the thing. There were questions that pop up all the time. Like, what do you do when you have an open source model? If I was to fine tune an open source model and then make a service based off of that, is the emissions of the model, the open source model that I got, Llama, whatever it was, am I responsible for that? Or is the, and there was like, if you were to say no, then you're incentivizing people to just, like, open source their models and go, "meh, well, the emissions are free now 'cause I'm using an open source model."
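To make the amortization ideas Asim mentions concrete, here is a minimal sketch of the two approaches the group discussed and, per the conversation, did not adopt: spreading training emissions evenly over an assumed lifetime number of inferences, and a decaying share that loads more of the training cost onto early inferences. Every number in it is a hypothetical placeholder, not a disclosed figure.

```python
# A minimal sketch of two amortisation ideas discussed in the SCI for AI work
# and, per the conversation above, not adopted. All numbers are hypothetical.

TRAINING_EMISSIONS_G = 500_000_000.0   # hypothetical total training emissions, gCO2e
PER_INFERENCE_G = 2.0                  # hypothetical marginal emissions per inference, gCO2e


def straight_line(expected_lifetime_inferences: int) -> float:
    """Every inference carries an equal share of training, like amortising
    embodied carbon evenly over an assumed hardware lifetime."""
    return PER_INFERENCE_G + TRAINING_EMISSIONS_G / expected_lifetime_inferences


def decaying_share(inference_index: int, half_life: int = 1_000_000) -> float:
    """Early inferences carry a bigger share of the training cost, decaying over
    time (the 'decay function' idea). Only meant to show the shape of the
    incentive; the shares are not constructed to sum exactly to the total."""
    share = (0.5 ** (inference_index / half_life)) * (TRAINING_EMISSIONS_G / half_life)
    return PER_INFERENCE_G + share


for i in (1, 1_000_000, 100_000_000):
    print(f"inference #{i}: straight-line {straight_line(1_000_000_000):.3f} g, "
          f"decaying {decaying_share(i):.3f} g")
```

The point is not the specific curve; it is that the choice of amortization function changes who appears to carry the training cost, and therefore what behaviour the metric ends up incentivizing.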
So there's lots of these, it's very nuanced. Kind of the, a lot of the conversations we have in the standards space, is like a small decision can actually have a cascading series of unintended consequences.So the thing that we really like sat down was like, what, well, what actually, what do you want to incentivize? Let's just start there. What do we want to incentivize? Okay, we've listed those things we wanna incentivize. Right. Now, let's design a metric, which through no accident incentivizes those things. And where they ended up was basically two,there's gonna be two measures. So we didn't, we didn't solve the training one because there isn't a solution to it. It's a different audience cares about the training emissions than that doesn't, consumers, it's not important to you because it doesn't really matter. It doesn't change how you behave with a model.It doesn't change how you prompt a model just because it had some training emissions in the past. What matters to you most is your direct emissions from your actions you're performing at that given moment in time. So it's likely gonna be like two SCI scores for AI, a consumer and a provider. So the consumer is like inference plus everything else.and also what is the functional unit? There's a lot of conversations here as well, and that's likely to land that now very basically the same as how you sell an AI model. So if you are an LLM, you're typically selling by token. And so why for us to pick something which isn't token in a world where everybody else is thinking token, token, token, token, it would be a very strange choice and it would make the decision really hard for people when they're evaluating certain models. They'd be like, oh, it's this many dollars per token for this one and this many dollars per token for that one. But it's a carbon per growth. And it's a carbon per growth,I can't rationalize that. Where, if it's well look, that's $2 per token, but one gram per token of emissions and that's $4 per token, but half a gram per token for emissions. I can evaluate the kind of cost, carbon trade off, like a lot easier. The cognitive load is a lot easier.Chris Adams: So you're normalizing on the same units, essentially, right?Asim Hussain: Yeah. As how, however it's sold, however, it's, 'cause that's sort of, it's a fast, AI is also a very fast moving space and we dunno where it's gonna land in six months, but we are pretty sure that people are gonna figure out how to sell it, in a way that makes sense. So lining up the carbon emissions to how it's sold.And the provider one is going to be, that's gonna include like the training emissions, but also like data and everything else. And that's gonna be probably per version of an AI. And that will, so you can imagine like OpenAI, like ChatGPT would have a consumer score of carbon per token and also a provider score of ChatGPT 5 has, and it's gonna be probably like per flop or something,so per flop of generating ChatGPT 5, it was this many, this much carbon. And that's really like how it's gonna, it's also not gonna be total totals are like, forget about totals. Totals are pointless when it comes to, to change the behavior. You really want to have a, there's this thing called neural scaling laws.The paper.Chris Adams: Is that the one that you double the size of the model when it's supposed to double the performance? Is that the thing? Asim Hussain: It's not double, but yeah, got relationship. Yeah. 
So there's this logarithmic, perfectly logarithmic relationship between model accuracy and model size, model accuracy, and the data, the number of training you put into it, and model size and the amount of compute you put into, it's all logarithmic.So it's often used as the reason, the rationale for like why we need to, yeah, larger models is because we can prove it. So, but that basically comes down to like really then, you know, like if like I care more about, but for instance, I don't particularly, it doesn't matter to me how much, it's not that important to know the total training emissions of ChatGPT 5 versus ChatGPT 4.What's far more useful, is to know, well, what was the carbon per flop of training for 4 versus the carbon per flop of training for 5? 'Cause then that gives you more interesting information. Have you, did you,Chris Adams: What does that allow?Asim Hussain: Bother to do anything? Huh?Chris Adams: Yeah. What does that allow me to do? If I know if 5 is 10 times worse per flop than 4, what that incentivize me to do differently? 'Cause I think I might need a bit of hand help here making this call here.Asim Hussain: Because I think, 'cause it, what, let's say ChatGPT 6 is going to come along. The one thing we know absolutely sure is it's just gonna be in terms of total bigger than ChatGPT 5. So as like a metric, it's not, if you are an engineer, if you are somebody trying to make decisions regarding what do I do to actually train this model with causing less emissions, it doesn't really help me because it's just, a number that goes higher and higher.Chris Adams: Oh, it's a bit like carbon intensity of a firm versus, absolute emissions. Is that the much, the argument you're using? So it doesn't matter that Amazon's emissions have increased by 20%, the argument is well, at least if they've got more efficient per dollar of revenue, then that's still improvement.That's the line of reasoning that's using, right?Asim Hussain: Yeah. So it's, because of the way the SCI is, it's not if you want to do a total, there are LCAs, like the thing that Mistral did, there's existing standards that are very well used. They're very well respected. There's a lot of, there's a lot of information about how to do them.You can just use those mechanisms to calculate a total. What the SCI is all about is what is a, KPI that a team can use and they can optimize against, so over time, the product gets more and more efficient? Obviously, you should also be calculating your totals and be making a decision based upon both.But just having a total is, I've gotta be honest with you, it's just, I don't see totals having, in terms of changing behavior, I don't think it changes any behavior. Full stop.Chris Adams: Okay. I wanna put aside the whole, we live in a physical world with physical limits and everything like that, but I think the argument you're making is essentially that, because the, you need something to at least allow you to course correct on the way to reducing emissions in absolute terms, for example. And your argument you're making is if you at least have an efficiency figure, that's something you can kind of calibrate and change over time in a way that you can't with absolute figures, which might be like having a, you know, a budget between now and 2030, for example.That's the thinking behind it, right?Asim Hussain: Yeah. I mean, if you, I've actually got an example here from 'cause we, so we don't have actual compute. 
They, no, no one's ever disclosed like the actual compute that they used per model. But they have, or they used to disclose the number of parameters per model. And we know that there's a relationship.So there's a really interesting, so for 2, 3 and 4, we have some idea regarding the training emissions and the parameters, not from a disclosure, from like research as well, so between, but when you compute the emissions per billion parameters of the model, so per billion parameters of the model, GPT two was 33.3 tons of carbon per billion parameters of the model.Chris Adams: Okay.Asim Hussain: GPT-3 went down to 6.86 tons of carbon per billion parameters. So it went down from 33 to 6. So that was a good thing. It feels like a good thing, but we know the total emissions of 3 was higher. Interestingly, GPT-4 went up to 20 tons of carbon per billion parameters. So that's like an interesting thing to know.It's like you did something efficient between two and three. You did something good. Whatever it was, we don't know what it was, we did something good actually the carbon emissions per parameter reduced. Then you did something. Maybe it was bad. Maybe I, some, maybe it was necessary. Maybe it was architectural. But for some reason your emissions,Chris Adams: You became massively less efficient in the set, in that next Asim Hussain: In terms of carbon. In terms of carbon, you became a lot less efficient in GPT-4. We have no information about GPT 5. I hope it's less than 20 metric tons per billion parameters.Chris Adams: I think I'm starting to wanna step, follow your argument and I'm not, I'm not gonna say I agree with it or not, but I, the, I think the argument you're making is essentially by switching from, you know, that that in itself is a useful signal that you can then do something with. there was maybe like a regression or a bug that happened in that one that you can say, well, what change that I need to do so I can actually start working my way towards, I don't know, us caree
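As a rough illustration of the intensity-style metric Asim is describing, the sketch below takes the carbon-per-billion-parameters figures quoted above (which he attributes to outside research rather than official disclosures) and shows how such a metric would be derived and compared across generations; the inputs to the final call are purely hypothetical placeholders.

```python
# A rough illustration of an intensity-style metric: training emissions
# normalised by model size, so generations can be compared even as totals grow.
# The per-generation figures are the ones quoted in the conversation above.

def training_intensity(total_training_tco2e: float, parameters_billions: float) -> float:
    """Tonnes CO2e per billion parameters: a KPI you can track, unlike a bare total."""
    return total_training_tco2e / parameters_billions


quoted_tco2e_per_billion_params = {
    "GPT-2": 33.3,
    "GPT-3": 6.86,
    "GPT-4": 20.0,
}

names = list(quoted_tco2e_per_billion_params)
for prev, curr in zip(names, names[1:]):
    a = quoted_tco2e_per_billion_params[prev]
    b = quoted_tco2e_per_billion_params[curr]
    print(f"{prev} -> {curr}: {100 * (b - a) / a:+.0f}% carbon per billion parameters")

# Deriving the metric itself, with purely hypothetical inputs:
print(training_intensity(total_training_tco2e=1_000.0, parameters_billions=100.0), "t per B params")
```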
LLM Energy Transparency with Scott Chamberlin
In this episode of Environment Variables, host Chris Adams welcomes Scott Chamberlin, co-founder of Neuralwatt and ex-Microsoft Software Engineer, to discuss energy transparency in large language models (LLMs). They explore the challenges of measuring AI emissions, the importance of data center transparency, and projects that work to enable flexible, carbon-aware use of AI. Scott shares insights into the current state of LLM energy reporting, the complexities of benchmarking across vendors, and how collaborative efforts can help create shared metrics to guide responsible AI development.Learn more about our people:Chris Adams: LinkedIn | GitHub | WebsiteScott Chamberlin: LinkedIn | WebsiteFind out more about the GSF:The Green Software Foundation Website Sign up to the Green Software Foundation NewsletterNews:Set a carbon fee in Sustainability Manager | Microsoft [26:45]Making an Impact with Microsoft's Carbon Fee | Microsoft Report [28:40] AI Training Load Fluctuations at Gigawatt-scale – Risk of Power Grid Blackout? – SemiAnalysis [49:12]Resources:Chris's question on LinkedIn about understanding the energy usage from personal use of Generative AI tools [01:56]Neuralwatt Demo on YouTube [02:04]Charting the path towards sustainable AI with Azure Machine Learning resource metrics | Will Alpine [24:53] NVApi - Nvidia GPU Monitoring API | smcleod.net [29:44]Azure Machine Learning monitoring data reference | Microsoft Environment Variables Episode 63 - Greening Serverless with Kate Goldenring [31:18]NVIDIA to Acquire GPU Orchestration Software Provider Run:ai [33:20]Run.AI NVIDIA Run:ai Documentation GitHub - huggingface/AIEnergyScore: AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models. [56:20]Carbon accounting in the Cloud: a methodology for allocating emissions across data center users If you enjoyed this episode then please either:Follow, rate, and review on Apple PodcastsFollow and rate on SpotifyWatch our videos on The Green Software Foundation YouTube Channel!Connect with us on Twitter, Github and LinkedIn!TRANSCRIPT BELOW:Scott Chamberlin: Every AI factory is going to be power constrained in the future. And so what does compute look like if power is the number one limiting factor that you have to deal with? Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software. I'm your host, Chris Adams. Hello and welcome to Environment Variables, where we bring you the latest news and updates from the world of sustainable software development. I'm your host, Chris Adams. We talk a lot about transparency on this podcast when talking about green software, because if you want to manage the environmental impact of software, it really helps if you can actually measure it. And as we've covered on this podcast before, measurement can very quickly become quite the rabbit hole to go down, particularly in new domains such as generative AI. So I'm glad to have our guest, Scott Chamberlin, here today to help us navigate as we plumb these depths.
Why am I glad in particular? Well, in previous lives, Scott not only built the Microsoft Windows operating system power and carbon tracking tooling, getting deep into the weeds of measuring how devices consume electricity, but he was also key in helping Microsoft Azure work out their own internal carbon accounting standards. He then moved on to Intel to work on a few related projects, including work to expose these kinds of numbers in usable form to developers and to the people making the chips that go in these servers. His new project Neuralwatt is bringing more transparency and control to AI language models. And a few weeks back when I was asking on LinkedIn for pointers on how to understand the energy usage from LLMs I use, he shared a link to a very cool demo showing basically the thing I was asking for: real-time energy usage figures from Nvidia cards directly in the interface of a chat tool. The video's in the show notes if you're curious. And it is really cool. So Scott, thank you so much for joining us. Is there anything else that I missed that you'd like to add for the intro before we dive into any of this stuff? Scott Chamberlin: No, that sounds good. Chris Adams: Cool. Well, Scott, thank you very much once again for joining us. If you are new to this podcast, just a reminder, we'll try and share a link to every single project in the show notes. So if there are things that are of particular interest, go to podcast.greensoftware.foundation and we'll do our best to make sure that we have links to any papers, projects, or demos like we said. Alright, Scott, I've done a bit of an intro about your background and everything like that, and you're calling me from a kind of pleasingly green room today. So maybe I should ask you, can I ask where you're calling from today and a little bit about like the place? Scott Chamberlin: So I live in the mountains just west of Denver, Colorado, in a small town called Evergreen. I moved here in the big reshuffles just after the pandemic, like a lot of people wanted to shift to a slightly different lifestyle. And so yeah, my kids are growing up here, going to high school here, and yeah, super enjoy it. It gives me the quick ability to get outside right outside my door. Chris Adams: Cool. All right. Thank you very much for that. So it's a green software podcast and you're calling from Evergreen as well, in a green room, right? Wow. Scott Chamberlin: That's right. I actually have a funny story I want to share from the first time I was on this podcast. It was me and Henry Richardson from WattTime talking about carbon awareness. And I made some point about how in the future, I believe, everything's going to be carbon aware. And I used a specific example of my robot vacuum, like, it's certainly gonna be charging in a carbon aware way at some point in the future. I shared the podcast with my dad and he listened to it and he comes back to me and says, "Scott, the most carbon reduced vacuum is a broom." Chris Adams: Well, he's not wrong. I mean, it's manual but it does definitely solve the problem and it's definitely got lower embedded carbon, that's for sure, actually. Scott Chamberlin: Yeah. Chris Adams: Cool. So Scott, thank you very much for that.
Now, I spoke a little bit about your kind of career working in ginormous trillion dollar or multi-billion dollar tech companies, but you are now working at a startup Neuralwatt, but you mentioned before, like during, in our prep call, you said that actually after leaving a couple of the big corporate jobs, you spent a bit of time working on like, building your own version of like what a cloud it might be.And I, we kind of ended up calling it like, what I called it Scott Cloud, like the most carbon aware, battery backed up, like really, kind of green software, cloud possible and like pretty much applying everything you learned in your various roles when you were basically paid to become an expert in this.Can you talk a little bit about, okay, first of all, if it's, if I should be calling it something other than Scott Cloud and like are there any particular takeaways you did from that? Because that's had like quite an interesting project and that's probably what I think half of the people who listened to this podcast, if they had essentially a bunch of time to build this, they'd probably build something similar.So yeah. Talk. I mean, why did you build that and, yeah, what are the, were there any things you learned that you'd like to share from there?Scott Chamberlin: Sure. So, I think it's important to know that I had spent basically every year from about 2019 through about 2022, trying to work to add features to existing systems to make them more, have less environmental impact, lower CO2, both embodied as well as runtime carbon.And I think it's, I came to realize that adding these systems on to existing systems is always going to come with a significant amount of compromises or significant amount of challenges because, I mean, I think it's just a core principle of carbon awareness is that there is going to be some trade off with how the system was already designed.And a lot of times it's fairly challenging to navigate those trade offs. I tend to approach them fairly algorithmically, doing optimization on them, but I had always in the back of my mind thought about what would a system look like if the most important principle that we were designing the system from was to minimize emissions? Like if that was the number one thing, and then say performance came second, reliability came second, security has to come first before everything. There's not a lot of tradeoffs you have to make with carbon awareness and security. So I started thinking, I'm like, "what does a data center architecture look like if this is the most important thing?"So of course, starts with the lowest, it's not the lowest, it's the highest performance-per-watt hardware you can get your hands on. And so really serving the landscape of really what that looked like. Architecting all the, everything we know about carbon awareness into the platform so that developers don't necessarily have to put it into their code, but get to take advantage of it in a fairly transparent and automatic way. And so you end up having things like location shifting as a fundamental principle of how your platform looks to a developer. So, as the idea was, we'd have a data center in France and a data center in the Pacific Northwest of the United States, where you have fairly non-correlated solar and wind values, but you also have very green base loads, so you're not trying to overcome your base load from the beginning.But that time shifting was basically transparent to the platform. I mean, not time shifting, I'm sorry. 
Location shifting was transparent to the platform. And then time shifting was implemented for the appropriate parts. but it was all done with just standard open source software, in a way that we minimized carbon while taking a little bit of a hit on performance a little bit of a hit on latency, but in a way the developer could continue to focus on performance and latency, but got all the benefits of carbon reduction at the same time.Chris Adams: Ah, okay. So when you said system, you weren't talking about like just maybe like an orchestrator, like Kubernetes that just spins up virtual machines. You're talking about going quite a bit deeper down into that then, like looking at hardware itself?Scott Chamberlin: I started the hardware itself. 'Cause you have to have batteries, you have to have ability to store renewable energy when it's available. You have to have low power chips. You have to have low powered networking. You have to have redundancy. And there's always these challenges when you talk about shifting in carbon awareness of, I guess the word is, leaving your resource, your capital resources idle.So you have to take costs into account with that. And so the goal, but the other challenge that I wanted to do was the goal was have this all location based, very basic carbon accounting, and have as close to theoretically possible minimizing the carbon, as you can. Because it's not possible to get to zero without market based mechanics in when you're dealing with actual hardware.So get as close to net zero as possible from a location based very, basic emissions accounting. So that was kind of the principle. And so, on that journey, we got pretty far to the point of ready to productize it, but then we decided to really pivot around energy and AI, which is where I'm at now.But, so I don't have a lot of numbers of what that actual like net, close to the zero theoretically, baseline is. But I'm pretty close. It's like drastically smaller than what we are using in, say, Hyperscale or public cloud today. Chris Adams: Oh, I see. Okay. So you basically, so rather than retrofitting a bunch of like green ideas onto, I guess Hyperscale big box out outta town style data centers, which already have a bunch of assumptions already made into them, you, it was almost like a clean sheet of paper, basically. You're working with that and that's the thing you spend a bunch of time into. And it sounds like if you were making some of this stuff transparent, it was almost like it wasn't really a developer's job to figure out, know what it was like shifting a piece of code to run in, say, Oregon versus France, for example, that would, that, the system would take care of that stuff.You would just say, I just want you to run this in the cleanest possible fashion and don't, and as long as you respect my requirements about security or where the data's allowed to go, and it would take care of the rest. Basically that was the idea behind some of that, right? Scott Chamberlin: That's the goal because in the many years I've been spending on this, like there's a great set of passionate developers that want to like minimize the emissions of the code, but it's a small percent, and I think the real change happens is if you make it part of the platform that you get a majority of the benefit, maybe, 80th percentile of the benefit, by making it automatic in a way.Chris Adams: The default?Yeah. Scott Chamberlin: My software behaves as expected, but I get all the benefits of carbon reduction automatically. 
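As a concrete illustration of the location shifting Scott describes, here is a minimal sketch of the kind of decision such a platform might make: among the regions a policy allows, pick the one whose grid is currently cleanest. The region names and the carbon-intensity lookup are made-up placeholders, not part of any real Neuralwatt or cloud API.

```python
from typing import Callable, Dict

def pick_greenest_region(
    allowed_regions: Dict[str, bool],          # region -> permitted by data residency / latency policy
    carbon_intensity: Callable[[str], float],  # placeholder: returns current gCO2e/kWh for a region
) -> str:
    """Choose the permitted region whose grid is currently the cleanest."""
    candidates = [region for region, ok in allowed_regions.items() if ok]
    return min(candidates, key=carbon_intensity)

# Illustrative only: static numbers standing in for a live grid-intensity feed.
intensity = {"fr-paris": 45.0, "us-pnw": 120.0}
print(pick_greenest_region({"fr-paris": True, "us-pnw": True}, lambda r: intensity[r]))  # -> fr-paris
```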
'Cause developers already have so much to care about. And again, like, not every developer is actually able to make the trade offs between performance and CO2 awareness appropriately. Right. It's really hard and we haven't made it easy for people. So that was the goal. Like how do you actually enable the system to do that for you while the developer can focus on the demands, the principles that they're used to focusing on: making their software fast, making their software secure, making it reliable, making it have good user experience, that kind of stuff. Chris Adams: Ah, that's interesting though. That's almost like, so the kind of green aspect is almost like an implementation detail that doesn't necessarily need to be exposed to the developers, somewhat in the way that when people talk about, say, designing systems for end users to use, there's a whole discussion about whether it's fair to expect someone to feel terrible for using Zoom and using Netflix, when really, it makes more sense to actually do the work yourself as a designer or as a developer to design the system so that by default it's green. So rather than trying to get people to change their behavior massively, you're essentially going with the fact that people are kind of frail, busy, distracted people, and you're working at that level almost. Scott Chamberlin: Yeah, I think that's the exact right term. It is green by default. And that phrase, when I started working on this in Windows, so you know, like you referred to earlier, like I created all the carbon aware features in Windows and there was a debate early on like how do we enable these? Like should the carbon awareness feature, should it be a user experience? I mean, should the user be able to opt in, opt out, that kind of stuff? And it was actually my boss I was talking to about this, he's like, "if you're doing this, it has to be the default," right? And so, you're never going to make the impact on any system, at the scale we really need to make this impact on, if people have to opt in. It has to be the default. And then sure, they can opt out if there's certain reasons that they want a different behavior. But green by default has to be the main way we make impact. Chris Adams: That's actually quite an interesting framing, because particularly when you talk about carbon aware at the device level, this is something that we've seen with, I guess there's a default, and then there's maybe the thing you said before about it's really important to leave people in control so they can override that, which feels like quite an important thing. 'Cause I remember when Apple rolled out the whole kind of carbon aware charging for their phones, for example. Some people are like, "oh, ah, this is really cool. Things are slightly greener by default based on what Apple have showed me." But there are some other people who absolutely hated this because the user experience from their point of view is basically, I've got a phone, I need to charge it up, and I plugged it into my wall. And then overnight it's been a really high carbon grid period. So my phone hasn't been charged up and I woke up and now I've got to go to work and I've got no phone charge. And it just feels like this is exactly the thing.
Like if you don't provide a sensible kind of get out clause, then that can lead to a really awful experience as well. So there is like quite a lot of thought that needs to go into that kind of default, I suppose. Scott Chamberlin: Definitely. Like the user experience of all of these things has to ultimately satisfy the expectations and needs of the users, right. It is another learning experience we had, it was really a thought experiment, right? When we were working on some of the, in Windows actually, we were working on the ability to change the timer for how fast the device goes to sleep. Because there's a drastic difference even between an active mode and the sleep state, which is basically when the device will turn on if you touch the mouse, screen's off, it goes into low power state. And so one of the changes we made in Windows was to lower that value from the defaults. And it's fairly complex how these defaults get set. Basically, they're set by the OEMs and different power profiles. But we wanted to lower the default that all software was provided. And we did some analysis of what the ideal default would be. But the question from the user experience point of view was "if we set this too low, will there be too many people turning it, basically, entirely off, rather than what the old default was, which was like 10 minutes?" So let's use these values. Theoretically, I can't remember what the exact values are, but old default, 10 minutes, new default three minutes for going from active to sleep. If three minutes was not the right value and we got maybe 20% of the people entirely turning it off, is the carbon impact worse for the overall fleet of Windows devices by those 20% of people turning it off 'cause we got a bad user experience by changing the default? So we had to do all these analyses, and have this ability to really look for unintended consequences of changing these. And that's why the user experience is really critical when you're dealing with some of these things. Chris Adams: Ah, okay, that's quite useful nuance to actually take into account 'cause there's a whole discussion about kind of setting defaults for green, but then there's also some of the other things. And I actually, now that you said that, I realize I'm actually just, 'cause I am one of these terrible people who does that, because, I mean, I'm using a Mac. Right. And you see when people are using a laptop and it starts to dim and they start touching the touchpad thing to kinda make it brighten again. And you see people do that a few times. There's an application called Caffeine on a Mac, and that basically stops it going to sleep, right. And so that's great. I mean, but it also then introduces the idea of, like, is my ADD bad adult brain gonna remember to switch that back off again? Like, these are the things that come up. So this is actually something that I have direct experience of, so that is very much ringing true with me, actually. Okay. So that was the thing you did with, I'm calling it Scott Cloud, but I assume there was another name that we had for that, but that work eventually became Neuralwatt. That's like, you went from there and moved into this stuff, right?
And I wanted to deploy it for the purposes of just really seeing, you learn so much when things are in production and you have real users, but before I did it, I started talking to a lot of people I trusted in my network. And one of my old colleagues from Microsoft and a good friend of mine, he really dug into it and started pushing me on some serious questions like, "well, what does this really impact in terms of energy?" Like, it was a CO2 optimization exercise, was that project. And he's like, "well what's the impact on energy? What's the impact on AI?" And actually Asim Hussain, too, asked the same question. He's like, "you can't release anything today," and this is, let's rewind, like a year ago, he's like, "you can't release anything today that doesn't have some story about AI," right? And this was just a basic compute platform with nothing specific about AI. So both of those comments really struck home. I was like, okay, I gotta figure out this AI stuff. And I've gotta answer the energy question. It wasn't hard 'cause it was already being measured as part of the platform, but I just was focused on CO2. And what it turned out was that there were some really interesting implications once we started to apply some of the optimization techniques to the GPU and how the GPU was being run from an energy point of view, which, when we looked into it, ended up being potentially more impactful in the short term than the overall platform. And so, that colleague, Chad Gibson, really convinced me in our discussions to spin that piece out of the platform as the basis of the startup that we went and decided to build, which we call Neuralwatt now. So yeah, what Neuralwatt really is, is the legacy of all that work, but the pieces that we could really take out of it that were focused on GPU energy optimization, within the context of AI growth and energy demands, because those are becoming really critical challenges, not just for businesses, but critical challenges underlying all of the work on green software and around trying to reduce the emissions of compute as a whole. Right? And we're just really looking at a new paradigm with the exponential increase in energy use of compute and what behaviors that's driving in terms of getting new generators online, as well as what the user experience behaviors are when LLMs, or other AIs, are built into everything. And so I felt that was really important to get focused on as quickly as possible. And that's where we really jumped off with Neuralwatt.
Because, I mean, we were in really in a period of flat growth in terms of energy for data centers prior to the AI boom because the increase in use in data centers was basically equaled out by the improvement in energy efficiency of the systems themselves.And there's a lot of factors that went into why that was really balancing, relatively balancing out, but the deployment of the GPUs and the deployment of massively parallel compute and utilization of those from the point of view of AI both training and inference, really changed that equation entirely. Right. And so basically from 2019 on, we've basically seen going from relatively flat growth in data centers to very steep ramp in terms of energy growth.Chris Adams: Okay. Alright. Now we're gonna come back to Neuralwatt for a little bit later. Partly because the demo you shared was pretty actually quite cool actually, and I still haven't had anything that provides that kind of live information. But one thing that I did learn when I was asking about this, and this is probably speaks to your time when you're working in a number of larger companies, is that there is a bit of a art to get large companies who are largely driven by like, say, profits for the next quarter to actually invest in kind of transparency or sustainability measures. And one thing that I do know that when you were working at Microsoft, one thing I saw actually, and this is one thing I was surprised by when I was asked, I was asking on LinkedIn, like, okay, well if I'm using various AI tools, what's out there that can expose numbers to me?And there was actually some work by a guy, Will Alpine, providing some metrics on existing AI for an existing kind of AI pipeline. that's one of the only tools I've seen that does expose the numbers or provide the numbers from the actual, the cloud provider themselves. And as I understood it, that wasn't a thing that was almost like a passion project that was funded by some internal kind of carbon fund or something.Could you maybe talk a little bit about that and how that, and what it's like getting, I guess, large organizations to fund some ideas like that because I found that really interesting to see that, and I, and there was, and as I understand it, the way that there was actually a kind of pool of resources for employees to do that kind of work was actually quite novel.And not something I've seen in that many, places before.Scott Chamberlin: Yeah, no, I think that was great work and Will is, want to, I'm a big fan of Will's work and I had the fortune to collaborate with him at that period of both of our careers when really it was, I don't think carbon work is easy to get done anywhere, in my experience, but that, I think Microsoft had a little bit of forethought in terms of designing the carbon tax. 
And yeah, we did have the ability to really vet projects that could have a material impact against Microsoft's net zero goals and get those funded by the carbon tax that was implemented internally. And so the mechanism was, as Microsoft built the capability to audit and report on their carbon, they would assign a dollar value to that for teams, and then that money went from those teams' budgets into a central budget that was then reallocated for carbon reduction goals. And yeah, I think Will was really at the forefront of identifying that these AI and, we all just really said ML back then, but now we all just say AI, but this GPU energy use was a big driver of the growth, and so he really did a ton of work to figure out what that looked like at scale, figure out the mechanics of really exposing it within the Hyperscale cloud environment. Essentially, NVIDIA's also done a great job in terms of keeping energy values in their APIs and exposed through their chips and through their drivers, so that you can use it fairly easily on GPU. I would say it's more challenging on CPUs to do so, or the rest of the system, but he did a great job in collaboration with those interfaces to get that exposed into Azure, I think it's the ML Studio it's called. So it has been there for many years, this ability to see and audit your energy values, if you're using the Azure platform. Yeah, that was super good work. Chris Adams: Yeah, so this was the thing. I forget the name of it and I'm a bit embarrassed to actually forget it. But I'm just gonna play back what I think you're saying. 'Cause when I was reading about this, it's something that I hadn't seen in that many other organizations. So there's an internal carbon levy, which is basically, for every ton that gets emitted, there was a kind of dollar amount allocated to that. And that went to a kind of internal, let's call it a carbon war chest, right? So there's a bunch of money that you could use. And then any member of staff was basically then able to say, I think we should use some of this to deliver this thing, because we think it's gonna provide some savings or it's gonna help us hit whatever kind of sustainability targets we actually have. And one of the things that came outta that was essentially actual meaningful energy figures, if you're using these tools, and this is something that the other clouds don't give you; you're definitely not gonna get it from Amazon right now. Google will show you the carbon but won't show you the energy. And if you're using ChatGPT, you definitely can't see this stuff. But it sounds like the APIs do exist. So it has largely been a case of staff being prepared, there being kind of will inside the system, and people being able to win some of those fights to get people to allocate time and money to actually make this thing available for people, right? Scott Chamberlin: The Nvidia APIs definitely exist. I think the challenge is the methodology and the standards, right? So, within a cloud there's a lot of complexity around how cycles and compute are getting assigned to users, and how do you fairly and accurately account for that?
GPUs happen to be a little bit simpler 'cause we tend to allocate a single chip to a single user at a single time. Whereas on CPUs, there's a lot of hyper threading, most clouds are moving to over subscription, or even single hardware threads are starting to get shared between multiple users. And how do we allocate, first, the energy, all this starts with energy, and then the CO2 based on location? And then there's the big complexity in terms of the perception that these clouds want to have around net zero. Everyone wants to say they're net zero via a market-based mechanic. And what's the prevailing viewpoint on what is allowed with the GHG Protocol, or what is the perception that the marketing team wants to have? That is a lot of the challenge. At least with GPU energy, there's not like huge technical challenges, but there's a lot of marketing and accounting and methodology challenges to overcome. Chris Adams: So that's interesting. Well, so I did an interview with Kate Goldenring who was working at Fermyon at the time. We'll share a link to that for people, and I will also share some links to both the internal carbon levy and how essentially large organizations have funded this kind of climate, kind of green software stuff internally. 'Cause I think other people working inside their companies will find that useful. But I'm just gonna play back to you a little bit about what you said there and then we'll talk a little bit about the demo you shared with me. So it does seem like, so GPUs, the thing that's used for AI accelerators, they can provide the numbers. And that is actually something that's technically possible a lot of the time. And it sounds like that might be technically slightly less complex at one level than the way people sell kind of cloud computing. 'Cause when we did the interview with Kate Goldenring, and we'll share the link to that, she basically explained to me that, okay, let's say there is a server and it's got maybe, say, 32 little processors, like cores, inside it. What tends to happen, because not everyone is using all 32 cores at the same time, is you can pretty much get away with selling maybe 40 or 50 cores, because not everyone's using all the cores at the same time. And that allows you to essentially sell more compute. So you end up making slightly more money and you end up having a much more kinda profitable service. And that's been one of the kind of offers of cloud. And also from the perspective of people who are actually customers, that is providing a degree of efficiency. So if you don't need to build another server because that one server is able to serve more customers, then there's a kinda hardware efficiency argument. But it sounds like you're saying that with GPUs, you don't have that kind of oversubscription thing, so you could get the numbers, but there's a whole bunch of other things that might make it a bit more complicated elsewhere, simply because it's a new domain and we are finding out there are new things that happen with GPUs, for example.
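As a rough illustration of the allocation problem Scott and Kate describe: even once you have a node-level energy reading, you still have to split it across the tenants sharing (and over-subscribed on) the hardware. A naive proportional split by CPU time might look like the sketch below; real cloud methodologies are considerably more involved (idle power, shared overheads, location- versus market-based carbon factors).

```python
def allocate_energy_wh(node_energy_wh: float, tenant_cpu_seconds: dict[str, float]) -> dict[str, float]:
    """Naive proportional allocation of a node's measured energy to tenants by CPU time used."""
    total = sum(tenant_cpu_seconds.values())
    if total == 0:
        return {tenant: 0.0 for tenant in tenant_cpu_seconds}
    return {tenant: node_energy_wh * secs / total for tenant, secs in tenant_cpu_seconds.items()}

# Illustrative numbers only.
print(allocate_energy_wh(1200.0, {"tenant-a": 3000.0, "tenant-b": 1000.0}))
# -> {'tenant-a': 900.0, 'tenant-b': 300.0}
```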
Like I think NVIDIA's acquisition of run.ai enables some of this GPU sharing; they acquired that company like six months ago and it's now open source, and so people can take advantage of that. But yes, I think the core principle is, from an embodied emissions point of view and a green software point of view, it's relatively good practice to drive up the utilization of the embodied emissions you've already purchased and deployed. There are some performance implications around doing the sharing, in how it affects user experience, but today the state of the art with GPUs is that they're mostly just singly allocated, and fully utilized when utilized, or not fully utilized but utilized for a single customer at a time. But that is certainly changing. Chris Adams: Oh, okay. So basically, if I'm using a tool, then I'm not sharing it with anyone else in the same way that we'd typically be doing with cloud, and that, okay, that probably helps me understand that cool demo you shared with me then. So maybe we'll just talk a little bit about that. 'Cause this was actually pretty neat: when I asked, you showed, like, here's a video of literally the thing you wished existed, which was kind of handy. Right? So, we will share the link to the video, but the key thing that Scott shared with me was that using tools like, say, a ChatGPT thing or Anthropic, where I'm asking questions and I'll see kind of tokens come out as I'm asking a question, we basically saw charts of real-time energy usage changing depending on what I was actually doing. And maybe you could talk a little bit about what's actually going on there and how you came to that. Because it sounds like Neuralwatt wasn't just about trying to provide some transparency. There's actually some other things you can do. So not only do you see it, but you can manage some of the energy use in the middle of, like, an LLM session, for example, right? Scott Chamberlin: So yeah, at the first stage, the question is really just, can we measure what's happening today and what does it really look like in terms of how you typically deploy, say, a chat interface or inference system? So, like I was mentioning, we have the ability, fairly easily, because NVIDIA does great work in this space, to read those values on the GPU specifically. Again, there's system implications for what's happening on the CPU, what's happening on the network, the disks. They tend to be outstripped by far because these GPUs use so much energy. But so, the first step in that demo is really just to show what the behavior is like, because what we ultimately do within the Neuralwatt code is we take over all of the energy management of the system, and we train our own models to basically shift the behavior of servers from one that is focused on maximizing performance for the available energy to balancing the performance for the energy, in an energy efficiency mode, essentially. So we are training models that shift the energy behavior of the computer for energy efficiency. And so that's why we want to visualize multiple things. We want to visualize what the user experience trade off is. Again, going back to the user experience. You have to have great user experience if you're gonna be doing these things.
And we want to visualize the potential gains and the potential value add for our customers in making this shift. Because, I think we talk about, Jensen Huang made a quote at GTC that we love: we are a power constrained industry. Every AI factory is going to be power constrained in the future. And so what does compute look like if power is the number one limiting factor that you have to deal with? So that's why we believe, we really want to enable servers to operate differently than what they've done in the past. And we want there to be, essentially, think of it as, like, energy awareness, right? That's the word I come back to. Like, we want the behavior of servers to be energy aware because of these power constraints. Chris Adams: Ah, okay. Alright. You said a couple of things that I just want to run by you to check. So, there's this thing, there's all these new awarenesses, there's like carbon aware, then there's grid aware, then there's energy aware. This is clearly an area where people are trying to figure out what to call things. But the thing that you folks are doing at Neuralwatt was basically, okay, yes, you have access to the power and you can make that available. So I'm just gonna try and run this by you, and I might be right and I might need you to correct me on this, but it sounds a little bit like the thing that you are enabling is almost throttling the power that gets allocated to a given chip. 'Cause if you use things like Linux or certain systems, they can introduce limits on the power that is allocated to a particular chip. But if you do that, that can have an unintended effect of making things run a little bit too slowly, for example. But there's a bit of headroom there. But if you are able to go from giving absolute power, like, take as much power as you want, to having a kind of finite amount allocated, then you can basically still have a kind of good, useful experience, but you can reduce the amount of power that's actually being consumed. It sounds like you're doing something a little bit like that with the Neuralwatt thing. So rather than giving it carte blanche to take all the power, you are kind of asking it to work within a kind of power envelope. That means that you're not having to use quite so much power to do the same kind of work. Is that it? Scott Chamberlin: Yeah. So if you go back to the history, before we had GPUs everywhere, the CPUs had, let's call it a moderate level of sophistication in terms of power management. They have sleep states, they have performance states, and there's components that run on the OS that are called, basically, CPU governors that govern how a CPU behaves relative to various constraints. And so, when you allocate, let's say, a Linux VM in the cloud, I don't know why this is, but a lot of 'em get default allocated with a, the name of it is slipping my mind, but there's about five default CPU governors in the default Linux distros, and they get allocated with the power save one, actually. And so what it does is it actually limits the top frequencies that you can get to, but essentially balancing power and performance is kind of the default that you get allocated.
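For reference, the governor Scott is talking about is visible through sysfs on any Linux box; a minimal check from Python might look like this (changing it typically requires root, and the exact set of governors available depends on the kernel and hardware).

```python
from pathlib import Path

def current_cpu_governors() -> dict[str, str]:
    """Read the active cpufreq governor (e.g. 'powersave', 'performance') for each CPU policy on a Linux host."""
    governors = {}
    for policy in sorted(Path("/sys/devices/system/cpu/cpufreq").glob("policy*")):
        governors[policy.name] = (policy / "scaling_governor").read_text().strip()
    return governors

if __name__ == "__main__":
    print(current_cpu_governors())
    # e.g. {'policy0': 'powersave', 'policy1': 'powersave', ...}
```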
You can check these things, you can change it to a performance mode, which basically is gonna use all of the capability of the processor at a much higher energy use. But on the GPU it's a lot less sophisticated, right? GPUs don't tend to support any sleep states other than just power off and on. And they do have different performance states, but they're not as sophisticated as the CPU has historically been. And so essentially we are inserting ourselves, with Neuralwatt, into the OS and managing it in a more sophisticated manner, around exactly what you're describing. We're making the trade off, and we're learning the trade off really through modeling. We're learning the trade off to maintain great user experience but get power gains, power savings, with our technology, and doing this across the system. So, yes, I think your description is essentially very good. And we're just essentially adding a level of sophistication into the OS beyond what exists today. Chris Adams: Okay. So basically, rather than being able to pull infinite power, there's an upper limit on how much it can pull, and the reason you're doing some of the training is that, based on how people use this, you'd like to match the upper limit to what's actually being needed, so that you're still giving enough room, but you're delivering some kind of savings. Scott Chamberlin: Yeah, and it's important to understand that it's fairly complex, which is why we train models to do this rather than do it, Chris Adams: Like sit at one level and just one and done. Yeah. Scott Chamberlin: Because think about an LLM, right? There's essentially two large phases in inference for an LLM. The first phase is really compute heavy, and then the second phase is more memory heavy. And so we can use different power performance trade-offs in those phases. And understanding what those phases are, and what the transition looks like from an observable state, is part of what we do. And then the GPU is just one part of the larger system, right? It's engaged with the CPU. A lot of these LLMs are engaged with the network. And so how do we balance all the tradeoffs to maintain the great user experience for the best amount of power efficiency? That's essentially what our fitness function is when we're training. Chris Adams: Ah, I think I understand that now. And what you said about those two phases, presumably one of them is like taking a model and loading it into something a bit like memory. And then there's a second part, which might be doing the lookups against that memory, because the lookup process, when you're seeing the text come out, is quite memory intensive rather than CPU intensive. So if you're able to change how the power is used to reflect that, then you can deliver some kind of savings inside that. And if you scale that up to a data center level, that's like 10%, 20%, I mean, maybe even, yeah. Do you have an idea of like what kind? Scott Chamberlin: We tend to shoot for at least 20% improvements in what I would say is performance per unit of energy. So tokens per Joule is the metric I tend to come back to fairly often. But again, how exactly you measure energy on these things, what is the right metric, I think you need to use a bunch of 'em. But I like tokens per Joule 'cause it's fairly simple and it's easy to normalize.
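A rough sketch of how you might measure the tokens-per-joule figure Scott mentions, using NVIDIA's NVML bindings (the nvidia-ml-py / pynvml package) and the GPU's cumulative energy counter. This assumes a single GPU recent enough (Volta or newer) to expose that counter; generate_tokens is a placeholder for whatever inference call you are measuring, and CPU, memory and network energy are ignored, so the result is a GPU-only approximation.

```python
import pynvml  # NVIDIA Management Library bindings: pip install nvidia-ml-py

def tokens_per_joule(generate_tokens) -> float:
    """Approximate tokens/J for a single-GPU inference call.

    `generate_tokens` is a placeholder callable that runs inference and returns
    the number of tokens produced. Energy comes from NVML's cumulative counter
    (millijoules since driver reload), so only GPU energy is counted.
    """
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
        tokens = generate_tokens()
        end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    finally:
        pynvml.nvmlShutdown()
    energy_j = (end_mj - start_mj) / 1000.0
    return tokens / energy_j if energy_j > 0 else 0.0
```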
But it gets super interesting in this conversation about inference-time compute and thinking LLMs and stuff like that, 'cause they're generating tons and tons of tokens, and not all of 'em are exposed to the user; they're essentially there to improve the output. And so people use other metrics, but they're harder to normalize. So, long story short, I tend to come back to tokens per Joule as my favorite, but, Chris Adams: So it sounds like the thing that you're working on doing is, basically, through kind of judicious use of power envelopes that more accurately match what is actually being required by a GPU or anything like that, you're able to deliver some savings that way. That's one of the things, and like you said before when we were talking about kind of Scott Cloud, that's transparent to the user. I don't have to be thinking about my prompt or something like that. This is happening in the background, so my experience isn't changed, but I'm basically in receipt of that; 20% of power is basically not being turned into carbon dioxide in the sky, for example, but it's basically the same other than that though. Scott Chamberlin: That's the goal, right? Essentially we've informed our work continually on, number one, user experience has to be great; number two, developer experience has to be great, which means the developer shouldn't have to care about it. So, yeah, it's a single container download, it runs in the background. It does all the governance in a fairly transparent way. But, you know, we actually have a CO2 optimization mode as well, so we can do all of this. The default mode is really energy, but we actually can flip a switch and we get an extra degree of variability where we're optimizing on CO2, average or marginal emissions. So, we can vary those behaviors of the system relative to the intensity of the carbon in the grid as well. So Chris Adams: Okay. Or possibly, if not the grid, then the 29 gas turbines that are powering that data center today, for example. Scott Chamberlin: I think that's an emerging problem. And I actually would love to talk to somebody that has a data center with a microgrid with a gas turbine, because I actually do believe there's additional optimization available for these little microgrids that are being deployed alongside these data centers. If you were to plumb it all the way through on the energy, again, going back to energy awareness, right. Like, if your servers knew how your microgrids were behaving relative to the macro grid that they were connected to, there's so many interesting optimizations available, and people are looking at this from the point of view of the site infrastructure, but the reality is all of the load is generated by the compute on the server. Right. And that's what we're really trying to do: bring it all the way through to where the load originates and the behavior, while maintaining that user experience. So Chris Adams: Okay. So you said something interesting there, I think, about this part, like the fact that you mentioned before that GPU usage is a little bit less sophisticated right now. You said it's either all on or all off. And when you've got something which is, like, the power, the multiple thousands of homes worth of power, that can be all on and all off very quickly. That's surely gotta have some kind of implications, within the data center, but also for any poor people connected to the data center. Right?
Because, if you are basically making the equivalent of tens of thousands of people disappear from the grid, then reappear on the grid, in less than a second, there's gotta be some knock-on effect from that. Like, you spoke about gas turbines. It's like, do you reckon we're gonna see people finding something in the middle to act like a kind of shock absorber for these changes that kind of go through to the grid? Because if you're operating a grid, that feels like the kind of thing that's gonna really mess with you being able to provide a consistent kind of quality of power to everyone else. If you've got the biggest user of energy also swinging up and down the most as well, surely. Scott Chamberlin: Yeah, and it's certainly, I don't know if existential problem is the right word, but it's certainly an emerging, very challenging problem, Chris Adams: Mm-hmm. Scott Chamberlin: within the space of data centers, essentially the syncing up of some of these behaviors among the GPUs to cause correlated spikes and drops in power. And it has implications within your data center infrastructure, to the point where we hear from customers that they're no longer deploying some of their UPSs or their battery backups within the GPU clusters, because they don't have the electronics to handle the loads shifting so dramatically, to the point where we're also getting emerging challenges in the grid in terms of how these loads ramp up or down and affect, say, I'm not gonna get into it where I'm not an expert, the generation aspects of the grid and maintaining frequency, but it has implications for that as well. But in the software we can certainly smooth those things out. There's also, I mean, there's weird behaviors happening right now in terms of trying to manage this. My favorite, and I don't know if you've heard of this too, Chris, is PyTorch has a mode now where they basically burn just empty cycles to keep the power from dropping down dramatically when, I think it's when weights are syncing, in PyTorch, I'm not exactly sure when it was implemented, because I've only read about it, but you know, when you maybe need to sync weights across your network and so some GPUs have to stop, what they've implemented is some busy work so that the power doesn't drop dramatically and cause this really spiky behavior. Chris Adams: Ah. Scott Chamberlin: So I think what Chris Adams: you're referring to, Yeah. So this is the PyTorch_no_powerplant_blowup=1 they had, right? Yeah. I remember reading about this in SemiAnalysis. It just blew my mind. The idea that you have to essentially keep it running because the spike would be so damaging to the rest of the grid that they have to kind of simulate some power, so they don't have that change propagate through to the rest of the grid, basically. Scott Chamberlin: Correct. And so, we look at problems like that. There's that problem, and there's the related problem when you start a training run where all the GPUs basically start at the same time and create a surge.
And, so, we help with some of those situations in our software. But yes, I think that some of the behaviors that are getting implemented, like the no_powerplant_blowup=1, they're, I would say they're probably not great from a green software point of view, because anytime we're doing busy work, that's an opportunity to reduce energy, reduce CO2, and there probably are ways of managing that with a bit more sophistication, depending on the scale that you're working at, that may have been more appropriate than that. So this is definitely Chris Adams: still needs Scott Chamberlin: to be looked at a little bit. Chris Adams: So like, I mean, before, in the pre AI days, there was this notion of a thundering herd problem, where everything tries to use the same connection or do the same thing at the same time. It sounds like this kind of PyTorch_no_powerplant_blowup=1 is essentially the kind of AI equivalent to seeing that problem coming, realizing it's of a much greater magnitude, and then figuring out, okay, we need to find an elegant solution to this in the long run. But right now we're just gonna use this thing for now. Because it turns out that having incredibly spiky power usage kind of propagating outside the data center wreaks all kinds of havoc, basically. And we probably don't want to do that if we want to keep being connected to the grid. Scott Chamberlin: Yeah. It's really, spiky behavior at scale is really problematic. Yes. Chris Adams: Okay, cool. Dude, I'm so sorry. We're totally down this kind of AI energy spikiness rabbit hole, but I guess it is what happens when Scott Chamberlin: It's certainly, customers are really interested in this because, I mean, if we were to bubble up one level, there's this core challenge in the AI space where the energy people don't necessarily talk the same language as the software people. And I think that's one place where maybe Hyperscale has a little bit more advantage, 'cause it has emerged from software companies, but Hyperscale is not the only game in town, especially when we're going to neoclouds and stuff like that. And so, I think one of our side goals is really, how do we actually enable people talking energy and infrastructure to have the same conversations and create requirements and coordinate with the software people running the loads within the data centers? I think that's the only way to really solve this holistically. So, Chris Adams: I think you're right. I mean, this is, to bring this to some of the kind of world of green software, I suppose, the Green Software Foundation did a merger with, I think they're called SSIA, the Sustainable Servers and Infrastructure Alliance. I think it's something like that. We had them on a couple of episodes a while ago, one where there was a whole discussion about, okay, how do we set out some hardware standards to have this thing kind of cross this barrier. Because, like you said, as we've learned on this podcast, some changes you might make at the AI level can have all these quite significant implications. Not just thinking about the air quality and climate related issues of having masses and masses of on-premise gas turbines.
But there's a whole thing about power quality, which is not something that you've had to think about in terms of relating to other people, but that's something that clearly needs to be on the agenda as we go forward. Just like, like responsible developers. I should, before we just kind of go further down there, I should just check, we're just coming up to time. We've spent all this time talking about this and we've mentioned a couple of projects. Are there any other things that aren't related to, like, spiky AI power that you are looking at and you find, hey, I'm on this podcast, I wish more people knew about this project here or that project there? Like, are there any things that you've read about in the news, or any people's work that you're really impressed by and you wish more people knew about right now? Scott Chamberlin: Yeah, I mean, I think a lot of people on this podcast probably know about AI Energy Score. Like, I think that's a promising project. Like, I really believe that we need to have the ability to understand both the energy and the CO2 implications of some of these models we're using and the ability to compare them and compare the trade-offs. I do think that the level of sophistication needs to get a bit higher, because right now it's super easy to trade off model size and energy. Like, I can go single GPU, but I'm trading off capabilities for that. So how do we, I think on one of my blog posts it was someone's idea: you really have to normalize against the capabilities and the energy at the same time for making your decisions about what the right model is for your use cases, relative to the energy available or, say, the CO2 goals you have. So, but yeah, I think eventually they'll get there in that project. So I think that's a super promising project. Chris Adams: We will share a link to that. So we definitely got some of that stuff for the AI Energy Score, 'cause it's entirely open source and you can run it for open models, you can run it for private models, and if you are someone with a budget, you can require suppliers to publish the results to the leaderboard, which would be incredibly useful, because this whole thing was about energy transparency and like. Scott Chamberlin: Yeah, Chris Adams: I'm glad you mentioned that. That's like, I think that's one of the more useful tools out there that is actually relatively easy to kind of write into contracts or to put into a policy for a team to be using or a team to be adopting, for example. Scott Chamberlin: Correct. Yep. No, a big fan, so. Chris Adams: Well, that's good news for Boris. Boris, if you're hearing this, then yeah, thumbs up, and the rest of the team there. I only mentioned Boris 'cause he's on one of the teams I know and he's in the Climateaction.tech Slack that a few of us tend Scott Chamberlin: Yeah. Boris and I talked last week. Yeah.
A big fan of his work, and I think Sasha Luccioni, who I've actually never met, but yeah, I think she's also the project lead on that one. Chris Adams: Oh, Scott, we are coming up to time and I didn't get a chance to talk about anything going on in France, with things like Mistral sharing some of their data, some of their environmental impact figures and stuff like that, because, I mean, literally just two days ago we had Mistral, the French kind of competitor to OpenAI, for the first time start sharing some environmental figures, in quite a lot of detail. More so than a single kind of mention from Sam Altman about the energy used by an AI query. We actually got quite a lot of data about the carbon and the water usage and stuff like that. But no energy, though. But that's something we'll have to speak about another time. So hopefully maybe I'll be able to get you on and we can talk a little bit about that, and talk about, I don't know, the off-grid data centers of Crusoe and all the things like that. But until then though, Scott, I've really enjoyed this deep dive with you and I do hope that our listeners have been able to keep up as we've gone progressively more detailed. And if you have stayed with us, listeners, what we'll do is we'll make sure that we've got plenty of show notes so that people who are curious about any of this stuff can have plenty to read over the weekend. Scott, this has been loads of fun. Thank you so much for coming on and I hope you have a lovely day in Evergreen Town. Scott Chamberlin: Thanks Chris. Chris Adams:
Real Efficiency at Scale with Sean Varley
Anne Currie is joined by Sean Varley, Chief Evangelist and VP of Business Development at Ampere Computing, a leader in building energy-efficient, cloud-native processors. They unpack the energy demands of AI, why power caps and utilization matter more than raw compute, and how to rethink metrics like performance-per-rack for a greener digital future. Sean also discusses Ampere’s role in the AI Platform Alliance, the company’s partnership with Rakuten, and how infrastructure choices impact the climate trajectory of AI.Learn more about our people:Anne Currie: LinkedIn | WebsiteSean Varley: LinkedIn | WebsiteFind out more about the GSF:The Green Software Foundation Website Sign up to the Green Software Foundation NewsletterResources:Ampere Cloud Native Processors – Ultra-efficient ARM-based chips powering cloud and edge workloads [02:30]AI Platform Alliance – Coalition promoting energy-efficient AI hardware [04:55]Ampere + Rakuten Case Study – Real-world deployment with 36% less energy per rack [05:50]Green Software Foundation Real Time Cloud Project – Standardizing real-time carbon data from cloud providers [15:10]Software Carbon Intensity Specification – Measuring the carbon intensity of software [17:45]FinOps Foundation – Financial accountability in cloud usage, with sustainability guidance [24:20]Kepler Project – Kubernetes power usage monitoring [26:30]LLaMA Models by Meta [29:10]Anthropic’s Claude AI [31:25]Anne Currie, Sara Bergman & Sarah Hsu: Building Green Software [34:00]If you enjoyed this episode then please either:Follow, rate, and review on Apple PodcastsFollow and rate on SpotifyWatch our videos on The Green Software Foundation YouTube Channel!Connect with us on Twitter, Github and LinkedIn!TRANSCRIPT BELOW:Sean Varley: Because at the end of the day, if you want to be more sustainable, then just use less electricity. That's the whole point, right. Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software.I'm your host, Chris Adams. Anne Currie: Hello and welcome to the World of Environment Variables, where we bring you the latest news and updates from the world of sustainable software. So I'm your guest host today. It's not, you're not hearing the usual dulcet tones of Chris Adams. My name is Anne Currie. And today we'll be diving into a pressing and timely topic, how to scale AI infrastructure sustainably in a world where energy constraints are becoming a hard limit. And that means that we are gonna be, have to be a little bit more clever and a little bit more careful when we choose the chips we run on. So it's tempting to believe that innovation alone will lead us towards greener compute, but in reality, real sustainability gains happen when efficiency becomes a business imperative when performance per watt, cost and carbon footprint are all measured and all have weight. So, that's where companies like Ampere come in, with cloud native energy efficient approaches to chip design. They're rethinking how we power the AI boom, not just faster but smarter. It's a strategy that aligns directly with Green Software Foundation's mission to reduce carbon emissions from the software lifecycle, particularly in the cloud. 
So in this episode, we'll explore what this looks like at scale and what we can learn from Ampere's approach to real-world efficiency. What does it take to make an AI-ready infrastructure that's both powerful, effective, and sustainable? Let's find out. And today we have with us Sean Varley from Ampere. So Sean, welcome to the show. Can you tell us a little bit about yourself? Sean Varley: Yeah, absolutely Anne, and thanks first for having me on the podcast. I'm a big fan, so I'm looking forward to this conversation. I'm the chief evangelist of Ampere Computing. Now, what that means is that we run a lot of the ecosystem building and all of the partnership work that goes on to support our silicon products in the marketplace. And also build a lot of awareness, right, around some of these concepts you introduced. You know, building out that awareness around sustainability and power efficiency and how that really works within different workload contexts, and workload contexts change over time. So all of those sorts of things are in scope for the evangelism role. Anne Currie: That is fantastic. So I'll just introduce myself a little bit as well. My name is Anne Currie. If you haven't heard the podcast before, I am one of the authors of O'Reilly's new book, Building Green Software, which, as I always say, everybody who's listening to this podcast should read. That is entirely why we wrote the book. I'm also the CEO of the training and green consulting company Strategically Green. So hit me up on LinkedIn if you want to talk a little bit about training or consultancy, but back to the podcast. Oh, and I need to remember to say that for everything we'll be talking about today, there will be links in the show notes. So you don't need to worry about writing down URLs or anything. Just look at the show notes. So now, I'm actually going to start off the podcast by harking back to somebody that we had on the podcast a couple of months ago, a chap called Charles Humble. And the assertion that he was making was that we all need to wake up to the fact that there isn't just one chip anymore; there isn't a default chip anymore that everybody uses and that is good enough in all circumstances. When you are setting up infrastructure, in the cloud for example, and you have the dropdown that picks which chip you're going to use, the default might be Intel, for example. That is no longer a no-brainer, that you just go with the default. There are lots and lots of options, to the extent that, I mean, Ampere is a new chip company that decided to go into the market. So one of the questions that I have is why? You know, what gap did you see that it was worth coming in to fill? Because 10 years ago we would've said there was no real gap, wouldn't we? Sean Varley: That's right. Yeah. Actually it was a much more homogenous ecosystem back in those days. You know, and, full disclosure, I came from Intel. I did a lot of time there. But about six or seven years ago, I chose to come to Ampere. And part of this was the evolution of the market, right? The cloud market came in and changed a lot of different things, because classically, especially in server computing, there's sort of the enterprise and the cloud, and the cloud of course has had a lot of years to grow now.
And the way that the cloud has evolved was to really push all of the computing to the top of its performance, the peak performance that you could get out of it. But nobody really paid attention to power. Going back, you know, 10, 15, 20 years, nobody cared. And those were the early days of Moore's Law. And part of what happened with Moore's Law is that as frequencies grew, so did performance, you know, linearly. And I think that trained a lot of complacency into the industry. And that complacency then became more ossified into the way that people architected and the metrics that they paid attention to when they built chips. But going back about seven, eight years, we actually saw that there was a major opportunity to get equal or better performance for about half the power. And that's what formed some of our interest in building a company like Ampere. Now, of course, Ampere, since its inception, has been about sustainable computing, and me being personally interested in sustainability and green technology and those sorts of things just outside of my profession, you know, I was super happy to come to a company like Ampere that had that in its core. Anne Currie: That's very interesting. So really, Ampere, your chip is an X86 chip, so it's not competing against ARM, it's more competing against Intel and AMD. Sean Varley: It's actually, it is an ARM chip. It's based on the ARM instruction set. And yeah, so it's kind of an interesting dynamic, right? There's been a number of different compute architectures that have been put into the marketplace, and the X86 instruction set, classically from Intel and AMD who followed them, has dominated the marketplace, right? Well, at least they've dominated the server marketplace. Now, ARM has traditionally been in mobile handsets, embedded computing, things like this. But that architecture was built, and its roots were grown, in more power-conscious markets, you know, because anything running on a battery you want to be pretty power miserly, Anne Currie: Yeah. Sean Varley: to use the word. So yeah, the ARM instruction set and the ARM architecture did offer us some opportunities to get a lift when we were a young company, but it doesn't necessarily have that much of a bearing overall on what we can do for sustainability, because there are many things that we can do for sustainability and the instruction set of the architecture is only one of them. And it's a much smaller one. It is probably way too detailed to get into on this podcast, but it is one factor. And so yes, we are ARM instruction set based, and about four years back we actually started creating our own cores on that instruction set. And that's been an evolution for us, because we wanted to maintain this focus on sustainability, low power consumption, and of course, along with that, high performance. Anne Currie: Oh, that's interesting. So as you say, the instruction set is only one part of what you're doing to be more efficient, to use less power per operation. What else are you doing? Sean Varley: Oh, many things. Yeah.
So the part of this that kind of gets away from the instruction set is how you architect and how you present the compute to the user, which may get further into some of your background and interest around software. Because part of what we've done is architect a family of chips that start off with area efficiency in the core. And how we do a lot of that is we focus on cache configuration. So we use a lot more of what we call L2 cache, which is right next to the cores; that helps us get performance. We've steered away from the X86 industry approach, which relies much more on a larger L3 cache, which is a much bigger part of the area of the chip. So that's one of the things that we've done. But we've also decided that many of the features of the X86 architecture are not necessary for high performance or efficiency in the cloud. And part of this is because software has evolved. So what are those things? Turbo, for example. Turbo is a feature that moves the frequency of the actual cores around depending on how much thermal headroom the chip has. And so if you have a small number of cores active, the frequency can be really high. But if you have a large number of cores doing things, then it pulls the frequency back down low, because you've only got so much thermal budget in the chip. So we said, oh, we're just gonna run all of our cores at the same frequency. And we've designed ourselves at a point in the voltage-frequency curve that allows us that thermal headroom. Now, that's just one other concept, but so many things have really created this capability for us to focus on performance per watt, and all of those things are contributors to how you get more efficient. Anne Currie: Now that is very interesting. So what was your original motivation? Were you designing with the cloud in mind, or were you designing more with devices in mind? Sean Varley: Yeah, we absolutely are designing for cloud, because cloud is such a big mover in how things evolve, right? I mean, if you're looking at markets, there's always market movers, market makers, and the way that you can best accomplish getting something done. So if our goal is to create a more sustainable computing infrastructure, and now in the age of AI that's become even more important, then we need to go after the influencers, right? The people that will actually move the needle. And so the cloud was really important, and we've had this overall focus on that market, but our technology is not limited to it. Our technology is, by far and away, much more power efficient anywhere from all the way out at the edge, in devices and automotive and networks, all the way into the cloud. But the cloud also gave us a lot of the paradigms that we have been attached to. So when we talk about cloud native computing, we're really harkening to that software model that was built out of the cloud. The software model built out of the cloud is something that they called serverless, in the older days.
Or now it's, you know, microservices and some of these sorts of concepts. And so as software has grown, so have we, and we've put together a hardware architecture that meets that software where it is, because what that software is about is lots of processes working together to formulate a big service. And those little processes are very latency sensitive. They need to have predictability, and that's what we provide with our architecture: lots of cores that all run at the same kind of pace, so you get a high degree of predictability out of that architecture, which then makes the software and the entire service more efficient. Anne Currie: So that is very interesting, and I hadn't realized that. So obviously with things like serverless going on in clouds, the software that's actually running on the chip is software that was usually written by the cloud provider. You know, the clouds wrote that software. So you are isolated from it. One of the interesting things about high performance software is that it's hard, really hard to write. In fact, in Building Green Software, I always tell people, don't start there, it's really hard. You need specialist skills. You need to know the difference between L2 caches and L3 caches. And you need to know how to use them. And the vast majority of engineers do not have those skills, and will never acquire those skills. But the cloud providers are providing managed services that you are using. Like, you're just writing a code snippet that's running in Lambda or whatever. You are not writing the code that makes that snippet run. You're not writing the code that talks to the chip. Really super specialist engineers at AWS or Azure or whatever are writing that code. So is that the move that you were anticipating? Sean Varley: Absolutely. I mean, that's a big part of it, right? And as you just articulated, a lot of the platform-as-a-service kind of code, right, so that managed service that's coming out of a hyperscaler, is built to be cloud native. It's built to be very microservice based. And it has a lot of what we call SLAs in the industry, right? Service level agreements, which mean that you need to have a lot of different functions complete on time for the rest of the code to work as it was designed. And as you said, it is a much more complex way to do things, but the overall software industry has started to make it a lot easier to do this, right. And things like containers, you know, which are inherently much more efficient entities, yeah, like footprints, images is what I was really going for there. They already cut out a lot of the fat, right, in the software. You've gotten down to a function. You mentioned Lambda, for example. A function is the most, you know, sort of nuclear piece of code that you could potentially write, I suppose, to do something. And so all of these functions working together, they need these types of execution architectures to really thrive. And yes, you're right that developers have come a long way in having these serviceable components in the industry. You know, Docker sort of changed the world, what is it, 10 years ago now, maybe longer. And all of a sudden people could go and grab these little units of, what they call endpoints in software lingo, you know?
And so if I wanna get something done, I can go grab this container that will do it. And those containers, and the number of containers that you can run on a cloud native architecture like Ampere's, is vastly better than what you can find in most X86 architectures. Why? Because these things run on cores. Right. And we have a lot of them. Anne Currie: Yeah, so that is very interesting. So, everybody who's listening to the podcast must also read my other book on this very subject, which is called The Cloud Native Attitude. And it was about why Docker is so important, why containers are so important. Because they allowed you to wrap up programs and then move those programs around. It basically put a little handle on things that made you able to move stuff around, start and stop it, and orchestrate it. And what that meant was... Sean Varley: I love that analogy, by the way, the handle, and you just pick it up and move it anywhere you want it, right. Anne Currie: Yeah, because really that was all that Docker really did. It wrapped something that was a fairly standard Linux concept that had been around quite a long time, and it put a nice little API on it, which was effectively a handle, which let other tools move it around. And then you've got orchestrators like Kubernetes, but you've also got lots of other orchestrators too. But what that meant in the cloud native world was that you could have services that were written by super experts or open source, so it had lots of experts from all over the place writing them and tuning them and improving them, letting, well, not Moore's Law, Wright's Law, the law that systems get better the more you use them, kick in. It gave people a chance to go in and improve things, but have the people who are improving things be specialists, and let that specialist code, which was incredibly hard to write, be shared with others. So you're kind of amortizing the incredibly difficult work. So fundamentally, what you are saying, and you could not be singing more from my hymn sheet on this, is that it's really hard to write code that interfaces well and uses CPUs well so that they're highly efficient and you get code efficiency and operational efficiency. Really hard to do. But if you can find a way that doesn't require every single person to write that code, which is really hard, and you can share it and leverage it through open source implementations or cloud implementations written by the cloud providers, then suddenly your CPUs can do all kinds of stuff that they couldn't have done previously. Is that what you're saying? Sean Varley: Absolutely, and I was gonna tack on one little thing to your line: it's really hard to do this by yourself, right? And this is where the open source communities and all of these sorts of things have really revolutionized, especially the cloud, coming back to that topic that we were talking about. Because the cloud has really evolved on the back of open source software, right? And that radically changed how software was written.
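Anne's "handle" analogy maps fairly directly onto how container tooling is driven programmatically: the image reference is the handle, and any tool can pull, start, stop, and remove the packaged program through the same API. Here is a minimal sketch using the Docker SDK for Python; it assumes a running Docker daemon and the docker package installed, and the nginx:alpine image name is just an example, not anything specific to the discussion above.

```python
# Minimal sketch of the "handle" idea: a container image is a packaged program
# that other tools can pull, start, stop, and move around via a uniform API.
# Requires a running Docker daemon and `pip install docker`.
import docker

client = docker.from_env()

# Pull a packaged program (the "handle" is the image reference).
client.images.pull("nginx:alpine")

# Start it, inspect it, stop it, all through the same generic interface
# that orchestrators such as Kubernetes build on.
container = client.containers.run("nginx:alpine", detach=True)
print(container.id, container.status)
container.stop()
container.remove()
```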
But now coming back to your package and your handle, you can go get a function that was written, and probably optimized, by somebody who spent the time to go look at how it ran on a specific architecture. And now with things like Docker and GitHub and all these other toolchains, where you can go out and grab containers that are already compiled for that instruction set we were talking about earlier, this makes things a lot more accessible to a lot more people. And in some ways you have to trust that this code was written to get the most out of this architecture, but sometimes there's labeling, right? This was written for that. Or, you know, a classic example in code is that certain types of algorithms get inline assembly to make them the most efficient that they can be. And all of that usually was done in the service of performance, right? But one of the cool things about trying to do things in service of performance is that you can actually usually get better power efficiency out of it if you use the right methodologies. Now, if the performance came solely from something that was frequency scaled, that's not necessarily gonna be good for power. But if it's done in what we call a scale-out mechanism, where you get your performance by scheduling things on not just one core but many cores, and they can all work together in service of that one function, then that can actually create a real opportunity for power efficiency. Anne Currie: Yeah, so that maps back to something we talk about in Building Green Software, which is utilization. So, you know, a machine needs to be really well utilized, because if it's not well utilized it still uses pretty much the same power, but it's not actually doing anything useful. It's just a waste. Sean Varley: I'm so glad you brought this up. Anne Currie: Well, go for it. Go for it. You know, you are the expert in this area. Sean Varley: Oh, no. Yeah, I think you're exactly right. You hit the nail on the head, and part of the problem in the world today is that you have a lot of machines out there that are underutilized, and that low utilization of these machines contributes a lot to power inefficiency. Now I'm gonna come back to some other things that go back to what we were talking about in terms of processor architecture, but it's still super relevant to code and efficiency. So, going back to when everybody only had one choice on the menu, which was Intel at the time: that architecture instilled some biases, or some habits, pick your word here, but people defaulted to a certain type of behavior. Now, one of the things that it trained into everyone out there in the world, especially code writers and infrastructure managers, was that you didn't ever go over about 50% utilization of the processor. Because what happened is, if you did, then after 50% all of the SLAs I was talking about earlier, that service level agreement where things are behaving nicely, went out the window, right? Nobody could then get predictable performance out of their code. Because why? Hyperthreading. So hyperthreading is where you share a core with two execution threads. Once you went over 50%, then all of a sudden you were heavily dependent on the hyperthreading to get any more performance.
And what that does is it just messes up all the predictability of the rest of the processes operating on that machine. So the net result was to train people to stay at 50% or below. Now, with our processors, if you're running at 50% or below, that means you're only using half of our complete capacity, right? So we've had to go out and train people: "no, run this thing at 80 or 90% utilization, because that's where you hit this sweet spot," right? That's where you're going to save 30, 40, 50% of the power required to do something, because that's how we architected the chip. So these are the kinds of biases and habits and rules of thumb that we all end up having to combat. Anne Currie: Yeah, and it's interesting. I mean, as you say, that completely maps back to a world in which we just weren't thinking about power, you know, we just didn't care about the level of waste. So quite often enterprise engineers and architects are very used these days to the ideas of lean and agile. It's about reduction of waste. And the biggest waste there is underutilized machines. And we don't tend to think about it, in part, as you say, because we were trained not to think about it. Sean Varley: And also people didn't really care, you know, back in the day, going back again 10, 15, 20 years ago. People didn't really care that much about how much power was consumed in their computing tasks because it wasn't top of mind, right. And frankly, we consumed a lot less of it, primarily because we had a lot less infrastructure in service worldwide, but also because older chip architectures and older silicon process technology consumed less power. Now, as we've gotten into modern process technology, that whole thing has changed. And now you've got chips that can burn hundreds and hundreds of watts by themselves, not to mention the GPUs, which can burn thousands of watts. And that's just a wholesale shift in the trajectory of power consumption for our industry. Anne Currie: So you've brought up AI and GPUs there, and obviously even more AI-focused chips that are potentially even more power hungry. How does Ampere help? 'Cause Ampere is a CPU, not a GPU or a TPU. How does it fit into this story? Sean Varley: It fits in a number of different ways. So, maybe a couple of definitions for people. A CPU is a general purpose processor, right? It runs everything, and in everyday parlance it's an omnivore. It can do a lot of different things and it can do a lot of those pretty well. But what you have is an industry that is evolving into more specialized computing. That's what a GPU is. But there are many other examples, accelerators and other types of, not homogeneous but heterogeneous computing, where you've got different specializations. The GPU is just one of those. And in AI, what we've found is that the GPU architecture has driven that overall workload to a point where the power consumption of that type of workload, because there's a lot of computational horsepower required to run AI models, has driven the industry up and to the right in terms of power consumption. And so there's a bias now in the industry that if you're gonna do AI, it's gonna just take a ton of power to do it. The answer to that is, "maybe..." right?
Because what you've got is maybe a little bit of a lack of education about the whole pantheon of AI execution environments and models and frameworks and all sorts of things. All of these things matter, because a CPU can do a really good job of the inference operation for AI, and it can do an excellent job of doing it efficiently. 'Cause coming back to the utilization argument we were talking about earlier: in GPUs, utilization is even more important, because as you said, a GPU sits there and burns a lot of power no matter what. So if you're not using it, then you definitely don't want that thing just running the meter. And so utilization has become a huge topic in GPU circles. But CPUs have a ton of technology in them for low power when not utilized. You know, that's been a famous set of capabilities. But also, AI is not one thing. AI is the combination of specialized things that are being run in models, and then a lot of generalized stuff that can be, and is, run on CPUs. So where we come in, Ampere's concept for all that is what we call AI compute. AI compute is the ability to do a lot of the general purpose stuff and quite a bit of that AI-specific stuff on CPUs, and you have a much more flexible platform for doing either. Anne Currie: So it's interesting. Now I'm going to show my own ignorance here, 'cause I've just thought of this and therefore I'm gonna go horribly roll with it. There are kind of platforms to help people be more hardware agnostic when it comes to stuff like, Triton, is it? Do you fit in with anything like that, or does everybody have to decipher for themselves which bit of hardware they're gonna be using? Sean Varley: Oh man. We could do a whole podcast on this. Okay. Yeah. Let me try to break this down in a couple of simple terms. So, first of all, there are two main operations in AI. There's training and there's inference. Now, training is very high batch, high consumption, high utilization of a lot of compute. So we will think of this as maybe racks full of GPUs, because it's also high precision and it's a kind of very uniform operation, right? Once you set it, you kind of forget it and you let it run for, famously, weeks or months, right? And it turns out a model. But once the model's turned out, it can be run on a lot of different frameworks, right. And so this is where that platform-of-choice part comes back in, because inference is the operation where you're gonna get some result, some decision, some output out of a model. And that's gonna be, by far and away, the vast majority of AI operations in the future, right? We're still training a lot of models, don't get me wrong. But the future is gonna be a lot of inference, and that particular operation doesn't require as high a precision. It doesn't require a lot of the same characteristics that are required in training. Now, that can be run in a lot of different places on these open source frameworks. And also, what you're starting to see now is specializations in certain model genres.
A genre, I would say, is like the Llama genre, you know, from Meta. They've built their own, much more efficient frameworks in their C++ implementation of the Llama frameworks. So you've got specialization going on there. All that stuff can run on CPUs and GPUs and accelerators and lots of other types of things. Now it becomes more of a choice. What do I want to focus on when I do this AI operation? Do I really want to focus on something that's going to get me the fastest result ever? Or can I maybe let that sort of thing run for a while and give me results as they come? And a lot of this use-case-based decision making will dictate a lot of the power efficiency of the actual AI operation. Anne Currie: That is interesting. Thank you very much for that. So Ampere is basically in that second thing; you are one of the options for inference. Sean Varley: That's right, yeah. And our whole thought process around this is that we want to provide a very utilitarian resource, right? Maybe that's the right word. Because the utilitarianism of it is not that it's low performance or anything like that, it's still high performance. It's just that you're not necessarily going to need all of the resources of the most expensive or the most parameter-laden model. 'Cause models come with a lot of parameters. We hear this term, right? You know, up to trillions of parameters, down to millions of parameters. And somewhere in the middle is kind of that sweet spot right now, right? Somewhere in the 10 to 30 billion parameter range, and that sort of thing requires optimization and distillation. So we are building a resource that will be that sort of utility belt for the AI of the future, where you need something that runs, like, a Llama 8 billion type of model, which is gonna be a workhorse of a lot of the AI operations that are done in GenAI, for example. That will run really well, and it will also run with a lot less power than might have been required if you were to run it on a GPU. So there's gonna be a lot of choices, and there will need to be folks that specialize in doing AI for a lot less power and cost. Anne Currie: Something that Renee mentioned on stage when we were, so, the CEO of Ampere and I were on a panel together a few months ago, which is how come we're talking today, and one of the things she said that very much interested me was that Ampere chips didn't have to be water cooled, they could be air cooled. Is that true? Because obviously that's something that comes up a lot: water use, and AI's terrible water use. What's the story on that? Sean Varley: Yes. That is actually one of our design objectives, right? If you put sustainability in as one of your design objectives, that is what you do, right? So part of what we've done is we've said, look, our chips run at a certain kind of ceiling from a power perspective, and we can get a lot of performance out of that power envelope. But that power envelope's gonna stay in the range where you can air cool the chip. This provides a lot of versatility.
Because if you're talking about the modern data center dynamic, which is, oh, I've got a lot of brownfield, you know, older data centers, are they now gonna become obsolete in the age of AI because they can't actually run liquid cooling and stuff like that? No. We have infrastructure that goes into those types of data centers and will also get you a lot of computational horsepower for AI compute, inside a power envelope that was more reasonable or already provisioned for that data center. Right? We're talking about racks that run 15 kilowatts, 22 kilowatts. Somewhere in that 10 to 25 kilowatt range is the sweet spot in those types of data centers. But what you hear these days is that racks are starting to go to 60 kilowatts, a hundred kilowatts, even higher. Recently, you know, Nvidia has been pushing the industry even higher than that. Those things require a lot of specialization, and one of the specializations required is direct liquid cooling, what they call DLC. And that requires a whole different refit for the data center. The reason it's there, of course, is to dissipate a lot of heat. Right. And that requires a lot of water. Anne Currie: Which is fascinating, because the water use implications of AI data centers come up a lot at the moment, and perfectly reasonably so. It is not sustainable at the moment to put data centers in places where, and it's a shame because, in places where there is a lot of solar power, for example, there's also often not a lot of water. Right. Yeah. If you can turn the sun into air conditioning, that's so much better than taking away all their lovely clean water that they really need to live on. Sean Varley: Yes. Anne Currie: So is that the kind of thing that you are envisaging, that it works better in places where there's sunshine? Sean Varley: Absolutely. And we create technology that can very efficiently implement a lot of these types of AI-enabled or traditional kinds of compute. And they could be anywhere. They could be at an edge data center in a much smaller environment where there's only a dozen racks. But it's also equally comfortable in something where there's thousands of racks, because at the end of the day, if you want to be more sustainable, then just use less electricity. That's the whole point, right. And, you know, we can get into a lot of these other schemes for trying to offset carbon emissions and all these sorts of things, and all those schemes, I'm not saying they're bad or anything like that, but at the end of the day, our whole mission is to just use less power for these types of operations. And it comes back to many of the concepts we've talked about, right? You know, utilize your infrastructure. Use code-efficient practices, which comes back to things like containers, and there are even much more refined code practices now for doing really efficient coding. And then utilize a power-efficient hardware platform, right? Or the most power-efficient platform for whatever job you're trying to do. And certain things can be done to advertise how much electricity you're consuming to get something done, right?
And there's, that's a whole sort of, you know, next generation of code I think is just that power aware, you know, kind of capacity for what you're gonna run at any given moment.Anne Currie: Well, that's fantastic. I, we've talked for quite a long time and that was very information dense. It was high utilization of time to information there. I think we had a quite a high rate there of information passed. So, is there, so that was incredibly interesting and I really enjoyed it and I hope that, the listeners enjoyed it. All the, if there's anything that we talked about, we'll try and make sure that it's in the show notes below. Make sure that you read Building Green Software and the Cloud Native Attitude, because that would, that's a lot of what we talked about here today. and is there anything else, is there anything you wanna finish with, Sean?Sean Varley: Well, I just, I really enjoyed our discussion, Anne, thank you very much for having me. I think these technologies that are very important, and these concepts are very important, you know, there's a lot of misinformation out there in the world as we know, it's not just in, not just confined to politics,Anne Currie: Yep.Sean Varley: but yeah, there, you know, there's a lot of education I think that needs to go on in these types of environments that will help all of us to create something that is much greener and much more efficient. And by the way, it's good practice because almost every time you do something that's green, you're gonna end up saving money too.Anne Currie: Oh, absolutely. Yes, totally. If you're not doing it because you're, well, you can do it because you're a good person, which is good,but also do it 'cause you're a sensible person who doesn't have aSean Varley: That's great. Yeah. Successful businesses will be green, shall be green! Let's, there needs be a rule of thumb there.Anne Currie: Yeah. Yeah. So it is interesting. If you've enjoyed this podcast, listen as well to the podcast that I did with Charles Humble a few weeks ago, that we, again, he touched on, it's an interesting one, is there's a lot of disinformation out there, misinformation out there, but a lot of that is because the situation has changed.So things that were true 10 years ago are just not true today. So it's not deliberate misinformation, it's just that the situation has changed. You know, the context has changed. So if you, you might hear things and think, "but that didn't used to be true. So it can't be true." You can't make that judgment anymore. You know, it might be true now and it wasn't true then. But yeah, that's the world. We are moving quite quickly.Sean Varley: Yeah, technology, it moves super fast. Anne Currie: Absolutely. I don't, I've been in, so I suspect that you and I have been in for, you know, 30 years past, but it's never moved as fast as it's moving now, is it really? Sean Varley: Oh, I agree. Yeah. AI has just put a whole like, you know, afterburner on the whole thing. Yeah.Anne Currie: Yeah, it's just astonishing. But yeah. Yeah. So the world, yes, all the rules have changed and we need to change with it. So thank you very much indeed. And thank you very much for listening and I hope that you all enjoyed the podcast and I will speak to you again soon. So goodbye from me and goodbye from Sean.Sean Varley: Thank you very much. Bye-bye.Anne Currie: Bye-bye. Chris Adams: Hey everyone, thanks for listening. Just a reminder to follow Environment Variables on Apple Podcasts, Spotify, or wherever you get your podcasts. 
And please do leave a rating and review if you like what we're doing. It helps other people discover the show, and of course, we'd love to have more listeners.To find out more about the Green Software Foundation, please visit greensoftware.foundation. That's greensoftware.foundation in any browser. Thanks again, and see you in the next episode. Hosted on Acast. See acast.com/privacy for more information.
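As a rough companion to the utilization discussion in this episode: because a server draws a large share of its peak power even when idle, the energy attributable to each unit of work falls sharply as utilization rises, which is the intuition behind Sean's "run it at 80 or 90%" advice. The idle and peak power figures and the throughput in the sketch below are invented purely for illustration, and the linear power model is a simplifying assumption.

```python
# Back-of-envelope sketch: energy per request vs. server utilization.
# Power figures and throughput are hypothetical, chosen only to illustrate
# why running fewer, busier machines usually wins on energy.

IDLE_POWER_W = 200.0      # power drawn at 0% utilization (assumed)
PEAK_POWER_W = 400.0      # power drawn at 100% utilization (assumed)
PEAK_REQS_PER_SEC = 1000  # throughput at 100% utilization (assumed)

def power_at(util: float) -> float:
    """Simple linear power model between idle and peak."""
    return IDLE_POWER_W + (PEAK_POWER_W - IDLE_POWER_W) * util

def joules_per_request(util: float) -> float:
    reqs_per_sec = PEAK_REQS_PER_SEC * util
    return power_at(util) / reqs_per_sec

for util in (0.1, 0.3, 0.5, 0.8, 0.9):
    print(f"{util:>4.0%} utilization: {power_at(util):6.1f} W, "
          f"{joules_per_request(util):.2f} J per request")

# Under these assumed numbers, a request at 10% utilization "costs" roughly five
# times the energy of one at 90%, which is why consolidation and high utilization
# matter so much for green software.
```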
Real Time Cloud with Adrian Cockcroft
Chris Adams is joined by Adrian Cockcroft, former VP of Cloud Architecture Strategy at AWS, a pioneer of microservices at Netflix, and contributor to the Green Software Foundation’s Real Time Cloud project. They explore the evolution of cloud sustainability—from monoliths to microservices to serverless—and what it really takes to track carbon emissions in real time. Adrian explains why GPUs offer rare transparency in energy data, how the Real Time Cloud dataset works, and what’s holding cloud providers back from full carbon disclosure. Plus, he shares his latest obsession: building a generative AI-powered house automation system using agent swarms.Learn more about our people:Chris Adams: LinkedIn | GitHub | WebsiteAdrian Cockcroft: LinkedIn | GitHub | MediumFind out more about the GSF:The Green Software Foundation Website Sign up to the Green Software Foundation NewsletterResources:Serverless vs. Microservices vs. Monolith – Adrian's influential blog post [08:08]Monitorama 2022: Monitoring Carbon – Adrian’s talk at Monitorama Portland [25:08]Real Time Cloud Project – Green Software Foundation [30:23]Google Cloud Sustainability Report (2024) – Includes regional carbon data [33:39]Microsoft Sustainability Report [36:49]AWS Sustainability Practices & AWS Customer Carbon Footprint Tool [39:59]Kepler – Kubernetes-based Efficient Power Level Exporter [48:01]Focus – FinOps Sustainability Working Group [50:10]Agent Swarm by Reuven Cohen – AI agent-based coding framework [01:05:01]Claude AI by Anthropic [01:05:32]GitHub Codespaces [01:11:47]Soopra AI – Chat with an AI trained on Adrian’s blog [01:17:01]If you enjoyed this episode then please either:Follow, rate, and review on Apple PodcastsFollow and rate on SpotifyWatch our videos on The Green Software Foundation YouTube Channel!Connect with us on Twitter, Github and LinkedIn!TRANSCRIPT BELOW:Adrian Cockcroft: We figured out it wasn't really possible to get real time energy statistics out of cloud providers because the numbers just didn't exist.It turns out the only place you can get real time numbers is on things that are not virtualized.Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software.I'm your host, Chris Adams.Welcome to Environment Variables, where we bring you the latest news and updates from the world of sustainable software development. I'm your host, Chris Adams. If you have worked in cloud computing for any length of time, then even if you do not know the name yet yourself, it's very likely that the way you design systems will have been influenced by my guest today, Adrian Cockcroft.When at Netflix, Adrian led the move to the cloud there helping, popularize many of the patterns we use when deploying applications ourselves to the cloud. And his name then became synonymous with serverless throughout the 2010s when he joined AWS first leading on open source engagement, and then as a VP focused on what we might refer to now as cloud sustainability.After leaving AWS, Adrian's kept his fingers in many pies, one of which is the Green Software Foundation's real time cloud project, an initiative to bring transparency and consistency to cloud emissions reporting. 
With the first dataset release from that project out the door, it seemed a good idea to invite him onto the show to see what's up. Adrian, thank you so much for joining us today. Can I give you a bit of time to tell us about yourself and what's keeping you busy these days? Adrian Cockcroft: Yeah, it's great to see you, and thanks also for your contributions to the project. We've had a lot of discussions over the last few years as we've worked on that together. Well, I'm sort of semi-retired. I stopped my big corporate job at Amazon in 2022. And yeah, I spend my time worrying about my family. I've got old parents that live in the UK, so I spend a lot of time with them. And fixing stuff around the house and generally goofing around and doing things I feel like doing, rather than stuff that's driven by some corporate agenda. So I'm enjoying that freedom. And let's see, yeah, I spend time on the Green Software Foundation project. I go to a few conferences and give a few talks, and I try to keep up with what's happening in technology by playing around with whatever the latest tools are, and things like that. And that's been my career over the years. I've generally been an early adopter through my entire career. As you mentioned, we were early adopters in cloud, back when people said, "This isn't gonna work and you'll be back in the data center soon." People forgot that was the initial reaction to what we said. It's a little bit like that now, with people saying all this AI stuff doesn't work and we're gonna be giving up, and whatever. And it's like, well, I'm making bits of it work well enough to be interesting. We can talk a bit about that later. And then, I know you probably see behind me various musical instruments and things like that. So I collect musical instruments that I don't have time to really learn how to play, and mess around and make bad noises that make me happy. But luckily no one else has to listen to them particularly. So that, and messing around with cars and things, that's sort of the entertainment for me. Chris Adams: That sounds like quite a fun state of semi-retirement, I have to say, actually. So before we dive into the details of cloud, I have to ask, where are you calling from today? Because you have an English accent and, like, I have an English accent, but I'm calling from Berlin and I'm guessing you're not in England, so maybe you could explain that. 'Cause I follow you on social media and I see all these kind of cryptic and interesting posts about cars and stuff, and it's usually sunnier than where I am as well. So there's gotta be a story there. What's going on there, Adrian? Adrian Cockcroft: Well, I lived in England long enough to decide I didn't want to be rained on all the time, which is why I never moved to Seattle. You know, I didn't move to California, move to America, to go live somewhere with the same weather as England. So that was one reason I never moved to Seattle when I was working for Amazon. So I used to live in the Bay Area, in Los Gatos, near Netflix. About five years ago we moved down near Monterey, about an hour or two south of the Bay Area, depending on traffic. We are within earshot of a racetrack called Laguna Seca that most people know. I can kind of see it out of my window. I can see a few dots on the horizon moving, and there's a few cars you can just about hear, if they're loud cars.
And this is where, every August, they have this thing called Monterey Car Week, with the Pebble Beach Concours and historic races. And we used to go to that every year, and we like the kind of messing-around-with-cars and going-to-the-track-occasionally culture. So we moved down here, and it's been fun. You know, I don't have to commute anywhere. We have a nice place. The house prices are a lot cheaper down here than they are in the Bay Area itself. So technically we live in Salinas; lots of good vegetables around here, that's where a lot of the growers are. And we actually live out in the countryside, sort of, just in the hills near there. So we have a nice place, plenty of room for messing around, and a big house, which requires lots of messing around with. And we can talk a bit about one of the projects I have later on to try and automate some of that. Chris Adams: Yeah, that's quite a hint. Alright, well, that does explain all the kind of cars and coffee stuff, and okay, if you're near a racetrack, that would explain some of the cars as well. Alright. Thank you. Adrian Cockcroft: Well, actually there's cars and coffee events just about everywhere in the world. If you like looking at old cars and hanging out with car people, there's probably one every Saturday morning within 10 miles of pretty much anyone. Anyway, the other thing on that front that's more related to the Green Software Foundation is that we've had a whole bunch of electric cars over the years. I have one of the original Tesla Roadster cars that was made in 2010. I've had it since 2013. It actually has a sticker on the back saying, "I bought this before Elon went nuts." So I'm keeping that. We used to have a Tesla Model 3 and we replaced it recently with a Polestar 3, which is quite a nice car with very bad software initially. But they did a software update recently that basically fixed just about every bug, and it's actually fun driving a car where you don't worry if it's about to do something strange and need a software reset, which was the state it was in when we first got it in April. But the difference a bug fix can make: they actually went and fixed everything that was currently going wrong with it and transformed the car into something that's actually a fun thing to drive now. Chris Adams: So it was a bit like turning it off and on again, and then you've got, like, a working car. Adrian Cockcroft: Yeah. Well, yeah, we got really used to pushing the reset button. You hold the volume control down for 30 seconds and that resets the software, and we would be doing that most days that we drove it. Chris Adams: Oh my God. I didn't realize that was a real thing that people did. Wow. Adrian Cockcroft: Yeah. It's one of these things where a product can be transformed from something buggy and annoying to, oh, we just fixed all the software, now it actually works properly. And, you know, it's interesting to see. So it went from really bad to actually pretty good with one software release. Yeah. Chris Adams: I guess that's the wonders of software, I suppose. Wow. Alright then, and I guess that gives us a nice segue back to some of the cloud and serverless stuff. So, before you were helping out in some of the Green Software Foundation projects.
I remember reading a post from you called The Evolution from Monoliths to Microservices to Functions. And I think for a lot of people it really joined the dots between how we think about sustainability and what role things like scale-to-zero designs play when we design cloud services. And in that post you laid out a few things which I found quite interesting. You spoke about the idea that, okay, most of the time when we build services, they may be being used maybe 40 hours a week, and there's 168 hours in a week. So something like 75% of the time it's doing nothing, just waiting there. Yet we've still spent all this time and money building all this stuff. And in that post I remember you writing a little bit about how this actually aligns incentives in a way that we haven't seen before. And I think this idea of changing the programming model so that it actually incentivizes the correct behavior, that was really profound for me. And I figure, now that I've got a chance to have you on this podcast, I wanted to ask you what drove you to write that in the first place? And for folks who haven't read it, maybe you could just talk a little bit about the argument that you were making, and why you wanted to write that as well. Adrian Cockcroft: Yeah, that's actually one of the highest-traffic blog posts that I ever wrote. There were a lot of reads of that. The context then: it was soon after I joined AWS, so it was probably early 2017, something like that. I joined AWS in 2016. I'd spent a few years basically involved in helping promote microservices as an architecture. And I was also interested in serverless and AWS Lambda as an architecture. And I wanted to connect the dots. When I write things, the approach I take with some of them is along the lines of "this is how to think about a thing," right? I have a systems thinking approach generally, and so what I do is try to expose the systems that I'm thinking about and the incentives and feedback loops and reasons why things are the way they are, rather than being prescriptive and saying "just do this and this and the world will be great," or whatever the more typical instructive approach is. So I tend to try and explain why things are the way they are, and work in that way. So it's an example of that type of writing for me. And at the time, people were talking a lot about the monolith-to-microservices transition and what it meant and how to do it, and things like that. And I was trying to explain what we'd done at Netflix. And then I was thinking that the next generation of that transition was to serverless. And the post was basically to try and connect those dots; that was the overall goal of it. And it is quite a long post. It's one of these things: when you work with somebody, you know, PR people or whatever, they say you should write short blog posts and they shouldn't be so technical. Well, this is one of the longest and most technical posts I wrote, and it actually has the highest traffic. So, you know, ignore the PR people. It turns out if you put real content in something, it will get traffic. And that's the value you can provide by trying to explain an idea. So I think that's generally what that was about. This idea that...
I mean, the microservices idea is a tactic for solving a problem. It isn't an end in itself. Right. And that's one of the distinctions I was trying to make. It's like, if you have a large team working on a code base, they'll keep getting in each other's way. And if you're trying to ship code and the code has a hundred people's contributions in it, and one person has a bug, then that stops the shipment of the other 99 people. So there's this blocking effect of bugs in the whole thing. And it also destabilizes the entire thing: you're shipping completely new code when you ship a new monolith. Whereas when you have, say, a hundred microservices with one person working on each, they can ship independently. And yeah, you have some interaction things you have to debug, but 99 of those services didn't change when you pushed your code. So it's easy to isolate where the problem is and roll it back. So there's a bunch of things that make it easier. And then we thought, well, you've got the microservice, which does a thing, but it contains a bunch of functions. If you blow that up into individual functions, then you don't actually need all those functions all the time. Some code paths are very busy: they may be hit a hundred times, you know, every request goes through this part of the code. But maybe one time in a hundred or a thousand it does something else. So what you can do is break those into separate Lambda functions. And so the code paths that don't get executed very often just aren't running. The code gets called and then it stops and doesn't get called again for a long time, whereas the busy ones tend to stay in memory and get called a lot. Right. So that way the memory footprint and the execution footprint are more tuned to what's actually going on. So that was the second thing. And then the third thing was that a lot of applications, particularly corporate internal ones, as you mentioned, are only used during work hours. And those are the perfect ones to build serverless. They only exist for as long as anybody is actually trying to use them, and they aren't just sitting there idle most of the time just because you need to have a wiki or something, or a thing that people check in with in the morning. Like anything salespeople hit at the end of the quarter or the end of the month: those sorts of things make things super busy for short periods and it's idle the rest of the time, so you need very high concurrency for short periods of time. Anything like that is sort of the area where I think serverless is particularly good. And later on I did another series of talks where I basically said serverless first, not serverless only, but start trying to build something with serverless because you'll build it super quickly. And one of the books I should reference is by David Anderson. Is it called The Value Flywheel Effect, or something like that? We'll give a link in the show notes. I talked to him and helped him find the publisher for that book. And I think I wrote a foreword for it, or at least put some nice words on the cover. And that book talks about people developing entire applications in a few days. And then you get to tune it and optimize it. And maybe you take some part of it where you say, really, I need a container here. Something like that.
But you can rapidly build it. The tagline I used to say was, in the time it takes to have meetings about how you're going to configure Kubernetes, you could have finished building your entire application serverless, right? You just get these internal discussions about exactly what version of Kubernetes to use and how to set it up and all this stuff. And it's like, I could have finished building the whole thing with the amount of effort you just put into trying to figure out how to configure something. So that's the sort of slightly flippant view I have on that. And anyway, the other thing is that effectively the carbon footprint of a serverless application is minimal, but you do have to think about the additional systems that are running there all the time when you are not running. And a little bit of a future segue, but AWS just changed their own accounting model to include those support services, so that when you look at the carbon footprint of a Lambda app that isn't running, you actually have a carbon footprint, because the Lambda service needs to be there ready. So you actually get a share of the shared service attributed to each customer that's using it, right? So it's a little bit deeper, and it's kind of an interesting change in the model to be explicit that that's what they're doing. Chris Adams: Ah, I see. Okay. So on one level, some of this post was about how the unit of code, or the unit of change, can become smaller by using this, but there's also a kind of corresponding thing on the hardware level. I remember when I was reading this, it was like, okay, I'm shipping a monolithic piece of code and I've got a physical server to begin with. That was how we were starting maybe 10, 20 years ago. And then over time it's become smaller and smaller, and that has made it a bit easier to build things quickly. But one flip side that we have seen is that if you just look at, say, the Lambda function, then that's not a really accurate representation of all the stuff that's actually there. You can't pretend that there isn't an infrastructure that has to be there. And it sounds like the accounting is now starting to reflect the fact that someone needs to pay for the capacity, in the same way that someone has to pay for the electricity grid even when you're not using the grid; there is still a cost to make that capacity available for you to use. Basically that's what it seems to be a reference to. Adrian Cockcroft: Yeah. And just going back to the car analogy. People own cars, people lease cars, people rent cars, right? And if you rent a car for a day, you can say, well, my carbon footprint of renting the car is one day's worth of car ownership, right? Except that in order for you to rent a car for the day, there has to be a fleet of cars sitting around idle that's ready for you to rent one. So really you want to take into account the overhead of your car rental company managing a fleet, and it's maybe got, whatever, 70% utilization of the fleet, so 30% of the cars are sitting around waiting for somebody. So you basically have to uplevel your "I just need a car for a day" to add an extra overhead of running that service, right? So it kind of follows that same thing, you know?
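To make the arithmetic behind that analogy concrete, here is a minimal illustrative sketch (not something discussed in the episode): if a shared pool, whether a rental fleet or a serverless control plane, runs at roughly 70% utilization, the footprint attributed to each use has to be scaled up to cover the idle capacity held in reserve. The footprint figure below is invented purely for the sake of the arithmetic.

```python
# Illustrative only: uplifting a per-use footprint to cover the idle share of a
# pooled resource (a rental fleet, or a shared serverless control plane).
# The 70% utilization echoes the example in the conversation; the 5 kg figure
# is made up.

def attributed_footprint(direct_footprint_kg: float, pool_utilization: float) -> float:
    """Scale a direct per-use footprint to include the pool's idle overhead."""
    if not 0 < pool_utilization <= 1:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return direct_footprint_kg / pool_utilization

# One day's worth of "car ownership" emissions, rented from a fleet that is
# only 70% utilized: the renter's attributed share comes out higher.
print(round(attributed_footprint(5.0, 0.70), 2))  # 7.14 kg CO2e rather than 5
```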
And if you basically rent a car every single day, so you have a car every day of the year but it's a rental car, that's an expensive way to own a car, right? I mean, even at a monthly rate, it's still more expensive than buying a car or leasing a car, because you're paying for some overhead. But it's kind of those sorts of models. So it's a bit like owning a car, leasing a car, and renting a car, with sort of the monolith, microservices, serverless analogy, if you like. The cost model's a little different because you're giving stuff back when you don't want it anymore; that's sort of the cloud analogy, right? The regular cloud service, I can just scale things down. Chris Adams: Mm, going back to something else you mentioned, I was talking to a CIO once and he was very annoyed, 'cause he said that he'd only just found out that he could turn off all his test infrastructure at the weekends and overnight. He'd been running this stuff for two years before he finally realized, and just like that, three quarters of the cost of his test environment had gone away. And he was happy that had happened, but he was annoyed that it took two years for somebody to mention to him that this was possible and for him to tell them to do it. Adrian Cockcroft: Right. Yeah. Any tests, anything that's driven off people, should absolutely be, you know, shut down. There are ways to do it; a bunch of cloud instances can just be shut down and frozen and come back again later. Chris Adams: So this is something I might come back to actually, because in some ways, if you look at, say, cloud computing, each individual server is probably quite a bit more efficient than a corresponding server you might buy from Dell or something like that from a few years ago, because it's in a very optimized state. But because it's so easy to turn on, this is one of the challenges that we consistently have. And in many ways, it's kind of in the interest of the people running very efficient servers to have people paying for capacity which they're not using, 'cause it makes it easier to then resell that. I guess maybe this is one of the things that the shift to serverless is supposed to address; in theory, you know, it does align things somewhat better, more in terms of reducing usage when you're not actually using it, for example, rather than leaving things running like you're saying. Adrian Cockcroft: Yeah, you don't have to remember to turn it off. With serverless, it's off by default, and it comes on and it's sort of a hundred percent utilized while you're running, and then it turns off again. So in that sense, it is much more like you have a rental car that returns itself after 15 minutes or whatever your timeout is, or when you're done with it. Maybe it's more like a taxi, right? Going one level beyond a rental car, you have a taxi, which you just use to get there and you're done. So serverless is maybe more like a taxi service, right? And then a daily rental is more like an EC2 instance or something like that. And there's all these different things.
So we're used to dealing with these things, and you wouldn't, you know, you wouldn't have a taxi sitting outside your house 24 hours a day just waiting for you to want to go somewhere, right? People say, well, serverless is expensive. It is if you use it in that very stupid way, right? Chris Adams: You wouldn't; you'd either lease a car or you'd buy a car if... Adrian Cockcroft: Yeah. If it's being used continuously, if you've got enough traffic that the thing is a hundred percent active, sure, you should put it in a container and just have a thing there rather than having it woken up all the time. Chris Adams: Ah. I never really thought to make the comparison to cars, to be honest. 'Cause I wrote a piece a while back called A Demand Curve for Compute, which compares these two, just like energy, for example. Like, if you do something all the time, then you have something running all the time; it's a bit like a nuclear power station, it's expensive to buy, but per unit it makes a load of sense. And then you work your way up from there basically. So at the other end, like serverless, there are things like peaker plants, which are only on for a little bit of time and they're really expensive for that short period of time. But because they can charge so much, they only need to be running maybe five to 15% of the year. And that's how people design grids. And this idea of demand curves seems quite applicable to how we think about computing, and how we might use different kinds of computing to solve different kinds of problems, for example. Adrian Cockcroft: Yeah. Well, that brings up another current topic. What's actually happening now is the peaker plants are running flat out to serve AI data center load, and the peaking is moving to batteries, which are now getting to the point where they're sufficiently cheap and high capacity that the peaker capacity is being provided by batteries, which respond much more quickly to load. And some of the instabilities we've seen in the grids can be fixed by having enough battery capacity to handle, you know, a cloudy day or whatever, the sort of effects that you get from sudden surges in power demand or supply, right? And once you get enough battery capacity, that problem is soluble; the problem historically is that the batteries have been too expensive, but they're getting cheaper very quickly. So there's a few cost curves that I've seen recently showing that the cheapest thing to do for power now is solar and batteries, just put that in. And the batteries they're now getting, originally they were saying you can get a few hours' worth of battery cost effectively; I think they're now up to like six to eight hours being cost effective, and we're getting close to the sort of 12 to 18 hours, which means that you can go through the night in the winter on batteries, and it's cost effective to deploy batteries to do that. There's something about the economics that means beyond a certain amount of capacity you still need some base load, and geothermal is particularly interesting for that, I think, as one of the cleaner technologies.
A company called Fervo is building a station that Google are using for some of their energy; I've spent some time looking at alternative energy. But yeah, those peaker plants, they were sitting there mostly idle, and then all this extra demand that wasn't in the plan suddenly appeared from these big AI data centers, and they're hoovering up all that capacity. So people are desperately trying to figure out how to add additional capacity to take that on. Chris Adams: We will come to that a little bit later in a bit more detail, actually. So, but thank you. So maybe we can talk a little bit about observability and being able to track some of this stuff, because one thing that I've seen you present before is this idea of carbon being just another metric. And what we'll do is we'll share a link in the show notes to a YouTube video called Monitoring Carbon. I think you presented this at Monitorama in Portland in 2022. It does talk a little bit about the state of the art in 2022, but one of the key things you were saying was that, as developers, we're gonna have to learn to track carbon because it's just gonna be another thing we have to track, just like space left on a disk, requests, and things like that. So maybe you could talk a little bit about that, and just tell me if you think that's still the direction we're going in, basically. Adrian Cockcroft: Yeah, so that was the first talk I gave after I left AWS. I'd already agreed to present there, and then I left AWS I think just a few weeks before that event. So it was kind of an interesting thing: hey, by the way, I quit my job and I've sort of retired now, but this is the thing I was working on. The last job I had at AWS, I was a VP in the sustainability organization, which is an Amazon-wide organization, but I was focused on the AWS part of that problem, in particular how to get all of AWS sort of on the same page. There were lots of individual things popping up, lots of people writing their little presentations about what they thought AWS was doing. And so we basically created a master, PR-approved, you know, press-relations-approved deck that everyone agreed was what we could say and should say, and it was a high quality deck, and we got everyone to be saying the same thing externally. Now, part of the problem there was that, with the various constraints we had at Amazon, we couldn't really talk about a lot of the things we were doing, for all kinds of reasons. So the story of Amazon, I think, is better than most people think, but the way it's told is really poor, and it's very difficult to get things out of Amazon to actually cover what they've been up to. So that was what I was working on. And along the way I thought, you know, Monitorama is a monitoring and observability conference I've been to many times, and I have a long history in monitoring tools in particular. I thought, yeah, I should be trying to get everybody to add carbon as some kind of metric. And the problem is, then, where do you get that metric from? And that wasn't very obvious at the time. And I think there's sort of two things that have happened since 2022. One is that we actually haven't made much progress in terms of getting carbon as a metric in most areas.
There are a couple of exceptions that we'll get to, but we haven't made as much progress as I hoped we would. And then the other one is that the sort of standards bodies and government regulations that were on the horizon then have mostly been stalled or slowed down or delayed, whatever. So the requirement from the business to do it has generally reduced, right? Which is disappointing, 'cause now we're seeing even more climate change impacts, and, you know, the globe doesn't care what your corporate profitability is, or what you're trying to do, or the reasons why you aren't doing it. So we're just gonna get more and more cost from dealing with various types of climate disasters, and we're seeing those happen all around us all the time. So I think in some sense it's got to get much worse before people pay attention. And there's a big sort of battle going on to try and keep it focused, and certainly Europe is doing a much better job of it right now, but even the European regulations are a little watered down. And I know that you are all over that; it's really your specialist area, you know far more than I do about what's going on in that area. Chris Adams: But yeah. Adrian Cockcroft: It's a big topic, but I think in 2022 I thought that we would be having more regulations sooner, and that would be pushing more activity. And then, by talking about this at that event, I wanted to get some of the tools vendors to talk to me about how to do this. I ended up doing a little bit of advisory work for a few people as a result, but not really that substantial. So that's kind of where I was then. And then over the next year or so I did some more talks; basically I just tried to figure out what was available from the different cloud providers, did a talk about that, and then wrote a PR/FAQ, or a proposal, for a project for the GSF saying, well, we should fix this, and it would be really nice if we did actually have this thing that people would like to see, and then went and tried to see what we could get done. Chris Adams: Okay, so that's useful to kind of bring us up to this point here. And one thing I've appreciated about being on the Real Time Cloud project is that it's very easy to basically call for transparency, and there are absolutely reasons why a company might not want to share their stuff which are kind of considered, I don't know, wrong reasons I suppose, or kind of greedy reasons. So, I used to work at a company called AMEE, that stood for Avoiding Mass Extinctions Engine. And one thing we did was, we raised something in the region of 20 million US dollars to find out all the ways you can't sell a carbon API in the early 2010s. And, you know, pivoting like a turntable, it's kind of a bit embarrassing at times, right? And one of the potential routes that people went down was basically, we are gonna do this stuff and we are gonna work with large buyers to basically get people in their supply chain to share their emissions information, with the idea being that this would then be able to kind of highlight what they refer to as supply chain engagement. So that sounds great. Like, we'll lend you some money so you can buy more efficient fridges and do stuff like that.
But there was another flip side to this, where when you're working with large enough companies or large enough buyers, one of the things they would basically say is they could use this information to then ask, well, who are the people who are the least efficient? And, like, who am I gonna hit with my cost cutting stick first? And for this reason, I can totally understand why organizations might not want to expose some of their cost structure. But at the same time, there is actually a kind of imperative coming from, well, like you said, the planet and from the science and everything like that. And this is one thing that I feel has been a real blocker right now. Because companies are basically saying, we can't share this information 'cause we are going to end up revealing how many times we maybe sell the same server, for example. And you can see why people might not want to release or disclose that information, 'cause it can be considered commercially sensitive. But there is also the imperative elsewhere. And I wanted to ask you: faced with that, how do we navigate that? Or are there any things that you think we can be pushing for? Because I think this disclosure conundrum is a really difficult one to get around, basically. And I figured, you are on the call, you've been on both sides; maybe you have some perspectives or viewpoints on this that might better shed some light here, rather than it just being this "you should be transparent", "no, we're not gonna destroy our business" kind of thing, because there's gotta be a third way or a more useful way to talk about this. Adrian Cockcroft: Yeah. I mean, there are three primary cloud providers that we've been working with, or attempting to work with, and they're all different, right? Google generally have been the most transparent. They produce data that's easy to find, that's basically in a useful format. And they came out with their annual sustainability report recently, and there's a table of data in it which is pretty much what we've been adopting as the useful data, right? So that's one. But still, they don't disclose some things because they don't have the right to disclose them. For example, if you want to know the power usage effectiveness, the PUE, they don't have it for all of their data centers. When you dig into that, you find that some of their regions are hosted in data centers they don't own, right? So somewhere in the world there's a big colo facility owned by Equinix or somebody, right? And they needed to drop a small region in that area, so they leased some capacity in another data center. Now, the PUE for that data center, because they're not the only tenant, is actually hard to calculate, but also the owner doesn't necessarily want to disclose the PUE, right? So the number isn't really obtainable. You could come up with a number, but, you know, they'd have to get a third party to approve it. So that's a valid reason for not supplying a number. It's very annoying, because you have PUE for some data centers and not others, and that applies to all the cloud providers. So that's a valid, yeah, it's annoying, but valid reason for not providing a number, right? So that's one level.
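For readers who haven't met the term, PUE, or power usage effectiveness, is simply the ratio of everything a facility draws from the grid to the energy that actually reaches the IT equipment:

$$\mathrm{PUE} = \frac{\text{total facility energy}}{\text{IT equipment energy}}$$

So a data center with a PUE of 1.2 draws 1.2 kWh for every 1 kWh delivered to the servers, with the remainder going mostly to cooling and power distribution. That also hints at why a tenant-specific figure is hard to produce for a shared colo facility: the overhead is a property of the whole building, not of any one tenant's footprint.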
And Google are pretty good at providing all the numbers, and they've been engaged with the project. They've had a few people turn up on the meetings; they've fixed a few things where something wasn't quite right, there was some missing data or something that didn't make sense, and they just went and fixed it. And there was also a mapping we needed from the Google data centers, which support things like Gmail and Google search, to the Google Cloud data centers, which is a subset of that. They actually went and figured out that mapping for us and gave us a little table, so we could look up the PUE for the data center and basically say, okay, this cloud region is in that data center. They've worked well with it. So that's kind of what I'd like to see from the other cloud providers. I like to see existence proofs: well, they did it, why can't you do that? Right? So that's what I'd expect to see from everybody. Microsoft were involved in setting up the GSF and were very enthusiastic for a while, particularly when Asim was there and driving it, and since he's moved on and is now working directly for the GSF, I think the leadership at Microsoft is off worrying about the next shiny object, which is AI, whatever, right? There's less support for sustainability, and we've found it hard to get engagement from Microsoft, to get data out of them. They have a report, they issued their new report for the year and they had total numbers for carbon, but they didn't release their individual region updates. So they released overall carbon data for 2024, but we haven't got anything updated, nothing that I can find anyway, on the individual regions, which is what we've been producing as our data set. Chris Adams: Ah, okay. So basically, as the moonshot has got further away, as they say, it's also got harder to see. We still have this issue, then, that it's less clear and we have less transparency from them. That's a bit depressing, when early on they were really on it. I was really glad to have them inside that, because they shared this stuff before Google shared it, so we actually had, okay, great, two of the big three starting to disclose this stuff; maybe we might be able to use this to kind of win concessions from the largest provider to share this. Because if you are a consumer of cloud, then you have some legal obligations that you still need to meet, and this is not making it easy. And for the most part, it feels like if you don't have this, then you end up having to reach for a third party, for example; you might use something like Greenpixie, and that's totally okay, but you're going via a third party, so that's secondary data at best. It feels like there's something that you should be able to have from your supplier, for example. Adrian Cockcroft: Yeah. Just to clarify, I think there are several different types of sustainability data, or sustainability-related data, that you get from a cloud provider.
One of them is, well, I'm a customer and I have my account and I pay so much money to it, and how much carbon is associated with the things I've used, right? And they all provide something along those lines to a greater or lesser degree. Chris Adams: Mm. Adrian Cockcroft: You can get an estimate for the carbon footprint of an account, right? Typically delayed by several months, two to three months, and it's pretty high level. There's more detail available on Google and Microsoft, and there's fairly high level data from AWS, but that's one source. The other source that we're interested in is, let's say I'm trying to decide where I should put a workload. And it could be that I have flexibility, I can put it pretty much anywhere in the world, or I can choose between different cloud providers in a particular country, and I want to know what the carbon footprint of that would be, right? So to do that, you need to be able to compare regions, and that's the data set that we've produced and standardized, so that it lists every cloud region for the three main cloud providers, and for each of them we've got whatever information we can get about that region. Back in 2022 we had a fairly complete data set; in 2023 some of it is missing, Microsoft provided less data than in 2022. And for 2024 data, currently we have Google data, Microsoft have released their report but haven't given us any new data, and AWS are probably releasing their data in the next few days. Last year it was on July the 9th, and I just checked this morning and it hasn't been released yet, so it's probably coming next week; it's sometime in July, right? So we'll see what information we get from AWS, and every year I write a blog post where I say, okay, the three reports are out, this is what happened, this is the trend year on year, and I'm working on an update of that blog post. So probably by the time this podcast airs, I'm hoping that blog post will... Chris Adams: Be out there. Adrian Cockcroft: I should have got it out. You know, I've written as much as I can right now, but I'm waiting for the AWS one. So, we've sort of discussed Google, who have been pretty good corporate citizens, I guess, disclosing whatever they can and engaging with the project. Microsoft had sort of early enthusiasm. In their latest report they actually mentioned the GSF, they mentioned they founded it, and they mentioned that they support the Real Time Cloud project, but they're not actually providing us any data, and we're still trying to find the right people at Microsoft to escalate this to, to figure out, well, give me the data, right? And then AWS, they have some different issues going on. The way that they run their systems, one of the things they found is that if they disclose something about how they work, people will start leveraging it, right? You get this sort of gamifying thing: if there's an interface or a disclosed piece of information, people will optimize around it and start building on it. You see this a lot with eBay. One of the reasons eBay's interface hasn't changed much over the years is that there are sellers that optimize around some weird feature of eBay and build a business around it.
And every time eBay plans to change that, they're like, some seller's gonna lose their business, right? So if you over-expose the details of how you work, there's sort of an arbitrage opportunity where somebody will build something on that, and if you change it, they get upset. So that's one of the reasons that AWS doesn't like saying how it works, right? Because it would cause people to optimize, Chris Adams: Yeah. Private... Adrian Cockcroft: optimize for the wrong things. And one example is that there's an archive capability, a tape archive capability, that AWS has. And you might be thinking, I have lots of data sitting on disk, I should move it to tape, 'cause that is a much lower carbon footprint. And it is, except if you're in a tiny region that AWS has just set up, they haven't actually really got tapes there, the same services there; they're actually just storing it to disk until they have enough volume there for them to put in a tape unit and transfer that to tape. They want the same interface, but the implementation is different. Now, if they exposed which regions this is actually going to disk in, you would say, well, this is a high carbon region, so I shouldn't store my data in there. Which means it would not get enough volume to actually justify installing the tape unit, right? So you get this sort of negative feedback loop that's actually counterproductive, right? So that's an example of it. It's one of the reasons that they don't want to tell you how much carbon every different service has, because it could cause you to optimize for things that are gonna cause you to do the opposite of what's the right thing to do, ultimately. Chris Adams: Okay. So that's one of the arguments we see used for not disclosing per-service-level and per-region-level things. 'Cause one thing we actually see is that when you use, say, Amazon's carbon calculator, you'll get a display which broadly incentivizes you to change basically nothing, right? And that's different to, say, Google and Microsoft, who do provide service level stuff and region level stuff. So one of the reasons they're trying to hide some of that information is basically that it would make it harder for them to keep providing that service, for example, or there's all these second order effects that they're trying to basically avoid. That's one of the arguments people are using. Adrian Cockcroft: That's the argument that they have, and it's something that's pervasive; it's not just related to carbon. This is something that they've seen across lots of services: people will depend on an implementation, and they change the implementation frequently. Like, we're on, I dunno, the eighth or the ninth version of S3, a total rewrite from scratch. When I was there, I think they were up to the seventh or eighth version, and I knew somebody that was working on the team that was building the next version, right? And this is tens of exabytes of storage that is migrated to a completely new underlying architecture every few years. If you depend upon the way it used to work, then you end up being suboptimal. So there's some truth in that. However, and this is the example we were pointing at when I was at AWS, Microsoft and Google are releasing this data, and there's no evidence of bad things. Chris Adams: Yeah. The sky hasn't fallen when they... Adrian Cockcroft: Yeah.
So I think it would be just fine too. And they are gradually increasing the resolution. When they first released the account level information, when I was there, and we'd managed to get this thing out in, I guess, 2021, 2022, you had regions being continents, right? You just had Europe, Asia, and Americas. And you had S3, EC2, and "other", Chris Adams: Yeah. Adrian Cockcroft: and you had it to the nearest hundred tons or something, or nearest hundred kilograms, yeah, a tenth of a ton. So a bunch of people in Europe just got zero for everything and went, well, this is stupid. But actually, because of the way the model works, they were generating lots of energy to offset the carbon, so it probably is zero, at least for scope 2, for the market based model. Chris Adams: Where you count the green energy you've used to kind of offset the actual figure. Alright. Adrian Cockcroft: Yeah. So what they've done in the last couple of years, they finally got a team working on it. There's a manager called Alexis Bateman that I used to work with in the sustainability team who's now managing this, and she's cranking stuff out and they finally started releasing stuff. So the very latest release from AWS now goes down to per region. It has location based, which just got added to the market based, so we actually have that finally. Chris Adams: Okay. Yeah. Adrian Cockcroft: So this happened a few weeks ago. And they've added, I think, CloudFront; because it's a global CDN it doesn't really live in a region, so they've separated CloudFront out. And they also changed the model, as I mentioned earlier, so that the carbon model now includes supporting services that are required for you to use the thing. So your Lambda functions, even if they're not running, you've still got a carbon footprint, because you need to have the Lambda control plane there ready to run you, so you pay for a share of that. And then the question is, how do you calculate these shares? And it's probably, you know, dollar based or something like that, some kind of usage based thing. Chris Adams: Okay. Alright. So that's, yeah, I hadn't realized about the location based information being out there as well, actually. Adrian Cockcroft: The location based data and the new model, yeah, and they've now got this sort of cadence where every few months they're getting a new thing out. They've clearly said they're going to do scope 3; I know they're trying to do a real scope 3 thing rather than a financial allocation scope 3. So we could talk about that if you want, depending how much you wanna get into the weeds of this stuff. But anyway, what we ended up with in the Real Time Cloud project was, we figured out it wasn't really possible to get real time energy statistics out of cloud providers, because the numbers just didn't exist. It turns out the only place you can get real time numbers is on things that are not virtualized. And the thing that people don't generally virtualize is the GPUs. So if you're using an Nvidia GPU, you can get a number out of it, which is the energy consumption of that GPU.
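As a rough illustration of the kind of number being described here, the sketch below (an illustrative assumption, not tooling discussed in the episode) polls nvidia-smi for the instantaneous power draw of a single GPU and integrates it into a watt-hour estimate. Dedicated tools do this far more carefully, but it shows how directly the figure is exposed on non-virtualized hardware.

```python
# Illustrative sketch: sample GPU power via nvidia-smi and integrate it into an
# energy estimate. Assumes an NVIDIA GPU and nvidia-smi on the PATH.
import subprocess
import time

def gpu_power_watts(gpu_index: int = 0) -> float:
    """Read the current power draw of one GPU, in watts, via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip().splitlines()[0])

def estimate_energy_wh(duration_s: float = 60.0, interval_s: float = 1.0) -> float:
    """Poll power every interval_s seconds and integrate to watt-hours."""
    joules = 0.0
    elapsed = 0.0
    while elapsed < duration_s:
        joules += gpu_power_watts() * interval_s  # watts * seconds = joules
        time.sleep(interval_s)
        elapsed += interval_s
    return joules / 3600.0  # joules -> watt-hours

if __name__ == "__main__":
    print(f"Estimated GPU energy over one minute: {estimate_energy_wh():.2f} Wh")
```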
So for anyone working on AI based workloads, the dominant part of the energy usage calculation is available to you. But for the CPUs, because of the way virtualization works, you can't provide the information unless you're using what they call a bare metal instance in the cloud, where you get access to the whole thing. So we gave up a bit on having real time energy data, and also the CNCF came up with a project called Kepler, which does good estimates and does a workload analysis for people running on Kubernetes. So we just did a big, like, point over at that: just use Kepler. If you want workload level energy estimates, use Kepler. And we focused instead on trying to gather and normalize the metadata available on a region, so that you could make region level decisions about where you want to deploy things, and understand why certain regions were probably more efficient than others in terms of PUE and water usage and carbon and the carbon free energy percentage that the cloud provider had, meaning how much local generation they had in that region. So that was the table of data that we've produced and standardized, and we've put a 1.0 standard on it. The current activity there is to rewrite the doc to be, basically, standards compliant so that we can propose an ISO standard around it. And the other thing we're doing is talking to the FinOps Foundation, who come at this from the point of view of standardizing the way billing is done across cloud providers; they have all the cloud providers as members, all working on billing, and they're trying to extend that billing to include the carbon aspects of what's produced. So, we've done an interview with someone from FOCUS already, who was basically talking about, like you mentioned before, the idea that, okay, Microsoft and Google have shared this kind of per service level information and the sky hasn't fallen. Chris Adams: They've created something a bit like that to kind of almost list these different kinds of services. If I understand it, the GSF, you know, Real Time Cloud thing might be like a carbon extension for some of that, because right now the FOCUS stuff doesn't have that much detail about what carbon is, or the kind of subtleties related to the non-cash things you might want to associate with the way you purchase cloud, for example. Adrian Cockcroft: Yeah, so FOCUS is the name of the standard they've produced, and really all the cloud providers have signed up to it. If you go to AWS's billing page, it talks about FOCUS and has a FOCUS-conformant schema. So the idea was all the cloud providers would have the same schema for their billing. A great, obvious thing to do, and all the cloud providers have joined up to do that, which is fine. Now, FOCUS has some proposals for sustainability data, but they are just proposals for maybe the next version. They had a working group that looked at it, and the problem they run into is one of the things we've deeply looked into in our group; we know why you can't do that. So what you'd really like is a billing record that says you just used, you know, 10 hours of this instance type, and this is the carbon footprint of it. And the problem is that number cannot be calculated, and that's what you'd like to have.
And intuitively you'd just like to know how much carbon it is; the problem is the carbon is not known at that time. You can generate the bill, 'cause you know you've used 10 hours of the thing, but you can't know the energy consumption and the carbon intensity; those two numbers are not known for a while. So you typically get the data a month or two later, and then you have to go back to your billing data. So you could put a guess in there, and things like the Cloud Carbon Footprint tool and other tools that are out there will just generate guesses for you, but they are guesses. And then when you go and get the real data from your cloud provider, the numbers will definitely be different, sometimes radically different. So the question is, do you want to have an early guess or do you want to have a real number, and what are you doing with that number? If what you're doing is rolling it up into an audit report for your CFO to go and buy some carbon credits at the end of the year, that's what the monthly reports are for, right? If you're a developer trying to tune a workload, that is useless information to you. That's what the Real Time Cloud group was really trying to do: if you're a developer trying to make a decision about what you should be doing, you know, calculating an SCI number, or understanding which cloud provider and which region has what impact, that's the information you need to make a decision in real time about something. So the real time aspect is not about, like, needing to know the carbon in milliseconds or whatever. It's, I need to know now, I need to make a decision now. Chris Adams: To make a forward looking decision. Adrian Cockcroft: Yeah. It's like, I need to make a decision now, so what information do I have now? Which is why we take the historical metadata they have for the regions and we project it into the current year, just trending and filling in the gaps, to say this is our best guess for where you'd be if you needed to make a decision this year. And we've got some little code that automatically generates that estimate. Chris Adams: So that's at least useful, so people have an idea about what you might be using these two different kinds of data for. I guess, if we could just unpack one last thing before we move on, one of the reasons you have this delay is basically because companies don't get the billing data themselves and they need to then go out and buy credits. Like, this is for the market based calculations. So what you've said here is basically about carbon based on a market based figure. But if we were to separate that out and look at something like location based figures for electricity, which is representing what's happening physically on the grid, you plausibly could look at some of this stuff. Is that the way you see it? Because I feel that we are now at this point where there's a figure for the grid, but that's not necessarily gonna be the only figure you look at these days, for example, because we're increasingly seeing people having different kinds of generation in the facility. If you've got batteries, you might have charged the batteries up when the energy's green, for example, or clean, and then be using it at a certain time.
So there's another layer that you might need to take into account, right? Adrian Cockcroft: Yeah, so there's a couple of different reasons why the data is...
Backstage: Software Standards Working Group SCI
In this Backstage episode of Environment Variables, podcast producer Chris Skipper highlights the Green Software Foundation’s Software Standards Working Group—chaired by Henry Richardson (WattTime) and Navveen Balani (Accenture). This group is central to shaping global benchmarks for sustainable software. Key initiatives discussed include the Software Carbon Intensity (SCI) Specification, its extensions for AI and the web, the Real-Time Energy and Carbon Standard for cloud providers, the SCI Guide, and the TOSS framework. Together, these tools aim to drive emissions reduction through interoperable, transparent, and globally applicable standards. Learn more about our people:Chris Skipper: LinkedIn | WebsiteNavveen Balani: LinkedInFind out more about the GSF:The Green Software Foundation Website Sign up to the Green Software Foundation NewsletterResources:Software Standards Working Group [00:18]GSF Directory | Projects [01:06]https://wiki.greensoftware.foundation/proj-mycelium [03:57]Software Carbon Intensity (SCI) Specification | GSF [04:18] Impact Framework [08:09]Carbon Aware SDK [09:11]Green Software Patterns [09:32]Awesome Green Software | GitHub [10:11]Software Carbon Intensity for AI [10:58]Software Carbon Intensity for Web [12:24]Events:Developer Week 2025 (July 3 · Mannheim) [13:20]Green IO Munich (July 3-4) [13:35]EVOLVE [25]: Shaping Tomorrow (July 4 · Brighton) [13:51]Grid-Aware Websites (July 6 at 7:00 pm CEST · Amsterdam) [14:03]Master JobRunr v8: A Live-Coding Webinar (July 6 · Virtual) [14:20]Blue Angle for Software / Carbon Aware Computing (July 9 at 6:30 pm CEST · Berlin) [14:30]Shaping Progress Responsibly—AI and Sustainability (July 10 at 6:00 pm CEST · Frankfurt am Main) [14:41]Green Data Center for Green Software (July 15 at 6:30 pm CEST · Hybrid · Karlsruhe) [14:52]If you enjoyed this episode then please either:Follow, rate, and review on Apple PodcastsFollow and rate on SpotifyWatch our videos on The Green Software Foundation YouTube Channel!Connect with us on Twitter, Github and LinkedIn!TRANSCRIPT BELOW:Chris Skipper: Welcome to Backstage, the behind the scenes series from Environment Variables, where we take a look at the Green Software Foundation's key initiatives and working groups. I'm the producer and host, Chris Skipper. Today we are shining a spotlight on the Green Software Foundation's Software Standards working group. 
This group plays a critical role in shaping the specifications and benchmarks that guide the development of green software. Chaired by Henry Richardson, a senior analyst at WattTime, and Navveen Balani, Managing Director and Chief Technologist for Technology Sustainable Innovation at Accenture, the group's mission is to build baseline specifications that can be used across the world, whether you're running systems in a cloud environment in Europe or on the ground in a developing country. In other words, the Software Standards Working Group is all about creating interoperable, reliable standards: tools that allow us to measure, compare, and improve the sustainability of software in a meaningful way. Some of the major projects they lead at the Green Software Foundation include the Software Carbon Intensity Specification, or SCI, which defines how to calculate the carbon emissions of software (a worked sketch of that calculation appears a little further down); the SCI for Artificial Intelligence, which extends this framework to cover the unique challenges of measuring emissions from AI workloads; the SCI for Web, which focuses on emissions from websites and front end systems; the Real-Time Energy and Carbon Standard for Cloud Providers, which aims to establish benchmarks for emissions data from cloud platforms; the SCI Guide, which helps organizations navigate energy, carbon intensity, and embodied emissions methodologies; and the Transforming Organizations for Sustainable Software (TOSS) framework, which offers a broader blueprint for integrating sustainability across business and development processes. Together these initiatives support the foundation's broader mission to reduce the total change in global carbon emissions associated with software by prioritizing abatement over offsetting, and building trust through open, transparent, and inclusive standards. Now for some recent updates from the working group. Earlier this year, the group made a big move by bringing the SCI for AI project directly into its core focus. As the world turns more and more to artificial intelligence, figuring out how to measure AI's energy use and emissions footprint is becoming a priority. That's why they've committed to developing a baseline SCI specification for AI over the next few months, drawing on insights from a recent Green AI committee workshop and collaborating closely with experts across the space. There's also growing interest in extending the SCI framework beyond carbon. In a recent meeting, the group discussed the potential for creating a software water intensity metric, a way to track water usage associated with digital infrastructure, especially data centers. While that comes with some challenges, including limited data access from cloud providers, it reflects the working group's commitment to looking at sustainability from multiple environmental angles. To help shape these priorities, they've also launched a survey across the foundation, which collected feedback from members. Should the group focus more on web and mobile technologies, which represent a huge slice of the developer ecosystem? Should they start exploring procurement and circularity? What about real-time cloud data or hardware-software integration? The survey aims to get clear answers and direct the group's resources more effectively.
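For context on the calculation referred to above, the published SCI specification expresses carbon intensity as a rate: SCI = ((E x I) + M) per R, where E is the energy consumed by the software, I is the carbon intensity of the electricity used, M is the embodied emissions of the hardware allocated to the software, and R is the functional unit (per request, per user, and so on). A minimal sketch with invented numbers, assuming kWh for energy, gCO2e/kWh for grid intensity, and gCO2e for embodied emissions:

```python
# Illustrative SCI calculation following the published formula
# SCI = ((E * I) + M) per R. All of the numbers below are invented.

def sci(energy_kwh: float, grid_intensity_g_per_kwh: float,
        embodied_g: float, functional_units: float) -> float:
    """Carbon per functional unit, in gCO2e."""
    operational_g = energy_kwh * grid_intensity_g_per_kwh
    return (operational_g + embodied_g) / functional_units

# Example: 1.2 kWh consumed while serving 10,000 requests on a 400 gCO2e/kWh
# grid, with 50 gCO2e of embodied emissions attributed to that window.
print(sci(1.2, 400, 50, 10_000))  # 0.053 gCO2e per request
```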
The group also saw new projects take shape, like the Immersion Cooling Specifications, designed to optimize cooling systems for data centers, and the Mycelium project, which is creating a standard data model to allow software and infrastructure to better talk to each other, enabling smarter energy aware decisions at runtime. So that's a brief overview of the Software Standards Working Group, a powerhouse behind the standards and specs that are quietly transforming how the world builds software. Now let's explore more of the work that the Software Standards Working Group is doing with the Software Carbon Intensity Specification, the SCI, a groundbreaking framework designed to help developers and organizations calculate, understand, and reduce the environmental impact of their software. The SCI specification offers a standardized methodology for measuring carbon intensity, empowering the tech industry to make more informed decisions in designing and deploying greener software systems. For this part of the podcast, we aimed some questions at Navveen Balani from Accenture, one of the co-chairs of the Software Standards Working Group. Navveen rather graciously provided us with some sound bites as answers. Chris Skipper: My first question for Navveen was about the SCI specification and its unique methodology. The SCI specification introduces a unique methodology for calculating carbon intensity using the factors of energy efficiency, hardware efficiency, and carbon awareness. Can you share more about how this methodology was developed and its potential to drive innovation in software development? Navveen Balani: Thank you, Chris. The Software Carbon Intensity specification was developed to provide a standardized, actionable way to measure the environmental impact of software. What makes it unique is its focus on three core levers: energy efficiency, hardware efficiency, and carbon awareness. Energy efficiency looks at how much electricity a piece of software consumes to perform a task, so writing optimized code, minimizing unnecessary processing, and improving performance all contribute. Hardware efficiency considers how effectively the software uses the infrastructure it runs on, getting more done with fewer resources. And carbon awareness adds a critical layer by factoring in when and where software runs. By understanding the carbon intensity of electricity grids, applications can shift workloads to cleaner energy regions or time windows. The methodology was shaped through deep collaboration within the Green Software Foundation, involving practitioners, academics, and industry leaders from member organizations. It was designed to be not only scientifically grounded, but also practical, measurable, and adaptable across different environments. What truly sets SCI apart and drives innovation is its focus on reduction rather than offsets. The specification emphasizes direct actions that teams can take to lower emissions, like optimizing compute usage, improving code efficiency, or adopting carbon aware scheduling. These aren't theoretical ideas. They're concrete, easy to implement practices that can be embedded into the existing development lifecycle. So SCI is more than just a carbon metric. It's a practical framework that empowers developers and organizations to build software that's efficient, high performing, and environmentally responsible by design. Chris Skipper: The SCI encourages developers to use granular, real world data where possible.
Are there any tools or technologies you'd recommend to developers and teams to better align with the SCI methodology and promote carbon aware software design? Navveen Balani: Absolutely. One of the most powerful aspects of the SCI specification is its encouragement to use real world, granular data to inform decisions, and there are already a number of tools available to help developers and teams put this into practice. A great example is the Impact Framework, which is designed to make the environmental impact of software easier to calculate and share. What's powerful about it is that it doesn't require complex setup or custom code. Developers simply define their system using a lightweight manifest file, and the framework takes care of the rest, calculating metrics like carbon emissions in a standardized, transparent way. This makes it easier for teams to align with the SCI methodology and track how the software contributes to environmental impact over time. Then there's the Carbon Aware SDK, which enables applications to make smarter decisions about when and where to run based on the carbon intensity of the electricity grid. This kind of dynamic scheduling can make a significant difference, especially at scale. There's also a growing body of Green Software Patterns available to guide design decisions. The Green Software Foundation has published a collection of these patterns, offering developers practical approaches to reduce emissions by design. In addition, cloud providers like AWS, Microsoft Azure, and Google Cloud are increasingly offering their own sustainability focused patterns and best practices, helping teams make cloud native applications more energy efficient and carbon aware. And for those looking to explore even more, the Awesome Green Software repository on GitHub is a fantastic curated list of tools, frameworks, and research. It's a great place to discover new ways to build software that's not only efficient, but also environmentally conscious. So whether you're just starting or already deep into green software practices, there's a growing ecosystem of tools and resources to support the journey. And the SCI specification provides the foundation to tie it all together. Chris Skipper: Looking ahead, what are the next steps for the Software Standards Working Group and the SCI specification? Are there plans to expand the scope or functionality of the specification to address emerging challenges in green software? Navveen Balani: Looking ahead, the Software Standards Working Group is continuing to evolve the SCI specification to keep pace with the rapidly changing software landscape. And one of the most exciting developments is the work underway on SCI for AI. While the existing SCI specification provides a solid foundation for measuring software carbon intensity, AI introduces new complexities, especially when it comes to defining what constitutes the software boundary, identifying appropriate functional units, and establishing meaningful measurements for different types of AI systems. This includes everything from classical machine learning models to generative AI and emerging AI agent-based workloads. To address these challenges, the SCI for AI initiative was launched. It's a focused effort, hosted through open workshops and collaborative working groups, to adapt and extend the SCI methodology specifically for AI systems.
The goal is to create a standardized, transparent way to measure the carbon intensity of AI workloads while remaining grounded in the same core principles of energy efficiency, hardware efficiency, and carbon awareness. Beyond AI, there are also efforts to extend the SCI framework to other domains, such as SCI for Web, which focuses on defining practical measurement boundaries and metrics for web applications and user facing systems. The broader aim is to ensure that whether you're building an AI model, a backend service, or a web-based interface, there's a consistent and actionable way to assess and reduce its environmental impact. So the SCI specification is evolving not just in scope, but in its ability to address the unique challenges of emerging technologies. It's helping to create a more unified, measurable, and responsible approach to software sustainability across the board. Chris Skipper: Thanks to Navveen for those insightful answers. Next, we have some events coming up in the next few weeks. First, starting today on July 3rd in Mannheim, we have Developer Week 2025. Get sustainability-focused talks during one of the largest software developer conferences in Europe. Next we have Green IO Munich, a conference powered by Apidays, happening on the 3rd and 4th of July. Get the latest insights from thought leaders in tech sustainability and hands-on feedback from practitioners scaling Green IT. In the UK in Brighton, we have EVOLVE [25]: Shaping Tomorrow, which is happening on July the 4th. Explore how technology can drive progress and a more sustainable digital future. Next up, on July the 8th from 7:00 to 9:00 PM CEST in Amsterdam, we have Grid-aware Websites, a new dimension in sustainable web development, hosted by the Green Web Foundation, where Fershad Irani will talk about the Green Web Foundation's latest initiative, Grid Aware. Then next week on Wednesday there's a completely virtual event, Master JobRunr v8, a live coding webinar, on July the 9th; sign up via the link below. Then also on Wednesday, the 9th of July in Berlin, we have the green coding meetup Blauer Engel for Software / Carbon Aware Computing, happening from 6:30 PM. Then on Thursday, July the 10th from 6:00 PM to 8:00 PM CEST, we have Shaping Progress Responsibly: AI and Sustainability, in Frankfurt. Then finally on Tuesday, July the 15th, we have a hybrid event hosted by Green Software Development Karlsruhe in Karlsruhe, Germany, which is entitled Green Data Center for Green Software, Green Software for Green Data Center. Sign up via the link below. So we've reached the end of this special Backstage episode on the Software Standards Working Group and the SCI project at the GSF. I hope you enjoyed the podcast. As always, all the resources and links mentioned in today's episode can be found in the show notes below. If you are a developer, engineer, policy lead, or sustainability advocate, and you want to contribute to these efforts, this group is always looking for new voices. Check out the Green Software Foundation website to find out how to join the conversation. And to listen to more episodes about green software, please visit podcast.greensoftware.foundation and we'll see you on the next episode. Bye for now. Hosted on Acast. See acast.com/privacy for more information.