Comparing the Right Units of Efficiency

The recent sharp increase in the cost of gasoline and electricity affects us every day. Because of this, it is more important than ever to seek out the most efficient technologies. But doing so is not always simple because efficiency is often measured in confusing or misleading units. For example, the efficiency of cars is advertised in miles per gallon, but measuring efficiency in gallons per 100 miles is a much better metric for comparing vehicles. Similarly, the efficiency of AI, manufacturing facilities, and HVAC systems can be misleading when measured using common metrics. For each of these, it is always best to calculate efficiency in terms of how much energy it takes to do what you want, whether it is driving a mile, manufacturing a widget, heating a building, or answering a search query.

Fuel Economy

To begin with, a common example is cars’ fuel economy. Manufacturers typically advertise a vehicle’s fuel economy in terms of how many miles it travels on a gallon of gas. However, because fuel use is inversely proportional to mpg rather than linear in it, it is difficult to quickly compare fuel (and cost) savings between cars. As you can see from the chart below, moving from a gas guzzler (17 mpg) to a slightly more efficient vehicle (20 mpg) saves about the same amount of fuel per 100 miles (roughly one gallon) as moving from good (33 mpg) to excellent fuel economy (50 mpg). This is why using fuel economy standards to target the least efficient vehicles has a much greater impact on overall fuel use and emissions than greatly improving the most efficient vehicles (although those still matter).[1]

When comparing the efficiency of cars, it is better to compare the amount of fuel used for each desired unit (distance traveled) rather than the distance traveled per unit of fuel. In addition to making comparison easier, the change of metric also changes what is considered important: miles per gallon (mpg) focuses on increased mileage, while gallons per mile (gpm) focuses on lower consumption.
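The arithmetic behind this comparison is easy to check. The following is a minimal Python sketch, not an official calculation method; the function names are made up for illustration, and the mpg values are the ones quoted above.

```python
def gallons_per_100_miles(mpg: float) -> float:
    """Convert miles per gallon into gallons consumed per 100 miles."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 100.0 / mpg


def fuel_saved_per_100_miles(old_mpg: float, new_mpg: float) -> float:
    """Gallons saved per 100 miles when switching from old_mpg to new_mpg."""
    return gallons_per_100_miles(old_mpg) - gallons_per_100_miles(new_mpg)


if __name__ == "__main__":
    # The two upgrades discussed in the text: a 3 mpg gain vs. a 17 mpg gain.
    for old, new in [(17, 20), (33, 50)]:
        saved = fuel_saved_per_100_miles(old, new)
        print(f"{old} -> {new} mpg saves {saved:.2f} gallons per 100 miles")
```

With these inputs, the 3 mpg improvement saves about 0.88 gallons per 100 miles and the 17 mpg improvement saves about 1.03 gallons, which is why the two upgrades are roughly equivalent in fuel terms.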

Large Language Models

The substantial amount of energy used by large language models (LLMs) is also a hot topic. There are many ways to calculate the efficiency of data centers, including electrical loss component (ELC, measuring losses to inefficiencies in the electrical system), mechanical load component (MLC, measuring the amount of energy used to cool the IT equipment), and power usage effectiveness (PUE, the total facility energy divided by the IT equipment energy). Still, those are meaningless if you are trying to compare the efficiency of LLMs to other tools. Similar to fuel economy, what we are really interested in is how much energy it takes to do one unit of the task the tool is supposed to do.

Recently, I came across a post by Ketan Joshi that discusses how much energy is used for each use of an LLM.[2] In his post, he estimated the amount of energy used by an LLM to multiply two 5-digit numbers. In this example, the LLM consumed 0.2864 Wh to generate an answer, which does not seem like much at all. However, if you compare that to a typical calculator (0.0008 W x 0.5 seconds of calculation = 0.00000011 Wh) or an iPhone calculator app (1 W x 0.5 seconds of calculation = 0.000139 Wh), it looks a lot worse. In this example, a regular calculator is about 2.6 million times more efficient than using an LLM to do a math problem, and an iPhone is about 2,000 times more efficient.[3] Even worse, the number calculated by the LLM may not even be correct. Joshi’s post also shows research finding that LLMs had only a 97.5% accuracy rate when multiplying two 5-digit numbers, which is not ideal for math.
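These figures follow from converting power and time into watt-hours (watts x seconds / 3,600). Here is a short Python sketch of that conversion. The variable and function names are illustrative assumptions; the power draws, the 0.5-second calculation time, and the 0.2864 Wh LLM figure are the values quoted above.

```python
SECONDS_PER_HOUR = 3600


def watt_hours(power_watts: float, seconds: float) -> float:
    """Energy in watt-hours for a device drawing power_watts for seconds."""
    return power_watts * seconds / SECONDS_PER_HOUR


# Values quoted in the text above.
LLM_WH = 0.2864                          # energy for one LLM multiplication
calculator_wh = watt_hours(0.0008, 0.5)  # typical pocket calculator
iphone_wh = watt_hours(1.0, 0.5)         # iPhone calculator app, compute only

if __name__ == "__main__":
    print(f"Calculator: {calculator_wh:.8f} Wh, "
          f"LLM uses {LLM_WH / calculator_wh:,.0f}x more")
    print(f"iPhone:     {iphone_wh:.6f} Wh, "
          f"LLM uses {LLM_WH / iphone_wh:,.0f}x more")
```

The ratios come out to roughly 2.6 million for the calculator and roughly 2,000 for the iPhone, matching the figures in the text.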

Similarly, according to the International Energy Agency, a ChatGPT query uses about 10 times as much energy as a Google search (2.9 Wh vs. 0.3 Wh). Although widely cited, this figure is dated, and the actual ratio might be closer to 60x than 10x.

LLMs should not be used to do math problems, and probably not for simple searches either, so these examples are not meant to call into doubt the general usefulness of AI. However, when considering how efficient data centers or LLMs are, it is important to compare them to other ways of doing the same thing (e.g., a math problem or a search), not to a common data center baseline.

Buildings

Misaligned efficiency metrics can also obscure how much energy buildings use. Focusing only on an efficiency rating (e.g., the SEER value of an air conditioner) makes it difficult for users to compare how much energy is actually used. For example, savings from installing a heat pump may be lower than expected if the weather is significantly different from the previous year, or one building may show different savings than another simply because it is a different size. When evaluating energy efficiency, it is important to normalize the results so that you can compare the actual impact in the unit of interest (e.g., the energy needed to heat the same home in a typical meteorological year or the energy use per square foot in different buildings). In industrial facilities, improvements in efficiency sometimes lead to lower-than-expected savings because production increases. To really compare the impact of the intervention, one must look at how the energy use per manufactured widget changes over time, not the total energy use.
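The widget case can be illustrated with a small Python sketch. All of the numbers below are hypothetical, invented only to show the calculation, and the function names are assumptions rather than any standard method.

```python
def energy_intensity(total_energy_kwh: float, units_produced: float) -> float:
    """Energy used per unit of output (kWh per widget)."""
    if units_produced <= 0:
        raise ValueError("units_produced must be positive")
    return total_energy_kwh / units_produced


def percent_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


# Hypothetical plant data: an efficiency upgrade coincides with higher output.
energy_before_kwh, widgets_before = 100_000, 50_000
energy_after_kwh, widgets_after = 96_000, 60_000

total_change = percent_change(energy_before_kwh, energy_after_kwh)
intensity_before = energy_intensity(energy_before_kwh, widgets_before)
intensity_after = energy_intensity(energy_after_kwh, widgets_after)
intensity_change = percent_change(intensity_before, intensity_after)

if __name__ == "__main__":
    print(f"Total energy use changed by {total_change:.1f}%")
    print(f"Energy per widget changed by {intensity_change:.1f}%")
```

In this made-up example, total energy use falls by only 4%, while energy per widget falls by 20%. Looking at the total alone would badly understate the effect of the upgrade.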

[1] Once you start considering electric vehicles vs. internal combustion engines, it gets even more interesting. A recent Yale study estimates that for ICE cars, only 16-25% of the energy goes to the wheels compared to 87-91% for EVs. Factoring in how the electricity is generated (e.g., through fossil fuels or renewables) adds another layer of fun complexity.

[2] The post is about a year old, so some of the information may be outdated.

[3] This only calculates the computing time of the iPhone. If you also included the time that the screen was on to enter the equation (say 5 seconds) at the same 1 W draw, it comes out to about 0.0015 Wh (1 W x 5.5 seconds), which is still roughly 190 times more efficient than an LLM.