How Accurate is MacroFactor’s Expenditure Algorithm?

Most apps rely on static calorie formulas. MacroFactor’s adaptive algorithm learns from your data to deliver far more accurate targets. Here’s how it works, how we measure its accuracy, and why it consistently outperforms other methods.

MacroFactor’s nutrition coaching algorithm is at the very heart of what makes the app so effective and unique. In this article, we’ll explain how it works, how we evaluate its performance, how accurate it truly is, and how large of a benefit it provides relative to other approaches.

But before we get too deep in the weeds, we first need to cover a bit of background: how the algorithm works at a high level, and how it differs from the approach taken by most other nutrition apps.

The typical approach of estimating energy needs

Virtually all nutrition apps provide recommendations on the basis of the CICO principle: long-term weight change is principally determined by energy balance. If you consume more energy (“Calories In”: CI) than you expend (“Calories Out”: CO) over time, you’ll be in a state of positive energy balance, and therefore you’ll gain weight. Conversely, if you consume less energy than you expend, you’ll be in a state of negative energy balance, and therefore you’ll lose weight.

If you’re diligent about logging your food, it’s not too hard to quantify “Calories In” with reasonable accuracy (even if your food logging isn’t perfect, or some foods have nutrition labeling errors). The tricky part of the CICO equation is the “Calories Out” term.

To estimate “Calories Out” (i.e., your total daily energy expenditure, or TDEE), the most popular approach is to use a calculator or equation that estimates your energy expenditure on the basis of some combination of your height, weight, age, sex, body composition, and activity levels. This process will typically provide most people with a reasonable ballpark estimate of their energy expenditure, but you also shouldn’t expect this value to be extremely accurate. It’ll be spot-on for some people, but individual errors of >500 Calories are quite common, and errors exceeding 1,000 Calories per day aren’t unheard of.

This graphic, and the simulation used to generate it, were originally published in this article.

An alternate approach is to estimate your energy expenditure using a wearable device (like a smartwatch), but research on wearables suggests that they’re similarly inaccurate, with a systematic review reporting that in free-living settings, wearable devices under- or over-estimate energy expenditure by more than 10% a whopping 82% of the time.

But, as mentioned above, estimating energy expenditure using a TDEE equation is by far the most common approach used by other nutrition apps, so that’s what we’ll use as a point of comparison for the rest of this article.

Once an app estimates your energy expenditure, it will typically recommend an energy intake target to you based on your goals. For example, using the typical energy density assumption of 3,500 Calories per pound (or 7,700 Calories per kilo), if you have a goal of losing a pound per week, your recommended energy intake target would be 500 Calories per day below your estimated TDEE. So, if your estimated energy expenditure is 2,500 Calories per day, the app would recommend an energy intake target of 2,000 Calories per day.
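The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the standard calculation, not any particular app's code; the function name and parameters are hypothetical, and the 3,500 Calories per pound figure is the assumption stated above.

```python
KCAL_PER_LB = 3500  # typical energy density assumption from the text


def intake_target(tdee_kcal, goal_lb_per_week):
    """Daily intake target for a goal rate of weight change.

    goal_lb_per_week is negative for weight loss and positive for weight gain.
    """
    daily_adjustment = goal_lb_per_week * KCAL_PER_LB / 7
    return tdee_kcal + daily_adjustment


# Estimated TDEE of 2,500 Calories, goal of losing one pound per week
print(intake_target(2500, -1))  # 2000.0
```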

Some apps will explicitly show and tell you about these calculations, and some will keep them obscured, but this is the standard process that’s used almost universally. (And if this isn’t something a nutrition app does for you, these are the basic calculations most dieters will do themselves. This is also the basic process employed by LLMs and most nutrition coaches if you ask them for initial energy intake targets.)

From there, most apps don’t update or adjust your energy intake targets over time, even if your energy intake target turns out to be way too high or too low to help you reach your goals. And, of the handful of apps that do recommend updates, the most common approach is to just increase or decrease calorie targets proportionally with body weight.

The MacroFactor approach

When you first start using MacroFactor, the app will initially estimate your energy expenditure and provide energy intake targets using the same process described above, using a TDEE equation. The major difference between MacroFactor and other apps is that, as you log your food and weight over time, MacroFactor will continually update its estimate of your energy expenditure, and therefore recommend updates to your nutrition targets to reflect expenditure changes.

The expenditure algorithm itself is fairly complex, but here’s a simplified illustration of how it works:

When you start using MacroFactor, the app may initially estimate that your energy expenditure is 3,000 Calories per day. You have a goal to lose a pound per week, so the app recommends that you aim to consume 2,500 Calories per day.

However, as you log your weight and nutrition, you’re consistently eating 2,500 Calories per day, but you’re not losing any weight. This strongly suggests that your initial energy expenditure estimate (and, consequently, your initial caloric intake targets) was too high: if you’re maintaining your weight while eating 2,500 Calories per day, your energy expenditure is probably closer to 2,500 Calories per day than 3,000. So, your estimated expenditure will adjust downward, and your energy intake targets will decrease to reflect this new information, until your energy intake target does actually result in your desired rate of weight loss.
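The reasoning in this illustration boils down to a simple energy-balance back-calculation, sketched below. To be clear, this is not MacroFactor's actual expenditure algorithm (which is more complex, as noted above); it is a naive estimate that assumes 3,500 Calories per pound of weight change, and the function name is hypothetical.

```python
KCAL_PER_LB = 3500  # assumed energy content of a pound of weight change


def implied_expenditure(avg_intake_kcal, weight_change_lb, days):
    """Naive daily expenditure implied by average intake and observed weight change."""
    # Positive balance = surplus (weight gain), negative balance = deficit (weight loss)
    daily_balance = weight_change_lb * KCAL_PER_LB / days
    return avg_intake_kcal - daily_balance


# Eating 2,500 Calories per day with no weight change over four weeks
print(implied_expenditure(2500, 0.0, 28))   # 2500.0
# Eating 2,500 Calories per day and losing 4lb over four weeks
print(implied_expenditure(2500, -4.0, 28))  # 3000.0
```

In the first case, the data suggest expenditure is close to 2,500 Calories per day rather than the initial estimate of 3,000, which is why the estimate and the intake target would both move downward.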

In other words, unlike other apps, MacroFactor is able to identify and correct errors in its nutrition recommendations. If your energy intake targets are too high or too low to help you reach your goals at your desired rate, MacroFactor’s algorithms will make adjustments to get you back on track. And, crucially, these updates are continuous as your goals, lifestyle, and energy expenditure change over time.

How we evaluate our algorithms

At MacroFactor, we’re big fans of quantification. We don’t just want to be able to say our algorithms work well in theory; we want to be able to measure how well they actually work in practice.

Accuracy versus predictive validity

The default method of quantifying the accuracy of something like MacroFactor’s expenditure algorithm is to run a validation study. For example, we could have 20 people live in a metabolic chamber for a few months while researchers fastidiously monitor their weight and provide them with meals prepared in the lab’s kitchen with perfectly accurate nutrition information. Then, we’d log all of the data in MacroFactor and compare how closely MacroFactor’s estimated expenditure values matched the values derived from the metabolic chamber. While this would be the gold standard approach for quantifying the accuracy of MacroFactor’s algorithms, it would be prohibitively expensive, and we actually care way more about a close cousin of accuracy: predictive validity. Predictive validity tells you how well your predictions match subsequent observations.

MacroFactor’s expenditure algorithm is a prediction engine. If MacroFactor estimates that you’re burning 3,000 Calories per day, and you also consume 3,000 Calories per day, the app is tacitly predicting that your weight will remain stable. If you instead gain or lose weight, that suggests the prediction was incorrect to some degree. So, we can calculate the difference between your predicted rate of weight change (based on your logged energy intake and your estimated expenditure) and your actual rate of weight change to quantify the predictive validity of our algorithms.

We care the most about predictive validity with real-world data for two main reasons.

The first reason is that predictive validity tells us how well our algorithms actually work for our users. In the real world, people don’t always log their food with perfect accuracy or weigh themselves every day under perfectly standardized conditions. Any estimation of accuracy under perfectly controlled laboratory conditions would almost necessarily overestimate how well the algorithms actually work for real users — it would just be an unrealistic vanity metric. What actually matters to us is knowing how well the algorithms can deal with the messiness of the real world.1

The second reason is related to the first: When predictive validity and raw accuracy differ, optimizing for predictive validity means optimizing for usefulness, which is what we ultimately care about. Optimizing for accuracy at the expense of predictive validity would actually lead to worse recommendations for our users.

That may not sound particularly intuitive, but it’s a simple principle to illustrate.

Let’s assume that we can know with perfect certainty that you burn exactly 3,000 Calories per day. So, if you want to maintain your weight, you should aim to consume 3,000 Calories per day. Simple enough.

But, what if you don’t log your energy intake with absolutely perfect accuracy? Maybe you eat a couple of foods every day that under-label their Calorie content. Maybe you eyeball portion sizes for some foods, or forget to log some sauces or condiments. You do a pretty good job of logging your food, but you ultimately end up typically consuming 3,200 Calories when you think you’re eating 3,000. So, you think you’re eating the right amount to maintain your weight, but the number on the scale starts gradually ticking up.

If you just care about optimizing for accuracy, there’s nothing that needs to change about the recommendation: You should aim to eat 3,000 Calories per day. Since we know you’re burning 3,000 Calories per day, that is the correct recommendation. But, practically speaking, it’s a recommendation that’s resulting in an undesired outcome (gaining weight instead of maintaining weight). The only way it would lead to your desired outcome is if you became utterly obsessive about logging your food with 100% perfect accuracy, which is almost never feasible (at least long-term) in the real world.1

Since MacroFactor primarily cares about optimizing for predictive validity, we’d see that you’re slowly gaining weight when you’re logging 3,000 Calories per day, which would suggest that your energy intake target should be less than 3,000 Calories per day to help you achieve your goal of weight maintenance. So, the app would settle on an estimate that you’re burning around 2,800 Calories per day, and therefore, it would recommend that you should aim to consume 2,800 Calories per day to achieve your goal of weight maintenance.

You tend to underestimate your intake by 200 Calories per day (in this illustration), so when you aim to eat 2,800 Calories per day, you’ll actually end up eating around 3,000 Calories per day and maintaining your weight. In other words, the recommendation to consume 2,800 Calories per day is technically a less accurate recommendation, but it’s the recommendation that results in the desired outcome while simultaneously maximizing predictive validity.

In essence, accuracy and predictive validity are identical (in this context) if users always log their energy intake with absolutely zero error. But, if there are any errors in measuring energy intake, more accurate nutrition recommendations would necessarily lead to worse outcomes than recommendations that reflect and account for those errors.
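The 3,000 versus 2,800 Calorie illustration can be written out as a tiny sketch. The function names are hypothetical, and a constant logging bias is a simplifying assumption made for the example; real logging errors vary from day to day.

```python
def maintenance_target_as_logged(true_tdee_kcal, logging_bias_kcal):
    """Logged-intake target that actually produces weight maintenance.

    logging_bias_kcal = actual intake minus logged intake
    (positive when intake is under-reported).
    """
    return true_tdee_kcal - logging_bias_kcal


def actual_intake(logged_kcal, logging_bias_kcal):
    """What is really eaten when a given amount is logged."""
    return logged_kcal + logging_bias_kcal


target = maintenance_target_as_logged(3000, 200)
print(target)                      # 2800
print(actual_intake(target, 200))  # 3000
```

The "accurate" target of 3,000 Calories would lead to an actual intake of 3,200, while the "less accurate" target of 2,800 lands actual intake on the true expenditure.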

For the sake of simplicity, I’ll be using the term “accuracy” for the rest of this article. But, just know that “accuracy” refers to the accuracy of MacroFactor’s implicit predictions related to prospective rates of weight change (i.e., predictive validity).

Assessing algorithmic performance

There are four key metrics we pay attention to when evaluating how well our algorithms perform:

  1. Absolute monthly weight change prediction error: to what extent do users’ actual rates of weight change differ from their predicted rates of weight change?
    • We focus on absolute values (i.e., making all values positive) because the simple average is (essentially) zero
  2. The correlation between actual and predicted rates of weight change
  3. Changes in cumulative accuracy over time
  4. The ability of the algorithms to avoid large errors

As we go along, most of these metrics will be fairly intuitive, but I think it’s worth being explicit about how predicted rates of weight change are calculated.

Over a 30-day period, we calculate MacroFactor’s cumulative estimate of your energy expenditure and then subtract your cumulative energy intake. Finally, we divide that value by 3,500, which is the implied energy content of each pound of weight change.2

Here’s an example: At the start of a 30-day period, MacroFactor estimated that your expenditure was 3,100 Calories per day, and that estimate decreased linearly to 3,000 over those 30 days. So, MacroFactor is tacitly predicting that you burned 91,500 Calories during that 30-day period. Over the same 30 days, you consumed 84,500 Calories. So, MacroFactor is tacitly predicting that you expended 7,000 more Calories than you consumed, meaning that you should have lost 7,000/3,500 = 2lb. If you actually lost 2.3lb over that same 30-day period, the weight change prediction error is: -2.3 - (-2) = -0.3lb. In other words, your weight is 0.3lb lower at the end of the month than MacroFactor predicted, given your energy intake. Since we primarily focus on absolute values, this corresponds to an absolute monthly error of 0.3lb.
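For readers who prefer code to prose, here is the same worked example as a short Python sketch. The function name is hypothetical, the daily values are reconstructed from the totals in the example, and weight change is signed so that weight loss is negative.

```python
KCAL_PER_LB = 3500  # implied energy content of each pound of weight change


def predicted_weight_change_lb(daily_expenditure, daily_intake):
    """Predicted weight change (lb) from cumulative intake minus cumulative expenditure."""
    balance = sum(daily_intake) - sum(daily_expenditure)
    return balance / KCAL_PER_LB


days = 30
# Expenditure estimate falling linearly from 3,100 to 3,000 (91,500 Calories in total)
expenditure = [3100 - 100 * i / (days - 1) for i in range(days)]
# 84,500 Calories consumed in total, spread evenly for simplicity
intake = [84500 / days] * days

predicted = predicted_weight_change_lb(expenditure, intake)
actual = -2.3  # observed weight change in lb
error = actual - predicted

print(round(predicted, 2))   # -2.0
print(round(error, 2))       # -0.3
print(round(abs(error), 2))  # 0.3
```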

All of the analyses performed below use data from new users of the app who participated in our 2025 transformation challenge. This is a useful dataset for this article for a few reasons. First, all of the individual datasets cover the same length of time (100 days), so there are no issues related to sample sizes varying over time. Second, long-time users already understand how well the algorithms work. By focusing on new users, we can show how well you can expect the algorithms to work for you if you’re not already using MacroFactor. Third, new user data helps to clearly demonstrate how long it takes for the algorithms to achieve peak performance.

For most of the subsequent analyses, we’ll compare MacroFactor’s approach to the common default approach of estimating energy expenditure using a TDEE equation or calculator. This also happens to be the approach used by most other food logging apps. This should help contextualize the performance metrics.

For these comparisons, expenditure estimates using the “TDEE calculator” approach are re-calculated on a daily basis (rather than using a single static value for the entire 100 days) to scale proportionally with weight gain or loss. Without this daily recalculation, the relative inaccuracy of this approach could just be attributed to one-time TDEE estimates becoming outdated (e.g., if your calculated TDEE was perfectly accurate today, that discrete value would probably be considerably too high two months from now if you lost 20lb), but I want to show the potential for error even if you used a TDEE calculator every single day to estimate your energy needs.

Finally, all of the metrics below use the data from users with scant evidence of partial logging. We emphasize to our users that they don’t need to log all of their food with perfect accuracy, that it’s totally fine to just make a good-faith estimate for meals that are tough to log, and that they don’t even need to log absolutely every morsel of food that they consume. Partial logging is the single cardinal sin we ask users to avoid, there are plenty of easy ways to avoid it, and we even try to flag partially logged days for users when they do their weekly check-in. So, we feel justified excluding this data from the analyses presented below.

To estimate partial logging frequency, we tagged days where logged energy intake was <50% of the values from other logged days within the same week. So, for instance, if a user was eating around 2,000 Calories most days, but they had a day where they only logged 600 Calories, that would be flagged as a day that was probably partially logged. Days where users logged zero Calories were not flagged, since these were likely to be days when users were intentionally fasting.
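A rule like this can be sketched as follows. This is an illustrative reading of the description, not the code used for the analysis: the function name is hypothetical, and comparing each day against the mean of the other non-zero days in the week is an assumption, since the exact reference value isn't specified.

```python
def flag_partial_days(week_kcal, threshold=0.5):
    """Return indices of days in one week that look partially logged.

    week_kcal: logged Calories for each day (0 means nothing was logged).
    """
    flagged = []
    for i, kcal in enumerate(week_kcal):
        if kcal == 0:
            continue  # zero-Calorie days are treated as intentional fasts
        others = [k for j, k in enumerate(week_kcal) if j != i and k > 0]
        if not others:
            continue  # nothing to compare against
        reference = sum(others) / len(others)  # assumption: mean of other logged days
        if kcal < threshold * reference:
            flagged.append(i)
    return flagged


# Around 2,000 Calories most days, one 600-Calorie day, one fasting day
print(flag_partial_days([2000, 2100, 600, 1950, 0, 2050, 2000]))  # [2]
```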

Obviously, this is an imperfect metric. There were almost certainly partially logged days that slipped through (e.g., if someone logged breakfast and lunch but didn’t log dinner, they may have still logged ~60–70% as many Calories as were logged on surrounding days), and some accurately logged days that were incorrectly tagged as being partially logged (e.g., if you have a stomach bug, you may truly have a day or two where you only consume 500 Calories). But, we’re dealing with enough data that a handful of misclassified users will have virtually no effect on the aggregate metrics. For a user’s data to be included in the analyses presented below, they could have at most five days of nutrition data that were flagged for potential partial logging. Ultimately, less than 10% of users were excluded from these analyses due to evidence of consistent partial logging, so the analyses presented below will certainly still provide a fair assessment of algorithmic performance for typical users.

After excluding the handful of users with evidence of consistent partial logging, we were left with data from 748 challenge participants.

Update: November 2025

Following the introduction of additional expenditure modifiers, the expenditure algorithm is now about 7% more accurate in the short term, and approximately 20% more accurate in the long term. You can read more about these improvements here.

Monthly weight change prediction errors

As mentioned above, when you first start using MacroFactor, it will initially estimate your energy expenditure and Caloric needs using a process similar to any other app, by applying a series of predictive equations that account for factors like height, weight, age, sex, body composition, and activity levels. These initial recommendations are not “the algorithm.” Rather, the algorithms kick in once you start logging your weight and nutrition data, and serve to refine MacroFactor’s understanding of your unique energy requirements.

So, as an initial test of MacroFactor’s algorithms, we can quantify how much more accurate MacroFactor’s recommendations become over time as personalized algorithmic recommendations replace recommendations derived from population-based equations.

In the graph above, the x-coordinate is the day at the beginning of each 30-day period. So, day 10 is the median weight change prediction error for days 10-39, day 20 is the median weight change prediction error for days 20-49, etc. The median prediction error decreases from approximately 1.9lb for the first 30-day period, to approximately 1.15lb from day 24 onward. So, it would appear that MacroFactor’s algorithms are roughly 1.65 times as accurate as population-based TDEE prediction equations (i.e., error rates decrease by ~40%) when we compare the typical errors observed during the first 30-day period to the errors observed after people have been using the app for at least 3-4 weeks.

But, appearances can be deceiving. In fact, accuracy more than doubles. MacroFactor’s algorithms begin updating their estimates of your energy expenditure on the third day after you start using the app. So, those estimates are gradually improving for basically the entirety of the initial 30-day period. Without those improvements, prediction errors for the initial 30-day period would be considerably larger. We can see this when we compare MacroFactor’s accuracy to the accuracy of weight change predictions using the typical approach of just plugging your height, weight, age, sex, body composition, and activity levels into a TDEE formula or calculator (I’ll just refer to this as the “formula-based approach” for the rest of this article).

The formula-based approach had a median prediction error of about 2.6lb for the first 30 days. Rather than decreasing, these errors increased to a median prediction error of about 3.1lb for the last 30 days (likely due to metabolic adaptation, which TDEE prediction equations don’t account for).

So, after using MacroFactor for 3–4 weeks, the app’s nutrition recommendations are about 120–170% more accurate than the recommendations provided by a standard TDEE equation (i.e., typical errors are 55–63% smaller with MacroFactor).
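Those two ways of expressing the improvement follow directly from the median errors quoted above. The sketch below shows the arithmetic; the function names are hypothetical, and "more accurate" is taken here to mean the ratio of the two errors minus one, which is an assumption about how the percentages were derived.

```python
def pct_smaller_error(new_error, old_error):
    """How much smaller the new error is, as a percentage of the old error."""
    return (1 - new_error / old_error) * 100


def pct_more_accurate(new_error, old_error):
    """Accuracy gain, treating accuracy as inversely proportional to error."""
    return (old_error / new_error - 1) * 100


macrofactor_lb = 1.15  # median 30-day error after 3-4 weeks
for formula_lb in (2.6, 3.1):  # formula-based medians: first and last 30 days
    print(round(pct_smaller_error(macrofactor_lb, formula_lb)),
          round(pct_more_accurate(macrofactor_lb, formula_lb)))
# 56 126
# 63 170
```

The results land close to the rounded ranges in the text (errors roughly 55–63% smaller, accuracy roughly 120–170% higher).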

Based on an assumed energy density of 3,500 Calories per pound gained or lost, we can convert these weight change prediction errors into caloric terms (since, after all, we’re discussing the accuracy of MacroFactor’s algorithms, which estimate energy expenditure in terms of Calories).

After 3–4 weeks of consistent use, MacroFactor’s energy expenditure estimation errors typically fall within the range of 60–240 Calories (median = 135 Calories), whereas formula-based estimation errors typically fall within the range of 155–590 Calories (median = 335).3 Furthermore, relatively large errors in MacroFactor (errors in the 75th percentile) are considerably smaller than the median errors of formula-based TDEE estimates, and relatively small errors for formula-based TDEE estimates (errors in the 25th percentile) are still larger than the median errors for MacroFactor users.
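The conversion from a 30-day weight change error to a daily Calorie error is a one-liner. This sketch assumes the 3,500 Calories per pound figure used throughout, and the function name is hypothetical.

```python
KCAL_PER_LB = 3500  # assumed energy density of weight change


def lb_error_to_kcal_per_day(error_lb, days=30):
    """Convert a weight change prediction error over `days` into Calories per day."""
    return error_lb * KCAL_PER_LB / days


# Median 30-day error of about 1.15lb
print(round(lb_error_to_kcal_per_day(1.15)))  # 134
```

That comes out to roughly 134 Calories per day, in line with the median of about 135 Calories reported above.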

Strength of associations

As another method of quantifying how well MacroFactor’s algorithms estimate users’ energy needs, we can calculate the strength of the association between predicted and observed rates of weight change over time.

A correlation coefficient (Pearson’s r) tells you the strength and direction of an association. A positive r-value denotes a direct association (meaning that as one value increases, the other value also increases), whereas a negative r-value denotes an inverse association (meaning that as one value increases, the other value decreases). Furthermore, values closer to 0 denote a weak association, whereas values closer to 1 or -1 denote a strong association.

Standard interpretations for positive r-values
r-value      Interpretation
0.00–0.19    very weak correlation
0.20–0.39    weak correlation
0.40–0.59    moderate correlation
0.60–0.79    strong correlation
0.80–1.00    very strong correlation
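As a concrete illustration, here is a minimal pure-Python sketch that computes Pearson's r and labels it using the bands in the table above. The function names are hypothetical, and the predicted and observed values are made-up numbers for demonstration only, not data from the analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def interpret(r):
    """Label a positive r-value using the standard bands from the table."""
    bands = [(0.80, "very strong"), (0.60, "strong"), (0.40, "moderate"),
             (0.20, "weak"), (0.00, "very weak")]
    for lower, label in bands:
        if r >= lower:
            return label + " correlation"
    raise ValueError("bands only cover positive r-values")


# Made-up predicted vs. observed monthly weight changes (lb), for illustration
predicted = [-2.0, -1.0, 0.0, 1.0, 2.0]
observed = [-2.3, -0.8, 0.1, 0.9, 2.4]

r = pearson_r(predicted, observed)
print(round(r, 3), interpret(r))
print(round(r ** 2, 3))  # r squared: share of variance explained
```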

Since MacroFactor’s algorithms aim to predict users’ rates of weight change based on their energy intake, strong performance would coincide with a large, positive r-value. Smaller r-values would suggest that we were doing a poor job of estimating energy needs, and negative r-values would mean we were doing an extremely poor job of estimating energy needs.

Starting with our point of comparison, formula-based TDEE estimations were able to predict monthly rates of weight change with an r-value of 0.595. This is right in between a “moderate” and a “strong” correlation. Not too bad!

However, MacroFactor’s weight change predictions are far more strongly associated with users’ observed rates of weight change, with an r-value of 0.869. This is comfortably a “very strong” correlation. This also means that MacroFactor’s equations explain more than twice as much variance in observed rates of weight change compared to predictions from TDEE formulas (r² = 0.755 for MacroFactor vs. r² = 0.354 for TDEE formulas).

Of course, there’s plenty of noise in monthly data, not to mention that each user has 70 individual data points represented in both of the graphs above (meaning that each data point isn’t independent of all other data points, which is technically not advised for simple correlation analysis). So, let’s take a look at how well MacroFactor and TDEE formulas were able to predict users’ total weight change over the 100-day challenge period, with one data point per user.

The correlation coefficients increase for both (r = 0.94 for MacroFactor vs. r = 0.61 for TDEE formulas), but now MacroFactor’s predictions explain nearly 90% of the variance in observed weight change, while predictions from TDEE formulas still explain less than 40% of the variance.

Cumulative accuracy

Since we primarily focus on predictive validity when evaluating the performance of MacroFactor’s algorithms, there’s something of an upper limit on assessed performance over any finite time scale, because short-term weight changes don’t always reflect changes in energy status.

To illustrate, let’s just assume that MacroFactor’s algorithms are literally perfect (I’m not saying that they are — this is purely for illustrative purposes): they perfectly track with your energy expenditure, so they can always predict the size of your cumulative energy surplus or deficit over any time scale with zero error. Over relatively long time scales, your cumulative energy surplus or deficit is the primary determinant of weight change, but in the short term, your weight can fluctuate for reasons unrelated to energy balance. So, if you assessed how well this perfect algorithm could predict weight changes over any finite period of time, it would never get a “perfect score.”

For example, if you assess the performance of this perfect algorithm over a month, some people are going to start or end that month weighing a couple of pounds more or less than their “true” weight simply due to dehydration, low glycogen stores, bloating, constipation, etc. Observed rates of weight change will differ — at least to some degree — from theoretically perfect predictions pegged to energy balance, just due to the inherent noise in weight data.

However, this apparent error due to the noisiness of weight data should be similar on an absolute basis over basically all finite time scales. In other words, the number on the scale may change by a pound or two more than it theoretically “should” due to things like water weight, bloating, glycogen stores, etc., but these apparent errors should be similar in magnitude (on average) over a week, a month, a year, or a decade. So, when you instead calculate per day error rates, they should be expected to decrease over time. A pound of “error” just resulting from the noisiness of weight data is 0.14lb of prediction error per day over the course of a week, 0.03lb of prediction error per day over the course of a month, and 0.003lb of prediction error per day over the course of a year.
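The dilution of a fixed amount of scale noise over longer windows is easy to verify. In this small sketch, the function name is hypothetical and the one pound of noise is the illustrative figure from the paragraph above.

```python
def per_day_error(noise_lb, days):
    """Per-day prediction error produced by a fixed amount of scale noise."""
    return noise_lb / days


for label, days in [("week", 7), ("month", 30), ("year", 365)]:
    print(label, round(per_day_error(1.0, days), 3))
# week 0.143
# month 0.033
# year 0.003
```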

So, by assessing the behavior of cumulative per-day error rates, we can gain confidence that the algorithms are actually doing a good job of helping people achieve their desired rate of weight change over time, and we can cut through some of the inherent noisiness of that weight data.

Overall, this is the type of pattern we like to see: per-day error consistently decreasing over time. By the end of the 100-day challenge period, the median weight change prediction error per day was only 0.031lb, with an interquartile range (IQR) of 0.014–0.054lb per day. In other words, if you set a goal of gaining or losing some amount of weight, and you perfectly followed MacroFactor’s recommendations for 100 days, you should generally expect your final weight to be within about 3.1lb of your goal.

By comparison, if you continuously used a TDEE formula to estimate your energy expenditure (the process used by most other nutrition apps), instead of the per-day error rate gradually decreasing, the median per-day error rate essentially plateaus at about 0.085lb, with an IQR of 0.042–0.151lb per day. 

In caloric terms, this suggests that the typical error over the course of a user’s first 100 days with MacroFactor is about 4.4% of TDEE, or about 110 Calories per day. In other words, if your average expenditure for the first 100 days you use MacroFactor is estimated to be 2,500 Calories, you can be fairly confident that your actual average expenditure over that time span was between 2,390 and 2,610 Calories per day. Furthermore, the IQR for Caloric estimation errors covers the span of 50–185 Calories per day, or 1.9–7.8% of TDEE. So, cumulative errors smaller than 50 Calories per day are more common than cumulative errors exceeding 200 Calories per day.

If you instead continuously used a TDEE formula, you could expect a median error of approximately 300 Calories per day (around 12% of TDEE), with an IQR of 135–525 Calories per day (5.4–21.1% of TDEE). In other words, there’s a less than 1-in-4 chance that your Calorie targets would be as accurate as those of a typical MacroFactor user, and a greater than 1-in-4 chance that your Calorie targets would be more than 500 Calories too high or too low per day.

It’s also worth mentioning that around a quarter of the cumulative error for MacroFactor users during their first 100 days of using the app comes from errors in their initial expenditure estimate (i.e., before the algorithms take over and converge on highly personalized recommendations after 3-4 weeks). Of the median error of 3.1lb over the first 100 days, approximately 0.8lb can be directly attributed to error in the initial estimates. Once the algorithms fully take over, the cumulative error is approximately 0.023lb per day, equivalent to a median Caloric estimation error of approximately 80 Calories per day, or 3.25% of TDEE. That is roughly the degree of ongoing accuracy that users can expect after their first 3-4 weeks of consistently using the app.

Within-individual comparisons of accuracy

All of the comparisons above paint MacroFactor’s algorithms in a very favorable light, but they actually still undersell the accuracy of MacroFactor’s algorithms for each individual user.

When you just compare the medians and interquartile ranges seen in the graphs and text above, it might look like MacroFactor should be more accurate for around 75–80% of people, with TDEE formulas being more accurate for around 20–25%. However, in actuality, MacroFactor is more accurate for essentially everyone.

Instead of just looking at the distribution of prediction errors for the entire cohort of users, we can instead make within-individual comparisons: for each user, how large was the absolute prediction error with MacroFactor compared with the absolute prediction error from a TDEE formula?
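A within-individual comparison of this kind can be sketched as below. The function name is hypothetical and the four users' errors are invented purely to show the mechanics; they are not drawn from the challenge dataset.

```python
def within_individual_comparison(method_a_errors, method_b_errors):
    """Compare absolute prediction errors user by user.

    Returns (share of users for whom method A has the smaller absolute error,
             mean error reduction in lb among those users).
    """
    diffs = [abs(b) - abs(a) for a, b in zip(method_a_errors, method_b_errors)]
    wins = [d for d in diffs if d > 0]
    share = len(wins) / len(diffs)
    mean_gain = sum(wins) / len(wins) if wins else 0.0
    return share, mean_gain


# Hypothetical 100-day prediction errors (lb) for four users
adaptive = [0.5, -1.2, 2.0, 0.3]
formula = [4.0, -6.5, 1.6, 8.3]

share, gain = within_individual_comparison(adaptive, formula)
print(share, round(gain, 2))  # 0.75 5.6
```

Pairing errors by user, rather than comparing the two overall distributions, is what reveals how often one method beats the other for the same person.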

Overall, prediction errors were smaller with MacroFactor for 94.1% of users, whereas prediction errors would have been smaller with a TDEE formula for just 5.9% of users. However, even that split still undersells the accuracy of MacroFactor.

The 5.9% of users who would have had nominally lower prediction errors from TDEE formulas were just users that would have gotten very accurate recommendations with both methods. In this small group of people, the average difference in 100-day prediction errors was just 0.66lb in favor of TDEE formulas.

On the flip side, the 94.1% of users who got more accurate recommendations from MacroFactor’s algorithms typically got way more accurate recommendations from MacroFactor’s algorithms, compared with the recommendations that would have come from a TDEE formula. For these users, the average 100-day prediction error with MacroFactor was 7.48lb smaller than the average 100-day prediction error from TDEE formulas.

In other words, there’s a ~94% chance that MacroFactor’s algorithms will predict your energy needs more accurately than a TDEE formula would over a 100-day period. Furthermore, if you happen to be in the ~6% that would get slightly more accurate recommendations from the TDEE formula, the difference between the two would be practically imperceptible (a difference in predicted weight change error of 0.66lb over 100 days is equivalent to a difference in Calorie recommendations of about 23 Calories per day). In essence, the only time that TDEE formulas “win out” is when they just so happen to produce recommendations that are virtually identical to MacroFactor’s recommendations.

But, for the ~94% of people who got more accurate recommendations from MacroFactor, the benefits were typically quite large, with an average difference in weight change prediction errors of 7.48lb over 100 days. That means that, for the vast majority of people, MacroFactor’s nutrition recommendations are more accurate than the recommendations of a TDEE formula by about 262 Calories per day, on average. Furthermore, the larger the difference between MacroFactor’s estimate of your energy needs and a TDEE formula’s estimate of your energy needs, the more likely it is that MacroFactor’s estimate is considerably more accurate.

All in all, MacroFactor’s algorithms estimate your energy needs nearly three times more accurately than estimates derived from the types of TDEE formulas that other apps rely on exclusively.

Avoiding large errors

The final thing we assess when evaluating the algorithms is their ability to ensure that everyone can be confident in their recommendations. It’s all well and good for most users to receive accurate recommendations from the app, but we’d be extremely disappointed if a sizable minority was still receiving bad recommendations.

Our metric here is simple: How frequently do the observed weight change prediction errors imply that the app’s energy expenditure estimate is off by 10% or more?
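To make the metric concrete, here’s one way an implied expenditure error could be backed out of a weight change prediction error. This is a rough sketch rather than MacroFactor’s actual calculation: it assumes roughly 3,500 Calories per pound of weight change and expresses the error relative to the estimated TDEE. The function name and the example user’s numbers are hypothetical.

```python
KCAL_PER_LB = 3500  # assumed energy equivalent of one pound of weight change


def implied_expenditure_error_pct(predicted_change_lb, observed_change_lb, days, estimated_tdee):
    """Percent error in the expenditure estimate implied by a weight-prediction miss."""
    weight_error_lb = observed_change_lb - predicted_change_lb
    daily_kcal_error = weight_error_lb * KCAL_PER_LB / days
    return abs(daily_kcal_error) / estimated_tdee * 100


# Hypothetical user: predicted to lose 4.0 lb over 30 days, actually lost 3.0 lb,
# with an estimated expenditure of 2,500 Calories per day.
error_pct = implied_expenditure_error_pct(-4.0, -3.0, 30, 2500)

print(f"{error_pct:.1f}%")  # 4.7%
print(error_pct >= 10)      # False
```

In this made-up case, a one-pound miss over 30 days implies an expenditure estimate that is off by a little under 5%, comfortably inside the 10% benchmark.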

Ten percent is our benchmark for one simple reason: MacroFactor users often want to know if they should put more trust in the app’s estimate of energy needs versus their smartwatch (or other wearable device).

As we’ve covered previously, wearables are notoriously bad at estimating energy expenditure. That’s a near-unanimous finding in the scientific literature, with a systematic review finding that wearables under- or over-estimate energy expenditure in free-living humans by at least 10% in 82% of the studies that have tackled the question.

So, as a motivating metric to aim for, we want to be able to flip those numbers: Instead of generating errors larger than 10% at least 82% of the time, we want MacroFactor to generate errors smaller than 10% at least 82% of the time.

We’re doing pretty well on that front.

When we assess errors on a rolling 30-day basis, implied expenditure estimation errors exceeding 20% of TDEE are very rare (occurring in less than 5% of user-months following the initial 3-4 weeks of calibration). A little over three-quarters of implied monthly errors are smaller than 10% (75.9%, to be exact) after day 30, so we do still fall slightly short of our 82% goal. Furthermore, a little over 45% of implied monthly errors are smaller than 5% of TDEE.

However, when we look at periods of time exceeding a month, we do reach our goal of producing errors smaller than 10% more often than wearables produce errors larger than 10%. Over 100 days, errors exceeding 20% occur less than 2% of the time, around 55% of users have cumulative errors below 5%, and nearly 84% of users have errors below 10% of TDEE.
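Tallying results like these is straightforward once each user (or user-month) has an implied percent error. The sketch below uses a small, made-up list of errors purely to illustrate the bookkeeping. The values and the helper name are hypothetical and are not drawn from MacroFactor’s dataset.

```python
def share_below(errors_pct, threshold_pct):
    """Fraction of absolute percent errors strictly below the threshold."""
    return sum(abs(e) < threshold_pct for e in errors_pct) / len(errors_pct)


# Hypothetical implied expenditure errors (% of TDEE) for ten users.
errors = [1.2, -3.4, 4.9, -6.0, 7.5, 2.2, -9.1, 12.0, 0.8, -21.5]

print(share_below(errors, 5))   # 0.5
print(share_below(errors, 10))  # 0.8

# Share of errors at or above 20% of TDEE.
print(sum(abs(e) >= 20 for e in errors) / len(errors))  # 0.1
```

Using absolute values treats over- and under-estimates the same way, which matches how the thresholds in the text are described.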

General notes

Although we excluded users with strong evidence of consistent partial logging from the analyses above, I thought it might still be worth showing the impact partial logging can have. As you can see in the graph below, occasional partial logging (for example, slipping up and partially logging one day per month) isn’t that much worse than never partially logging at all. However, as partial logging rates exceed 5% of days, and certainly as partial logging rates exceed 10% of days, accuracy decreases considerably.

I should also note that all of the analyses above only screened out people with extensive evidence of partial logging. We didn’t exclude people based on their demographic characteristics (age, sex, activity levels, etc.), or for not logging every single day of eating, weighing infrequently, or occasionally having logging data that looked fairly implausible (for example, 300lb men not losing weight while logging 1,500 Calories per day).

The purpose of this analysis wasn’t to show the theoretical upper limits of our algorithms’ performance with pristine, highly curated data. The purpose was to show how well the algorithms work for a broad, representative sample of typical users, some of whom don’t always log with perfect accuracy or consistency. Even with imperfect data, the algorithms are still able to consistently generate excellent recommendations.

But, if you are someone who weighs everything down to the gram and never misses a day of logging, you can reasonably expect that the algorithms will be even more accurate for you than the analyses presented in this article would suggest. However, the algorithms still work very well even if you aren’t overly obsessive with your logging habits, as long as you do avoid partial logging.

Takeaways

Since this is an article on the MacroFactor website, I’m sure you expected us to conclude that our algorithms work very well. If so, your expectations were correct. The only way to measure accuracy is to measure error, so the analyses were all presented in terms of the errors the algorithms produce. The magnitude of those errors tells us two things:

  1. The algorithms aren’t perfect. There is still some degree of error that’s not simply attributable to initial estimation errors or the inherent noisiness of weight data.
  2. Our algorithms are quantifiably much better than anything else on the consumer market at estimating energy expenditure (for the purpose of recommending nutrition targets to help you reach your goals).

Just to drive the second point home, here’s a final head-to-head comparison of MacroFactor versus the two most popular methods of estimating energy expenditure: TDEE formulas and wearable devices. Since the only metric we have for wearables is the frequency of errors smaller than 10%, that’s the metric we’ll need to use for the comparison.

By this metric, MacroFactor is about twice as accurate as TDEE formulas, and about 4-5 times more accurate than wearable devices. Not too shabby.

Ultimately, we care very deeply about the quality of our algorithms, because they’re at the heart of the MacroFactor experience, and they’re one of the main reasons why people use and trust MacroFactor instead of some other nutrition app. Our main goal is to help our users reach their goals, and being able to consistently provide accurate nutrition recommendations is central to that aim. That’s why we’ve invested so much effort into developing our market-leading algorithms, and it’s why we’re still constantly researching ways to improve them even further.

Footnotes
  1. To perfectly account for your energy intake, you’d need to burn some of each individual food you consume in a bomb calorimeter to adjust for nutrition labeling inaccuracies, and you’d also need to collect and burn all of your feces to account for the chemical energy in food that you don’t absorb. Even if you weigh everything you eat to the nearest gram, never eat at restaurants, etc., I can promise you that you still have some non-trivial degree of energy intake estimation error.
  2. The same basic process works for any discrete period of time. We just tend to focus on monthly data because it’s a convenient and common unit of time, and analyses over shorter or longer time periods don’t lead to materially different conclusions.
  3. Formula-based errors increased over time, even when accounting for changes in body weight, likely due to the fact that most participants in the MacroFactor challenge were losing weight, and were thus experiencing progressively greater energy compensation and metabolic adaptation over time. Unlike a static formula, MacroFactor can dynamically adapt to these changes. But, for people in a state of energetic maintenance, you shouldn’t necessarily expect formula-based estimation errors to increase over time. I didn’t want to call attention to the increased error magnitudes with formula-based estimations in the text of the article since these increased errors probably have limited generalizability.
