OK, it’s official, size does matter when it comes to DAX. Let’s be clear, we’re talking about DAX here. This is a Power BI Community Blog site after all. Let’s all remain professional. This subject of size and performance (still about DAX) came up in a rather droll conversation around the creation of my most recent Quick Measure Gallery submission, Order Fulfillment. If nothing else, though, it got me thinking about whether size and performance actually matter in every situation. I don’t think they do, but feel free to disagree.
So this question was asked in the forums. Given a set of sales order lines by product and quantity, tell me which locations/bins in my inventory that stock those products should be used to fulfill those orders. Oh, and fulfill the orders from the largest locations/bins of inventory first. And then later this morphed into a FIFO/LIFO situation: fulfill the earliest or latest sales orders first, still sourcing from the largest locations/bins.
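Since the post never shows the measure itself, here is a minimal Python sketch of the largest-bin-first allocation being asked for, just to pin down the logic before we worry about expressing it in DAX. All of the order numbers, products, bin names, and quantities below are made up for illustration.

```python
# Hypothetical data, purely for illustration.
order_lines = [
    {"order": "SO-1", "product": "Widget", "qty": 70},
    {"order": "SO-2", "product": "Widget", "qty": 40},
]
bins = {
    "Widget": [
        {"bin": "A1", "on_hand": 50},
        {"bin": "B7", "on_hand": 45},
        {"bin": "C3", "on_hand": 30},
    ],
}

def fulfill(order_lines, bins):
    """Greedily allocate each order line from the largest bins first."""
    allocations = []
    # Copy the bins (sorted once, descending by on-hand quantity) so that
    # on-hand quantities can be decremented as we allocate.
    remaining = {
        product: sorted((dict(b) for b in bs), key=lambda b: -b["on_hand"])
        for product, bs in bins.items()
    }
    for line in order_lines:
        needed = line["qty"]
        for b in remaining.get(line["product"], []):
            if needed == 0:
                break
            take = min(needed, b["on_hand"])
            if take > 0:
                allocations.append((line["order"], b["bin"], take))
                b["on_hand"] -= take
                needed -= take
    return allocations
```

Note one simplification: the bins are sorted by their original size once up front, rather than re-sorted after every allocation, which matches the "largest bins first" ask as stated. For the FIFO/LIFO variant you would simply sort `order_lines` by order date, ascending or descending, before the same loop runs.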
OK, so let’s solve this with a DAX measure. Measures are flexible and dynamic; if down the road your warehouses in Louisiana get wiped out in a hurricane and you need to recalculate your sourcing on the fly, measures are good at that kind of thing.
Well, the first problem to overcome is fulfilling the orders using the largest locations/bins first. Ugh. Sort order in DAX is not guaranteed for pretty much anything. It’s not like we have access to EVALUATE and ORDER BY in Power BI Desktop. But, there is one exception. Of all things, CONCATENATEX. Not even TOPN guarantees sort order in DAX, but CONCATENATEX for some mysterious reason can. OK, peachy keen, but now I have a text string. Ugh, so now I have to convert it to an indexed table. Luckily, The Mythical DAX Index can be used to accomplish this feat, as long as I use a pipe ( | ) character as my delimiter when I’m CONCATENATEX’ing. Swell, things are moving along just fine.
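To make the string trick concrete, here is the same two-step dance sketched in Python: serialize the bin table into one pipe-delimited string in a guaranteed descending order (the job CONCATENATEX does with its optional order-by arguments), then split that string back apart into an indexed table (roughly the job The Mythical DAX Index pattern does on the DAX side). Bin names and quantities are invented for the example.

```python
# Hypothetical bin data, deliberately out of order.
bins = [("C3", 30), ("A1", 50), ("B7", 45)]

# Step 1: the "CONCATENATEX" step -- one pipe-delimited string,
# sorted descending by quantity.
concatenated = "|".join(
    f"{name}|{qty}" for name, qty in sorted(bins, key=lambda b: -b[1])
)
# concatenated == "A1|50|B7|45|C3|30"

# Step 2: split the string back into an indexed table of
# (index, bin, qty) rows, now in a known, usable order.
parts = concatenated.split("|")
indexed = [
    (i + 1, parts[2 * i], int(parts[2 * i + 1]))
    for i in range(len(parts) // 2)
]
# indexed == [(1, "A1", 50), (2, "B7", 45), (3, "C3", 30)]
```

The pipe character matters because the parsing step needs an unambiguous delimiter that will never appear inside a bin name or a quantity.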
But, and here is where we start to get into size and performance issues, now I have to perform a “while” loop. In DAX. Alright, we have a solution for that, the DAX “While” Loop. Except it’s not particularly efficient like a real while loop. Basically, for every sales order I need to do a calculation against every location/bin that holds the particular product. Then I have to find my “break” point and return the correct calculation from there. Details. However, this does present a problem of scale.
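The “break point” part of that loop is just a running total: walk the bins in descending size order, accumulate quantity, and stop at the first bin where the running total covers the order. Here is a hedged Python sketch of that single-order-line calculation; the function name and data are mine, not from the original measure.

```python
def break_point(bins_desc, order_qty):
    """Given bins sorted descending by quantity, return
    (index of the break bin, quantity taken from it),
    or None if total stock cannot cover the order."""
    running = 0
    for i, (name, qty) in enumerate(bins_desc):
        prior = running
        running += qty
        if running >= order_qty:
            # Bins before index i are fully consumed;
            # the break bin supplies only the remainder.
            return i, order_qty - prior
    return None  # not enough stock anywhere

# Hypothetical bins, already sorted descending by on-hand quantity.
bins_desc = [("A1", 50), ("B7", 45), ("C3", 30)]
```

In DAX there is no early exit, which is exactly the inefficiency being described: the running total gets evaluated for every bin row whether or not the break point has already been passed.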
In my test scenario of 50,000 open order lines and 50 warehouses, that means I have to do 50,000 * 50 calculations, or 2,500,000 total calculations at a minimum, and then additional calculations on top of that. We’re easily reaching several million calculations at this point. Anyway, on my tiny little Surface, returning all 50,000 sales order lines takes about a minute. That’s a long time to wait for a visual to be displayed, to be sure. Now, a single sales order consisting of 5,000 rows takes a couple seconds. An individual product that might appear in 10 or so sales orders, sub-second. All in all, I’m feeling pretty good about things and then…
So out of the blue I start getting heckled like “your code is garbage”, “it will never scale to millions of records” and on and on. Not even the “good” kind of heckling like “your code is garbage and here is how you can improve it”. Just, you know, “garbage”. It’s fine and kind of amusing because I’m thinking to myself, “who cares if it can’t scale to millions of sales orders and billions of locations/bins?” I mean, it’s DAX after all. We don’t even have proper while loops. And it’s not like the world is going to end because of some complex DAX code. So, yeah, 1 million orders times 1,000 bins, that’s like, a billion calculations!! But, just because it can’t scale, does that make the solution useless? Not necessarily.
Now, I am not the kind of person that throws statistics around. Because, you know, lies, damned lies and statistics. Plus, in my experience people that quote statistics are kind of like people that constantly tell you how smart they are. The smartest people I have ever met have never once told me how smart they are. Just saying. But, take note that there are statistics about the size of businesses in the United States. 99.9% of those businesses have fewer than 500 employees. Of the 0.1% left, about two-thirds have fewer than 1,000 employees.
So, I started thinking to myself: what are the odds that 99.9% of the businesses in the United States can use my garbage Order Fulfillment measure? I’m thinking, pretty high. I’m sure that there are businesses out there with fewer than 500 employees that have millions of open sales orders and thousands of warehouse locations, but I’m willing to bet it is a small percentage of the population.
Anyway, at the end of the day, an enterprise with millions of open sales orders and thousands of warehouse locations or bins won’t be able to use my garbage Order Fulfillment measure. Do I care? No. An enterprise that size had better be getting that information out of something like Dynamics 365, because that’s the kind of thing that Enterprise Resource Planning (ERP) systems were purpose-built to do. So, I’ll take 99.9% of businesses out there being able to use my garbage code and notch that as a win. The other 0.1% can use their ERP systems.
Someone once told me that length without width is nothing. Wait…does that even apply here? Hmm…maybe that was uttered in a different context. Dangit! Screwed up another conclusion!