
I have three sources of supplier data (Jan 2017 to Jan 2022), as follows:

a) Purchase data - Contains all the purchases (of our products) made by the suppliers with us. It has columns such as purchase date, invoice number, product id, supplier id and project name.

b) Inventory data - Contains the stock/inventory of our products held by the suppliers (in their warehouses). It is reported every month and has columns such as supplier id, product id, inventory_reported_date, qty_in_stock, etc. There is no project name here.

c) Order backlog data - Contains the pending orders we have yet to deliver to the suppliers, i.e. the suppliers have already booked orders with us for products but we have not delivered them yet. It has columns such as supplier id, supplier name, product id, qty ordered, supplier_requested_delivery_date, company_delivery_confirmed_date, etc.

Now I would like to come up with rules to identify suppliers who are likely to leave us or stay with us. We plan to build a supplier-attrition ML model, but we don't have any ground truth (we don't know whether a supplier has left us or not). So we would like to create a rule-based label for supplier attrition risk with two levels: high risk, meaning the supplier is likely to leave, and low risk, meaning the supplier is unlikely to leave.

Please note that a supplier can buy the same product multiple times, both for the same project and for different projects.

Some of the points I could think of are below, but I am not sure whether they are correct or logical:

a) Decline in order backlog - I can compute the average order backlog for a specific product by a supplier over Jan 2017 to Jan 2020 and compare it with Feb 2020 to Jan 2022. If the trend is declining, should I mark the supplier as high risk?
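A minimal sketch of idea (a), assuming the backlog table can be aggregated into a list of `(snapshot_date, backlog_qty)` points for one supplier/product pair (for example, total open quantity at each month end). The two windows come from the question; the function names and the 0.5 threshold are hypothetical placeholders to calibrate with business users.

```python
from datetime import date
from statistics import mean

# Comparison windows taken from the question (inclusive bounds).
BASELINE = (date(2017, 1, 1), date(2020, 1, 31))
RECENT = (date(2020, 2, 1), date(2022, 1, 31))


def window_mean(points, window):
    """Mean backlog quantity of the points inside [start, end], or None if none."""
    start, end = window
    qtys = [qty for d, qty in points if start <= d <= end]
    return mean(qtys) if qtys else None


def backlog_ratio(points):
    """Recent-window mean backlog divided by baseline-window mean backlog.

    points: list of (snapshot_date, backlog_qty) for ONE supplier/product pair
    (hypothetical shape). Returns None if either window is empty or the
    baseline mean is zero.
    """
    base = window_mean(points, BASELINE)
    recent = window_mean(points, RECENT)
    if base is None or recent is None or base == 0:
        return None
    return recent / base


def backlog_risk(points, threshold=0.5):
    """'high' if the backlog shrank below `threshold` x baseline (threshold is an assumption)."""
    ratio = backlog_ratio(points)
    if ratio is None:
        return "unknown"
    return "high" if ratio < threshold else "low"


if __name__ == "__main__":
    pts = [(date(2018, 1, 31), 100), (date(2019, 1, 31), 100), (date(2021, 1, 31), 30)]
    print(backlog_ratio(pts), backlog_risk(pts))
```

Months with no open orders should be recorded as 0 rather than left out; otherwise a supplier whose backlog dropped to nothing comes out as "unknown" instead of "high". Comparing window means is the simplest reading of "declining"; a slope fitted over the monthly points would also catch a decline that only starts inside the recent window.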

b) Decline in purchase history - I can compute the average time between purchases (e.g. every 3 months, every 6 months) for a specific product by a supplier over Jan 2017 to Jan 2020 and compare it with Feb 2020 to Jan 2022. If the supplier is buying less often, should I mark them as high risk?
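Idea (b) can be sketched the same way, assuming a plain list of purchase dates per supplier/product pair; `purchase_risk` and its `slowdown` factor are hypothetical. Because each pair is compared with its own earlier cadence, a product that is normally bought every 6 months is not penalised for being bought less often than one bought monthly.

```python
from datetime import date
from statistics import mean


def mean_gap_days(dates):
    """Average number of days between consecutive distinct dates (None if fewer than 2)."""
    ds = sorted(set(dates))
    if len(ds) < 2:
        return None
    return mean((b - a).days for a, b in zip(ds, ds[1:]))


def purchase_risk(purchase_dates, cutoff, as_of, slowdown=2.0):
    """Compare the purchase cadence before `cutoff` with the cadence since then.

    purchase_dates: purchase dates of one product by one supplier. Several
    purchases on the same day (e.g. for different projects) count once.
    The still-open gap from the last purchase up to `as_of` is included, so a
    supplier who stopped buying altogether is flagged too. `slowdown` is a
    hypothetical tolerance: 2.0 means "buying at least half as often as before".
    """
    ds = sorted(set(purchase_dates))
    baseline = [d for d in ds if d < cutoff]
    base_gap = mean_gap_days(baseline)
    if base_gap is None:
        return "unknown"
    # Recent cadence: from the last baseline purchase, through recent purchases, to as_of.
    recent_points = [baseline[-1]] + [d for d in ds if cutoff <= d <= as_of] + [as_of]
    recent_gap = mean_gap_days(recent_points)
    return "high" if recent_gap > slowdown * base_gap else "low"


if __name__ == "__main__":
    history = [date(2019, 1, 1), date(2019, 4, 1), date(2019, 7, 1)]
    print(purchase_risk(history, cutoff=date(2020, 2, 1), as_of=date(2022, 1, 31)))
```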

c) Inventory data - If inventory is not reported for a specific product by a supplier, is it okay to consider that the supplier left us for that product? On the other hand, it is not realistic to expect a supplier to buy every product we offer; he will only buy what he needs (and reports inventory only for what he buys).
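For idea (c), one way to avoid penalising products a supplier never carried is to evaluate only the supplier/product pairs that have reported inventory at least once, and flag those that have stopped reporting. A sketch, assuming rows shaped like `(supplier_id, product_id, inventory_reported_date)`; `inventory_silence` and `max_silent_months` are hypothetical, with the default of 3 chosen only because inventory is reported monthly.

```python
from datetime import date


def months_between(start, end):
    """Number of calendar months from start's month to end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def inventory_silence(reports, as_of, max_silent_months=3):
    """Flag supplier/product pairs that used to report inventory but have gone quiet.

    reports: iterable of (supplier_id, product_id, inventory_reported_date).
    Only pairs that appear at least once are evaluated, so products a supplier
    never bought are not mistaken for attrition.
    Returns {(supplier_id, product_id): True if silent for too long}.
    """
    last_report = {}
    for supplier, product, reported in reports:
        key = (supplier, product)
        if key not in last_report or reported > last_report[key]:
            last_report[key] = reported
    return {
        key: months_between(last, as_of) > max_silent_months
        for key, last in last_report.items()
    }


if __name__ == "__main__":
    rows = [("S1", "P1", date(2021, 12, 31)), ("S1", "P2", date(2021, 6, 30))]
    print(inventory_silence(rows, as_of=date(2022, 1, 31)))
```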

Could you share your suggestions and views on how we can arrive at a rule-based label for this supplier-attrition scenario?
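To turn signals like (a) to (c) into the two-level label, one transparent option is a simple vote: each signal reports True (risky), False (not risky) or None (no data), and a pair is labelled high risk when enough signals agree. This is a hypothetical rule meant as a starting point for discussion, not an established method; the signal names and `min_flags=2` are assumptions.

```python
def attrition_label(signals, min_flags=2):
    """Combine per-signal risk flags into 'high', 'low' or 'unknown'.

    signals: dict of signal name -> True (risky), False (not risky) or None (no data).
    Signals without data are ignored; if fewer than `min_flags` signals have
    data, no label is assigned rather than guessing.
    """
    available = [flag for flag in signals.values() if flag is not None]
    if len(available) < min_flags:
        return "unknown"
    risky = sum(available)  # True counts as 1, False as 0
    return "high" if risky >= min_flags else "low"


if __name__ == "__main__":
    # Hypothetical flags for one supplier/product pair.
    flags = {"backlog_decline": True, "purchase_slowdown": True, "inventory_silent": False}
    print(attrition_label(flags))
```

Keeping the rule this explicit makes each label easy to trace back to the signals that produced it when reviewing it with business users.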

  • Hey! Long time no see. I'm reading your question, and here's what jumps out at me. Do I understand correctly that you have no "ground truth"/labels of any kind? So, are you talking about creating a synthetic/"fake" label based on information about the supplier and then trying to run ML on those synthetic labels? Commented Apr 20, 2022 at 16:29
  • @VladimirBelik - Thanks for getting back. Yes, you are right, we don't have any ground truth, so we would like to start with synthetic labels. The rules that we arrive at will also be used for brainstorming with business users. Commented Apr 21, 2022 at 2:40
  • @VladimirBelik - I was thinking of using Recency, Frequency and Monetary (RFM) analysis. But the problem is that not all products are expected to be bought frequently. If a product costs 1 USD, a supplier may buy it frequently and in large quantities; but if a product costs 1000 USD, the supplier may not buy it frequently (and that is logical/expected). So if I use the RFM approach, these high-cost products will show up as rarely bought, which would wrongly indicate attrition risk. Any thoughts or suggestions? You can write them as an answer. Commented Apr 21, 2022 at 2:43
  • This seems like a difficult problem, because the paradox I'm seeing is this: how can you "predict" a synthetic label if it's derived from data you already have? I think it's worth taking a step back and reassessing your objective here. I'm not sure ML is appropriate here, unless you use unsupervised ML to get an understanding of the clusters/types of clients you have. Fundamentally, though, the paradox I mentioned is the biggest issue I see: if you're using existing info to create labels, why not just predict that info directly? Commented Apr 21, 2022 at 3:11
  • Yes, you are right; we also plan to do unsupervised learning. But I also want to come up with proper logic to define the Recency, Frequency and Monetary values of our customers. I am not sure whether you are open to posting your email id publicly here. If not, [email protected] is my email id. You can drop an email and we can discuss there. Commented Apr 21, 2022 at 3:15
