Case-Studies: Manufacturing & Logistics

Factored developed a graph-based solution that detects variations of supplier names, groups them, and generates a representative name for each group. The model was deployed as an API on Google Cloud.

The Challenge:

Companies can name suppliers however they want in their own ERP systems. When different ERP systems are integrated, the same supplier ends up recorded under several names, which creates duplicates and undermines large-scale analysis. The task is to identify the different variations a supplier name can have and group them under a single, normalized supplier name.

The Solution:

Our solution fetched external data from Bing, which helped correctly identify tricky cases such as subsidiaries, mergers, and acquisitions. Our proprietary algorithm then built a graph from this data, applied a hierarchical clustering approach, and finished with a deduplication and post-processing step. The model obtained an adjusted mutual information (AMI) score of 91%. This significantly improved data quality: the client can now detect transactions related to the same supplier even when the supplier names differ.
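To make the graph idea concrete, here is a minimal sketch of name grouping. It is not Factored's proprietary algorithm: it assumes plain string similarity in place of the Bing-derived data, and it uses connected components of a similarity graph in place of hierarchical clustering. The suffix list, the 0.85 threshold, and the function names (`normalize`, `group_suppliers`, `representative`) are all hypothetical choices made for illustration.

```python
import re
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical list of legal suffixes to ignore; a real list would be longer.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "gmbh"}


def normalize(name):
    """Lowercase, drop punctuation, and strip common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    kept = [t for t in tokens if t not in LEGAL_SUFFIXES]
    # Fall back to all tokens if the name consists only of suffixes.
    return " ".join(kept or tokens)


def group_suppliers(names, threshold=0.85):
    """Link similar names with graph edges, then return connected components."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    normalized = [normalize(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        score = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
        if score >= threshold:  # an edge in the similarity graph
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, name in enumerate(names):
        groups[find(i)].append(name)
    return list(groups.values())


def representative(group):
    """Pick the most frequent normalized form; break ties by shortest."""
    counts = Counter(normalize(n) for n in group)
    return min(counts, key=lambda k: (-counts[k], len(k), k))


names = ["Acme Corp.", "ACME Corporation", "Acme, Inc", "Globex LLC", "Globex"]
for group in group_suppliers(names):
    print(representative(group), "<-", group)
```

Pure string similarity cannot link a subsidiary to its parent or a pre-merger name to its successor, which is the gap the external data is described as closing. The pairwise comparison is also quadratic in the number of names, so a larger dataset would need some form of candidate blocking first.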

Tech Stack & Skills:

BigQuery, Dataflow, Bing, Dedupe, StarSpace, machine learning.