Factored developed a graph-based solution that detects variations of supplier names, groups them, and generates a representative name for each group. The model was deployed as an API on Google Cloud.
Companies can name suppliers however they want in their own ERP systems. When different ERP systems are integrated, the same supplier appears under several names, which creates duplicate records and undermines large-scale analysis. The task is to identify the variations of each supplier name and group them under a single, normalized supplier name.
Our solution included fetching external data from Bing, which helped correctly identify tricky cases such as subsidiaries, mergers, and acquisitions. Our proprietary algorithm then built a graph from this data, applied hierarchical clustering to it, and finished with a deduplication and post-processing step. The model obtained an adjusted mutual information (AMI) score of 91%. This significantly improved data quality: the client can now detect transactions that belong to the same supplier even when the supplier is recorded under different names.
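
To make the grouping idea concrete, here is a minimal sketch in Python. It is not our proprietary algorithm: it replaces the Bing data with plain string similarity from the standard library, and it groups names by taking connected components of a thresholded similarity graph, which gives the same groups as single-linkage hierarchical clustering cut at that threshold. The suffix list, the 0.85 threshold, and the function names (normalize, group_names, representative) are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc", "gmbh"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    kept = [t for t in tokens if t not in LEGAL_SUFFIXES]
    # Fall back to all tokens if the name consists only of suffixes.
    return " ".join(kept or tokens)


def similarity(a, b):
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_names(names, threshold=0.85):
    """Group names that are connected in the thresholded similarity graph.

    Nodes are names; an edge joins two names whose similarity is at least
    `threshold` (an assumed value). Groups are the connected components,
    found with a small union-find structure.
    """
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, name in enumerate(names):
        groups[find(i)].append(name)
    return list(groups.values())


def representative(group):
    """Pick the most frequent normalized form; break ties by length, then alphabetically."""
    counts = Counter(normalize(n) for n in group)
    return min(counts, key=lambda n: (-counts[n], len(n), n))


if __name__ == "__main__":
    suppliers = ["Acme Corp.", "ACME Corporation", "Acme, Inc", "Globex LLC", "Globex"]
    for group in group_names(suppliers):
        print(representative(group), "<-", group)
```

The pairwise comparison is quadratic in the number of names, so this sketch only suits small lists. String similarity alone also cannot link a subsidiary to its parent or a company to its post-merger name, which is the gap the external data filled in our solution.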