K-Means Clustering-Based Classification Method for Sales Agents

Agents are among the most important assets of a distributor. Better knowledge of the agents and their behavior is required, particularly to support decisions related to the company's business strategy and to manage a better relationship between the agents and the distributor. Such knowledge can be obtained by classifying agents according to their behavior as recorded in historical data, such as sales and purchase transaction data. One approach is segmentation, which divides the agents into several segments. In this paper, a Data Mining technique, namely the k-means clustering method, is explored to classify sales agents. By implementing k-means, knowledge about the best agents can be acquired, along with knowledge about the agents that contribute least to the distributor.


Introduction
In maintaining relationships with their agents, distributors apply various approaches. Such an approach can be realized by using strategic information to control sales and after-sales services to the agents. However, distribution is not optimal because there is no segmentation of sales agents. One business strategy that can improve profitability, revenue, and customer satisfaction at an early stage is segmentation [1]. Segmentation of the sales agents is important because every agent has different characteristics and behavior [2]. Therefore, different business strategies are required. The right business strategy can be applied if the sales agents are classified well.
Knowledge Discovery in Databases (KDD) is a process for gaining useful knowledge from large data sets [3,4,5]. By using Data Mining tools, the data extraction process can be performed optimally [6]. This makes it possible to predict behavior and future trends, enabling businesses to make proactive decisions based on that knowledge [7]. Data Mining can answer business questions that traditionally took too long to resolve [8]. It searches the database to find hidden patterns. In this case, the Data Mining clustering technique is used. Cluster analysis is a Data Mining technique that classifies a series of data objects into several groups, or clusters, so that the objects in a cluster have a high degree of similarity [9,10,11,12,13,14,15]. Different data are placed in different clusters [16]. K-means is one of the popular centroid-based classification techniques [17]. Verma et al. explained that k-means cluster analysis is a method that aims to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean [18]. Here k is the number of clusters to be formed.
It can be concluded that k-means is intended to group objects with the same characteristics in the same cluster and objects with different characteristics in different clusters [19,20]. Sales agent segmentation divides the sales agents of a company into several homogeneous groups from heterogeneous data [20,21,22]. The purpose of segmenting the sales agents is to maximize the value of each agent for the distributor. Through segmentation, agents with better performance tend to receive different treatment in the distribution services [23,24,25,26]. This allows the company's marketers to choose an effective way of treating agents with different characteristics, since the objective of this segmentation is to establish a better relationship with the customer in order to maximize revenue [27,28,29]. One way to determine the characteristics of an agent is to study the historical sales data and find hidden knowledge using Data Mining. The RFM (Recency, Frequency, and Monetary) model is a customer value analysis method that was first introduced by Bult and Wansbeek in 1995, as described in [30], and has been applied in marketing for a long time. In [31], the authors suggested that integrating RFM analysis and Data Mining on sales data can yield useful information about current or new customers. Among the indicators in RFM analysis, recency of last purchase (R) refers to the interval between the last purchase and the current time. In this paper, a Data Mining technique, namely the k-means clustering method, is explored to classify sales agents with respect to RFM. By implementing k-means, knowledge about the best agents can be acquired, along with knowledge about the agents that contribute least to the distributor. The rest of this paper is organized as follows. Section 2 presents the proposed method. Section 3 presents the results and discussion. Finally, the conclusion of this work is presented in Section 4.

Research Method
The supporting data relevant to determining agent segmentation with Data Mining techniques are historical sales data [21,22]. Preprocessing is a stage of Data Mining that takes a long time to complete [32]. Much of the raw data collected does not meet the criteria for the mining process, for example because records are incomplete or unclear or because inappropriate attributes were selected [33]. In this stage, the data are also transformed into the Recency, Frequency, and Monetary model [30].
At the design stage, the number of clusters is determined, along with the specifications of the hardware and software used. The sales agent data values calculated using RFM analysis are processed with Rapid Miner to draw conclusions about the patterns resulting from the data extraction process (Pattern Evaluation). After testing with Rapid Miner, the next step is to determine the sales agent segments. Figure 1 shows the process stages conducted in this study. As shown in Figure 1, five processes are proposed: preprocessing, RFM transformation, data mining with k-means clustering, testing and evaluation with Rapid Miner, and finally segmentation (classification) of sales agents.

Data Analysis
The data used in this study are one year of historical cement sales data, from 2012, of a cement distributor company. Table 1 presents a sample of the XYZ company's cement sales data for 2 January 2012 only.

Data Preprocessing
Before the data are processed with Data Mining, many problems are often found in the raw data, such as missing values, distorted values, values that were not saved (misrecording), and inadequate sampling. In preprocessing, the attributes to be used are also selected.
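The preprocessing described above can be illustrated with a minimal sketch. It assumes hypothetical field names (agent_id, date, amount, note) and a simple rule that drops records with missing identifiers, missing dates, or non-positive amounts; the actual attributes and cleaning rules used in the study are not specified here.

```python
from datetime import date

# Hypothetical raw records; field names and values are illustrative assumptions.
RAW = [
    {"agent_id": "A01", "date": date(2012, 1, 2), "amount": 1500.0, "note": "ok"},
    {"agent_id": "A02", "date": None, "amount": 900.0, "note": ""},              # missing date
    {"agent_id": "", "date": date(2012, 1, 2), "amount": 700.0, "note": ""},     # missing agent
    {"agent_id": "A01", "date": date(2012, 3, 9), "amount": -50.0, "note": ""},  # misrecorded
    {"agent_id": "A03", "date": date(2012, 5, 1), "amount": 400.0, "note": ""},
]

# Attribute selection: only these fields are kept for the RFM transformation.
KEEP = ("agent_id", "date", "amount")


def preprocess(records):
    """Drop incomplete or implausible records and keep the selected attributes."""
    clean = []
    for rec in records:
        if not rec.get("agent_id") or rec.get("date") is None:
            continue  # incomplete record
        amount = rec.get("amount")
        if amount is None or amount <= 0:
            continue  # missing or misrecorded amount
        clean.append({key: rec[key] for key in KEEP})
    return clean


cleaned = preprocess(RAW)
```

In this sketch only the first and last records survive, each reduced to the three selected attributes.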

Applying RFM Model
After the RFM value of each agent is obtained, scaling is performed because the RFM attributes have very different ranges, especially given the very large gap between the maximum and minimum values of the Monetary variable. This may affect the validity of the clusters. Therefore, scaling is considered very important. Table 3 presents the specified scaling rules. In applying the RFM model, as described in Figure 1 earlier, the sample RFM values obtained are shown in Table 2. Table 4 shows the sample RFM values of the sales data after scaling.
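As an illustration of the RFM transformation and scaling step, the following sketch computes recency (days since the last purchase), frequency (number of transactions), and monetary value (total amount) per agent. Because the scaling rules of Table 3 are not reproduced here, min-max scaling to the range [0, 1] is used as a stand-in assumption; the function names, the reference date, and the sample transactions are likewise hypothetical.

```python
from datetime import date


def rfm(transactions, today):
    """Return {agent_id: (recency_days, frequency, monetary)}."""
    acc = {}
    for agent, day, amount in transactions:
        last, freq, total = acc.get(agent, (None, 0, 0.0))
        if last is None or day > last:
            last = day  # keep the most recent purchase date
        acc[agent] = (last, freq + 1, total + amount)
    return {a: ((today - last).days, f, m) for a, (last, f, m) in acc.items()}


def min_max_scale(table):
    """Scale each RFM column to [0, 1] (an assumed rule, not Table 3)."""
    agents = list(table)
    columns = list(zip(*(table[a] for a in agents)))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return {a: tuple(col[i] for col in scaled_columns) for i, a in enumerate(agents)}


# Hypothetical transactions: (agent_id, purchase date, amount).
TX = [
    ("A01", date(2012, 1, 2), 1500.0),
    ("A01", date(2012, 12, 20), 2500.0),
    ("A02", date(2012, 3, 15), 300.0),
    ("A03", date(2012, 11, 30), 900.0),
    ("A03", date(2012, 12, 1), 100.0),
    ("A03", date(2012, 12, 10), 200.0),
]

raw_rfm = rfm(TX, today=date(2012, 12, 31))
scaled_rfm = min_max_scale(raw_rfm)
```

After scaling, the Monetary column no longer dominates the distance computation, which is the reason the text gives for scaling before clustering.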

K-Means Algorithm
In the sales agent clustering process, the k-means clustering method is used, as shown in Figure 2. Following Figure 2, the k-means algorithm can be described as follows.
1. Determine the number of clusters. As mentioned previously, it is set to eight clusters.
2. Determine the initial cluster centers (centroids) by randomly selecting k objects.
3. Compute the distance from each object to each centroid using the Euclidean distance formula, $d(x, c) = \sqrt{\sum_{i} (x_i - c_i)^2}$, where $x_i$ and $c_i$ are the i-th attribute values of the object and the centroid.
4. Allocate each object to its closest centroid. Once the distance from each data point to every centroid has been obtained, the data point is assigned to the cluster at minimum distance.
5. Compute the new cluster centers.
6. Return to step 3 until no object moves to a different cluster.
In this case, the testing is performed using the Rapid Miner software to generate the clusters. The next step is profiling each cluster. Profiling is performed by calculating the average RFM value of each cluster and comparing it with the overall average. Figure 3 shows the results obtained by extracting the cluster models. The comparison between the centroid of each cluster and the overall average is shown in Table 5. The following figure presents the RFM values for each cluster. From Figure 5, the complete results are the following: The agents belonging to this cluster are those that have contributed the least to the company (Uncertain). Overall, having acquired the characteristics of each cluster, five different segments of agents are obtained: the best agents (Best), agents with high value for the company (Valuable), agents with the least contribution (Uncertain), agents whose transactions have decreased (Churn), and new agents (First Time). Figure 6 shows a diagram of the membership percentage for each segment.
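The k-means steps and the profiling described in this section can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the Rapid Miner procedure used in the study: the function names, the fixed random seed, the iteration cap, and the sample points are hypothetical, and k = 2 is used instead of eight only to keep the example small.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, centroids) following the steps listed in this section."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


def profile(centroids, points):
    """Mark each centroid attribute as above or below the overall average."""
    overall = mean(points)
    return [
        tuple("high" if c > o else "low" for c, o in zip(cen, overall))
        for cen in centroids
    ]


# Hypothetical scaled (R, F, M) values for six agents.
POINTS = [
    (0.10, 0.10, 0.10), (0.20, 0.10, 0.15), (0.15, 0.20, 0.10),
    (0.90, 0.80, 0.90), (0.85, 0.90, 0.95), (0.80, 0.85, 0.90),
]

labels, centroids = k_means(POINTS, k=2)
profiles = profile(centroids, POINTS)
```

Comparing each centroid with the overall average, as `profile` does, mirrors the profiling step in the text; mapping the resulting high/low patterns to named segments such as Best or Uncertain remains an interpretive step.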

Conclusion
This paper has presented a classification of sales agents for cement distribution using k-means clustering. From the results of the implementation and testing carried out with the Rapid Miner tool, it can be concluded that the company's agents are dominated by sales agents with the Uncertain characteristics. 47% of the agents of PT. XYZ, Ltd contribute very little. They have not carried out transactions recently, they buy rarely, and the nominal value they pay is low. The best agents make up 20% of the company's agents; they carry out transactions on a regular basis up to the present, with high nominal amounts. In addition, 2% of the agents have potential and high value. The company currently has 29% new agents, and 2% of the agents have decreased their activity. The testing results show that agents occupying a given segment in the preceding discussion occupy the same segment in the testing phase. With this knowledge, the company is expected to set more appropriate policies.