Analysis of Covid-19 Cash Direct Aid (BLT) Acceptance Using K-Nearest Neighbor Algorithm

During the COVID-19 pandemic, the government imposed Large-Scale Social Restrictions (PSBB) to slow the spread of COVID-19. As a result, many people were unable to work as usual, and quite a few lost their jobs. This prompted the government to launch the COVID-19 Direct Cash Assistance (BLT) program. One of the areas affected by the PSBB is Batu Ampar Village, where residents consider BLT distribution ineffective because some of the aid is poorly targeted. The ineffectiveness was attributed to out-of-sync data: new data was difficult to verify and validate because of the size of the area and the constantly changing number of underprivileged residents. To overcome these problems, a model is needed to predict the recipients of the COVID-19 BLT. This study uses the K-Nearest Neighbor (K-NN) algorithm and the RapidMiner tool to make predictions, validated with cross-validation. The data set consists of 711 records, split into 474 training records and 237 testing records, yielding an accuracy of 89.68% on the training data and 88.61% on the testing data.


INTRODUCTION
The government has implemented various poverty reduction and disaster relief programs. However, the assistance that reaches the community often falls short of expectations. Mismatches in the receipt of aid funds occur because the determination of which low-income families qualify as recipients has not been optimal, so some poverty assistance still misses its target. One such program is the COVID-19 Direct Cash Assistance (BLT) program [1] [2].
The COVID-19 BLT program was designed to help maintain living standards during the pandemic under the Large-Scale Social Restrictions (PSBB). The amount of BLT is therefore calculated from the increase in the cost of living of the poor caused, directly or indirectly, by price increases (inflation). The government's poverty alleviation efforts in Indonesia have attracted public attention, and many people have praised the government for the BLT program. Not surprisingly, people in the well-off category have also tried to become recipients of the COVID-19 BLT. However, this government program is considered ineffective, so a prediction model is needed to guide the provision of COVID-19 aid funds [3] [4].
Prediction is both an art and a science of estimating future events. It can be done by taking historical data and projecting it into the future with a mathematical model, or by using mathematical models adjusted by expert judgment. Prediction plays a critical role in business, and research on prediction continues to develop. Methods such as KNN have been used for document classification [5], traffic detection [6], heart disease prediction [7], stock price prediction models [8], and others. The results showed that the KNN method predicts accurately. The algorithm classifies data based on learning data (training data sets) taken from the k closest neighbors (nearest neighbors). This is an opportunity to explore KNN for predicting decisions on providing direct COVID-19 assistance funds in Batu Ampar Village.
Batu Ampar Village is one of the villages governed by Law No. 6 of 2014 on Villages, which mandates the village to carry out governance, development, and community affairs. Batu Ampar Village has been assigned to distribute COVID-19 BLT funds directly to the community. BLT distribution is currently less effective because it depends on the availability of accurate data on low-income families so that aid reaches its target. This data is often not verified and validated because of the size of the region and the large and frequently changing number of low-income families. As a result, data on the number and location of low-income families is often inaccurate. Often there is a shortage of assistance for families in one area while another area receives too much. Accurate data can help plan the right program.
Based on the above problem, the author studies the prediction of COVID-19 BLT recipients in Batu Ampar Village using the K-Nearest Neighbor (K-NN) algorithm. The K-NN method can be used to determine which residents are and are not eligible to receive assistance funds for low-income families. The method classifies an object based on the learning data closest to that object, and it serves to classify or group the tested data according to its criteria [9].

Data Mining
According to [10], the rapid development of data mining cannot be separated from the development of information technology, which allows large amounts of data to accumulate. However, the rapid growth of accumulated data has created a condition described as "data rich but information poor" because the collected data cannot be used properly. Not infrequently, data sets are simply left untouched and become "data tombs" (data graves). Data mining is also known as KDD (knowledge discovery in databases). In 1995, an International KDD Conference in Montreal defined KDD as a process of identifying information that is new, valid, and practically useful, and of recognizing understandable patterns in data. The primary purpose of the KDD process is to predict the values of variables of interest or to find patterns in data that humans can interpret. To meet these objectives, data mining must be applied to recognize new information and discover such patterns. Data mining is therefore an inseparable part of the KDD process [11].
Data mining is a field widely supported by other branches of information technology, namely statistics, database technology, machine learning, expert systems, parallel algorithms, genetic algorithms, pattern recognition, data visualization, and others [12] [13]. Several factors are the main reasons for using data mining: 1. The amount of data collected would take a long time and many experts to analyze. 2. Computers have become a leading choice because of their speed, accuracy, tirelessness, and ease of operation. 3. Pressure from business competition is increasing, so information becomes critical and must be obtained immediately. 4. It can find patterns that were entirely unanticipated.
According to [14], data mining is one of the software activities that can provide a high ROI (Return on Investment). Note that data mining is different from query tools; queries and data mining complement each other. Data mining does not replace queries but adds meaningful capabilities to them. With simple queries, about 80% of the information in a database can be accessed, while the other 20% is hidden information that requires special techniques to access [15] [16].
Because data mining is a series of processes, it is divided into several stages.

K-Nearest Neighbor (K-NN)
The K-Nearest Neighbor (K-NN) algorithm is a supervised learning method. It belongs to the instance-based learning group and is one of the lazy learning techniques. K-NN works by searching for the group of k objects in the training data that are closest (most similar) to an object in the new data or testing data [17].
The KNN algorithm classifies a new object based on its k nearest neighbors. As a supervised learning algorithm, it classifies a new query instance according to the majority category among those k neighbors: the class that appears most often becomes the class assigned by the classification.
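In general, the distance between two objects x and y is defined using the Euclidean distance formula in equation 2.2.
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{2.2}$$
where $x_i$ is the value of attribute $i$ in the training data, $y_i$ is the value of attribute $i$ in the testing data, and $n$ is the number of attributes.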
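The classification procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the RapidMiner implementation used in the study: the function names, the value k = 3, and the toy records (job code, land area in hectares, income category, number of children) are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training records."""
    # Pair each training record's distance to the query with its label.
    distances = [
        (euclidean_distance(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest records (sorted by distance only).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among the neighbors is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy records, not taken from the study's data:
# (job code, land area in hectares, income category, number of children)
train_features = [
    (0, 0.0, 1, 3),
    (0, 0.5, 1, 4),
    (6, 2.0, 3, 1),
    (2, 1.5, 3, 2),
    (3, 3.0, 3, 2),
]
train_labels = ["YES", "YES", "TIDAK", "TIDAK", "TIDAK"]

prediction = knn_predict(train_features, train_labels, query=(0, 0.2, 1, 4), k=3)
print(prediction)
```

In this toy case two of the three nearest records are labeled YES, so the query is classified YES. Because K-NN is a lazy learner, no model is built in advance; all distance computations happen at prediction time.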

RapidMiner
RapidMiner is open-source software for data mining, text mining, and predictive analysis. It uses various descriptive and predictive techniques to help users make the best decisions. RapidMiner has approximately 500 data mining operators, including operators for input, output, data preprocessing, and visualization. It is stand-alone software for data analysis and can also serve as a data mining engine integrated into one's own products [19]. RapidMiner is written in Java so that it works across all operating systems.
RapidMiner has the following properties: 1. It is written in the Java programming language, so it runs on various operating systems. 2. It uses a multi-layer concept to ensure efficient data display and data handling. 3. It has a GUI, a command-line mode, and a Java API that can be invoked from other programs.
Some of the features of RapidMiner are: 1. Many data mining algorithms, including decision trees and self-organizing maps.
3. Many plugin variations, such as a text plugin for text analysis. 4. Data mining and machine learning procedures, including ETL (extraction, transformation, loading), data preprocessing, visualization, modeling, and evaluation. 5. Data mining processes composed of nestable operators, described in XML and created with a GUI. 6. Integration with the Weka data mining project and R statistics.

Cross-Validation
Validation and testing are done to determine whether all functions work correctly. Validation is done with 10-fold cross-validation, which randomizes the data and divides the data set into ten segments of equal size; each segment is used once as the test set while the remaining nine are used for training. Validation and testing were carried out to determine the accuracy, precision, and recall of the 197 classification prediction results. Accuracy is the percentage of records in the test data set that are classified correctly. Precision is the percentage of records predicted as positive that are actually positive. Recall measures the rate at which actual positives are recognized [20].
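The validation scheme and the three metrics can be illustrated with a short, self-contained Python sketch. It is not the RapidMiner Cross-Validation operator: the function names, the random seed, the 1-NN stand-in classifier, and the toy data are assumptions, and the sketch pools the predictions of all folds before computing the metrics, which is only one of several possible ways to aggregate them.

```python
import random


def k_fold_indices(n_records, n_folds=10, seed=42):
    """Shuffle record indices and split them into n_folds nearly equal segments."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def evaluate(actual, predicted, positive="YES"):
    """Return (accuracy, precision, recall) with `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def cross_validate(features, labels, classify, n_folds=10, seed=42):
    """Use each fold once as the test set and pool the predictions of all folds."""
    actual, predicted = [], []
    for test_idx in k_fold_indices(len(features), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            actual.append(labels[i])
            predicted.append(classify(train_x, train_y, features[i]))
    return evaluate(actual, predicted)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in 1-NN classifier (squared Euclidean distance) for the demo."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


# Hypothetical, well-separated toy data: 10 "YES" and 10 "TIDAK" records.
features = [(float(v),) for v in list(range(10)) + list(range(20, 30))]
labels = ["YES"] * 10 + ["TIDAK"] * 10

accuracy, precision, recall = cross_validate(features, labels, nearest_neighbor)
print(accuracy, precision, recall)
```

Any classifier with the signature `classify(train_x, train_y, query)` can be passed in place of the 1-NN stand-in, so the same loop could wrap a K-NN function with a larger k.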

Research Framework
The research framework is shown in Figure 2.

Data Collection Methods
To obtain the data and information needed for this study, the following data collection methods were used:

Literature Review
Data is studied by collecting references from books and journals during the analysis period.

Interview
Data is collected through direct question-and-answer sessions with the village head and the staff of the Batu Ampar Village office.

Documentation
Documents and supporting data are collected at the Batu Ampar Village office.

Exploratory Data Analysis
This section discusses the analysis carried out on the data obtained. The data set comprises 711 records from 2020, and the criteria are compared using a graph for each variable. Figure 3 illustrates the occupational categories, Figure 4 depicts the land area (hectare), Figure 5 describes the income category, Figure 6 represents the number of children, and Figure 7 illustrates the recipients of aid funds.
PROBLEM: The current BLT distribution process is less effective because the data on low-income families is still incomplete, so the aid is not always on target.

Encoding Variable
In the encoding process, the data is converted to numeric form. The variables that must be converted are Job and Aid Recipient Category. For the Job variable, each job type is replaced with a numeric code:
FarmWorkers: 0
IRT (housewife): 1
Private Employee: 2
Village Chief: 3
Village Device: 4
Farmer: 5
PNS (civil servant): 6
Driver: 7
Handyman: 8
Self-employed: 9
For the Aid Recipient Category variable, YES is replaced with 1 and TIDAK (no) with 0.
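The encoding step can be expressed as two lookup tables. The sketch below reuses the category labels and codes listed above; the function name, the record field names, and the sample record are assumptions introduced only for illustration.

```python
# Codes as listed in the text; labels are kept exactly as they appear in the data.
JOB_CODES = {
    "FarmWorkers": 0,
    "IRT": 1,
    "Private Employee": 2,
    "Village Chief": 3,
    "Village Device": 4,
    "Farmer": 5,
    "PNS": 6,
    "Driver": 7,
    "Handyman": 8,
    "Self-employed": 9,
}
RECIPIENT_CODES = {"YES": 1, "TIDAK": 0}


def encode_record(record):
    """Return a copy of `record` with the two categorical fields replaced by codes."""
    encoded = dict(record)
    encoded["job"] = JOB_CODES[record["job"]]
    encoded["recipient"] = RECIPIENT_CODES[record["recipient"]]
    return encoded


# Hypothetical record; the field names are assumptions, not the study's column names.
sample = {
    "job": "Farmer",
    "land_area_ha": 0.5,
    "income_category": 1,
    "children": 3,
    "recipient": "YES",
}
print(encode_record(sample))
```

One point worth keeping in mind is that integer codes impose an order on job types that has no inherent meaning, so a Euclidean distance treats, for example, codes 0 and 9 as farther apart than codes 0 and 1.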
The results of the encoding table can be seen in the following image:

Test Results with the RapidMiner Studio Application
The KNN algorithm is set up by dragging and dropping the data and the operators to be used into the process, as shown in Figure 8. In the testing process shown in Figure 9, the KNN operator is dragged and dropped inside the Cross-Validation operator, which is opened by clicking it, and the number of folds is set to 10. After the operators are in place, the KNN output is connected to the result port, and the process button is clicked to generate the accuracy value.
Based on the validation carried out using KNN with 474 training records, an accuracy of 89.68% was obtained, as can be seen in Table 2. According to the calculations in RapidMiner, 400 residents in the training data were classified "No" for receiving financial aid and 26 residents were classified "Yes". It can be seen that:
Testing on 237 records resulted in an accuracy of 88.61%. The accuracy test results can be seen in Table 2 below. Of the 237 residents in the testing data, 192 were classified "No" for receiving financial aid and 19 were classified "Yes", according to the calculations in RapidMiner. It can be seen that: