Recommendations on Selecting the Topic of Student Thesis Concentration Using Case Based Reasoning

Abstract Case Based Reasoning (CBR) is a method that solves a new case by adapting the solutions of previous cases that are similar to it. The system built in this study is a CBR system that recommends thesis concentration topics to students. The study uses data from undergraduate (S1) Informatics Engineering students at IST AKPRIND Yogyakarta, 115 records in total, consisting of 80 training records and 35 test records. The study aims to design and build a Case Based Reasoning system using the Nearest Neighbor and Manhattan Distance similarity methods, and to compare the accuracy obtained with Nearest Neighbor Similarity and Manhattan Distance Similarity. Recommendations are produced by calculating the closeness, or similarity, between a new case and the old cases stored in the case base using the Nearest Neighbor and Manhattan Distance methods. The features used in this study are the GPA and course grades. The case retrieved is the one with the highest similarity value. If a case receives no topic recommendation, or its similarity is below the threshold of 0.8, the case is revised by an expert. Successfully revised cases are stored in the system as new knowledge. In testing, the Nearest Neighbor method achieved an accuracy of 97.14% and the Manhattan Distance method 94.29%.


INTRODUCTION
The thesis concentration field is the concentration a student takes when determining a thesis topic. IST AKPRIND Yogyakarta has several study programs, one of which is the undergraduate (S1) Informatics Engineering program, which offers the thesis as a final-year course with a weight of 6 credits (SKS). Students taking the thesis course must meet the specified requirements, among others: they have carried out Practical Work (KP) and the community service program (KKN), and have completed a total of 140 credits with a cumulative Grade Point Average (GPA) ≥ 2. Under the 2016 curriculum, which is based on the KKNI and complies with applicable laws and regulations, students are required to choose one of the concentrations offered by the Department of Informatics Engineering. There are three concentration areas: systems and information technology, computer networks, and intelligent systems and computing. Students currently register for the thesis and select a thesis concentration topic by filling out a form, which they submit to the study program for approval by the Board of the Study Program.
Several researchers have studied academic problems and proposed solutions, namely [1], [2], [3] and [4]. The research in [1] helps provide solutions to problems faced by students and reports an accuracy of 90%. The research in [2] applies case-based learning to the teaching of therapy in advanced and blended learning. The research in [3] gives students solutions during counseling, and [4] makes personalized recommendations of learning paths using examples of student learning activities.
The process of selecting a thesis topic is still manual, so a system is needed to help recommend thesis topics. Case Based Reasoning has been applied to academic cases many times before. L. Mustofa's research uses the CBR concept to recommend the most suitable university major to students. The research in [5] implements CBR to determine which students receive a PPA scholarship (academic achievement improvement) and a BBM scholarship (student learning assistance), with a reported accuracy of 100%. The research in [6] uses CBR to recommend thesis topics and thesis supervisors to S1 Informatics Engineering students at Bumigora Mataram. In the academic field, [7] also applies CBR to predict the graduation time of prospective students.
The stages of the CBR process, better known as the CBR cycle, are retrieve, reuse, revise, and retain. Retrieve is the process of retrieving old problems that are relevant to the new problem. According to E. Faizal's research, retrieval is the essence of CBR: the process of finding a solution among the old cases in the case base to solve a new case. During retrieval, the old cases are compared with the new case and a similarity value is calculated from the comparison. Similarity is the measure of how closely an old case resembles a new case.
Many similarity methods can be used in CBR, including Euclidean Distance, Manhattan, Minkowski, Cosine, Hamming Distance, Simple Matching Coefficient, Jaccard, and Nearest Neighbor. This research uses the Nearest Neighbor similarity method and the Manhattan Distance similarity method, and compares them to find the similarity method with the highest accuracy. Nearest Neighbor was also used in the research by Guessoum and in [8]. Guessoum's research diagnosed chronic pulmonary disease with a reported accuracy of 100%.
The research in [8] diagnosed psychiatric disorders: of 250 test cases, the system correctly diagnosed 240 (96%). More than one similarity method was implemented and compared in [9], which used three methods, namely Nearest Neighbor Similarity, Minkowski Distance Similarity and Euclidean Distance Similarity, on heart disease cases. The highest accuracy, 100%, was obtained with Minkowski Distance Similarity.
The Manhattan Distance Similarity method was chosen because it was used in previous research on CBR recommendation of thesis topics and supervisors [6], where testing produced a result of 34%. [6] suggests using another retrieval method to improve system accuracy. This study uses the Nearest Neighbor method because the research in [5], which applied CBR to determine students who receive PPA and BBM scholarships, reached an accuracy of 100%. The similarity process in Nearest Neighbor calculates the closeness between the new case and the old case based on matching the weights of a number of existing features [5]. Most searches for similarity values use symbolic or continuous data, but this study uses numeric (discrete) data. Data that contains both numeric and symbolic values is called mixed attribute data [10]. Nearest Neighbor was chosen because it is one of the classification and pattern recognition techniques and is widely used in CBR systems [11]. Therefore, this study compares the Nearest Neighbor and Manhattan Distance similarity methods in CBR to obtain a similarity method with high accuracy.
3. Students applying for a thesis fill in the registration form, consisting of the student ID number (NIM), student name, field of concentration, thesis title, GPA and academic grades, and submit it to the administration of the study program.
4. The Head of Study Program, acting as the expert, checks the specified conditions, and students wait for the decision. This takes a long time because more than one thesis is listed. For more details, see Figure

Case Based Reasoning
Case Based Reasoning (CBR) is a problem-solving method that uses knowledge of past experience to solve new problems [9]. It aims to solve a new case by adapting the solutions found in previous cases that are similar to the new case [8]. The knowledge in Case Based Reasoning is dynamic because new knowledge is added frequently. Case Based Reasoning works by comparing a new problem with the old cases. If the new problem resembles an old case, Case Based Reasoning provides an answer based on that old case to resolve the new problem. If there is no match, Case Based Reasoning adapts and then inserts the new case into the case storage (case base), so its knowledge grows over time. CBR flows include:
• Retrieve: looking for an earlier case similar to the new problem
• Reuse: copying or combining solutions from previous cases

Case representation
The case data collected from the archive in the Administration section are stored as the case base. Cases in this system are represented as feature sets, which makes it easier to add and delete case features. The features used in this study are the GPA at the time of thesis submission, the grades of 40 general compulsory courses, and the grades of 7 compulsory concentration courses. Compulsory concentration courses are used as features because thesis topics usually refer to a particular area of concentration. In addition, the academic ability of a student is reflected in the grades of the courses taken, so course grades can serve as a reference for determining the thesis concentration topic a student will work on. By using academic grades as features, the thesis process is expected to be faster, because the concentration topic matches the student's academic ability and interest. In this research, cases are represented as flat features.
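The flat feature representation can be sketched as one record per case. This is only an illustration under stated assumptions, not the data model of the system: the class name `Case`, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Case:
    """One case stored as a flat set of features (hypothetical layout)."""
    case_id: str
    gpa: float
    # Course name -> grade, covering general and concentration courses.
    course_grades: Dict[str, float] = field(default_factory=dict)
    # Solution part: the thesis concentration topic; empty for a new case.
    topic: str = ""

    def features(self) -> Dict[str, float]:
        """Return every feature of the case in one flat dictionary."""
        flat = {"GPA": self.gpa}
        flat.update(self.course_grades)
        return flat

    def add_feature(self, name: str, grade: float) -> None:
        """Add (or overwrite) a course feature."""
        self.course_grades[name] = grade

    def remove_feature(self, name: str) -> None:
        """Delete a course feature if it exists."""
        self.course_grades.pop(name, None)
```

Because every feature lives in a single dictionary, adding or deleting a feature does not require changing the structure of the stored cases.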

Retrieval Process
Retrieval in this study looks for similar cases by calculating the closeness between the new case and the old cases using a similarity function. Most of the search is done by going through the whole case base and comparing the features of each stored case with those of the new case. This stage adopts the K-Nearest Neighbor (KNN) method with k = 1, and Manhattan Distance. Similarity measurement with the Nearest Neighbor method is done in two stages, namely local and global similarity. Local similarity is a measurement of similarity at the feature level, while global similarity is a measurement of similarity at the object (case) level. With the Manhattan Distance method, the proximity between cases is found by calculating the distance between the new case and each case stored in the case base, and then calculating the similarity value. Each attribute has a different weight for each course type, with the value determined by an expert, as shown in Table 2 (for example, Knowledge-based Systems has a weight of 0.5).

Nearest Neighbor
The similarity measurement produces a value that decides whether or not the new case is similar to the cases in the case base. Similarity measurement covers two things:
a. Local similarity. Local similarity values are of two types, namely numeric local similarity and symbolic local similarity. Equation (1) is used to calculate the local similarity value for numeric features [12]:
f( ) = 1-
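A minimal sketch of the two-level measurement with k = 1 follows. It assumes a range-normalised absolute difference as the numeric local similarity and a weighted average as the global similarity; both are common choices and are not necessarily the exact form of Equations (1) and (2). The function names, the dictionary layout of a case and the grade range of 4.0 are hypothetical.

```python
def local_similarity(source_value, target_value, value_range):
    """Feature-level similarity in [0, 1] (assumed form: 1 - |s - t| / range)."""
    if value_range <= 0:
        raise ValueError("value_range must be positive")
    return 1.0 - abs(source_value - target_value) / value_range


def global_similarity(target, source, weights, value_range=4.0):
    """Case-level similarity: weighted average of the local similarities.

    Assumes every feature of the target also exists in the source and
    that all grades lie on a 0 to 4 scale (value_range=4.0).
    """
    total_weight = sum(weights[name] for name in target)
    weighted_sum = sum(
        weights[name] * local_similarity(source[name], target[name], value_range)
        for name in target
    )
    return weighted_sum / total_weight


def retrieve_nearest(target, case_base, weights, value_range=4.0):
    """k = 1: return the stored case with the highest global similarity."""
    return max(
        case_base,
        key=lambda case: global_similarity(
            target, case["features"], weights, value_range
        ),
    )
```

With k = 1, the solution of the single most similar case becomes the recommendation candidate.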

Manhattan Distance
In addition to K-Nearest Neighbor, retrieval uses Manhattan Distance, which looks for similar cases by calculating the closeness between the new problem and the old cases using a similarity function. The proximity between cases is found by calculating the distance between the new case and the cases stored in the case base. The smaller the distance between cases, the greater their similarity. Distances are calculated with the Manhattan Distance method according to Equation 3. Manhattan distance is the proximity measure best suited to projects whose cases are described by numbers or quantitative data [14]. The similarity value is calculated using Equation 4. Manhattan, or city block, distance is used to retrieve a suitable case from the case base by calculating the weighted sum of the absolute differences between the current case and a case in the case base, as in Equation 3.

$d(T,S) = \sum_{k=1}^{n} W_k \, |T_k - S_k|$ (3)
Description:
n = number of variables
k = feature
W = weight of the feature
T = new case
S = history (case in the case base)
The distance calculated for each case with Manhattan Distance in Equation 3 is then used to calculate the similarity value between cases using Equation 4 [12].
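The weighted Manhattan distance described by the variables above (weights W, new case T, stored case S) can be sketched as follows. The function names and the dictionary layout of a case are hypothetical. The conversion from distance to similarity (Equation 4) is deliberately left out, since its form is not shown here.

```python
def manhattan_distance(target, source, weights):
    """Weighted sum of absolute feature differences between T and S.

    Assumes every feature of the target also exists in the source
    and in the weight table.
    """
    return sum(
        weights[name] * abs(target[name] - source[name])
        for name in target
    )


def retrieve_closest(target, case_base, weights):
    """Return the stored case with the smallest distance to the new case."""
    return min(
        case_base,
        key=lambda case: manhattan_distance(target, case["features"], weights),
    )
```

For example, with weights 1.0 for the GPA and 0.5 for one course, a GPA difference of 0.5 and a course grade difference of 2.0 give a distance of 1.0 * 0.5 + 0.5 * 2.0 = 1.5.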

Confidence Level Measurement
Pal and Shiu (2004) suggest calculating the level of confidence that a problem T belongs to the class of a case S. In outline, the measurement of the level of confidence used in this study covers two things:
1. Expert confidence level. The expert confidence level is the certainty of a recommendation given by experts based on student grades. Expert confidence levels are determined by experts. The higher the expert confidence, the higher the certainty of the recommendation result for a case.

Confidence level of new problems
The confidence in a new problem is calculated using the equation below:
$P(T,S) = \frac{J(T,S)}{J(T)}$
where $P(T,S)$ is the level of confidence between case T (target case) and case S (source case), $J(T,S)$ is the number of features of the target case that also appear in the source case, and $J(T)$ is the number of features contained in the target case.
Because expert confidence in a case is important, and new symptoms may arise in a new case, the formula in Equation (2) [11] is modified as was done in [15], namely by adding an expert confidence factor and the handling of new symptoms to the calculation of case similarity, as shown in the equation below.
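$Sim(T,S) = \frac{\sum_{i=1}^{n} f(T_i,S_i)\, w_i}{\sum_{i=1}^{n} w_i} \times P(S) \times \frac{J(T,S)}{J(T)}$
where $f(T_i,S_i)$ is the local similarity of feature i, $w_i$ is its weight, $P(S)$ is the percentage of expert confidence in a case in the source case, and $J(T,S)/J(T)$ is the confidence level of the new problem. The modifications made by [15] in Equation (6) with reference to Equation (5) are used in this study and are then applied to Equation (7), so that the similarity formula used for the Manhattan Distance method is
$Sim(T,S) = s(T,S) \times P(S) \times \frac{J(T,S)}{J(T)}$
where $s(T,S)$ is the similarity value obtained from the Manhattan distance with Equation 4.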
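The two confidence factors can be sketched as plain multipliers applied to a base similarity value. This is an illustration under stated assumptions: the function names are hypothetical, the base similarity is taken as already computed by either method, and the expert confidence is given as a fraction between 0 and 1.

```python
def problem_confidence(target, source):
    """Share of the target case's features that also appear in the source case."""
    if not target:
        raise ValueError("target case has no features")
    shared = sum(1 for name in target if name in source)
    return shared / len(target)


def adjusted_similarity(base_similarity, expert_confidence, target, source):
    """Base similarity scaled by expert confidence and new-problem confidence."""
    return base_similarity * expert_confidence * problem_confidence(target, source)
```

A target case with four features, three of which exist in the source case, has a new-problem confidence of 0.75, so a base similarity of 0.8 with full expert confidence is reduced to 0.6.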

System Description
The system to be built is a web-based intelligent system capable of recommending thesis concentration topics to students using Case Based Reasoning (CBR). CBR is a method of finding solutions that uses past knowledge to solve new problems. The main processes in the study are data recording, similarity calculation, and testing. Data recording includes the input of student data, topic data, concentration data, user data and course data. In this study, case data are divided into the case base, which serves as system knowledge, and test cases, which are used in system testing. A solution is found by calculating the similarity of the new problem to the old cases in the case base. The similarity value is calculated with the Nearest Neighbor and Manhattan Distance methods. The revision process is carried out if the recommendation for the new problem has a value below the threshold (or a similarity equal to 0). The threshold value used in this study to find solutions to new problems is 0.8. This value is also used as an indicator of whether or not the new problem will be retained in the case base. Cases that have been revised by experts are stored in the case base, so the system directly gains new knowledge from the revised problems.
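The threshold rule can be sketched as a small decision function. This is only an illustration: the function name, the dictionary layout of a case and the shape of the returned result are hypothetical, while the threshold of 0.8 comes from the text.

```python
THRESHOLD = 0.8  # threshold value stated in the study


def recommend(best_case, best_similarity, threshold=THRESHOLD):
    """Reuse the retrieved topic, or flag the problem for expert revision."""
    if best_case is None or best_similarity < threshold:
        # No usable recommendation: an expert revises the case, and the
        # revised case is then retained in the case base.
        return {"topic": None, "needs_expert_revision": True}
    return {"topic": best_case["topic"], "needs_expert_revision": False}
```

A retrieved case with similarity 0.92 yields its topic directly, while one with similarity 0.65 is passed to the expert.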

Data and Testing methods
The study uses 115 records, consisting of 80 case-base records as system knowledge and 35 test cases used in system testing. There are 29 thesis concentration topics in total, covering the concentrations of systems and information technology, computer networks, and intelligent systems and computing. The solutions of the cases in the case base are retrieved and used as solutions for new cases. The details for each topic can be seen in Table 3. Testing the Manhattan Distance method against the case base gave 2 test cases that did not match the decision of the system, or 5.71% unsuitable data, while 33 test cases, or 94.29%, matched.
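The reported percentages can be checked with a short calculation. The function name is hypothetical, and the count of 34 matching cases for Nearest Neighbor is an inference from the reported 97.14% on 35 test cases, not a figure stated in the text.

```python
def accuracy_percent(correct, total):
    """Percentage of test cases whose recommendation matched, to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


manhattan_accuracy = accuracy_percent(33, 35)  # 33 of 35 matched
manhattan_error = accuracy_percent(2, 35)      # 2 of 35 did not match
nearest_accuracy = accuracy_percent(34, 35)    # 34 is inferred, see above
```

These values agree with the reported 94.29%, 5.71% and 97.14%.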