Steganographic Model for Encrypted Messages Based on DNA Encoding

Information has become an inseparable part of human life. Some information that is considered important, such as state or company documents, requires additional security to ensure its confidentiality. One way of securing information is to hide it in a carrier medium using steganography, a technique for embedding information in other files so that its presence is not apparent. One of the most frequently used steganographic methods is Least Significant Bit (LSB). In this study, the LSB method is modified using DNA Encoding and Chargaff's Rule. Chargaff's Rule, or the complementary base pairing rule, is used to construct a complementary strand. The modification is expected to increase the security of the information. The MSE test results show an average value of 0.000236368 for the LSB method and 0.000770917 for the DNA Encoding-based Steganography method. The average PSNR value is 76.82 dB for the LSB method and 70.88 dB for the DNA Encoding-based Steganography method. Inserting and extracting messages with the DNA Encoding-based Steganography method takes longer than with the LSB method because of its higher algorithmic complexity. The message security of the DNA Encoding-based Steganography method is better because its algorithm includes encryption, which the LSB method lacks.


INTRODUCTION
Information has become an inseparable part of human life. Almost all information is now stored as data files on digital media such as computers and external storage. Storing information digitally has several advantages, including easy storage, reduced paper usage, and greater resistance to damage. It also has drawbacks: the original content is easy to alter and easy to duplicate. Some information that is considered important, such as state or company documents, requires additional security to ensure its confidentiality.
There are many ways to secure information stored in digital media. One of them is to hide the information in a carrier medium using steganography. Steganography is a method of hiding information in other files, such as image files, so that the presence of the information is concealed. Applying steganography adds a layer of protection and makes the task harder for attackers, who must first detect that hidden data exists before they can try to obtain it.
DNA cryptography emerged along with the development of DNA computation [1]. It is a relatively new technique for securing information that uses DNA as an information carrier and performs computation with the help of molecular techniques. DNA cryptography combines computational complexity and biological complexity [2]. It is gaining attention because of the large storage capacity of DNA: one gram of DNA is reported to be capable of storing about 10^8 terabytes of data. This capacity makes DNA a strong candidate for future storage media. DNA can also be used to build cryptosystems based on one-time pads which, if used correctly, are virtually impossible to break. There are various procedures for one-time-pad DNA encryption schemes [2].
The most widely known method of steganography is the Least Significant Bit (LSB) method. This method modifies the lowest bit layer of an image. It takes advantage of the fact that the lowest bits of an image can be treated as random noise, so altering them has no visible effect on the image. Although the image does not appear to change visually after modification, its statistical properties do change significantly. As explained in [3], this method operates in the spatial domain of digital images. However, hidden data embedded with this method can sometimes be detected by steganography detection applications and is then very easy to extract.
Many modified LSB methods have been proposed to increase security and to reduce the noise introduced by the insertion process. Examples include Simple LSB Substitution, Fibonacci Decomposition LSB Substitution, Prime Number Decomposition LSB Substitution, and Natural Number Decomposition LSB Substitution. However, these modified LSB algorithms focus on the insertion step and pay less attention to encrypting the message. To encrypt the message before insertion, they rely on a separate encryption algorithm outside the steganographic scheme.
This study aims to modify the LSB method using DNA encoding and Chargaff's Rule. Chargaff's Rule, also known as the complementary base pairing rule, states that the DNA base pairs are always adenine with thymine (A-T) and cytosine with guanine (C-G). Chargaff's Rule is used to construct complementary strands, and the use of a complementary strand is expected to increase the security of the message. The modification uses the 2 least significant bits, which matches DNA Encoding, where each nucleotide base is represented by 2 bits.

DNA Encoding
Deoxyribonucleic Acid (DNA) stores the genetic information of all types of living things. Four nucleotide bases are used in a DNA sequence: A (Adenine), C (Cytosine), G (Guanine), and T (Thymine). In the DNA sequence, A is the complement of T and C is the complement of G [4]. These four bases can be represented as binary numbers. In the binary system, 0 and 1 complement each other, so 00 and 11 are complementary, and 01 and 10 are complementary [5]. Table 1 shows the encoding and decoding maps for DNA.
To further increase the security of DNA cryptography, [6] proposed several algebraic operations between nucleotide bases, namely XOR, addition, and subtraction, as shown in Table 2, Table 3, and Table 4. Addition and subtraction for DNA are carried out on the two-bit binary codes [7]; the examples here are consistent with keeping only the two low-order bits of the result. For example, 11 + 10 = 01 and 01 - 11 = 10.
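As an illustration only, the two-bit encoding described above can be sketched in Python. The mapping used here (A=00, C=01, G=10, T=11) is an assumption chosen so that complementary bases receive complementary codes; it is not necessarily Rule 2 of Table 1, and the function names are hypothetical.

```python
# Assumed mapping: one rule that satisfies A<->T (00<->11) and C<->G (01<->10).
# It may differ from the rule the study actually uses.
ENCODE = {"00": "A", "01": "C", "10": "G", "11": "T"}
DECODE = {base: bits for bits, base in ENCODE.items()}


def bits_to_dna(bits: str) -> str:
    """Encode a bit string into DNA, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Decode a DNA string back into a bit string."""
    return "".join(DECODE[base] for base in dna)


def text_to_dna(text: str) -> str:
    """Convert text to bytes (UTF-8), then to bits, then to DNA."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return bits_to_dna(bits)
```

Under this assumed mapping, each byte of the message becomes four bases, so a message of n bytes yields a DNA string of length 4n.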
Encryption with DNA cryptography starts by converting the plain text into binary form. The binary data is then encoded according to one of the rules in Table 1. Next, the encoded result is encrypted with the key using the XOR operation and the addition operation in Table 2 and Table 3. Decryption uses the XOR operation and the subtraction operation in Table 2 and Table 4.
Table 3 DNA Addition Operation
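The three base-level operations can be sketched as follows. This is a minimal illustration, not the tables of the study: the base codes (A=0, C=1, G=2, T=3) are assumed, addition and subtraction are taken modulo 4 so that they agree with the worked examples 11 + 10 = 01 and 01 - 11 = 10, and repeating a short key over the message is a further assumption of the sketch.

```python
from itertools import cycle

BASES = "ACGT"  # assumed codes: A=0 (00), C=1 (01), G=2 (10), T=3 (11)
CODE = {base: value for value, base in enumerate(BASES)}


def _combine(strand: str, key: str, op) -> str:
    """Apply op base by base; the key is repeated if shorter (assumption)."""
    if not key:
        raise ValueError("key must not be empty")
    return "".join(
        BASES[op(CODE[s], CODE[k]) % 4] for s, k in zip(strand, cycle(key))
    )


def dna_xor(strand: str, key: str) -> str:
    """Bitwise XOR of the two-bit codes."""
    return _combine(strand, key, lambda s, k: s ^ k)


def dna_add(strand: str, key: str) -> str:
    """Addition of the two-bit codes, keeping the two low-order bits."""
    return _combine(strand, key, lambda s, k: s + k)


def dna_sub(strand: str, key: str) -> str:
    """Subtraction of the two-bit codes, keeping the two low-order bits."""
    return _combine(strand, key, lambda s, k: s - k)
```

With these definitions, XOR undoes itself and subtraction undoes addition, which is the property that decryption relies on.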

Chargaff's Rule
Chargaff's Rule, also known as the complementary base pairing rule, states that DNA base pairs are always adenine with thymine (A-T) and cytosine with guanine (C-G) [8]. Purines always pair with pyrimidines and vice versa. However, A does not pair with C, even though one is a purine and the other a pyrimidine. The rule is named after the scientist Erwin Chargaff, who found that the concentrations of adenine and thymine, and likewise of guanine and cytosine, are essentially equal in almost all DNA molecules [9]. The ratios vary among organisms, but the concentration of A is always approximately equal to that of T, and the concentration of G to that of C. For example, human DNA contains about 30.9% adenine, 29.4% thymine, 19.8% cytosine, and 19.9% guanine. This supports the complementary rule that A must pair with T and C must pair with G [10].
This pairing follows from the hydrogen bonds that join complementary DNA strands and from the space available between the two strands. The two complementary strands are approximately 20 Å apart (1 Å, or angstrom, equals 10^-10 meters). A pair of two purines would take up too much space to fit between the strands, while a pair of two pyrimidines would leave the bases too far apart to bond. This is why A cannot pair with G and C cannot pair with T.
The bonds between purines and pyrimidines are not interchangeable due to the hydrogen bonds that connect the bases and stabilize the DNA molecule. The only pairs that can make hydrogen bonds in that space are adenine with thymine and cytosine with guanine. A and T form two hydrogen bonds while C and G form three bonds. It is these hydrogen bonds that join the two strands and stabilize the molecule, allowing it to form a ladder-like double helix.
Using Chargaff's Rule, a complementary strand can be constructed from the base sequence of a single strand alone. For example, if the sequence of one DNA strand is AAGCTGGTTTTGACGAC, then Chargaff's Rule gives the complementary strand TTCGACCAAAACTGCTG.
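The construction of a complementary strand can be sketched in a few lines of Python. The function name `complement` is hypothetical; the pairing itself (A with T, C with G) is the rule stated above.

```python
# Translation table implementing the pairing A<->T and C<->G.
_PAIRS = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the complementary strand under Chargaff's Rule."""
    strand = strand.upper()
    if set(strand) - set("ACGT"):
        raise ValueError("strand may contain only A, C, G, and T")
    return strand.translate(_PAIRS)


print(complement("AAGCTGGTTTTGACGAC"))
```

Applying the function twice returns the original strand, so the same step serves both the insertion and the extraction side.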

Proposed Methods
This study develops an algorithm that combines DNA-based cryptography and steganography using Chargaff's Rule. The scheme is divided into an insertion scheme and an extraction scheme. In the insertion process, Message M is inserted into the Cover Image to produce a Stego Image. In the extraction process, Message M is extracted from the Stego Image.

Insertion Scheme
The insertion process begins by converting the binary message, MBIN, into DNA using DNA encoding based on Rule 2 in Table 1. The DNA message is then transformed by Chargaff's Rule into MCHR, and the length of the DNA message is recorded as MLENGTH. At the same time, the RGB values of the Cover Image pixels are read and separated into layer R, layer G, and layer B. The value of MLENGTH is inserted at the end of layer G; in this study, the last 24 LSBs of layer G are allocated to store MLENGTH. The binary values of each layer are converted into DNA using DNA encoding based on Rule 2 in Table 1. The DNA G layer is then XORed, according to Table 2, with a Secret Key that the user enters in the form of DNA, producing the G DNA Key. Figure 2 shows the flowchart of the Insertion Scheme.

Figure 2 Insertion Scheme
The DNA addition operation of Table 3 is then applied to the Chargaff's Rule message MCHR and the G DNA Key to generate the message MKEYG. According to the characteristics of human vision [11], the eye is most sensitive to the green component of a color image, followed by red, and least sensitive to blue. Therefore, MKEYG is inserted into layer B first and then into layer R. No message data is inserted into layer G, which is used only to store MLENGTH. After the message is inserted into layers B and R, the Stego layers B and R are obtained. The image is then reconstructed from layer G and the Stego layers to produce the Stego Image.
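The final embedding step can be illustrated with a small sketch. It assumes that each base of MKEYG replaces the 2 least significant bits of one channel value, that layer B is filled before layer R, and that bases are coded as A=0, C=1, G=2, T=3; the function name and the flat-list representation of the layers are hypothetical and not taken from the study.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed two-bit codes


def embed_symbols(blue, red, dna):
    """Write one base into the 2 LSBs of each value, layer B first, then R.

    blue and red are sequences of 8-bit channel values; new lists are
    returned and the inputs are left unchanged.
    """
    if len(dna) > len(blue) + len(red):
        raise ValueError("message exceeds the capacity of layers B and R")
    symbols = [CODE[base] for base in dna]

    def write(channel, values):
        out = list(channel)
        for i, value in enumerate(values):
            # Clear the two lowest bits, then store the base code there.
            out[i] = (out[i] & 0b11111100) | value
        return out

    n_blue = min(len(symbols), len(blue))
    return write(blue, symbols[:n_blue]), write(red, symbols[n_blue:])
```

Because only the two lowest bits are replaced, each channel value changes by at most 3, which is why the visual impact of the insertion stays small.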

Extraction Scheme
The extraction process begins by reading the RGB values of the Stego Image pixels. The RGB values are separated into layer R, layer G, and layer B, and MLENGTH is obtained from the end of layer G. The binary values of each layer are then converted into DNA using DNA Encoding.
Next, the LSB values are read from layer B and then from layer R; together with MLENGTH, they yield MKEYG. The DNA G layer is XORed with the Secret Key based on Table 2 to produce the G DNA Key. The DNA subtraction operation of Table 4 is then applied to MKEYG and the G DNA Key to recover the DNA message. This DNA message is processed with Chargaff's Rule and converted into binary by DNA Decoding. Figure 3 shows the Extraction Scheme.
Figure 3 Extraction Scheme
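The extraction steps can be sketched end to end under stated assumptions: bases are coded as A=0, C=1, G=2, T=3, one base is stored in the 2 LSBs of each channel value, subtraction is taken modulo 4, and a short key is repeated over the message. The length and the G DNA Key are passed in as arguments rather than derived from layer G, and all names are hypothetical.

```python
from itertools import cycle

BASES = "ACGT"  # assumed codes: A=0, C=1, G=2, T=3
CODE = {base: value for value, base in enumerate(BASES)}
_PAIRS = str.maketrans("ACGT", "TGCA")


def read_symbols(blue, red, length):
    """Read `length` bases from the 2 LSBs of layer B, then layer R."""
    values = [v & 0b11 for v in list(blue) + list(red)]
    if length > len(values):
        raise ValueError("length exceeds the capacity of layers B and R")
    return "".join(BASES[v] for v in values[:length])


def dna_sub(strand, key):
    """Base-wise subtraction modulo 4; the key repeats if shorter."""
    if not key:
        raise ValueError("key must not be empty")
    return "".join(
        BASES[(CODE[s] - CODE[k]) % 4] for s, k in zip(strand, cycle(key))
    )


def recover_message(blue, red, length, g_dna_key):
    """Undo the key addition and the complement, then decode to bytes."""
    mkeyg = read_symbols(blue, red, length)
    mchr = dna_sub(mkeyg, g_dna_key)
    dna = mchr.translate(_PAIRS)
    bits = "".join(f"{CODE[base]:02b}" for base in dna)
    usable = len(bits) - len(bits) % 8  # ignore an incomplete trailing byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

In this sketch the steps run in the reverse order of insertion: key subtraction first, then the complement, then decoding.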

RESULTS AND DISCUSSION
Image quality is tested using MSE and PSNR calculations. The greater the MSE value, the greater the difference between the cover image and the stego image. PSNR behaves in the opposite way: the smaller the PSNR value, the greater the difference between the cover image and the stego image. PSNR values are usually expressed on a decibel (dB) scale. PSNR values below 30 dB indicate low image quality, while values above 40 dB indicate good image quality. The higher the PSNR value, the better the image quality [12]. The results of the MSE and PSNR calculations for the DNA Encoding-based Steganography method and the LSB method can be seen in Table 5. Table 5 shows that the MSE and PSNR values of the two methods do not differ significantly, so both methods can be said to produce similarly good image quality.
The message insertion and extraction time test was carried out to determine whether the DNA Encoding-based Steganography method, as a modification of the LSB method, retains the speed of the original. LSB is a steganographic method with relatively fast message insertion and extraction. A comparison of the insertion and extraction times of the DNA Encoding-based Steganography method and the LSB method can be seen in Table 6. Table 6 shows that the average time required to insert and extract messages with the DNA Encoding-based Steganography method is longer than with the LSB method. This is because the DNA Encoding-based Steganography method has a higher complexity than the LSB method.
The histogram test compares the histograms of the cover image and the stego image for the DNA Encoding-based Steganography method and the LSB method. A histogram shows the frequency with which each pixel value appears. The histogram comparison of the cover image and the stego image for both methods is shown in Table 7. In general, the stego images produced by the two techniques are visually equally good, and the extracted message matches the inserted message. For a more precise comparison, several quantitative parameters were measured: MSE, PSNR, and processing time.
The MSE value measures the error between the original image and the embedded image. A large MSE value indicates a decrease in quality, that is, a significant change in the image after insertion. Table 5 shows the results of the MSE calculations for each of the techniques tested. The average MSE value for the DNA Encoding-based Steganography method is 0.000770917, while the average MSE value for the LSB method is 0.000236368. Judged by MSE, the LSB method gives slightly better insertion results than the DNA Encoding-based Steganography method.
Peak Signal to Noise Ratio (PSNR) is a measure used to compare the cover image with the stego image into which a message has been inserted. The higher the PSNR value, the greater the similarity between the cover image and the manipulated image. To calculate the PSNR value, the Mean Square Error (MSE) of the two images must be calculated first.
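For reference, the two measures can be computed as in the following sketch. It uses the common definitions, MSE as the mean of squared differences and PSNR = 10 log10(MAX^2 / MSE), and assumes a peak value of 255 for 8-bit channels; the study does not state its exact formula or normalization, so these choices and the function names are assumptions.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equal-length sequences of values."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)
```

Since PSNR falls as MSE rises, the two measures always rank a pair of stego images in opposite directions.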
Based on Table 5, the average PSNR value is 76.82 dB for the LSB method and 70.88 dB for the DNA Encoding-based Steganography method. The LSB method therefore tends to perform better, with a difference in average PSNR of 5.94 dB. The PSNR values for the LSB method range from a minimum of 73.34 dB to a maximum of 79.49 dB. For the DNA Encoding-based Steganography method, they range from 67.43 dB to 73.59 dB.
Processing time testing measures the time required to perform insertion and extraction with each method. Based on Table 6, the average insertion time for the DNA Encoding-based Steganography method is 0.6951 seconds and the average extraction time is 0.7336 seconds. For the LSB method, the average insertion time is 0.1246 seconds and the average extraction time is 0.1027 seconds. The DNA Encoding-based Steganography method requires a longer processing time because it has a higher algorithmic complexity than the LSB method.
Based on the results and analysis, the LSB method has the advantage of faster processing than the DNA Encoding-based Steganography method, which has a higher level of complexity. The DNA Encoding-based Steganography method has better message security because it includes encryption, which the LSB method does not. The MSE and PSNR values of the two methods do not differ significantly, so both methods can be said to produce similarly good image quality. Table 8 shows the advantages of each method for each parameter.
The DNA Encoding-based Steganography method can insert and extract messages correctly as long as the inserted message does not exceed the capacity of the cover image used. The MSE and PSNR values of its stego images are not much different from those of stego images produced by the LSB method. Its message security is higher than that of the LSB method because encryption is applied during insertion, so the inserted message cannot be read directly.
The DNA Encoding-based Steganography method developed in this study is applied to inserting text into a bitmap image. Further research can consider other message types and other cover media. The method requires relatively more time to insert and extract messages than the LSB method, and future research using parallel computing is expected to address this. The robustness of the stego image generated by the method is relatively low, because the message cannot be extracted if the stego image is manipulated. Further research can aim to increase the robustness of the stego image to image manipulation.