Hashtag Analysis of Indonesian COVID-19 Tweets Using Social Network Analysis

Social media has become increasingly important for communicating about the COVID-19 pandemic. On social media, hashtags are social annotations that are often used to denote message content. They serve as an intuitive and flexible tool for making huge collections of posts searchable on Twitter. Through the practice of hashtagging, user representations of a given post also become connected. This study aimed to analyze the hashtags of Indonesian COVID-19 tweets using Social Network Analysis (SNA). We used SNA techniques to visualize network models and computed several centrality measures to find the most influential hashtags in the network. We collected and analyzed 500,000 public tweets from Twitter based on COVID-19 keywords. Based on the centrality measurements, the hashtag #corona has the most connections with other hashtags. The hashtag #COVID19 is the most closely related to all other hashtags. The hashtag #corona acts most strongly as a bridge that can control the flow of information related to COVID-19. The hashtag #coronavirus is the most important hashtag based on its links. Based on network visualization, our study also found that the hashtags #covid19 and #wabah have a substantial relationship with religion-related hashtags.
Keywords: COVID-19, Twitter, Social Network Analysis, SNA, Hashtag
ISSN (print): 1978-1520, ISSN (online): 2460-7258. IJCCS Vol. 15, No. 3, July 2021: 275-284


INTRODUCTION
Social media has grown into an essential part of people's lives in recent years. Large platforms such as Facebook, Instagram, and Twitter have become increasingly important for communicating about almost everything, including issues related to the COVID-19 pandemic [1]. As a result, the amount of information we receive through these platforms influences how we perceive and deal with the current COVID-19 epidemic. Even before the outbreak, the public and scientists used social media as an essential information source. The efficient use of social media also affects the character and communication style of modern society [2]. One social media platform commonly used today is Twitter [3]. Previous research has noted the potential for Twitter to provide real-time content analysis so that public health authorities can respond quickly to anxieties raised by the public [4].
Hashtags, which are social annotations used to denote message content, serve as an intuitive and flexible tool for making huge collections of posts searchable on Twitter. The inherent ability of hashtags to bypass the boundaries of the network structure during communication makes it easy to disseminate information outside the user's own network. Hashtags play a dual role in the communication process, functioning as metadata both for archiving purposes and for information retrieval on social media platforms [5], such as Twitter. Hashtags, written with the symbol #, are also used on other social media platforms to index keywords or topics [6], [7].
Several previous studies have carried out hashtag analysis on social media for various purposes. Xing et al. [8], in 2016, analyzed hashtags to find sub-events on Twitter. They modeled the relationship between a hashtag and the tweet's topic and highlighted the hashtag's role as a semantic representation of the corresponding tweet. Also in 2016, Yılmaz and Hero [9] studied multimodal event detection in Twitter hashtag networks. They introduced a new unsupervised event detection approach for Twitter.
Through the practice of hashtagging, user representations of a given post also become connected [10]. At the same time, the relationships between hashtags can be viewed as an organization within a topic area, here the COVID-19 outbreak. In this study, we aimed to analyze the hashtags of Indonesian COVID-19 tweets using Social Network Analysis (SNA). We used SNA techniques to visualize network models and computed several centrality measures to find the most influential hashtags in the network.
Several previous studies have applied SNA for various purposes. Iswandhani and Muhajir [11], in 2017, used SNA to identify the central popularity of tourist destinations in an Instagram account. Also in 2017, Setatama and Tricahyono [12] applied Social Network Analysis to identify the most influential actors in the spread of the country branding "Wonderful Indonesia." Tahalea and Azhari [13], in 2019, used SNA to identify the central actor in crimes committed by several people, using five centrality measurements, including degree, betweenness, closeness, and eigenvector centrality. That study reported 80.39% accuracy on 102 criminal cases, each involving at least three actors.
In another study, Hung et al. [14], in 2020, used SNA to determine the social network of dominant topics related to COVID-19 on Twitter. The study identified five dominant issues related to COVID-19: social change, health care environment, business economy, emotional support, and psychological stress. Also in 2020, Tahalea [15] used SNA to analyze the relationships between heroes and to find out the roles of heroes in the DOTA2 online game. He used degree centrality to see the popularity of a hero, betweenness centrality to see a hero's role as a liaison, and closeness centrality to see a hero's closeness to other heroes.

METHODS
The social network analysis process in this study consists of three stages: data extraction and preprocessing, building a network model, and measuring centrality values.

Data Extraction and Preprocessing
This study used a web scraping technique to extract tweet data related to COVID-19 from the Twitter web interface. Online data extraction refers to the routine extraction of data from a web data source that can evolve [16]. After collecting the tweet data, we preprocessed it. In this step, we kept only the hashtags from each tweet.
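The preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' code: the regular expression (a `#` followed by letters, digits, or underscores), the function name `extract_hashtags`, and the sample tweets are all assumptions made for the example.

```python
import re

# Assumed hashtag pattern: '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags(tweet_text):
    """Return the hashtags in a tweet, in order of appearance, with case preserved."""
    return HASHTAG_PATTERN.findall(tweet_text)


# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "Tetap di rumah ya #dirumahaja #Covid19",
    "Update kasus hari ini #corona",
    "Tidak ada tagar di sini",
]
hashtags_per_tweet = [extract_hashtags(text) for text in sample_tweets]
```

Case is preserved deliberately, because the analysis treats differently capitalized hashtags as different nodes.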

Network Model
The second stage of this experiment builds a network model. In the network model, hashtags are represented as nodes, and the connections between them are represented as edges. A connection between hashtags arises when a user includes more than one hashtag in a single tweet. For example, if a user posts a tweet that contains the hashtags #Covid19, #wabah, and #lockdown, then there will be connections between #Covid19 and #wabah, #Covid19 and #lockdown, and #wabah and #lockdown. Figure 1 shows the network model representation of this case.
Figure 1 The network model representation
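The co-occurrence rule described above can be sketched in a few lines. The function name `build_cooccurrence_edges` is hypothetical, and counting how many tweets share a pair (an edge weight) is an assumption added for illustration; the text only states that a connection exists.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(hashtags_per_tweet):
    """Count undirected hashtag pairs that appear together in the same tweet."""
    edge_weights = Counter()
    for tags in hashtags_per_tweet:
        unique_tags = sorted(set(tags))  # ignore repeats within one tweet
        for first, second in combinations(unique_tags, 2):
            edge_weights[(first, second)] += 1
    return edge_weights


# The three-hashtag example from the text yields three edges.
edges = build_cooccurrence_edges([["#Covid19", "#wabah", "#lockdown"]])
```

Sorting each tweet's hashtags before pairing gives every undirected edge a single canonical key, so the pair (#wabah, #Covid19) is never stored separately from (#Covid19, #wabah).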

Centrality Measure
This study used four centrality measurements (degree, betweenness, closeness, and eigenvector centrality) to analyze the hashtags of Indonesian COVID-19 tweets. In fields such as psychological networks [17], social networks [18], criminal networks [13], and even trust networks [19], these centrality measures can identify a network's central or highly influential nodes.

Degree Centrality
Degree centrality is determined by the total number of direct connections a node has to other nodes in a network graph [20]. Node degree centrality is a key parameter representing community centrality in networks [21]. We used degree centrality to identify the most connected hashtags in the tweets, measured using equation (1), where $i$ denotes a hashtag and $n$ denotes the total number of nodes (hashtags) in the network.
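As an illustration, a common normalized form of degree centrality is $C_D(i) = d(i) / (n - 1)$, where $d(i)$ is the number of neighbors of node $i$. This form is assumed here and may differ from the paper's equation (1). The function name and the toy graph below are hypothetical.

```python
def degree_centrality(adjacency):
    """Normalized degree centrality for an undirected graph.

    `adjacency` maps each node to the set of its neighbors.
    Assumed form: C_D(i) = d(i) / (n - 1).
    """
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# Hypothetical toy network: #Covid19 is linked to every other hashtag.
toy_network = {
    "#Covid19": {"#wabah", "#lockdown", "#corona"},
    "#wabah": {"#Covid19", "#lockdown"},
    "#lockdown": {"#Covid19", "#wabah"},
    "#corona": {"#Covid19"},
}
degree_scores = degree_centrality(toy_network)
```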

Betweenness Centrality
Betweenness centrality measures the role of a node as a mediator in the network. We use this centrality to show how significant a hashtag is as a bridge in the network, measured using equation (2). Here, $i$ denotes the hashtag, $g_{j,k}$ is the number of shortest paths from actor $j$ to actor $k$, and $g_{j,k}(i)$ is the number of those shortest paths that pass through actor $i$.
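A standard unnormalized form of betweenness is $C_B(i) = \sum_{j<k} g_{j,k}(i) / g_{j,k}$, where $g_{j,k}$ counts shortest paths between $j$ and $k$ and $g_{j,k}(i)$ counts those passing through $i$. The sketch below computes it with Brandes' algorithm for an undirected, unweighted graph. Both the formula's normalization and the choice of algorithm are assumptions for illustration; the paper does not state how the values were computed.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    betweenness = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        visit_order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v is on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]
    # Each unordered pair (j, k) was counted once from each endpoint.
    return {v: score / 2 for v, score in betweenness.items()}


# Hypothetical path network: #Covid19 bridges #corona and #wabah.
path_network = {
    "#corona": {"#Covid19"},
    "#Covid19": {"#corona", "#wabah"},
    "#wabah": {"#Covid19"},
}
betweenness_scores = betweenness_centrality(path_network)
```

In this toy network the only shortest path between #corona and #wabah passes through #Covid19, so #Covid19 is the sole bridge.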

Closeness Centrality
Closeness centrality measures how near a node is to the other nodes, based on the average of its shortest-path distances to the other nodes in the network [22]. We use this centrality to show the closeness of the connections between hashtags, measured using equation (3), where $C_C(i)$ is the closeness centrality of node $i$ and $d(i,j)$ is the length of the shortest path from node $i$ to node $j$.
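One widely used form of closeness is $C_C(i) = (n - 1) / \sum_{j} d(i,j)$, the inverse of the average shortest-path distance. This normalization is an assumption and may not match the paper's equation (3). The sketch below uses breadth-first search on an unweighted graph and, for disconnected graphs, counts only reachable nodes, which is a further assumption.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness as (number of reachable nodes) / (sum of shortest-path distances)."""
    scores = {}
    for source in adjacency:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
        total_distance = sum(distance.values())
        reachable = len(distance) - 1  # exclude the source itself
        scores[source] = reachable / total_distance if total_distance > 0 else 0.0
    return scores


# Hypothetical path network: #Covid19 sits between the other two hashtags.
path_network = {
    "#corona": {"#Covid19"},
    "#Covid19": {"#corona", "#wabah"},
    "#wabah": {"#Covid19"},
}
closeness_scores = closeness_centrality(path_network)
```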

Eigenvector Centrality
Eigenvector centrality measures the number of connections of a given node and its relevance in information movement [23]. We used eigenvector centrality to show the importance of hashtags based on their links, measured using equation (4), where $i$ denotes a hashtag, $\lambda$ is a constant, and $a_{i,j}$ is the $(i,j)$ entry of the adjacency matrix of the network.
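Eigenvector centrality is commonly defined by $x_i = \frac{1}{\lambda} \sum_{j} a_{i,j} x_j$, with $\lambda$ the largest eigenvalue of the adjacency matrix. This is the standard form, assumed to correspond to equation (4). The sketch below approximates it by power iteration; the function name, iteration limit, tolerance, and toy graph are illustrative assumptions.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tolerance=1e-10):
    """Approximate eigenvector centrality of an undirected graph by power iteration."""
    nodes = list(adjacency)
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Adding the node's own previous score shifts the matrix by the identity.
        # The eigenvectors are unchanged, and the iteration no longer oscillates
        # on bipartite graphs.
        updated = {v: score[v] + sum(score[w] for w in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {v: value / norm for v, value in updated.items()}
        change = sum(abs(updated[v] - score[v]) for v in nodes)
        score = updated
        if change < tolerance:
            break
    return score


# Hypothetical toy network, the same shape as a small hashtag cluster.
toy_network = {
    "#Covid19": {"#wabah", "#lockdown", "#corona"},
    "#wabah": {"#Covid19", "#lockdown"},
    "#lockdown": {"#Covid19", "#wabah"},
    "#corona": {"#Covid19"},
}
eigenvector_scores = eigenvector_centrality(toy_network)
```

Note that #corona has only one neighbor, yet it still receives a nonzero score because that neighbor is the best-connected node. This is the property the measure is meant to capture.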

RESULTS AND DISCUSSION
In this section, we discuss the results of the calculations. First, we profiled the extracted data. After that, we used SNA to find the most influential hashtags, which play an essential role in disseminating COVID-19 information on Twitter.

Data Profiling
We collected 500,000 public tweets from Twitter based on COVID-19 keywords. The tweets were gathered over three months, from March to May 2020. We chose this period because President Joko Widodo reported the first two confirmed cases of COVID-19 infection in Indonesia in March 2020 [24]. Data collection closed in May 2020, when the analysis for this research began.

Centrality Measure
This stage discusses the centrality measures of the Twitter data about COVID-19. The measures calculated are degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality. Degree centrality is a simple measure that counts how many connections, or neighbors, a node has [25]. The COVID-19 hashtags with the most neighbors can be seen in Table 1, which shows ten hashtags that people often use when posting tweets on Twitter. These ten hashtags are important because they have the highest number of connections to other hashtags: #corona, #Corona, #VirusCorona, #COVID19, #coronavirus, #dirumahaja, #covid19, #RememberingKhilafah, #Covid19, and #viruscorona. Letter case matters here, because the lowercase hashtag #corona and the capitalized hashtag #Corona are considered different hashtags.
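To see how much this case sensitivity affects the node set, the spellings can be grouped by their lowercase form. This is only a diagnostic sketch with a hypothetical function name; the study itself keeps the variants as separate nodes, and the input list is taken from the hashtags named above.

```python
def group_case_variants(hashtags):
    """Map each lowercase hashtag to the set of spellings observed for it."""
    variants = {}
    for tag in hashtags:
        variants.setdefault(tag.lower(), set()).add(tag)
    return variants


observed = ["#corona", "#Corona", "#COVID19", "#covid19", "#Covid19", "#dirumahaja"]
variants = group_case_variants(observed)
distinct_nodes = len(set(observed))   # nodes when case is kept, as in the study
merged_nodes = len(variants)          # nodes if case were ignored
```

For this sample, six case-sensitive nodes would collapse to three if case were ignored, which suggests that rankings such as Table 1 spread one topic's connections over several nodes.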
Closeness centrality indicates how close a hashtag is to all other hashtags in the network. The closeness centrality of the hashtags can be seen in Table 2. The 10 hashtags closest to all other hashtags in the network are #COVID19, #dirumahaja, #Corona, #coronavirus, #corona, #VirusCorona, #Covid_19, #Covid19, #Indonesia, and #COVID. Using these hashtags gives information related to COVID-19 the fastest flow through the network.
Betweenness centrality shows how often a node lies on the shortest paths between other nodes in the network. This value determines the role of a hashtag as a bridge connecting interactions in the network. The hashtags with the highest betweenness centrality values can be seen in Table 3. The 10 hashtags are #corona, #VirusCorona, #dirumahaja, #COVID19, #Corona, #coronavirus, #Corona, #covid19, #dirumahaja, and #Covid19. These hashtags sit on the communication channels and can control the flow of information related to COVID-19 on Twitter.
Eigenvector centrality measures a node's importance by considering the importance of its neighbors [26]. The hashtags with the highest eigenvector centrality values can be seen in Table 4. The 10 hashtags are #coronavirus, #corona, #Corona, #COVID19, #dirumahaja, #VirusCorona, #covid19, #Covid_19, #virus, and #COVID.
This section also visualizes the interaction network through which COVID-19 information is disseminated on Twitter. The visualization of the social network analysis can be seen in Figure 2. The interaction network is made up of 12,906 nodes and 50,349 edges.
The visualization of the interaction network shows that the patterns of information dissemination related to COVID-19 are strongly influenced by the hashtags #corona, #Corona, #VirusCorona, #COVID19, #coronavirus, and #dirumahaja. The central hashtag for the spread of COVID-19 information is #corona. Our experiment also revealed an interesting fact: the network through which COVID-19 information is disseminated on Twitter is strongly connected to hashtags from discussions of religious issues. This can be seen in Figure 3 and, in more detail, in Figure 4. Based on the network visualization, the hashtags #covid19 and #wabah have a substantial relationship with religious hashtags. These hashtags include #islam, #muslim, #tauhid, #nikah, #kajianislam, #sunnah, #hijrah, #dakwah, and #ramadhan.

Figure 3 The network model representation
We reveal this network structure, shown in Figure 4, by configuring the node size and network layout based on eigenvector centrality. Following the idea behind eigenvector centrality, that nodes connected to popular nodes become popular themselves, we can assume that religious posts on Twitter take advantage of the popularity of the #covid19 hashtag to increase their visibility. This is a common practice in social media marketing.
Social Network Analysis is usually used to determine the influence of actors in social media, as in studies [27], [28] and most previous studies. The results of this study show that Social Network Analysis can also be used to determine the influence of hashtags in social media. Some studies applied Social Network Analysis based on a particular hashtag, but the hashtag was only used to define the topic, as in studies [29], [30]; the analysis itself still determined the influence of actors. In this study, we used Social Network Analysis to determine the influence of hashtags on other hashtags, that is, the connections between hashtags, as in the study [31]. However, that study was carried out on the Instagram platform, whereas this study was carried out on the Twitter platform. With the approach defined in this study, we can understand which hashtags play an essential role in disseminating information on Twitter. The flow of information can therefore be controlled, especially when the information being spread has a negative influence.

CONCLUSIONS
This paper's main goal was to analyze the hashtags of Indonesian COVID-19 tweets using Social Network Analysis (SNA). We collected and analyzed 500,000 public tweets from Twitter based on COVID-19 keywords. We used SNA techniques to visualize network models and computed several centrality measures to find the most influential hashtags in the network. For each of the four measures (degree, closeness, betweenness, and eigenvector centrality), we obtained the ten hashtags with the highest scores. The hashtag #corona has the most connections with other hashtags. The hashtag #COVID19 is the most closely related to all other hashtags. The hashtag #corona acts most strongly as a bridge connecting hashtags in the network and can therefore control the flow of information related to COVID-19. The hashtag #coronavirus is the most important hashtag based on its links.
The interaction network of the hashtags is made up of 12,906 nodes and 50,349 edges. Based on the network visualization, the hashtags #covid19 and #wabah have a substantial relationship with religious hashtags such as #islam, #dakwah, and #ramadhan. An interesting topic for further study is therefore how a popular hashtag like #covid19 was used to increase the popularity of other topics that are actually unrelated.