Causal Relationships of Sexual Dysfunction Factors in Women Using S3C-Latent

Women with cancer are at risk for sexual dysfunction, characterized by problems with sexual desire, sexual arousal, lubrication, orgasm, sexual satisfaction, and pain during sexual intercourse. Our literature review shows that most studies have focused on correlation analysis between factors, and no study has attempted to identify causal relationships between the factors of sexual dysfunction. This study aims to determine the causal mechanism between the factors of sexual dysfunction in cancer patients using a causal algorithm called Stable Specification Search for Cross-Sectional Data With Latent Variables (S3C-Latent). The algorithm is implemented in the R package stablespec, and the model was computed in parallel on a CPU server. We found causal relationships and associations between sexual dysfunction factors with high reliability scores. We hope that the resulting causal model can serve as a scientific reference for doctors and health workers in decision-making, so that the quality of life of female cancer patients who experience sexual dysfunction can be improved.
Keywords—Sexual Dysfunction, Cancer, Causal modeling, S3C-Latent


INTRODUCTION
Cancer is one of the leading causes of death in the world, ranking second after cardiovascular disease [1]. The World Health Organization (WHO), as cited in the Virginia study [2], stated that the risk of new cancer cases has increased by 70% in developing countries. In Indonesia, the Special Region of Yogyakarta (DIY) has the highest cancer prevalence, at 4.1 per mille [2]. One of the psychological effects of cancer is a change in sexual identity [3], and sexuality is one of the areas most affected by cancer treatment [4]. Women, especially patients diagnosed with breast or gynecological cancer, are at risk of sexual dysfunction [4].
Sexual dysfunction is a condition in which a sexual disorder reduces or even eliminates a person's sexual enjoyment [5]. According to the WHO, as reviewed by Barbagallo et al. [6], women's sexuality is not only part of good health but also a human right. A woman with cancer is at risk for sexual dysfunction, indicated by problems with sexual desire, sexual arousal, lubrication, orgasm, sexual satisfaction, and pain during sexual intercourse [7]. Sexual disorders affect women of all ages, from youth to menopause. The prevalence of sexual dysfunction in women varies from country to country: 66.2% in Indonesia, 48.3% in Turkey, 72.8% in Ghana, and 63% in Nigeria [8]. These figures show that the prevalence of sexual dysfunction is relatively high in different parts of the world.
The study by Lauman et al. [9] found that Indonesia is one of the countries that do not recognize the importance of sexual problems. For some Indonesians, talking about sexual problems is taboo and confidential [10,11], and some people still feel embarrassed to discuss dysfunction with their doctor. Sexual dysfunction in women receives little attention because it is considered a health issue that does not affect daily life. On closer examination, however, the study of sexual disorders is very important, because sexual dysfunction can affect the relationship with the partner and the patient's quality of life [12][13][14]. Previous studies have predicted the determinants of sexual dysfunction or analyzed the correlations among its factors. For example, Chang, Yang, & Chen [15] used the Medical Outcomes Study SF-12, the Female Sexual Function Index (FSFI), the Multidimensional Body-Self Relations Questionnaire-Appearance Scale, the Relationship Assessment Scale, and the Greene Climacteric Scale; Putri [16]; Fahami et al. [17] used a cross-sectional quantitative approach; Pérez-Herrezuelo et al. [18] used the FSFI and the Menopause Rating Scale; and Pup et al. [19] used a practical approach.
However, studies of the causal relationships (causes and effects) between the factors of sexual dysfunction, particularly in women with cancer, are limited and have not answered one fundamental question: what is the causal mechanism (cause and effect) among these factors in women with cancer? Understanding this causal mechanism means understanding the important mechanisms behind sexual dysfunction in women with cancer. A causal model describes the fundamental interactions between variables or factors within the context of a problem, and causal modeling is important in many scientific domains [20]. In clinical psychology, for example, a causal model can be used as a scientific basis for developing a therapy. Therefore, in this study, we model the causal relationships between sexual dysfunction factors in women with cancer. We expect the resulting model to serve as a scientific reference for doctors and health workers in making decisions, and we hope that this research will improve the quality of life of female cancer patients with sexual dysfunction.

Dataset
The data in this study were taken from a previous study [21] and were collected from 172 female cancer patients at RSUP Dr. Sardjito Yogyakarta and Prof. Dr. Margono Soekarjo Purwokerto. The data were measured using the Female Sexual Function Index (FSFI), a multidimensional questionnaire developed by Rosen et al. [22]. The FSFI covers the last four weeks and consists of 6 factors measured by 19 indicators (question items): sexual desire (F1; 2 items), sexual arousal (F2; 4 items), lubrication (F3; 4 items), orgasm (F4; 3 items), sexual satisfaction (F5; 3 items), and pain (F6; 3 items). Table 1 shows the factor of each item and the scale ranges used. The rating scale for the sexual desire factor is 1-5: a score of 1 indicates that the respondent has rarely or never felt the urge to have sexual intercourse in the last four weeks, while a score of 5 indicates that she has always or almost always felt the desire to have sexual intercourse. The rating scale for the other factors (sexual arousal, lubrication, orgasm, sexual satisfaction, and pain) is 0-5. A score of 0 on any of these factors indicates that the respondent has not had sexual intercourse in the last four weeks, while a score of 5 indicates high sexual function or activity [22].
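To make the questionnaire structure concrete, the sketch below encodes the six factors, their item counts, and the score ranges described above. It assumes (Table 1 is not reproduced here) that the 19 items are numbered 1-19 in the standard FSFI order, so each factor's items are consecutive; the names `FSFI_FACTORS`, `item_range`, and `is_valid_score` are illustrative and not part of any package.

```python
# Hypothetical encoding of the FSFI structure described in the text.
# ASSUMPTION: items are numbered 1-19 in the standard FSFI order, so the
# 2/4/4/3/3/3 item split falls on consecutive item numbers.
FSFI_FACTORS = {
    "desire": [1, 2],                # F1
    "arousal": [3, 4, 5, 6],         # F2
    "lubrication": [7, 8, 9, 10],    # F3
    "orgasm": [11, 12, 13],          # F4
    "satisfaction": [14, 15, 16],    # F5
    "pain": [17, 18, 19],            # F6
}

# Reverse lookup: item number -> factor name.
ITEM_TO_FACTOR = {item: factor
                  for factor, items in FSFI_FACTORS.items()
                  for item in items}


def item_range(factor):
    """Allowed scores: 1-5 for sexual desire, 0-5 for every other factor."""
    return (1, 5) if factor == "desire" else (0, 5)


def is_valid_score(item, score):
    """True if `score` lies in the allowed range of the item's factor."""
    low, high = item_range(ITEM_TO_FACTOR[item])
    return low <= score <= high


if __name__ == "__main__":
    print(len(ITEM_TO_FACTOR))     # 19
    print(is_valid_score(1, 0))    # False: desire items start at 1
    print(is_valid_score(3, 0))    # True: 0 = no intercourse in 4 weeks
```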

Research Stage
In this research, the stages are as follows: literature study of previous studies, data pre-processing, causal modeling, evaluation, and dissemination. Figure 1 shows the research stages.

Figure 1 Research Stage
The research began with a literature study to obtain a more comprehensive picture of the problem, in which we searched for scientific studies related to sexual dysfunction in cancer patients. The information obtained from this literature study was then used as a reference to strengthen the arguments of this study. The output of this stage is the set of determinants of sexual dysfunction. The second stage is data pre-processing, in which the dataset is checked for noise, inconsistent data, and repeated (duplicate) data. The output of this stage is data that are ready for computation.
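The pre-processing checks named above can be sketched as a single pass over the records. This is a minimal illustration, not the procedure used in the study: it assumes each respondent is a dictionary mapping FSFI item numbers 1-19 to scores, treats a missing answer or a score outside the allowed range (1-5 for items 1-2, 0-5 for items 3-19) as inconsistent, and drops exact duplicate records. The function names are hypothetical.

```python
# Minimal pre-processing sketch (hypothetical record format: item -> score).
ITEMS = range(1, 20)  # FSFI items 1-19


def allowed(item):
    """ASSUMPTION: items 1-2 (sexual desire) use 1-5, all others 0-5."""
    return (1, 5) if item in (1, 2) else (0, 5)


def preprocess(records):
    """Drop duplicate, incomplete, and out-of-range records.

    Returns the clean records and a count of rejected records per reason.
    """
    seen = set()
    clean = []
    report = {"duplicate": 0, "missing": 0, "out_of_range": 0}
    for rec in records:
        key = tuple(sorted(rec.items()))      # hashable view of the record
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
        if any(rec.get(i) is None for i in ITEMS):
            report["missing"] += 1
            continue
        if any(not (allowed(i)[0] <= rec[i] <= allowed(i)[1]) for i in ITEMS):
            report["out_of_range"] += 1
            continue
        clean.append(rec)
    return clean, report


if __name__ == "__main__":
    good = {i: 3 for i in ITEMS}
    bad = dict(good)
    bad[1] = 0                     # a desire item cannot be 0
    incomplete = dict(good)
    incomplete[5] = None           # unanswered item
    clean, report = preprocess([good, dict(good), bad, incomplete])
    print(len(clean), report)
```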
The third stage is causal modeling, using the Stable Specification Search for Cross-Sectional Data with Latent Variables (S3C-Latent) method. Model computations are carried out in parallel with the R package stablespec [23] on a CPU server with 40 cores, 250 GB RAM, and 4 GPUs, accessed through a Jupyter GUI, a terminal/console, and a personal environment. The programming language used is R v4.0.2. The output of this stage is a causal model of the factors of sexual dysfunction. The next stage is evaluation, in which the causal model obtained is evaluated with experts in this field, such as doctors and health workers.
The final stage is dissemination, in which we implement the causal model obtained in an R Shiny application.

Causal Modeling with S3C-Latent
S3C-Latent is a development of the S3C method [24]. S3C is designed to model causal relationships between observed variables, while S3C-Latent is designed to model causal relationships between latent variables [24]. Latent variables are variables that cannot be measured directly but can be represented through relevant indicators. Latent variables are often referred to as factors, and the observed variables are often called indicators [25]. Specifically, S3C-Latent uses the Structural Equation Modeling (SEM) representation of latent variables. An SEM with latent variables consists of two models, namely the structural model and the measurement model [24][25][26][27]. The structural model shows the relationships between factors and reads

$$\eta = B\eta + \Gamma\xi + \zeta \qquad (1)$$

where $\eta$ is an $m \times 1$ vector of latent endogenous variables (effects), $\xi$ is an $n \times 1$ vector of latent exogenous variables (causes), $B$ is the $m \times m$ coefficient matrix among the endogenous latent variables, $\Gamma$ is the $m \times n$ coefficient matrix relating $\xi$ to $\eta$, and $\zeta$ is the $m \times 1$ error vector of $\eta$. The measurement model shows the relationships between factors and indicators and reads

$$y = \Lambda_y \eta + \varepsilon, \qquad x = \Lambda_x \xi + \delta \qquad (2)$$

where the matrices $\Lambda_y$ and $\Lambda_x$ contain the coefficients (factor loadings) associating the latent variables with the indicators $y$ and $x$, and the vectors $\varepsilon$ and $\delta$ contain the errors of the indicators. Next, the parameters are estimated. The parameter set $\theta$ determines the model-implied covariance matrix of the indicators [24,25], and estimation searches for the $\theta$ that best reproduces the observed covariances:

$$\Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \Sigma_{yx}(\theta) \\ \Sigma_{xy}(\theta) & \Sigma_{xx}(\theta) \end{pmatrix} \qquad (3)$$

where $\Sigma(\theta)$ is the covariance matrix of the indicators written as a function of the parameters $\theta$, $\Sigma_{yx}(\theta) = \Sigma_{xy}(\theta)^\top$ is the covariance matrix between the indicators $y$ and $x$, and $\Sigma_{yy}(\theta)$ and $\Sigma_{xx}(\theta)$ are the covariance matrices of the indicators $y$ and $x$. With $\Phi = \mathrm{Cov}(\xi)$, $\Psi = \mathrm{Cov}(\zeta)$, $\Theta_\varepsilon = \mathrm{Cov}(\varepsilon)$, $\Theta_\delta = \mathrm{Cov}(\delta)$, and $A = (I - B)^{-1}$, the blocks follow from (1) and (2) as $\Sigma_{yy} = \Lambda_y A(\Gamma\Phi\Gamma^\top + \Psi)A^\top\Lambda_y^\top + \Theta_\varepsilon$, $\Sigma_{yx} = \Lambda_y A\Gamma\Phi\Lambda_x^\top$, and $\Sigma_{xx} = \Lambda_x\Phi\Lambda_x^\top + \Theta_\delta$. Figure 2 is an example of an SEM with three latent variables [24,25]: $\xi_1$, $\xi_2$, and an endogenous latent variable $\eta$, each measured by three indicators.
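Equation (3) can be evaluated numerically. The NumPy sketch below computes the model-implied covariance from the parameter matrices of (1) and (2) using the standard LISREL-type block expressions (with Φ, Ψ, Θε, Θδ the covariance matrices of ξ, ζ, ε, δ), for a Figure-2-like model in which ξ1 and ξ2 cause η and each latent variable has three indicators. All numeric values are made up for illustration, and `implied_covariance` is a hypothetical name, not part of stablespec.

```python
import numpy as np


def implied_covariance(Ly, Lx, B, G, Phi, Psi, Te, Td):
    """Model-implied covariance of the indicators (y, x) for the SEM in (1)-(2).

    Ly, Lx: factor loadings; B, G: structural coefficients (B and Gamma);
    Phi = Cov(xi), Psi = Cov(zeta), Te = Cov(epsilon), Td = Cov(delta).
    """
    m = B.shape[0]
    A = np.linalg.inv(np.eye(m) - B)             # (I - B)^{-1}
    cov_eta = A @ (G @ Phi @ G.T + Psi) @ A.T    # Cov(eta)
    Syy = Ly @ cov_eta @ Ly.T + Te
    Syx = Ly @ A @ G @ Phi @ Lx.T
    Sxx = Lx @ Phi @ Lx.T + Td
    return np.block([[Syy, Syx], [Syx.T, Sxx]])


if __name__ == "__main__":
    # Illustrative values: xi1 -> eta, xi2 -> eta, three indicators each.
    Ly = np.array([[1.0], [0.8], [0.7]])           # first loading fixed to 1
    Lx = np.array([[1.0, 0.0], [0.9, 0.0], [0.6, 0.0],
                   [0.0, 1.0], [0.0, 0.7], [0.0, 0.5]])
    B = np.zeros((1, 1))                           # no eta -> eta effects
    G = np.array([[0.5, 0.3]])                     # Gamma
    Phi = np.array([[1.0, 0.2], [0.2, 1.0]])
    Psi = np.array([[0.4]])
    Te, Td = 0.3 * np.eye(3), 0.3 * np.eye(6)      # diagonal error covariances
    Sigma = implied_covariance(Ly, Lx, B, G, Phi, Psi, Te, Td)
    print(Sigma.shape)                             # (9, 9)
    print(round(Sigma[0, 0], 6))                   # 1.1 = Var(eta) + 0.3
```

Ordering the y indicators before the x indicators matches the block layout of (3).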
SEM parameters can only be estimated when the so-called identification conditions are fulfilled. The conditions for SEM with latent variables that we consider are: (1) there are at least three indicators per latent variable; (2) each row of $\Lambda_y$ and $\Lambda_x$ has only one nonzero element; (3) each latent variable is scaled; and (4) the error covariance matrices $\Theta_\varepsilon$ and $\Theta_\delta$ are diagonal [24,25]. The first condition is relaxed when a latent variable has fewer than three indicators. If a latent variable has two indicators, it must have a causal relationship with another latent variable [25,28]. If a latent variable has only one indicator, the error of that indicator is set to zero [29]. The parameters are then estimated by maximum likelihood [25]. Maximum likelihood estimation starts from initial (for example, random) parameter values and updates them iteratively so that they improve according to a specified criterion, namely the cost function. The maximum likelihood fitting function to be minimized is

$$F_{ML}(\theta) = \log\lvert\Sigma(\theta)\rvert + \operatorname{tr}\!\left(S\,\Sigma(\theta)^{-1}\right) - \log\lvert S\rvert - p \qquad (4)$$

where $F_{ML}$ is the maximum likelihood fit, $\theta$ is the parameter set, $p$ is the number of observed variables, and $S$ is the sample covariance matrix of the observed variables.
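Equation (4) translates directly into code. The sketch below, assuming NumPy, evaluates the maximum likelihood fit for a candidate model-implied covariance matrix and a sample covariance matrix; log-determinants come from `numpy.linalg.slogdet` for numerical stability. It only evaluates the cost function: an actual estimator would minimize it over the parameters, which is not shown. The simulated data are arbitrary.

```python
import numpy as np


def f_ml(Sigma, S):
    """Maximum likelihood fit function of equation (4).

    Sigma: model-implied covariance matrix Sigma(theta), shape (p, p).
    S: sample covariance matrix of the observed variables, shape (p, p).
    """
    p = S.shape[0]
    sign_sigma, logdet_sigma = np.linalg.slogdet(Sigma)
    sign_s, logdet_s = np.linalg.slogdet(S)
    if sign_sigma <= 0 or sign_s <= 0:
        raise ValueError("covariance matrices must be positive definite")
    # tr(S Sigma^{-1}) equals tr(Sigma^{-1} S); solve avoids an explicit inverse
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return logdet_sigma + trace_term - logdet_s - p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(172, 3)) @ np.array([[1.0, 0.5, 0.2],
                                              [0.0, 1.0, 0.4],
                                              [0.0, 0.0, 1.0]])
    S = np.cov(X, rowvar=False)
    print(f_ml(S, S))            # ~0: a perfect fit
    print(f_ml(np.eye(3), S))    # > 0: the identity matrix fits worse
```

The function is zero exactly when the implied and sample covariance matrices coincide and positive otherwise, which is why minimizing it drives Σ(θ) toward S.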

S3C-Latent Procedure
Model computations are carried out in parallel with the R package stablespec [23]. The stablespec package searches for the causal model that best fits the dataset. Specifically, S3C-Latent represents causal relationships with a latent-variable SEM and evaluates each model on two criteria, namely model fit and model complexity. Because these two criteria often conflict, S3C-Latent uses a multi-objective approach, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [30], to find the Pareto-optimal (best) models. In addition, to handle the instability caused by finite data, S3C-Latent adopts the concept of stability selection by repeating the search on random subsets of the data. The purpose of using subsets is to obtain a model that is robust to changes in the data. Each Pareto-optimal DAG model is then transformed into a CPDAG (Completed Partially Directed Acyclic Graph) [24,30]. Figure 3 shows the S3C method.
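The notion of Pareto optimality used by NSGA-II can be illustrated with a small, self-contained version of fast non-dominated sorting over two objectives that are both minimized, here assumed to be a model-fit score and the model complexity. The objective values in the example are invented, and the function names are illustrative rather than the stablespec API.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Split solutions into fronts (lists of indices); front 0 is the Pareto front."""
    n = len(points)
    dominated = [[] for _ in range(n)]   # dominated[p]: solutions that p dominates
    counts = [0] * n                     # counts[p]: solutions that dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # (fit score, complexity) for five hypothetical models
    models = [(10.0, 3), (8.0, 5), (12.0, 3), (8.0, 6), (15.0, 2)]
    print(fast_non_dominated_sort(models))   # [[0, 1, 4], [2, 3]]
```

Model 2 is dominated by model 0 (worse fit at the same complexity) and model 3 by model 1, so they fall into the second front.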

Figure 3 S3C Method [24]
The S3C method consists of two phases, namely the search phase and the visualization phase [31]. The search phase is an iterative process with an inner loop and an outer loop that combines SEM, NSGA-II, and stability selection to find the right model. The visualization phase displays the relevant relationships of the causal model, represented as a Directed Acyclic Graph (DAG) [24].

Pseudocode of S3C (excerpt)
3: for j ← 0, …, S−1 do
4:   D′ ← subset of D with size |D|/2 without replacement
5:   …
6:   for i ← 0, …, I−1 do
7:     if i = 0 then
8:       Pop ← N random DAGs consistent with the prior knowledge
9:       F ← fastNonDominatedSort(Pop)
10:    else
11:      Pop ← crowdingDistanceSort(F)
12:    end if
13:    Off ← make population from Pop
14:    F ← fastNonDominatedSort(Pop ∪ Off)

In the S3C pseudocode, the inner loop corresponds to lines 6-16, the outer loop to lines 3-18, and the stability graph to lines 19-22. The inner loop finds the Pareto front by applying NSGA-II. In lines 7-12, the population Pop of size N is either generated randomly (first iteration) or obtained from the previous fronts by crowding-distance sorting. Each model is represented by a binary vector with elements in {0,1}. In line 13, a new population Off is created from Pop using binary tournament selection, one-point crossover, and one-bit flip mutation on the binary representations. Binary tournament selection is repeated N times to fill the mating pool, each time keeping the better of two models: the one on the lower front or, on the same front, the one with the larger crowding distance. One-point crossover takes two models from the mating pool and exchanges their bits after a randomly chosen crossover point, and one-bit flip mutation then flips each bit with a small probability. Pop and Off are then combined (line 14) and sorted with fast non-dominated sorting, which yields a set of fronts F. Line 15 updates the Pareto fronts.
In the outer loop, line 4 draws a random subset D′ of size |D|/2 from the data set D without replacement [32]. The inner loop (lines 6-16) then runs I times on this subset to obtain a Pareto front, and the result is stored in line 17. After all S outer-loop iterations, the stored results contain the Pareto fronts of all subsets. Here, I is the number of inner-loop iterations and S is the number of outer-loop iterations (subsets).
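The outer loop's resampling step can be sketched in a few lines. In this sketch, assuming rows are indexed from 0, each of the S subsets contains half of the rows drawn without replacement (86 of 172 here), and `inner_search` is a hypothetical placeholder for the NSGA-II inner loop; passing `len` simply reports the subset size.

```python
import random


def half_subsets(n_rows, n_subsets, seed=0):
    """Draw n_subsets random row subsets of size n_rows // 2, each without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_rows), n_rows // 2))
            for _ in range(n_subsets)]


def outer_loop(data, n_subsets, inner_search, seed=0):
    """Run a (hypothetical) inner search on every half-size subset and store results."""
    stored = []
    for rows in half_subsets(len(data), n_subsets, seed):
        subset = [data[r] for r in rows]
        stored.append(inner_search(subset))   # e.g. the Pareto front found
    return stored


if __name__ == "__main__":
    data = list(range(172))                   # stand-in for 172 respondents
    fronts = outer_loop(data, n_subsets=150, inner_search=len)
    print(len(fronts), set(fronts))           # 150 {86}
```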
The stability graph is the main output of S3C [31] and can be visualized as a graph with nodes and edges. Lines 19-22 convert each DAG in the stored Pareto fronts into a CPDAG using the consDAG2Cpdag algorithm and then compute the edge stability and causal path stability graphs.
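Edge stability and causal path stability can be illustrated with a simplified computation. The sketch assumes that each stored model is a fully directed graph (a DAG rather than a CPDAG, so undirected CPDAG edges are not handled) paired with its complexity, and that the selection probability at complexity k is the fraction of stored models with complexity k that contain the relation. An edge counts in either direction, whereas a causal path must be directed. The example models and variable names are invented.

```python
from collections import defaultdict


def has_path(edges, src, dst):
    """True if a directed path of length >= 1 leads from src to dst."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def stability(models, a, b):
    """Edge and causal path selection probability of (a, b) per complexity.

    models: list of (complexity, set of directed edges (u, v)).
    Returns {complexity: (edge_probability, path_probability)}.
    """
    by_complexity = defaultdict(list)
    for k, edges in models:
        by_complexity[k].append(edges)
    result = {}
    for k, graphs in sorted(by_complexity.items()):
        edge = sum((a, b) in g or (b, a) in g for g in graphs) / len(graphs)
        path = sum(has_path(g, a, b) for g in graphs) / len(graphs)
        result[k] = (edge, path)
    return result


if __name__ == "__main__":
    models = [
        (1, {("A", "B")}),
        (1, {("B", "A")}),
        (2, {("A", "B"), ("B", "C")}),
        (2, {("A", "D"), ("D", "C")}),
    ]
    print(stability(models, "A", "B"))   # {1: (1.0, 0.5), 2: (0.5, 0.5)}
    print(stability(models, "A", "C"))   # {1: (0.0, 0.0), 2: (0.0, 1.0)}
```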
Pseudocode of S3C-Latent (excerpt)
1: procedure S3C-Latent(data set D, constraint K, factor loading Λ)
2:   To ensure the identification conditions are fulfilled:
3:   if Λ indicates that any latent L has fewer than 3 indicators then
4:     if the number of indicators of L = 2 then
5:       Set a relation between L and one random latent
6:       Set one of the factor loadings on L to 1
7:     else
8:       Set the factor loading on L to 1
9:       Set the error on the indicator of L to 0

The S3C-Latent pseudocode differs from the S3C pseudocode mainly in the data it models: S3C-Latent works with latent variables, so the model it generates differs from the model generated by S3C [24].
Here, D is the data set of the indicators of the latent variables, K is the prior knowledge (constraint), and Λ is the factor-loading matrix that links each indicator to its latent variable or factor. Lines 2-13 ensure that the identification conditions described in the Causal Modeling with S3C-Latent section are fulfilled. In more detail, line 3 checks whether any latent variable has fewer than three indicators; if so, the nested check that starts at line 4 is performed. Line 4 checks whether the number of indicators is 2 or 1. If a latent variable L has two indicators, S3C-Latent adds a relationship between L and a randomly chosen latent variable (line 5), where L can be either the cause or the effect, and fixes one of the factor loadings of L to 1 (line 6). If L has one indicator, S3C-Latent sets its factor loading to 1 and the error of that indicator to 0 (lines 8 and 9). When all latent variables have three or more indicators, line 12 is applied, and one factor loading of each latent variable is set to 1. Finally, line 14 applies S3C to the data set D with the constraint K, now that the model identification conditions are fulfilled. S3C-Latent thereby ensures that every SEM generated and repaired during the search is consistent with the prior knowledge K [24].
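Lines 2-13 can be mirrored by a small function that decides, for each latent variable, which loading to fix, which indicator errors to set to zero, and which extra relations to add. This is a hypothetical sketch (`ensure_identification` is not a stablespec function); it assumes the first listed indicator is the one whose loading is fixed to 1, and it returns each randomly added relation as an unordered pair because the latent variable may be either the cause or the effect.

```python
import random


def ensure_identification(indicators, rng=None):
    """Sketch of the identification step (lines 2-13 of S3C-Latent).

    indicators: dict mapping each latent variable to its indicator names.
    Returns (fixed_loadings, zero_error_indicators, added_relations).
    """
    rng = rng or random.Random(0)
    # Condition (2): every indicator must load on exactly one latent variable.
    all_items = [item for items in indicators.values() for item in items]
    if len(all_items) != len(set(all_items)):
        raise ValueError("an indicator loads on more than one latent variable")
    latents = list(indicators)
    fixed_loadings = {}   # latent -> indicator whose loading is fixed to 1
    zero_errors = []      # indicators whose error is fixed to 0
    added = []            # {latent, partner} pairs that must be related
    for latent, items in indicators.items():
        if not items:
            raise ValueError(f"latent {latent!r} has no indicators")
        fixed_loadings[latent] = items[0]          # scale the latent variable
        if len(items) == 2:
            partners = [other for other in latents if other != latent]
            if not partners:
                raise ValueError("a two-indicator latent needs another latent")
            added.append(frozenset((latent, rng.choice(partners))))
        elif len(items) == 1:
            zero_errors.append(items[0])
    return fixed_loadings, zero_errors, added


if __name__ == "__main__":
    demo = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1"]}
    print(ensure_identification(demo))
```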

RESULTS AND DISCUSSION
The demographic characteristics of the respondents are described in terms of age, marital status, education, occupation, type of treatment, duration of illness, income, living arrangement, and type of therapy. Table 2 shows that the largest group of respondents was aged 51-60 years (36.6%), most were married (86.0%), and more than half lived with their husbands and children (55.2%). The most common level of education was elementary school (44.8%), the most common occupation was housewife (43.6%), and most respondents had a low income of <1,461,400 (85.5%). The most common duration of illness was 3-6 months (32.6%), the most common type of treatment was outpatient care (50.6%), and the most common type of therapy was surgery combined with chemotherapy (59.3%).
Table 2 Demographic Characteristics of Respondents (n=172)
The steps used to obtain the causal modeling results are shown in Figure 4. The data consist of 172 respondents, with six variables (factors) and 19 indicators. The 19 indicators were first checked for multicollinearity: indicators with too high a correlation (close to 1) indicate multicollinearity, which is a problem for the model [33], so they are deleted. In this study, multicollinearity was found for FSFI 4, FSFI 8, FSFI 16, FSFI 17, and FSFI 18, so these indicators were deleted, leaving 14 indicators.
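A correlation-based multicollinearity screen like the one described above can be sketched with NumPy. The cutoff of 0.9 and the rule of dropping the later indicator of each highly correlated pair are assumptions for illustration (the text only states that correlations close to 1 indicate a problem), and the data below are simulated rather than the study data.

```python
import numpy as np


def flag_collinear(X, names, threshold=0.9):
    """Return indicators to drop: for each pair whose absolute Pearson
    correlation exceeds the threshold, drop the later indicator."""
    R = np.corrcoef(X, rowvar=False)
    dropped = []
    for j in range(len(names)):
        for i in range(j):
            if names[i] not in dropped and abs(R[i, j]) > threshold:
                dropped.append(names[j])
                break
    return dropped


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=172)
    b = a + 0.01 * rng.normal(size=172)   # nearly a copy of a
    c = rng.normal(size=172)
    X = np.column_stack([a, b, c])
    print(flag_collinear(X, ["a", "b", "c"]))   # ['b']
```

Skipping indicators that are already dropped keeps a chain of near-duplicates from removing more items than necessary.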
The next step is the S3C-Latent computation, which starts with parameter settings. The parameters of S3C-Latent are the number of subsets (S), the number of iterations (I), the number of models evaluated (P), the crossover probability (C), and the mutation probability (M) [25].
In this study, we used the parameter settings S = 150, I = 50, P = 150, C = 0.45, and M = 0.01. All computed results are then analyzed with the S3C-Latent method and represented as a stability graph (see Figure 5). The stability graph plots the edge stability (blue lines) and the causal path stability (green lines for causal paths of length one and red lines for causal paths of any length) between two variables. Figure 5 shows the stability graph for several pairs of variables. The X-axis represents model complexity, while the Y-axis represents selection probability. The horizontal line is the selection-probability threshold (the threshold argument of the stablespec function), while the vertical line is the model-complexity limit at which the best (lowest) Bayesian Information Criterion (BIC) score is found.
Stability selection uses two thresholds: a selection-probability threshold and a model-complexity threshold. The selection-probability threshold is the minimum selection probability, while the complexity threshold limits model complexity to control overfitting. A causal model is relevant if it is stable and parsimonious. A relationship (from edge stability or causal path stability) is stable if its selection probability is greater than or equal to the selection-probability threshold, and parsimonious if this happens at a model complexity less than or equal to the complexity threshold. We set the selection-probability threshold to 0.6, and the complexity threshold was found at model complexity 12. The figure thus shows causal relationships between variables as well as strong associations with unknown direction. The blue lines show association relationships between variables, the green lines represent causal relationships of length 1, and the red lines represent causal relationships of any length. Figures 6 and 7 show the edge stability and causal path stability, respectively, with a selection-probability threshold of 0.6 and a complexity threshold of 12. Causal path stability considers only causal relationships (directed paths of any length), while edge stability considers all relationships regardless of direction. Figure 6 shows three relevant edges in the upper-left region, while Figure 7 shows ten relevant causal paths. Furthermore, Figure 8 visualizes the relevant model, which is obtained in two steps: first, the nodes are connected according to the relevant edges obtained; second, the edges are adjusted according to the added prior knowledge [24]. Based on Figure 8, the causal model consists of two parts. The first part describes the causal relationships between the latent variables sexual desire, sexual arousal, lubrication, orgasm, sexual satisfaction, and pain. These variables are latent (represented as circular vertices) and are measured through the observed indicators.
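The relevance rule can be written as a small filter. Assuming each relation's stability curve is stored as a mapping from model complexity to selection probability, a relation is kept when its probability reaches the selection threshold (0.6 here) at some complexity not exceeding the complexity threshold (12 here), that is, when the curve enters the upper-left region of the stability graph. The relation labels and curve values below are invented, and the function names are hypothetical.

```python
def is_relevant(curve, sel_threshold=0.6, complexity_threshold=12):
    """curve: dict mapping model complexity -> selection probability."""
    return any(prob >= sel_threshold
               for k, prob in curve.items() if k <= complexity_threshold)


def relevant_relations(curves, sel_threshold=0.6, complexity_threshold=12):
    """Names of relations that are both stable and parsimonious."""
    return sorted(name for name, curve in curves.items()
                  if is_relevant(curve, sel_threshold, complexity_threshold))


if __name__ == "__main__":
    curves = {
        "A->B": {5: 0.3, 10: 0.7, 15: 0.9},   # stable by complexity 10
        "B->C": {5: 0.1, 12: 0.5, 20: 0.8},   # stable only beyond 12
        "A-C": {3: 0.65},                     # association, stable early
    }
    print(relevant_relations(curves))         # ['A->B', 'A-C']
```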
Arrows represent causal relationships, and dotted lines indicate relationships whose cause-effect direction is unclear. In addition, the double-line arrows from each latent variable to its indicators reflect the effect of the latent variable on the indicators.
Computation was carried out in parallel on 40 cores and took an estimated 1 hour. Selection-probability thresholds between 0.6 and 0.9 tend to give very similar and stable results [25,32]; in this case, we used 0.6. The computation yields strong causal relationships and associations with high reliability scores.
Based on Figure 8, all the other variables (sexual desire, sexual arousal, lubrication, orgasm, and pain) have a causal effect on sexual satisfaction, with reliability scores of 1, 0.64, 0.65, 0.6, … This agrees with a previous study [34], which explains that lack of sexual desire, lack of sexual arousal, lack of lubrication, decreased orgasm, and pain during sex affect sexual satisfaction. Sexual arousal is found to influence lubrication and orgasm, with reliability scores of 0.82 and 0.60, respectively. A similar relationship was found by Levin [35], who explains that sexual arousal leads to lubrication and orgasm [3]. Lubrication is found to affect pain, and pain in turn affects sexual desire. This matches the studies of Falk & Dizon [36] and Prastiwi, Niman, & Susilowati [37], which explain that pain during intercourse occurs when there is no lubrication and results in an unwillingness to have sexual intercourse.
In addition to the causal relationships, there are strong associations between desire and arousal, desire and orgasm, and arousal and pain, with reliability scores of 0.86, 0.80, and 0.80, respectively. The association between desire and arousal agrees with the research of Gonz et al. [38] and Holt & Lyness [39], who explained that reduced sexual desire leads to decreased sexual arousal. The association between sexual desire and orgasm is in line with a previous study by Hurlbert [40], which suggested that the inability of women to reach orgasm may be partly due to low sexual desire. Furthermore, the association between arousal and pain is consistent with the study by Dewitte & Schepers [41], which states that sexual arousal is a prerequisite for painless intercourse. Next, we implement the results of the causal modeling in an R Shiny app.

Figure 9 Implementation of the Results in R Shiny
The results menu is built with the fluidRow() function. A fluid row determines how many boxes are placed in one row, so that the inserted images are arranged neatly.

CONCLUSIONS
In many fields, such as medicine, it is of great interest to model causal relationships between latent variables measured through indicators. In this study, we used a new causal method called Stable Specification Search for Cross-sectional Data with Latent Variables (S3C-Latent) to model causal relationships between latent variables. The main objective of S3C-Latent is to solve the instability problem inherent in model estimation. To achieve this, S3C-Latent incorporates the concept of stability selection into a multi-objective optimization problem and jointly optimizes over the entire range of model complexity, resulting in Pareto-optimal models. In this study, we found causal relationships from sexual desire to sexual satisfaction, sexual arousal to sexual satisfaction, lubrication to sexual satisfaction, orgasm to sexual satisfaction, pain to sexual satisfaction, sexual arousal to lubrication, sexual arousal to orgasm, lubrication to orgasm, lubrication to pain, and pain to sexual desire. Apart from these causal relationships, we also found strong associations between sexual desire and sexual arousal, sexual desire and orgasm, and sexual arousal and pain. All of the estimated causal relationships are corroborated by the findings of previous studies [3], [34][35][36][37][38][39][40][41]. For future work, we suggest adding more demographic variables to the model and examining how the causal model extends.