DEVELOPMENT AND VALIDATION OF AN INSTRUMENT TO MEASURE THE CHARACTERISTICS OF A HEALTHY ROLE-MODEL IN MEDICAL SCHOOL

Background: A medical teacher who is a healthy role-model plays a critical role in supporting effective health promotion in medical school. However, no instrument is available to measure the characteristics of the medical teacher as a healthy role-model. This study aimed to develop and validate a questionnaire to evaluate these characteristics, based on a model derived from our previous grounded theory study. Methods: A total of 442 medical teachers at the Faculty of Medicine, Public Health, and Nursing, Universitas Gadjah Mada, Yogyakarta, were invited to participate. We used hierarchical component models (HCMs) to develop our path model, which was then analyzed using partial least squares structural equation modeling (PLS-SEM). Results: Twenty-six items from seven constructs support our model. The socially healthy (SH) construct has the most substantial effect on the healthy person construct (H). The healthy role-model (HRM) construct in medical schools is mainly influenced by healthy person characteristics (H). Conclusion: A questionnaire with 26 items grouped into seven constructs showed good reliability and validity. The seven constructs are relevant to the characteristics of a healthy role-model in the medical school model.


INTRODUCTION
The medical school is one of the health promotion settings that the World Health Organization (WHO) has addressed since 1995. By implementing health promotion, a medical school should provide a safe environment and culture that enable people to increase control over and improve their health. 1 The Health-Promoting University (HPU) is an initiative that medical schools can use to fulfill this goal. HPU implementation has six main focuses, one of which is to produce good role models of health promotion. 1,2 In our previous study, 3 we described such a person as a healthy role-model. In that study, we conducted a grounded theory study to define the characteristics of healthy role-models in medical school because no theoretical concept had been found to explain this construct.
The importance of implementing HPU is also reinforced by the clause "I will attend to my own health, well-being, and abilities, in order to provide care of the highest standard" in the modern physician's pledge adopted by the World Medical Association. 4 Faculty staff and students are expected not only to be healthy themselves but also to act as agents of change in society by serving as healthy role models. 1 In medical school, the medical teacher is a critical component of the educational environment. 5 Many publications have shown that teachers' role-modeling is an effective method for teaching professional behaviors to medical students. [6][7][8][9][10] This effectiveness might also apply to the development of healthy physician characteristics.
However, the characteristics of the medical teacher as a healthy role model in medical school are difficult to evaluate because no instrument exists to measure them. Therefore, in this study, we developed and validated an instrument to measure these characteristics based on the results of our previous grounded theory study.

Study context
This research was conducted at the Faculty of Medicine, Public Health, and Nursing, Universitas Gadjah Mada (FK-KMK UGM), Yogyakarta. The study population comprised 546 medical teachers at FK-KMK UGM. One hundred four medical teachers were excluded because they had retired or because their contact information (i.e., an electronic mail (e-mail) address or phone number) obtained from the Human Resources Division Unit of FK-KMK UGM was incomplete. The remaining 442 medical teachers from 32 departments were invited to complete the questionnaire measuring the characteristics of a healthy role model in medical school.

Development of items
Forty items from our previous grounded theory study 3 on healthy role-models in medical schools and one item from a literature review on the characteristics of healthy people 11 formed the item pool for this study. All items were then written following the guidelines on developing questionnaires by Artino et al. 12 We deliberately departed from one recommendation of these guidelines by writing the items as statements rather than questions, because we were concerned that questions about these personal characteristics might irritate our respondents. Responses were given on a four-point Likert scale ranging from "not very appropriate" to "very appropriate". A four-point scale, which has no middle option, was used to prevent respondents from choosing a neutral middle response.

Conducting expert validation
We selected a panel of experts using these criteria: having a background in the health professions, working as a teacher at a health professions school, and having an interest in topics related to role models, role-modeling, or health promotion and education. Twelve experts met our criteria: ten from Indonesia and two from overseas. All items were then translated into English. Two experts with a background in health professions education helped check whether any translated item differed in meaning from the original Indonesian item. Revisions were then made based on their review.
Six of the 12 experts participated in this process, all of them from Indonesia. To increase the number of experts, we also invited 40 respondents from our previous grounded theory study, all of whom met the same criteria as the experts. Fifteen of the 40 accepted our invitation; they came from Indonesia, Australia, the Netherlands, the United Kingdom, and Canada. In the final step, we used 21 responses: 6 from the experts and 15 from these respondents.
All of their responses were recorded as expert validation. We used Lawshe's content validity ratio (CVR). 13 For 21 expert responses, the minimum CVR value is 0.42 (p < 0.05). Fifteen items were eliminated because they did not meet this minimum CVR value, and all remaining items were revised based on the experts' reviews.
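Lawshe's ratio is CVR = (n_e - N/2)/(N/2), where n_e is the number of experts rating an item as essential and N is the number of experts. The Python sketch below illustrates this screening step; it assumes each expert's rating has already been dichotomized into "essential" versus "not essential", and the function names and item counts are hypothetical. With N = 21 and the critical value of 0.42 reported above, an item needs at least 15 "essential" ratings to be retained.

```python
import math


def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half


def min_essential_votes(n_experts: int, critical_cvr: float) -> int:
    """Smallest n_e whose CVR reaches the critical value.

    CVR >= c  <=>  n_e >= (N/2) * (1 + c); rounding guards against float noise.
    """
    return math.ceil(round(n_experts / 2 * (1 + critical_cvr), 9))


# Hypothetical counts of "essential" ratings from 21 experts for three items
essential_counts = {"item_A": 19, "item_B": 15, "item_C": 12}
for item, n_e in essential_counts.items():
    cvr = content_validity_ratio(n_e, 21)
    decision = "retain" if cvr >= 0.42 else "eliminate"
    print(f"{item}: CVR = {cvr:.3f} -> {decision}")

print(min_essential_votes(21, 0.42))  # 15
```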
In this step, all experts were given 14 days to review the items, and experts who had not yet responded were given an additional 20 days. All invitations were sent by e-mail or WhatsApp, and we used a Google Form to facilitate the expert validation process. A blank box below each item allowed the experts to comment on that item, and a blank box at the end of the form allowed them to give general comments about the items and the questionnaire.

Cognitive interview
This step aims to test the questionnaire. It is based on the psychological theory that four cognitive processes occur when a person responds to a survey: comprehension, retrieval of appropriate information from long-term memory, judgment based on comprehension, and selection of a response. 14 This process allows researchers to assess whether respondents interpret each item correctly. 15,16 The feedback obtained can be used to identify respondents' misinterpretations of each item or of the response scale. 16 Two techniques are commonly used in cognitive interviews: the think-aloud technique and verbal probing. In this study, we used immediate retrospective probing, a verbal probing technique developed by Watt, 17 which places "breaking points" in the questionnaire. When the respondent reaches a breaking point, the interviewer asks several questions about the preceding items. This method allows the interviewer to explore the respondents' thinking without disturbing their concentration while answering the items, and it can also minimize recall bias and hindsight effects.
In general, cognitive interviews are conducted face to face. In this study, however, we used phone communication because the interviewer and respondents were in different locations. We used six probing questions adapted from Willis and Artino 18 in this step: 1) What do you think when you are asked about (item)?; 2) What does the word (a word in the item) mean to you?; 3) What information do you need to know whether that item applies to you?; 4) Are you sure about the answer you gave for the item?; 5) Do you find it difficult to answer (item)?; and 6) Is the response scale appropriate for the item? These six questions represent the four cognitive processes of the psychological theory. Each item was then explored further with 'what' and 'why' follow-up questions. We divided the cognitive interview into 12 sections, each consisting of one to six related items.
We conducted eight cognitive interview sessions. All respondents were medical teachers from medical schools in Indonesia: six worked at medical schools in the western region of Indonesia and two at medical schools in the eastern region. After each session, we revised the items based on the respondent's review and used the revised items in the next session with a different respondent. We also asked each subsequent respondent to comment on the items before and after revision. Each cognitive interview lasted approximately 25 to 80 minutes.
In this process, we also checked the readability of our questionnaire. We asked our respondents to record the time they needed to complete the questionnaire; the average completion time was 30 minutes. The overall form of the questionnaire was also evaluated, and we corrected the questionnaire based on the respondents' reviews. Based on these reviews, we developed 35 items from the 26 remaining items. One of the 35 items, PH3 ("I can carry out my daily activity without limitation caused by suffering a certain disease") in the PH construct, is a conditional item: it is answered only if the respondent chooses a specific response to the previous question. We consulted two experts in medical education about these item developments and revisions. They helped us evaluate whether the 35 items had the same meaning as the previous 26 items and confirmed that the questionnaire was ready for data collection.

Data collection
We sent an invitation letter to the medical teachers in each department after the head of the department approved it, and we followed each department's policy when distributing the questionnaire link. Data were collected from January 2019 to January 2020.

Model analysis
Our model is a higher-order model, or hierarchical component model (HCM), in the context of PLS-SEM, in which the lower-order components are reflective. The model has one second-order construct, three exogenous latent variables, and one endogenous latent variable, with a total of 35 items. The second-order construct is healthy person (H). It has four first-order constructs: physically healthy (PH) with three reflective items, mentally healthy (MH) with 12 reflective items, socially healthy (SH) with seven reflective items, and spiritually healthy (SpH) with two reflective items. Among the exogenous latent variables, internalized healthy behaviors (IHB) has six reflective items, willingness to promote healthy lifestyles (WTP) has two, and lifelong learner (LLL) has three. The endogenous latent variable is healthy role model (HRM). We used an HCM because of two advantages. First, it reduces the number of path model relationships, providing a more concise model of the relationships between several independent variables (first-order constructs, the second-order construct, and exogenous latent variables) and the dependent variable (endogenous latent variable) in the path model. Second, it minimizes collinearity issues caused by strong correlations between lower-order components (first- and second-order constructs), which can interfere with discriminant validity. 19 Figure 1 shows this path model.
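The model structure described above can be written down as a small data structure, which makes the item counts easy to check. The Python sketch below is a hypothetical encoding, not output of the PLS-SEM software. Item labels simply follow the construct-prefix-plus-number pattern used in this paper (e.g., PH1, MH3); the first-order components are drawn as explaining H, as the later discussion of R² for the higher-order construct implies; and the paths from H, IHB, WTP, and LLL to HRM are an assumption, since Figure 1 is not reproduced here. The text lists no separate items for HRM, so none are encoded.

```python
# Hypothetical encoding of the hierarchical component model (HCM).
FIRST_ORDER = {"PH": 3, "MH": 12, "SH": 7, "SpH": 2}  # lower-order components of H
EXOGENOUS = {"IHB": 6, "WTP": 2, "LLL": 3}

# Structural paths (source, target); the paths into HRM are assumed from the text.
PATHS = [(lower, "H") for lower in FIRST_ORDER] + [
    (source, "HRM") for source in ["H", *EXOGENOUS]
]


def item_labels(blocks: dict[str, int]) -> dict[str, list[str]]:
    """Expand {construct: n_items} into {construct: [construct1, construct2, ...]}."""
    return {c: [f"{c}{i}" for i in range(1, n + 1)] for c, n in blocks.items()}


measurement = item_labels({**FIRST_ORDER, **EXOGENOUS})
# In the repeated indicator approach, H reuses every first-order item.
measurement["H"] = [item for c in FIRST_ORDER for item in measurement[c]]

total_items = sum(len(measurement[c]) for c in [*FIRST_ORDER, *EXOGENOUS])
print(total_items)  # 35
print(len(measurement["H"]))  # 24
```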
We used partial least squares structural equation modeling (PLS-SEM) to analyze our model. PLS-SEM estimates the model through an iterative procedure that maximizes the explained variance in the endogenous latent variables. 20 PLS-SEM is suitable for exploratory research, especially for explaining the relationships among variables in a model. 21,22 In this study, we used PLS-SEM to explore the model of healthy role-model characteristics in medical school, which, to our knowledge, has not been described in any previous publication. Therefore, no literature could be used to explain the relationships between the constructs in this model. We used PLS-SEM Professional Ver. 3.3.0 in this study and carried out the model analysis after obtaining the survey data.
Data normality should be tested before the model analysis. Data were considered normally distributed if skewness and kurtosis were between -1 and +1. 23
To analyze the model, the researcher must assess both the measurement model and the structural model. Because this model is a higher-order model, or hierarchical component model (HCM), in the PLS-SEM context, the measurement model (the relationship between items and constructs), including that of the higher-order components, must be assessed before the structural model. The assessment of the measurement model helps researchers evaluate how well the developed model matches the reality in the field, i.e., the data obtained during data collection. Because the lower-order constructs are reflective (a reflective measurement model), we evaluated several criteria. First, internal consistency reliability was assessed using Cronbach's alpha and composite reliability; values of at least 0.6 are considered acceptable, especially for exploratory research. Second, convergent validity was assessed using the outer loadings of the items and the average variance extracted (AVE) of each construct. The outer loading of an item should be greater than 0.708, and the AVE value should be at least 0.5.
In assessing the structural model, several values need to be considered: collinearity (tolerance and the variance inflation factor, VIF), the coefficient of determination (R²), the effect size (f²), and predictive relevance (Q²). Tolerance should be higher than 0.2 and VIF lower than 5.0; otherwise, the construct should be considered for elimination or for combination with other constructs to treat collinearity (two or more independent variables explaining largely the same variance in the dependent variable). R² represents the amount of variance in the endogenous latent variable (dependent variable) that can be explained by all the independent variables linked to it. As a rule of thumb, R² = 0.75 is large, R² = 0.50 is medium, and R² = 0.25 is small. The value of f² represents the contribution of an independent variable to the dependent variable: f² = 0.35 is large, f² = 0.15 is medium, and f² = 0.02 is small. A Q² value higher than zero for an endogenous latent variable indicates that the model has predictive relevance for that construct, i.e., the exogenous latent variables linked to it predict its data better than simply predicting the mean.
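The structural-model criteria above reduce to a few formulas and cut-offs. The Python sketch below is illustrative: the function names are hypothetical, the cut-offs are those quoted in this section, and the formulas f² = (R²_included - R²_excluded)/(1 - R²_included) and the blindfolding form Q² = 1 - SSE/SSO are the standard definitions rather than details reported in this paper. The numbers in the usage lines are made up.

```python
R2_CUTOFFS = [(0.75, "large"), (0.50, "medium"), (0.25, "small")]
F2_CUTOFFS = [(0.35, "large"), (0.15, "medium"), (0.02, "small")]


def classify(value: float, cutoffs: list[tuple[float, str]]) -> str:
    """Return the label of the first (highest) cut-off that the value reaches."""
    for threshold, label in cutoffs:
        if value >= threshold:
            return label
    return "below small"


def f_squared(r2_included: float, r2_excluded: float) -> float:
    """Effect size of one predictor: drop in R^2 when it is omitted, scaled by 1 - R^2."""
    return (r2_included - r2_excluded) / (1 - r2_included)


def q_squared(sse: float, sso: float) -> float:
    """Stone-Geisser Q^2 from blindfolding: 1 - SSE/SSO; > 0 means predictive relevance."""
    return 1 - sse / sso


def collinearity_ok(vif: float) -> bool:
    """VIF < 5 is equivalent to tolerance = 1/VIF > 0.2."""
    return vif < 5.0


# Hypothetical values for illustration only
print(classify(0.62, R2_CUTOFFS))                   # medium
print(classify(f_squared(0.62, 0.60), F2_CUTOFFS))  # small
print(q_squared(sse=70.0, sso=100.0) > 0)           # True
print(collinearity_ok(3.1), collinearity_ok(6.0))   # True False
```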

RESULTS AND DISCUSSION
Seventy-nine of the 442 invited medical teachers at FK-KMK UGM participated in this study. Although the response rate was relatively low (17.87%), the sample met the minimum sample size required for PLS-SEM analysis of this model.
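The paper does not state which minimum-sample-size rule was applied. Purely as an illustration, the Python sketch below assumes the widely cited "10-times rule" of thumb (at least ten times the largest number of structural paths pointing at any one construct) and assumes, from the model description in the Model analysis section, that at most four paths point at a single construct (H, IHB, WTP, and LLL into HRM, or the four first-order components into H). Other approaches, such as statistical power analysis, can give different minimums; the function names are hypothetical.

```python
def response_rate(responses: int, invited: int) -> float:
    """Percentage of invited participants who responded."""
    return 100 * responses / invited


def ten_times_rule(max_inbound_paths: int) -> int:
    """Heuristic minimum n: 10 x the most structural paths pointing at one construct."""
    return 10 * max_inbound_paths


n_responses, n_invited = 79, 442
MAX_INBOUND_PATHS = 4  # assumption: read from the model description, not from Figure 1

print(f"{response_rate(n_responses, n_invited):.2f}%")    # 17.87%
print(n_responses >= ten_times_rule(MAX_INBOUND_PATHS))  # True
```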

Data distribution
The conditional item (PH3) was excluded from the normality test because its missing values (it is answered only by some respondents) prevented assessment. The skewness of 15 items and the kurtosis of 19 items indicated non-normal distributions, and for several items skewness and kurtosis gave conflicting indications. However, we continued the analysis because PLS-SEM does not require normally distributed data.
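The ±1 screening rule from the Model analysis section can be applied item by item as in the Python sketch below. It is a minimal sketch under stated assumptions: it uses simple moment-based estimators of skewness and excess kurtosis, whereas statistical packages (including PLS-SEM software) may report bias-corrected versions that differ slightly in small samples. The function names and response vectors are hypothetical, and items with missing values, such as the conditional item PH3, are not handled and would have to be excluded first.

```python
from statistics import fmean


def skewness_kurtosis(values: list[float]) -> tuple[float, float]:
    """Moment-based skewness (g1) and excess kurtosis (g2) of one item's responses."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all responses are identical; moments are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def within_plus_minus_one(values: list[float], bound: float = 1.0) -> bool:
    """Screening rule used in the paper: both statistics between -1 and +1."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) <= bound and abs(kurt) <= bound


# Hypothetical four-point Likert responses for two items
responses = {
    "item_X": [1, 2, 3, 4, 2, 3, 3, 2, 4, 1],
    "item_Y": [4, 4, 4, 4, 4, 4, 4, 3, 4, 1],
}
for item, values in responses.items():
    skew, kurt = skewness_kurtosis(values)
    print(item, round(skew, 2), round(kurt, 2), within_plus_minus_one(values))
```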

Assessing the measurement model
Internal consistency reliability
Cronbach's alpha was satisfactory for all constructs except PH, whose value was below 0.6. We did not exclude the PH construct even though it did not meet the minimum Cronbach's alpha for exploratory research; instead, we relied on its composite reliability, because Cronbach's alpha is sensitive to the number of items in the scale and generally tends to underestimate internal consistency reliability. We also considered that PLS-SEM prioritizes items according to their individual reliability, whereas Cronbach's alpha weights all items in the scale equally. 20 All constructs had composite reliability values above 0.6. No construct had a Cronbach's alpha above 0.95, indicating that no items in this model were redundant (i.e., measured the same thing).
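Both reliability coefficients discussed above can be computed directly. In the Python sketch below, Cronbach's alpha is computed from raw item scores, while composite reliability is computed from standardized outer loadings as (Σλ)²/((Σλ)² + Σ(1 - λ²)), the usual form for reflective constructs in PLS-SEM. The function names, scores, and loadings are hypothetical and are not the values from this study.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of scores per item (same respondents, same order)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variances / variance(totals))


def composite_reliability(loadings: list[float]) -> float:
    """Composite reliability from the standardized outer loadings of one reflective construct."""
    total = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error)


# Hypothetical data: three items answered by six respondents on a four-point scale
scores = [
    [3, 4, 2, 4, 3, 2],
    [3, 4, 3, 4, 2, 2],
    [2, 4, 2, 3, 3, 1],
]
print(round(cronbach_alpha(scores), 3))
print(round(composite_reliability([0.82, 0.75, 0.64]), 3))
```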

Convergent validity
Two items (PH1 and IHB1) had outer loadings below 0.4 and were therefore eliminated; items with very low outer loadings (below 0.4) should always be eliminated from the construct. 23 Seven items in two constructs (MH3, MH4, MH5, MH10, MH11, MH12, and IHB2) with weaker outer loadings (below 0.708) were also eliminated to raise the AVE values of these two constructs to the minimum value. However, researchers need to be careful when eliminating items with outer loadings below 0.708 and should examine the effect of item removal on composite reliability, especially for a newly developed construct: an item should be eliminated only when deleting it increases composite reliability. 24 In this study, the composite reliability of these two constructs increased when those items were eliminated. The elimination did not affect the content validity of those constructs because, in a reflective measurement model, the items are interchangeable manifestations of the same construct. Table 1 shows the outer loadings of the 26 items in the model. Eight items (MH6, MH8, MH9, SH1, SH3, SH4, IHB4, and IHB5) still had outer loadings below 0.708, but we retained them because the AVE values of their constructs were above 0.5. From this result, we confirmed that the model achieved good convergent validity.
Table 1. Outer loading values of the 26 items in the model (e.g., WTP2: "I participate actively in health programs conducted in my environment").
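The item-removal logic described above (always drop loadings below 0.4; drop loadings between 0.4 and 0.708 only while the construct's AVE is below 0.5, and only if removal raises composite reliability) can be sketched as follows. This Python sketch is a simplification: in PLS-SEM the loadings are re-estimated after every removal, whereas here they are held fixed, and the item names and loadings are hypothetical rather than the values in Table 1. AVE is computed as the mean of the squared standardized loadings.

```python
def ave(loadings: dict[str, float]) -> float:
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings.values()) / len(loadings)


def composite_reliability(loadings: dict[str, float]) -> float:
    total = sum(loadings.values())
    error = sum(1 - lam ** 2 for lam in loadings.values())
    return total ** 2 / (total ** 2 + error)


def screen_items(loadings, drop_below=0.4, target=0.708, min_ave=0.5):
    """Return the items kept after loading-based screening (loadings held fixed)."""
    kept = {k: v for k, v in loadings.items() if v >= drop_below}  # always drop < 0.4
    while kept and ave(kept) < min_ave:
        weak = [k for k, v in kept.items() if v < target]
        if not weak:
            break
        weakest = min(weak, key=kept.get)
        trial = {k: v for k, v in kept.items() if k != weakest}
        # Remove the weakest item only if composite reliability increases.
        if not trial or composite_reliability(trial) <= composite_reliability(kept):
            break
        kept = trial
    return kept


# Hypothetical loadings for one construct
example = {"A": 0.85, "B": 0.80, "C": 0.50, "D": 0.45, "E": 0.35}
kept = screen_items(example)
print(sorted(kept), round(ave(kept), 4))  # ['A', 'B', 'C'] 0.5375
```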

Discriminant validity
Each item had its highest loading on its own construct (cross-loading criterion), and the HTMT values for all construct pairs were below 1. Based on these two criteria, the model showed good discriminant validity. When using PLS-SEM, researchers should consider HTMT values to complement the cross-loadings of the items, because HTMT estimates what the true correlation between two constructs would be if they were perfectly reliable. 25 Table 2 summarizes the assessment of the measurement model. From this result, we concluded that the model with 26 items had excellent reliability and validity, and we therefore proceeded to assess the structural model.
[...] a modification of the repeated indicator approach, to treat this problem. However, the R² of the HRM construct was still close to 1 (0.991) after we used the extended repeated indicator approach. Therefore, we also used a two-stage approach to complement the results of the extended repeated indicator approach. This approach was chosen for two reasons: 1) the lower-order components have unequal numbers of items, which could bias their loadings when the repeated indicator approach is used; 27,28 and 2) the lower-order constructs explained all the variance in the higher-order construct (R² = 1), in which case the two-stage approach is recommended. 27 We ran bootstrapping with 5,000 subsamples (no sign changes) to assess the significance and relevance of the relationships between the first-order and second-order components. Table 3 shows the results of the structural model assessment using these two approaches.
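The HTMT criterion used above for discriminant validity is the mean correlation between items of two different constructs divided by the geometric mean of the average correlations among items within each construct. The Python sketch below illustrates this definition with a hand-written Pearson correlation so that it needs only the standard library. The data and function names are hypothetical, each construct needs at least two items, and the sketch assumes positively correlated items within each construct, as is usual for reflective scales.

```python
from itertools import combinations, product
from math import sqrt
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_within(items: list[list[float]]) -> float:
    """Average correlation among the items of one construct (monotrait-heteromethod)."""
    return fmean(pearson(a, b) for a, b in combinations(items, 2))


def htmt(items_i: list[list[float]], items_j: list[list[float]]) -> float:
    """Heterotrait-monotrait ratio for two constructs (each: list of item score lists)."""
    between = fmean(pearson(a, b) for a, b in product(items_i, items_j))
    return between / sqrt(mean_within(items_i) * mean_within(items_j))


# Hypothetical scores of five respondents on two items per construct
construct_a = [[1, 2, 3, 4, 4], [1, 3, 3, 4, 3]]
construct_b = [[2, 2, 4, 3, 1], [3, 2, 4, 3, 2]]
print(round(htmt(construct_a, construct_b), 3))
```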
This study has several limitations: 1) the sample size was rather small, which might have affected the data distribution and Cronbach's alpha and produced weak outer loadings for some items; and 2) three constructs in this model (PH, SpH, and WTP) may be considered inadequate because each has fewer than three items. Therefore, further studies with larger samples are needed to provide more evidence to validate this model.

CONCLUSION
All seven constructs, i.e., physically healthy, mentally healthy, socially healthy, spiritually healthy, internalized healthy behaviors, willingness to promote healthy lifestyles, and lifelong learner, fit the characteristics of a healthy role-model in the medical school model. The developed 26-item questionnaire, grouped into seven scales, showed excellent reliability and validity.