MIPA

Critical thinking is an essential 21st-century skill, and measuring it requires a valid and reliable instrument. This article uses the Rasch model to construct an instrument for measuring critical thinking skills on number pattern material taught with STEM PjBL. The instrument was developed through the analysis, design, development, implementation, and evaluation (ADDIE) stages. The data were processed with the Rasch model using the Winsteps program. The instrument was tested on 33 public junior high school students in Semarang. The results show that expert validation gave an average of 92.4%, in the very valid category; that the quality of the items in terms of instrument reliability is good; and that the average critical thinking score of students who used the STEM-based and Project-Based Learning instrument was 61.46, higher than the 49.79 of other classes. The STEM- and project-based critical thinking test instrument needs to be developed further for other materials.


INTRODUCTION
Critical thinking skills are indispensable in 21st-century learning (Basri, 2019; Lamb, 2017; Cahyono et al., 2021; Changwong, 2018). Critical thinking is a reflective thinking ability that focuses on deciding what to believe, what to do, and what can be proven true (Ennis, 2011; Cahyono, 2019). Critical thinking abilities can be identified through their main elements, namely Focus, Reason, Inference, Situation, Clarity, and Overview (FRISCO) (Ennis, 1996; Cahyono, 2017). The 21st-century learning process therefore needs instruments for measuring critical thinking skills that are valid, reliable, and innovative.
Instruments that measure critical thinking skills based on STEM (Science, Technology, Engineering, and Mathematics) and on projects built around open-ended problems are a suitable alternative. According to Sarlin et al. (2022), STEM is an approach that focuses on training students to engage in critical thinking, inquiry, problem-solving, collaboration, and engineering as design thinking. Instruments with the STEM approach can help students develop the problem-solving, critical thinking, and creative thinking skills that support change in the 21st century (Sarican & Akgunduz, 2018; Saxton et al., 2014), as well as collaboration (Mufida et al., 2020). Question instruments with a STEM approach combined with project-style assignments can improve critical thinking skills (Laboy-Rush, 2011; Fitriyah & Ramadani, 2021; Allanta & Puspita, 2021). An open-ended question instrument for measuring critical thinking abilities has a positive impact on students' critical thinking abilities (Cahyono, 2021; Siswono, 2018).
An instrument is considered good if it functions as a measuring tool for a variable, meets academic requirements (valid and reliable), and accurately measures what it is intended to measure. This agrees with Djaali (2000), who defines an instrument as a tool that meets academic requirements and is used to measure an object or to collect data on a variable. Nurkancana (1992: 141) states that a measuring device is valid if it measures precisely what it is intended to measure. Instruments suitable for use are those that meet the validity and reliability criteria, so that they carry out the measuring function correctly and give results in accordance with the measurement objectives (Sugiyono, 2014; Marfu'i et al., 2019). An instrument meets the criteria of validity and reliability if the research test instrument measures what it is supposed to measure (Wright & Stone, 1979; Moleong, 2006). The validity and reliability of research instruments can be analyzed using classical test theory (CTT) or the Rasch model (Rakkapao, 2022; Polat, 2022).
Analyzing research instruments with the Rasch model has several advantages and can overcome various limitations of classical test theory (CTT).
The limitations of validity and reliability analysis under classical test theory can be overcome with the Rasch model (Muntazhimah, 2020). The Rasch model is a solution to the validity problem: it provides useful statistics and offers a tremendous opportunity to investigate validity (Bond & Fox, 2007). In addition, applying the Rasch model in a study makes measurement more efficient, reliable, and valid while also increasing user convenience. The Rasch model can produce reliable and valid instruments (Guat & Hamidon, 2022; Setyawati, 2018). Studying the validity and reliability of an instrument is very important for maintaining its accuracy (Ariffin et al., 2010). The Rasch model was developed as a measuring tool in the social sciences in response to various paradigm weaknesses of classical test theory, and it can be used as a method for returning data to its natural condition (Sumintono & Widhiarso, 2015).
Research in recent decades states that critical thinking test instruments still need to be developed and that analyzing their validity and reliability with the Rasch model is a recommended alternative. Faradillah (2021) found that an instrument for measuring the critical thinking skills of prospective mathematics teachers, examined through the indicators of open-mindedness, curiosity, systematicity, truth-seeking, analyticity, and self-confidence, was valid and reliable after being analyzed with the Rasch model in Winsteps and with Confirmatory Factor Analysis (CFA) in JASP. The validity and reliability of test instruments need to be examined to ensure that the instrument measures what should be measured and supports sound conclusions from trials on research samples (Bernardi & Pazinato, 2022; Wijayanti, 2019). An instrument for critical thinking skills at the secondary school level, built on the indicators of (1) interpretation, (2) analysis, (3) evaluation, (4) inference, and (5) explanation and analyzed with the Rasch model, led to the conclusion that all questions developed were valid and reliable (Harjo, 2019). Rosnawati (2015) developed an instrument for critical thinking skills in junior high school mathematics, analyzed through expert validity and classical test theory (CTT). Sumarni and Kadarwati (2018) state that 15 critical thinking questions, developed through a contextual problem approach to chemistry and analyzed through expert validity and CTT, were valid and reliable. Facione (1990) developed the California Critical Thinking Skill Test (CCTST), an instrument for measuring the critical thinking skills of nursing students. Based on this review, no research has been found on developing instruments to measure critical thinking skills whose questions are prepared with the STEM approach and project assignments. This study therefore aims to develop a valid and reliable test instrument for measuring critical thinking skills with the STEM approach and project assignments.

RESEARCH METHODS
The assessment instruments were developed using a Research and Development (R&D) approach. The data were collected with a 15-question test constructed by the researcher and analyzed with Rasch modeling (Linacre, 2006). The analysis covers five aspects: 1) reliability, 2) validity, 3) the difficulty level of the questions, 4) the distribution of items, and 5) a description of respondents' answer patterns in the scalogram. These five aspects are sufficient to describe how well the instrument measures students' critical thinking skills. In addition, the analysis shows the reliability of the respondents in taking the test, whose results are later used as data for measuring students' critical thinking skills.
Boone et al. (2014) suggest three criteria for checking whether an item fits: the outfit mean-square (MNSQ) value, the outfit standardized (ZSTD) value, whose ideal value is 0.0, and the point-measure correlation (Pt Measure Corr). An item must meet all three criteria to be considered fit. This analysis can also identify questions that are too easy or too difficult and respondents who are outliers.
In the Rasch perspective, the item difficulty level and the testee's ability level are on the same scale, namely the logit scale (ranging from -4 to 4). As a rough guide, a testee's logit value indicates low ability (-4.00 to -2.00), medium ability (-1.99 to 1.99), or high ability (2.00 to 4.00). Item difficulty (MEASURE) is read the same way: easy items have a MEASURE value of -4.00 to -2.00, and medium items a MEASURE value of -1.99 to 1.99.
In this study, the data were processed and analyzed using descriptive statistical techniques, covering the mean, standard deviation, maximum value, and minimum value. The difference in mean critical thinking scores was tested with an independent t-test.
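The logit bands described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_logit` and the sample measures are hypothetical, values that fall between the printed cut-offs (for example -1.995) are assigned to the middle band, and nothing here reproduces the Winsteps output.

```python
def classify_logit(measure, labels=("low", "medium", "high")):
    """Map a logit measure to one of three bands.

    Cut-offs follow the ranges given in the text:
    -4.00 to -2.00, -1.99 to 1.99, and 2.00 to 4.00.
    """
    if not -4.0 <= measure <= 4.0:
        raise ValueError("measure lies outside the -4 to 4 logit range")
    if measure <= -2.0:
        return labels[0]
    if measure < 2.0:
        return labels[1]
    return labels[2]


# Hypothetical person and item measures, for illustration only.
person_measures = {"student_A": -2.4, "student_B": 0.3}
item_measures = {"item_X": -0.9, "item_Y": 2.1}

for name, logit in person_measures.items():
    print(name, classify_logit(logit), "ability")

for name, logit in item_measures.items():
    # The same bands are read as easy / medium / difficult for items.
    print(name, classify_logit(logit, labels=("easy", "medium", "difficult")))
```

Using one function for persons and items mirrors the point made in the text that both are placed on the same logit scale.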

RESULTS AND DISCUSSION
Product validation by two experts gave an average of 92.4%, in the very good category, so the instrument can be used with slight revision.
After minor revisions following the experts' suggestions and being declared valid, the instrument is expected to measure what it should measure (critical thinking ability). This agrees with Suyatman et al. (2021), who state that valid instruments can be used to measure what should be measured.
The Measure column shows that the item difficulty level and the testee's ability level are on the same logit scale, with values between -4 and 4. The test questions show a moderate level of difficulty; no item falls at a high or low level of difficulty. The item difficulty level is useful in two ways: for teachers, and for testing and teaching (Hadzhikoleva et al., 2019). Its uses for teachers are (1) as an introduction to the concept of relearning and as input to students about their learning outcomes, and (2) as a source of information about curriculum emphasis or for reviewing the usual items (Rangkuti, 2011). Figure 1 shows, under "Measure", that the difficulty of the test questions is in the medium category.
Figure 2 shows the value of Cronbach's Alpha (KR-20), the reliability coefficient obtained with the classical test theory approach. This value reflects the interaction between persons and items as a whole. The alpha value is 0.84, which shows that the reliability of the test in general is very satisfactory (reliable). The person reliability value is 0.81 and the item reliability value is 0.94. This shows that the consistency of the subjects' answers is good and that the quality of the items in terms of instrument reliability is good. Reliability is the extent to which a research test instrument can be expected to give consistent results when the test is repeated.
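To make the KR-20 coefficient mentioned above more concrete, the sketch below computes it for a small, made-up response matrix. It assumes dichotomous scoring (1 = correct, 0 = incorrect), which may not match how the actual items were scored, and the data are hypothetical; the values reported in this article come from Winsteps, not from this code.

```python
from statistics import pvariance


def kr20(scores):
    """Kuder-Richardson 20 for a persons x items matrix of 0/1 scores."""
    n_persons = len(scores)
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    total_variance = pvariance(totals)  # population variance of total scores
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in scores) / n_persons
        pq_sum += p * (1 - p)
    return n_items / (n_items - 1) * (1 - pq_sum / total_variance)


# Hypothetical responses: 5 persons, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```

The population variance is used so that it is consistent with the p * q item variances; a sample-variance convention would give a slightly different number.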
One feature of Rasch analysis with Winsteps is a map that depicts the distribution of the subjects' abilities and the distribution of item difficulty on the same scale. This map is called the Wright Map, or person-item map. It shows that the average individual ability is slightly higher than the item difficulty level.
Therefore, this study will look further at one or two items with a low difficulty index, which will be changed and adjusted so that the instrument items become a little more difficult for research respondents to answer. The student with the lowest ability is P11, with a logit value of more than -1, which also indicates a very low ability (outlier) because it lies outside the T boundary. The right side of the Wright map shows the distribution of the item logit values. Item 4 (4d) is the most difficult question (+1 logit), which means the probability that students answer it correctly is very small. Question 1a has the lowest logit value (close to -1 logit), so more students are able to answer it correctly. Questions 1b and 1d have the same level of difficulty because their logit values are the same, as do questions 1c and 2b. The average item logit is always set at 0.0 logit, which serves as the reference point of the scale.
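The link between a logit gap and the chance of a correct answer can be illustrated with the dichotomous Rasch model, in which the probability of success depends only on the difference between person ability and item difficulty. The sketch below is an assumption-laden illustration: the article does not state whether the items were scored dichotomously or with partial credit, and the person abilities used here are hypothetical; only the item locations (about +1 logit for 4d and about -1 logit for 1a) come from the text.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Item locations taken from the text; the person ability is hypothetical.
hardest_item = 1.0    # item 4d, about +1 logit
easiest_item = -1.0   # item 1a, close to -1 logit
person = -1.0         # hypothetical student at -1 logit

print(round(rasch_probability(person, hardest_item), 3))  # 0.119
print(round(rasch_probability(person, easiest_item), 3))  # 0.5
```

When ability equals difficulty the probability is exactly 0.5, which is why placing persons and items on one scale makes the Wright map easy to read.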
From the Wright Map, the average person logit was found to be -1 logit (below 0.0 logit). This shows that the students' average ability is below the average difficulty level of the standard questions. The Wright map also shows that 2 students had logit scores below -1 logit, reflecting that more than half of the sample had low critical thinking skills.
The test instrument is designed to distinguish the critical thinking skills of students who receive the STEM-PjBL learning process from those of students who do not receive the treatment. The students' test data were normally distributed and homogeneous, so the difference in mean post-test results was tested, with the results shown in Table 3. Table 3 shows that Zcount is greater than Ztable, so it can be stated that the average critical thinking ability of the experimental class, which used a project, STEM, and open-ended approach, is better than that of the control class. The experimental average was 61.46, which shows that the students' critical thinking skills were still low. This agrees with Rahmawati (2019), who found that critical and creative thinking skills can be improved through project-based learning with a STEAM approach integrated with chemistry concepts. Novitasari (2022) found that worksheets based on ethnomathematics and the STEM approach are effective in improving students' critical thinking skills. Retnowati (2020) found that rectangle modules developed with a STEM approach were effective in improving the critical thinking skills of students in the medium category. Yulianti et al. (2021) and Awad (2023) stated that students' thinking skills can be developed by applying learning models with a STEM approach.
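The comparison of Zcount with Ztable can be sketched as a one-sided two-sample z-test. Only the two means (61.46 and 49.79) come from this article; the standard deviations and class sizes below are hypothetical placeholders, so the resulting statistic is illustrative and is not the value reported in Table 3.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return the z statistic and one-sided p-value for mean1 > mean2."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_one_sided = 1.0 - NormalDist().cdf(z)
    return z, p_one_sided


# Means from the text; SDs and sample sizes are assumed for illustration.
z_count, p_value = two_sample_z(61.46, 15.0, 33, 49.79, 15.0, 33)
z_table = NormalDist().inv_cdf(0.95)  # one-sided critical value at alpha = 0.05

print(f"z_count = {z_count:.2f}, z_table = {z_table:.2f}")
print("experimental mean is significantly higher:", z_count > z_table)
```

With real data, the sample standard deviations and class sizes would replace the placeholders; for small classes a t-test, as mentioned in the methods, is the more usual choice.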

CONCLUSION
Rasch model analysis of the Measure column shows that the questions have a moderate level of difficulty. The PT-Measure Corr. values indicate that the items have high discriminating power. Furthermore, whether an item fits (item fit) or does not fit (outlier or misfit) is shown by the outfit MNSQ value, the outfit ZSTD value, and the PT-Measure Corr. value. Of the questions tested, items 1b, 1c, 1d, 2a, 2b, 2c, 2d, 3b, 3c, and 3d met the item fit criteria.
The Cronbach's Alpha (KR-20) value is the reliability coefficient obtained with the classical test theory approach. This value reflects the interaction between persons and items as a whole. The alpha value is 0.72, which shows that the reliability of the test in general is very satisfactory (reliable). The person reliability value is 0.76 and the item reliability value is 0.84. This shows that the consistency of the subjects' answers is good and that the quality of the items in terms of the instrument's reliability is good. Applying the instrument showed a difference in mean pretest and posttest scores using instruments based on open-ended problems and ethnomathematics. The average pretest score was 49.79 and the average posttest score was 61.46.
The development followed the ADDIE model (Branch, 2009): analysis, design, development, implementation, and evaluation. The critical thinking skills test was tried out on 33 students, which satisfies the sample-size requirements: Tan & Vicente (2019) state that a pilot study needs between 25 and 100 respondents, and Johanson & Brooks (2010) suggest a minimum of 30 respondents. The research data were analyzed with the Rasch model using the Winsteps software, version 3.92.1, developed by Linacre (2006).

Figure 1. Rasch's perspective for the Differentiating Power of Items in the test questions

Figure 3. Wright Map
Figure 3 depicts the distribution of respondents' abilities and the distribution of question difficulty levels on the same scale, as seen in the Wright Map. The left side of the Wright map describes the students' abilities and shows that the student with code P17 has the highest ability compared with the other students. Even though this student has the highest ability, the student's logit score is less than +1 logit. Student P17 lies outside the limit of two standard deviations (T), indicating a distinctly high ability (outlier).

Table 1. Description of Critical Thinking Indicator
Difficult items have a MEASURE value of 2.00 to 4.00. For item discrimination (PT-Measure, the point-measure or item-total correlation), a negative value means that the item is problematic, and a value below 0.20 means that the item needs further inspection or is not good.
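The two point-measure rules stated above (negative values are problematic, values below 0.20 need inspection) can be written as a small screening helper. This is a sketch only: the function name, the label "acceptable" for the remaining items, and the sample correlations are hypothetical and are not taken from the study's output.

```python
def flag_point_measure(correlation):
    """Screen an item by its point-measure correlation."""
    if correlation < 0:
        return "problematic"
    if correlation < 0.20:
        return "needs inspection"
    return "acceptable"  # label assumed here; the text names only the two flags


# Hypothetical correlations for three items.
sample_items = {"item_X": -0.05, "item_Y": 0.12, "item_Z": 0.48}
for item, corr in sample_items.items():
    print(item, corr, flag_point_measure(corr))
```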

Table 2. Example questions to measure critical thinking skills using STEM and project approaches
Susi is going shopping at the "Berkah Fortune" store to buy a tunic and pants. Susi found a tunic she liked for Rp 350,000; she already had a voucher worth Rp 85,000. The voucher can be used with a minimum purchase of Rp 300,000. She then moved to another section to look for pants. Susi is interested in buying black pants for Rp 200,000 that are marked with a 20% discount. According to the shop's rules, Susi can use only one type of discount, not both. Susi chose the voucher to get the cheapest price. Do you agree with Susi's decision?
Suatmoko wants to buy Batik Semarang. He can buy at shop A, shop B, shop C, or shop D. The prices and discounts offered vary widely. Look at the discount scheme table above. Suatmoko wants to spend as little money as possible.

The use of a valid instrument in the form of questions with a STEM approach and open-ended projects, which stimulate the thinking process toward a variety of different but correct answers (creativity) in solving problems, can improve critical thinking skills. This agrees with research by Parno (2021), which states that the STEM

Table 3. Z-Test Results for Critical Thinking Ability