> [!references]-
> see: [Dunn Data and statistics](x-devonthink-item://53348B8A-2541-444F-8814-25A0C53E7F10), [George Biostats and epi](x-devonthink-item://903883D0-4FE1-4143-A0E3-A69108C1AA4C), [cabrini - landmark trials to know](x-devonthink-item://1E57ABF2-C015-4F6E-A8A9-AEC43C0BCF15), [does coffee increase happiness? EBM review](https://intensiveblog.com/can-coffee-cause-happiness-a-question-of-significance/)
>
> - [The diagnostic utility of prehospital hyperglycaemia in major trauma patients: An observational study.](bookends://sonnysoftware.com/ref/DL/248069) - good study for "learn by doing" to review biostats
> - [EMA - Primer for clinical researchers on innovative trial designs for emergency medicine.](bookends://sonnysoftware.com/ref/DL/281928) - [online link](https://onlinelibrary.wiley.com/doi/10.1111/1742-6723.14532?af=R) - an overview of some novel research techniques not otherwise covered in the Study Design section of this note

see also: [[Quality improvement#Differences from research|Differences between QI audit and Clinical research]]

> notes from George's statistics talk, thanks!
+ USMLE First Aid and some other additions

# Data
## Definitions
**numerical variables** -- can be discrete or continuous
**non-numerical** - categorical, binary or dichotomous
- ordered categorical variables are non-numeric variables with a distinct order, eg socioeconomic status or age groupings
## Displaying data
- frequencies of categorical variables will often be displayed as a bar chart, sometimes a pie chart
# Charts
## histogram
- for showing **frequency distribution**
- gives a rough sense of the density of the underlying distribution of the data
- similar to a bar chart, but key differences:
	- used for *continuous variables* that have been transformed into discrete groups for the graph
	- the y-axis is like a bar graph, but the x-axis can have columns of various widths to illustrate different data
	- a bar chart is for categorical variables; each bar is for a different category of observations

![[Pasted image 20240429173724.png|Histogram of travel time (to work), US 2000 census. Area under the curve equals 1]]
## Skew and data distribution
- skew refers to the direction of the mean from the centre of the data
- a left skew (negative skew) means that there is a long left tail, and the mean is skewed negatively to the left of the centre of the data
- a right skew (positive skew) means that there is a long right tail, and the mean is skewed positively to the right of the centre of the data

![[Pasted image 20240429174735.png]]
![[Pasted image 20240429174827.png|A general relationship of mean and median under differently skewed unimodal distributions.]]
![[Pasted image 20240429174938.png]]
![[Pasted image 20240429174956.png]]
## Box and Whiskers graph
- median is the horizontal line in the middle of the box
- the box outlines the upper and lower quartiles, i.e. the [[#interquartile range]] (IQR) btwn the 25th and 75th percentiles
- the whiskers usually represent the full range of data values

![[Pasted image 20240429175354.png]]
## Scatter plots
- used to document
relationship between two continuous variables (eg in [[#Simple linear regression]])
- can also be used to document the relationship between continuous and categorical variables

![[Pasted image 20240429175600.png|two continuous variables]]
![[Pasted image 20240429175656.png|continuous and categorical variable]]
## Kaplan Meier Curves (Survival Curves)
- not just about life and death -- can be used to look at the duration of an event (eg the duration "survival" of sedation after giving midazolam vs droperidol)
- usually two groups (control and intervention) start at 1.0 (all survivors) and over time, as the end event occurs (eg death, patient wakes up, etc), the patient becomes a non-survivor
- statistical significance is presented as a *p value*
- useful for looking at clinically significant disparities that occur even if the end point is similar

![[Pasted image 20240429180021.png]]
## Forest Plots
- often used in subgroup analysis or *meta-analysis*
- squares indicate the [[#Quantifying risk|relative risk]] or odds ratio for each variable -- with the size of the square indicating the number of outcomes in that variable (its weight/influence in the meta-analysis)
- horizontal lines are the [[#95% confidence interval]] for each variable
- the vertical line is the summary RR for the entire trial
- a black diamond (below it is just "all participants") represents the RR or OR and 95% confidence interval calculated across all included studies

![[Pasted image 20240430225025.png]]
![[Pasted image 20240820203030.png]]
## Likelihood ratio nomogram (Fagan)
see [[#sensitivity/specificity, PPV, likelihood ratio|likelihood ratio]]

Integrates Bayes’ theorem into a nomogram for practitioners to quantify the post-test probability that an individual is affected by a condition given an observed test result and given the probability of the individual having the condition before the test was run (pretest probability).
![[Pasted image 20240430225136.png]]

See [Fagan Diagrams and PE in ED](http://blog.clinicalmonster.com/2020/02/28/working-up-pe-in-the-ed-negative-likelihood-ratios-and-fagan-nomograms/), [d-dimer and fagan nomograms](https://acutecaretesting.org/en/articles/comparing-d-dimer-assays-using-likelihood-ratios-and-fagan-nomograms/?src=d-dimer), [likelihood ratio and PE](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5950618/), [The two-step Fagan's nomogram: ad hoc interpretation of a diagnostic test result without calculation.](bookends://sonnysoftware.com/ref/DL/296184)

![[Pasted image 20240715032716.png|fagan nomogram for 3-tier Wells and D-dimer 0.5]]

The red line represents low-risk patients (Wells <2; 1.3% risk), the blue line moderate-risk patients (2-6; 16.2%) and the green line high-risk patients (>6; 37.5%). Nb for the 2-tier Wells, a score ≤4 predicts a risk of 12.1% and >4 a risk of 37.1%.

As you can see from the Fagan nomogram, a high-risk individual with a d-dimer < 500 has a post-test probability of >1%, which is generally considered too high to rule out PE. As such, it is generally not recommended to use a d-dimer in high-risk individuals. Another way to look at this is that the high-risk population has such a high prevalence that the negative predictive value (NPV) of d-dimer testing is lowered, making the d-dimer test unsuitable for ruling out PE. (If you remember back to your biostatistics, NPV is changed by prevalence, whereas specificity and sensitivity are not.) So, high-risk patients should undergo a more definitive study, namely a CTPA or a VQ scan.
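The odds arithmetic behind the nomogram can be sketched in a few lines of Python. The Wells-tier pretest probabilities are the ones quoted above; the LR− of 0.10 for a negative d-dimer is an assumed illustrative value, not a measured one:

```python
def post_test_probability(pretest_prob: float, lr: float) -> float:
    """Bayes via odds: pretest probability -> odds, multiply by LR, back to probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

# assumed LR- of ~0.10 for a negative d-dimer (illustrative only)
for label, pretest in [("low", 0.013), ("moderate", 0.162), ("high", 0.375)]:
    print(f"{label}: {post_test_probability(pretest, 0.10):.1%}")
```

Running this reproduces the nomogram's message: the low-risk tier falls well below 1%, while the high-risk tier stays above it even after a negative test.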
# Statistics concepts
## Statistical distribution
### mean, median, mode
**mean** - average value
- usually paired with *standard deviation* in medical data
- a small standard deviation means the data are closely clustered around the mean
- μ (mu) is used to represent the mean

**median** - middle value
- half the data above, half below
- no SD reported with a median -- instead reported with the *interquartile range* (IQR)
	- the lower 25th quartile and higher 75th quartile are calculated -- the range btwn these is the IQR
	- IQR usually represented as a *Box and Whiskers* graph
- useful for ==small data sets== and data with ==significant outliers that make the mean misleading==

**mode** - most common value
### standard deviation σ (sigma)
- how much variability exists in a set of values, around the mean of these values
- σ = SD
- variance = (SD)<sup>2</sup>

![[Pasted image 20240430123336.png|in a normal gaussian distribution, 95% of the data exist within 2 SD (2σ) of the mean]]
### Standard Error
estimate of how much variability exists in a (theoretical) set of sample means around the true population mean

$SE = \frac{\sigma}{\sqrt{n}}$ ; therefore SE ↓ as n (sample size) ↑
## 95% confidence interval
*central limit theorem*: the sampling distribution of a mean is normally distributed, even if the value being measured isn't normally distributed

95% CI: ==there is a 95% chance that the "real" value lies within the CI==
- technically isn't true; if you repeated the study 20 times and calculated a CI each time, the true mean would lie within the CI in 19/20 of them
- doesn't really represent 2 standard errors to either side, but 1.96
## p value
**null hypothesis**: we assume that there is no difference between samples, and any variation is due to sampling error.
- the p value aims to quantify the chance that, if the null hypothesis is true (no difference between samples), we would arrive at the observed results by chance.
- therefore, a smaller p value suggests a lower probability that the observed result -- that there IS a difference between samples -- is a false positive due to chance
- reporting a positive study: "we have disproven the null hypothesis"
- reporting a negative study: "we have failed to disprove the null hypothesis"

note that p <0.05 isn't really great; it means 1/20 p values will be false positives
> therefore p <0.05 does NOT mean that the result is positive
### example
![[Pasted image 20240429203453.png]]
- Trial 2 was positive for drug A because it included more patients. Trial 1 for drug A may have been a false negative
- Trial 5 for drug C has a low p value, but the effect size in mean cholesterol is minimal, not clinically significant
- ==note how the 95% confidence interval gives better evidence of *significance* than the p value==
## interquartile range
25th quartile - 75th quartile of the data
often presented with a [[#Box and Whiskers graph]]
## strength of association
see also: [[#R and correlation between variables]]
see [spurious correlations](https://www.tylervigen.com/spurious-correlations) - correlation doesn't equal causation
- features likely to infer causation:
	- temporal relation
	- link between dose and response
	- biologic plausibility
	- evidence from RCTs
	- demonstrating a strength of association -- eg [[#Quantifying risk|RR]] >1.5 -- NOT the [[#p value]]

![[Pasted image 20240430225456.png|A spurious correlation]]
## Kappa and reliability vs validity
**Kappa:** a measure of reliability; works like an R value, but runs from -1 (perfect disagreement) to +1 (perfect agreement)
- high number is good; low is bad
- Kappa > 0.6 is fairly good agreement

![[Pasted image 20240430225854.png]]
reliable, not valid = precise, but inaccurate
valid, not reliable = not precise, but accurate
reliable and valid = precise and accurate
neither = bad
# compare variables btwn groups
![[Pasted image 20240429204240.png]]
- categorical data (eg categories) -
Chi-squared test:
	- compares the frequency of categorical observations of binomial data in a population
	- tests the hypothesis that there is no difference btwn the groups
	- always a one-sided test
	- needs categorical, non-paired data
- parametric data
	- student's t-test:
		- compares a group of observations with a known mean
		- only applicable if data normally distributed
		- compares means between 2 groups
	- ANOVA (analysis of variance)
		- similar to a t-test for 3 or more sets of data
# R and correlation between variables
don't forget to consider [[#strength of association|spurious correlations]]
## Simple linear regression
- when looking to perform hypothesis-generating studies on exposures and outcomes, *regression analysis* is often used
- simple linear regression requires data to be continuous
- *r = pearson correlation co-efficient*; values >0.7 are fairly strong
	- a negative value means negative correlation
- r<sup>2</sup> is the coefficient of determination (amount of variance in one variable that can be explained by variance in another variable)
- it is a simple linear regression when only one exposure variable is considered
- first, plot the exposure variable and outcome on a scatter plot, followed by attempting to draw a line of best fit (==r=1 means perfect positive correlation==)

![[Pasted image 20240430125900.png]]
![[Pasted image 20240430130041.png]]
![[Pasted image 20240430130102.png]]
## multiple linear regression
- cornerstone of a lot of literature (eg PERC)
- often, we have multiple potential exposures for our outcome of interest, and we want to analyse how we can identify the related ones from the unrelated ones
- Example: Los Angeles epidemiological dataset of LA county employees' BP.
variables measured: Age, height, weight, SBP, DBP, BMI, cholesterol, socio-economic status
- software does stepwise variable selection to analyse covariates
- will get the *coefficient of determination, R<sup>2</sup>*

---
# evaluate diagnostic tests
## sensitivity/specificity, PPV, likelihood ratio
see also: [[SHED#Statistics explanation|explaining effect of prevalence on NPV in the SHED SAH study]]

**sensitivity:** TP/(TP + FN) - test positive given has disease
**specificity:** TN/(TN+FP) - test negative given absence of disease
**negative predictive value NPV:** TN/(TN+FN) - does not have disease, given test negative. depends on prevalence of disease in population
**positive predictive value PPV:** TP/(TP+FP) - has disease, given test is positive. also depends on prevalence
**accuracy:** (TP + TN)/(TP+TN+FP+FN)
**Likelihood ratio:** likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without the disorder
- LR+ > 10 indicates a highly *specific* test
- LR- < 0.1 indicates a highly *sensitive* test

> combine the sensitivity and specificity with pretest probability to create a post-test probability

$LR^{+} = \frac{\mbox{sensitivity}}{\mbox{1 - specificity}}=\frac{\mbox{TP rate}}{\mbox{FP rate}}$
$LR^{-} = \frac{\mbox{1 - sensitivity}}{\mbox{specificity}}=\frac{\mbox{FN rate}}{\mbox{TN rate}}$

see: [[#Likelihood ratio nomogram]]

![[Pasted image 20240430175627.png]]
### example
diagnostic test with sensitivity of 67% and specificity of 91% applied to 2030 people looking for a disorder with a population prevalence of 1.48%

![[Pasted image 20240430224817.png]]

> excerpt from: [doctors are-surprisingly bad-at reading lab results](cubox://card?id=6826101069026689259)
> Say that Disease X has a prevalence of 1 in 1,000 (meaning that 1 out of every 1,000 people will have it), and the test to detect it has a false-positive rate of 5 percent (meaning 5 of
every 100 subjects test positive for the ailment even though they don’t really have it). If a patient’s test result comes back positive, what are the chances that she actually has the disease? In a [2014 study](https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/1861033), researchers found that almost half of doctors surveyed said patients who tested positive had a 95 percent chance of having Disease X.
>
> This is radically, catastrophically wrong. In fact, it’s not even close to right. Imagine 1,000 people, all with the same chance of having Disease X. We already know that just one of them has the disease. But a 5 percent false-positive rate means that 50 of the remaining 999 would test positive for it nonetheless. That means 51 people would have positive results, but only one of those would really have the illness. So if your test comes back positive, your true chance of having the disease is actually 1 out of 51, or 2 percent — a heck of a lot lower than 95 percent.
## Receiver operating characteristic curve ROC
The ROC curve demonstrates how well a diagnostic test can distinguish btwn 2 groups (eg disease vs healthy). ==plots the true-positive rate (sensitivity) against the false-positive rate (1-specificity)==

the better-performing test will have a higher area under the curve (AUC), with the curve closer to the upper left corner.

![[Pasted image 20240430211437.png]]
# evaluate population health and exposure risks
see also [[#case control vs cohorts]]
## Quantifying risk
![[Pasted image 20240430204001.png]]

**Risk**: outcome/(outcome + no outcome). eg "has disease given exposure", or a/(a+b)
%%**absolute risk**: risk A - Risk B%%
**Relative Risk:** risk A/Risk C.
=="(% has disease given exposure, of all exposed) / (% has disease without exposure, of all not exposed)"==
- used in *cohort studies*
- risk of developing disease in the exposed group divided by risk in the unexposed group
- RR = 1 → no association btwn exposure and disease
- RR > 1 → exposure a/w ↑ disease occurrence
- RR < 1 → exposure a/w ↓ disease occurrence

example: if 5/10 people exposed to radiation are dx with cancer, and 1/10 people not exposed to radiation are diagnosed with cancer, the RR is 5; so people exposed to radiation have 5 times the risk of developing cancer

![[Pasted image 20240430204511.png]]

**Odds ratio**: used in *case-control* studies. represents the odds of exposure among cases (a/c) vs odds of exposure among controls (b/d)
=="odds of + exposure in diseased group / odds of + exposure in non-diseased group"==
*example*: if in a case-control study, 20/30 lung cancer pts and 5/25 healthy individuals report smoking, the OR is 8; so the ==lung cancer pts are 8 times more likely to have a history of smoking==

![[Pasted image 20240430204311.png]]

**Relative risk reduction**: proportion of risk reduction attributable to the intervention compared to control. RRR = 1 - RR
*example*: if 2% of patients who receive a flu shot develop the flu, while 8% of unvaccinated pts develop the flu, then RR = 2/8 = 0.25, and RRR = 0.75

**Attributable risk AR**: difference in risk btwn exposed and unexposed groups.
=="has disease given exposed minus has disease given not exposed"==
$AR = \frac{a}{a+b} - \frac{c}{c+d}$
AR% = ((RR - 1)/RR) x 100
*example:* if the risk of lung cancer in smokers is 21% and the risk in non-smokers is 1%, then the attributable risk is 20%.

**Absolute risk reduction ARR**: difference in risk (not proportion) attributable to the intervention compared to the control.
=="% that has disease of those not exposed minus % has disease of those exposed"==
$ARR = \frac{c}{c + d} - \frac{a}{a + b}$
*example*: if 8% of people who receive a placebo vaccine develop the flu vs. 2% of ppl who receive a real flu vaccine, then ARR = 8% - 2% = 6%, or 0.06.

**number needed to treat NNT:** 1/absolute risk reduction. number of patients who need to be treated for 1 patient to benefit.
**number needed to harm NNH:** number of patients who need to be exposed to a risk factor for 1 patient to be harmed. higher number = safer exposure. NNH = 1/AR
**Case fatality rate:** percentage of deaths occurring among those with disease. CFR% = (deaths/cases) x 100
## incidence vs prevalence
**Incidence** = # new cases/# of people at risk, per unit time
**prevalence** = # of existing cases/total # of people in population, *at a point in time*

prevalence > incidence for chronic diseases, due to the large number of existing cases, eg diabetes
prevalence = incidence x duration of disease
- once a member of the population gets the disease, they transfer from the at-risk population to the disease population

| situation                 | incidence | prevalence |
| ------------------------- | --------- | ---------- |
| ↑ survival                | -         | ↑          |
| ↑ mortality               | -         | ↓          |
| faster recovery           | -         | ↓          |
| extensive vaccine offered | ↓         | ↓          |
| ↓ risk factors            | ↓         | ↓          |
# Research methods
## PICO - formulate research question
**PICO**
- patient, problem, population
- intervention
- comparison
- outcome
## systematic review flowchart
![[Pasted image 20240922045218.png]]
## how to review a paper?
(doubled differently below at [[#how to review a study]])
- type (RCT/cohort/case-control?)
- answering the correct question?
- measuring the correct endpoint?
- sampling: have they picked the right group?
	- the sample should represent the intended study population -- so if you mail a survey to get info about homeless people, it's unlikely to reach them
	- the sample should be randomly derived.
each member of the population studied should have an equal chance of being included. eg a hospital study won't generalise to all people ([[#Forms of bias|berkson bias]])
- reduce loss to follow up (attrition bias)
- consider [[#Forms of bias]]
## Validity
- internal validity -- within the study, the results appear accurate and therefore the interpretation seems sensible. under the same circumstances, the result may be repeatable
- external validity -- how generalisable the results are beyond the study to other situations or sample groups -- i.e. how likely is it we can demonstrate a similar outcome in different circumstances
# Study design
- hierarchy of evidence
	- filtered information: Systematic reviews > critically appraised evidence synthesis > critically appraised individual articles
	- unfiltered information: RCT > cohort studies > case-control studies > case series/reports > "expert opinion"

| type | explain | statistics | advantages | disadvantages | example |
| ---- | ------- | ---------- | ---------- | ------------- | ------- |
| cohort | identify exposed and unexposed groups, see if they develop disease | incidence rate, **relative risk**, attributable risk | - prospective<br>- establish risk directly<br>- can establish temporal relationship<br>- assess multiple outcomes/diseases<br>- good for rare *exposures* | - expensive<br>- long and hard<br>- not good for rare *diseases*<br>- not good for diseases with long latency periods<br>- confounding (you don't know what you don't know) | follow smokers and see if they get cancer |
| case-control | retrospective. compares a group of people with disease to a group without, to see if odds of prior exposure increase development of disease | **odds ratio** | - good for *rare diseases*/outcomes<br>- cheaper, faster, easier<br>- good for diseases with long latency periods | - cannot establish prevalence<br>- cannot establish risk<br>- can only assess one outcome/disease<br>- retrospective → more prone to bias | look at people with cancer and see if they are smokers |
| retrospective cohort | recruit based on data about exposure status and measure outcomes that have already occurred | | | | |
| non-inferiority | set a pre-determined non-inferiority threshold to demonstrate a new treatment is "not much worse" than an existing treatment | - Relative Risk Difference<br>- Absolute Risk Difference | - evaluates a new therapy for a condition with a proven effective therapy where it would be unethical to do a placebo-controlled study<br>- new treatment may have advantages over standard treatment in certain aspects, eg cheaper, fewer side effects, more convenient | - does not test the superiority of the new therapy compared to placebo or existing treatment<br>- successive non-inferiority trials (where the non-inferior drug becomes the standard drug) can cause incorrect assumptions about the third drug compared to the original drug<br>- study design can bias towards non-inferiority | - ROCKET AF trial showed rivaroxaban was non-inferior to warfarin in stroke prevention in non-valvular AF<br>- [A to Z trial](https://pubmed.ncbi.nlm.nih.gov/15238591/) showed enoxaparin non-inferior to heparin for NSTEMI |
| RCT | phase I - IV<br>*SWIM*<br>I - Safe<br>II - Work?<br>III - Improvement?<br>IV - Market | - Relative risk<br>- absolute risk reduction<br>- NNT | | | |
## cross-sectional studies
- kinda like a census. measure the exposure and outcome at a single point in time
- no idea of [[#incidence vs prevalence|incidence]]
- quick to perform, useful for generating a hypothesis
## case control vs cohorts
see also [[#Quantifying risk]]
see [Case control vs cohort](https://s4be.cochrane.org/blog/2017/12/06/case-control-and-cohort-studies-overview/)

**cohorts:**
- prospective *cohort studies* recruit two (ideally) identical cohorts who differ only in the exposure of interest and are followed forward for a defined time
- excellent at studying prognosis, risk factors, and harm
- results are usually in the form of a **relative risk**: incidence (exposed)/incidence (unexposed)
- you can measure **attributable risk:** incidence (exposed) - incidence (unexposed)
- in retrospective **cohort** studies, the exposure and outcomes have already happened; usually the data were collected prospectively (eg in another study or registry) and exposures are defined before looking at the outcome

**case-control studies:**
- good way of studying rare diseases/outcomes
- start by finding a population with the outcome of interest
- then identify a similar control group from an identical population with the same opportunity to be counted as a case, but no disease
- *retrospectively* look for the exposure of interest
- because this is a retrospective study, results are expressed as an **odds ratio**; we have no idea of the underlying prevalence as we've artificially selected the cases
- ==rare disease hypothesis: if the disease is rare, the OR approaches the RR== but as the disease becomes more prevalent, the OR diverges from the RR disproportionately

![[Pasted image 20240430133839.png]]

Cohort example: follow smokers and see
if they get cancer
case control example: look at people with cancer and see if they are smokers
retrospective cohort: a cohort of subjects selected based on exposure status is chosen at the present time, and outcome data (i.e. disease status, event status), which were measured in the past, are reconstructed for analysis

![[Pasted image 20240228104650.png]]
## Non-inferiority trials
see: [Non-inferiority trials in cardiology: what clinicians need to know | Heart](https://heart.bmj.com/content/106/2/99) - [cubox](cubox://card?id=6913880309477935957). other examples: [REBEL EM: The SAFER Trial: Pediatric CAP - Amoxicillin 5 days vs 10 days](https://rebelem.com/the-safer-trial-pediatric-cap-amoxicillin-5-days-vs-10-days/), [REBEL EM: Omadacycline, the NEJM and Non-Inferiority Studies](https://rebelem.com/omadacycline-the-nejm-and-non-inferiority-studies/), [First10EM: You don't understand non-inferiority trials (and neither do I)](https://first10em.com/you-dont-understand-non-inferiority-trials-and-neither-do-i)

A non-inferiority study compares one intervention to another, usually because a new treatment offers certain advantages over the "standard" treatment. Eg it may be cheaper, have fewer side effects, be more convenient, etc.

consider some well-known non-inferiority trials:
- ROCKET AF: rivaroxaban non-inferior to warfarin in stroke prevention in non-valvular AF. a NOAC is more convenient than warfarin
- A-to-Z trial: enoxaparin is non-inferior to heparin for NSTEMI. heparin is a hassle, clexane isn't.

The *non-inferiority margin* is the pre-determined margin of difference between the new and standard tx. It represents how much worse the new tx can be compared with the standard tx while still being considered "similar" or "not worse" than standard treatment.
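The margin logic can be made concrete with a small Python sketch (all counts and the 5% absolute margin are invented for illustration): the new treatment is declared non-inferior when the upper bound of the 95% CI for the risk difference sits below the pre-specified margin.

```python
import math

def risk_difference_ci(events_new, n_new, events_std, n_std, z=1.96):
    """Wald 95% CI for the absolute risk difference (new minus standard)."""
    p_new, p_std = events_new / n_new, events_std / n_std
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    return diff - z * se, diff + z * se

MARGIN = 0.05  # hypothetical pre-specified absolute non-inferiority margin
lo, hi = risk_difference_ci(events_new=42, n_new=500, events_std=40, n_std=500)
# non-inferior if the worst plausible excess risk is still below the margin
print(f"risk difference CI: ({lo:.3f}, {hi:.3f}); non-inferior: {hi < MARGIN}")
```

Note that the two-sided 95% CI upper bound corresponds to the 1-sided p <0.025 convention in the table below.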
consider this [[#Forest Plots|Forest plot]] below of the hypothesis testing and possible outcomes of non-inferiority trials:

![[Pasted image 20250104170428.png]]

**Choosing the non-inferiority margin**
can use either relative risk difference or absolute risk difference
- [[#Quantifying risk|Relative risk]] difference
	- the ratio of end point events on the new treatment compared to end point events on the standard treatment is the non-inferiority margin
	- the event rate of the standard group does not need to be assumed
- Absolute risk difference
	- non-inferiority if the absolute difference in end points between new and standard tx is less than a pre-defined value
	- this method entails an assumption about the event rate on standard treatment.
	- if the assumed event rate is higher than the observed event rate in the actual trial, then a higher relative difference will be calculated, favouring non-inferiority.

Two assumptions of non-inferiority trials that are *not tested* in the actual trial:
1. that the new treatment offers advantages over standard treatment.
2. that the new treatment is superior to placebo

**Benefits**
- for conditions like AF with a proven effective therapy, it is not ethical to test any new tx against placebo. ∴ many non-inferiority trials use active controls as the comparator.
- the new "non-inferior" treatment may confer other benefits, eg being more convenient or having fewer side effects.

**Drawbacks**
- arbitrary non-inferiority margin (eg [10%](https://rebelem.com/omadacycline-the-nejm-and-non-inferiority-studies/) in this antibiotic trial)
- statistical bias can favour non-inferiority
- does not test superiority or equivalence
- in sequential testing, if a new drug is tested against a new "standard" drug which was only established through non-inferiority testing, the new drug may not actually be superior to placebo or the original drug.
- intention-to-treat analysis may favour non-inferiority results; protocol violations may dilute any potential differences btwn the treatment arms, favouring non-inferiority
	- should be analysed with both "per protocol" and intention-to-treat approaches
- Some of the safeguards present in superiority studies, like blinding, are less effective in avoiding bias in non-inferiority studies.
	- In a superiority study that’s blinded, it’s hard to bias the results by favouring the group receiving the treatment of interest because you don’t know which group is which.
	- In a non-inferiority study, you only need to show that the new treatment is about as good as the standard treatment, so you can simply assess all patients as the same, thus showing non-inferiority.

| characteristic | superiority trial | non-inferiority |
| ---------------------- | --------------------------------------------------------------- | -------------------------------------------- |
| null hypothesis | new treatment not superior to standard treatment or placebo | new treatment inferior to standard treatment |
| alternative hypothesis | new treatment is better than standard tx/placebo | new tx is non-inferior to standard tx |
| non-inferiority margin | n/a | pre-determined |
| sequential testing | not possible | can be performed |
| significance level | 2-sided p <0.05 | 1-sided p <0.025 |
| comparison | standard tx or placebo | standard tx (rarely placebo) |
| possible outcomes | - new tx superior (or inferior) to standard tx<br>- inconclusive | see above forest plot |
| subgroup analysis | possible | possible |
## RCT
- outcomes
	- **relative risk** = incidence (w/ intervention) / incidence (without intervention)
	- **absolute risk reduction** = incidence (untreated) - incidence (treated)
	- **NNT** = 1/ARR
## meta analysis
see: [[#Forest Plots]], [Introduction to systematic review and meta-analysis](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903119/) - [bookends
link](bookends://sonnysoftware.com/ref/DL/58009)
- "rubbish in; rubbish out" - are the studies pooled in the meta-analysis sensible studies?

"A form of systematic review that uses statistical methods to combine the results from different studies to derive a pooled estimate of effect."

a meta-analysis is a quantitative systematic review in which clinical effectiveness is evaluated by calculating a weighted pooled estimate for the interventions across multiple separate studies. The aim is to derive a conclusion with greater power and accuracy than could be achieved in individual studies. [Forest plot](https://www.nature.com/articles/s41433-021-01867-6)

*benefits of meta-analysis*
- increases statistical power by ↑ sample size
- resolves uncertainty when individual studies disagree
- improves estimates of effect size
- quantifies and normalises results across heterogeneous studies
- may help provide information about the generalisability of results
## how to review a study
(doubled above at [[#how to review a paper?]])
1. is it the right kind of study? (eg past trials were cohorts rather than RCTs)
2. is the outcome measure relevant? (eg number of blood transfusions in the REBOA trial, rather than deaths)
3. have they picked the right group?
	1. your sample should represent the intended study population
	2. eg a survey on homelessness that was mailed out to study participants
4. the sample should be randomly derived (each member of the population being studied should have an equal chance of being included)
5. reduce loss to follow up, which introduces selection bias
# Forms of bias
see also: [[Cognitive biases]]

**selection bias** - there is a difference btwn those sampled in the study and those who were not studied. eg unemployed, homeless, ethnic minorities.
**non-response bias** - some components of a study questionnaire or data form may be incomplete. this might be due to the way the measurement occurs -- too tedious, uncomfortable, intrusive.
**loss to follow up bias** - those lost to follow-up may differ systematically from those who remain (see attrition bias below)

#tables

| Type | definition | example | how to avoid |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Recruiting** | | | |
| selection bias | nonrandom sampling or tx allocation of subjects such that the study population is not representative of the target population<br>often a sampling bias | *berkson bias* - cases/controls selected from hospitals are less healthy and have different exposures than the general population<br>*attrition bias* - participants lost to follow-up have a different prognosis than those who complete the study | - randomisation<br>- ensure the choice of the right comparison/reference group |
| **performing study** | | | |
| recall bias | awareness of disorder alters recall by subjects<br>common in retrospective studies | patients with disease recall exposure after learning of similar cases | decrease time from exposure to follow-up |
| measurement bias | information is gathered in a systematically distorted manner | - using a faulty automatic sphygmomanometer to measure BP<br>- *hawthorne effect* -- ppl change behaviour when they know they are being observed | - use objective, standardised, previously tested methods of data collection, planned ahead of time<br>- use placebo group |
| procedure and detection bias | subjects in different groups are not treated the same | pts in tx group spend more time in highly specialised hospital units | blinding and use of placebo reduce the influence of participants and researchers on procedures and interpretation of outcomes, as neither is aware of group assignments |
| observer-expectancy bias | researcher's belief in the efficacy of tx changes the outcome of that tx<br>aka *Pygmalion effect* | paul marik thinks vitamin C helps sepsis so is more likely to document positive outcomes when he treats septic patients with useless vitamins | same as for procedure bias |
| **interpreting results** | | | |
| confounding | factor related to both exposure and outcome (but not causal) distorts the effect of exposure on outcome | an uncontrolled study shows an association btwn coffee and lung cancer; however, coffee drinkers smoke more cigarettes, which accounts for the association | multiple/repeated studies<br>crossover studies<br>matching, propensity score |
| lead-time bias | early detection is confused with ↑ survival | survival seems increased, but the disease's natural history has not changed | measure "back end" survival (adjust survival according to the severity of disease at the time of diagnosis) |
| length-time bias | screening test detects diseases with a long latency period, while those with a shorter latency period become symptomatic earlier | a slowly progressive cancer is more likely to be detected by a screening test than a rapidly progressive cancer | an RCT assigning subjects to the screening program or to no screening |

## confounding vs effect modification
> confounding isn't exactly the same as bias -- a study could be free from bias, yet confounding could generate significant issues

**confounding**: a confounder is another factor that interacts with the study exposure, independently affecting the outcome.
this can be significant when the confounder is not uniformly distributed between the groups
- textbook example is coffee and pancreatic cancer: smoking was unevenly distributed between the groups, causing a [[#strength of association|false association]] between coffee and pancreatic cancer

common confounders:
- age
- gender
- ethnicity
- socioeconomic status

controlling confounding:
- randomisation -- true randomisation decreases the influence of confounding
- restriction -- if you only study old patients, then age is no longer a confounder
- matching -- measuring the confounding variable(s) and ensuring they are evenly distributed across the study groups (used sometimes in case-control studies)
- analytic techniques -- stratifying data by confounders

**effect modification**: the magnitude or direction of an association varies according to the level of a third factor
*unlike confounding, effect modification should be described and reported, rather than controlled*

![[Pasted image 20240430213450.png]]
![[Pasted image 20240430213624.png]]

## propensity score
control confounding -- [Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score: a primer for practitioners.](bookends://sonnysoftware.com/ref/DL/295914), [intro to propensity score](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/)
- adjusts for confounding in observational studies (eg cohort studies)
- *propensity score* = a patient's predicted probability of receiving a certain treatment given their characteristics
- generally, the propensity score is estimated with a logistic regression model, assessing which of a list of possible confounders (eg age, ethnicity, smoking status) predict membership of the treatment vs control group
- then, the outcome analysis is re-calculated using the propensity scores as "weights" in the regression model
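The weighting step above can be sketched as follows. This is a minimal, hypothetical example of inverse-probability-of-treatment weighting (IPTW): for simplicity the propensity score is estimated as the proportion treated within each covariate stratum rather than by logistic regression, and every number is invented for illustration:

```python
# Minimal IPTW sketch: estimate a propensity score, weight treated
# patients by 1/ps and untreated by 1/(1-ps), then compare weighted
# mean outcomes. All data are hypothetical.
from collections import defaultdict

# hypothetical records: (smoker?, treated?, outcome)
records = [
    (1, 1, 8.0), (1, 1, 7.5), (1, 0, 6.0), (1, 1, 7.0),
    (0, 1, 5.0), (0, 0, 4.0), (0, 0, 4.5), (0, 0, 3.5),
]

# propensity score: P(treated | smoker status), estimated per stratum
counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
for smoker, treated, _ in records:
    counts[smoker][0] += treated
    counts[smoker][1] += 1
propensity = {s: t / n for s, (t, n) in counts.items()}

def weighted_mean(rows):
    """Mean of outcomes y weighted by w, for rows of (w, y)."""
    return sum(w * y for w, y in rows) / sum(w for w, _ in rows)

treated_rows, control_rows = [], []
for smoker, treated, y in records:
    ps = propensity[smoker]
    if treated:
        treated_rows.append((1 / ps, y))        # weight = 1 / ps
    else:
        control_rows.append((1 / (1 - ps), y))  # weight = 1 / (1 - ps)

# confounding-adjusted treatment effect estimate
effect = weighted_mean(treated_rows) - weighted_mean(control_rows)
print(round(effect, 2))  # → 1.25
```

In a real analysis the propensity score would come from a fitted logistic regression (eg `predict_proba` in scikit-learn) over all candidate confounders, but the weighting logic is the same.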