What is a double-blind QC sample used by the IBSP?
How do I easily obtain raw blind QC data?
How are the Control Limits determined and what are Quality Control Units?
Why are charts missing for some parameter code/method codes?
How do I view historical charts or obtain historical data?
How do I find out more about the specific matrix of the IBSP blind samples?
Are methods for both filtered samples and unfiltered samples monitored?
Are filtered blind reference samples appropriate for the QA of unfiltered-aqueous-sample analyses?
Why not use unfiltered blind samples?
How can I determine if my environmental sample data might be biased?
What is the difference between bias and variability?
What is the difference between parametric and non-parametric statistics?
Why does the NWQL demonstrate a long-term negative bias for selenium?
Does the NWQL demonstrate a long-term positive bias for filtered boron by ICP-OES?
What determines which parameters are included/excluded in the IBSP Data Quality Assessment Summary?
I have a question that has not been addressed.
A "double-blind sample" is a QC sample submitted for analysis for which the identity of the sample as well as the concentration of the individual components within the sample is unknown to the analyst. Double-blind QC samples containing inorganic, nutrient, and physical property constituents at various concentrations are prepared and disguised as routine environmental samples. The QC samples used by the IBSP are unusual in that they are typically not synthetic reference materials; rather they are derived from snow-melt, surface-water, or ground-water sources (Woodworth and Connor, 2003). These natural-matrix standard reference samples are generally used as-is (undiluted). In some cases, the standard reference samples are diluted with deionized water or mixed in varying proportions with other standard reference samples in order to achieve a variety of concentrations within the range of concentrations that corresponds to those of typical environmental water samples.
The double-blind IBSP samples are made to appear as much like environmental samples as possible. The IBSP submits these samples in shipping coolers to the National Water Quality Laboratory (NWQL) to mimic the process, as much as possible, by which actual environmental samples are submitted to the NWQL. All identifying information (except account number) is changed to that of actual Water-Science Center customers. Bottle labels are even "soiled" to give the appearance that the bottles have been filled in the field. After the samples are logged in, they are subjected to the identical laboratory handling, processing, and analytical procedures as are the environmental samples. Once the laboratory analyzes the samples, the results are loaded into the National Water Information System (NWIS) database (blind-blank results do not go into NWIS) at which point the IBSP personnel twice-weekly compile and review the analytical results.
There are several key differences that make IBSP blind-sample results more applicable in describing analytical performance in environmental sample matrices than on-line QC results.
(1) IBSP blind samples measure the analytical performance of the entire laboratory process—from sample login through data storage in the National Water Information System (NWIS) database. (Blind-blank sample results do not go into NWIS.) In other words, the IBSP blind samples are subjected to the same laboratory processes as the environmental samples and are designed to capture the same sources of variability to which the environmental samples are subjected. On-line QC samples, on the other hand, are designed for a different purpose. They are used primarily for initial instrument calibration verification and for on-going instrument calibration verification. On-line QC samples measure only a subset of the environmental sample analysis process.
(2) Most IBSP blind samples are made from natural-water matrices whereas on-line quality-control samples are typically synthetic samples made from spiked reagent-grade water. Natural-matrix water samples contain a complex mixture of both organic and inorganic components that can cause interferences and affect how a sample interacts physically with an instrument (part of what's often referred to as "matrix effects"). These effects are not nearly as pronounced in spiked reagent-grade samples; therefore, natural-matrix blind samples present an analytical challenge more consistent with that of environmental samples allowing measured analytical performance of these natural matrices to be more appropriately applied to environmental sample results.
(3) IBSP blind samples are a third-party check in the truest sense. Most IBSP blind samples come from round-robin studies and are used "as-is." The accepted target value, or MPV (Most Probable Value), is derived from multiple laboratories' results, so it has been confirmed many times over. On-line QC samples are typically made within the laboratory and usually undergo a number of processing steps that modify the QC sample from its original form. The analysts, equipment, and solutions used to make these modifications are typically the same analysts, equipment, and solutions used to make the calibration standards and the on-line QC samples, so systematic errors can be introduced that are difficult to detect. Third-party, on-line QC samples partially mitigate this situation, but they are still typically prepared by the same analysts with the same equipment that is used for the calibration standards and other QC samples. Therefore, third-party, on-line QC samples may be affected by systematic errors.
When natural-matrix, blind sample data are not available, on-line QC data can be used to estimate analytical performance for environmental results so long as their limitations are well understood. As always, environmental-sample matrix spikes using appropriate spike levels are recommended for evaluating sample-specific bias.
There are two ways to easily obtain raw data.
(1) Access the chart for your parameter of interest on the Charts and data page. Once the chart launches, there will be an "Open Data Set" button. Click on this button and a spreadsheet will pop up with all of the data for that particular chart.
(2) Contact the listed project chief (see left-hand menu, top of page) for custom data pulls that match your project's date range and objectives.
IBSP control limits are set at +/-2 Quality Control Units (QCU) by convention. QCUs are calculated by three separate methods for any given analyte within a blind sample. The largest value of those three values is assigned as one QCU.
The three values for the QCU (shown in no particular order) are calculated as follows:
(1) f-pseudosigma. The f-pseudosigma used is the f-pseudosigma determined by the Standard Reference Sample (SRS) Project's interlaboratory comparison (round-robin) study for that analyte in that blind-sample.
(2) One-half the Method Detection Limit (MDL). This is the MDL (not the Method Reporting Limit) in use at the time that blind-sample was analyzed.
(3) 5% of the Most Probable Value (MPV). The MPV used is the MPV determined by the SRS Project's round-robin study for that analyte in that blind-sample.
Since the control limits are set at +/-2 QCUs, the control limits will be double the values calculated for the QCU. The control limits will never be more stringent than +/-10% of the SRS Project's round-robin MPV (i.e. 2 times +/-5%), never more stringent than the acceptable variability (2 times +/-1 f-pseudosigma) demonstrated by the cumulative labs participating in the SRS Project's round-robin study, and never more stringent than +/- the MDL (2 times one-half the MDL). At concentrations approaching the MDL (where the relative error is generally higher than 10%) the calculation based on the MDL becomes more appropriate and tends to be favored. At higher concentrations where 5% of the MPV is larger than one-half the MDL, the "5% of the MPV" or the f-pseudosigma calculations become more applicable and should be favored. Lastly, for more challenging matrices (higher than usual variability for a given concentration), the f-pseudosigma calculation is a more realistic measure of variability and it tends to be favored as it will often be greater than 5% of the MPV.
All results returned by the NWQL (for IBSP blind samples) are compared directly against the analyte-specific MPV. The difference between the lab result and the MPV is converted into QCUs and then plotted accordingly. The actual QCU for a given point on the chart can be seen by clicking once on the chart and then hovering the cursor over the data point of interest. In addition, clicking the "Open Data Set" link on the chart will open an MS Excel file with the data (and associated meta-data) used to create the chart.
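The QCU selection and the conversion of a result into QCUs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the IBSP's actual software: the function names and all numeric values are hypothetical, and treating a result at exactly +/-2 QCUs as within limits is an assumption.

```python
def quality_control_unit(f_pseudosigma, mdl, mpv):
    """Return one QCU: the largest of the three candidate values."""
    return max(f_pseudosigma, 0.5 * mdl, 0.05 * mpv)


def result_in_qcus(lab_result, mpv, qcu):
    """Express the difference between a lab result and the MPV in QCUs."""
    return (lab_result - mpv) / qcu


# Hypothetical values for illustration only (same concentration units throughout).
qcu = quality_control_unit(f_pseudosigma=0.8, mdl=0.5, mpv=20.0)
deviation = result_in_qcus(lab_result=22.5, mpv=20.0, qcu=qcu)

# Control limits are +/-2 QCUs; counting a result at exactly 2 QCUs
# as "within limits" is an assumption of this sketch.
within_limits = abs(deviation) <= 2.0
```

In this made-up case, 5% of the MPV is the largest of the three candidates, so it sets the QCU, and the result falls outside the control limits.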
Note: Prior to FY16, blind-samples were made from blended Standard Reference Water Samples (from the SRS Project) and the f-pseudosigma was computed via regression analysis. The MPV was calculated using the proportion of each of the Standard Reference Water Samples used to make the final, blended solution.
Please contact the listed project chief (see left-hand menu, top of page) for additional clarification.
Generally speaking, if 250 or more environmental samples were analyzed by the NWQL for a given parameter code/method code (pcode/mcode) within a given year and a stable (>6 months) reference material is readily available, then non-blank, blind samples will be submitted for that pcode/mcode. There are some exceptions to this.
(1) In order to provide a comparison between results for methods for filtered samples and unfiltered samples, some methods for which fewer than 250 environmental samples were analyzed for unfiltered samples may be included in the non-blank, blind-sample submission plan.
(2) Some nutrient methods, although labeled with different pcode/mcodes, are very similar, and it's believed that problems or successes demonstrated by one of those methods will affect all similar methods in the same fashion. For efficiency and economic reasons, only one in the group of similar methods is included in the non-blank, blind-sample submission plan. In the list that follows, quality-control data are available for the pcode/mcodes to the left of each arrow and can be extrapolated to the pcode/mcodes to the right of the arrow:
00666CL063 and 00665AKP01 --> 00666CL062
00623KJ002 --> 00623KJ003 and 00625KJ008
00666KJ005 --> 00665KJ009 and 00623KJ004
00610CL017 --> 00608CL036
00665CL021 --> 00666CL019
62855AKP01 --> 62854CL062
The blind blanks submission plan is handled differently than the non-blank, blind sample submission plan. Except for the similar nutrient methods listed above, blind blank samples are submitted for almost all inorganic methods regardless of the number of environmental sample requests. Blind-blank data are also available for selected carbon methods, nitrogen methods, and other methods as well.
Historical charts and their associated data (back to 1985) are available at the main Charts and Data page.
Although most IBSP blind samples are natural-water matrices, all natural-water matrices are not alike. Ground-water and surface-water matrices may demonstrate different analytical performance for a given analytical method. Other water characteristics that may affect analytical performance include (but are not limited to) ionic strength, carbon concentration, naturally-occurring concentrations of constituents that the lab uses for internal standards, and many other interferences that may be present at concentrations different than those present in the IBSP blind samples.
Environmental-sample matrix spikes using appropriate spike levels are recommended for evaluating sample-specific bias.
Follow this link to the IBSP Blind-sample Matrix Description Page for specific details.
Yes. However, it is important to note that filtered blind reference samples are submitted to both the methods for filtered samples and the methods for unfiltered samples.
Yes. Filtered blind reference samples are appropriate for the quality assurance of unfiltered-aqueous-sample analyses. The unfiltered-sample analysis is designed to evaluate the dissolved phase + colloidal phase + particulate phase (analogous to an entire pie). The filtered-sample analysis is designed to evaluate only the dissolved phase (analogous to a single slice of the pie). Samples analyzed by the unfiltered-sample analysis can have a wide range of proportions of these phases (analogous to each pie cut into varying sizes of pie slices), including samples where the entire mass of analyte is in the dissolved phase. Operationally, a filtered aqueous reference material has the entire mass of analyte in the dissolved phase, so the unfiltered-sample analysis and the filtered-sample analysis of the reference material should yield the same concentration. Where an analyte has separate analyses for filtered samples and unfiltered samples, it is preferable to run the same filtered reference sample through both the filtered-sample analysis and unfiltered-sample analysis so that the samples themselves are not an additional source of bias or variability. This allows for direct comparison of differences in analytical bias and variability between the two methods.
Note: It would not be appropriate to submit an unfiltered reference sample, if one were available, to QA a filtered-sample analysis because the filtered-sample analysis is not designed to quantify the particulate and colloidal phases.
When laboratory performance is measured using a blind sample, it is assumed that most of the deviation between the laboratory result and the target value (Most Probable Value) is due to variability in the laboratory process--not the blind sample itself.
The blind samples used by the IBSP are obtained from the Standard Reference Sample (SRS) Project. Demand for the reference materials produced by the SRS Project requires them to be produced in large batches (volumes currently range from about 240 to 270 liters). We feel that we cannot, at this time, reliably split the particulate and colloid phases in a large-volume, unfiltered sample in such a way as to produce an unfiltered sample that has consistent concentrations of particulate- and colloid-phase analytes in each aliquot for the large number of aliquots that we prepare for each Standard Reference Water Sample (SRWS). Therefore, the variability in concentration of particulate-phase analytes that might result in unfiltered SRWSs would be unpredictable making them unreliable for measurement of laboratory performance. In addition, unfiltered samples may be unstable due to microbial activity and phase changes over time.
For making a direct comparison of differences in analytical bias and variability between a filtered-sample method and an unfiltered-sample method, it is preferable to run the same filtered reference sample through both methods so that the samples themselves are not an additional source of bias or variability.
The digestion step within an unfiltered-sample analysis is quality assured when using a filtered sample; however, the efficacy of the digestion step (the effectiveness of the extraction of analytes from particulates) is not evaluated when using a filtered reference sample to quality assure an unfiltered-sample analysis because there are no particulates from which to extract. The efficacy of the digestion step is critical, but it's only one step of many in the overall process that constitutes the unfiltered-sample analysis. Note that the calibration standards and internal quality-control samples used for the unfiltered-sample analysis also have no particulate phase.
If the lab reports a higher concentration for the unfiltered aliquot than for the filtered aliquot for the same analyte in the same environmental sample, then the data pass a logic check (the part can't be greater than the whole). This is good. However, this situation does not guarantee that the data are free from bias. It is possible that the concentration of analyte recovered from the particulate portion in the unfiltered aqueous sample is biased low. If this "low-biased" value for the particulate and colloid portions is added to the dissolved component of the unfiltered aqueous sample, then the low-biased result for the unfiltered sample may still be higher than the result for the filtered sample. This situation, which involves a biased result, is not sufficient to fail the logic check described above. There are a number of other combinations of biases that may occur even though the lab reports a higher concentration for the unfiltered aliquot than for the filtered aliquot for the same environmental sample.
To identify bias, a combination of approaches is best.
1) Ascertain analytical performance using the results of reference samples with known concentrations. Such reference samples can be run by the lab as non-blind internal quality-control standards or as blind performance-test samples. Such reference samples also can be submitted by lab users or external groups (such as the USGS BQS) as double-blind samples. These blind reference samples are a key component of the USGS WMA laboratory-evaluation policy.
2) In addition, environmental-sample matrix spikes using appropriate spike levels are recommended for evaluating sample-specific bias.
Analytical errors fall into two major categories: bias and variability. Bias is systematic error that causes consistently positive or negative deviation in the results from the "real" value. Variability is random error that affects the ability to reproduce results. Repeated measurements of the IBSP samples over time provide estimates of both systematic bias and random variability in the laboratory's analytical procedures.
A full and adequate answer to this question is well beyond the scope of an “FAQ” answer; however, it is a question that arises since the IBSP makes use of the non-parametric statistics: median and f-pseudosigma.
Parametric statistics rely on assumptions about the underlying distribution of the population being described, whereas non-parametric statistics make few or no assumptions about that distribution. When describing the central tendency and variability of a data set, if the data are correctly assumed to be normally distributed, then the parametric statistics (mean and standard deviation) are adequate measures of central tendency and variability, respectively. However, skewed (non-normally distributed) data sets or data sets containing random and sporadic outliers may produce biased estimates of the mean and standard deviation. For non-normally distributed data or data that include random and sporadic outliers, the median and f-pseudosigma are more appropriate measures of central tendency and variability, respectively, because these statistics are considered “robust.” That is, they include all of the data (normal, not-normal, outliers, etc.); however, they are much less sensitive to the shape of the distribution and to outliers and can therefore still give adequate measures of central tendency and variability even when the data are not normally distributed and/or include outliers.
The distributions characterizing data sets reviewed in IBSP are similar to the distributions seen for environmental data sets. They often are right- or positively-skewed, oftentimes due to random and sporadic high-value outliers; therefore, non-parametric statistics are favored.
The f-pseudosigma can be thought of as a non-parametric analogue to the (parametric) standard deviation (Hoaglin and others, 2000). In other words, like the standard deviation, it is a measure of variability; however, it is less distorted by non-Gaussian distributions and/or outliers in the data set. (For Gaussian data sets, the calculated f-pseudosigma and the calculated standard deviation will be quite similar, if not the same.)
The f-pseudosigma is calculated by taking the fourth-spread (or interquartile range) of the data set and dividing it by 1.349. This normalizes the interquartile range to that of the standard deviation of a Gaussian distribution. This calculated value (the f-pseudosigma) may then be used in lieu of the standard deviation with little concern for outliers and distribution shape.
f-pseudosigma = (Fourth-spread of the data) / 1.349
where the Fourth-spread is analogous to the interquartile range (3rd quartile - 1st quartile) of the data.
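As a rough sketch of this calculation, the following uses the Python standard library. The function name and the sample values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption: different quartile or hinge conventions give slightly different fourth-spreads for small data sets.

```python
import statistics


def f_pseudosigma(values):
    """Interquartile range divided by 1.349 (robust analogue of the standard deviation)."""
    # statistics.quantiles returns the three quartile cut points for n=4.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 1.349


# Hypothetical results for one sample, including one sporadic high outlier.
results = [9.8, 10.0, 10.1, 10.2, 10.3, 10.4, 10.6, 25.0]

robust_spread = f_pseudosigma(results)
classical_spread = statistics.stdev(results)
```

With a high outlier present, the standard deviation is inflated by the single extreme value, whereas the f-pseudosigma depends only on the middle half of the data.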
Reference
Hoaglin, D.C., Mosteller, F., and Tukey, J.W., Eds. 2000, Understanding robust and exploratory data analysis: New York, NY, John Wiley, Inc., p. 38-41.
There may be one or more reasons for this. One point to consider is the target value against which the NWQL performance is assessed. This is the only point addressed in this answer.
The target value used by IBSP is the Most Probable Value (MPV) determined from the Standard Reference Sample (SRS) Project's interlaboratory comparison (round-robin) studies. For selenium, the method primarily used by laboratories participating in the round-robin (n = 17-34; Fall 2012-Fall 2017) is inductively coupled plasma mass spectrometry (ICP-MS). Although ICP-MS is very commonly used for the analysis of selenium in environmental samples, analysis of selenium by ICP-MS is complicated by the presence of interfering elements and molecules from both the sample matrix and the argon gas required by the method. For selenium, there are multiple techniques a laboratory may use within the ICP-MS method to try to mitigate the effects of one or more interferences and to maximize sensitivity for selenium.
First, there are many selenium isotopes from which to choose. Mass 78 and mass 82 are the more commonly used isotopes. Both of these masses can be affected by matrix interferences (signal, chemical, or physical effects from components other than selenium that can enhance or suppress, i.e. "interfere with," the true selenium signal). Positive matrix interference can be corrected for using matrix-interference equations that are applied by the software against the result in an attempt to reduce the impact of the interferent on the final result. Certain matrices are more difficult to correct for than other matrices, and the efficacy of matrix-interference equations can vary from matrix to matrix and isotope to isotope. Correction equations are specific to a single interferent, such as a krypton (mass 78) signal overlap on the selenium (mass 78) signal and therefore do not correct for every interference present in a sample.
Newer ICP-MS instruments have available a collision or reaction cell. This enables the instrument to physically or chemically remove any molecular interference before it enters the analytical pathway. The collision/reaction cell, when used for selenium, generally uses collisional energy to break up molecular interferences that overlap with the selenium signal and also uses a reaction gas to react with other interferences and shift their mass away from that of the selenium mass of interest. Some of the gases used for the collision and reaction cells include H2, He, O2, NH3, or a combined mixture of these gases. If a collision or reaction cell is used, correction equations are generally not used. The NWQL ICP-MS method for selenium analysis uses a collision cell with a He/H2 gas mixture to remove molecular interferences.
The variety and high levels of interferences on selenium isotopes and the wide variety of techniques for interference elimination available by ICP-MS can cause higher variability in the round-robin results for selenium as compared to results for other trace metals. The high variability indicates less confidence that the median value determined in the round-robin for a particular Standard Reference Water Sample (SRWS) represents the actual concentration of selenium in that sample. In addition, since some of the techniques are less able to control for interferences than other techniques, the calculated MPV for selenium concentration (the median value) may be higher than the actual concentration due to interferences imparting a positive bias to the selenium signal. If the median concentration is indeed higher than the actual concentration, then techniques that more completely eliminate or control for interferences will yield lower results by comparison.
Certain interferences have a relatively fixed impact on the selenium signal, so as the selenium signal increases, the relative effect of the interference on the selenium signal decreases. At higher concentrations of selenium, the difference between the various techniques will diminish since the effect of the interferent becomes minor, and successfully eliminating or correcting for the interferent becomes less crucial. Below a certain concentration (about 1 µg/L; Fall 2012-Fall 2017) there appear to be technique-specific round-robin results (not confirmed), which may indicate that some techniques are better able to eliminate or correct for fixed-level interferences than other techniques. Therefore, the ability to accurately determine the actual concentration of selenium in a sample may be specific to a given technique, and one technique may appear to be biased relative to another technique or all other methods and techniques collectively.
In conclusion, the method in use by the NWQL is designed to eliminate most signal enhancing interferences and thus may appear to be producing low biased results (as compared to MPVs for low-concentration SRWSs) when in fact, the results may be more consistent with the actual concentration of selenium in the blind sample. In other words, what appears to be a performance issue at lower concentrations may actually be, all or in part, a "target value" issue.
Analytical results for filtered boron by ICP-OES (parmcode 01020; method code PLA13) have demonstrated a positive bias since approximately October 2013. In FY14, the overall median positive bias was 3.1% (n = 86). FY15 through FY20, the overall median positive bias was 8.0% (n = 372), with the bias about 8.8% for the time range January 2021 through September 2024 (n = 221). (Percent bias and number of results derived from data in NWIS and QWDX as of 10/07/2024). The NWQL has tried many different approaches to decrease the bias without success. Data users should consider this bias when interpreting their environmental data and consider requesting filtered boron by ICP-MS if accuracy is a high priority.
Generally speaking, analytical methods that demonstrate biases greater than 5% and/or that have 10% or more results outside of the IBSP control limits will be included in the summary. There are a number of exceptions to this. Please see the following FAQ for more specific details.
Water Science Center (WSC) water-quality projects have data-quality requirements designed to meet their projects' objectives. Similarly, IBSP has data-quality requirements as well. The IBSP data-quality requirements are based on what is considered to be reasonable analytical performance as indicated by the Standard Reference Sample (SRS) Project's interlaboratory comparison (round-robin) studies in conjunction with the NWQL's Method Detection Limits (MDLs). WSCs' data-quality requirements may be more or less stringent than the data-quality requirements applied by IBSP.
Bias
Bias (when applied in the IBSP’s Data Quality Assessment [DQA] Summary) is defined as the percent-error from the target or expected value and is calculated as follows:
((lab value – target value) / target value) X 100%
Biases in the blind-sample results are mentioned in the DQA summary for analytical parameters for which the median bias is at least 5% (and the one-sided p-value associated with the bias measurement is <0.025). Biases less than 5% (even if long-term) are not mentioned in the DQA summary and are considered to be within reasonable analytical expectations. In addition to the bias being at least 5%, two or more results must be greater than +1 Quality Control Unit (QCU) and/or less than -1 QCU. The rationale here is that although a bias may be present, if almost all results are within +/-1 QCU, then the method is operating within reasonable analytical expectations. Other considerations come into play as well, such as the concentration of blind samples submitted. Low-concentration samples (samples for which the target value is within 2 times the MDL) may be excluded from the overall bias calculation since percent-errors greater than 5% are perfectly acceptable at concentrations close to the MDL. Similarly, results for samples for which the MPV is less than 1 µg/L may be acceptably biased by more than 5% since the overall effect on the absolute concentration will not be that great. Finally, method-performance capabilities as determined by the SRS Project's round-robin studies may indicate that for a particular parameter, biases greater than 5% are within that method's analytical capabilities (for example, selenium by ICP-MS).
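The percent-bias formula and the 5% median-bias screen can be illustrated with a short sketch. The function names and the (lab value, target value) pairs are hypothetical, and the sketch deliberately omits the p-value test, the +/-1 QCU count, and the low-concentration exclusions described above.

```python
import statistics


def percent_bias(lab_value, target_value):
    """Percent error of a lab result relative to the target value."""
    return (lab_value - target_value) / target_value * 100.0


def median_percent_bias(pairs):
    """Median percent bias over (lab_value, target_value) pairs."""
    return statistics.median(percent_bias(lab, target) for lab, target in pairs)


# Hypothetical (lab value, target value) pairs for one pcode/mcode.
pairs = [(10.6, 10.0), (21.8, 20.0), (5.2, 5.0)]

median_bias = median_percent_bias(pairs)

# First screening step only: median bias of at least 5% in either direction.
exceeds_bias_threshold = abs(median_bias) >= 5.0
```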
Variability--non-blank, blind sample results
The phrases "high variability" and "high percentage" (when applied in the IBSP's DQA summary) describe a situation where multiple results for blind samples submitted to the NWQL are higher than the upper control limit (+2 QCUs) and/or lower than the lower control limit (-2 QCUs) used by the IBSP. These phrases apply in the DQA summary when the percentage of results outside of IBSP control limits is 10% or greater for a defined time-frame. Not all cases of analytical methods demonstrating a high percentage of results outside of control limits are noted, because of other considerations such as, but not limited to, multiple results outside of limits on a single day or all results outside of limits from the same blind-sample solution. If high variability has been a long-running issue but has recently improved, it may be mentioned in the IBSP Observation column (in the DQA summary) as a follow-up even if less than 10% of the results are outside of IBSP control limits.
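The 10% criterion can be sketched as a simple count of results beyond +/-2 QCUs. The function name and the deviations below are hypothetical, and the sketch ignores the other considerations mentioned above (for example, several exceedances on a single day or from a single blind-sample solution).

```python
def percent_outside_limits(deviations_in_qcus, limit=2.0):
    """Percentage of results whose deviation from the MPV exceeds +/-limit QCUs."""
    if not deviations_in_qcus:
        raise ValueError("at least one result is required")
    outside = sum(1 for d in deviations_in_qcus if abs(d) > limit)
    return 100.0 * outside / len(deviations_in_qcus)


# Hypothetical deviations (already expressed in QCUs) for a defined time-frame.
deviations = [0.3, -1.1, 2.4, 0.8, -2.6, 1.9, 0.1, -0.4, 1.2, 0.6]

pct_outside = percent_outside_limits(deviations)
high_variability = pct_outside >= 10.0
```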
Variability--blind-blank sample results
For NWQL inorganic results, a "false positive" refers to when a non-censored value is reported for a blank sample thought to contain little or no constituent of interest. Theoretically, 1% or less of blind-blank results should be reported without a "<" Remark Code (a "<" Remark Code indicates that no analyte was detected; Childress and others, 1999). In reality, it's not too uncommon to see some random and sporadic false positives. The phrase "high variability" (when applied in the IBSP’s DQA summary) for blind-blanks describes a situation where three or more results from different analysis dates are reported as actual detectable values within a defined time-frame (usually 2-3 months), resulting in a false-positive rate that at a minimum exceeds 5%. Due to numerous other considerations, not all analytical lines demonstrating a false-positive rate that exceeds 5% are mentioned in the DQA summary. Blind-blank results may or may not exhibit a bias in conjunction with high variability. Biases in blind-blank results are not mentioned unless the bias is greater than the detection limit.
Reference
Childress, C.J.O., Foreman, W.T., Connor, B.F., and Maloney, T.J., 1999, New reporting procedures based on long-term method detection levels and some considerations for interpretations of water-quality data provided by the U.S. Geological Survey National Water Quality Laboratory: U.S. Geological Survey Open-File Report 99-193, 19 p. (Also available at https://water.usgs.gov/owq/OFR_99-193/index.html.)
Please contact the listed project chief (see left-hand menu, top of page) with any questions.