What do you do when you suspect that some data is wrong?

Fred_Kohlhepp
23-Emerald I

I spent a fair portion of my career taking experimental measurements. When I first started, the senior engineer would review the data and discard measurements that he deemed false. Discarding data simply because you thought it was wrong struck me as very questionable, and I spent a fair amount of effort developing a method (using t statistics) to identify "bad" data.

 

It turns out that there is a statistically rigorous calculation to do just that, developed (and published in 1852) by Benjamin Peirce.  That method is developed and demonstrated in the attached Prime 4 Express file.
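For readers without Prime, here is a minimal Python sketch of Peirce's criterion. It is not a translation of the attached worksheet: it assumes Gould's formulation of the criterion and a simplified version of the stepwise procedure usually attributed to Ross. The function names (`peirce_x2`, `peirce_outliers`), the use of the sample standard deviation, and the data are all illustrative assumptions.

```python
import math
import statistics


def peirce_x2(N, n, m=1, tol=1e-12, max_iter=200):
    """Squared rejection ratio x^2 for N observations, n suspected
    outliers and m model unknowns (Gould's formulation, assumed here)."""
    if n < 1 or N - m - n <= 0:
        return 0.0
    # ln(Q^N) = n ln n + (N - n) ln(N - n) - N ln N, kept in log form
    ln_qn = n * math.log(n) + (N - n) * math.log(N - n) - N * math.log(N)
    ln_r = 0.0  # start the fixed-point iteration at R = 1
    x2 = 0.0
    for _ in range(max_iter):
        ln_lambda = (ln_qn - n * ln_r) / (N - n)
        x2 = 1.0 + (N - m - n) / n * (1.0 - math.exp(2.0 * ln_lambda))
        if x2 <= 0.0:
            return 0.0
        r = math.exp((x2 - 1.0) / 2.0) * math.erfc(math.sqrt(x2 / 2.0))
        if r <= 0.0:
            break
        new_ln_r = math.log(r)
        if abs(new_ln_r - ln_r) < tol:
            break
        ln_r = new_ln_r
    return x2


def peirce_outliers(data, m=1):
    """Values flagged by a simplified stepwise application of the criterion.
    Mean and standard deviation are taken once, from the full data set."""
    N = len(data)
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample standard deviation (assumption)
    flagged = []
    n = 1
    while n < N - m:
        limit = math.sqrt(peirce_x2(N, n, m)) * sigma
        candidates = [v for v in data if abs(v - mean) > limit]
        if len(candidates) < n:
            break  # fewer rejections than assumed outliers: stop
        flagged = candidates
        n += 1
    return flagged


# Made-up measurements with one obviously suspicious reading
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 15.0]
print(peirce_outliers(readings))
```

For N = 10 and one suspected outlier, the ratio `math.sqrt(peirce_x2(10, 1))` should land close to the tabulated value of about 1.878, which is a quick sanity check on the iteration.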

 

Thoughts and suggestions?

9 REPLIES 9
Derbigdog
14-Alexandrite
(To:Fred_Kohlhepp)

W. Edwards Deming wrote a book on Statistical Process Control that addressed this subject as well. His ideas led to the Japanese manufacturing industry going from one of the worst to one of the best in quality control.

FDS
12-Amethyst
(To:Fred_Kohlhepp)

Dear Fred, Would it be possible to attach a pdf of your worksheet? Have a nice weekend.

Fred_Kohlhepp
23-Emerald I
(To:FDS)

Per your request.

FDS
12-Amethyst
(To:Fred_Kohlhepp)

Dear Fred, Thank you, highly appreciated!

Would your method also work in case of the data provided in this thread?
Solved: Remove specific regions from a graph - PTC Community

The built-in functions (Grubbs, GrubbsClassic, ThreeSigma) won't.

Man!  That's a lot of data!

 

First, note that Terry has successfully trimmed this.

 

Second, Peirce takes the log of the inequality I'm using.  I haven't been able to do that successfully.  The large data set is (I think) keeping my solution from working (using the root function).

 

Third, my first pass simply treated the entire set as measurements to be averaged and analyzed.  Clearly there's a sinusoidal function that might reduce scatter and standard deviation.


@Fred_Kohlhepp wrote:

Man!  That's a lot of data!

 

First, note that Terry has successfully trimmed this.

 

Second, Peirce takes the log of the inequality I'm using.  I haven't been able to do that successfully.  The large data set is (I think) keeping my solution from working (using the root function).

 

Third, my first pass simply treated the entire set as measurements to be averaged and analyzed.  Clearly there's a sinusoidal function that might reduce scatter and standard deviation.


Yes, certainly a huge amount of data: slightly fewer than 175000 data points.
As far as I understood, Terry guessed(!) a sine as the upper limit and eliminated all data above it.

I thought about automating that process by using an outlier function (without success).

If we zoom in (see picture below) we can clearly(?) see which data should be considered an outlier/spike. I thought about some kind of windowing and applying an outlier function piece by piece instead of treating all the data in one go ....?
I did not play around with that idea any further, as the OP in that thread seemed to be happy with Terry's solution anyway.
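A rough Python sketch of what such windowing could look like. It does not use Peirce's criterion or the built-in Grubbs/ThreeSigma functions; as a stand-in it flags points that lie more than k median absolute deviations from the median of their own window. The function name, the window length, the factor k and the test signal are all made up for illustration and would need tuning on the real data.

```python
import math
import statistics


def windowed_outliers(values, window=50, k=5.0):
    """Indices of values lying more than k*MAD from their window's median."""
    flagged = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        med = statistics.median(chunk)
        mad = statistics.median([abs(v - med) for v in chunk])
        if mad == 0.0:
            continue  # flat window: no scale to judge deviations against
        for offset, v in enumerate(chunk):
            if abs(v - med) > k * mad:
                flagged.append(start + offset)
    return flagged


# Made-up signal: a slow sine with two injected spikes
signal = [math.sin(2 * math.pi * i / 1000) for i in range(2000)]
for i in (123, 1500):
    signal[i] += 1.0

spikes = windowed_outliers(signal)
print(spikes)
```

Because each window only sees a short, nearly straight piece of the sine, the slow trend contributes little to the local scatter and the spikes stand out. Treating the whole set in one go lets the sine itself dominate the standard deviation.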

The thread just came to my mind when I read your posting.

[Image: Werner_E_0-1691771266421.png]

 

I spent some more time to figure out why this system wasn't working.

The attached file points out a major flaw:  The procedure (as presented here) will not work if the data set is larger than 142 points!

 

Back to the drawing board!!

While working to expand the sample size limit, I got the logarithmic equation to work.  Not sure why there are still issues.
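One possible explanation for a size limit near 142, offered as a guess rather than something confirmed by the worksheet: in Gould's form of Peirce's criterion the quantity Q^N = n^n (N-n)^(N-n) / N^N contains N^N, which outgrows floating-point range very quickly. The Python sketch below checks where N^N passes a ceiling of about 1e307 (assumed here to be roughly the largest number Mathcad handles) and the IEEE double maximum, and shows that the logarithmic form stays finite even for 175000 points. The function names are illustrative.

```python
import math
import sys


def first_n_exceeding(log10_ceiling):
    """Smallest N for which N**N exceeds 10**log10_ceiling."""
    N = 2
    while N * math.log10(N) <= log10_ceiling:
        N += 1
    return N


def ln_qn(N, n):
    """ln(Q^N) = n ln n + (N - n) ln(N - n) - N ln N, with no huge powers."""
    return n * math.log(n) + (N - n) * math.log(N - n) - N * math.log(N)


# Ceiling of about 1e307 (assumed numeric limit of the worksheet)
limit_assumed = first_n_exceeding(307.0)
# Ceiling of an IEEE double, for comparison
limit_double = first_n_exceeding(math.log10(sys.float_info.max))

print(limit_assumed, limit_double)
print(ln_qn(175000, 1))  # finite even for a very large data set
```

With a 1e307 ceiling the first failing size comes out as 143, i.e. anything larger than 142 points, which would match the limit reported above if the worksheet evaluates N^N directly. Working with logarithms throughout removes that particular limit, though it says nothing about whatever issues remain.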

 

Still Prime 4 Express
