Fuelly Forums


evoblade 09-03-2014 05:19 PM

Throw out obviously bad data?
 
Would it be possible for Fuelly to throw out extreme outliers when computing averages? For example there is a guy with a 2014 Mazda 6 that put in 11,051 gallons on a refuel. Obviously that was a mistake. Does it make sense to say that 2014 Mazda 6s get an average of 9.0 MPG? Heck no.

I'm not saying go in and change that guy's data, just don't use it to calculate averages.

Jay2TheRescue 09-03-2014 06:46 PM

If you click the report page link on that vehicle page (or post a link to it in this thread), I can take a look, and if it truly is out of the realm of possibility, I can see what I can do to fix or delete it.

trollbait 09-04-2014 04:45 AM

I think what evoblade is actually asking for is that, instead of users having to report every suspect vehicle and then having you guys fix it, the software just ignore the data outliers when people are researching vehicle averages.

RobertV 09-04-2014 09:19 AM

We are looking at ways to stop/catch bad data at the point of it being entered. But if we put up a blocker, it can become frustrating at the time of entry. If we give the option to ignore the warning/error message, it can be overlooked and forgotten.

The Team is still looking for the best possible solution, because we want to see good-data just as much as you!

trollbait 09-04-2014 10:01 AM

Perhaps allow the user to select using the middle 95% of vehicles for calculating the average fuel economy of a group of cars.
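
A rough sketch of the "middle 95%" idea in Python, for anyone who wants to see the effect. The function name and the sample numbers are made up for illustration, and nothing here is meant to describe how Fuelly actually computes its averages.

```python
def middle_mean(values, keep_percent=95):
    """Mean of the central keep_percent of the sorted values."""
    if not values:
        raise ValueError("need at least one value")
    if not 1 <= keep_percent <= 100:
        raise ValueError("keep_percent must be between 1 and 100")
    ordered = sorted(values)
    # Items to drop from EACH end; integer math avoids float rounding surprises.
    drop = len(ordered) * (100 - keep_percent) // 200
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


# Hypothetical per-vehicle averages (MPG) for one model: 39 believable
# vehicles plus one whose log produced an absurd 250 MPG.
mpg_by_vehicle = [28.0 + 0.1 * i for i in range(39)] + [250.0]

plain_mean = sum(mpg_by_vehicle) / len(mpg_by_vehicle)   # pulled up by the 250
trimmed_mean = middle_mean(mpg_by_vehicle)               # drops one value per end
```

With these 40 made-up vehicles, one value is dropped from each end. Note that at 95% nothing gets trimmed until a group has at least 40 vehicles, so small groups would need a different cutoff.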

andyrobo 09-04-2014 10:46 AM

We have been actively working on ways to identify outliers, mark them as potential errors and remove them from the crowdsourced results. I don't have an ETA for this as our developers are currently working on some projects that need to get done for our smartphone apps but hopefully in the next couple months we'll be able to implement a much more sophisticated tool to identify and ignore bad data.

Not only do we plan to use the data to identify outliers but also to understand what are "normal" results on a Year, Make, Model basis and then analyze data when it's being input into the system to identify issues before they are submitted. For example, this 2014 Mazda 6 probably entered their mileage in the gallon field and we should be able to identify that and fix it before the data is saved to our database.

Sometimes people forget to log a fuel-up and then on their next fuel-up the distance traveled is usually about twice that of a single fuel-up. Right now we require them to tell us that they skipped a fuel-up but we should be able to use the data we have to identify issues like this and help our users fix the issue.
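
As an illustration only, a check like the one described could be sketched in Python as below. The function name, the minimum history of three tanks, and the 1.6x to 2.5x window are all assumptions chosen for the example, not Fuelly's rules.

```python
from statistics import median


def looks_like_missed_fuelup(prev_distances, new_distance, low=1.6, high=2.5):
    """True when the new distance is roughly double the driver's usual tank."""
    if len(prev_distances) < 3:
        return False  # too little history to judge
    typical = median(prev_distances)
    return low * typical <= new_distance <= high * typical


# Hypothetical miles driven between fill-ups for one driver.
history = [310, 295, 330, 305, 320]

suspicious = looks_like_missed_fuelup(history, 640)   # about two tanks' worth
normal = looks_like_missed_fuelup(history, 315)       # an ordinary tank
```

A flag like this would only be a prompt to ask the driver whether a fill-up was skipped; a long highway trip on a partly filled tank can look similar.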

We have been analyzing statistical methods to identify outliers and to determine the max capacity of a fuel tank, the max distance per tank, and other interesting tidbits. We are also gathering data on fuel prices and locations so we can determine if someone accidentally put the fuel price in the gallon field and the gallons in the price field, even on fairly small fuel-ups.
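
To make the swapped-field case concrete, here is a hedged Python sketch. It assumes a known local price per gallon is available; the function name, the 5% tolerance, and the sample prices are hypothetical.

```python
def fields_look_swapped(gallons, price_per_gallon, local_price, tolerance=0.05):
    """Guess whether the gallons and price fields were entered the wrong way round.

    local_price is an assumed known typical price per gallon for the area.
    """
    def near_local_price(value):
        return abs(value - local_price) <= tolerance * local_price

    # Swapped if the "gallons" value looks like a price and the "price" does not.
    return near_local_price(gallons) and not near_local_price(price_per_gallon)


swapped = fields_look_swapped(gallons=3.459, price_per_gallon=3.2, local_price=3.40)
ordinary = fields_look_swapped(gallons=11.2, price_per_gallon=3.459, local_price=3.40)
# A 3.45 gallon top-up at 3.40 per gallon is ambiguous, so it is not flagged.
ambiguous = fields_look_swapped(gallons=3.45, price_per_gallon=3.40, local_price=3.40)
```

The ambiguous case shows why small fuel-ups are the hard part: when both numbers look like a price, only extra information such as the total cost or the distance driven could settle it.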

Once we have developed these methods we will retroactively scan our entire database for issues and mark them accordingly. We will then highlight the fuel-ups on the logs so our drivers can see that they have a potential error.

In the meantime, you can paste a link to the vehicle with the error here in the forums and we can set that vehicle to be ignored.

evoblade 09-04-2014 12:54 PM

Here is the specific one I am referring to:

https://www.fuelly.com/car/mazda/6/2014/threewest/243719

I really think there should be some kind of automatic filtering if someone puts in 11 thousand gallons of fuel to throw that one out, at least in regards to the group average.

Honestly it might not be a bad idea to just throw out a certain top percent and lowest percent as far as the averages are concerned.

OliverGT 09-04-2014 01:39 PM

You can't throw out the top ones, that would be me :(

I don't lie honest...

But seriously, you could remove the extremes to get the average.

One of the problems, though, is that there are so many categories of vehicles that most of the averages are based on only a handful of cars. This is always going to allow the extremes to influence the average more than they should.

Oliver.

Jay2TheRescue 09-04-2014 02:26 PM

After looking at that fuel log, and comparing the erroneous entry with others for the same vehicle, it was obvious that the user forgot to enter the decimal point when entering their gallons. I've corrected the error. If you see any more just use the report page function. I try to keep on top of those so they don't get backlogged like they have in the past.
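
For illustration, a dropped decimal point like this one could in principle be suggested automatically. The Python sketch below assumes the tank capacity is known; the function name and the 16.0 gallon capacity are hypothetical, and a person would still need to confirm any suggested fix.

```python
def suggest_decimal_fix(gallons, tank_capacity_gal):
    """Return a plausible corrected volume, or None if none is needed or found."""
    if gallons <= tank_capacity_gal:
        return None  # already plausible, nothing to suggest
    # Try moving the decimal point left one place at a time.
    for shift in (10, 100, 1000, 10000):
        candidate = gallons / shift
        if candidate <= tank_capacity_gal:
            return candidate
    return None


# The 11,051 gallon entry from this thread, against an assumed 16.0 gallon tank.
fixed = suggest_decimal_fix(11051, tank_capacity_gal=16.0)
untouched = suggest_decimal_fix(12.3, tank_capacity_gal=16.0)
```

The first shift that fits the tank is returned, so 11,051 becomes a suggestion of about 11.051 gallons rather than 1.1051.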

RobertV 09-05-2014 04:29 AM

Quote:

Originally Posted by Jay2TheRescue (Post 179025)
After looking at that fuel log, and comparing the erroneous entry with others for the same vehicle, it was obvious that the user forgot to enter the decimal point when entering their gallons. I've corrected the error. If you see any more just use the report page function. I try to keep on top of those so they don't get backlogged like they have in the past.

Thank you!

evoblade 09-08-2014 03:49 AM

Quote:

Originally Posted by OliverGT (Post 179024)
You can't throw out the top ones, that would be me :(

I don't lie honest...

But seriously, you could remove the extremes to get the average.

One of the problems, though, is that there are so many categories of vehicles that most of the averages are based on only a handful of cars. This is always going to allow the extremes to influence the average more than they should.

Oliver.

Oh, I'm not advocating throwing out anyone's data, but not counting a certain number of the highest and lowest ones in the group average may prevent one guy who puts that he fueled up with 11,000 gallons from ruining the average. Or maybe not so much the top XX number, but by %. If I suddenly get 80 mpg in my Jetta, it means I forgot to put in a fill-up or typed the mileage in wrong. That wrong data point should not be used to calculate anyone's average.

sea_king18 07-08-2016 02:06 PM

Sorry to bump an old thread, but there is a converse to this. I have the only manual transmission listed in my model year. The other cars are getting about 22 mpg, I get about 28-29. Every one of my fill-ups has been thrown out as an outlier.

1989 Saab 900.

RobertV 07-08-2016 02:23 PM

Quote:

Originally Posted by sea_king18 (Post 189654)
Sorry to bump an old thread, but there is a converse to this. I have the only manual transmission listed in my model year. The other cars are getting about 22 mpg, I get about 28-29. Every one of my fill-ups has been thrown out as an outlier.

1989 Saab 900.

What makes you say/think your data has been thrown out?

sea_king18 07-08-2016 03:04 PM

It hasn't been thrown out, that's a poor choice of words. On the chart that shows the distribution of mileage by fill-up, mine are all counted as outliers. I see the number of outliers incremented each time I add data.

The issue is that as the only standard in the group, I'm about 40% high and the algorithm flags the data.

14Corolla 07-08-2016 04:08 PM

1989? Gee... The computer knows that's when dinosaurs ruled the world.

sea_king18 07-08-2016 08:00 PM

Quote:

Originally Posted by 14Corolla (Post 189659)
1989? Gee... The computer knows that's when dinosaurs ruled the world.

That car is replacing my '65.... ;)

Jay2TheRescue 07-08-2016 09:05 PM

You only have 5 fillups so far. I think after you've been using Fuelly for a while, they will probably start showing up on the chart, and not be cast as an outlier.

sea_king18 08-02-2016 07:07 AM

Quote:

Originally Posted by Jay2TheRescue (Post 189664)
You only have 5 fillups so far. I think after you've been using Fuelly for a while, they will probably start showing up on the chart, and not be cast as an outlier.

I think this is true only if the application recalculates against the entire dataset for the vehicle each time a fillup is added. If it only calculates against the data points already flagged as good, then I'll just keep adding more and more outliers.

I assume that the application is using standard deviation to determine the outliers, but the problem for the '89 900 is that about 3/4 of the fillups come from a single vehicle and it's running 19.3 MPG (US). This vehicle should show a double peak (one for the standards at about 28 MPG, another for the autos at about 21), not a standard bell curve.

sea_king18 09-06-2016 07:49 AM

11 fillups now - all have been pitched from the dataset as outliers.

It looks to me as though any given fill is checked against the rest of the data at the moment of entry and, if flagged as an outlier, never checked again. This saves processing time but ignores any emerging patterns in the data.

The '89 Saab 900 has almost 10% of the data marked as outliers.

Is there a process to periodically reanalyze the data? If not, there should be.

Jay2TheRescue 09-06-2016 04:44 PM

There are links on the admin side to make Fuelly redo the math on your vehicle, but when I click on them, I get a no permission message. Maybe someone with higher admin permissions than me can try?

sea_king18 09-07-2016 09:18 AM

The other option is to manually set them all to 25 mpg so they register, then see whether I can increase them bit by bit back to 26, 27, 28, 29.... :)

Fortunately I keep my own record of data.

sea_king18 09-07-2016 12:51 PM

Quote:

Originally Posted by sea_king18 (Post 190893)
The other option is to manually set them all to 25 mpg so they register, then see whether I can increase them bit by bit back to 26, 27, 28, 29.... :)

Fortunately I keep my own record of data.

Just as well - that solution doesn't seem to work anyway. :)

RobertV 09-07-2016 09:37 PM

Your vehicle is being considered an outlier simply because of the lack of other users/vehicles that match yours mechanically. Even if just comparing the 2 vehicles that are "Hatchbacks", your 11 fuel-ups at 29 MPG vs the other user's 240 fuel-ups at 20 MPG make you an outlier.
The system isn't ruling your data as wrong or invalid. It's simply much higher than the averages of the other users with the same year/make/model.
If, for example, you had 100 more fuel-ups, you would statistically stop being an outlier.

Maybe if our system allowed a filter for transmission type, that'd help in this situation. Something we'll need to look at in a future update.

Quote:

Originally Posted by sea_king18 (Post 190202)
This vehicle should show a double peak (one for the standards at about 28 MPG, another for the autos at about 21), not a standard bell curve.

It sounds like you have an understanding of what's going on!
If you just add (or rather, when you have) more fuel-ups, the graphs will create that double peak, as seen here: https://www.fuelly.com/car/volkswagen/jetta/2015
It's also worth noting that the data generates fresh every hour, therefore no fuelup is an outlier until that hourly query says it is.

Quote:

Originally Posted by Jay2TheRescue (Post 190878)
There are links on the admin side to make Fuelly redo the math on your vehicle, but when I click on them, I get a no permission message. Maybe someone with higher admin permissions than me can try?

That won't affect what sea_king18 is asking about. Those are in place to refresh the profile page. Rarely needed/used though, as everything gets updated with Save/Edit.

sea_king18 09-09-2016 07:52 AM

So what determines an outlier? >2 standard deviations?

RobertV 09-09-2016 02:13 PM

https://en.wikipedia.org/wiki/Interquartile_range
Quote:

"The interquartile range is often used to find outliers in data. Outliers are observations that fall below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). In a boxplot, the highest and lowest occurring value within this limit are drawn as bar of the whiskers, and the outliers as individual points."

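As a rough illustration of the 1.5 x IQR rule quoted above, here is a small Python sketch using the standard library. The data is synthetic and only loosely mirrors the 240-vs-11 fuel-up situation described earlier in the thread; the quartile method Fuelly uses and how it groups fuel-ups are not stated, so those parts are assumptions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Synthetic stand-in for one driver's 240 fuel-ups spread over 18 to 22 MPG.
automatic = [18.0 + 0.5 * (i % 9) for i in range(240)]

# 11 fuel-ups at 29 MPG are all flagged...
few = iqr_outliers(automatic + [29.0] * 11)
# ...but with 100 more of them the upper quartile moves and none are flagged.
many = iqr_outliers(automatic + [29.0] * 111)
```

In this synthetic case the upper fence sits at 25.25 MPG with 11 high fuel-ups, so all of them are flagged. With 111 of them the third quartile itself becomes 29, the fence moves far above it, and nothing is flagged.
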
sea_king18 09-12-2016 10:24 AM

Quote:

Originally Posted by RobertV (Post 190908)

Maybe if our system allowed a filter for transmission type, that'd help in this situation. Something we'll need to look at in a future update.

The other option is that these cars are actually missing an engine type - they're all inline 4s, but only some are turbocharged.

Hersbird 10-02-2016 12:08 PM

There are lots of obvious errors that then get washed into a poor overall average.
Look at this one, for example: 2011 Town & Country Ltd. (Chrysler Town & Country) | Fuelly
It should be a great set of data, with supposedly 70,000 miles logged. The average seems great, but start to look at the individual entries and you see a more normal 20 MPG. Go way back and suddenly 250 MPG tanks start showing up. Look at the best tank: 320 MPG. Unless you are some kind of special prototype, or a scooter, any tank over 100 MPG should be thrown out.
I would bet any tank that is over 3 times the average of the model is some kind of error that should be flagged and not included in any overall averages.
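
A minimal Python sketch of that "3 times the typical value" rule, with made-up tank figures. One assumption worth calling out: the example uses the median as the baseline, because a plain average that already contains the bad tanks can be inflated enough to hide them.

```python
from statistics import mean, median


def flag_over_multiple(tank_mpgs, baseline, multiple=3.0):
    """Tanks whose MPG exceeds `multiple` times the given baseline."""
    return [mpg for mpg in tank_mpgs if mpg > multiple * baseline]


# Hypothetical tanks for one vehicle: mostly around 20 MPG, two impossible ones.
tanks = [19.8, 21.2, 250.0, 320.0, 20.4, 20.1]

flagged_vs_median = flag_over_multiple(tanks, median(tanks))  # baseline 20.8
flagged_vs_mean = flag_over_multiple(tanks, mean(tanks))      # baseline about 108.6
```

Against the median, both impossible tanks are flagged. Against the contaminated mean, the threshold rises above 320 MPG and nothing is caught.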

ackiefer 02-01-2017 06:21 AM

Newbie here, hopefully not posting in the wrong thread. In my research for a new vehicle I stumbled upon this profile (Chevy Pickup (Chevrolet Volt) | Fuelly), which is responsible for over 1/3 of the fill-ups for the listed model. However, it is obviously a truck and not a Volt (given that the Volt's fuel tank is less than 9 gallons and the entered fill-ups are for well over 20 gallons). It is another example of an "outlier" that is harshly affecting the average fuel economy for the group.
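
The tank-size argument above can be turned into a simple check. This Python sketch is illustrative only: the function names, the 10% margin, the 50% threshold, and the sample logs are assumptions, with 9.0 gallons used as the upper bound mentioned for the Volt.

```python
def share_over_capacity(fillup_gallons, tank_capacity_gal, margin=1.10):
    """Fraction of fill-ups that exceed the tank size by more than the margin."""
    if not fillup_gallons:
        return 0.0
    limit = tank_capacity_gal * margin
    over = sum(1 for g in fillup_gallons if g > limit)
    return over / len(fillup_gallons)


def probably_misfiled(fillup_gallons, tank_capacity_gal, threshold=0.5):
    """True when most fill-ups could not physically fit in this model's tank."""
    return share_over_capacity(fillup_gallons, tank_capacity_gal) > threshold


# Hypothetical logs: one filed under the Volt but filled like a truck, one real.
truck_log = [21.4, 22.0, 20.6, 23.1]
volt_log = [7.9, 8.3, 6.5, 8.8]

truck_misfiled = probably_misfiled(truck_log, tank_capacity_gal=9.0)
volt_misfiled = probably_misfiled(volt_log, tank_capacity_gal=9.0)
```

Looking at the share of oversized fill-ups, rather than a single one, separates a misfiled vehicle from a one-off typing mistake.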

Jay2TheRescue 02-01-2017 01:58 PM

Quote:

Originally Posted by ackiefer (Post 192743)
Newbie here, hopefully not posting in the wrong thread. In my research for a new vehicle I stumbled upon this profile (Chevy Pickup (Chevrolet Volt) | Fuelly), which is responsible for over 1/3 of the fill-ups for the listed model. However, it is obviously a truck and not a Volt (given that the Volt's fuel tank is less than 9 gallons and the entered fill-ups are for well over 20 gallons). It is another example of an "outlier" that is harshly affecting the average fuel economy for the group.

Thanks for bringing it to my attention. I changed the vehicle type to a Chevy truck, and then had the system ignore the two 100+ MPG fillups.

Unpaidbill 02-18-2017 08:37 AM

Somewhere, shortly after I started using Fuelly, I must have missed a fillup, since all of a sudden my mileage went over 100mpg. Is there any way to delete whatever happened to get the mileage more in line with real life? Page is at Fuelly - Track and Compare your MPG

Draigflag 02-18-2017 08:46 AM

Quote:

Originally Posted by Unpaidbill (Post 193145)
Somewhere, shortly after I started using Fuelly, I must have missed a fillup, since all of a sudden my mileage went over 100mpg. Is there any way to delete whatever happened to get the mileage more in line with real life? Page is at Fuelly - Track and Compare your MPG

Yes, you need to edit your 2nd fuel-up, as you input over 1000 miles using just a few litres of fuel, showing as 160 UK MPG here. Are you using the trip reading or the odometer to track?

Unpaidbill 02-18-2017 09:55 AM

Using odometer.

Unpaidbill 02-18-2017 10:34 AM

I deleted ALL the fill-ups and will be starting over from 'scratch'.

larryd 04-06-2017 07:29 PM

There are two obviously incorrect vehicles that currently are responsible for more than 1/4 of the 2017 Honda CR-V miles in the system, yet show fuelups for several years. It seems the users involved have reclassified some older car as a 2017 CR-V. They are dropping the reported mileage in a significant way:
Van (Honda CR-V) | Fuelly
CRV (Honda CR-V) | Fuelly

larryd 04-17-2017 06:21 PM

Does anyone from Fuelly actually read this forum?

Take a look at the 2017 CR-V fuel-up statistics:
2017 Honda CR-V MPG - Actual MPG from 73 2017 Honda CR-V owners

You will see 2 bell-shaped curves side by side; the one on the left from the 2 cars that aren't 2017 CR-Vs, and the one on the right from cars that are.

If manual intervention isn't allowed, at least you could filter out fuel-ups from prior to the model year (or even 2 years prior to the model year) as an algorithmic way of removing bad data.
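
The date filter suggested above is easy to sketch in Python. The function name and sample dates are hypothetical, and the default of 2 grace years simply follows the "2 years prior" option in this post.

```python
from datetime import date


def plausible_for_model_year(fuelup_date, model_year, grace_years=2):
    """False when a fuel-up predates the model year by more than grace_years."""
    return fuelup_date.year >= model_year - grace_years


# Hypothetical fuel-up dates logged against a 2017 model.
too_early = plausible_for_model_year(date(2013, 5, 1), model_year=2017)
early_buyer = plausible_for_model_year(date(2016, 12, 20), model_year=2017)
```

The grace period matters because new model years usually go on sale during the previous calendar year, so a late-2016 fuel-up on a 2017 car is perfectly legitimate.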

Etobian 04-18-2017 06:46 AM

I agree with Larryd. I was researching the 2017 CRV and noticed the obviously misplaced rogue vehicles. Can the moderators not clean up the CRV entries so the average MPG can be compared to other years?

trollbait 04-18-2017 10:40 AM

Report buttons on each vehicle's profile page would be helpful.

hammertrack 05-23-2017 08:07 AM

Quote:

Originally Posted by trollbait (Post 194289)
Report buttons on each vehicle's profile page would be helpful.



I agree, I found this "BMW" just now and was looking for a way to report it just so the data could be more trustworthy:
My Grand Cherokee (BMW 330Ci) | Fuelly

Jcp385 05-24-2017 04:49 PM

Goes both ways. I got honest high mileage in my Civic, but the algorithm tosses mine out as an outlier because most owners are getting significantly less than I am, and I work for that mileage. It is what it is. I know I'm honest, but there are folks who would fluff up their mileage or not pay attention to the numbers they're inputting.

trollbait 05-25-2017 05:14 AM

Then there are the people who get horrible mileage simply because of their regular driving cycle. A short commute in which the car can never fully warm up will drag down the efficiency of even the best hybrids.

Their low fuel efficiency isn't due to the car, so that data is also tossed in order not to give the impression that the car model is that bad.

When you drill down while researching a car model, you will eventually get an option to view the outliers.


All times are GMT -8.
