High-quality data is important. Here’s why:
What bad data looks like
Check out the image on the right. Zoom in and look. DD (DuPont) drops 83% on Jan 8, 1988, then climbs 580% exactly one year later. Coincidence? No. This is an error in Google’s historical data.
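You can catch glitches like this with a crude sanity check: flag any day-over-day move larger than some threshold and then look at those dates by hand. The sketch below is a minimal illustration, not a production cleaner. The function name `flag_suspicious_moves`, the 50% threshold, and the price series are all hypothetical; the series is synthetic and only shaped like the DD glitch.

```python
from datetime import date


def flag_suspicious_moves(closes, threshold=0.5):
    """Return (date, pct_change) pairs for day-over-day moves larger than `threshold`.

    `closes` is a list of (date, close) pairs sorted by date.
    `threshold` is a fraction (an assumed default): 0.5 flags moves of more than 50%.
    """
    flagged = []
    # Walk consecutive pairs of observations and measure the simple return.
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        pct = close / prev_close - 1.0
        if abs(pct) > threshold:
            flagged.append((day, round(pct, 4)))
    return flagged


# Synthetic series (hypothetical prices): an 83% drop that later reverses.
series = [
    (date(2020, 1, 2), 100.0),
    (date(2020, 1, 3), 100.0),
    (date(2020, 1, 6), 17.0),
    (date(2020, 1, 7), 17.0),
    (date(2020, 1, 8), 100.0),
]
print(flag_suspicious_moves(series))
```

A flag is only a prompt to investigate. Genuine crashes and unadjusted stock splits will trip the same check, so each flagged date still needs a second look.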
Yahoo! Finance has a different history entirely, with no big dip in 1988. That’s not to say Yahoo! is free of problems; you will find similar errors in its data as well.
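Because two sources can disagree like this, a cheap check is to line them up date by date and list the days where the closes differ by more than a small tolerance. This is a minimal sketch under assumed inputs: each source is a plain `dict` mapping date to close, and the name `compare_sources`, the 2% tolerance, and the sample prices are hypothetical.

```python
from datetime import date


def compare_sources(a, b, tolerance=0.02):
    """Return the dates on which two price sources disagree by more than `tolerance`.

    `a` and `b` map date -> close. Only dates present in both are compared, and
    the gap is measured relative to the larger of the two closes.
    """
    mismatches = []
    for day in sorted(a.keys() & b.keys()):
        x, y = a[day], b[day]
        scale = max(abs(x), abs(y))
        if scale == 0:
            continue  # both sources report zero; nothing to compare
        if abs(x - y) / scale > tolerance:
            mismatches.append(day)
    return mismatches


# Synthetic example: source A has a one-day glitch that source B does not.
source_a = {date(2020, 1, 2): 100.0, date(2020, 1, 3): 17.0, date(2020, 1, 6): 100.0}
source_b = {date(2020, 1, 2): 100.0, date(2020, 1, 3): 100.0, date(2020, 1, 6): 100.5}
print(compare_sources(source_a, source_b))
```

Vendors can differ in how they adjust for splits and dividends, so a mismatch may reflect adjustment conventions rather than an error. Compare adjusted prices with adjusted prices, or unadjusted with unadjusted.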
Why it is bad…
If you build a strategy or test a hypothesis on data like this, you’ll of course arrive at questionable conclusions. For instance, a machine learning algorithm will believe these jumps are real. And because the jumps are so large, it will “learn” how to predict them and then discover odd-ball strategies to trade on them. In the end these strategies are vapor.
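To get a rough feel for why a model fixates on these jumps, measure how much of a series’ total squared return (a crude proxy for its variance) comes from its few largest moves. This is a back-of-envelope illustration, not a machine learning model. The function `top_move_share` and both price series are hypothetical and synthetic.

```python
def top_move_share(closes, k=2):
    """Fraction of the sum of squared daily returns that comes from the k largest moves."""
    returns = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    squared = sorted((r * r for r in returns), reverse=True)
    total = sum(squared)
    return sum(squared[:k]) / total if total else 0.0


# Synthetic, hypothetical data: a quiet series oscillating between 100 and 101...
clean = [100.0, 101.0] * 125
# ...and the same series with a single bad print of 17.0 on one day.
bad = list(clean)
bad[100] = 17.0

print(f"clean: {top_move_share(clean):.3f}")
print(f"bad:   {top_move_share(bad):.3f}")
```

In the clean series the two largest moves account for under 1% of the squared returns. With one bad print, the drop into it and the rebound out of it account for more than 99%, so anything fitted to that series is mostly fitting the glitch.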
How to fix?
There’s no way around it: you have to pay for quality data. No one will take you seriously if you build strategies on data scraped from Yahoo!. The standard (expensive) providers are Thomson Reuters and Bloomberg. Of the two, I prefer Bloomberg: they just have a standard price, so you feel less like they’re trying to rip you off.
On the inexpensive end, for price/volume data I really like premiumdata.net. They have survivorship-bias-free data going back to 1985.