Good Data vs Bad Data 🗑️
Why garbage in = garbage out — and how bad data makes hilariously wrong AI
The Golden Rule of AI
💡"Garbage in, garbage out" 🗑️
If you feed an AI bad data, you'll get a bad AI. Simple as that!
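Here's a tiny sketch of that rule in Python. It is not how a real AI is built: the "model" just remembers the most common label for each animal sound, and both training sets are made up for illustration. It still shows the idea that the same learning code gives very different answers depending on the data it is fed.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn the most common label seen for each animal sound."""
    votes = defaultdict(Counter)
    for sound, label in examples:
        votes[sound][label] += 1
    # For each sound, keep the label that got the most votes.
    return {sound: counts.most_common(1)[0][0] for sound, counts in votes.items()}

# Made-up training sets (assumptions for illustration only).
clean_data = [("woof", "dog"), ("woof", "dog"), ("meow", "cat"), ("meow", "cat")]
garbage_data = [("woof", "cat"), ("woof", "cat"), ("woof", "dog"),
                ("meow", "dog"), ("meow", "dog"), ("meow", "cat")]

good_model = train(clean_data)
bad_model = train(garbage_data)

print(good_model["woof"])  # dog
print(bad_model["woof"])   # cat
```

The `train` function never changed. Only the data did, and that was enough to make the second model call a barking animal a cat.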
😂 What Happens with Bad Data?
Story 1: The Racist Chatbot
In 2016, Microsoft released a chatbot called Tay on Twitter. It learned from Twitter conversations.
Within 16 hours, users had taught it to say horrible things. Microsoft shut it down.
The problem? Poor-quality, unfiltered training data!
Story 2: The Confused Self-Driving Car
Imagine a self-driving car trained mostly on US roads. Tested in the UK (where people drive on the LEFT), it would get very confused, because almost nothing in its training data looks like what it now sees.
The problem? Not enough diverse data!
Story 3: The Hiring AI
Amazon built an AI to screen job applications. It was trained on about ten years of past applications, and most of those came from men, so the AI learned to prefer male candidates and downgrade women's applications. Amazon ended up scrapping the tool.
The problem? Biased data!
What Makes Data GOOD?
| Quality | What it means |
|---------|--------------|
| ✅ Accurate | Information is correct |
| ✅ Complete | No important bits missing |
| ✅ Diverse | Represents all different cases |
| ✅ Recent | Not outdated or old |
| ✅ Relevant | Actually useful for the AI's task |
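Some of the qualities in the table can be checked with a few lines of code. The sketch below is only an illustration: the `photos` records, the `quality_report` helper, and the cut-off year are all made up. It counts missing labels (Complete), old records (Recent), and how many examples each label has (Diverse). Accuracy and relevance usually need a human to look at the data.

```python
from collections import Counter

# Hypothetical records: each photo has a label and the year it was taken.
photos = [
    {"label": "dog", "year": 2023},
    {"label": "dog", "year": 2024},
    {"label": "cat", "year": 2009},
    {"label": None, "year": 2024},  # missing label, so this record is incomplete
]

def quality_report(records, oldest_ok_year):
    """Count simple quality problems in a list of records."""
    missing = sum(1 for r in records if r["label"] is None)
    outdated = sum(1 for r in records if r["year"] < oldest_ok_year)
    labels = Counter(r["label"] for r in records if r["label"] is not None)
    return {
        "missing_labels": missing,      # Complete?
        "outdated": outdated,           # Recent?
        "label_counts": dict(labels),   # Diverse?
    }

report = quality_report(photos, oldest_ok_year=2015)
print(report)
```

With this made-up data the report shows one missing label, one outdated photo, and twice as many dogs as cats.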
🎮 Real Example: Training a Dog Detector
Bad data 🗑️: Only photos of golden retrievers on sunny days
- Fails at: pugs, black dogs, dogs in snow
Good data ✅: Photos of 200+ breeds, all ages, all weathers, all angles
- Works for: ALL dogs in ANY situation!
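One way to spot the "only golden retrievers on sunny days" problem before training is to list what the dataset is missing. This is a minimal sketch with made-up data: the `coverage_gaps` helper and the wanted breeds and weather types are assumptions for illustration, not part of any real library.

```python
def coverage_gaps(photos, wanted_breeds, wanted_weather):
    """Return the breeds and weather types that never appear in the photos."""
    seen_breeds = {p["breed"] for p in photos}
    seen_weather = {p["weather"] for p in photos}
    # Set difference: everything we wanted minus everything we actually have.
    return sorted(wanted_breeds - seen_breeds), sorted(wanted_weather - seen_weather)

# The "bad data" from the example: only golden retrievers on sunny days.
bad_data = [{"breed": "golden retriever", "weather": "sunny"}] * 3

wanted_breeds = {"golden retriever", "pug", "labrador"}
wanted_weather = {"sunny", "snow", "rain"}

missing_breeds, missing_weather = coverage_gaps(bad_data, wanted_breeds, wanted_weather)
print(missing_breeds)   # ['labrador', 'pug']
print(missing_weather)  # ['rain', 'snow']
```

Anything in those two lists is a situation the dog detector has never seen, so it is likely to get it wrong.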
💡 How Much Data Do Big AIs Need?
- GPT-3 (the model family ChatGPT grew out of): Its biggest data source was roughly 570 GB of filtered web text (about 1.3 million books' worth!) 📚
- Google Photos: Trained on billions of labelled images
- Spotify: Analyses 100 million songs
Data collection is one of the most important (and expensive!) parts of building AI.
Quick check
What does 'Garbage in, garbage out' mean in AI?