UFO Sightings Project
The Relationship Between UFOs, Location, Time, and Human Emotion
Link to project repository on GitHub
This project is inspired by the project that sparked my interest in data analysis. The original project was done in R and it remains my favourite school project to this day. My professor even used it (and still uses it) as an example of a good project when teaching this class.
In this rendition of the project, I took the same UFO Sightings database and re-cleaned and re-analyzed it with my new skills. This project has two parts:
- Part 1: General Data Cleaning in SQL
- Part 2: EDA and Data Visualization
- Analysis thinking and problem-solving
- Researched conclusions (and fun speculative conclusions)
- Text analysis
- Location analysis
- Time analysis
- K-means clustering
- Density calculation using Gaussian Kernel Density Estimation (KDE)
- Sentiment and emotion analysis (tokenization, removal of stopwords, part-of-speech tagging, stemming, lemmatization)
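As a rough illustration of the Gaussian KDE step listed above, here is a minimal sketch in plain Python. It is not the project's code: the function name `gaussian_kde_2d`, the bandwidth value, and the sample coordinates are all made up for the example, and it treats longitude and latitude as flat-plane coordinates, which ignores the Earth's curvature. In practice a library routine such as `scipy.stats.gaussian_kde` would usually do this work.

```python
import math

def gaussian_kde_2d(points, query, bandwidth):
    """Estimate the density at `query` from 2-D `points` using an
    isotropic Gaussian kernel with the given bandwidth (hypothetical helper)."""
    if not points:
        raise ValueError("points must not be empty")
    qx, qy = query
    h2 = bandwidth ** 2
    total = 0.0
    for x, y in points:
        sq_dist = (x - qx) ** 2 + (y - qy) ** 2
        total += math.exp(-sq_dist / (2 * h2))
    # Normalize so the estimate integrates to 1 over the plane.
    return total / (len(points) * 2 * math.pi * h2)

# Made-up (longitude, latitude) pairs, for illustration only.
sample_points = [(-122.3, 47.6), (-122.4, 47.5), (-122.2, 47.7), (-73.9, 40.7)]

# Density near the cluster of three points versus far from every point.
dense = gaussian_kde_2d(sample_points, (-122.3, 47.6), bandwidth=0.5)
sparse = gaussian_kde_2d(sample_points, (-100.0, 45.0), bandwidth=0.5)
print(dense > sparse)
```

The bandwidth controls how far each sighting "spreads" its influence: a small value highlights tight hotspots, a large one smooths them into broad regions.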
Summary of Findings
The full summary can be found at the bottom of my project file.
Time
Sightings and Reporting Over the Years
- Over time there is an increase in UFO sighting rates, likely due to internet access, pop culture events, and conspiracy theories.
- Over the years reporting rates fluctuate. Spikes, such as around 2012, may be due to specific events like the end-of-the-world theory.
Findings:
- Large spike in sightings and reports after 1995.
- Significant activity in Canada (2003-2007).
- Spike around 2012.
Reporting Delays
- Most people report an encounter immediately.
- Reporting delays differ by country. Germany takes the longest, the UK the quickest. The US takes surprisingly long.
- Different UFO shapes affect reporting delay. Distinct shapes prompt quicker reports; general shapes take longer.
- Strong emotions like happiness, fear, and anger lead to quicker reports.
Findings:
- Fastest to slowest reporting time (Countries): UK, Canada, Australia, US, Germany.
- Fastest to slowest reporting time (Emotions): Angry, Happy, Fear, Sad, Surprise.
- Distinct shapes were reported fastest; general shapes were reported slowest.
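A reporting delay of this kind can be derived from the `datetime` and `date posted` columns. The sketch below is one possible way to compute and rank it, not the project's actual code: the rows are made up, the `%Y-%m-%d` format for `date posted` is an assumption, and using the median as the summary statistic is also an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Made-up rows for illustration; column names follow the dataset description.
rows = [
    {"country": "gb", "datetime": "2010-05-01 21:00:00", "date posted": "2010-05-03"},
    {"country": "gb", "datetime": "2011-01-10 22:30:00", "date posted": "2011-01-14"},
    {"country": "us", "datetime": "2009-07-04 23:15:00", "date posted": "2009-07-14"},
    {"country": "us", "datetime": "2012-12-01 19:00:00", "date posted": "2012-12-31"},
]

def reporting_delay_days(row):
    """Whole days between the sighting and the date it was posted."""
    seen = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    posted = datetime.strptime(row["date posted"], "%Y-%m-%d")  # assumed format
    return (posted.date() - seen.date()).days

def median_delay_by(rows, key):
    """Group rows by `key` (e.g. country or shape) and take the median delay."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(reporting_delay_days(row))
    return {group: median(delays) for group, delays in groups.items()}

delays = median_delay_by(rows, "country")
ranking = sorted(delays, key=delays.get)  # fastest to slowest
print(delays, ranking)
```

The same grouping helper would work for shape or emotion by passing a different key, provided those columns exist in the cleaned data.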
Location
Sightings on a World Map
- Sightings are concentrated in first-world countries, especially along coastlines.
- Areas where sightings are sparse show clusters of sightings.
Findings:
- Aliens prefer coastal areas.
- Clustering suggests aliens are real.
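The K-means clustering listed among the methods could be applied to sighting coordinates roughly as follows. This is a bare-bones sketch of Lloyd's algorithm under stated assumptions, not the project's implementation: the function `kmeans`, the choice to seed centroids with the first `k` points, and the sample points are all hypothetical, and Euclidean distance on raw coordinates is only an approximation for locations on a globe.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on 2-D points.
    Centroids start at the first k points (a simplification)."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Made-up points forming two obvious groups, for illustration only.
sample_points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
                 (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(sample_points, k=2)
print(labels, centroids)
```

Real analyses usually rely on a library implementation with better seeding and run it several times, since K-means results depend on the starting centroids.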
Sightings vs Time of Year (Northern and Southern Hemisphere)
- Sightings are most frequent in the northern hemisphere during June, July, August, and September.
- Sightings are most frequent in the southern hemisphere during January, March, June, and December.
Findings:
- June is the most popular month for sightings.
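One way to produce a hemisphere-by-month breakdown like this is to split rows on the sign of `latitude` and count sightings per calendar month. The sketch below assumes that approach and uses made-up rows; the helper name `monthly_counts_by_hemisphere` is hypothetical, and treating latitude 0 as northern is an arbitrary choice.

```python
from collections import Counter
from datetime import datetime

def monthly_counts_by_hemisphere(rows):
    """Count sightings per month, split by the sign of the latitude."""
    counts = {"northern": Counter(), "southern": Counter()}
    for row in rows:
        hemisphere = "northern" if row["latitude"] >= 0 else "southern"
        month = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S").month
        counts[hemisphere][month] += 1
    return counts

# Made-up rows for illustration; column names follow the dataset description.
rows = [
    {"datetime": "2005-06-15 22:00:00", "latitude": 47.6},
    {"datetime": "2008-06-20 21:30:00", "latitude": 51.5},
    {"datetime": "2010-07-04 23:00:00", "latitude": 40.7},
    {"datetime": "2007-01-12 20:00:00", "latitude": -33.9},
    {"datetime": "2011-01-26 21:00:00", "latitude": -37.8},
    {"datetime": "2009-12-31 23:30:00", "latitude": -27.5},
]

counts = monthly_counts_by_hemisphere(rows)
busiest_north = counts["northern"].most_common(1)[0][0]
busiest_south = counts["southern"].most_common(1)[0][0]
print(busiest_north, busiest_south)
```

Splitting by hemisphere matters because the seasons are reversed: a summer peak shows up in different calendar months on each side of the equator.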
Human Emotion (Includes Sentiment)
Location and Sentiment
- Negative and positive sentiments are more concentrated in popular areas; neutral sentiments are more spread out.
- Sentiments are distributed similarly across different countries.
Findings:
- Most to least popular sentiment: Neutral, Positive, Negative.
- UK and Canada have the highest positive sentiments.
- Australia has the highest negative sentiments.
Location and Emotion
- Emotions appear randomly spread out on a world map, with clusters of surprised people in the US and Europe.
- Happiness is the most popular emotion in every country; surprise is the least popular.
- Emotions are distributed similarly across different countries.
Findings:
- Emotions ranked: Happy, Scared, Angry, Sad, Surprised.
- Happy: Most in Germany, least in Australia & US.
- Fear: Most in the US, least in the UK.
- Angry: Most in Germany.
- Sad: Most in Canada.
- Surprise: Least in the UK.
Feelings about UFO Shapes
- Most shapes elicit neutral sentiments; diamond shapes are an exception.
- When it comes to shape and emotions, the results are consistent with sentiment findings; flashes elicit neutral reactions.
Findings:
- Shape "light" has the highest positive sentiment.
- Diamond shapes are likely to get positive reactions.
- Teardrop shapes elicit mixed reactions.
Comment Analysis
- Light, object, and sky are the most used words to describe UFO encounters.
- The top 5 words are similar across countries.
- The top 5 words reveal more information about UFOs. They provide insights into the color and motion of UFOs.
Findings:
- UFOs often move quickly.
- Common colors: orange, white, red.
- Sightings often occur at night.
- UFOs can be large.
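A top-words count like the one described here can be sketched with tokenization, stopword removal, and a frequency counter. The example below is a simplified illustration, not the project's pipeline: the comments are made up, the stopword list is a tiny hypothetical one, and it skips the stemming and lemmatization steps mentioned in the methods.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "in", "it", "of", "the", "then", "was"}

def top_words(comments, n=5):
    """Return the n most frequent non-stopword tokens across all comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase, then keep runs of letters as tokens.
        tokens = re.findall(r"[a-z]+", comment.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Made-up comments for illustration only.
comments = [
    "Bright orange light moving fast in the sky",
    "A large white object hovering in the night sky",
    "Red light in the sky, then the light was gone",
]

top = top_words(comments, n=2)
print(top)
```

Running the same function on comments filtered by country would give the per-country comparison.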
Recommended Next Steps
- Investigate more detailed temporal patterns, such as sightings during specific holidays, weekends, or significant cultural events.
- Implement machine learning models to predict the likelihood of sightings based on historical data and external factors.
- Use clustering algorithms to identify and categorize different types of UFO sightings.
Database Information
Database: https://www.kaggle.com/datasets/NUFORC/ufo-sightings
Each row represents an instance of a UFO observation. It has the following columns:
- datetime: date and time that the observation took place (year-month-day hour:min:sec)
- city, state, country: each column with respective information on the location of observation
- shape: the reported shape of UFO
- duration (seconds), duration (hours/mins): duration of observation (respective time units)
- comments: comment description of observation
- date posted: date the event was posted/reported
- latitude: latitude of the sighting location
- longitude: longitude of the sighting location