Guesses for Part 1

Guess for article 1:

To understand what this article is about, we found the top 3 articles that have the highest cosine similarity to the mystery article.

The three articles that had the highest cosine similarity were about:

  • A. R. Rahman, an Indian musical composer, and his music
  • Online shopping on Cyber Monday, which is the Monday just after Black Friday
  • Richard Garriott’s experience with private space travel

The cosine similarities were respectively:

  • 0.8070927869054605
  • 0.8047635531994215
  • 0.803327863916594
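The ranking step described above can be sketched as follows. This is only a minimal illustration in plain Python, not the code from our notebook: the vectors, the article names in `corpus`, and the helper names `cosine_similarity` and `top_k_similar` are all hypothetical, and real article vectors would have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

def top_k_similar(mystery, corpus, k=3):
    """Return the k (name, similarity) pairs with the highest similarity."""
    scored = [(name, cosine_similarity(mystery, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors standing in for real article embeddings.
corpus = {
    "music": [1.0, 0.0, 1.0],
    "shopping": [1.0, 1.0, 0.0],
    "space": [0.0, 1.0, 1.0],
    "sports": [0.0, 0.0, 1.0],
}
mystery = [1.0, 0.0, 0.5]

top3 = top_k_similar(mystery, corpus)
for name, score in top3:
    print(f"{name}: {score:.4f}")
```

Because cosine similarity depends only on the angle between vectors, a long article and a short article on the same topic can still score close to 1.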

Using K-means clustering, we divided the articles into 11 categories: Technology, Money, Law, Environment, Health, Employment, World News, Politics, Entertainment, Violence, and Other.

This mystery article was classified in the Entertainment category.

Our guess for article 1 is that it has something to do with entertainment. It may be about music, tourism, or shopping.


Guess for article 2:

To understand what this article is about, we found the top 3 articles that have the highest cosine similarity to the mystery article.

The three articles that had the highest cosine similarity were about:

  • The impact of the credit crisis on world governments and banks
  • Patrick Dempsey and others buying the struggling coffee company Tully’s Coffee
  • Matthew McConaughey marrying Camila Alves

The cosine similarities were respectively:

  • 0.9665118691237833
  • 0.7595008813120413
  • 0.7163007338621993

This mystery article was classified in the Entertainment category.

Note that the cosine similarity for the first article is much higher than those for the other two articles. The first article is also an outlier in the Entertainment category. This leads us to believe that the mystery article is also an outlier in the Entertainment category.
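To make the size of that gap concrete, the short sketch below compares the difference between the best and second-best similarity for each mystery article, using the figures reported in this write-up. The variable names are ours for illustration only.

```python
# Top-3 cosine similarities reported in this write-up, keyed by mystery article.
top3_similarities = {
    1: [0.8070927869054605, 0.8047635531994215, 0.803327863916594],
    2: [0.9665118691237833, 0.7595008813120413, 0.7163007338621993],
    3: [0.9655711099949233, 0.9608629658468245, 0.9439563028414478],
    4: [0.9595329059985429, 0.9475312792083215, 0.940227576284418],
    5: [0.9553339909982941, 0.9444377642330168, 0.9410271891129879],
}

# Gap between the best match and the runner-up for each article.
gaps = {article: sims[0] - sims[1] for article, sims in top3_similarities.items()}
widest = max(gaps, key=gaps.get)

for article, gap in sorted(gaps.items()):
    print(f"article {article}: gap = {gap:.4f}")
print(f"widest gap: article {widest}")
```

For article 2 the gap is roughly 0.21, while for every other article it is below 0.02, which is why we weight the first match so heavily here.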

Our guess for article 2 is that it has something to do with a financial crisis or recession. It may also involve many high-profile names.


Guess for article 3:

To understand what this article is about, we found the top 3 articles that have the highest cosine similarity to the mystery article.

The three articles that had the highest cosine similarity were about:

  • The spread of Ebola virus and how it caused the cancellation of a football game
  • The spread of swine flu in Saudi Arabia and the symptoms of swine flu
  • How a restaurant owner was trying to prevent the spread of the H1N1 virus in his eatery

The cosine similarities were respectively:

  • 0.9655711099949233
  • 0.9608629658468245
  • 0.9439563028414478

The mystery article was classified in the Health category.

Our guess for article 3 is that it has something to do with disease. Specifically, it might be about COVID or monkeypox.


Guess for article 4:

To understand what this article is about, we found the top 3 articles that have the highest cosine similarity to the mystery article.

The three articles that had the highest cosine similarity were about:

  • Greek police arresting the U.S. captain John Klusmire
  • Talks to end fighting between Colombian rebel groups and the government
  • United Nations investigating North Korea’s human rights abuses

The cosine similarities were respectively:

  • 0.9595329059985429
  • 0.9475312792083215
  • 0.940227576284418

This mystery article was classified in the World News category.

Our guess for article 4 is that it has something to do with a world conflict. It might be about conflict between multiple countries or may be about internal conflict.


Guess for article 5:

To understand what this article is about, we found the top 3 articles that have the highest cosine similarity to the mystery article.

The three articles that had the highest cosine similarity were about:

  • Damage done by the Duck Lake Fire, a wildfire
  • Attempts to prevent the endangerment of the “heath fritillary”, a butterfly
  • Power restoration after New Jersey experienced Hurricane Sandy

The cosine similarities were respectively:

  • 0.9553339909982941
  • 0.9444377642330168
  • 0.9410271891129879

This mystery article was classified in the Environment category.

Our guess for article 5 is that it has something to do with the environment. It may be about natural disasters.

Classification for Part 2

Using K-means clustering, we divided the articles into 11 categories.

By plotting the inertia against the number of clusters, we determined that roughly 12 clusters was optimal for K-means clustering.
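The idea behind the inertia plot can be sketched as follows. This is a simplified, plain-Python illustration rather than the code in our notebook: the toy 2-D points are made up, and the deterministic farthest-point initialization is an assumption chosen to keep the example reproducible (library implementations typically use random or k-means++ initialization).

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def kmeans_inertia(points, k, iterations=50):
    """Run Lloyd's algorithm and return the final inertia, i.e. the sum of
    squared distances from each point to its nearest centroid."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Hypothetical toy data: three tight groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0),
]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

On this toy data the inertia drops sharply up to k=3 and only slowly afterwards. That bend (the "elbow") is the kind of signal we looked for in our own plot, where it appeared at around 12 clusters.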

The categories of ten of the clusters were very clear on inspection. These were Technology, Money, Law, Environment, Health, Employment, World News, Politics, Entertainment, and Violence. The categories of the other two clusters were not clear, so both were labelled "Other". This resulted in 11 categories (rather than 12).
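The step from 12 clusters to 11 categories can be illustrated with a small lookup table. The cluster ids below, and which ids map to "Other", are hypothetical; only the category names come from our results.

```python
# Hypothetical cluster ids (0-11) mapped to the labels we assigned by inspection.
cluster_labels = {
    0: "Technology",
    1: "Money",
    2: "Law",
    3: "Environment",
    4: "Health",
    5: "Employment",
    6: "World News",
    7: "Politics",
    8: "Entertainment",
    9: "Violence",
    10: "Other",  # unclear cluster
    11: "Other",  # second unclear cluster, merged into the same label
}

def category_for(cluster_id):
    """Look up the human-readable category for a K-means cluster id."""
    return cluster_labels[cluster_id]

categories = sorted(set(cluster_labels.values()))
print(len(cluster_labels), "clusters ->", len(categories), "categories")
```

A mystery article is then reported under the category of whichever cluster it is assigned to.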

More information can be found in our Clustering.ipynb or Clustering.pdf files.

These are attached in our additional info section and are also in the GitHub repo.

Bonus

The article that had the highest cosine similarity was about a bombing in Syria that killed civilians and soldiers.

Our guess for the bonus is that it has something to do with world violence and politics.
