Machine Learning for a Better Search

I wanted to expand on the comment I made earlier on my micro-blog about building a better search function, because the more I think about it, the more I believe it addresses one of the Internet’s biggest problems right now.

We went from limited information before the digital age to endless information a few decades in, but now what we really need to focus on is putting the right information in front of people.

Or, as my micro-blog example put it: it should be easier to find the source of a topic than to find commentary about that topic.

And as if grading your sources weren’t difficult enough, I’m going to throw one more curveball into the mix: you can’t blacklist an article based on its publisher. Sure, 95% of what places like Fox News and Breitbart post is absolute garbage, but…

We want everyone to use and rely on this new search method, and people aren’t as likely to jump on board if their favorite sources, damned as they may be, are automatically excluded from the mix.
But more importantly, even if 95% of what someone writes is pure drivel, we want to encourage the remaining 5% to rise above the rest, because that’s how you change opinions.

Now, most of this is well beyond my level of expertise, but I know there are methods in use today to determine “the quality” of a body of text based on sentence structure, vocabulary, etc. The question is: how can we expand on that logic to categorize stories based on both their quality and what they bring to the table? Because hey, there’s a lot of opinion on the Internet and I certainly don’t want to discount that. I’m just saying that when somebody searches for a topic, they should be presented with facts first and editorial second.
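To make the idea a little more concrete, here is a toy sketch in Python. It assumes each search result already carries a hypothetical `is_editorial` label (in practice that label would have to come from a separate classifier, which is exactly the hard part), and it uses two crude stand-ins for “quality”: vocabulary variety (the share of distinct words) and how far the average sentence length drifts from roughly 20 words. The 20-word target and the 0.5 weight are arbitrary assumptions for illustration, not values from any real search engine.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    text: str
    # Hypothetical label; a real system would need a trained fact-vs-opinion classifier.
    is_editorial: bool


def words(text: str) -> list[str]:
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def sentences(text: str) -> list[str]:
    """Naive sentence split on runs of ., ! or ?; empty pieces are dropped."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def quality_score(text: str) -> float:
    """Toy score: rewards varied vocabulary, penalizes sentences far from ~20 words."""
    w = words(text)
    s = sentences(text)
    if not w or not s:
        return 0.0
    variety = len(set(w)) / len(w)      # share of distinct words, 0..1
    avg_len = len(w) / len(s)           # average words per sentence
    length_penalty = abs(avg_len - 20) / 20  # assumed ideal of 20 words
    return variety - 0.5 * length_penalty


def rank(results: list[Result]) -> list[Result]:
    """Facts first (False sorts before True), then higher quality first."""
    return sorted(results, key=lambda r: (r.is_editorial, -quality_score(r.text)))
```

Sorting on the tuple `(is_editorial, -score)` is what puts facts first and editorial second, while still letting a well-written opinion piece outrank a sloppy one within its own group. Nothing here blacklists a publisher; every result stays in the list and is simply ordered.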

It gets even trickier when you don’t have a fairly clean example like the one I used. Even with the White House Correspondents’ Dinner, there were multiple videos containing the full speeches from the dinner (some censored, some from different outlets), but what about when it’s not even that cut and dried?

A video of President Trump saying XYZ would be the most accurate source, but if instead you have news reports relaying what he said, some with more or less context or fact correction than others, then deciding which one did the best job of reporting XYZ, and therefore deserves the top spot in the search results, becomes very subjective.

I kind of have a love/hate relationship with Google these days. I know they’re trying to sort through literally billions of pages on the Internet, and they do say they look at things like user experience and reblogging to help rank their results, but at the same time I still see those hideous clickbait ads from Taboola and Outbrain on some of the biggest websites, seemingly without penalty.

How does a search engine remain independent while sorting for relevance and separating fact from fiction, all while people constantly work to game the system so their garbage floats to the top and makes the ad bucks?

Maybe it’s time to learn a thing or two about machine learning and get to work on this… 😉
