The lost credibility of .edu domains

NOTE: this was originally written in 2022, but never finished. This was updated and published in 2023, but some links are now out-of-date, etc.

It’s quite difficult to find real, credible information on the internet nowadays.

One thing that has made this easier (for me, at least) is searching specifically for content on .edu domains, since that TLD is restricted to accredited institutions. This is as simple as prefixing your Google search with inurl:edu.

I’ve been doing this for a while with no issues, so you can imagine my surprise at what came back when I went looking for information about some “fat burner” pills being sold on Amazon. I searched for inurl:edu nobi fat burner on the off chance that they were mentioned in some study or paper.

Screenshot of google search results showing .edu websites with advert titles, such as “5 Best Fat Burners for Women 2022”

Dang! If UMass Med, Penn State, and UC Santa Cruz are recommending these things, then they must be safe and effective.

However, just to do my due diligence, I should probably at least click on the first link and see what it says about-

Oh, that’s odd.

This doesn’t look like a university website?

Screenshot of page redirected to by .edu site. The page shows the “Fox News” logo and contains an article about a “No-Excercise Skinny Pill”. The page is hosted on afnethealthy.com

Maybe I clicked the wrong one, let me go back and copy and paste the URL.

Huh, a 404 page this time?

Screenshot of the 404 “Not Found” page on the real UMass Med website

Weird. Oh well.

If three university sites are recommending these entirely unregulated pills to me, then they must be good!

What’s happening here?

I decided to look a little bit deeper into the first search result.

First, I took the raw URL and curled it, but I got a 404. For some reason, it didn’t redirect to the spam site.

Screenshot of curl output for the URL. The site returned a 404 “Not Found”

My first instinct was that something was checking whether the client looked like the “expected” browser. For example, it could be checking whether the User-Agent, Referer, or Accept headers match what Chrome would send, or it could be using TLS fingerprinting.

Thankfully, it was one of my first guesses. Adding a Referer of https://www.google.com/ did the trick:

Screenshot of curl output for the URL with the Google Referer header. The site returned a 302 “Found” redirect to the spam site
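For anyone who would rather repeat this check from a script than from curl, here is a minimal Python sketch of the same comparison. It is not what I ran at the time; the function names `build_request` and `probe` and the `NoRedirect` class are made up for illustration, and the URL in the usage note is a placeholder. The only real trick is stopping the standard library from silently following the redirect, so the 302 itself stays visible.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx response is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect; urllib then raises HTTPError.
        return None


def build_request(url, referer=None):
    """Build a GET request, optionally claiming to come from `referer`."""
    headers = {}
    if referer is not None:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def probe(url, referer=None, timeout=10):
    """Return (status code, Location header or None) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, referer), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx (because redirects are declined), 4xx, and 5xx all end up here.
        return err.code, err.headers.get("Location")
```

Calling `probe("https://compromised.example.edu/some-page")` and then `probe("https://compromised.example.edu/some-page", referer="https://www.google.com/")` and comparing the two results mirrors the two curl runs above: in my case the first was a 404 and the second a 302 pointing at the spam site.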

Admittedly, at this point I was a tiny bit worried that this was something on my local machine, so I opted for the optimal way of checking: sending my brother an unprompted Discord message asking him to curl a suspicious URL with no context. Sadly, it turned out that he did, in fact, want context.

How unfortunate.

A Discord chat between my brother and me. Ando (me): “could you curl this for me?”, slimsag (brother): “call?”

Regardless, I eventually convinced him to run it and confirmed it wasn’t an issue on my end.

Now at this point, I knew that it was happening on their server(s), and under which conditions it happened (coming from Google Search results), but I had no way of figuring out why it was happening (e.g. something in their reverse proxy, some PHP code, etc.).

The only indicator I had was the very unusual haircki cookie that was being set when visiting the regular page. It could technically be legitimate, but it seemed extremely out-of-place for a uni website.

By sheer luck, a Sourcegraph search turned up a result for haircki, which led me to a GitHub repo containing WordPress malware samples:

sourcegraph.com search results for “haircki”, showing multiple GitHub repositories containing the string

https://github.com/stefanpejcic/wordpress-malware/blob/master/01.06.2021/index.php

After briefly looking over that sample from another site, I feel confident in saying that it produces the same behavior of search engine link hijacking that I’m seeing on the umassmed.edu site.
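To make that behavior concrete, here is a rough Python model of Referer-based cloaking. To be clear, this is not the malware’s code (the real sample is PHP and does more than this), and it only reproduces what I observed from the outside: a 404 for a plain request and a 302 for one that claims to come from Google. The host list, `SPAM_URL`, and both function names are assumptions invented for the sketch.

```python
from urllib.parse import urlparse

# Assumption: which referring hosts count as "search engines" is a guess.
SEARCH_ENGINE_HOSTS = ("google.", "bing.", "yahoo.")
# Placeholder destination, not the real spam site.
SPAM_URL = "https://spam.example/landing"


def came_from_search(referer):
    """Return True if the Referer header points at a known search engine host."""
    if not referer:
        return False
    host = urlparse(referer).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host.startswith(SEARCH_ENGINE_HOSTS)


def cloaked_response(request_headers):
    """Model the observed behavior: redirect search visitors, 404 everyone else."""
    if came_from_search(request_headers.get("Referer")):
        return 302, {"Location": SPAM_URL}
    return 404, {}
```

The point of serving a 404 to everyone else is that the site owner, visiting the URL directly, sees nothing wrong, while anyone arriving from a search result gets sent to the spam page.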

The answer here turned out to be less interesting than I thought: just another case of websites running vulnerable versions of WordPress and/or WP plugins.

How widespread is it?

Given what I’ve shown so far, you would be justified if your first comment was:

“You’ve only provided three example sites, I’d hardly call that representative of all .edu domains”

As such, here are the institutions I found with compromised sites or PDF hosting at the time of writing (2022-09-07):

Compromised sites:

Compromised PDF hosting:

Update (2023-07-29):

Those are just hackers though

“Okay, sure, the sites were hacked. That doesn’t make the universities less credible. That’s just dumb.”

You know what, fair. Maybe it’s unreasonable to say that all .edu sites are non-credible because a few of them had poor maintenance/security. This was not something that the universities would ever willingly advertise, and it was out of their control.

In fact, while looking into these compromised sites, I found a great paper accessible via Penn State University’s CiteSeerX instance:

Screenshot of the “Fighting Parasite Hosting: Identifying and Mitigating Unauthorized Ads on Your Webserver” paper by authors: Shuang Hao, Adam Arrowood, Xinyu Xing, Nick Feamster

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.711.1013&rep=rep1&type=pdf

In this paper, they describe the exact behavior of these compromised sites: the injected content is only served to requests with certain IPs, User-Agent headers, or Referer headers.
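If you wanted to check a page for this kind of conditional behavior yourself, one approach is to request it under several header combinations and see whether the answers disagree. The Python sketch below only builds the combinations and compares the results; it is my own illustration, not the paper’s method. The User-Agent strings are illustrative, `header_profiles` and `looks_cloaked` are hypothetical names, and varying the client IP is left out because it can’t be done from a single machine.

```python
from itertools import product

# Illustrative values only; real browsers send longer, version-specific strings.
USER_AGENTS = {
    "curl": "curl/8.0.1",
    "chrome": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
}
REFERERS = {
    "direct": None,  # no Referer header at all
    "google": "https://www.google.com/",
}


def header_profiles():
    """Yield (profile name, headers dict) for every User-Agent/Referer pair."""
    for (ua_name, ua), (ref_name, ref) in product(
        USER_AGENTS.items(), REFERERS.items()
    ):
        headers = {"User-Agent": ua}
        if ref is not None:
            headers["Referer"] = ref
        yield f"{ua_name}+{ref_name}", headers


def looks_cloaked(status_by_profile):
    """Flag a URL whose status code depends on which profile requested it."""
    return len(set(status_by_profile.values())) > 1
```

A result like `{"curl+direct": 404, "curl+google": 302}` would be flagged. Differing status codes can have innocent causes too (bot blocking, rate limiting), so a flag here is a reason to look closer, not proof of compromise.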

It’s great to see this paper accessible via Penn State University’s domain, as I found no instance of this malware on their site while searching.

It’s only slightly unfortunate that there was no need for any malware, as the fat burner pills are advertised in the Penn State University Collegian paper anyway:

https://www.collegian.psu.edu/studentadvice/5-best-fat-burners-for-women-2022-female-fat-burning-pills/article_6082d0f4-7931-11ec-9b80-a3411d9937dd.html

Penn State University Collegian paper article advertising the “5 best fat burners for women”

Oh well.