
How an IT director deals with AI crawlers

Automated traffic made up 51% of all web traffic last year, according to a recent report.

Mouse arrows huddling towards a protective screen overlayed on search bars. (Credit: Anna Kim)

Anna Kim


Websites across the internet are dealing with a big infestation problem, and not the kind that involves creepy-crawlies.

For the past six months, Edmund Balnaves, founder and CEO of Prosentient Systems, an Australia-based IT company that provides infrastructure and hosting services to more than 500 libraries, has spent his time warding off an influx of AI crawlers—bots that gather information from websites in order to train LLMs—from his clients’ websites. 

Balnaves said he has seen hundreds, sometimes even thousands, of crawlers visiting his clients’ websites from different IP addresses at the same time in a quest to harvest data. The crawlers come from a variety of sources, including Meta—which released its latest collection of AI models, Llama 4, earlier this month.

“They just built their latest, largest language model,” Balnaves said. “And we knew they were doing that because all of a sudden, there were hundreds of IP addresses.”

On top of the influx of site traffic, Balnaves added that automated visitors to his library clients’ sites tend to overstay their welcome because they often wander away from their intended destination, something he attributes to library systems having “good discovery and search engines.”

“They do a search. They get a list of links. They follow that link. That goes on to another search, and they get completely lost,” Balnaves said.

Balnaves is one of many professionals turning their attention to AI crawler bots. A recent Imperva report found that automated traffic, which includes both “good” and “bad” bots, accounted for 51% of all web traffic in 2024, surpassing human activity for the first time.

Traffic jam. Shayne Longpre, a PhD candidate at the Massachusetts Institute of Technology, told IT Brew that the presence of AI crawlers began growing gradually in 2020.

“It’s really picked up in the last year or two, especially as companies developing AIs have started to build their own crawlers and invest in their own crawling infrastructure, rather than using public repositories developed by nonprofits for a variety of purposes,” he said.

Top insights for IT pros

From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.

Longpre said that AI crawlers can cause a spike in traffic to a website, which can often result in additional expenses for its owner.

“If the crawlers are hitting you thousands of times daily, or tens of thousands of times, it can be expensive for your website to generate all the page content they need repeatedly, again and again,” he said.

Longpre added that the other challenge with crawlers is that the content they collect may be used in a way that removes people’s need to visit the original websites.

Ramping up defenses. In addressing the AI crawler invasion, Balnaves told IT Brew that his goal isn’t to entirely block information gathering conducted by AI bots. Instead, he gives the bots exactly what they need, nothing more.

“One of the things we’ve built is [a lightweight] metadata cushion, so that when they land on a page, you’re not giving them a fully rendered webpage,” Balnaves said, adding that this allows the bot’s impact on Prosentient’s services to remain small.
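To make the idea concrete, here is a minimal sketch of what a metadata cushion could look like. It is not Prosentient's implementation, which the article does not describe. The `Record` type, the `CRAWLER_MARKERS` list, and every function name below are hypothetical, and real crawlers do not always identify themselves in the user-agent string.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical user-agent substrings; a real deployment would maintain its own list.
CRAWLER_MARKERS = ("bot", "crawler", "spider")


@dataclass
class Record:
    """Hypothetical catalogue record served by a library site."""
    title: str
    author: str
    url: str


def is_probable_crawler(user_agent: str) -> bool:
    """Crude check: does the user agent contain a known crawler marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def render_metadata_stub(record: Record) -> str:
    """Cheap response: title, author, and canonical link, with no page body."""
    return (
        "<html><head>"
        f"<title>{escape(record.title)}</title>"
        f'<meta name="author" content="{escape(record.author)}">'
        f'<link rel="canonical" href="{escape(record.url)}">'
        "</head><body></body></html>"
    )


def render_full_page(record: Record) -> str:
    """Stand-in for the expensive, fully rendered page a human visitor gets."""
    related = "".join(f"<li>Related item {i}</li>" for i in range(50))
    return (
        f"<html><head><title>{escape(record.title)}</title></head>"
        f"<body><h1>{escape(record.title)}</h1>"
        f"<p>By {escape(record.author)}</p><ul>{related}</ul></body></html>"
    )


def respond(record: Record, user_agent: str) -> str:
    """Give probable crawlers the lightweight stub and everyone else the full page."""
    if is_probable_crawler(user_agent):
        return render_metadata_stub(record)
    return render_full_page(record)
```

In this sketch the crawler still receives the descriptive metadata for the page, while the server skips the costly work of assembling the full page.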

Ironically, Balnaves has enlisted AI to help monitor the uninvited visitors.

“There’s an element of AI from our side to detect when there’s bots that are not behaving graciously and to start filtering them, or if we’re happy with them but don’t like the way they’re behaving, to sort of cushion their load,” he said.
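The article does not say how Prosentient's detection works, so the sketch below swaps in a much simpler technique: a sliding-window request counter that sorts each client into "allow", "throttle", or "block". The class name, thresholds, and labels are all assumptions made for illustration, not anything Balnaves described.

```python
from collections import defaultdict, deque


class RequestMonitor:
    """Sliding-window request counter per client (a heuristic, not an AI model)."""

    def __init__(self, window_seconds: float = 60.0,
                 throttle_after: int = 30, block_after: int = 120):
        # All thresholds are illustrative assumptions.
        self.window_seconds = window_seconds
        self.throttle_after = throttle_after
        self.block_after = block_after
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def classify(self, client_id: str, now: float) -> str:
        """Record one request at time `now` and return how to treat the client."""
        hits = self._hits[client_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        count = len(hits)
        if count > self.block_after:
            return "block"      # filter the client out entirely
        if count > self.throttle_after:
            return "throttle"   # tolerated, but its load gets cushioned
        return "allow"
```

Because Balnaves describes crawlers arriving from hundreds of IP addresses at once, counting by single IP address would likely miss much of the traffic. The `client_id` here could instead be a network range or a user-agent string, though that choice is also an assumption.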

Still, Balnaves said the task of patrolling the pesky bots is an ongoing battle.

“There are lots of techniques like that we’ve had to build and we spent, sadly, a lot of time doing it because that’s a distraction from actually building the systems for information delivery that we aren’t even delivering,” Balnaves said. “Instead, we’re sort of trying to throw up the walls from the barbarians.”
