How to fix Chrome Crawler issues

If you have been experiencing issues when trying to crawl websites using the Chrome Crawler, please follow the steps below to diagnose and (hopefully) fix the problem.

Step 1: Confirm whether or not Chrome is working

You can self-diagnose the issue by checking how Chrome is running in Sitebulb in a different place - the Single Page Analysis tool.

This is located in the top navigation menu - head here and try a URL that you know works fine in the browser:

SPA Chrome Check

If it works, you'll see that Sitebulb has collected a bunch of data about the URL. If it does not work, you'll see a message like this:

Error message SPA

Try another URL from a different website, and check if you see the same thing. If you get another error, move on to the next step.

If the Single Page Analysis is indeed working, then Chrome itself is running and the problem lies elsewhere, so please contact support and we will figure out how to resolve the situation.
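
To rule out the website itself as the cause, it can help to confirm outside Sitebulb that your test URL actually responds. The snippet below is a minimal sketch using only the Python standard library; the function name `check_url`, the timeout and the User-Agent header are illustrative assumptions and have nothing to do with how Sitebulb fetches pages.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Return (ok, detail) describing whether the URL answered successfully."""
    try:
        # A browser-like User-Agent is an assumption; some sites reject the default one.
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False, f"request failed: {exc}"


if __name__ == "__main__":
    # Replace with a URL that you know works fine in the browser.
    ok, detail = check_url("https://example.com/")
    print("reachable" if ok else "not reachable", "-", detail)
```

If the URL is reachable here but the Single Page Analysis tool still shows an error, the website is unlikely to be the culprit, and the remaining steps are worth working through.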

Step 2: Restart your computer

It is amazing how many problems can be solved by a simple restart.

And yes, we know it's annoying to have to shut all your programs down and interrupt your work, but it's the most straightforward of these resolution steps, so please make sure you do it.

Once your computer has restarted, open up Sitebulb and head back to the Single Page Analysis tool and try one of those URLs again.

If it works...huzzah! You've fixed it. Now you can go back and run your audit again.

If it doesn't work, move on to step 3:

Step 3: Check your anti-virus isn't blocking Sitebulb

If you Google something like 'avg blocking chrome' you'll see how prevalent it is that anti-virus software decides to block a browser you use every single day - and this isn't even the headless version!

Anti-virus software can be both aggressive and inconsistent, particularly when it comes to something like headless Chromium, which certainly CAN be used in malware or adware (even though Sitebulb absolutely does not do anything dodgy).

In order to check this you will need to go into your anti-virus settings and find 'Blocked Apps' (or similar), like this in AVG:

Blocked Sitebulb app

Or you might find it in quarantine:

Sitebulb in quarantine sad face

If it is, then remove any blocks, and add the entire Sitebulb folder as an exception:

Sitebulb folder added as exception

This should then look something like this:

Sitebulb added as anti-virus exception

This should stop Sitebulb from being targeted by the anti-virus software in future. HOWEVER, in our experience, one of the most common things that anti-virus software does is actually delete the installed Chromium .exe file out of the Sitebulb folder.

So before you proceed, reinstall the latest version of Sitebulb. And do not worry, you will not lose any of your old audits or anything - this is just like applying an update.
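
If you are curious whether the Chromium executable has gone missing, a quick search of the install folder can tell you. This is only a sketch: the folder path is a placeholder you must replace, and matching on file names containing 'chrom' is our assumption for illustration, not a documented Sitebulb file name.

```python
from pathlib import Path


def find_chromium_executables(install_dir):
    """Return .exe files under install_dir whose file name contains 'chrom'."""
    root = Path(install_dir)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.rglob("*.exe")
        if "chrom" in path.name.lower()
    )


if __name__ == "__main__":
    # Hypothetical location: replace with your actual Sitebulb install folder.
    install_dir = r"C:\Path\To\Sitebulb"
    matches = find_chromium_executables(install_dir)
    if matches:
        for path in matches:
            print("Found:", path)
    else:
        print("No Chromium-like executable found; it may have been removed.")
```

An empty result is only a hint that the file may have been removed or quarantined. Reinstalling is still the reliable fix either way.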

Once you have reinstalled Sitebulb, open it up and head back to the Single Page Analysis tool and try one of those URLs again.

If it works...huzzah! You've fixed it. Now you can go back and run your audit again.

If it doesn't work, move on to step 4:

Step 4: Check your firewall isn't blocking Sitebulb

Similar to anti-virus, your firewall could be blocking Sitebulb from making outgoing connections (which it needs, in order to crawl websites).

Check that Sitebulb is in your 'allowed' list:

Sitebulb allowed in firewall

If not, add it as an allowed app. Also check that ports 10401 and 10402 are allowed, as Sitebulb needs these ports to communicate.

Once you have adjusted your firewall settings, open Sitebulb and head back to the Single Page Analysis tool and try one of those URLs again.
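
As a rough way to check those two ports, the sketch below tries to open a TCP connection to each one on your own machine while Sitebulb is running. It assumes the ports are used locally (on 127.0.0.1) and that a successful connection means nothing is blocking them; neither assumption is confirmed here, so treat the output as a hint rather than a verdict. The function names are made up for this example.

```python
import socket

# Ports named in this article; the local host address is an assumption.
SITEBULB_PORTS = (10401, 10402)


def probe_port(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def probe_ports(ports=SITEBULB_PORTS, host="127.0.0.1"):
    """Map each port to True (connection accepted) or False."""
    return {port: probe_port(port, host) for port in ports}


if __name__ == "__main__":
    for port, is_open in probe_ports().items():
        state = "reachable" if is_open else "not reachable"
        print(f"Port {port}: {state}")
```

A 'not reachable' result can simply mean Sitebulb is not running at that moment, so run the check with the application open before changing any firewall rules.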

If it works...huzzah! You've fixed it. Now you can go back and run your audit again.

If it doesn't work, move on to step 5:

Step 5: Contact Sitebulb support

If you've tried everything listed above, but Sitebulb STILL will not crawl properly, it is probably something we have never seen before. In which case we'll need to work with you to get to the bottom of the issue (which we will!).

Please email [email protected] and provide the following information:

We'll look into it and figure out what we need to do to make it work!

Background: why does this happen?

This next section is purely informational, but might help you understand a bit better what is going on.

Sitebulb's Chrome Crawler uses the latest stable version of Chromium, in headless mode, which allows it to closely mimic the way that Google renders web pages.

However, using Chrome in this way can occasionally raise red flags with anti-virus software or firewalls, which mistakenly class Sitebulb as some sort of trojan or adware. Or, more specifically, it is the headless Chromium that they think is nefarious, and they take steps to block it, like this:

IDP Generic

A totally generic 'threat' - they don't know what it is but they suggest blocking it anyway. Le sigh.

It's also not always as clear-cut and obvious as this, as often your anti-virus will take steps in the background to 'protect' you, so we must remain vigilant!

Sitebulb error messages and warnings

It is quite easy to spot when Chrome does not work properly. Quite simply - you won't be able to crawl with Chrome properly! You might see audits that look like this:

1 URL error

Not a great start to an audit!

Since v5.6 we have actually added checks and warning messages to various points in the auditing process:

On the Project setup page

Sitebulb will check that Chrome is running ok when you go to start a new Project, and if it isn't you will see this message:

Chrome not installed

On the Audit setup page

Chromium might be installed ok, but then get blocked at the point at which Sitebulb tries to do something with it - for instance during the pre-audit. If this is the case, you would see a fugly error message on the Audit setup page:

Chrome Failed Error message

On the Audit Overview

You might be able to get this far and still progress to actually running an audit, and then experience the failure, in which case you'll see a message like this:

audit-overview-error-message