o.je

Search, discover and watch videos

Overview

o.je is a video search engine with a mission to help you search, discover and watch the videos you're interested in without having to search every video site manually. o.je is under active development by New Frontier Web Solutions.

Webmasters / Site owners

As a webmaster or site owner, you may have noticed our web crawler, called "oje-videobot", visiting pages on your website. To help you get acquainted with this web crawler, we have listed the answers to frequently asked questions that you may have about it:

What is a web crawler?

A web crawler is a software program designed to automatically follow hyperlinks throughout the web, retrieving and indexing web pages so that websites can be searched. These crawlers harmlessly browse the web in much the same way that you do with a web browser.
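
As a rough illustration only (this is not o.je's production code, and the starting URL is just a placeholder), the following Python sketch shows the basic fetch-and-follow loop that crawlers are built around:

# Minimal crawler sketch: fetch a page, collect its hyperlinks, then
# visit a limited number of the pages they point to.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=3):
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        parser = LinkCollector()
        parser.feed(html)
        # Resolve relative links against the current page and queue them.
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen

# crawl("http://www.example.com/")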

Why does o.je use a web crawler?

o.je makes use of web crawlers to gather information about videos available on websites in order to build a relevant search index. o.je's web crawler has been designed to carefully follow industry standards (set by the W3C and companies like Google, Yahoo and Microsoft) so as not to negatively affect the websites that are crawled.

Does the o.je web crawler observe the Robot Exclusion Standard (robots.txt)?

Yes, the o.je web crawler obeys the Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol. Specifically, the o.je web crawler adheres to the 1996 Robots Exclusion Standard (RES). RES is a method that allows website administrators to indicate to robots which parts of their site should not be visited.

The o.je web crawler obeys the first entry in the robots.txt file with a "User-agent" line containing "oje-videobot". If there is no such record, it will obey the first entry with a "User-agent" set to "*".

If the crawler is not able to retrieve a robots.txt file for the website, it will assume there are no web crawler restrictions in place. It will periodically keep trying to retrieve the robots.txt file and will obey its contents if it becomes available.
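
For example, a site's robots.txt file might contain both a record specifically for "oje-videobot" and a catch-all record (the subdirectory names below are purely illustrative):

User-agent: oje-videobot
Disallow: /private/

User-agent: *
Disallow: /private/
Disallow: /downloads/

In this example the o.je web crawler obeys only the first record, so it stays out of "/private/" but may still crawl "/downloads/". Crawlers without a record of their own obey the "*" record and are excluded from both subdirectories.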

How do I prevent my site or certain subdirectories from being crawled?

The o.je web crawler will respect and obey directives in a robots.txt file that tell it not to crawl or index all or part of a given URL. If a robots.txt rule disallows a page from being crawled, the o.je web crawler will not read or use the contents of that page.

To prevent the o.je web crawler from crawling specific subdirectories, such as "/cgi-bin/" and "/private/", place the following into your robots.txt file:

User-agent: oje-videobot
Disallow: /cgi-bin/
Disallow: /private/

If you wish to prevent the o.je web crawler from crawling your entire website except for specific subdirectories, such as "/videos/" and "/films/", then place the following into your robots.txt file:

User-agent: oje-videobot
Disallow: /
Allow: /videos/
Allow: /films/

If you wish to prevent the o.je web crawler from crawling your entire website, then place the following into your robots.txt file:

User-agent: oje-videobot
Disallow: /

Note: We ask that you contact us before blocking the o.je web crawler from your website so that we can work with you to resolve any questions or problems you may have with our web crawler. In such scenarios, we are open to revealing our production code to demonstrate the quality of our crawler.

How do I allow my site or certain subdirectories to be crawled?

If you have blocked all web crawlers but wish to allow the o.je web crawler onto your website, then place the following into your robots.txt file:

User-agent: oje-videobot
Disallow:

Can I control the rate at which the o.je web crawler visits pages on my site?

Yes. The o.je web crawler supports the "Crawl-delay: x.x" robots.txt directive. Using this directive you can specify the minimum delay between successive crawler requests, where "x.x" is the delay value in seconds.

For example, a robots.txt rule to set a Crawl-delay of 5 seconds for o.je's web crawler looks like:

User-agent: oje-videobot
Crawl-delay: 5

A shorter delay value of one and a half seconds looks like:

User-agent: oje-videobot
Crawl-delay: 1.5

Where do I put the robots.txt file?

The robots.txt file must be placed in the root directory of your website. For example, the site administrator for "www.example.com" would place their robots.txt file at "http://www.example.com/robots.txt".

How can I tell if the o.je web crawler has visited my site?

You can determine whether the o.je web crawler has visited your site by checking your server logs. The o.je web crawler, called "oje-videobot", uses the following user-agent string:

Mozilla/5.0 (compatible; oje-videobot/1.0; +http://o.je/help/videobot)

o.je's crawler follows industry best practices by using a unique user-agent string as well as providing a "From:" HTTP header, so you can tell it apart from your website's visitors and from other crawlers.

If you are technically inclined, you may also find the following useful in detecting our web crawler (it is a sample HTTP client request used when retrieving "http://www.example.com/sample.html"):

GET /sample.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (compatible; oje-videobot/1.0; +http://o.je/help/videobot)
From: Email address for o.je crawler support
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Encoding: gzip,deflate
Accept-Language: en-us,en;q=0.7,es;q=0.3
TE: deflate,gzip;q=0.3
Keep-Alive: 300
Connection: Keep-Alive, TE
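
For example, if your server writes a plain-text access log that records the user-agent of each request (the file name "access.log" below is only a placeholder), a few lines of Python are enough to pick out requests made by our crawler:

# Print every access-log line whose user-agent mentions our crawler.
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "oje-videobot" in line:
            print(line.rstrip())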

How did the o.je crawler find my site?

The o.je web crawler finds pages by following links from other pages as well as following links manually provided to it. It focuses on links that point to pages that may contain videos or have information about videos.

How frequently will the o.je crawler download pages from my site?

The o.je web crawler will download only one page at a time from your site. After it receives a page, it will pause for a certain amount of time before downloading the next page. This delay may range from 0.1 seconds to several hours. The quicker your site responds to the crawler when it asks for pages, the shorter the delay. See the question "Can I control the rate at which the o.je web crawler visits pages on my site?" for additional information.

Why would I see repeated download requests from the o.je web crawler?

In general, the o.je web crawler should download only one copy of each URL from your website during a given crawl. Occasionally the crawler is stopped and restarted, and it re-crawls pages it has recently retrieved. These re-crawls should happen infrequently and should not be cause for alarm.

The o.je web crawler also checks for changes to the robots.txt file fairly often so that any changes to the robots exclusion rules are applied promptly to make sure that the website administrator's wishes are respected. This behaviour is industry-wide and is used by web crawlers run by Google, Yahoo and Microsoft.

Note: We ask that you contact us before blocking the o.je web crawler from your website so that we can work with you to resolve any questions or problems you may have with our web crawler. In such scenarios, we are open to revealing our production code to demonstrate the quality of our crawler.

Does the o.je web crawler support HTTP compression?

Yes. The o.je web crawler supports both the "gzip" and "deflate" standards-based compression algorithms. See sections 14.11 and 14.39 of RFC 2616 for additional information.
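
If you would like to check that your own server returns compressed responses to clients that advertise support for them (as our crawler does), a quick sketch along the following lines may help; the URL is only a placeholder:

# Request a page with "Accept-Encoding: gzip,deflate" and report whether
# the server actually compressed the response body.
import gzip
from urllib.request import Request, urlopen

req = Request(
    "http://www.example.com/sample.html",
    headers={"Accept-Encoding": "gzip,deflate"},
)
with urlopen(req) as resp:
    encoding = resp.headers.get("Content-Encoding", "")
    body = resp.read()
    if encoding == "gzip":
        body = gzip.decompress(body)
    print("Content-Encoding:", encoding or "(none)")
    print("Decoded body length:", len(body))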

Does the o.je web crawler support conditional GET requests?

Yes. The o.je web crawler supports conditional GET requests as defined by sections 14.25 and 14.26 of RFC 2616. As per the industry standard, o.je's crawler will generally not re-download a page unless it has changed since the last time it was crawled. The crawler uses the "If-Modified-Since" HTTP header, combined with the "If-None-Match" HTTP header if the last known ETag value for the page is available.
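
As an illustration of what a conditional GET looks like from the client side (the URL, date and ETag value below are placeholders), the server replies with "304 Not Modified" and no body when the page has not changed:

# Re-request a page only if it has changed since it was last retrieved,
# using values remembered from the previous response.
from urllib.error import HTTPError
from urllib.request import Request, urlopen

req = Request(
    "http://www.example.com/sample.html",
    headers={
        "If-Modified-Since": "Sat, 01 Jan 2011 00:00:00 GMT",
        "If-None-Match": '"abc123"',  # ETag from the earlier response
    },
)
try:
    with urlopen(req) as resp:
        print(resp.status, "- page changed, body downloaded again")
except HTTPError as err:
    if err.code == 304:
        print("304 - page unchanged, nothing downloaded")
    else:
        raise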

Does the o.je web crawler support cookies?

Yes, the o.je web crawler supports HTTP cookies.

Contacting us

If you experience any problems with the o.je web crawler ("oje-videobot") or wish to ask a question, please do not hesitate to contact us via one of the following methods:

Contact form:
Complete the form
Email:
Email address for o.je crawler support
Voicemail & fax:
+44 (0)709 2300 838
Postal address:
c/o New Frontier Web Solutions, 14 Clumber Drive, Weston Favell, Northampton, Northamptonshire, NN3 3NX, United Kingdom

Note: Please make sure to include your contact details with your enquiry if you would like us to respond to you.