A recent patent application from Yahoo explores ways that a search engine might consider how long different types of pages take to render, and other issues involving how quickly pages respond to visits, when ranking, classifying, and crawling those pages.
Latency is a big fancy word that simply means the amount of time between when something was started and when you can see its effects. It’s a word that shows up very frequently in the Yahoo patent filing. It’s a word worth learning a little more about, especially when it comes to web sites, how people use them, and how a search engine might track that use.
A search engine may look at a wide range of information to make decisions about whether or not it will visit and index pages on the Web, how it might rank those pages in search results, and how it may classify those pages.
It’s likely that a search engine will consider a wide range of informational signals. Those can include the content that appears on web pages, links and the text within links that point to and from pages, information about how people use specific web pages, and other information about pages and the sites that they appear upon.
A search engine might also look at how quickly pages load and render in a browser, how much people might tolerate when pages load slowly, and how good an experience web sites might deliver to their visitors.
When a search engine ranks pages in search results, it will explore signals that indicate how relevant those pages are to queries that might be used to find them, such as the use of words upon those pages that appear in those queries. A search engine may also look at signals that indicate the quality of the web pages that it might list within those search results.
A measure like PageRank is supposed to be an indication of quality rather than relevance, because it looks at the number and “importance” of links pointing to a page to try to determine how important a page might be. There are other quality signals that a search engine may use. Some examples might include things such as the amount of text upon a page, how readable that text is, if the page contains broken links, and possibly hundreds of other factors.
A search engine wants to return pages in search results that are both relevant and high quality.
Another set of signals or factors that a search engine may use involves how people interact with pages that they find on the web. These can include which pages people select when they see them in search results for a specific query, how much time people might spend on a page they’ve selected before they return to the search engine, how far down a page they might scroll, whether they bookmark or save a page, and others.
User Experience Characteristics
The patent filing considers much more than just how quickly pages load into a browser, and it may influence more than just the rankings of pages.
It tells us about an information integration system that can be used with search engines, job portals, shopping search sites, travel search sites, RSS applications, and other types of pages, and how it might look at those in at least three different ways:
Access – How long it takes to access a page or other kind of document when sending a request to retrieve it. Measuring access might mean looking at performance characteristics associated with a page, such as server performance and file performance. It might consider how quickly a page might load for visitors at different connection speeds, such as broadband and dialup. A search engine crawling program might simulate connections at different speeds to measure how quickly a page loads for visitors coming to a page through dialup or broadband connections.
Rendering – How quickly a page starts showing up within a browser (and the system might emulate a number of different types of browsers), how a page loads in a browser, and how long it might take for the full page, or at least the part of the page above the fold, to load in a browser. It contemplates that on some sites, some large pages might be set up so that even though they contain a lot of content, the content at the top of the page renders quickly so that a visitor doesn’t have to wait very long to start reading and viewing the content on the page.
It may also consider such things as “differences in complexity, size, number of files, user interface mechanisms, embedded sections (e.g., advertisements, audio content, video content, security features, etc), and/or the like,” to understand how a page renders, and how good of a user experience that might be.
User Experience – How do people actually use web sites, and how do they react to different access and rendering issues on different sites?
Different people might have different levels of patience in waiting for a site to load and render in a browser, and they might be willing to wait longer for some types of sites to load and render than others. For example, someone might be willing to wait longer for a page to show up that is associated with their bank account than for a “more generic” type of page.
Examples of other “user related performance characteristics” could include how visitors to pages react to things such as:
- Pages that fail to download or render within an acceptable period of time,
- Pages that automatically play video or audio content,
- Pages that include pop-up or pop-under advertisements,
- Pages that in some other way add further delays due to additional file downloading, additional processing, etc. These might include things such as JavaScript, Flash, embedded or externally linked objects, and plugins.
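To make the access and patience ideas above a little more concrete, here is a minimal sketch of how load time at different connection speeds might be estimated and compared against how long a visitor is willing to wait. Everything in it is an assumption made for illustration: the connection figures, the patience thresholds, and the function names are hypothetical and do not come from the patent filing, which doesn't spell out a formula.

```python
# Hypothetical connection profiles: (bandwidth in bits per second, round-trip delay in seconds).
# These numbers are illustrative assumptions, not values from the patent filing.
CONNECTIONS = {
    "dialup": (56_000, 0.20),
    "broadband": (5_000_000, 0.05),
}

# Hypothetical patience thresholds in seconds, by type of page (assumed for illustration).
TOLERANCE_SECONDS = {
    "banking": 10.0,
    "generic": 4.0,
}


def estimated_load_seconds(page_bytes: int, num_files: int, connection: str) -> float:
    """Rough estimate: one round trip per file, plus the time to transfer all bytes."""
    bandwidth_bps, round_trip = CONNECTIONS[connection]
    transfer_seconds = page_bytes * 8 / bandwidth_bps
    return num_files * round_trip + transfer_seconds


def within_tolerance(page_bytes: int, num_files: int, connection: str, page_type: str) -> bool:
    """True if the estimated load time fits the assumed patience for this type of page."""
    load_seconds = estimated_load_seconds(page_bytes, num_files, connection)
    return load_seconds <= TOLERANCE_SECONDS[page_type]
```

Under these assumed numbers, a 35,000 byte page made up of 5 files would take roughly 6 seconds over dialup, which fits the longer patience assumed for a banking page but not the shorter patience assumed for a generic one. A real system would more likely measure or simulate actual page loads than rely on a simple estimate like this.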
How Measuring Latency and User Experience Might be Used
The inventors behind the patent application point to at least three uses that a search engine may have for measuring the performance of a web site based upon access, rendering, and user experience. They are ranking, classification, and crawling.
Ranking – The information collected about user experience characteristics could be used to possibly filter, promote, or demote web documents to improve desired user experiences.
Classification – The user experience information might be used to classify pages in some way. The layout of a page might indicate that a site might contain certain types of content related to certain types of sites. The patent application tells us:
For example, finance-related websites often display streaming data of the stock market, news websites also often stream content, and certain types of web pages might use frames or tables which may be useful in classifying the web document.
Crawling – When a search engine has a list of URLs to visit that it hasn’t seen before, or that it might revisit to check for new content, it might consider a number of different things in determining which to look at first. The user experience information might help in making some decisions to look at certain content on pages that a search engine might not have considered before.
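Of the three uses above, the ranking one is probably the easiest to picture in code. The sketch below shows one way that filtering, promoting, and demoting results by load time might look. It is only an illustration: the thresholds, the multipliers, and the names Result, adjusted_score, and rerank are all hypothetical, and the patent application does not describe a specific scoring formula.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    url: str
    relevance: float     # base score from the usual relevance and quality signals
    load_seconds: float  # measured or simulated access plus rendering time


def adjusted_score(result: Result,
                   fast_under: float = 1.0,
                   slow_over: float = 3.0,
                   drop_over: float = 30.0) -> Optional[float]:
    """Return an adjusted score, or None if the page is filtered out.

    All thresholds and multipliers here are assumptions for illustration.
    """
    if result.load_seconds > drop_over:
        return None                    # filter: too slow to show at all
    if result.load_seconds > slow_over:
        return result.relevance * 0.8  # demote slow pages
    if result.load_seconds < fast_under:
        return result.relevance * 1.1  # promote fast pages
    return result.relevance            # leave everything else alone


def rerank(results: List[Result]) -> List[str]:
    """Order URLs by adjusted score, highest first, skipping filtered pages."""
    scored = [(adjusted_score(r), r) for r in results]
    kept = [(score, r) for score, r in scored if score is not None]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r.url for _, r in kept]
```

In this sketch a slightly less relevant page that loads quickly can move ahead of a more relevant page that loads slowly, which is the kind of trade-off the filing seems to have in mind. How heavily a real search engine would weigh speed against relevance is not something the filing tells us.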
A search engine may simulate the amount of time it takes to connect to a page, the way and amount of time a page renders in a browser, and how people react to those times to influence how a page is ranked, how it is classified, and how much of the page is crawled and indexed, including embedded material on a page such as JavaScript or Flash content.