Why Do We Need Structured Data?

by: Lauretta Shokler – Director of Search Strategy at Valet Interactive, a division of Worldwide Revenue Solutions

Since the early days of the web, the information published on websites has largely been unstructured, meaning it has no formal organization or classification into predefined models. In addition, past limitations in the technologies that provide the infrastructure for, and access to, the web (software and hardware) forced search engines, and for that matter searchers themselves, to oversimplify language in order to index and efficiently search content. This led searchers, search engines and content publishers to focus heavily on literal character matches of words and short phrases (keywords) rather than on understanding and communicating the meaning and context behind the words. This situation led to two key problems:

  1. Search results were often random or irrelevant.
  2. Content publishers could easily manipulate or game the system to drive more traffic to their sites, even when a site was not a good-quality source for the searcher’s query.

Advancements in hardware and software that allow faster processing of data, new technologies that allow more natural inputs for queries and the need to provide higher-quality search results have recently opened the door to a semantic search model. A semantic search model involves structuring and cataloguing the data on the web so that both the searcher’s intent and context can be factored into the results shown. Much of this work is being done by the search engines themselves as they build better algorithms and more complex and relational indexes.

However, the search engines also know that this evolution can happen faster and be more accurate if everyone – content publishers, searchers and the search engines themselves – work together toward the common goal.

Definition of semantic search

One key element of that process involves taking the unstructured data that currently exists on the web as website content and giving it more context. This is the challenge web publishers now face, and the solution is looking more and more like a vocabulary called Schema.org.

What is Structured Data?

unstructured to structured data

While the concept of structured data can be a difficult and technical topic, understanding the uses and benefits of this web vocabulary is incredibly important for anyone or any organization that has a website. The first step in developing that understanding is to define structured data and Schema.org and how they are used.

Structured data is essentially a way to describe entities (people, places and things) and how they relate to each other. For websites, structured data can serve as a bridge between the information visitors see and the data robots scan. Several structured data formats have been used over the years, including the Resource Description Framework (RDF), Microformats and Microdata:

  • RDF was designed to be the standard model for machine-readable data, allowing applications to share data across the web.
  • Microformats allow publishers to identify topics and related information in a way that is easily read by software such as search engine bots.
  • Microdata is an HTML5 specification that, like Microformats, uses a predetermined vocabulary to identify the topic of a web page or site and details about that topic, but it does so using a hierarchical, extensible structure.
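
To make "relating entities to each other" concrete, here is a minimal Python sketch of the data model behind RDF, in which every fact is a subject-predicate-object statement (a triple). It is an in-memory illustration under stated assumptions, not an RDF library: the restaurant, the `ex:` identifiers and the helper functions are hypothetical, while `rdf:type`, `schema:address`, `schema:addressLocality` and `schema:founder` are real RDF and Schema.org terms.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers ("ex:" prefix) used for illustration only.
graph: List[Triple] = [
    Triple("ex:JoesDiner", "rdf:type", "schema:Restaurant"),
    Triple("ex:JoesDiner", "schema:address", "ex:JoesAddress"),
    Triple("ex:JoesAddress", "rdf:type", "schema:PostalAddress"),
    Triple("ex:JoesAddress", "schema:addressLocality", "Chicago"),
    Triple("ex:JoesDiner", "schema:founder", "ex:JoeSmith"),
    Triple("ex:JoeSmith", "rdf:type", "schema:Person"),
]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


def follow(graph: List[Triple], start: str, *predicates: str) -> List[str]:
    """Walk a chain of predicates, e.g. restaurant -> address -> city."""
    nodes = [start]
    for predicate in predicates:
        nodes = [o for n in nodes for o in objects(graph, n, predicate)]
    return nodes


print(follow(graph, "ex:JoesDiner", "schema:address", "schema:addressLocality"))
# ['Chicago']
```

Because each relationship is stored explicitly, a program can answer "which city is this restaurant in?" by following links rather than by matching keywords in the page text.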

According to a US crawl of Bing search results in January 2012, 31% of webpages and 5% of domains contained some type of structured data. RDFa was the most common format, at 25%, with 7% using Microdata and 9% using Microformats.

What is Schema.org?

Schema.org is a vocabulary that is typically expressed using Microdata. Announced in June 2011 as a collaborative, jointly supported initiative between Google, Bing and Yahoo! (later joined by Yandex), Schema.org builds and supports a universal set of structured data markup rules that website publishers can use to identify entities in their content and the relationships between them. It is also part of the larger concept of the semantic web, which aims to extend the standard data formats of the World Wide Web to other applications and technologies.

The goal of Schema.org is to be a single resource for website publishers who want to help search engines and other applications better understand their information, products, services, etc. Google’s Webmaster Tools blog has more about Schema.org in its post titled “Introducing schema.org: search engines come together for a richer web.”

Example of Schema.org hierarchy for “Things”

How is Schema.org Used?

The Schema.org site provides a collection of categories that can be used to mark up pages in a way that major search providers recognize. By inserting schema markup into HTML tags such as div, span and header, a web publisher can provide search engines with specific information about their page and site, such as:

  • What type of business they are and where they are located
  • Details about a blog article like the title, author, subject, etc.
  • What upcoming events they are promoting, along with when and where they will take place
  • Product information such as price range, description, brand, color and more
  • Recipe information such as ingredients, ratings, cooking time, nutrition, etc.

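Schemas are organized by ‘types’, each with an associated set of properties. The types are arranged in a hierarchy with the most general type, Thing, at the top; more specific types, such as LocalBusiness or Restaurant, inherit the properties of the types above them.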
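
The inheritance idea can be sketched in a few lines of Python. This is a minimal illustration under an explicit assumption: only a small, hand-copied slice of the real Schema.org hierarchy and a handful of its properties are included, and the helper functions `ancestors` and `properties` are hypothetical names, not part of any Schema.org tool.

```python
# A small, hand-copied slice of the Schema.org hierarchy (assumption: only a
# few types and properties are listed; the real vocabulary is much larger).
PARENTS = {
    "Thing": [],
    "Place": ["Thing"],
    "Organization": ["Thing"],
    "LocalBusiness": ["Organization", "Place"],
    "FoodEstablishment": ["LocalBusiness"],
    "Restaurant": ["FoodEstablishment"],
}

OWN_PROPERTIES = {
    "Thing": {"name", "url", "description"},
    "Place": {"address", "geo"},
    "Organization": {"address", "telephone"},
    "LocalBusiness": {"openingHours", "priceRange"},
    "FoodEstablishment": {"servesCuisine", "acceptsReservations"},
    "Restaurant": set(),
}


def ancestors(type_name):
    """Return every supertype of `type_name`, nearest first, without duplicates."""
    found = []
    queue = list(PARENTS[type_name])
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(PARENTS[parent])
    return found


def properties(type_name):
    """A type accepts its own properties plus those of all its supertypes."""
    result = set(OWN_PROPERTIES[type_name])
    for parent in ancestors(type_name):
        result |= OWN_PROPERTIES[parent]
    return result


print(ancestors("Restaurant"))
# ['FoodEstablishment', 'LocalBusiness', 'Organization', 'Place', 'Thing']
```

Note that LocalBusiness has two parents (Organization and Place), so the hierarchy is not a simple tree; the breadth-first walk above skips types it has already visited, which is why Thing appears only once.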

To implement Schema.org or other Microdata, specific attributes are added to the HTML tags around key data. For example, if a restaurant’s address is displayed in the footer of its website, that segment of text would be wrapped in tags marked with the streetAddress, postalCode, addressRegion and other properties of the PostalAddress schema. For specifics on how to implement schema markup in HTML, see:

Some digital marketing agencies can also help site publishers incorporate schema mark-up into their site.
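
To show what the restaurant-address example looks like in practice, and how a crawler might read it back, here is a minimal sketch built on Python's standard `html.parser` module. The restaurant and its address are hypothetical, and the `MicrodataParser` class is a simplified illustration covering only part of the Microdata specification (itemscope, itemtype, itemprop and text values); it is not how any particular search engine parses pages.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Minimal Microdata reader for itemscope/itemtype/itemprop with text values.

    Simplifying assumptions: itemref, itemid and URL-valued properties
    (href/src) are ignored, and the input is assumed to be well-formed.
    """

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items found on the page
        self.stack = []  # open elements, innermost last

    def _current_item(self):
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def _add(self, item, prop, value):
        item["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        if tag in VOID_TAGS:
            # Void elements never get an end tag; <meta> carries its value in content.
            item = self._current_item()
            if prop and item is not None and "content" in a:
                self._add(item, prop, a["content"])
            return
        item = {"type": a.get("itemtype"), "properties": {}} if "itemscope" in a else None
        self.stack.append({"prop": prop, "item": item, "text": []})

    def handle_data(self, data):
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        frame = self.stack.pop()
        parent = self._current_item()
        # A nested itemscope becomes the property value; otherwise use its text.
        if frame["item"] is not None:
            value = frame["item"]
        else:
            value = " ".join("".join(frame["text"]).split())
        if frame["prop"] and parent is not None:
            self._add(parent, frame["prop"], value)
        elif frame["item"] is not None:
            self.items.append(frame["item"])


page = """
<div itemscope itemtype="https://schema.org/Restaurant">
  <span itemprop="name">Joe's Diner</span>
  <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
    <span itemprop="streetAddress">123 Main St</span>,
    <span itemprop="addressLocality">Chicago</span>,
    <span itemprop="addressRegion">IL</span>
    <span itemprop="postalCode">60601</span>
  </div>
</div>
"""

parser = MicrodataParser()
parser.feed(page)
parser.close()
address = parser.items[0]["properties"]["address"][0]
print(address["properties"]["postalCode"])  # ['60601']
```

Visitors still see an ordinary address line, while the parser recovers a PostalAddress item nested inside a Restaurant item. This is the "bridge" between what people see and what robots scan.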

What are the Benefits of Using Schema Markup?

Right now, one of the key benefits of implementing schema markup is search result rich snippets. Rich snippets are enhanced search snippets that include extra graphical elements or text providing more detail about a web page, PDF or other web asset. Today rich snippets are a good way for a site to stand out in search results, but as adoption of structured markup grows, these enhancements will likely become more and more essential. Rich snippets can increase click-through rates on search results by as much as 30%. While not all schema markup is expressed as rich snippets, below are a few examples of schema types that are, and how they look in search results:

Author/Publisher

This schema identifies the person (or, for publisher markup, the organization) who authored the content and produces a rich snippet on Google that contains a photo of the author and information from their Google+ profile.

Author Rich Snippet

Star Ratings

The star rating schema displays a star graphic in a search snippet to indicate the overall rating of a business, movie, recipe, etc.

Star Rating Rich Snippet
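
As a rough sketch of the markup behind a star rating, the Python function below averages a list of review scores and renders a Schema.org Recipe with an AggregateRating in Microdata. The recipe name, the scores and the function name `recipe_rating_microdata` are hypothetical; `aggregateRating`, `ratingValue`, `bestRating` and `reviewCount` are real Schema.org properties. Whether a search engine actually displays stars is up to the search engine.

```python
from html import escape


def recipe_rating_microdata(name, scores, best=5):
    """Render a Recipe with an AggregateRating as Microdata HTML.

    `scores` is a hypothetical list of individual review scores (1..best).
    """
    if not scores:
        raise ValueError("need at least one review score")
    average = round(sum(scores) / len(scores), 1)
    return "\n".join([
        '<div itemscope itemtype="https://schema.org/Recipe">',
        f'  <span itemprop="name">{escape(name)}</span>',
        '  <div itemprop="aggregateRating" itemscope itemtype="https://schema.org/AggregateRating">',
        f'    Rated <span itemprop="ratingValue">{average}</span> out of',
        f'    <span itemprop="bestRating">{best}</span>, based on',
        f'    <span itemprop="reviewCount">{len(scores)}</span> reviews',
        '  </div>',
        '</div>',
    ])


html_snippet = recipe_rating_microdata("Deep-Dish Pizza", [5, 4, 4, 5, 3])
print(html_snippet)
```

The rating values sit inside visible text ("Rated 4.2 out of 5"), so the markup describes what visitors already see rather than adding hidden claims.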

Business Type and Location Schema

Local business and postal address schema tags clearly identify a business and its location for features like the Knowledge Graph and cards in Google Now and Google Glass.

Example of Google Knowledge Graph

Navigational Breadcrumbs

Breadcrumb markup produces extra links in the search snippet to the higher levels of the site’s navigation that lead to the page shown in the search results.

Example of Breadcrumb Rich Snippet
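
Here is a minimal sketch of breadcrumb markup, generated in Python using Schema.org's BreadcrumbList and ListItem types, where each crumb carries a name, a link and a position. The site URLs and the function name `breadcrumb_microdata` are hypothetical. Google's own breadcrumb guidance has used different vocabularies over time, so treat the exact vocabulary as an assumption and check the current documentation before relying on it.

```python
from html import escape


def breadcrumb_microdata(trail):
    """Render a schema.org BreadcrumbList as Microdata.

    `trail` is a list of (name, url) pairs from the top level down to the
    current page; position numbering starts at 1.
    """
    lines = ['<ol itemscope itemtype="https://schema.org/BreadcrumbList">']
    for position, (name, url) in enumerate(trail, start=1):
        lines.append(
            '  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">'
            f'<a itemprop="item" href="{escape(url)}"><span itemprop="name">{escape(name)}</span></a>'
            f'<meta itemprop="position" content="{position}"></li>'
        )
    lines.append("</ol>")
    return "\n".join(lines)


crumbs = breadcrumb_microdata([
    ("Hotels", "https://www.example.com/hotels"),
    ("Illinois", "https://www.example.com/hotels/illinois"),
    ("Chicago", "https://www.example.com/hotels/illinois/chicago"),
])
print(crumbs)
```

Generating the trail from the same data that builds the site's visible navigation keeps the markup and the on-page breadcrumbs from drifting apart.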

Events

Event markup takes details about events on a web page, such as their location and time, and displays them in a table format in the snippet.

Example of Event Rich Snippet
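
The sketch below shows how event details might be marked up, rendered from Python as a Schema.org Event with a nested Place and PostalAddress. The event, venue and function name `event_microdata` are hypothetical. The design choice worth noting is that `startDate` is written in machine-readable ISO 8601 form in the `<time datetime>` attribute, while the visible text stays human-friendly.

```python
from datetime import datetime
from html import escape


def event_microdata(name, start, venue, locality):
    """Render a schema.org Event with a Place location as Microdata.

    startDate goes in the <time datetime> attribute as ISO 8601, while the
    visible text stays human-friendly.
    """
    return "\n".join([
        '<div itemscope itemtype="https://schema.org/Event">',
        f'  <span itemprop="name">{escape(name)}</span>',
        f'  <time itemprop="startDate" datetime="{start.isoformat()}">'
        f'{start.strftime("%B %d, %Y")}</time>',
        '  <div itemprop="location" itemscope itemtype="https://schema.org/Place">',
        f'    <span itemprop="name">{escape(venue)}</span>',
        '    <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">',
        f'      <span itemprop="addressLocality">{escape(locality)}</span>',
        '    </div>',
        '  </div>',
        '</div>',
    ])


snippet = event_microdata("Spring Wine Tasting", datetime(2014, 3, 15, 19, 30),
                          "Example Hotel Ballroom", "Chicago")
print(snippet)
```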

Video

Video markup shows data such as the length of the video, a preview screenshot, who uploaded it, a star rating, etc.

Example of Video Rich Snippet

Google Carousel

This is an enhanced display feature that shows a rolling list of hotels, restaurants, musicians and other entities catalogued in Google’s Knowledge Graph when certain searches are entered. For local businesses, this is primarily done through Google’s Business Places/Google+ Business pages.

Example of Google Carousel

Aside from rich snippets, verification is another benefit of using schema markup. When a site provides structured data that matches its content and topic, as well as any site data feeds or APIs, search engines can verify what the site is about by comparing that data with what their crawlers and algorithms conclude it is about. While this may or may not be a ranking factor in search results right now, the idea that it could increase trust factors for a site and affect rank is not far-fetched.
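
How search engines actually perform this comparison is not public, but a publisher can run a similar sanity check before publishing. The minimal Python sketch below, which assumes the markup values and the feed values have already been extracted into dictionaries, reports every field where the two sources disagree. The field names mirror Schema.org properties; the business data and function names are hypothetical.

```python
def normalize(value):
    """Comparison key: collapse whitespace and ignore case; None stays None."""
    return None if value is None else " ".join(str(value).split()).lower()


def find_mismatches(markup, feed):
    """Return {field: (markup_value, feed_value)} for every field the sources disagree on.

    A field present in only one source counts as a mismatch.
    """
    fields = sorted(set(markup) | set(feed))
    return {
        field: (markup.get(field), feed.get(field))
        for field in fields
        if normalize(markup.get(field)) != normalize(feed.get(field))
    }


# Hypothetical values taken from a page's markup and from the same business's data feed.
markup = {"name": "Joe's Diner", "telephone": "+1-312-555-0100", "postalCode": "60601"}
feed = {"name": "joe's  diner", "telephone": "+1-312-555-0199", "postalCode": "60601"}
print(find_mismatches(markup, feed))
# {'telephone': ('+1-312-555-0100', '+1-312-555-0199')}
```

Normalizing case and whitespace avoids flagging cosmetic differences, while a genuinely different phone number is exactly the kind of inconsistency that could undermine trust in a site's data.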

Website publishers need to think beyond rich snippets and rank, however, and consider the larger potential of structuring their site’s data. Search engines are moving toward becoming answer engines, and to accomplish that, they are pushing for a more semantic web. Google in particular is taking the lead in this realm. It recently launched its new Hummingbird algorithm, which is said to be more oriented toward semantic search and less keyword focused. This fall Google also encrypted keyword data for security reasons, but this move may also be a way to push marketers to focus less on keywords and more on entities and topics. All these changes suggest that sites using structured data markup will be the first to benefit from future advances, not only in search but also in other technologies and applications.

As noted in the article “Future SEO: Understanding Entity Search,”

Implementing semantic markup on your site will make your business data machine-readable to search engines, Web applications, in-car navigation systems, tablets, mobile devices, Apple maps, SIRI, Yelp maps, Linked Open Data, etc.

Semantic markup presents your business data as chocolate to the search engines — they love it and eat it up! Search engines understand it thoroughly and know how to aggregate the data for a better user experience in their SERPs. While search engines use structured data to display more relevant search results, you benefit because it’s known to boost CTR.

What are the Risks or Disadvantages to Using Schema Markup?

With so many benefits to implementing Schema.org, website publishers might wonder if there are any risks to using it. Because the markup is integrated into a site’s HTML code, implementing it can be technically challenging and time consuming. This may prove to be an investment risk if the technology doesn’t take off and provide benefits that outweigh the overhead of using it.

Search engines also created some initial confusion over whether multiple formats (for example having both RDF and Schema markup on the same web page) would be supported. For now, Google is supporting both, but how that plays out in the future is unknown.

Finally, there is a risk that unethical content publishers could spam a technology like structured markup, which could force applications and search engines to stop relying on the data. Deliberate misrepresentation has already left a trail of abandoned or overrun technologies that were once successful tools for online marketers. Think meta keywords, digital press releases, free software trials, email, etc.

Since all 3 of the major US search engines have made a commitment to Schema.org and evidence exists that it is already being used and benefitting the sites that implement it, this appears to be a calculated risk worth taking at this time for publishers willing to spend the time doing it.

What are some of the Possible Uses for Structured Web Data?

The answer to this question is almost limitless, but a few innovative uses already exist.

Examples of Current Uses of Structured Data:

Potential Future Uses of Structured Data:

  • New or even well-established social channels could use structured data to add connections between people and the places and things in their lives. For example, Twitter might know that 2 people who tweeted they hope to meet someday are in the same place at the same time and alert them without them having to “check-in.”
  • Email could understand relationships between topics mentioned in messages. For example, a flight confirmation email could provide weather forecasts and suggest destination hotels, dinner reservations or local activities happening while the recipient is visiting.
  • Comparative search results for products and services could be easily shown that don’t require a third party like online travel agencies, or product finders. For example, if all ecommerce sites use structured data to markup pricing, ratings, and other data about the same product, search engines could easily serve up sortable and filterable results to find the best retailer for making that purchase.
  • The W3C Emotion Markup Language could be used to adjust marketing messages, which social media posts are served up, and even the timing of emails. Imagine how an aggregation of this data could identify trends and patterns in the collective moods of people by geo-location, socio-demographic and other groupings, and then be used to benefit those groups or the organizations marketing to them.
  • Facial recognition software could be used to connect cameras to people, places, similar products and more. This could potentially provide a wealth of information about those people, places and things instantly to the person photographing them. This could work for technologies like Google Glass as well. Imagine how paramedics in an ambulance and detectives solving crimes could use something like this.
  • Calendars could follow news stories and alert users about disruptive or even potentially interesting happenings related to places they will be visiting at certain times and days.
  • A more distant technology might be one that can read a user’s thoughts and instantly provide information related to those thoughts and do something useful with it.
  • Imagine search ranks being influenced by the number of times a site’s structured data is tapped!
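
The comparative-shopping idea above reduces to a filter-and-sort over consistently structured offers. Here is a minimal Python sketch, assuming each retailer's price, currency and rating have already been extracted from its markup into a common shape. The `Offer` class, the stores and the numbers are hypothetical, and `rating` is a simplification (Schema.org keeps ratings in AggregateRating rather than on Offer itself).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    """Fields modeled loosely on schema.org Offer data (hypothetical shape)."""
    seller: str
    price: float
    currency: str  # e.g. an ISO 4217 code such as "USD"
    rating: float  # seller or product rating on a 0-5 scale


def compare_offers(offers: List[Offer], currency: str = "USD",
                   min_rating: float = 0.0) -> List[Offer]:
    """Keep offers in one currency at or above a rating floor; cheapest first,
    ties broken by higher rating."""
    eligible = [o for o in offers if o.currency == currency and o.rating >= min_rating]
    return sorted(eligible, key=lambda o: (o.price, -o.rating))


offers = [
    Offer("Store A", 129.99, "USD", 4.5),
    Offer("Store B", 119.99, "USD", 3.2),
    Offer("Store C", 119.99, "USD", 4.8),
    Offer("Store D", 99.00, "EUR", 4.9),
]
for offer in compare_offers(offers, min_rating=4.0):
    print(offer.seller, offer.price)
# Store C 119.99
# Store A 129.99
```

The hard part in practice is not the sorting but getting every retailer to describe the same product with the same fields, which is exactly what shared structured data markup would provide.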

Truthfully, the most beneficial advancement resulting from the proliferation of structured data probably hasn’t even been imagined yet. It will likely come in the form of some connection that has never been considered before but just works to solve a problem. In fact, many such connections are likely to be made, followed by connections between those connections. Some people may feel unsettled about all this when viewed through the lens of personal privacy, and that concern must be considered along the way. But the possibilities are far too numerous and valuable not to want structured data, semantic search and language-based queries to lead us to better use of all the information the human race has built and documented during the last half century.

Where Can I Learn More about Schema.org, Structured Data and Semantic Search?

Below are some great articles on this complex but important topic.

Valet Interactive is a full-service digital marketing agency specializing in the hospitality industry. Valet has been implementing a number of Schema.org and other markups on client sites for almost two years, including logo, location, publisher/author, offers, events, articles, Twitter cards, Facebook Open Graph tags and many more. We have made a substantial investment in this technology and continue to test more options on an ongoing basis. If your company is interested in working with an agency to add this markup to your site, we encourage you to submit a request for proposal today.