

The SEO Blog

Search Engine Optimizician

SEO to the Max – Search Engine Optimization for a PDF

Posted • September 28, 2009 • 5 Comments


Think You Have Done Everything You Can for Your Website’s SEO? Think Again! Have You SEO’d That PDF?

We have covered SEO for your images, and now we are going to SEO our PDFs.
Good SEO is like adding as many feathers to the balance scale as you can. Google, Yahoo!, Bing, and the rest of the search engines are always looking for the content you have. Why not make their job as easy as possible? They may reward you with higher search engine rankings.

Keys for PDF SEO:

  • Use keyword research to help choose the right words and phrases for your PDF’s document information
  • Check your PDF file’s description before uploading
  • If you do nothing else, give your PDF files titles that mean something to searchers
  • Add structure and proper tags to PDF files to improve usability and the quality of search results
  • Keep uploaded PDF files as small as possible to maximize the chance that search engines will index the whole document
  • If you’re uploading a scanned document, be sure to run OCR on it first

Searches for content will turn up PDF files in the search results. Google and the other search engines treat a PDF document as just another web page, so they will index your PDF files. PDFs are a factor in website SEO, particularly if they are an integral part of your site’s content.

Take a close look at the PDF file listings in your search results pages. This is how people decide whether or not to click onto your site. What is the meta description for your PDF?
Want to focus on the PDFs in your search results? Just add “filetype:pdf” to your search on Google, Yahoo or Bing.

Viewed in search results, many PDF files appear messy and unprofessional. When you use a PDF on a website, take the time to make it work properly with the search engines. Optimized PDF files will get more clicks and be easier for users to find again in the future.

How do the PDF files on your own website look today? Google’s advanced search operators make it easy to find out: just enter “site:yoursite.com filetype:pdf” into Google (replacing “yoursite.com” with your own domain). So how do your PDFs look to Google?
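If you audit more than one site, the query can be generated instead of typed. The following is a minimal sketch in Python, assuming the standard Google search URL; the function names `pdf_audit_query` and `google_search_url` are made up for this illustration.

```python
from urllib.parse import urlencode


def pdf_audit_query(domain: str) -> str:
    """Return the search string that lists a site's indexed PDF files."""
    return f"site:{domain} filetype:pdf"


def google_search_url(query: str) -> str:
    """Encode a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# "yoursite.com" is the placeholder domain from the post; swap in your own.
query = pdf_audit_query("yoursite.com")
url = google_search_url(query)
print(url)
```

Paste the printed URL into a browser to see the PDF listings for that domain.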

SEO your PDFs

Every PDF needs a quality relevant title.

First, Google looks in the “Title” field for, guess what, the title. If there is none, Google tries to guess the document’s title by scanning the text on the first few pages. This often produces the incorrect and badly formatted titles that are so common in the search results.

If there is text in the Title field, that text will be the title of the document whether it makes sense or not. This is where titles such as “Stories” book – full v2 and Figure 1: SERP for Pizza come from.

Be sure your PDF files’ document information fields represent what your document is really about.

To check a PDF file’s Title information in Acrobat, press Control-D or choose File > Document Properties, then click the Description tab, where you can add or correct the PDF’s title, author and other metadata as desired.
While it’s really simple for PDF authors to include a meaningful title, search results clearly show that many Title fields are left empty. As in the examples above, many PDF authoring applications place the filename in the Title field, which gives searchers no help at all.
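For a quick check outside Acrobat, a script can look for the Title entry in the raw file. This is only a rough heuristic written in Python, not a real PDF parser: it assumes the document information dictionary is stored uncompressed and that the title is a plain literal string. A `/Title` entry can also belong to a bookmark, so treat a hit as a hint. The names `find_pdf_title` and `title_looks_weak` are invented for this sketch.

```python
import re

# Matches a literal-string entry such as: /Title (Annual Report 2009)
# Assumption: no nested or escaped parentheses; hex-encoded and UTF-16
# titles are not handled by this sketch.
TITLE_RE = re.compile(rb"/Title\s*\(([^()\\]*)\)")


def find_pdf_title(raw: bytes):
    """Return the first Title string found in the raw bytes, else None."""
    match = TITLE_RE.search(raw)
    if match is None:
        return None
    title = match.group(1).decode("latin-1").strip()
    return title or None


def title_looks_weak(title, filename: str) -> bool:
    """Flag a missing title or one that merely repeats the file name."""
    if not title:
        return True
    stem = filename.rsplit(".", 1)[0]
    return title.lower() in (filename.lower(), stem.lower())


# A made-up fragment standing in for the contents of a real PDF file.
sample = b"1 0 obj\n<< /Title (report_final_v2) /Author (Staff) >>\nendobj\n"
title = find_pdf_title(sample)
print(title, title_looks_weak(title, "report_final_v2.pdf"))
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to `find_pdf_title`. A result of `None` may simply mean the metadata is compressed, so confirm in Acrobat before drawing conclusions.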

PDFs for the Web – Quality Relevant Title & Metadata is Necessary

If you make sure that every one of your PDF files contains a valid, meaningful title, you are using the single easiest way to ensure that search results display the information that matters to searchers.

In early 2006, Google couldn’t even index PDF files above the 1.5 specification. Since July 2009, Google has indexed content from even Adobe’s latest specification, version 1.7 with Adobe Extension Level 3 (Acrobat 9).

I’d bet a pint that search engines won’t index every word in every PDF file. As recently as 2006, Google didn’t index PDF files larger than a couple of megabytes. Today Google indexes text from PDF files of up to 10MB. Much larger than that, and Google simply ignores the PDF file completely.
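A short script can flag the files most at risk before you upload them. This Python sketch treats the 10MB figure above as a rule of thumb from this post, not a documented limit; the function name `oversized_pdfs` is hypothetical.

```python
import os

# Assumption: 10 MB is the figure quoted in the post, not an official limit.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def oversized_pdfs(folder: str, limit: int = SIZE_LIMIT_BYTES):
    """Return (path, size) pairs for PDF files above limit, largest first."""
    hits = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".pdf"):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > limit:
                    hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Call `oversized_pdfs("path/to/site")` on your local copy of the site and consider shrinking or splitting anything it returns.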

Of course, it’s also possible that Google’s indexing of PDF files is based on the time required to download the file from a particular server. If this is the case, faster websites with larger pipes will tend to get more of their large PDF files indexed than a slower server would with files of the same size.
Just like web pages, if your PDF files are very large, Google and the other search engines may not index all of their content. There is a chance they will abandon the process and leave out some of the pages of your document. One way to check is to search for a string of text from near the end of your document and see if it turns up in the results.

Beginning in late 2008, Google started to OCR image-based PDF files as it downloads them. Plain scanned pages will be searchable even if they weren’t created that way. Google’s OCR isn’t the greatest; it’s built for speed rather than accuracy. You can do better yourself, and improve your search results, by running and quality-checking your own OCR before posting the file.

There are many reasons to secure your PDF files against unauthorized changes, or to prevent the extraction of your content. If it is not done properly, you can block the search engines from indexing the text in your secured document. To ensure that secured PDFs are searchable, be sure to check the “Enable text access for screen-readers” box when encrypting your files. Additionally, when selecting Acrobat 6.0 compatibility or higher, be sure to choose “Encrypt all document contents except metadata” so that PDF metadata is available to the search engines.
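Behind those checkboxes, an encrypted PDF stores its permissions as a single integer (the `/P` entry of the encryption dictionary), with one bit per permission. The Python sketch below decodes the two bits relevant here, using the bit positions from the PDF specification’s permissions table. It assumes you already have the `/P` value in hand, for example from a PDF inspection tool. Whether a given search engine honors these bits is not something this sketch can tell you. The function names are illustrative.

```python
# Bit positions as numbered in the PDF specification (bit 1 is the lowest).
COPY_BIT = 5            # copy or extract text and graphics
ACCESSIBILITY_BIT = 10  # extract text for accessibility (screen readers)


def bit_is_set(p_value: int, bit: int) -> bool:
    """Test a 1-indexed bit of /P, which is stored as a signed 32-bit integer."""
    unsigned = p_value & 0xFFFFFFFF
    return bool((unsigned >> (bit - 1)) & 1)


def text_access_report(p_value: int) -> dict:
    """Summarize the two text-extraction permissions encoded in /P."""
    return {
        "copy_allowed": bit_is_set(p_value, COPY_BIT),
        "screen_reader_access": bit_is_set(p_value, ACCESSIBILITY_BIT),
    }


# Illustrative value: every permission granted except copying (bit 5 cleared).
print(text_access_report(-17))
```

A document with copying disabled but screen-reader access left on is the combination the paragraph above recommends.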

When you take a close look at search results for your search terms, do you find oddly spaced, duplicated or jumbled text? Check the file with Google’s “View as HTML”. Does the text look bad? Disjointed paragraphs, headings demoted to body text, and hopelessly confused tables, columns and sidebars are signs of a poorly formatted PDF document.

If you care about how search engines retrieve and display results from your website and PDFs, follow the accessibility rules set forth by the Section 508 standards for web content. Plan to get familiar with reading order in PDF content and with tagging the structure of your PDF files, including headings, lists and tables.

Just like a web page, to do well in the search engines, your PDF should have a logical flow of information built into the document’s structure.

You can check both the content order and the tagging in Adobe Acrobat Professional with View > Navigation Panels > Tags and View > Navigation Panels > Order.

Before you define content order in Acrobat Professional, find out whether your file is tagged: press Control-D, then check the “Description” tab.

This is a great way to check for inaccessible content. Not only should the file show as tagged (“Yes”), but the tags should be validated, too.
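To screen a batch of files before opening them one by one, you can look for the markers a tagged PDF normally carries. The Python sketch below is a crude byte-level heuristic. It assumes the document catalog is stored uncompressed and that `/MarkInfo` is written inline, not as an indirect reference, so a `False` result means “check in Acrobat”, not “untagged”. It says nothing about whether the tags are valid. The name `looks_tagged` is made up for this example.

```python
import re

# A tagged PDF normally declares /MarkInfo << /Marked true >> in its catalog
# and points to a structure tree through /StructTreeRoot.
MARKED_RE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")


def looks_tagged(raw: bytes) -> bool:
    """Heuristic: True when both tagged-PDF markers appear in the raw bytes."""
    return MARKED_RE.search(raw) is not None and b"/StructTreeRoot" in raw


# Made-up catalog fragments standing in for real file contents.
tagged = b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 9 0 R >>"
untagged = b"<< /Type /Catalog /Pages 2 0 R >>"
print(looks_tagged(tagged), looks_tagged(untagged))
```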

You can structure and tag your PDF with the Advanced > Accessibility > Add Tags to Document command. Once your tags have been added, you can see how the content is currently ordered using the Content panel or Advanced > Accessibility > TouchUp Reading Order. To make your PDF files truly accessible and maximize their SEO value, validate the tags, make sure the images have good alt text, and check that tables, lists and other structure elements are in order.

Just like any web page, your PDF documents will add to the SEO value of your site when they contain well-chosen keywords in the proper places, including inside the heading (H1, H2) tags of the PDF files. Linking from the PDF back to your website is also a good idea: anyone who places your PDF files on other servers will also be posting links back to your site.

Think of your PDF files as web pages that are part of your HTML website. Your visitors will be able to take these PDF “pages” offline.

A PDF file’s name is not only part of your content management; it is also extremely important to the search engines. Try to give your files meaningful names that include keywords and reflect the file’s Title.
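One simple way to keep the file name in step with the Title is to derive one from the other. Here is a minimal Python sketch, assuming lowercase words joined by hyphens (a common web convention, not a rule stated by any search engine); the function name `pdf_filename_from_title` is hypothetical.

```python
import re


def pdf_filename_from_title(title: str, max_words: int = 8) -> str:
    """Build a lowercase, hyphen-separated .pdf file name from a title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        raise ValueError("title contains no usable words")
    return "-".join(words[:max_words]) + ".pdf"


name = pdf_filename_from_title("Search Engine Optimization for a PDF")
print(name)  # search-engine-optimization-for-a-pdf.pdf
```

Punctuation and spacing are dropped, and long titles are cut at `max_words` so the name stays readable in a URL.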

Each of the three major engines ranks PDF files using its own algorithm. Properly structured PDF files stand a better chance of ranking well in Google and the rest of the search engines.

Late,
Gary Pool


