A Benefits Model for Search – Part 2

Introduction

In Part 1, I described how we could use two extreme scenarios to help develop a benefits model for search, and outlined a benefits model based on Scenario 1.  Part 2 describes how we can introduce a second scenario and use this to develop a more complete model. The scenarios were as follows:

  • Scenario 1 – Snippets Correct – in this scenario, the title and summary “snippet” are sufficiently descriptive and unambiguous that the user will view a set of results and be able to identify the most appropriate content from the result. (This also assumes that the user has sufficient knowledge to recognise the most appropriate result.)
  • Scenario 2 – Snippets Ambiguous – in this scenario, the title and summary snippets are sufficiently poor and ambiguous that the user has no information to decide on which is the correct document.

Scenario 2 – Snippets Ambiguous

Scenario 1 assumed high quality content descriptions in the snippet, so the user could identify the best or most appropriate content easily. In this second scenario, the titles and summaries are sufficiently poor and ambiguous that the user has little clue as to which is the correct document. We will assume that the user will start close to the top of the results page and work downwards, selecting results either in sequence or randomly. The user will waste time clicking each link – viewing the page, pressing the back button and selecting the next page and so on until the correct document has been found.

To model this scenario, we add two new variables:

tm = time to select a result, view result, determine that it is not the best page and return to the results page (e.g. using the “Back” button)
ps = percentage of the results that the user selects (assuming that the user will not select every result)

The formula now looks like this:

Time wasted (for one searcher) = s * ((nr * ts) + (nr * ps * tm) + tr) + (p – 1) * ts + (p – 1) * (ps * tm)

If the result is in position 19 and the time to view each incorrect page (tm) is 15 seconds (compared to 0.5 seconds when the user only scans the result), then the time wasted jumps dramatically to £237,000 per annum – if the user clicks on every link.   If we assume that the user will only click on 30% of the links, then the time wasted is still £79,000 per annum. (Clearly the time wasted could be even higher if, by not clicking on every result, the user fails to find the document and carries out another search, or even worse, phones someone up for the answer!)
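
For readers who like to see the arithmetic laid out, here is a minimal sketch of the Scenario 2 formula in Python. The variable names mirror those defined above; the timing values, the 10,000 searches per month and the £25-per-hour employee cost are the same illustrative assumptions used throughout this series, and the output will not exactly reproduce the rounded figures quoted above, but it shows the mechanics.

```python
def scenario2_time_wasted(s, nr, ts, tr, p, ps, tm):
    """Seconds wasted per search when snippets are ambiguous.

    s = pages skipped, nr = results per page, ts = secs to scan one result,
    tr = secs to retrieve a next page, p = position on the final page,
    ps = proportion of results clicked, tm = secs per incorrect page viewed.
    """
    skipped_pages = s * ((nr * ts) + (nr * ps * tm) + tr)
    final_page = (p - 1) * ts + (p - 1) * (ps * tm)
    return skipped_pages + final_page


# Illustrative assumptions: best result 19th overall (9th on the second page),
# 10,000 searches per month, employee cost of £25 per hour.
per_search = scenario2_time_wasted(s=1, nr=10, ts=0.5, tr=10, p=9, ps=0.3, tm=15)
annual_cost = per_search * 10_000 * 12 / 3600 * 25
print(f"{per_search:.0f} seconds per search, roughly £{annual_cost:,.0f} per annum")
```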

Combining Scenarios 1 & 2

The reality is likely to be somewhere between the two scenarios outlined above. For most sites, the snippets are less than 100% effective at describing the content, but do at least provide some clues as to the best content. The diagram below shows the balance between these two scenarios, assuming that a user will only click on 30% of results. Using the formula and estimates in the previous section, if a site has high quality snippets, the time wasted from having the best result in position 19 is likely to be closer to the figure of £11,000, based on 10,000 searches per month. If the snippets are poor, a figure closer to £80,000 is likely – assuming that the user only clicks on 30% of the links.


As mentioned earlier, in reality the figure is likely to sit somewhere between these scenarios, and I have included a weighted value (in between these two extremes), where the weighted value would be estimated based on the quality of content shown on the result snippets.
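
The weighted value could be estimated in several ways; the sketch below simply assumes a linear blend between the two extremes, with a snippet-quality weight between 0 (Scenario 2, snippets give no clue) and 1 (Scenario 1, snippets fully descriptive). The weight of 0.7 is an arbitrary example, not a measured value.

```python
def blended_time_wasted(scenario1_cost, scenario2_cost, snippet_quality):
    """Linear interpolation between the two extreme scenarios.

    snippet_quality: 1.0 = snippets fully descriptive, 0.0 = snippets useless.
    """
    w = max(0.0, min(1.0, snippet_quality))
    return w * scenario1_cost + (1 - w) * scenario2_cost


# Example: annual costs of £11,000 (Scenario 1) and £80,000 (Scenario 2),
# with the snippets judged to be fairly good (weight 0.7).
print(blended_time_wasted(11_000, 80_000, snippet_quality=0.7))  # 31700.0
```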

The model described in this article shows that, in addition to ensuring the best content is as close to the first result position as possible, it is important to have the descriptive content in the snippets as accurate as possible. Without making any changes to the ranking process in a search engine, but simply improving the way that content is represented through snippets, it is clearly possible to deliver real benefits to users. However, we can also deliver further benefits by tuning a search engine to ensure that the best content is being delivered close to the top of the Search Engine Results Page.

Clearly the content on every site is different – and the time taken to view results and select a page to view will vary depending on the type of content, the speed of the site and the competence of the users. I have used rough estimates for time savings, which may or may not be applicable in your environment, so it is essential to develop benchmark timings for real users.

It is also worth pointing out that the above model applies when there is a single page that users are looking for. The situation is more complicated when the answer to the information query is spread across a number of related pages.

Implementation

Can the model described be populated with real data? It could if an organisation can identify the following:

  1. a list of the most popular search terms
  2. an estimate for the number of times the most popular search terms are used each day/week/month/year
  3. an assessment of the quality of the search results – how descriptive are they – and how clearly do they identify the best results
  4. a list of the “best pages” for each of the most popular search terms, and the position that each of these appear in the search results
  5. an assessment of user timings described above
  6. an estimate for the cost of an employee.

Given these values, it is possible to turn the above model into a credible business case for investing in search.

Part 3 will look at additional complexities of search.

Posted in Search Benefits | Leave a comment

UK Government Reports covering Search

Although a little bit old now (2007), the UK National Audit Office report “Government on the internet: progress in delivering information and services online” is an excellent report with some real insights into government web site usage.

The report has a number of things to say about search, including:

    • Internal search engines are widely used but are not meeting users’ needs
    • Participants said that they found department and agency sites hard to navigate, particularly when arriving at the homepage. Internal search engines, in particular, were found to be unhelpful in finding the information being sought
    • In our experiments with internet users, where participants started with the Directgov website, they used the internal search function for 65 per cent of the questions they subsequently answered, evidence of how vital it is for internal search engines to work well. In our focus groups, internal search engines also attracted criticisms. In interviews, Chief Information Officers (CIOs) and web managers acknowledged that internal search remains a difficult problem for departments and agencies.

The House of Commons Committee of Public Accounts also discussed search in the Sixteenth Report of Session 2007–08.  This report, in addition to covering some of the same ground as the NAO report, highlights the opportunity for a pan-government search capability, as developed in the USA using search.usa.gov.

Posted in Search in Government | Leave a comment

A Benefits Model for Search – Part 1

A Benefit Model for Search – Part 1 – Time Savings For Employees

Introduction

In a previous blog, I outlined the benefits from search. This post is the first of a series that will develop a model for calculating the productivity benefits that could be delivered through search (and the dis-benefits or time wasted resulting from low quality search results). The examples are shown for internal search benefits (i.e. for employees using intranet sites and collaboration networks), although the model could equally be used for internet sites.

The conventional approach to productivity benefits for employees is to estimate a reduction in time spent searching for information – an arbitrary figure of, say, a 10% reduction in time spent searching for information is often used. The problem with this approach is that it usually lacks hard quantitative evidence, and is therefore difficult to justify to senior managers in a tough economic climate.

However, a more rigorous model for estimating time wasted or saved through search could enable a much stronger case to be developed for investment in both content improvement and search capability. The model in this article is based on typical user behaviours in response to a set of search results returned on a SERP (Search Engine Results Page).

Internet usability studies show that users usually click on the first few results of a search. We do know, from our own personal experiences, that time is wasted determining which is the best result in a set of results, and that further time is wasted selecting and viewing documents which are not the document we are looking for.

How can this scenario be turned into a benefit model? The starting point for this approach is to model two extreme scenarios.

  • Scenario 1 – Snippets Correct – in this scenario, the title and summary “snippet” are sufficiently descriptive and unambiguous that the user will view a set of results and be able to identify the most appropriate content from the result. (This also assumes that the user has sufficient knowledge to recognise the most appropriate result.)
  • Scenario 2 – Snippets Ambiguous – in this scenario, the title and summary snippets are sufficiently poor and ambiguous that the user has no information to decide on which is the correct document.

(In reality, what users experience is somewhere between these two scenarios, but these give us a method of calculating the time wasted through low quality search results.)

Scenario 1 – Snippets Correct

In this case, the user will be able to recognise the most appropriate content, but will have to scan through each of the search results before reaching the best result. It is assumed that viewing and reading each result takes a finite time. If the most appropriate content is in position 1, then the user will click on the first result, and no time will be wasted. If the result is in position 2, 3, 4 etc. then increasing amounts of time will be wasted. We can develop a formula:

Time wasted = time to look at each result * (Position found – 1)

Let’s say that the most appropriate content is in position 9 and it takes 0.5 seconds to view each result; then the time wasted = 0.5 * (9 – 1) = 4 seconds.

If our search logs indicate that one of the most popular searches has, for instance, 10,000 searches per month, then the total time saved (if we can deliver this result in position 1 rather than position 9) is approximately 11 hours of work time; at an employee cost of, say, £25 per hour, this gives £275 per month. So with perfectly formed content (i.e. “snippets are correct”), the benefit would be £3,300 per annum for this one search.

If the result is not on the first page (and the user recognises this is the case), then the user would look at the second page. There is a further time penalty to select the second page (i.e. click on the “Next Page” button), and scan the results on this page also.

Assuming that the best result is on the second page, the time wasted is now made up of:

  • the time to scan all results on the first page
  • the time required to retrieve the next page of results
  • the time spent scanning the page that does contain the best result.

Given these variables:

ts = time to scan each result (secs)
nr = number of results on each page
tr = time to retrieve each “Next Page” (secs)
p = position found on results page
s = pages skipped to get to the correct result

We can then develop a formula that takes the Next Page into account:

Time wasted (for one searcher) = s * (nr * ts + tr) + (p – 1) * ts

If the result is in position 19, and assuming that the number of results on each page (nr) is 10 and the time to retrieve the next page (tr) is 10 seconds, then the savings based on £25 per hour are much higher (£11,600 per annum compared to £3,300 per annum), given the need to retrieve the second page of results.

The figure below shows a graph of time wasted (using the above figures) if the most appropriate content is in positions 1-30, assuming 10 results on each page.
[Figure: Time wasted – example for employee savings]
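
For anyone who wants to reproduce this kind of chart, the sketch below implements the formula above in Python and plots the time wasted per search for positions 1–30, using the same illustrative timings (0.5 seconds per result, 10 results per page, 10 seconds to retrieve the next page). matplotlib is assumed to be available, and the output is indicative only.

```python
import matplotlib.pyplot as plt

ts, nr, tr = 0.5, 10, 10  # illustrative timings (seconds) and results per page

def time_wasted(position):
    """Seconds wasted for one search when the best result sits at `position`."""
    s = (position - 1) // nr   # full pages skipped before the right page
    p = position - s * nr      # position of the result on its own page
    return s * (nr * ts + tr) + (p - 1) * ts

positions = range(1, 31)
plt.plot(positions, [time_wasted(pos) for pos in positions])
plt.xlabel("Position of best result")
plt.ylabel("Time wasted per search (seconds)")
plt.title("Time wasted – example for employee savings")
plt.show()
```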

I will expand the model to take account of low quality snippets in the next post.

Posted in Search Benefits | 1 Comment

Search & Collaboration Networks – The File Plan Fights Back

This post is a response from “guest blogger” Oli Parker to my earlier post on Search & Collaboration Networks.  Oli Parker runs DeepGreen Consulting, an information management consultancy with a particular interest in ECM systems and collaboration. He can be contacted at ogp@deep-green.co.uk.

You’ve put your finger on a number of commonly-recurring issues with EDRM systems; issues which the EDRM (or ECM as it’s now called) community would do well to address.

However I think that we need to lay out some territory for the discussion before doing the addressing.  While it is easy to list the problems with certain approaches, it is also worth looking at what they offer.  In fact, before doing that, perhaps we should look at what is trying to be achieved – to find some requirements and investigate the way in which they can best be met.

Differing Requirements

The main distinction we need to make is between documents and records.  Or putting it another way, are we setting up a system in which to store records (bearing in mind the various definitions and particularly the importance of records), or a system in which information more generally will be stored?  These are two significantly different requirements which will lead to quite different systems if addressed comprehensively.  (Yes, there may be a requirement for a system that does both of these – more later.)

If records are to be stored, a more formal approach needs to be taken.  Records bring with them retention schemes and access control issues.  Applying these is something to be taken seriously; often these are backed up by legislation which it is wise to stay on the correct side of, and sensitivities which need to be respected.  Records ‘belong’ to a corporate body, and the corporate body has a duty of care for them.  The internal structure of the corporate body may (and often does) change, and the storage system needs to be robust in such a circumstance.  Equally, the corporate body may split, with functional parts moving to another organisation, or parts of other organisations joining. (Machinery of Government changes are the public-sector equivalent of changes of ownership of companies or brands in the private sector).  The change in ownership of records in these circumstances needs to be handled efficiently, with as little administrative overhead as possible.

CFPs are Good! (Honestly)

And these are the reasons for a Corporate File Plan (CFP).  By keeping records relating to a certain function, activity or transaction together, you can readily apply both access permissions and retention rules to them all, as access and retention are usually boundaried by the limits of a given activity or function.  By dividing records by function, you can re-organise the teams within your corporate body as much as you like without needing to change your record storage.  You can also very readily pass a particular function (perhaps that of regulating financial markets or of issuing licences for exhuming dead bodies) to another organisation, and by ‘snipping off’ the CFP at the relevant point, you can pass on all the relevant records pertaining to this function at the same time.

These are all good reasons for applying the rigour of a CFP to your corporate record storage.  Without structured organisation of records such as that offered by a CFP you may gain apparent short-term benefits (such as it being easier to find a ‘suitable’ place to save items), but in the longer-term you will find managing records to be much more difficult because – simply put – they aren’t organised.  Trying to find all the HR records relating to an employee who left 7 years ago when no effort has been made to keep them in the same place will be an exercise akin to trying to find a complete pack of 52 playing cards for a game of poker when they are not all kept together.  You can’t play the game with an incomplete pack, but while the appeal of both exercises wears off after only a few minutes of searching, the game of poker can be readily abandoned; the other can’t.

The benefits of a CFP are therefore clear.  But let’s not confuse a CFP with an ECM (EDRM) system.  An ECM system is simply a set of tools which automate a number of the more laborious aspects of record-keeping;  tools which work on a collection of records, in whatever structure they inhabit.  These tools are primarily concerned with security, retention and access, although a mature ECM tool will offer a raft of other functionality as well.

It is important that the storage structure is seen as a separate entity to the collection of ECM tools.  All too often, the design of a CFP and the implementation of an ECM system are seen as being two parts of the same project;  a view which often gets them dangerously confused in the eyes of the user.  They should be introduced and presented to the user separately, in order that their separate purposes are clear.  A CFP can be used to store information regardless of medium or system; in an ideal world, the CFP would be used to structure the storage of both electronic and hard-copy media.  It should be used to manage eMails as well as ‘Office’ items (why would you want to use a different structure for eMails?), for microfiched as well as paper-based items.  A CFP could (and should) out-last an ECM system by several generations; if an ECM is correctly seen as merely the tools that are available to manage items within the CFP, changing the tools for a new – updated – set should be readily possible, without any need to change the underlying structure of the information being managed.

CFPs therefore offer useful benefits which are hard to ignore.  Users need to understand the purpose of a CFP and the reason for using one, in much the same way that they need to understand the purposes of and reasons for keeping records (all of which are commonly dismissed as ‘just filing’; an attitude that is as unhelpful as it is short-sighted and too often badly overlooked).  However CFPs are also difficult to get used to and to use; people don’t think in terms of functional structure when either saving or looking for an information item. (They think in terms of areas of work that they are responsible for, or that their team are responsible for, and tend to file accordingly.)  For precisely this reason, users should not be presented with a CFP without also being shown a mechanism for easily navigating to the parts of the structure which they will most frequently use.  In the same way that a sales employee has no need to understand the structure of the HR department,  so such an employee need not know the detail of the HR function of the CFP;  that both exist is all they need to be aware of.

Nonetheless, the aforementioned sales employee may need ready and quick access to a number of apparently disparate parts of the CFP;  the sales and marketing area, the product development area and their own personnel file, perhaps.  For this reason, it is important for them to be able to (and to understand how to) set up shortcuts to these parts of the CFP relevant to them, such that in their regular work they need only access these shortcuts and not concern themselves with the rest of the fileplan; a fileplan which is not ‘theirs’  (it belongs to the corporate Records Management department) and over which they have no control. They must then be able to arrange these shortcuts in whatever way they please and thus can create their own personal fileplan (of shortcuts), for their own use.  This personal fileplan needs to be available to them whenever they access the record store – either for retrieving or saving items.  Once created, these shortcuts should be persistent – if created correctly there is no reason why they should ever lose that which they point at.

When a CFP is less useful

However for the majority of information held by most organisations this is still a bit tedious, and focuses heavily on the storage of items.  It hasn’t addressed the central concern of information retrieval and collaboration.

CFPs are useful for keeping records, but such rigour isn’t necessary for the vast majority of the information held by organisations.  Notes, papers, drafts, discussion documents, articles and blogs are all very useful information artefacts, but do not need the degree of management offered by a CFP.  They are not usually subject to a retention schedule or access restrictions as Records are.  Such items need a gentler management touch; they can be held in such a way as to be readily searched and retrieved, such that the intellectual value contained within them can be readily capitalised upon.

An information environment that maximises the utility of such items will, by its nature, be a lighter-weight and faster-moving affair.  Storing items needs to be quick and easy, and yet they need to be readily retrieved when necessary. This presents an interesting challenge in terms of tagging and indexing, but such an environment can be designed around a looser structure, using a different set of tools (most notably search) to retrieve items.  Given that items within such an environment are more likely to be used for research (as opposed to reference), the use of search as a tool to navigate them becomes more appropriate.

Search is, by its nature, a processor-intensive task, and as Moore’s Law tells us that processing will get ever cheaper with time, we can assume that search tools will become more useful. Your article outlines a number of areas where search has a large amount to offer, and we can expect to see good progress on a number of these fronts in the near future.

Then again …

However, you also talk about a number of pre-requisites for the operation of a search tool, as follows:

“For search to work across a collaboration network,  then there is a requirement to:

  • apply appropriate access controls to collaboration areas within the network
  • search across all collaboration areas (subject to access controls) in the network
  • provide consistent metadata across all collaboration areas in the network (to allow search to work accurately)
  • actively manage the long term retention of content i.e. close down collaboration areas that are not being used and migrate content to archive areas
  • provide governance in the setting up, management and disposal of collaboration areas (to avoid the situation where the number of collaboration areas grows in an uncontrolled manner or individual collaboration areas do not conform to the standards of structure or metadata)”

I wholeheartedly agree with these points, but would suggest that many of them are simply re-stating the same needs outlined earlier, namely those of access and retention; needs which were met by a CFP.  While I would agree that the rigour of a CFP is not necessary for collaborative systems, it would appear that there is a need for some form of structure to underlie the information holdings. Indeed, you mention such a need in an earlier stage of your article – “It makes sense that an EDRM system needs a folder and file structure at a basic level.”
So, what are we saying?  That there is a need for some form of structure in which to hold information artefacts; a structure that allows management of these artefacts.  Different artefacts need different degrees of rigour in management, with those being held as records demanding more rigour than those that are not.

Steps to Improvement

Given these differing needs of governance for different forms of information, what can be done corporately? A modern organisation will need both records management as well as information management, so how is this to be achieved? One-size-fits-all doesn’t work, as you have pointed out; you either find yourself applying too much rigour to information, or lack the management capabilities for records.  What is the solution?

I’d suggest a two-pronged approach. Recognise the differences, and use systems accordingly.

  1. Prong one would be a CFP for corporate records, with an ECM system to manage the contents.
  2. You could then implement this alongside a less-structured system (Prong two). Perhaps use some (relatively) unstructured nodes within Microsoft SharePoint (much loved by users for its support for collaborative working), and encourage users to do their collaboration and thinking using this system.

The aforementioned users would need to know enough about the importance of records management to understand the differences between the two, and to save discussions from SharePoint to the ECM as and when they become records.  That sounds hopeful.  And you could also then run an enterprise-wide search tool to retrieve from either system (or both).

Conclusions

You’re right.  Trying to persuade users to use a CFP when it’s not necessary will cause users problems, and introduce a barrier to the adoption of an ECM system.  And this is perhaps where many an ECM implementation has foundered; trying to use the wrong tool for the job is always a bad idea.

However, the converse is also true.  Trying to manage important, lasting records without the correct degree of governance and control is also going to cause problems – of a different form.  Again, the correct tools are needed for the job in hand, and the CFP is a very useful tool to use in the management of records.

The right way forward is to understand what is trying to be achieved (management of records or management of information), and to build the system accordingly.  Such an understanding needs to be agreed by everyone up-front – ‘everyone’ including system owners, project sponsors, records management departments, systems integrators and users alike.

Only when such understanding and agreement is reached can an ECM implementation be successful. To try and proceed with any project without this agreement is to embark on a project which has – at best – a limited future usefulness.

Posted in Search Information Architecture | Leave a comment

Search Usability Issues


The figure below is from Jakob Nielsen & Hoa Loranger’s excellent book, Prioritizing Web Usability from 2006.  One section of the book identifies typical usability issues associated with public facing web sites.  As you can see,  search is the feature of a site that caused the greatest usability problems – greater than Information Architecture (IA),  Readability,  Content and the other features identified.   This information reflected results from 2006 or earlier.

Has the situation got any better since then?  Certainly, many of the worst examples of web site search have been replaced with much improved search capability and better designed Search Engine Results Pages (SERPs).

Increased familiarity with Google contributes to the “Google effect” – if the results look like Google results,  the user thinks they must be good.  Familiarity with Google may also help to raise general search competency – for instance, the use of advanced search facilities,  such as constructing a phrase search or use of wild cards.

However, as users become more familiar with an internet search engine’s advanced search capabilities, will they assume the same facilities are available on all sites?  The advanced search syntax on many site search engines differs from Google, Bing and other popular internet search engines.  Maybe now is the time to start standardising on advanced search criteria i.e. the syntax added to the standard search box, rather than an advanced search screen.   (For instance, the Google advanced search screen is much improved compared to a few years ago, but is still daunting for the average user.)

It would be interesting to repeat the above survey with today’s users, and see if search is still the area with the greatest usability problems.   If you have carried out surveys on your users, I would be interested in getting some feedback.


Posted in Search Usability | Leave a comment

Search and Collaboration Networks

Introduction

There is a belief within the document and records management community that a comprehensive and detailed Corporate File Plan (CFP) is essential for the successful deployment of an EDRM (Electronic Document and Records Management) or ECM (Enterprise Content Management) project. The question posed in this post is: does a combination of search and team-based collaboration solutions provide a more efficient and effective way of managing files or documents in large organisations?

This article outlines how an over-complex Corporate File Plan may actually prove to be a barrier to EDRM / ECM deployment and, more importantly, a barrier to information and knowledge management within large organisations.   A new approach, utilising search and other technologies, in a “collaboration network”, is proposed.

The Corporate File Plan Approach

Folder and sub folder functionality appeared within the first generation of document management systems (e.g. Saros, Documentum) and quickly became a prerequisite for any product in the EDRM space.   Interestingly, PC Docs (now Hummingbird / Open Text), probably the most successfully deployed early EDRM system, did not use a folder and sub folder approach.  Users of PC Docs entered properties identifying the author, subject, title, project number and various other values, and the document was filed away and could be retrieved with any combination of the appropriate properties.  The location of a file (and its name) was not required. (It could be argued that one of the key reasons for the success of PC Docs implementations was the simplicity of filing and retrieving documents.)

The folder and sub folder structure had some successful uses in department-based document management applications in the private sector.  However, the setting of electronic records targets by governments across the world led to the adoption of the “Corporate File Plan” as the de facto approach for managing large document collections, based on a functional “Business Classification Scheme”.   As a result, the starting point for many EDRM programmes has been the development of the Corporate File Plan – an all-encompassing hierarchical set of folders and sub folders into which any user could store material and find it again.

In many cases, the process of creating the Corporate File Plan has grown into a large project in its own right, lasting years in some cases, delaying the implementation of EDRM, and eroding the benefits of new technology and new ways of working.  (Projects have often suffered from changes in organisation responsibility midway, resulting in extensive reworking.)  It makes sense that an EDRM system needs a folder and file structure at a basic level, but I would argue that many projects have gone way beyond this, and have developed products that are too complex to be understood by the people who need to understand them – the users.

Experience shows that file plans work best where there is already a logical folder structure for information (which is understood by users), with consistent naming convention for sub-folders, and the logical structure is related to repeatable processes.

Examples include:

  • information to take to a law court within a “Case File”
  • insurance claim file
  • a patient health record file.

Most users’ experience has been that file plans are not as effective for information where the processes of managing, creating and reviewing information are less repeatable than the examples shown above.

Corporate File Plan Issues

Frequent complaints from users regarding EDRM systems are:

  • “It takes too long to save a document”
  • “I can’t remember where I put the document in the new file plan!”
  • “The file plan is too complicated”
  • “I have to enter too much metadata!”

One of the challenges of the folder / sub folder model is that there is too much choice  – do you set up a new sub folder for this document?  Where do you put the document in the current structure?  Should you add descriptive information to the folder, file name or a property in a document? What keywords  and other metadata should be used?

File Plan issues can be summed up as:

  • File plans slow down the process of saving and accessing documents, through having to navigate complex folder hierarchies (each time the user opens or saves files).  The “Save As” and “Open” dialogs are particularly unwieldy when navigating up and down multiple areas of file plans.
  • For anything but repeatable processes with a very rigid folder structure, users find it difficult to understand and remember the file plan structure and where to file information.  The Function / Activity / Transaction approach (often recommended to organisations) is obscure as the basis for a file plan structure, particularly for strategic rather than operational information.
  • It is easy to misfile a document, either through misunderstanding the file plan structure, or simply by making a mistake when using the Save dialog.
  • File plans lead to accidental duplication of content – it is too time consuming to check if a document has already been saved elsewhere (due to the lack of functionality within the “Save As” dialog), so users create a new copy just in case – for instance, when a number of recipients have been sent the same attachment.
  • File plans are not “email-friendly” – we often work off a different set of filing structures within email systems.
  • Metadata is spread across a number of containers e.g. disk name, folder & sub folder names, folder and sub folder metadata, document metadata.  This makes search ineffective, as we don’t know which metadata fields to search (and whether to search folders or documents).
  • Lack of easy to use facilities to reference a file in more than one location  at once (i.e. shortcuts are hard to set up and don’t get automatically updated when the original is moved).
  • It is time consuming to agree the Corporate File Plan and challenging to keep up to date as the organisation and its information evolves.

It is remarkable, given the accepted wisdom of the use of file plans, how little research has been carried out on the usefulness of the file plan for information and knowledge management, and the lack of information on the benefits achieved in major programmes.  Anecdotal evidence shows that the file plan is often a major barrier to adoption of EDRM / ECM solutions, rather than a help.   Where are the case studies or evidence that this is a suitable approach for managing corporate information?

The Role of Search

It is also worth considering the changes in technology over the last five years that point towards search as an effective method for accessing corporate information.

  • Search is increasingly popular with users (and understood by users), due to the growing acceptance of Google and other internet search engines.  Evidence from web usage shows that the majority of navigational access of web sites is via search, rather than menus. (Little information is available on Intranet or EDRM use, but it may show similar user support for search.)
  • Increased computing power and low cost search products make search-based solutions affordable and effective, e.g. Google Search Appliance or Lucene.
  • Increasing use of mobile devices (with small screens) makes it less practical to display a complex multi-level file plan.
  • An increasing amount of information is not held within discrete files, e.g. “Web 2.0” content such as wikis, blogs and email content, where file plan structures are not applied, so search is the main route to find this information.

However,  while search is an attractive option, poor quality descriptive information within document properties (i.e. the metadata) would mean poor quality search results also.

Search and The Collaboration Network

If we downgrade the significance of the Corporate File Plan, what do you replace it with? One option is to simply base document collections around team working and collaboration, in the way that collaboration tools such as Lotus Notes and Microsoft SharePoint allow separate collections to be set up.  I am calling this a “Collaboration Network”.

This could comprise collaboration sites, (including “Web 2.0” content) and also legacy EDRM stores, even from multiple vendors. Search inevitably forms a key capability of the “Collaboration Network”.  Search allows additional “document context” views to be developed.  For instance:

  • What’s new
  • Recently accessed documents (for a user/team/organisation)
  • Documents created or accessed by a user
  • Team documents
  • Saved searches (to retrieve commonly used documents against more complex criteria)

Search has a number of advantages over the Corporate File Plan approach. Search can:

  • be faster to find documents – assuming the metadata is present – particularly for very large collections
  • find specific documents or collections in many different combinations i.e. via name or combinations of other metadata.  The Corporate File Plan is not as effective for presenting corporate information where there are many different views of information
  • be tuned / customised for common search requests – i.e. if many people want a particular form, then a “best bets” approach can be adopted
  • deliver a “serendipity effect”  -  users are more likely to find information they did not know existed through search, compared to browsing folders.

For search to work across a collaboration network,  then there is a requirement to:

  • apply appropriate access controls to collaboration areas within the network
  • search across all collaboration areas (subject to access controls) in the network
  • provide consistent metadata across all collaboration areas in the network (to allow search to work accurately)
  • actively manage the long term retention of content i.e. close down collaboration areas that are not being used and migrate content to archive areas
  • provide governance in the setting up, management and disposal of collaboration areas (to avoid the situation where the number of collaboration areas grows in an uncontrolled manner or individual collaboration areas do not conform to the standards of structure or metadata)

Enhancing The Collaboration Network

Search is an essential component, but what techniques can improve the effectiveness of the Collaboration Network by ensuring high quality search results?   There is a wide range of potentially useful technology out there that could be used to improve the quality of content in a collaboration network, using an “active content management” approach – compared to the “fire and forget” approach of current document and records management solutions.

Some of these technologies will include:

  • Metadata Analytics – it is possible to use techniques to look for missing information (including metadata, summaries used in search snippets, titles etc.) and either add it automatically or flag the omission to content owners.  Additionally, it is possible to add metadata automatically based on other documents with similar properties (or earlier versions of the same document).
  • De-duplication – we can replace duplicate copies of content with links to a single master copy (some hardware and software solutions do this already; a minimal sketch follows this list).  A system could also prevent two files being created with the same name – the Corporate File Plan allows two files to have the same name but in different folders.
  • Semantic Analysis – many tools exist to automatically generate metadata – including those that look for proper names, scientific terms or terms from a supplied  business thesaurus.  It is possible to use this analysis to auto-populate missing data.
  • Linguistic Support – we can use glossaries, thesauri, taxonomies, ontologies etc. to help steer the user to the most appropriate content, and help with the process of indexing information.
  • User Learning – we can utilise web analytics to identify popular content.  This approach is well established for ecommerce sites such as Amazon, e.g. recommendations and drop-down suggestion boxes based on users’ past activity.
  • Search Analytics – we can analyse search queries and search results to provide insights into content and feed these results back to content owners – do common searches deliver the best content to users?
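
As a minimal sketch of the de-duplication idea mentioned above (real ECM and storage products do this far more robustly, and the folder name used here is purely illustrative), content can be fingerprinted with a hash so that byte-identical copies are spotted regardless of file name or location:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group byte-identical files under `root` by the SHA-256 of their content."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

# Each group could then be reduced to one master copy plus links or shortcuts.
for digest, paths in find_duplicates("collaboration_area").items():  # illustrative path
    print(digest[:12], [str(p) for p in paths])
```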

And finally, I think the world is ready for a much improved  Open / Save / Save As dialog.  It does not work well with a Corporate File Plan –  the main problem is that when you are in the middle of a Save or Save As, you can’t check if a document does or does not exist, and inevitable duplication takes place.  A redesign of this dialog is needed for the Collaboration Network.

The Collaboration Network approach would involve using a range of tools as shown above to continuously assess and improve content.  Much of the time consuming work for this can be automated (using technology that has been around for many years), although clearly it would be beneficial to involve the user in the process, i.e. provide the ability to veto or adjust any automated changes to content.  For instance, a user could be prompted to add metadata to a recently created document – using semantic analysis to suggest the most appropriate values from a thesaurus.

Conclusions

Many professionals would concede that the majority of corporate EDRM / ECM deployments have delivered patchy success, if any.  The deployment of related technologies including collaboration software, email archiving, network drives and “Web 2.0” tools continues in parallel, with significant consequences for knowledge management. Information management programmes should build on team collaboration initiatives to form “collaboration networks”, rather than expend too much effort on developing and deploying a Corporate File Plan and corporate EDRM solutions.

Collaboration networks with search will enable information to be found and reused more efficiently and effectively than a single EDRM solution based on a Corporate File Plan, even if such a solution were possible to achieve. This approach requires a much greater emphasis on search, which, in turn, is reliant on improving the quality of descriptive information associated with content – whether metadata or within the content itself.  We need to use “active content management” and develop processes and techniques to continually assess and improve content quality, either using fully automated methods or automated assistance that feeds results back to content owners.

Posted in Search Information Architecture | 3 Comments

Defining search usage on corporate sites

What proportion of users of a web site use the search facilities as the method of navigation?  Measurement in this area is pretty important to any organisation implementing (and justifying)  search – either as an addition to a web site  (intranet or external web site) or as part of a collaboration / Electronic Document & Records Management (EDRM) project.  I include a few examples:

Citizen / Customer Facing Information-based Site

  • 82% – “82 percent of visitors use site search.” (Google White Paper)
  • 65% – “..with the Directgov website, they used the internal search function for 65 per cent of the questions they subsequently answered..” (UK National Audit Office – Government on the internet)
  • 40%–50% – users make use of search during a session (my estimate based on a number of government projects in the UK)

Citizen / Customer facing information-based sites are likely to give one range of answers.  For other types of sites, the proportions may vary significantly.  For instance, the figures are likely to be quite different for Intranet Sites, EDRM applications, Collaboration Sites (e.g. SharePoint / Notes) and Ecommerce Sites.

    Here are a few measures that could be used – in ascending order of complexity and usefulness.

    Level 1 – Search Volume

    A basic  measure is identifying the total number of searches carried out each day/week/month.  Following the implementation of a much improved search capability, the volume of searching often increases significantly.  This is usually viewed as being a “good thing” although clearly it does not tell us how useful the search facility is  i.e. whether users are finding the information they need.   For instance, at one end of the scale, it could indicate much greater self-service access to information is being facilitated, delivering significant benefits.  At the other extreme, an increase in search volume could mean that a high proportion of searches are not finding the correct content (and so a second or subsequent search is required), leading to much lower benefits (or even disbenefits).

    Level 2 – Search Usage

    A measure that is more useful is the proportion of  users that have at some time used the search facility on a site. This information can be found through user surveys or via web analytics.  The figure quoted  above from the Google White Paper appears to be this measure, although it is not completely clear!

    Level 3 – Search Usage Per Visit

    A still more useful measure is the proportion of visits which involve at least one search.  This information can also be found using web analytics.  The figures I have quoted from UK government projects are for this measure.
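
To make this measure concrete, here is a minimal sketch, assuming the web analytics data has already been reduced to a list of (visit id, was-this-a-search) events; the shape of that export is an assumption for illustration only.

```python
def search_usage_per_visit(events):
    """Proportion of visits that include at least one search."""
    visits, visits_with_search = set(), set()
    for visit_id, is_search in events:
        visits.add(visit_id)
        if is_search:
            visits_with_search.add(visit_id)
    return len(visits_with_search) / len(visits) if visits else 0.0


# Illustrative events: three visits, two of which include a search.
events = [("v1", False), ("v1", True), ("v2", False), ("v3", True)]
print(search_usage_per_visit(events))  # 2 of 3 visits used search -> 0.67
```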

    Level 4 – Search Usage Per Information Request

    The most significant measure would be the proportion of questions (or information requests) that are answered using search, as opposed to simply navigating to a page.  The Directgov figure appears to be this type of measure.  If each visit to a site only involved one question, then this figure would be the same as the Level 3 figure – in practice, a visit may involve a number of information requests.

    There may be other measures.   I would be keen to get examples from different types of sites and will include them in this section – keeping the organisation anonymous or not, depending on your preference.

    Posted in Search Analytics | 1 Comment

    Improving Search Results

    Search is an essential component of most document management, collaboration or intranet solutions. While search usually does at least produce a set of results, in many organisations the quality of results leaves a lot to be desired. This is a missed opportunity, because there can be significant benefits delivered from a good search capability delivering high quality results. This post explores some ways in which search quality can be improved.

    The proportion of users that find information through search varies widely across the range of possible document-based applications; however, there are clearly many situations where the file plan / folder-based approach to finding a document is not effective. When search is selected, users expect an intranet site or document management site search to work as fast as, and find content of as high a quality as, Google, Bing, Ask or Yahoo appear to do.

    Research into the use of internet search engines shows that users make an almost instant, and instinctive decision once they see the results of a search. If users do not perceive a close match within the result (based on the title and brief summary), they will usually search again.

    Usability studies on web sites show that:

    • users generally only look at the first page of results and indeed only the first few results
    • over 70% of users will click on either of the first two results in a listing.

    While the typical EDRM or ECM user may spend longer studying the results page than an internet user, it is clearly important for EDRM / ECM applications to ensure that the most appropriate content for common searches is well represented on the results page. Poor search results waste time, while the investment in creating and publishing is wasted if a document is never found.

    Keep It Simple

    Through familiarity with internet search engines, today’s generation of searchers expect to see a search screen and search results that look similar to Google. It is advisable to keep the design of the search screen as simple (and recognisable) as possible. Studies also show that complex advanced search options are rarely used.

    Search result pages should be kept simple and uncluttered, showing between 10 and 20 results in a simple list, with few if any additional text panels. Users usually ignore any text that is not within the core results list – this includes panels above, below or to the side of search results. (One of the reasons that users ignore this information is that many search engines serve up paid-for adverts in these areas, and therefore users consciously or unconsciously skip over this text.)

    It is also a good idea to keep each search result free from unnecessary descriptive information – information that is of little use in the decision whether to select a result or not. (For instance, is document size really necessary on results?) A file name is often provided on the search result – again this may have limited value in an EDRM context where the originating file name has little meaning.


    Understand the Technology

    Search appliances and open source search engines have lowered the initial cost for very sophisticated search capabilities. However, in many cases, content owners have limited understanding of why certain documents appear at the top of a set of results.

    A search engine will use the words within a page to identify how relevant that page is to the search term or terms being entered. The relevancy of a result is affected by where the word is found – if it appears in properties associated with a document, such as the title or subject area, a higher weighting may be given than for the same word found in the body of a document. In the case of an EDRM application, if the properties are not filled in, then the chances of a document being found will be lessened considerably.

    Some search engines use other weightings, for instance delivering more recent documents in preference to older content. It is usually possible to adjust the weighting to improve the result ranking; however, an understanding of how a search engine ranks results is essential to inform content owners how to assign properties to a document.
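
To illustrate the idea of field weighting (a toy example, not how any particular product such as the Google Search Appliance or Lucene actually scores documents), a crude scorer might count query-term occurrences per field and multiply by a per-field weight:

```python
# Assumed field weights: title matches count far more than body matches.
FIELD_WEIGHTS = {"title": 5.0, "subject": 3.0, "body": 1.0}

def score(document, query_terms):
    """Toy relevance score: weighted count of query terms in each field."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * sum(words.count(term.lower()) for term in query_terms)
    return total

# Illustrative document: one title hit (5.0) plus one body hit (1.0) scores 6.0.
doc = {"title": "Expenses policy", "subject": "", "body": "How to claim expenses online"}
print(score(doc, ["expenses"]))  # 6.0
```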

    Good Content Delivers Good Results

    The key issue and the real challenge for organisations is that search accuracy is primarily dependent on the content that search is applied to. Good content delivers good results.
    Writing, approving and publishing documents is a time consuming process. However, if insufficient effort is spent on ensuring the appropriate descriptive content is present for a search engine, the document is unlikely to be found.

    The title and the summary “snippet” that accompanies it are what represent the document on the results page, and are what the user relies on to assess the relevance of a result. The title is the most important factor, although the summary provides additional help in the decision. Most search engines generate this summary automatically from the document itself, based on words in close proximity to the search terms. A better approach is to allow the search engine to display a specific piece of content (which more clearly represents the document). This could be an abstract within the document or a property associated with the document.

    With appropriate training and guidance for content owners and information managers (on the importance of good titles, properties and content), it is possible to improve the quality of results (and the information on the results page) over time. (One of the difficulties here is that any changes to content may not be immediately picked up by the search engine – it may be necessary to wait a few hours for a document to be reindexed and a new search position identified.)

    Guide The User

    While advanced search screens are little used, there are a number of techniques that can be adopted to guide the user during the search process. A few examples include:

    • Refining search results – alongside the results from a search, a panel allows the user to narrow a search (by clicking on a list of categories based on metadata or clustering of results). For instance, a general search across an organisation might be refined based on department or year of publication.
    • Query suggestions – as the user types in a search, the search engine suggests popular terms based on the letters that have already been entered (a minimal sketch follows this list).
    • Best Bets – in response to a specific search, the search engine can point a user directly at a predetermined most relevant page (i.e. bypassing the search engine’s choices).
    • “Did you mean?” – the search engine can prompt the user to switch to a preferred search term – this is useful for commonly misspelt terms or abbreviated forms of terms.
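
As a minimal sketch of the query-suggestion technique (real engines typically blend popularity with freshness and personalisation; here popularity alone is assumed, and the example queries are made up):

```python
from collections import Counter

# Assumed input: previous queries harvested from the search log.
past_queries = Counter(["annual leave", "annual report", "expenses policy",
                        "annual leave", "appraisal form"])

def suggest(prefix, limit=5):
    """Return the most popular past queries that start with `prefix`."""
    prefix = prefix.lower().strip()
    matches = [(q, n) for q, n in past_queries.items() if q.startswith(prefix)]
    return [q for q, _ in sorted(matches, key=lambda item: -item[1])[:limit]]

print(suggest("ann"))  # ['annual leave', 'annual report']
```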

    Assessing Results

    Search results are hard to predict, and minor changes to the search terms entered can bring up a completely different set of results. Search results can be sensitive to the order in which words are entered, whether singular or plural forms of words are used or whether prepositions such as “a” and “the” are used.

    However, the bigger challenge is that search results will depend on what content is available on the site at the point in time when the search is carried out – and this is changing over time as new documents are added or old documents retired from the collection.

    Therefore, it is necessary to regularly assess the effectiveness of search over time, and as content is added or removed from a document collection. “Search analytics” describes the processes used to help content owners to understand how well their content is performing through search. Search is normally analysed in two ways:

    • analysing the “search logs” to find the most popular terms that are being used. This information can then be used to reproduce the searches and examine the results. Additionally, a search engine log should show those searches that deliver no search results – this could point to content that requires immediate attention (a minimal sketch of this kind of analysis follows this list).
    • examining which documents are being returned most often i.e. the most popular documents. Some of these will be viewed as a result of search, but many may be as a result of navigating through the folder structure. Some search engines will identify which pages have been returned as a result of search but in many cases this information is difficult to extract from other usage activity.
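
A minimal sketch of the first kind of analysis, assuming the search log has already been parsed into (query, number of results) pairs; the exact log format will vary from one search engine to another, and the example entries are made up.

```python
from collections import Counter

# Assumed parsed log entries: (query text, number of results returned).
log = [("annual leave", 42), ("expences policy", 0), ("expenses policy", 17),
       ("annual leave", 42), ("travel booking", 0)]

query_counts = Counter(query for query, _ in log)
zero_result = Counter(query for query, hits in log if hits == 0)

print("Most popular searches:", query_counts.most_common(3))
print("Searches returning no results:", zero_result.most_common(3))
```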

    The Business Case For Search

    The benefits from search could make up a significant proportion of the benefits associated with EDRM / ECM. (The Forrester Report “ECM Priorities for 2010” found that 49% of respondents across 1,700 organisations couldn’t estimate the ROI for any of their ECM systems, making it difficult to get money for expansion.)

    It is possible to build up an understanding of the benefits through analysis of usage behaviour and direct feedback from users. Some initial measures can include:

    • What proportion of users make use of the search facilities?
    • How many searches are carried out on a daily / weekly / yearly basis?
    • How satisfied are users with search results?
    • Is the search engine returning the most appropriate content for common searches?
    • Which results does the user select to view?
    • How informative are the titles and summaries shown on the search results page?

    For larger document collections, there may be benefits from regular assessment of content. For instance, providing feedback to individual content owners, updating content and assessing improvements over time. Once a benchmark has been set, it is possible to start to build up the business case for investing in search or the need to make improvements to search. Benefits include:

    Strategic benefits

    • Impact to the business/clients/partners, through having the right information – improved decision making

    Productivity benefits

    • Faster access to information
    • Avoid recreating already existing information
    • Self-service access to knowledge (reducing staff and other peoples time)

    Infrastructure benefits

    • Carbon use minimised – reduced repeated searches / opening of non-relevant material
    • Reduced duplication of content – fewer servers, less storage

    Studies show that organisations with staff dedicated to improving search do achieve higher user satisfaction with search. The question for organisations is how much time should be devoted to search on a regular basis? What are the benefits of implementing processes for continuous improvement for search? A broad brush approach to estimating time savings (e.g. 10% reduction in time spent searching for information, as quoted in a recent Google White Paper on search return on investment) may not be sufficiently evidence-based to justify significant investment to senior managers in a tough economic climate. However, analysis of real user behaviours will enable a stronger case to be developed for investment in this essential business capability.

    Posted in Search Benefits | 2 Comments

    Introducing the Corporate Search Blog

    This Blog is aimed at identifying the benefits delivered through corporate search applications.   There are a number of excellent Blogs devoted to corporate search technologies and even more that cover internet search engines – particularly for the Search Engine Optimisation (SEO) community.

    Search was important before the internet ever existed and clearly  Google has become the most significant technology of the internet today.   However, there is very little Blog information on the use of search technologies within organisations.   This Blog is aimed at expanding the debate on search – whether for internet-facing sites,  intranets, collaboration areas, Electronic Document & Records Management (EDRM) or Enterprise Content Management (ECM).

    As a consultant,  I have worked for many large organisations in both the public and private sectors and as a result,  I believe that large organisations have not exploited the full potential of search technologies for knowledge sharing.

    

    Posted in Search Benefits | Leave a comment