Sunday, June 25, 2006

Summer vacation

Coming up is the last week of work for me before the summer, and as last weeks go, I'll probably be too busy to blog about anything interesting. But I'd like to brag that my entire month of July will be spent mostly offline, doing nothing but enjoying the summer at our family's summer place, as well as digesting the WebWork in Action and Ajax in Action books. I'm changing employers over the summer, and these technologies are probably the ones I'll be working with at the new place.

Have a nice summer, everyone!

Friday, June 16, 2006

The Web Content Challenges

Last Monday I finally presented my thesis, "The Use of Open Source and Open Standards in Web Content Management Systems". Present were my supervisor, a small gang of friends and colleagues, and the external examiner, who was there by way of tele-conference (or Skype as it's called these days).

The presentation went fairly well, but as the examiner had audio communication only (as well as a copy of my slides), my entire theatrical focus was inside the monitor of my laptop. So the people present in the room weren't really too flabbergasted by the presentation, but the examiner liked it and that's what counts. I got some flak for not having spent too much energy on the academic method, but overall he thought it was a great thesis. So that's the official end of my 17-year-long education!

Anyhow, here's another snippet about the reason we developed WCMS-es in the first place:

Web Content Challenges

The concept of content in itself seeks to solve the challenges by delivering the right content. This goal is not easily reached due to the following conditions.

Content is not maneuverable

The main problem with information is that there is too much of it [Goodwin, 2002]. There are too many web-pages with too many attached documents [McGovern, 2006b]. A company can invest resources into sustaining a site map and a navigation tree menu, but if these are constructed manually, and not generated automatically from the content structure, these navigational methods will stagnate and become more of a nuisance than helpful tools [McGovern, 2006a]. Navigating by search is a great shortcut to make all content available, but searching the right way is easier said than done [Belam, 2006], and a search-engine cannot substitute for conventional site navigation.
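To illustrate what "generated automatically from the content structure" could look like, here is a minimal sketch that builds an indented site map from a list of page paths. The function names and the sample paths are made up for this example; a real WCMS would read the structure from its own repository.

```python
def build_tree(paths):
    """Nest URL paths such as '/products/widgets' into a dict-of-dicts tree."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            if part:
                node = node.setdefault(part, {})
    return tree


def render_site_map(tree, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    lines = []
    for name, children in sorted(tree.items()):
        lines.append("  " * depth + name)
        lines.extend(render_site_map(children, depth + 1))
    return lines


# Hypothetical content structure.
pages = ["/about", "/products/widgets", "/products/gadgets", "/about/contact"]
site_map = render_site_map(build_tree(pages))
print("\n".join(site_map))
```

Because the site map is derived from the page list, adding or removing a page changes the map on the next run without anyone editing it by hand.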

Content is useless

Stagnated web-sites quickly grow dead links, which are references to other web-pages that have been moved or deleted. There might be many pages and documents in existence which are not hyper-linked at all, and which will therefore never be accessed. As defined earlier on, content which is not accessed and used has no value. Maintaining value-less content takes up resources which the content managers could have spent on more useful parts of the web-site. It also confuses the visitor by polluting the web-site, making it harder to find the useful content.
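Both problems can be detected mechanically once the link structure is known. The following is only a sketch: it assumes the site has already been crawled into a dictionary mapping each page to the pages it links to (the sample data is invented), and it treats any page that cannot be reached from the front page as never accessed.

```python
from collections import deque


def audit_links(site, home="/"):
    """Return (dead_links, orphans) for a site given as {page: [linked pages]}."""
    pages = set(site)

    # A dead link points at a page that does not exist on the site.
    dead_links = {
        (source, target)
        for source, links in site.items()
        for target in links
        if target not in pages
    }

    # Breadth-first walk from the front page to find everything a visitor can reach.
    reachable = set()
    queue = deque([home])
    while queue:
        page = queue.popleft()
        if page in reachable or page not in pages:
            continue
        reachable.add(page)
        queue.extend(site[page])

    orphans = pages - reachable
    return dead_links, orphans


# Hypothetical crawl result.
site = {
    "/": ["/news", "/products"],
    "/news": ["/", "/old-press-release"],
    "/products": [],
    "/forgotten-report": ["/"],
}
dead_links, orphans = audit_links(site)
print(dead_links)
print(orphans)
```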

Content is not automatically accessible

Two elements by which one can interpret a language are syntax (grammar) and semantics (meaning). A computer interpreting the content of a web-page first checks the syntax by parsing the page and checking whether the markup language is valid. If the syntax is incorrect, the parsing is likely to break, depending on the fault-tolerance of the parser. Although incorrect use of markup causes annoyance among web developers, the main issue in accessing and reusing web content is the lack of semantics. A computer can automatically access a web-page and read it, but it cannot decide which paragraph is the title of an embedded article, which is the abstract text and which is the main text of the article unless the semantic standard is enabled both in the web-page and in the program reading it.

Mixing content and design also reduces accessibility. A computer cannot decide whether a table is used to control the layout of a page, or if the table has semantic value.
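The point about semantics can be shown with a small sketch using Python's standard html.parser module. The convention used here (the title in an h1 element, the abstract and main text in paragraphs with class "abstract" and "body") is an assumption made up for the example, not an established standard. The reader only finds the parts of the article when the page follows the same convention as the program.

```python
from html.parser import HTMLParser


class ArticleReader(HTMLParser):
    """Collect title, abstract and body text, assuming an agreed markup convention."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "h1":
            self._current = "title"
        elif tag == "p" and css_class in ("abstract", "body"):
            self._current = css_class

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()


def read_article(html):
    reader = ArticleReader()
    reader.feed(html)
    reader.close()
    return reader.fields


# Same text, once with semantic markup and once with purely visual markup.
semantic = ('<h1>WCMS</h1><p class="abstract">Short summary.</p>'
            '<p class="body">Long text.</p>')
presentational = '<p><b>WCMS</b></p><p>Short summary.</p><p>Long text.</p>'

print(read_article(semantic))
print(read_article(presentational))
```

Both pages would look much the same in a browser, but only the first one tells the program which text is the title and which is the abstract.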

Content is not structured

This grievance is tightly connected to the one above, though it is more apparent in traditional content management. Web content has the advantage of dealing mostly with HTML, which despite the criticism it receives is still a transparent text-based standard, and in its XHTML form is based on the stricter XML. This transparency is lacking in binary files, such as multimedia assets and proprietary formats such as Microsoft Office documents and PDF-files [Martins, 2004].

Content has no meta information

There has been a noteworthy increase in the ability to tag or label various data objects with meta data. Meta tags can be included in the header of an HTML-page, or in the properties of a Word-document. Forcing users into actually using these features manually can prove to be difficult. If the title of a document is "Content Management", it is quite tedious to label the document with meta-data stating that the topic is “content management” and similar keywords. A possible solution to the meta-problem lies in automatically tagging content [Staelin, 2004].
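As a rough idea of what automatic tagging could mean, the sketch below suggests keywords by simply counting the most frequent words that are not on a stop-word list, and writes them into an HTML meta tag. This is a naive stand-in chosen for illustration, not the method described in [Staelin, 2004]; the stop-word list and function names are invented.

```python
import re
from collections import Counter
from html import escape

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "about", "with", "for", "that", "this", "are"}


def suggest_keywords(text, limit=3):
    """Return the most frequent words that are longer than two letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


def meta_tag(keywords):
    """Render the keywords as an HTML meta element."""
    return '<meta name="keywords" content="%s">' % escape(", ".join(keywords))


text = ("Content management is about content. "
        "Good content management keeps content fresh.")
keywords = suggest_keywords(text, limit=2)
tag = meta_tag(keywords)
print(tag)
```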

Content is not connected

There is bound to be digital content within the organization which could have been enabled on its web-site. Databases, memos, product catalogs and other documents, which do not violate corporate confidentiality by being made available online, are typical resources which are held back by their isolation from other content. Information systems are too often designed with a single purpose in mind, and it proves difficult to integrate them as services into the web-site. The worst scenario is when the organization has grown dependent on some specific proprietary software or platform which has restrictions on how the content can be accessed.

Design is not consistent

A company will normally have one graphic profile, or a different profile for each division of the company. The profile includes names, slogans, logos, a color-scheme, text styles, document headings, footers and layout. Periodically, the profile of a company will be changed, and typically all content produced up until then will be stuck with the old graphical profile. It is expensive to have a clerk go through each HTML-document and change each document manually. As the profile perpetually changes, the company web-site will grow into a confusing mongrel of pages using the various looks designed throughout the lifetime of the site. As a result, the visitor of the web-site gains little sense of the company's identity, and is left with the impression that the company is badly organized.
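The usual remedy is to keep the graphic profile in one place and apply it to all content when pages are generated. Here is a minimal sketch using Python's string.Template; the template, the profile fields and the sample articles are all hypothetical.

```python
from string import Template

# Hypothetical page template: every profile-dependent detail is a placeholder.
PAGE = Template(
    "<html><head><title>$title | $company</title></head>"
    '<body style="color: $color"><h1>$title</h1>'
    '$body<p class="footer">$slogan</p></body></html>'
)


def render_site(articles, profile):
    """Render every article with the same graphic profile."""
    return {
        slug: PAGE.substitute(profile, title=article["title"], body=article["body"])
        for slug, article in articles.items()
    }


articles = {
    "about": {"title": "About us", "body": "<p>We make widgets.</p>"},
    "news": {"title": "News", "body": "<p>Nothing new.</p>"},
}
old_profile = {"company": "Acme", "color": "navy", "slogan": "Widgets since 1996"}
new_profile = {"company": "Acme", "color": "darkgreen", "slogan": "Greener widgets"}

old_site = render_site(articles, old_profile)
new_site = render_site(articles, new_profile)
print(new_site["about"])
```

Changing the profile means editing one dictionary and regenerating the pages, so no page is left behind with the old look.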


Belam, M. 2006, "Fine Tuning Your Enterprise Search - How To Get The Best Results To Your Users" [] Retrieved 9. April, 2006

Goodwin, S., Vidgen, R. 2002, "Content, content, everywhere... time to stop and think? The process of web content management", April 2002, p. 66-70

Martins, J. 2004, "The Structured-Unstructured Information Continuum" [] Retrieved 10. April, 2006

McGovern, G. 2006a, "Web navigation is about moving forward" [] Retrieved 9. April, 2006

McGovern, G. 2006b, "Your website is for your most important customers" [] Retrieved 9. April, 2006

Staelin, C., Elad, M., Greig, D., Shmueli, O., Vans, M. 2004, "Biblio: Automatic meta-data extraction" [] Retrieved 25. August, 2005

Tuesday, June 06, 2006

The WCMS Alternatives

An interesting way to portray what a WCMS is, is by saying what it is not. I tried to do this in my thesis, but ended up with too many WCMS-sisters that either
  • Can be part of the WCMS
  • Can have a WCMS bundled inside
  • Are part of the WCM process, or
  • Depending on definition, are a WCMS
So I couldn't really say "This is what WCMS is not:", but ended up with the WCMS alternatives (not really a good title, methinks, but still I think the section helps to explain the WCMS):

To further explain web content management, one can consider what other web content tools and management systems are used today, and what separates these from full WCM systems [Byrne, 2001], [Junco, 2004].

The definitions in use are not clear, and some vendors advertise functionality which goes beyond their product. To avoid confusion, these are some of the product families which are most often confused with the WCMS.

File system

There are various servers or directory services that can be set up to store digital documents and expose them to the Web with the use of a web-server. Even though many of them store content and perform similar tasks to the WCMS, these systems are not complete content management systems. However, file systems form an architectural basis for physical storage in several WCMS implementations.


Weblog

Perhaps the fastest growing channel for content creation is the weblog, more commonly referred to as 'blog'. Weblog systems make it possible for authors lacking technical skills to publish online content. Recent years have seen an explosion of 'bloggers' appearing [Blood, 2000], and some believe that this form of publishing will continue to grow at such a rate that it will eventually replace communication lines like e-mail and online forums. In spite of its success, the weblog is still far too simple a tool to be considered anything more than a possible part of a WCMS.


Wiki

Not nearly as widely known as the weblog, the wiki stems from similar communities of developers using the Web for asynchronous communication and collaboration [Cunningham, 2001]. The wiki is a decade-old tool allowing developers to create documentation in web-page format, making the documentation easily accessible for viewing and editing. The most famous wiki today is without doubt Wikipedia [Wikipedia, 2006]. Like the weblog, the wiki is too simple a tool to be considered a WCMS. Some have explored the so-called xanalogical potential of wikis [Di Iorio, 2005], so this may very well change in the future.

Web editing tools

Most web-sites are made manually with the use of HTML-editors. While HTML documents can be made with simple text-editors, many users turn to larger web design tools like Macromedia Dreamweaver, Microsoft FrontPage and Adobe GoLive. These products usually feature WYSIWYG-editing, web-page previews and even synchronization processes for updating web-pages. Strictly speaking, these tools are mere design-tools. They can be used for creating content, but their main purpose is to control the look and feel of the web-design. This does not constitute content management.

Enterprise Content Management

Systems performing enterprise content management (ECM) are typically large-scale systems meant for corporations with a content throughput of a higher magnitude. Some of these systems incorporate their own WCM systems, while other vendors have separated their WCM product from their ECM system [Pelz-Sharp, 2006].

In the content management industry, the use of this term is largely undetermined. ECM is also used for products that do simple content management.

Some WCMS vendors claim their services feature ECM. On the other side of the scale, many lightweight web applications claim to deliver content management when they are actually providing what is perceived by most as web content management, or perhaps merely weblog or wiki functionality. Regardless, in the terms of this thesis, ECM remains something larger than the WCMS, a system able to process the entire digital content flow of an organization.

Digital Asset Management

These systems are developed to handle advanced kinds of media information, like video and images. The market for this kind of software is expected to grow during the next years, as wider bandwidth makes a larger number of Internet subscribers capable of streaming multimedia. Many WCMS support media types to some extent, especially digital images, but proper digital asset management systems are stand-alone systems [Porter, 2003].

Records Management

Records management (RM) is also referred to as data warehousing. Large quantities of situational and transactional information require special software developed to store information snippets where the number of articles is counted by the million. Some ECM vendors include RM systems in their enterprise solutions, but a WCMS alone is not necessarily linked with an RM solution.

Document Management System

Systems that allow version-management, workflow control, collaboration on documents, digital libraries and information repositories lie at the core of several content management systems. Some will regard document management systems as software managing scanned digital copies of paper documents. Traditionally these systems were either built in-house or proprietary, but recently some open source alternatives have started to appear [Gottlieb, 2006]. Like RM solutions, these are not essential for web content management.

Knowledge Management Systems

Foremost, the principles behind knowledge management (KM) take on a more human approach than traditional software engineering [Davenport, 1998]. Even though a knowledge management process will at some point include digital content management, the process as a whole has a nobler end. While the goal of a WCMS is to make content delivery smarter, the knowledge management goal is to make people smarter. Most would agree that a KMS is a suite of processes and tools that includes a variety of computer systems like groupware and generally every kind of management and communication system, including the WCMS.

Web Portal

This is perhaps the most difficult category to separate from the WCMS. The term portal is subject to many interpretations. Some consider it to be a personalized start-point on the Web, displaying bookmarks, news and other select content. The Java Community Process' Portlet definition describes a portal (or the compilation of Portlets) as a tool for integrating different content sources into one single page [JCP, 2003].

Regardless of its content, a portal is most easily recognized by its panel-like display, including several windows of various content types. On the one hand, it is possible to say that a portal is part of the WCMS, since it can be used for handling online content. On the other hand, one can say that the WCMS is one of the many windows in a portal, the WCMS being simply one of the many data sources integrated in the portal.

CMSWatch defines the difference between a WCMS and a portal as the latter being intended for content delivery, while the former is mainly used for content creation. Still it admits that the tasks of the systems overlap, and that open source WCM systems bear portal similarities [Boye, 2006].

The content landscape

The landscape of alternatives is summarized in the figure above. Note that this is just one simple way to consider the range of content management software in the market today. The horizontal axis represents the goal, ranging from delivery to the Web to storage. The vertical axis indicates the size or complexity of the system. This is not an accurate overview, and many variations of these systems could have been placed differently.


Blood, R. 2000, "weblogs: a history and perspective" [] Retrieved 30. April, 2006

Boye, J. 2006, "Portals and CMS: What's the difference?" ['s-the-difference] Retrieved 3. April, 2006

Byrne, T. 2001, "CM vs DM vs KM vs DAM vs SCM vs DRM -- Which One is Right for You?" [] Retrieved 3. April, 2006

Cunningham, W., Leuf, B. 2001, The Wiki Way: Collaboration and Sharing on the Internet, Addison-Wesley

Davenport, T. H., Prusak, L. 1998, Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press

Di Iorio, A., Vitali, F. 2005, "Web authoring: a closed case?", conference proceedings from HICSS-38, IEEE International

Gottlieb, S., Wohlrapp, S. 2006, "Unleashing the Power of Open Source in Document Management" [] Retrieved 10. April, 2006

JCP - Java Community Process 2003, "JSR 168: Portlet Specification" [] Retrieved 27. April, 2006

Junco, N. L., Bailie, R. A. 2004, "A Case Study of Content Management", conference proceedings from IPCC 2004, IEEE

Pelz-Sharp, A. 2006, "ECM + WCM = ?" [] Retrieved 2. March, 2006

Porter, R. 2003, "What is Digital Asset Management?" [] Retrieved 22. April, 2006

Wikipedia 2006, "About Wikipedia" [] Retrieved 3. April, 2006

PS: I take some twisted academic pride in the fact that my only reference to Wikipedia (above) was on the topic of Wikipedia itself :)