Sunday, January 29, 2006

Open source CMS evaluations

I have now seen three more or less serious open source CMS reviews.

The first one to hit the field was Matt Raible (parts 1, 2, 3 and 4), ending up with Drupal, Joomla, Magnolia, OpenCms and MeshCMS as runners-up.

Then there is OpenAdvantage, which tries out a handful (Drupal, Exponent CMS, Lenya, Mambo and Silva), including Plone, which they use for their own site (funny/annoying that the entire site has no RSS feeds, nor is it possible to comment on the articles). Following Matt's approach, they exclude many CMSes that do not seem to fit the criteria. It is somewhat strange that OpenAdvantage cuts away Magnolia because it "Requires J2EE server; difficult to install and configure; more of a framework than CMS", and then proceeds to include Apache Lenya in the full evaluation. Magnolia does not require a J2EE server. It runs on Tomcat, just like Lenya does (maybe it's an idea to bundle Magnolia with Jetty to make it seem more lightweight). Still, I'm fairly sure that OpenAdvantage would 'fail' Magnolia for being too complicated, as Matt did.

Every website has different needs, and these two evaluators value ease of setup, use and design modification (not functional modification). A more enterprise-ish review has been done by Optaros, evaluating for different website needs (brochure, periodical, collaboration, wiki and community). An elegant observation:

Open source content management software is most frequently used in small to medium sized web sites with very common requirements (such as corporate identity websites and departmental intranet sites or online magazines rather than large product websites with hundreds of thousands of pages) and as a foundation for building unique, highly-customized solutions (such as Amazon.com which uses open source components such as Perl, MySQL, and the Mason templating engine).

The paper provides an in-depth evaluation of three or four CMSes in each of the five categories. I am left with the feeling that the landscape of Java CMSes is very far behind the others, but I would still prefer to work with a Java-based CMS, as Java is my language of choice and because I've pretty much fallen in love with the Java Content Repository. Nonetheless, the paper is an excellent starting point for a small or medium-sized business considering an open source CMS. I wonder if there is a common content repository interface that is not Java-dependent. Jackrabbit has implemented (or is on its way to implementing) a PHP interface for its repository, but there are still no .NET, Perl, Ruby or Python interfaces. In the meantime, the closest you get to platform-independent content is a database (which is not so good).


Dynamifying my bookmarks

I guess this is as close as I'll get to talking about Web 2.0 (yuck) in this blog. I'm in the process of moving the links in this blog out of my static template and into del.icio.us (tip o' the hat to Petter!), like the blogroll from Bloglines, except this is a linkroll, I guess. Here are the steps I took to get it in:

  1. Bookmarked and tagged all the links on my del.icio.us (tagged with at least 'cms')
  2. Did not bookmark links that are already in my blogroll
  3. Tried to find out what to do with the template. This turned out to be more difficult than expected, as all the how-tos on the net are concerned with integrating tagging between del.icio.us and Blogger, which I don't care about (yet). Tagging (i.e. metadata) is seriously overrated anyway.
  4. Patiently wait for Petter to comment here and say what he did because I can't bother finding it out on my own right now :)
  5. Thanks, Petter! Went to http://del.icio.us/help/linkrolls and fiddled with it a bit
  6. Pasted it into a comfortable area in my template (same place as yon ole links)
  7. Fiddled with the CSS a bit (thanks, Lars!). Ended up with these styles:
.delicious-posts { margin: 1em; border: 2px solid #999; padding: 0.5em; width: 14em; font-family: sans-serif; }
.delicious-posts ul, .delicious-posts li, .delicious-banner { margin: 0; padding: 0; }
.delicious-post { padding: 0.25em; font-size: 80% }
.delicious-odd { }
.delicious-banner a { font-size: 80% }
.delicious-post a { }
.delicious-posts a { text-decoration: none; color: #f93; display: block; padding: 0.3em }
.delicious-posts a:hover { color: #fc9; }
.delicious-posts .delicious-odd a { color: #888899; }
.delicious-posts .delicious-odd a:hover { color: #cdcdcf; }

Definition review: Gilbane's CM definition

I had a quick read through Gilbane's CM definition to see if it would spawn any ideas or reactions.

Web publishing meets e-Business. It seems like they want to shrink the definition to mean web content management, excluding things like digital document management. I don't mind this, but personally I prefer to use CM as an umbrella for most of the other management types, including web. Towards the end of the document, different analysts' views on this are presented, and indeed some of them share my view. Gartner even claims (mind you, this was in 2000) that no full CMS exists yet (still a valid claim?). Coincidentally, CAP even uses the term "umbrella".
Oh, and there's a very nice comment on knowledge management (this one's for you, Thommy ;)):

Fortunately, the assault on logic and language that was knowledge management has run out of steam. We'll still see the term used by consultants and some technology vendors (including Microsoft and Lotus), but we won't have to listen to specious marketing pitches claiming that managing knowledge will replace managing data, documents, content, etc.

Just to clarify, Thommy (a good friend and KM specialist) and I both agree that KM can envelop CM, and CM can envelop KM tools. We shouldn't try to deny the usefulness of KM; I think KM's golden age has yet to arrive (knowledge is important, and corporate knowledge is not the same as corporate content). As technology improves, KM tools will become more feasible. In my experience, a KM evangelist doesn't try to replace CM with his or her own tools, but includes CM in the suite of KM tools. KM and CM do not compete. I guess this is where the commercialism of Gilbane shines through.

But back to the subject: mixing code and content, no surprises there. There's an interesting idea of perhaps separating transactional information out of the content term, but the theory fails. Any CMS except "brochureware" systems (nice word!) produces or supports transactions; this very blog, for example, has transactional information in the shape of the RSS feed (further down on your right).

All in all, it is an old but still very valid definition. It is a bit business-oriented, but otherwise it didn't contribute much to my understanding of the term.

Note to all content- and knowledge managers, rest assured: You don't have to know what it is to figure out a good way to do it!


Tuesday, January 24, 2006

Another meeting with the coach

Met A again for the first time since the middle of December. We spent most of the time discussing the content of the thesis as it is now (I haven't uploaded it anywhere yet). I have to clear up my Research Question and send it to her by next Monday.

Here are some personal notes on what must be done with each chapter:

Define the Research Question. What are you investigating? More academic references (find on portal.acm.com, IEEE)! Use the articles reviewed already.

Specialize in an aspect. People, human, social, business, technical, community. Choose one. Red thread.

Chapter 1
Present the problem early on. Small outline.

Move how-to to the end of the chapter. Make it correspond with the current TOC.

Introduce more of the general context of information systems. Take a top-down approach to WCMS. Context of study. Research Question (everything depends on this). The rise of the Internet (use inf5210 sources here). Describe more business, KM and CM. Information infrastructure and software. Link between CM and KM. Use eLearning for this purpose (eLearning produces the need for WCMS). Knowledge portals.

Too business-like on page 5. Do not use "you" (oops), "my" or "I". If you are going to use the business perspective, you must explain more about it.

Chapter 2
Again, start with a top-down approach.

When introducing invented frameworks and architecture, say what you have invented. Try to find existing sources first. Draw an architecture model. Make models!

Levels. Use XHTML (surf the hype). Explain what the MVC is. Reference something.

Discuss integration problems.

Don't use 'quicker'. Be academic, follow the style. You are not a system developer. You are a researcher. Link this better with Open Source.

Try not to convince the reader too much. Be objective. Present the hybrid open-proprietary software properly. A pyramid which many are trying to use. Diagram.

Fix the "cannot" thingies which OpenOffice messed up. Done.

Move the cases in 2.3 to the introduction.

What is workflow? Define. Do not repeat the requirements when comparing the solutions. Make a diagram of requirements. Define them once and re-use.

Chapter 3
What is the point of presenting the Case on page 15? Purpose? It is not well linked to anything. Take it away?

Magnolia's fulfilment of the search requirement. The reader is not impressed by the fact that it is made with the help of open source tools. Explain the improved functionality.

Chapter 4
The two last cases. Too personal. Use them as research input, reduce them to a few lines, advantages and disadvantages. Explain the need for standardization better.

Goes a bit too technical in the end.

The levels of content management

Web content management is a challenge that every company with a website consciously deals with. We can divide the physical management of content into four levels.

Level 1 content management: static files
The most basic strategy is to compose static HTML files and transfer these to a web server capable of serving such files to clients connecting to the website. Styles can be applied to the pages, for example with the help of Cascading Style Sheets (CSS).

Level 2 content management: templates
The next level of content management comes when you want to reuse the design of your website by dynamically including content into a frame of finished design, or a template. The content is typically contained in a text file that the dynamic page engine can read. Examples of technologies capable of this are Server-Side Includes (SSI), CGI, PHP, ASP and JSP. HTML also has a feature called “frames”, although professional web designers and developers frown upon the use of this deprecated feature.
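As a minimal sketch of level 2, assuming Python's standard `string.Template` as a stand-in for the page engine (SSI, PHP, ASP or JSP would play this role on a real site), the following wraps a piece of content in one shared design frame. The frame markup, the `MENU` links and `render_page` are hypothetical names made up for illustration.

```python
from string import Template

# Hypothetical design frame: the layout that every page on the site reuses.
PAGE_FRAME = Template(
    "<html><head><title>$title</title></head>"
    "<body><div id=\"menu\">$menu</div>"
    "<div id=\"content\">$content</div></body></html>"
)

# Hypothetical navigation shared by all pages.
MENU = '<a href="index.html">Home</a> | <a href="about.html">About</a>'


def render_page(title: str, content: str) -> str:
    """Wrap one piece of content in the shared design frame (level 2)."""
    return PAGE_FRAME.substitute(title=title, menu=MENU, content=content)


if __name__ == "__main__":
    # On a real site the content would be read from a text file on disk.
    print(render_page("About us", "<p>We sell widgets.</p>"))
```

Changing `PAGE_FRAME` once changes the design of every page, which is the point of this level; each page still needs its own call (or file) naming its content.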

Level 3 content management: dynamic content
The next level of complexity arrives when we want to add even more reuse to the templates, having a template dynamically select its content source based on a dynamic parameter. This is not possible with SSI, as you have to provide each separate content page with its own physical HTML file [confirm this]. This means two files for every page on your website, one with content and another with design. Many find this too cumbersome and end up merging the two into one file, thereby mixing content and design. If a dynamic parameter is possible, the template can select and read the content file conditionally, removing the need for a separate HTML file per page.
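A sketch of level 3, assuming Python's `string.Template` as a stand-in for the page engine: one template picks its content file from a request parameter such as `?page=about`. The `content/` folder, the `page` parameter and the function names are hypothetical, and the `isalnum()` check is my own addition so that the parameter cannot reach files outside the content folder.

```python
from pathlib import Path
from string import Template
from urllib.parse import parse_qs

# Hypothetical folder holding one plain-text content file per page.
CONTENT_DIR = Path("content")

# One template for all pages; only the selected content changes.
FRAME = Template("<html><body><h1>$title</h1><p>$content</p></body></html>")


def select_content(query_string: str) -> Path:
    """Choose the content file from a parameter such as '?page=about'."""
    params = parse_qs(query_string.lstrip("?"))
    page = params.get("page", ["index"])[0]
    # Accept only simple names so the parameter cannot point outside CONTENT_DIR.
    if not page.isalnum():
        page = "index"
    return CONTENT_DIR / f"{page}.txt"


def render(query_string: str) -> str:
    """Fill the single template with whatever content file the parameter selects."""
    path = select_content(query_string)
    text = path.read_text(encoding="utf-8") if path.exists() else "Page not found."
    return FRAME.substitute(title=path.stem, content=text)
```

With this, `about.txt` needs no HTML file of its own; the design lives only in `FRAME`.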

Level 4 content management: content repository
The next step is to replace the content files with something more scalable. Why would one want to do this? Native files have many disadvantages: they are not versionable, backup routines require mirrored copies, search is not easy, binary files (like pictures and video) cannot be wrapped with metadata, there is no fitting access control, and the possibilities for collaboration are limited. Instead, the content is put inside some kind of repository, most likely a database. Management of the content is thereafter handled by middleware, which compensates for the aforementioned shortcomings of filesystem objects.
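To make level 4 concrete, here is a minimal sketch that assumes SQLite (through Python's built-in `sqlite3` module) as the repository. The table layout and the `save`/`load`/`search` helpers are hypothetical; the point is that versioning, a bit of metadata (the author) and search become simple queries instead of file juggling. Access control and binary content are left out.

```python
import sqlite3


def open_repository() -> sqlite3.Connection:
    """Create an in-memory repository where every save adds a new version."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE content (
               path    TEXT    NOT NULL,
               version INTEGER NOT NULL,
               body    TEXT    NOT NULL,
               author  TEXT    NOT NULL,  -- metadata stored next to the content
               PRIMARY KEY (path, version))"""
    )
    return db


def save(db, path, body, author) -> int:
    """Store a new version instead of overwriting the old one."""
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM content WHERE path = ?", (path,)
    ).fetchone()
    db.execute("INSERT INTO content VALUES (?, ?, ?, ?)", (path, latest + 1, body, author))
    db.commit()
    return latest + 1


def load(db, path, version=None):
    """Return the latest version of an item, or a specific older version."""
    if version is None:
        row = db.execute(
            "SELECT body FROM content WHERE path = ? ORDER BY version DESC LIMIT 1", (path,)
        ).fetchone()
    else:
        row = db.execute(
            "SELECT body FROM content WHERE path = ? AND version = ?", (path, version)
        ).fetchone()
    return row[0] if row else None


def search(db, word):
    """Naive search over the latest version of every content item."""
    rows = db.execute(
        """SELECT path FROM content c
           WHERE version = (SELECT MAX(version) FROM content WHERE path = c.path)
             AND body LIKE ?
           ORDER BY path""",
        (f"%{word}%",),
    ).fetchall()
    return [r[0] for r in rows]
```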

A system developer will now see that we have attained the three-part architecture of the Model-View-Controller (MVC) pattern. The model consists of the content in the database, the view is provided by our templates, and the controller is implemented in the middleware. MVC is a pattern that offers a healthy separation of concerns in the CMS.
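The MVC mapping can be sketched in a few classes. This is an illustration under simplifying assumptions, not the structure of any particular CMS: a dictionary stands in for the content database, Python's `string.Template` for the template engine, and the class names are made up.

```python
from string import Template


class ContentModel:
    """Model: the content itself (a dict standing in for the repository)."""

    def __init__(self):
        self._items = {"index": "Welcome!", "about": "We sell widgets."}

    def get(self, name):
        return self._items.get(name)


class PageView:
    """View: the template that decides how content is presented."""

    frame = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")

    def render(self, title, body):
        return self.frame.substitute(title=title, body=body)


class PageController:
    """Controller: the middleware that handles a request and ties model and view together."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle(self, name):
        body = self.model.get(name)
        if body is None:
            return self.view.render("Not found", "No such page.")
        return self.view.render(name.capitalize(), body)
```

Swapping the dictionary for a real repository, or the template for a new design, touches only one class, which is the separation of concerns the paragraph describes.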

It is possible to invent further levels of content management, but any complex WCMS will apply some variation of level 4.

Sunday, January 22, 2006

Why only a web content management system?

When selecting a system to control their website, CIOs are often tempted to invest in corporate-wide enterprise solutions. These solutions promise to solve many of the corporate IT problems with a single, centralized silver-bullet system. However, as James Robertson points out, the projects where these solutions are selected, implemented and deployed often fail miserably: they take too long, and if they ever achieve normal use, the world has changed and the system no longer satisfies the demands of Web x.0.

My reaction to reading this well-argued and well-written post is that CIOs should think very carefully before going into such immense projects. Until technology has matured further (and IT doesn't matter), perhaps it is better to leave your web content management to a system that is made for the job.

Again, the term Agile Management comes to mind.

Wednesday, January 18, 2006

Book review: Bob Boiko's CM-Bible

Note that I'm not done with this book yet; will update later.

This is more of a summary/thinkscript (hey, cool term! ©2006), using this book as a brainstormer for the thesis.

Introduction

States the obvious reasons why Content Management is needed (it underlies e-business): information frenzy, the information age, etc.

Part 1: What is Content?

Seems to be a nice place to start. Get the definitions sorted out in an introductory way.

Chapter 1: Defining Data, Information and Content

I was previously used to the data/information/knowledge definition, but perhaps the Content Management camp shares the Knowledge Management camp's love of coining new definitions.

The core of this chapter is to define the words Information, and mostly Content, which will be used through the rest of the book.

Given that I know pretty well what data and information are, the only surprising thing here is how similar the definition of Content is to that of Knowledge. Content does seem to be somewhat closer to Information, though (we are not ready for Knowledge Management; we need to do Content Management first).

Content is Information put to use.

Content is Information plus data. By applying a small tag of metadata to information (give it a new status), it might become content.

Liz Orna: Information is knowledge put into a communicative format.

Content is information that you tag with data so that a computer can organize and systematize its collection, management and publishing.

Chapter 2: Content has Format

Binary and nonbinary (ASCII, XML) are storage formats.

Be consistent in formatting (use styles/schemes/standards).

Separate format from content so you can reuse.

Format can be categorized into formatting by effect, method or scope. Funny categorization.

Overall, a very narrow chapter about details in text composition that for the most part have been overcome.

Chapter 3: Content has Structure

Content can be divided into content types, which segment into content components, which in turn can be divided into elements. Elements can relate to other elements by way of outline, index, cross-reference and sequence.
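Boiko's structure can be expressed as a small data model. This is a sketch with class and field names of my own choosing, not Boiko's notation: a component has a content type and a list of named elements, cross-references are direct links between components, an index maps words to components, and an ordered list of components serves as outline or sequence.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Smallest named piece of content, e.g. a title or a body text."""
    name: str
    value: str


@dataclass
class Component:
    """A content component (e.g. one news item) of a given content type."""
    content_type: str
    elements: list[Element]
    # Cross-reference relationship: direct links to related components.
    cross_references: list[Component] = field(default_factory=list)

    def element(self, name: str) -> str | None:
        """Return the value of the named element, or None if it is missing."""
        return next((e.value for e in self.elements if e.name == name), None)


def build_index(components: list[Component], element_name: str = "title") -> dict[str, list[Component]]:
    """Index relationship: map each word of an element to the components containing it."""
    index: dict[str, list[Component]] = {}
    for component in components:
        for word in (component.element(element_name) or "").lower().split():
            index.setdefault(word, []).append(component)
    return index


# Hypothetical example; the ordered list doubles as the outline/sequence relationship.
news = Component("news", [Element("title", "CMS market is growing")])
review = Component("review", [Element("title", "Testing Writely")], cross_references=[news])
outline = [news, review]
```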

Structure is part of the metadata. It is hard to agree and settle on a structure that can be used for information, and even worse, the structure will change over time.

You can structure by purpose, type or scope.

Overall, a small chapter about a very important aspect.

Chapter 4: Functionality is Content, too!

Indeed, I couldn't agree more. Functionality is content, and the ability to extend and modify functionality should be part of CMS evaluation.

To be continued....

PS: What the ### is up with Writely's line-breaks? Can't I pleeeeeeaase get to edit the html directly?

Tuesday, January 17, 2006

Book review: Integrative Document and Content Management: Strategies for Exploiting Knowledge

Found an electronic version of this book. I just looked briefly through it, and it might prove to be a valuable source. My first impressions:

  • Large focus on web
  • The business context
  • Very nice and large part on engineering the requirements of the "IDCM"
  • Perhaps a bit too big-bang, integrating much more than just web (also e-mail, DM, the lot)
  • Chapter 18 focuses on the functional requirements of Web Content Management. A must read for me.
  • A huuuge chapter on assessment, choosing, contracting, implementing (vendor's) solution

I dare to copy the opening of the book's preface (for personal reference):

This book blends theory and practice to provide practical knowledge and guidelines to enterprises wishing to understand the importance of managing documents along with presenting document content to facilitate business planning and operations support. The book introduces strategies for Integrative Document and Content Management (IDCM).

Asprey, Len. Integrative Document and Content Management: Strategies for Exploiting Knowledge.
Hershey, PA, USA: Idea Group Inc., 2003. p ix.
http://site.ebrary.com/lib/hio/Doc?id=10022510&ppg=10

Copyright © 2003. Idea Group Inc. All rights reserved.

Monday, January 16, 2006

I need a conference! v1.3

Latest update: Here's another conference that was just announced.

Update: Found a nifty list of conferences on the contentwrangler's blog. Will have to go through them properly later.

Now A has a functional requirement for all her master's students: you have to publish something or speak at a conference. I have been looking at various conferences, but unfortunately many of them are either too commercial/product-oriented (like this one or this one), or not closely enough related to WCM (like this one and this one). Perhaps I'm being too picky about it, and I'm quickly running out of time.

Any conference I apply for will have to be:
  • Not too expensive (the institute will probably not cover more than about 800€ plus travelling expenses to a certain limit)
  • Related to content management or web
  • Theoretically/academically focused (i.e. students and scholars attend)
  • Taking place within the next 6 months (very hard to submit a paper and be accepted in such a short timeline)

So if you have any ideas, please let me know.

The Gilbane Conference

San Francisco, April 24 - 26, 2006 (next one in November, submit by May)

There is no specific deadline for submissions; I doubt I'll get to speak at this one. Too industrial and high-class.

LinuxTag
Wiesbaden, May 3-6, 2006
Also very industrial (but has a track on Information Web). Vendors presenting, buyers visiting. Deadline was a couple o' days ago.

The International Conference on e-Learning
Montreal, June 22-23, 2006
Very academic. Part of the ACI (I got an abstract accepted for an earlier KM conference, but did not submit the paper). Is about e-Learning, but CM is mentioned as a topic. Submission deadline was 12th of January.

The IASTED International Conference
Calgary, June 17-19, 2006
A wide range of topics. Academic. Ugly website, but this might be something. Deadline is 1st of March.
The fee is about 500€, I guess; strange payment model. The flight will probably be 8.000 NOK.

International Conference on Software Engineering Research and Practice

Monte Carlo Resort, Las Vegas, June 26-29, 2006
A suggested this, but content management is not really within the scope of this conference, I think. It seems to be running in parallel with:

International Conference on Information and Knowledge Engineering

Now this is more like it! Exactly within my scope. Submissions by 20th of February (5-8 pages).

There is no information about the fee. A flight will cost from 6.000 NOK (BA) to 10.000 NOK (KLM). No rooms are available in the conference hotel; we're looking at $30 a night and up.


Open Source World Conference

This one is in Malaga, Spain. But it's in February, and the CFP deadline was in November.


The European Conference on Knowledge Management

Budapest, September 4-5, 2006
Another ACI conference. A bit outside my scope. Submission deadline by 14th of March.
Travelling expenses will probably be less than 4.000 NOK, and accommodation is pretty cheap too. Conference registration is 200€ (for students).


Software Engineering and Advanced Applications

Cavtat/Dubrovnik (Croatia), August 28-September 1, 2006

A bit outside my scope.


Search engines for travel:

http://kilroytravels.no/

http://restplass.no/

http://wideroe.no/

http://www.expedia.com/

http://www.orbville.no/



Suggesting solutions

I need to come up with a couple of suggested solutions to the problem previously presented. The answer to the challenges of web content management is, of course, a web content management system. Some would prefer a portal, and opinions differ on whether a portal and a WCMS are two sides of the same coin. Personally, I prefer to think of a portal as a web content management dialect, or maybe the portal is merely the content delivery part of the WCMS. I think I will have to include a chapter on portals in the thesis.

I am presenting different solutions so that they can be compared next week. So let's jump into it and grab a couple of WCMSes I can compare:

  • Magnolia (open source, uses JSR-170 standard)
  • Primetime Portal (proprietary, uses no standards beneath the web-front-end)

In the first round, these two will have to suffice. My timeline is simply too short for a larger comparison. Anyway, both of them contain enough functionality to cover most aspects of content management, and I have a lot of experience with both.

I have already written a good piece on how Primetime is used. I am currently expanding the Magnolia chapter in the thesis. Next week I will have to read up on the AHP (Analytic Hierarchy Process) and put the two CMSes to the test.
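For reference while reading up, assuming AHP here means the Analytic Hierarchy Process: it derives weights from pairwise comparisons, where each entry says how strongly one option is preferred over another (1 = equal, up to 9 = extreme). This minimal sketch uses the row geometric mean approximation rather than Saaty's principal eigenvector method, and the judgments in the example are invented; they say nothing about Magnolia or Primetime.

```python
import math


def ahp_priorities(matrix):
    """Approximate the AHP priority vector with the row geometric mean method.

    matrix[i][j] states how much option i is preferred over option j,
    so matrix[j][i] should be 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def ahp_scores(criteria_weights, local_priorities):
    """Overall score per alternative: sum over criteria of weight * local priority.

    local_priorities[c][a] is the priority of alternative a under criterion c.
    """
    n_alternatives = len(local_priorities[0])
    return [
        sum(w * local[a] for w, local in zip(criteria_weights, local_priorities))
        for a in range(n_alternatives)
    ]


# Invented judgments: two criteria, two hypothetical CMSes A and B.
weights = ahp_priorities([[1, 3], [1 / 3, 1]])   # criterion 1 is three times as important
under_c1 = ahp_priorities([[1, 1 / 2], [2, 1]])  # B is preferred under criterion 1
under_c2 = ahp_priorities([[1, 4], [1 / 4, 1]])  # A is preferred under criterion 2
print(ahp_scores(weights, [under_c1, under_c2]))
```

The geometric mean keeps the sketch short; for a perfectly consistent matrix it gives the same result as the eigenvector method.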


Wednesday, January 11, 2006

Improving problem description

This week is set aside for improving the chapter on problem description. Every thesis has a "problemstilling" (Norwegian for problem statement), a problem which the thesis should solve: a goal, or a challenge. Currently the chapter looks a little something like this:

Challenges

What are the challenges that have pushed content management forward? What problems related to web content do IT departments suffer from today? Issues in web management.

The issues of web content management

Content is not navigable. There is too much of it: too many web pages with too many attached documents. A corporation will often put considerable resources into maintaining a site map and a navigation tree, but if these are made manually, it is a lot of work with no guarantee of correctness. Searching is a great shortcut to making all content available, but searching the right way is easier said than done. Does the search engine check whether the search word was misspelled? Are there any synonyms of the search word that should be checked?
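The two questions about misspellings and synonyms can be illustrated with a small query-expansion sketch. It assumes a hand-made vocabulary and synonym list (both hypothetical) and uses `difflib.get_close_matches` from Python's standard library as a stand-in for a real spelling corrector.

```python
import difflib

# Hypothetical vocabulary and synonyms; a real site would derive these from its content.
VOCABULARY = ["content", "management", "repository", "template", "portal", "workflow"]
SYNONYMS = {"repository": ["database", "store"], "portal": ["intranet"]}


def expand_query(word):
    """Correct a likely misspelling, then add synonyms of the corrected word."""
    word = word.lower()
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    corrected = matches[0] if matches else word
    return [corrected] + SYNONYMS.get(corrected, [])
```

A search for "repositry" would then also look for "repository", "database" and "store"; the cutoff decides how forgiving the correction is.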


Content is useless. Web pages are full of dead links. Many pages and documents are not linked to at all and will therefore never be accessed. It is safe to say that content which is not accessed and used has no value.
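Both symptoms can be detected mechanically once the site's links are known. A sketch, assuming the link structure has already been collected into a dictionary from each page path to the paths it links to, and that `/` is the start page; the example paths are hypothetical.

```python
from __future__ import annotations


def check_links(pages: dict[str, list[str]], start: str = "/") -> tuple[list[tuple[str, str]], list[str]]:
    """Return (dead links, orphan pages) for a site given as page -> outgoing links."""
    # A dead link points to a page that does not exist on the site.
    dead = sorted({(src, dst) for src, links in pages.items() for dst in links if dst not in pages})
    # An orphan is a page that no other page links to (the start page is exempt).
    linked = {dst for src, links in pages.items() for dst in links if dst != src}
    orphans = sorted(page for page in pages if page not in linked and page != start)
    return dead, orphans


site = {
    "/": ["/about", "/news"],
    "/about": ["/", "/old-offer"],  # "/old-offer" has been deleted
    "/news": [],
    "/forgotten": ["/"],            # nothing links here
}
dead, orphans = check_links(site)
print(dead, orphans)
```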


Content is not automatically accessible: there is no XML export. Recently, many news sites have offered the option of subscribing via RSS feeds. By subscribing to these feeds in RSS readers or news aggregators, the process of collecting news from these sites is turned from a pull protocol (actively surfing around on news sites) into a push protocol (content is pushed to the reader, like mail to a recipient).
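As an illustration of the missing export, here is a minimal sketch that builds an RSS 2.0 document with Python's standard `xml.etree.ElementTree`. The channel and item values are hypothetical, and only the required channel elements (`title`, `link`, `description`) plus a few item elements are produced.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, items):
    """Build an RSS 2.0 feed; items are (title, link, published datetime) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Latest content from {title}"
    for item_title, item_link, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example site",
    "http://example.com/",
    [("CMS market is growing", "http://example.com/cms", datetime(2006, 1, 2, tzinfo=timezone.utc))],
)
print(feed)
```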


Content has no meta information. There has been a noteworthy increase in the ability to tag or label various data objects with metadata, like in the header of an HTML page or in the properties of a Word document. It is, however, difficult to make users actually use these features manually. If the title of this document is "Content Management", why should I write in its metadata that it is about the same topic? A possible solution to the metadata problem lies in automatically tagging content [HP, 2004].
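A deliberately naive illustration of automatic tagging: suggest keywords by counting the most frequent words that are not on a stopword list. It is not meant to reflect the approach in [HP, 2004]; the stopword list and the `suggest_tags` function are assumptions made for the example, and real systems use far better text analysis.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with", "that", "this", "be", "are", "as"}


def suggest_tags(text, n=3):
    """Suggest metadata keywords: the n most frequent non-stopwords longer than two letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]
```

Even this crude approach fills the metadata field without asking the author, which is the part users tend to skip.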


Content is technically inaccessible. Dependency on specific software or platforms restricts the number of users.


-------

So I need to come up with something more to complete this chapter. A good CMS doesn't produce the problems mentioned above, and I'm sure CMSes like this already exist. The goal of the thesis could indeed be to present a CMS that solves these problems by the use of open standards. To get the open source bit in, I should add something about functionality and customization (functionality is content too, as Boiko said). An old, rusty CMS, or even a modern but untidy one, can be quite hard to extend, having components that are not reusable. Content is not reusable.

Interestingly, I'm not the only one who has been asking questions about metadata. Seth Cambridge is another blogger I just added to my Bloglines. Still, it remains a problem that so much of the CMS theory landscape consists of opinions in blogs and online articles, media not really appreciated by the people who will judge my thesis. I might have to get back to basics and read up on some ancient IT theory I can reuse in this context (but I haven't really got time to do that).

Wednesday, January 04, 2006

Master plan for spring 2006

Just have to point out that spring 2006 is in fact the last semester for my thesis. The feeling is that I'm lagging behind. This is quite common for people in their last semester, but I think it's relatively serious on my part. At least this is not because of a lack of interest in the thesis, but rather a lack of time (had to work) and efficiency.
Nonetheless, here is the workplan for my last semester:


Week 1 (2/1 - 8/1)
Workplan, modification of Magnolia
Week 2 (9/1 - 15/1)
Improve problem description
Week 3 (16/1 - 22/1)
Suggest solutions
Week 4 (23/1 - 29/1)
Describe comparison framework
Week 5 (30/1 - 5/2)
Describe implementations
Week 6 (6/2 - 12/2)
Compare solutions
Week 7 (13/2 - 19/2)
Discuss discoveries
Week 8 (20/2 - 26/2)
Describe standardizations possible tier by tier
Week 9 (27/2 - 5/3)
Collect feedback from users
Week 10 (6/3 - 12/3)
Conclude, further research
Week 11 (13/3 - 19/3)
Propose thesis draft
Week 12 (20/3 - 26/3)
Draft review
Week 13 (27/3 - 2/4)
Draft correction
Week 14 (3/4 - 9/4)
Propose new draft
Week 15 (10/4 - 16/4)
Corrections
Week 16 (17/4 - 23/4)
Final thesis
Week 17 (24/4 - 30/4)
Slack
Week 18 (and 19)
Prepare presentation

Testing Writely

Update: Fixed Writeley typo :)

So there is this nice new online editor for documents at http://www.writely.com. Even though it's still pretty beta, it's one of the nicest online WYSIWYG editors I've seen. I quickly noticed the Google/Blogger feel of the Writely website, and I guess there is some tight integration between them. I'm now in the process of testing Writely's ability to publish directly to my blog. Yes, this post is written as a document on Writely :)

Revision: Did I mention that the revision feature is dead cool?

Reading up on Writely, it seems like it's a small 3-4 person startup venture. I think we will see great things and interest in these parts until they're bought by some larger company (I'm guessing Google, just like Blogger was some years (?) ago).

Coming up in the near future: workplan for spring 2006, as well as first draft of thesis (old essay squeezed into new outline)!

Monday, January 02, 2006

CMS market is growing

A Danish member of the CM-Forum scene has pointed out how the market for CMS is growing and moving into the SMB niche. There is also mention of coming standards for CM.

Since Bloglines is down at the moment, I'll just note down this interesting Norwegian blog here: http://guiontoblog.blogspot.com/