Evaluation of the Need for Better Zope-RDBMS Integration
Shane Hathaway
June 2002
Introduction
This research project seeks to decide whether Zope needs better integration with RDBMSs than it currently has, and if it does need better integration, this project also evaluates whether a new solution under construction is the right answer. First, some background on Zope and RDBMSs.
Zope is an Internet application server designed to simplify the process of putting dynamic content on the web. Zope allows people to create web sites using a common web browser. Zope Corporation, a company in Fredericksburg, Virginia, produces Zope and makes it available as free, open source software. This means that anyone can download Zope, use and improve upon it, and even sell it.
Zope is based on a technology called ZODB, or "Z Object Database". ZODB is a type of database that breaks from tradition. The databases most people are familiar with are relational databases, or RDBMSs. Relational databases are fundamentally different from object-oriented databases, or OODBMSs, like ZODB.
RDBMSs versus OODBMSs
The difference between relational databases and object-oriented databases lies in their flexibility. To store data in an RDBMS, you must first define the complete structure of your data. For example, if you wanted to store phone numbers, you would create a table, then in that table you would set up a few columns including "name" and "phone_number". You would then write a program that can interact with those specific columns. If you later decide you also want to store people's email addresses, you have to add another column and change your program as well.
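The phone number example can be sketched with Python's standard `sqlite3` module. This is only an illustration: the table name `phone_book` and the sample rows are assumptions, and SQLite stands in for whichever RDBMS is actually used.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Step 1: the structure must be declared before any data is stored.
conn.execute("CREATE TABLE phone_book (name TEXT, phone_number TEXT)")
conn.execute(
    "INSERT INTO phone_book (name, phone_number) VALUES (?, ?)",
    ("Alice", "555-0100"),
)

# Step 2: a new requirement (email addresses) means changing the schema...
conn.execute("ALTER TABLE phone_book ADD COLUMN email TEXT")

# ...and changing the statements in the program that touch the table.
conn.execute(
    "INSERT INTO phone_book (name, phone_number, email) VALUES (?, ?, ?)",
    ("Bob", "555-0101", "bob@example.com"),
)

rows = conn.execute(
    "SELECT name, phone_number, email FROM phone_book ORDER BY name"
).fetchall()
print(rows)
conn.close()
```

The row stored before the schema change simply has no email value, while both the table definition and the program had to be edited to accept the new field.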
To store data in an OODBMS, you don't have to define the structure ahead of time. You only have to write your program, then connect your program to the OODBMS with a few instructions, and you're finished. The OODBMS takes advantage of the structures you naturally use when creating your program, and it simply stores the structures. It is faster and easier to write a program for an OODBMS than for an RDBMS.
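As a rough illustration of storing program structures directly, the sketch below uses Python's standard `pickle` module as a stand-in for an object database. It is not ZODB and offers none of its transaction or persistence machinery; the `Person` class and its values are hypothetical.

```python
import pickle


class Person:
    """An ordinary class; nothing here describes a storage schema."""

    def __init__(self, name, phone_number):
        self.name = name
        self.phone_number = phone_number


alice = Person("Alice", "555-0100")

# A new attribute needs no schema change: it is simply set on the object.
alice.email = "alice@example.com"

# Serialize the object as it is, then restore it from the stored bytes.
stored = pickle.dumps(alice)
restored = pickle.loads(stored)
print(restored.name, restored.phone_number, restored.email)
```

The email attribute travels with the object even though no table or column was ever declared for it.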
But RDBMSs are very popular. The major vendors like Oracle, Sybase, IBM, Borland, and others, all sell RDBMS software. Computer science courses in practically every university teach development and administration of RDBMS-based software. RDBMSs also have certain theoretical advantages, such as the ability to search for data based on previously unanticipated criteria, and years of competition in the RDBMS market have led to refinements in reliability and scalability.
Z Object Database
One of Zope's great strengths is that it is based on ZODB. Because of ZODB, software development using Zope is fast and easy. Most of the time, when you write a program that inputs something from the user, you have to write special routines to store that data. But not so with Zope--you can just pretend that your program never stops, never crashes, and never has to write anything to disk. ZODB and Zope take care of the rest of the details.
Because of these advantages, Zope will never give up ZODB, but people still want to be able to use RDBMSs with Zope for several reasons. As discussed already, many people are familiar with relational databases. Also, ZODB is only accessible through the Python programming language, while relational databases are more language-neutral. Relational databases can often adapt to unexpected requirements more easily. And because they have been around longer, relational databases can often hold more data, read and write data faster, and maintain full-time operation better than ZODB.
Building the Bridge
For a long time people have been requesting better relational integration in Zope. There are several mailing lists for people interested in Zope, and anyone is allowed to post to these mailing lists. A question frequently asked on these lists is, "How do I connect Zope to my database?" Several people answer, but often the answers are not quite satisfactory.
Zope has limited relational integration--you can open connections to an RDBMS and store and retrieve data, and you can even store and retrieve objects, but objects from the RDBMS never reach "first-class citizenship" in Zope. Zope does not allow you to manipulate these objects as easily as you can work with objects stored in ZODB.
Zope can also use a relational database as the storage behind ZODB. This solution satisfies those who need to store large amounts of data, but it stores the data in a special Python-only format, which eliminates the possibility of adapting to new requirements later on and locks out other programming languages.
So does Zope need better integration with relational databases? Are the current solutions adequate, or do we need a solution that can give first-class citizenship to objects in a relational database?
To answer this question, I analyzed the Zope mailing lists, where people interested in Zope converse on the Internet. I am working on a product that provides a new kind of integration between Zope and relational databases, and measuring the potential support for the new product will help me choose the right audience.
Methodology
Over the past three years, people have posted about 100,000 messages to the Zope mailing lists. The messages deal with many different aspects of Zope. Many of the messages express an opinion about Zope's relational integration, sometimes explicitly and sometimes implicitly.
To narrow the sample, I took only the messages that used the word "relation" or "RDBMS" in the subject line. This yielded 196 messages posted between the time when Zope was first released in 1998 and May of 2002. Then I developed a questionnaire and a series of hypotheses that could be proved or disproved based on the answers to the questions. I filled out a questionnaire for each message.
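The subject line filter could look something like the sketch below. The subject lines are invented for illustration, and the matching rule (a case-insensitive substring match on either keyword) is an assumption; the text only says which two words were searched for.

```python
import re

# Hypothetical subject lines standing in for the mailing list archive.
subjects = [
    "[Zope] Relational database adapters",
    "[Zope] RDBMS vs ZODB for large sites",
    "[Zope] DTML question",
    "[Zope-dev] Object-relational mapping",
    "[Zope] Installing on Solaris",
]

# Assumption: match either keyword anywhere in the subject, ignoring case.
KEYWORDS = re.compile(r"relation|rdbms", re.IGNORECASE)

selected = [s for s in subjects if KEYWORDS.search(s)]
print(len(selected), "of", len(subjects), "subjects selected")
```

A substring match also picks up words such as "relational", which is probably the intent, but it can admit unrelated subjects as well, which is consistent with many of the filtered messages turning out to be irrelevant.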
As it turned out, even with the filter, a lot of the messages were not relevant to the survey, so the first question assessed whether the message was relevant to the survey, and if not, why. After the relevance question, the questionnaire asked four questions assessing the message poster's background, such as familiarity with Zope and the current alternatives for RDBMS integration. Finally, the questionnaire asked four more questions to evaluate the poster's opinions regarding Zope and RDBMS connectivity.
To get enough useful information, I had to make a few inferences. For example, people who were just learning how to use Zope and were requesting information on how to connect it to relational databases probably thought that integration of Zope with relational databases is important. Most of the questionnaires used the "undecided" option frequently, since most messages did not express all the opinions sought.
Detailed Evaluation
Of the 196 messages, 63 turned out to be relevant. Of the rest, 60 had nothing to do with RDBMS support, 42 expressed no concrete opinions on the subject, 26 were duplicates or written by an author who had already been counted, 4 had no significant content, and one was written by me and was therefore not included in the tally.
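As a quick consistency check, the categories above can be added up to confirm that they account for all 196 filtered messages. The dictionary keys are shorthand labels chosen for this sketch; the counts come from the text.

```python
# Counts reported in the text for the filtered messages.
tally = {
    "relevant": 63,
    "unrelated to RDBMS support": 60,
    "no concrete opinion": 42,
    "duplicate or repeat author": 26,
    "no significant content": 4,
    "written by the author": 1,
}

total = sum(tally.values())
relevant_share = tally["relevant"] / total

print("total messages:", total)
print(f"relevant share: {relevant_share:.0%}")
```

Roughly a third of the filtered messages ended up contributing to the results.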
The first hypothesis was that people believe current RDBMS support in Zope is not quite sufficient, but it has great potential. Based on questions 6a and 6b, out of those who expressed an opinion, 61% believed it was not sufficient, and fully 100% felt it had great potential. Note that the mailing lists are likely to show a strong optimistic bias in favor of Zope-based software, so the unanimous vote of confidence is understandable.
Secondly, I hypothesized that people who are Zope "gurus" don't see as much need for relational integration, probably because there are object-oriented ways to achieve almost the same capabilities that relational databases provide. The results, based on questions 2 and 6 and illustrated in Table 1, seem to agree with this statement. Among the 7 people counted in the survey who had just started using Zope, all believed that relational integration is important and none said it was sufficient. Among those who were more familiar with Zope, 96% believed relational integration was important and 75% believed it was not yet sufficient. And among the "gurus", only 42% thought the current RDBMS integration needed improvement and only 66% thought it was an important part of Zope.
                     | Downloaded today | Familiar | Guru
Insufficient support | 4 (100%)         | 12 (75%) | 8 (42%)
Important            | 7 (100%)         | 25 (96%) | 18 (66%)
Table 1. Hypothesis 2: Gurus don't think RDBMS support is as important
The third hypothesis was very similar to the second: people just entering the Zope community see a great need for Zope relational integration. This hypothesis dealt with questions 3 and 6. To answer the third question, I estimated how often the person posts to the mailing lists. As with the second hypothesis, the new people unanimously felt that RDBMS integration was not yet sufficient, while about half of the people who post often believed that the current integration was indeed sufficient. Apparently, either the technology is already there but is not easy to use, or people eventually learn how to work around the absence of relational database integration.
The fourth hypothesis evaluated whether people want relational integration primarily because they are already familiar with RDBMSs. Based on the results of question 7, familiarity is an important reason, but not the most important. As shown in Table 2, 95% wanted relational integration because RDBMSs have certain features and characteristics like data capacity and reliability. If this were the only reason people wanted relational integration, Zope's current integration would indeed be sufficient already, since it is already possible to use an RDBMS to store ZODB data, but in a Python-only format. However, at least 80% of the messages expressed needs that go beyond the capabilities of the current solutions. Particularly, 84% felt it was important that data be more accessible to languages other than Python and servers other than Zope.
People are already familiar with relational databases        | 16 out of 20 (80%)
Relational databases are more accessible to other languages  | 11 out of 13 (84%)
New kinds of queries are needed as requirements change       | 14 out of 17 (82%)
Relational databases have needed features or characteristics | 23 out of 24 (95%)
Table 2. Why Zope needs relational database support (question 7)
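The percentages in Table 2 can be reproduced from the counts. The sketch below assumes the figures were truncated rather than rounded, which is an inference from the numbers (11 out of 13 is about 84.6%, reported as 84%) and not something the text states. The dictionary keys are abbreviated labels.

```python
# (agreeing, expressing an opinion) pairs copied from Table 2.
reasons = {
    "familiarity": (16, 20),
    "accessible to other languages": (11, 13),
    "new kinds of queries": (14, 17),
    "needed features or characteristics": (23, 24),
}

# Integer division truncates the percentage instead of rounding it.
percentages = {name: 100 * yes // n for name, (yes, n) in reasons.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Note that each row has a different denominator, because only the messages that expressed an opinion on that particular reason are counted.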
The fifth hypothesis examined whether people want to connect existing Zope-based products to databases, or whether it was sufficient to connect only new products. It can be difficult to make a Zope product that is capable of storing objects in either ZODB or a relational database, so currently people have to choose one or the other when writing their products. The results of question 8 show that people want both: 90% want to store objects of existing types, and 100% of those who expressed an opinion wanted to be able to store objects of new types in an RDBMS.
The sixth hypothesis guessed that Zope RDBMS support should be usable by newcomers. If people consider Zope RDBMS integration important, and the solution is easy to use and learn, this feature should be advertised. Based on question 9, 72% believed new users should benefit from the RDBMS integration, 91% believed people who work with Zope through the web need the feature, and 100% saw product authors as an important audience. This means the solution must primarily be easy to use by product authors. Secondly, it should be easy to use by script writers and newcomers.
Results
Current RDBMS support is not sufficient but definitely has great potential. This belief is more prevalent among those new to Zope than those who are highly familiar with Zope. People want relational integration primarily for reliability, data capacity, and other RDBMS features; but they also want it for language independence, for future unanticipated requirements, and because RDBMSs are familiar to them. The integration should work almost as well for existing products as it does for new products. Product authors and script writers should be the primary audience, but newcomers want RDBMS integration also.
These results confirm the overall theory that Zope needs better integration with RDBMSs than it currently has. And they suggest that the solution being built, if it is easy enough for product authors to use, will indeed fulfill the requirements expressed by the Zope community.
Comments on the Methodology
Analyzing messages for semantics turned out to be more difficult than planned. I had to read each message very carefully to avoid introducing a bias. To use this method on a larger scale, I would solicit the assistance of a second reviewer, who would independently review the same messages I did. We would then compare the two sets of results and calculate a standard deviation to see how much the answers depend on the reviewer.
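One way to compare two reviewers' answers is sketched below. It substitutes a different technique for the standard deviation mentioned above: plain percent agreement plus Cohen's kappa, a common measure for categorical codings that corrects for agreement expected by chance. The two lists of answers are hypothetical.

```python
from collections import Counter

# Hypothetical answers to one question for the same ten messages.
reviewer_a = ["yes", "yes", "no", "undecided", "yes",
              "no", "yes", "undecided", "yes", "no"]
reviewer_b = ["yes", "no", "no", "undecided", "yes",
              "no", "yes", "yes", "yes", "no"]

n = len(reviewer_a)

# Fraction of messages on which both reviewers gave the same answer.
observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n

# Agreement expected by chance, from each reviewer's answer frequencies.
freq_a, freq_b = Counter(reviewer_a), Counter(reviewer_b)
categories = freq_a.keys() | freq_b.keys()
expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

# Cohen's kappa: agreement beyond chance, scaled to a maximum of 1.
kappa = (observed - expected) / (1 - expected)
print(f"observed agreement: {observed:.2f}, kappa: {kappa:.2f}")
```

A kappa near 1 would suggest the questionnaire can be filled out consistently by different people, while a value near 0 would suggest the answers mostly reflect the individual reviewer.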
This method has certain benefits over conventional survey methods. The sample comes from real-world wants and needs, rather than the people who happen to have a few minutes to spare. The sample spans several years and even includes the opinion of those who tried Zope once but left the community because Zope did not support relational databases well.
This method has a regrettable downside, though. The numbers in the results tend to look sporadic, since so many of the answers to the questions had to be left "undecided". In a conventional survey, most people answer all of the questions, so it's easy to compare the answers to one question with the answers to another question. Still, I believe the results are valid and conclusive enough for me to pursue a new kind of Zope RDBMS integration.
© 2002 Shane Hathaway
WORKS CITED
Lloyd, Brian. "An Introduction to Zope." Developer Shed, 03 August 1999.
http://www.devshed.com/Server_Side/Zope/Intro/page1.html
The Zope Community. The Zope Mailing Lists.
http://www.zope.org/Resources/MailingLists
Appendix A. The Survey Questionnaire.
1. Relevance to survey
