Web App

Description

Provide a community-building on-line information exchange. This will take the form of dynamically generated web pages that are presented in a hierarchical manner. User information is tracked by the system and acts in a feedback loop to constantly change how the pages are presented to the user (such that user preferences, activities, and the activities of similar users modify the information presentation).

The site presents information blocks in the form of user-generated bulletins, live chat sessions, polls and elections, hypertext links, and option settings. The ordering of these is defined by the user, as is the user's own rating of these materials.

Requirements

The system must be able to provide, at a minimum:

Varying user levels, defined by a set of parameters
Logging in and tracking user activity
Message editing, moderating, and approval
Organization of information into user defined categories
Dynamic, automatic, user initiated sub-category creation (without administrative intervention)
Using stored user information to generate suggested categories and information blocks
Using stored user information to compare the user to other users and generate suggested categories and information blocks that were not initially found
Unique page views for each user
User preferences used to seed and order page appearance
User evaluation of information blocks to determine suitability for that user and other, similar users
Unique user levels for each category (such that administrative privileges don't carry over to other categories)
Persistent, long-term storage of information blocks and user information
Confirmed user identity
Some administrative tasks can be handled by non-administrators by fiat or by an information block/election module
Open source software (after initial release)
Scalable architecture
Modular information blocks that can be added to allow the system to grow
Internal bookmarks for users
User reminders

Data Requirements

There are a variety of data sources and repositories that are needed.

Data Sources

User preferences module output
This defines how the user wishes the output to be displayed:
- frames/noframes
- java/nojava
- graphics/text
- color preferences
- information block ordering
- password
- email address
- real name
- register for user level
- description for other users
- will chat
- chat gag lists
- interested text (intended for other users to read)
- show interests based on clicks to other users
- reminders
User link clicks
Links to information modules and a redirector to links outside the system
Frequency of visits to particular categories
Posts to individual categories (lurker vs. active)
Voting in polls
Voting in elections
Evaluations of information blocks
User longevity (newbie vs. guru) (implied data)
Login times
Session length
Time spent in each category
Time spent chatting per category
Referring site URLs
Bookmarks

Data Repositories

These are general descriptions, not the detailed data definitions. It is important to note that very few data repositories are actually defined by the app engine; rather, most repositories are defined by the information block modules.

User information and preferences
User activity defined by the engine
Module entry points associated with form ids defined by the engine
Information blocks (including popularity and user cross-references for each block, enabled/disabled, approved, approved by)
- Poll and election results - including effects to implement if any
- Text/media elements
- Link lists
- Chat participants and text (may store only recent chat text)
Category and member information blocks
Category and subcategories (communities)
Connection/login/logout logs
Roles relating to rights/permissions
Cross reference tables
- User activity on information blocks
- User activity on categories
- Similar users for each user
- Visited/aware categories (to generate lists of new categories of interest)
- Bookmarks for each user
- Roles to categories

Process Details

Rights

Log in/out
Read information block
Suggest information block
Edit information block
Approve/post information block
Evaluate information block (sucks/rocks on block)
Disable information block
Vote on election type information block
Vote on poll type information block
Observe chat information block (no text entry widget)
Participate in chat information block
Change preferences
Set user id (hardcoded to only apply to administrators)
Sign up
System tuning (hardcoded to only apply to administrators)

Recommended roles

Each role in this list inherits all the rights of the less powerful roles above it, except for the "sign up" right. This does not imply a role hierarchy: roles are to be set up individually and not arranged in a hierarchy.

Guest
- Read information block
- Suggest information block
- Sign up
- Observe chat
- Login/out (so users can get in--logout only appears if logged in)
Member
- Evaluate information block
- Participate in chat
- Vote on election
- Vote on poll
- Change preferences
Editor
- Edit information block
- Approve/post information block
- Disable information block
Admin/Janitor (role 0 is hardcoded as admin)
- Set user id
- System tuning
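
The cumulative role list above can be modelled as flat sets of rights. The following is a minimal sketch, assuming Java; the enum constants and the class name RecommendedRoles are illustrative and not part of this specification. Each role copies the previous set (minus "sign up") instead of subclassing it, so the roles stay independent and no hierarchy is implied.

```java
import java.util.EnumSet;
import java.util.Set;

// Rights taken from the "Rights" list; the constant names are illustrative.
enum Right {
    LOG_IN_OUT, READ, SUGGEST, EDIT, APPROVE_POST, EVALUATE, DISABLE,
    VOTE_ELECTION, VOTE_POLL, OBSERVE_CHAT, PARTICIPATE_CHAT,
    CHANGE_PREFERENCES, SET_USER_ID, SIGN_UP, SYSTEM_TUNING
}

// Hypothetical helper that builds the four recommended roles as flat sets.
final class RecommendedRoles {
    static Set<Right> guest() {
        return EnumSet.of(Right.READ, Right.SUGGEST, Right.SIGN_UP,
                Right.OBSERVE_CHAT, Right.LOG_IN_OUT);
    }

    static Set<Right> member() {
        Set<Right> rights = withoutSignUp(guest());
        rights.addAll(EnumSet.of(Right.EVALUATE, Right.PARTICIPATE_CHAT,
                Right.VOTE_ELECTION, Right.VOTE_POLL, Right.CHANGE_PREFERENCES));
        return rights;
    }

    static Set<Right> editor() {
        Set<Right> rights = withoutSignUp(member());
        rights.addAll(EnumSet.of(Right.EDIT, Right.APPROVE_POST, Right.DISABLE));
        return rights;
    }

    static Set<Right> admin() {
        Set<Right> rights = withoutSignUp(editor());
        rights.addAll(EnumSet.of(Right.SET_USER_ID, Right.SYSTEM_TUNING));
        return rights;
    }

    // Copy the base set so each role is an independent set, then drop "sign up".
    private static Set<Right> withoutSignUp(Set<Right> base) {
        EnumSet<Right> copy = EnumSet.copyOf(base);
        copy.remove(Right.SIGN_UP);
        return copy;
    }
}
```

Because each method returns a fresh set, an editor in one category can later be given or denied individual rights without touching the other roles.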

Category relationships

The top level category defines universals that are legal through all lower level categories. Below that level, however, category roles or definitions do not inherit or propagate to lower levels.

An information block may belong to only one category.

User tracking

Cookies
Web session ID posting
Some combination

Information Blocks

Information blocks can be nested. If the author of an information block is online and in this category, that information (if author is chat-ok) will be noted.

Information blocks will not appear for a user if they do not have rights to view the block.

The appearance of an information block is determined by XML stored within the database. This contains layout hints as well as definitions of nested information blocks and default values or form field data substitutions.

Each information block has a unique ID that allows it to be referenced from other information blocks and from a URL. This allows it to be linked to from outside sources, bookmarked in a traditional fashion, as well as being linked into other blocks by editors.

Distinguishing types of information blocks

Information blocks may be of almost any type. However, some have repercussions (polls, elections, etc.) that act beyond simple data storage. In particular, voting and polling have special rights that are coded in. To allow special rights to appear, information blocks have an interface that lets the system recognize that special permissions are in place.

Installation information blocks

Information blocks may need new data columns or tables added to the database. They should only impact existing tables by adding rows or adding new tables. In no case, except major version changes, should information blocks delete columns or tables. Prior to adding the new information block to the server, the database should be modified.

Anticipated information blocks

Polling
Stores poll data. Keeps track of who has voted for this issue. Can be evaluated.
Voting (abstract)
Stores voting data. Keeps track of who has voted for this issue. Performs some action or actions depending on results. These actions are valid in the given category only.
- Change user to some role
- Change some role
- Add a role
- Remove a role
- Approve an information block
- Remove an information block
- Replace an information block
- Change an information block's rating
- Make an information block sticky (put at top or bottom of list, but always visible)
- Ban a user
- Unban a user
- Add a category
- Remove a category
When voting is set up, the editor has several options:
- Length of voting period
- Voting ratio (% yes / % no to succeed)
- Roles that may vote (may limit to editors, etc.)
- Actions to perform if successful (takes the appearance of an add action button and a list of defined actions; will require an additional define action screen)
- Actions to perform if it fails (ditto)
Text/html block
Can be evaluated.
Link block
Links are ordered based on user preferences as well as newness of the links.
Preferences block
Cannot be evaluated.
Comments holder
Holds comment information blocks. Cannot be evaluated.
Comment
Threads via comments holder.
Chat
Allows chatting in this category. May be placed in a separate window (via Java applet or equivalent) to allow user to view other categories but keep chatting. Cannot be evaluated.
Category
Holds other information blocks. Cannot be evaluated.
Search
Search category titles, authors, or text for strings. Cannot be evaluated.
System tuning
Rebuild recommendation cache, change recommendation rebuild period, write-all message, stop accepting new logins, start accepting new logins. Cannot be evaluated.
Roles and Rights
Add, edit, and delete roles. Assign rights when adding or editing a role. Change the name of a role. (Issue: removing a role with current members will cause the deletion to fail.) Cannot be evaluated.
Login Module
Log the user in and return the page the user was trying to access. Cannot be evaluated.
Logout Module
Log the user out and return the user to a login page. Cannot be evaluated.
Highlights
Ordered by user preferences, user profile (activity profile), similar users, and what items have been seen in the past. This is the "recommendation" or analysis module that is the really unique part of this system (and requires all the data). Captures all user clicks and correlates user classes and activities to find similar users. Cannot be evaluated.
Bookmark
A special kind of Link block that gets added to by the user. Provides a bookmark icon on evaluatable info blocks.
User Viewer
Other users can get information about a user.
Category Builder
Allow editors to order pages, move elements into categories, link information blocks, etc.
Object Cache
Cache most frequently used objects from database. Use this as primary DBI. Reap after a timeout.
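
The Object Cache entry above can be sketched as a map in front of the database that drops entries after an idle timeout. This is a minimal sketch, assuming Java; the class name ObjectCache, the loader function (standing in for the real database read), and the injectable clock are all assumptions for illustration. It reaps by idle time only and does not track which objects are used most frequently.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache in front of the database; all names are illustrative.
final class ObjectCache<K, V> {
    private static final class Entry<V> {
        final V value;
        long lastUsedMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastUsedMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // stands in for the real database read
    private final long timeoutMillis;      // idle time after which an entry is reaped
    private final LongSupplier clock;      // injectable so the timeout can be tested

    ObjectCache(Function<K, V> loader, long timeoutMillis, LongSupplier clock) {
        this.loader = loader;
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    // Return the cached object, loading it through the loader on a miss.
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        entry.lastUsedMillis = now;
        return entry.value;
    }

    // Drop entries that have been idle longer than the timeout; returns the count.
    synchronized int reap() {
        long now = clock.getAsLong();
        int reaped = 0;
        for (Iterator<Entry<V>> it = entries.values().iterator(); it.hasNext();) {
            if (now - it.next().lastUsedMillis > timeoutMillis) {
                it.remove();
                reaped++;
            }
        }
        return reaped;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In this sketch reap() would be called periodically, for example from the Scheduler described under Utility Classes.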

Service Level

Any individual page request should take no more than 1.75 seconds to complete. Each server should be able to handle up to 50 page requests per second.

This system is scalable by distributing the load across several tiers. Tiers identified at this point include:

User browser/applet
Web server
Application server
Object-Relational mapping
Data multiplexer/manager
Database

A strongly object-relational database may imply dropping, changing the location, or modifying the object-relational mapping layer.

Achieving a highly responsive system can happen via precaching requests. User recommendations across all categories are compiled when (a) a user logs in and no current recommendations exist or (b) 20 minutes after the last recommendations were compiled. This is achieved via a recommendation builder queue. The timeout is tunable by administrators.
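
The rebuild rule above could be expressed as a small queue that accepts a user only when that user's recommendations are missing or stale. This is a minimal sketch, assuming Java; the class RecommendationQueue and its method names are hypothetical, and the actual compilation of recommendations is left out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical recommendation builder queue; names are illustrative.
final class RecommendationQueue {
    private final Map<String, Long> lastCompiled = new HashMap<>();
    private final Queue<String> pending = new ArrayDeque<>();
    private long periodMillis = 20L * 60L * 1000L;   // default from the text: 20 minutes

    // The timeout is tunable by administrators.
    void setPeriodMinutes(long minutes) {
        periodMillis = minutes * 60L * 1000L;
    }

    // True when (a) no recommendations exist or (b) the period has elapsed.
    boolean isStale(String userId, long nowMillis) {
        Long last = lastCompiled.get(userId);
        return last == null || nowMillis - last >= periodMillis;
    }

    // Called at login or from a periodic check; queues the user at most once.
    boolean offer(String userId, long nowMillis) {
        if (!isStale(userId, nowMillis) || pending.contains(userId)) {
            return false;
        }
        pending.add(userId);
        return true;
    }

    // The builder takes the next user; compiling the recommendations is out of scope here.
    String buildNext(long nowMillis) {
        String userId = pending.poll();
        if (userId != null) {
            lastCompiled.put(userId, nowMillis);
        }
        return userId;
    }
}
```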

Additionally, for security and convenience, the system will default to a 20 minute idle timeout, at the end of which the next page load will require a reauthentication. This time can be changed to any number of minutes, with 0 indicating no timeout.
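
The idle timeout rule, including the special meaning of 0, might look like the following. This is a minimal sketch, assuming Java; IdleTimeout and its method names are hypothetical.

```java
// Hypothetical holder for the idle timeout setting; names are illustrative.
final class IdleTimeout {
    private int timeoutMinutes = 20;   // default from the text; 0 means no timeout

    void setTimeoutMinutes(int minutes) {
        if (minutes < 0) {
            throw new IllegalArgumentException("minutes must be >= 0");
        }
        timeoutMinutes = minutes;
    }

    // Checked on each page load: has the session been idle for the full timeout?
    boolean requiresReauthentication(long lastActivityMillis, long nowMillis) {
        if (timeoutMinutes == 0) {
            return false;
        }
        return nowMillis - lastActivityMillis >= timeoutMinutes * 60L * 1000L;
    }
}
```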

End-user interface

The end user will interact with this system via a combination of HTML, Java applets, and frames through a web browser.

The system must be capable of being almost entirely maintained via a web browser. The only activities not possible through a web browser are limited to:

Adding or removing modules
Adding or removing columns and tables from the database
Starting and stopping the server

Recommended Platforms

Because this is to be an open source project, it has the potential of working on any system. To speed development, the following are recommended:

Linux or FreeBSD operating system
Apache webserver
Java for prototyping, C or C++ for final
MySQL or ODBC database

Areas of investigation

These need to be investigated before making any final decisions on platform and supporting systems:

Will Apache be fast enough with a compiled-in module to generate the number of pages per minute, or is a custom webserver required?
Speed benchmarks for complex joins ~~and stored procedures~~ for MySQL ~~and Postgres~~
Raw socket with XML-like protocol for data transfer between certain tiers
Define the XML language for creating pages.
Define the XML language for database transfer (query/update dtd, and result/result-set dtd).

Engine Program Flow

Consider that the engine is broken into several discrete layers or components:

HTML output generator
Converts XML hints and module output into HTML pages. Usually, this will include asking modules for HTML output given the XML field as input to the modules.
HTML input processor
Processes HTML input forms and looks for a MODULEID hidden input tag. This will indicate to which module to pass the form input. If the form contains no data, the first row in the engine's MODULE table will be loaded which contains a pointer to a module that displays a startup screen. Additionally, each module must have a first, default row to use to display data if the form data are malformed or incomplete.
Additionally, process HTML clicks on links and stores the history in the engine's history table.
Modules
Each module defines a virtual function/method name that can be resolved by the engine and stored in the database in the engine's MODULE table. These entry points will accept an array of tuples containing the form data and process the data appropriately. This processing will usually take the form of storing the form results/destination (if appropriate) in the engine's history table, as well as loading an appropriate response or XML definition from the database and sending that to the HTML output generator. Recursively calls references for modules and processes XML forms.
XML reader
Asks the database interface (DBI) for the XML entry from the given table (usually called by a module). In turn, fills in data fields from the HTML input processor and engine's storage of the user session. Passes the completed XML definition to the HTML output module. This is a recursive process until all XML data fields are translated into XML HTML layout hints.
XML writer
Assumes that the calling module has stored the data in data fields and replaced the data with the appropriate table and column names for each field. Stores the XML page into the specified table via the DBI.
Database Interface (DBI)
Converts data requests to XML database interchange format. Sends that XML request to a database manager (preferred) or database via ODBC. Converts the returns back into the specified types and returns the values to the calling function.
User Sessions
Provides an abstraction to users. Uses DBI to retrieve user data on request. Writes changes to user data via DBI.
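
The MODULEID lookup performed by the HTML input processor can be sketched as a dispatch table. This is a minimal sketch, assuming Java; ModuleDispatcher is a hypothetical in-memory stand-in for the engine's MODULE table, and entry points are modelled as functions from form data to output. Falling back to the startup module when the MODULEID is unknown is an assumption; the text only specifies the fallback for a form with no data.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory stand-in for the engine's MODULE table.
final class ModuleDispatcher {
    // FormID -> entry point. Insertion order is kept so "the first row" is well defined.
    private final Map<String, Function<Map<String, String>, String>> modules =
            new LinkedHashMap<>();

    void register(String formId, Function<Map<String, String>, String> entryPoint) {
        modules.put(formId, entryPoint);
    }

    // Pass the form to the module named by MODULEID, or to the first row (startup screen).
    String dispatch(Map<String, String> form) {
        if (modules.isEmpty()) {
            throw new IllegalStateException("no modules registered");
        }
        Function<Map<String, String>, String> target = null;
        if (form != null && !form.isEmpty()) {
            target = modules.get(form.get("MODULEID"));
        }
        if (target == null) {
            // Empty form, or (assumption) an unknown MODULEID: use the first row.
            target = modules.values().iterator().next();
        }
        Map<String, String> data =
                form == null ? Collections.<String, String>emptyMap() : form;
        return target.apply(data);
    }
}
```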

Servlet specifics

Assuming that we extend javax.servlet.http.HttpServlet, we don't have to implement the service method. Rather, we implement doGet, doPost, etc. This is easier, but is this faster? Additionally, is it faster or better to run a single-threaded servlet model, or the multiple threaded servlet model? I don't think the performance hit is so big, and I think this lets us handle more "simultaneous" connections than if we tried to handle all these (with spawning threads) from the servlet service method.

Will servlets execute faster if we define a set of static class methods, or if the classes are normal and handled by the VM?

A set of servlet utilities will need to be defined that will give "helper" functionality:

passing the request object
processing/storing form elements
processing pathinfo and querystring to help resolve database (static) references
getting a persistent session and user object
look up a module name/reference based on form elements and pathinfo

pass record id, session object, form data to module

Module Theory

Objects can have fields and methods that can be modified or supplanted by module fields and methods.

To accomplish this, a solution needs to be found. I haven't figured it out yet, but some ideas include:

Simple
Change the original class when a module is added to the source.
Medium
Use a makefile or script to modify the source.
Hard
Modules register their fields and methods on a static object that acts as a registry. All calls to fields and methods go through this object. These must be well defined in a hierarchy. This would happen in a static init method which is called when the engine starts. The constructor would create an instance of the module for the servlet thread to use (in the form of Module module = Module.createModule(id, session, dictionary)).
Ideal? (Shawn and I thought this through on 4 Feb 99)
Modules register with an event registry. Each event called by this registry starts at 0 and works to n in priority. Lower level event handlers are called first. Only one event handler can be registered per level (throw an exception). Events are generated by modules and include everything from clicks, logins, and logouts to new postings. Modules can listen to other modules' events, and can also issue changes in priority (in a positive direction) to skip the event (with a canceled call event--see next sentence) and avoid the event being noticed by other modules. Additionally, events can be canceled by receiving modules, though this is considered bad form and all listeners will get a canceled call event with the original event embedded, thrown at the appropriate level.
Should events be registered that aren't added to the event registry yet (ie, register for a poll event, even if the poll module hasn't been added)? Each module needs to implement eventlistener interfaces, as well.
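
The core of the event registry idea (one handler per priority level, called from level 0 upward, with cancellation) can be sketched as follows. This is a minimal sketch, assuming Java; EventHandler and EventRegistry are hypothetical names, events are plain strings, and the canceled call notification and priority skipping described above are omitted. Because handlers are keyed by event name, this sketch happens to allow registering for an event before any module issues it, which is one possible answer to the open question, not a decision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical listener interface; returning false cancels the event.
interface EventHandler {
    boolean handle(String event);
}

// Hypothetical registry; names and the String event type are illustrative.
final class EventRegistry {
    // event name -> (priority level -> handler); TreeMap iterates from the lowest level
    private final Map<String, TreeMap<Integer, EventHandler>> handlers = new HashMap<>();

    void register(String event, int level, EventHandler handler) {
        TreeMap<Integer, EventHandler> byLevel = handlers.get(event);
        if (byLevel == null) {
            byLevel = new TreeMap<>();
            handlers.put(event, byLevel);
        }
        if (byLevel.containsKey(level)) {
            // Only one event handler can be registered per level.
            throw new IllegalStateException(
                    "level " + level + " already taken for event " + event);
        }
        byLevel.put(level, handler);
    }

    // Calls handlers from level 0 upward; returns false if a handler canceled the event.
    boolean fire(String event) {
        TreeMap<Integer, EventHandler> byLevel = handlers.get(event);
        if (byLevel == null) {
            return true;
        }
        for (EventHandler handler : byLevel.values()) {
            if (!handler.handle(event)) {
                return false;
            }
        }
        return true;
    }
}
```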

Design Patterns hasn't provided a lot of guidance to this point, beyond using Decorators to encapsulate the functionality. This isn't appropriate, as I would like to be able to modify behavior in a global namespace.

Module Signature

module(String id, Session session, Dictionary form)

id
Record id to load. -1 or null if not present.
session
Session object to use for getting/setting and grabbing the user object/info.
form
A hashtable, dictionary, or XML structure containing the form data. Module parses to determine actions. May be null.
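
A module following this signature might handle its three arguments as shown below. This is a minimal sketch, assuming Java; Session here is a bare stand-in for the engine's real session type, and EchoModule, its "action" form key, and its "view" default are hypothetical.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Minimal stand-in for the engine's session type; hypothetical.
final class Session {
    private final Hashtable<String, Object> values = new Hashtable<>();

    Object get(String key) {
        return values.get(key);
    }

    void set(String key, Object value) {
        values.put(key, value);
    }
}

// Hypothetical module that follows the module(id, session, form) signature.
final class EchoModule {
    private final String id;
    private final Session session;
    private final Dictionary<String, String> form;

    EchoModule(String id, Session session, Dictionary<String, String> form) {
        this.id = id;
        this.session = session;
        this.form = form;
    }

    // "-1 or null if not present"
    boolean hasRecord() {
        return id != null && !id.equals("-1");
    }

    Session session() {
        return session;
    }

    // The module parses the form to determine its action; the form may be null.
    String action() {
        if (form == null) {
            return "view";
        }
        String action = form.get("action");
        return action == null ? "view" : action;
    }
}
```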

Utility Classes

Queue
Add elements into a self-managing, growing queue.
- add(Object)
  Adds an object to the queue.
- Object next()
  Gets the head element off the queue.
- Object peek()
  Look at the head element, but don't remove it from the queue.
MethodQueue
Add method calls to a queue. Extends Queue.
- add(Method, Object [] args)
  Enter the method with the given arguments into the queue.
- next()
  Calls the next method in the queue.
- MethodCall
  Represents a method call.
Scheduler
Schedule calls to an object (ie, notify suspended threads, execute utility methods). Extends MethodQueue.
- add(Method, Object [] args, int priority, int sleep)
  Call method after sleep seconds. Priority and sleep feed the scheduler thread.
- SchedulerThread
  Builds a list of methods to be called, times to call the methods, and orders by priority. Calls each method sequentially (within scheduler thread). This may need to change later if it turns into a performance drag. I can't use a thread pool, though, because a scheduler is a component of a thread pool. :-P
Thread Pool
Creates two thread groups (available and in-use). The thread pool manages the allocation of these threads into these groups, keeps track of when each thread was last requested, and reaps threads that have not been released to the thread pool after a timeout.
- Timeout in seconds (0 == never)
- Optimal percentage threads available
- Avail thread percentage low to create threads (e.g. -> .05)
- Avail thread percentage high to reap threads (e.g. -> .10)
- Threads to create at a time
- Threads to kill at a time
- Maximum threads
- Minimum threads
- Check frequency (use scheduler)
- Thread getThread()
  Gets a thread from the available group, puts the thread in the in use group. Resets the last handled time/turns on reaping. Returns a reference to the thread. Creates a thread if none available.
- void putThread(Thread)
  Returns the thread to the available group, removes it from the in use group. Resets the last handled time/turns off reaping. Called from inside the thread at the end of processing.
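
The Queue and MethodQueue classes listed above could be sketched with reflection as follows. This is a minimal sketch, assuming Java; the queue is named SimpleQueue to avoid clashing with java.util.Queue, MethodQueue.add takes an extra target argument because java.lang.reflect.Method.invoke needs a receiver, and the calling method is named callNext so that next() keeps returning the queued element. Those three choices are assumptions, not part of the list above. The backing ArrayDeque does not accept null elements.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayDeque;

// Sketch of Queue; renamed to avoid clashing with java.util.Queue.
class SimpleQueue {
    private final ArrayDeque<Object> items = new ArrayDeque<>();

    // Adds an object to the queue (null is not allowed by ArrayDeque).
    synchronized void add(Object item) {
        items.addLast(item);
    }

    // Gets the head element off the queue, or null when the queue is empty.
    synchronized Object next() {
        return items.pollFirst();
    }

    // Looks at the head element without removing it.
    synchronized Object peek() {
        return items.peekFirst();
    }
}

// Represents a method call: receiver, method, and arguments.
final class MethodCall {
    private final Object target;
    private final Method method;
    private final Object[] args;

    MethodCall(Object target, Method method, Object[] args) {
        this.target = target;
        this.method = method;
        this.args = args;
    }

    Object invoke() throws IllegalAccessException, InvocationTargetException {
        return method.invoke(target, args);
    }
}

// Sketch of MethodQueue: a queue whose elements are method calls.
class MethodQueue extends SimpleQueue {
    // Enter the method with the given receiver and arguments into the queue.
    void add(Object target, Method method, Object[] args) {
        add(new MethodCall(target, method, args));
    }

    // Calls the next method in the queue and returns its result (null if empty).
    Object callNext() throws IllegalAccessException, InvocationTargetException {
        MethodCall call = (MethodCall) next();
        return call == null ? null : call.invoke();
    }
}
```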

Engine Design

General process/design. The static engine starts in the VM and initializes pools of servlet handlers.

Question: how to make a class start up with the VM started by JServ?
Utilizes a thread pool to hold threads for connections from JServ.
Creates a new engine for the thread with Engine.create(threadpool.getThread(), servlet environment) [probably from the servlet start method]

Engine Data Dictionary

Context: Engine

Table:

Function VARCHAR(32) The virtual function name to call.
FormID VARCHAR(32) The contents of the MODULEID hidden form field.
Description VARCHAR(255) Describes what this module does.

Database Manager

The Database Manager (DBM) encapsulates the tier that separates the modular program from the database back end. It consists of a Database Interface and JDBC connection to a database and provides JDBC connection pooling and connection thread pooling to increase speed when sending queries.

Todo: How do I deal with multiple database ids and logins with the following model?

Design Patterns

The DBM uses two thread pools: a JDBC connection pool and a servlet connection pool. The DBM is a Facade for JDBC and the database. The XML query/update and result/result-set dtds provide a Facade for SQL and result/result-sets respectively.

Note: the caller doesn't need to pick up a result, though one will always be returned, even in an error case.

Suggested DBM Classes

Listener
Listens for new connections from the servlet/client.
ConvertXMLtoSQL
Converts an XML query to a SQL query.
ClientConnection
Represents a thread or connection to the client. Managed by the client connection thread pool. Manages the end-to-end transaction (from client to db to client, including XML/SQL translations).
SQLConnection
Represents a thread or connection to the database. Managed by the JDBC connection thread pool. Manages the entire JDBC query/update and reports results/errors.
- boolean reapable()
  If ctime() - lastUsed() > reapTime && active()
- boolean active()
  If connected to db
- boolean inUse()
  if !in SQLConnectionPool && !reapable()
ConvertResultToXML
Converts a result, result set, or error to XML.
ClientPool
A thread pool that manages ClientConnections.
SQLConnectionPool
A thread pool that manages SQLConnections.
- cleanUpZombies()
  Scheduled method call. Check each connection to see if it is reapable() or !active() && ! in SQLConnectionPool.
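
The reapable(), active(), and inUse() predicates above, plus the zombie check used by cleanUpZombies(), can be written out directly. This is a minimal sketch, assuming Java; SQLConnectionState is a hypothetical state holder (a real SQLConnection would wrap a java.sql.Connection), the current time is passed in instead of calling ctime(), and the zombie condition is read as reapable() || (!active() && !inPool), which is one interpretation of the unparenthesized text.

```java
// Hypothetical state holder for one pooled database connection.
final class SQLConnectionState {
    private final long reapTimeMillis;
    private long lastUsedMillis;
    private boolean connected = true;   // connected to the database
    private boolean inPool = true;      // currently held by the SQLConnectionPool

    SQLConnectionState(long reapTimeMillis, long nowMillis) {
        this.reapTimeMillis = reapTimeMillis;
        this.lastUsedMillis = nowMillis;
    }

    void checkOut(long nowMillis) {
        inPool = false;
        lastUsedMillis = nowMillis;
    }

    void checkIn(long nowMillis) {
        inPool = true;
        lastUsedMillis = nowMillis;
    }

    void disconnect() {
        connected = false;
    }

    // If connected to db
    boolean active() {
        return connected;
    }

    // If ctime() - lastUsed() > reapTime && active()
    boolean reapable(long nowMillis) {
        return nowMillis - lastUsedMillis > reapTimeMillis && active();
    }

    // if !in SQLConnectionPool && !reapable()
    boolean inUse(long nowMillis) {
        return !inPool && !reapable(nowMillis);
    }

    // Check applied by cleanUpZombies(); the grouping of the terms is an assumption.
    boolean zombie(long nowMillis) {
        return reapable(nowMillis) || (!active() && !inPool);
    }
}
```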

Shutting down the Database Manager

Starting the DBM is covered in the DBM Thread Model. Stopping threads involves:

Telling the listener to deny new connections (stop).
Waiting for current transactions to finish/timeout (return to ClientConnection pool).
Telling the pools to kill their managed threads.
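
The three shutdown steps above can be sketched with the standard java.util.concurrent executor, used here in place of the document's own thread pool classes. This is a minimal sketch, assuming Java; DbmShutdown, the pool size of 2, and the submit/shutdown method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical shutdown coordinator; an ExecutorService stands in for the client pool.
final class DbmShutdown {
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final ExecutorService clientPool = Executors.newFixedThreadPool(2);

    // The listener's side: refuse new work once shutdown has begun.
    boolean submit(Runnable transaction) {
        if (!accepting.get()) {
            return false;
        }
        try {
            clientPool.execute(transaction);
            return true;
        } catch (RejectedExecutionException e) {
            return false;   // shutdown started between the check and the submit
        }
    }

    // Returns true if all transactions finished before the timeout.
    boolean shutdown(long timeoutSeconds) throws InterruptedException {
        accepting.set(false);                        // 1. deny new connections
        clientPool.shutdown();                       //    running transactions continue
        boolean drained =
                clientPool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);   // 2. wait
        if (!drained) {
            clientPool.shutdownNow();                // 3. kill the managed threads
        }
        return drained;
    }
}
```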

Database Interface

The Database Interface (DBI) is the process that converts XML queries to SQL and back into XML results. This is embedded in the client communication thread and takes the form of:

an XML to SQL conversion process
a process that gets a JDBC connection from the JDBC connection pool and submits the query/update
a process that gets a JDBC result/result set and converts it to XML

Please see the DBM thread model for details of execution and hints for implementation.
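
The first of the three steps (XML to SQL) might look like the following. This is a minimal sketch, assuming Java; the query DTD is not yet defined in this document, so the `<query table="...">` and `<column name="...">` elements are invented purely for illustration, as is the class name XmlToSql. Only a plain SELECT is covered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical converter; the XML vocabulary is an assumption, not the project's DTD.
final class XmlToSql {
    static String convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element query = doc.getDocumentElement();

        // Collect the requested column names, if any.
        List<String> columns = new ArrayList<>();
        NodeList nodes = query.getElementsByTagName("column");
        for (int i = 0; i < nodes.getLength(); i++) {
            columns.add(identifier(((Element) nodes.item(i)).getAttribute("name")));
        }

        String table = identifier(query.getAttribute("table"));
        String select = columns.isEmpty() ? "*" : String.join(", ", columns);
        return "SELECT " + select + " FROM " + table;
    }

    // Accept only plain identifiers so XML input cannot inject SQL text.
    private static String identifier(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }
}
```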

The DBM Thread Model

thread	init/constructor	wait	parse	execute	convert	return to client	wait
main	build client & connection pools (entry point)	wait for client	get client thread (connect to client)
check	check client pool/check conn pool (repeat)
client	create		XML to SQL	get SQL connection	result to XML	return XML to client	close connection/return to pool
conn	create/connect			execute/wait	get result	return to pool
client		connect	call query	wait for result		got XML result	close DBM connection

XML dtd design

In general, XML variables can reference forms, session info, object references, database record ids, etc. Special XML types defined should include:

Form elements
Object processor names/module names
Object/module field names and data
Record ids, tables, and contents
Literal (unparsed) values--to be stored or displayed without further translation.

Semiformal definition:

(fundamental classes--I don't believe we will actually implement this)

object
Represents a serializable object. May contain field tags.
container
- class =
  The fully qualified class name (ie, "java.lang.String"). Used for reflection to recreate object.
field
Represents a serializable field in an object. Values are represented as contained type or object tags.
container
- name =
  The field name. Used to assign values.
type
Represents a non-object value.
- type =
  The fundamental type (boolean, char, byte, short, int, long, float, double).
- value =
  The value.
- bits =
  Number of significant bits to assign to the value (for cross language casting).
  Optional.

Escaping Characters: Use the syntax '\char' to escape the given character (ie, '\<' to escape a left bracket).
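
The '\char' escaping rule can be sketched as a pair of functions. This is a minimal sketch, assuming Java; the class name BackslashEscaper is hypothetical, and the set of characters that need escaping (angle brackets and the backslash itself) is an assumption, since the text only gives '\<' as an example.

```java
// Hypothetical escaper for the '\char' syntax.
final class BackslashEscaper {
    // Assumption: these are the characters that must be escaped.
    private static final String SPECIAL = "<>\\";

    // Prefix every special character with a backslash.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Remove one level of escaping: "\x" becomes "x" for any character x.
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder();
        boolean pending = false;   // true right after an unconsumed backslash
        for (char c : escaped.toCharArray()) {
            if (!pending && c == '\\') {
                pending = true;
                continue;
            }
            out.append(c);
            pending = false;
        }
        return out.toString();   // a trailing lone backslash is dropped
    }
}
```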

Special Types: Consider array, which may contain objects or types. In this implementation, array is considered to be an object of class name array which will then perform the necessary initialization code.

Intent: This is not meant to be a human-oriented protocol. Humans can read it, but it is intended to be read by a program that will use reflection to recreate the object as specified by class name and initialize the field values as given for the new object.

DBM oriented:

Connection
A JDBC connection.
- url =
  The JDBC URL, such as jdbc:oracle:thin:@dbhost:1520:mischief
- user =
  The user id required to log into the database. Optional.
- pass =
  The password required to log into the database. Optional.
- min =
  The minimum number of connections to open. Optional. Defaults to 1.
- max =
  The maximum number of connections to open. Optional. Defaults to 255.
- timeout =
  Seconds to wait for a result to return. Can be set on an individual query, as well. Optional. Defaults to 300.
- cleanup =
  Seconds to wait before closing idle connections. When the number of connections is at or below the minimum, closes and opens new connections but attempts to keep the minimum number available. Optional. Defaults to 6000.
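
The Connection attributes and their defaults can be captured in a small value object. This is a minimal sketch, assuming Java; ConnectionConfig, its builder, and the min/max validation are hypothetical, while the default values (1, 255, 300, 6000) are the ones listed above.

```java
// Hypothetical value object for the Connection attributes.
final class ConnectionConfig {
    final String url;
    final String user;          // optional, may be null
    final String pass;          // optional, may be null
    final int min;              // minimum number of connections to open
    final int max;              // maximum number of connections to open
    final int timeoutSeconds;   // seconds to wait for a result
    final int cleanupSeconds;   // seconds before closing idle connections

    private ConnectionConfig(Builder b) {
        if (b.min < 0 || b.max < b.min) {
            throw new IllegalArgumentException("need 0 <= min <= max");
        }
        url = b.url;
        user = b.user;
        pass = b.pass;
        min = b.min;
        max = b.max;
        timeoutSeconds = b.timeoutSeconds;
        cleanupSeconds = b.cleanupSeconds;
    }

    static Builder forUrl(String url) {
        return new Builder(url);
    }

    static final class Builder {
        private final String url;
        private String user;
        private String pass;
        private int min = 1;                 // defaults from the attribute list
        private int max = 255;
        private int timeoutSeconds = 300;
        private int cleanupSeconds = 6000;

        private Builder(String url) {
            this.url = url;
        }

        Builder user(String value) { user = value; return this; }
        Builder pass(String value) { pass = value; return this; }
        Builder min(int value) { min = value; return this; }
        Builder max(int value) { max = value; return this; }
        Builder timeoutSeconds(int value) { timeoutSeconds = value; return this; }
        Builder cleanupSeconds(int value) { cleanupSeconds = value; return this; }

        ConnectionConfig build() {
            return new ConnectionConfig(this);
        }
    }
}
```

For example, `ConnectionConfig.forUrl("jdbc:oracle:thin:@dbhost:1520:mischief").max(10).build()` would keep every default except the maximum.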