Web App
Description
Provide a community-building on-line information exchange. This will take the form of dynamically generated web pages that are presented in a hierarchical manner. User information is tracked by the system and acts in a feedback loop to constantly change how the pages are presented to the user (such that user preferences, activities, and the activities of similar users modify the information presentation).
The site presents information blocks in the form of user-generated bulletins, live chat sessions, polls and elections, hypertext links, and option settings. The ordering of these is defined by the user, as is the user's own rating of these materials.
Requirements
The system must be able to provide, at a minimum:
- Varying user levels, defined by a set of parameters
- Logging in and tracking user activity
- Message editing, moderating, and approval
- Organization of information into user defined categories
- Dynamic, automatic, user initiated sub-category creation (without administrative intervention)
- Using stored user information to generate suggested categories and information blocks
- Using stored user information to compare the user to other users and generate suggested categories and information blocks that were not initially found
- Unique page views for each user
- User preferences used to seed and order page appearance
- User evaluation of information blocks to determine suitability for that user and other, similar users
- Unique user levels for each category (such that administrative privileges don't carry over to other categories)
- Persistent, long-term storage of information blocks and user information
- Confirmed user identity
- Some administrative tasks can be handled by non-administrators by fiat or by an information block/election module
- Open source software (after initial release)
- Scalable architecture
- Modular information blocks that can be added to allow the system to grow
- Internal bookmarks for users
- User reminders
Data Requirements
There are a variety of data sources and repositories that are needed.
Data Sources
- User preferences module output
This defines how the user wishes the output to be displayed:
- frames/noframes
- java/nojava
- graphics/text
- color preferences
- information block ordering
- password
- email address
- real name
- register for user level
- description for other users
- will chat
- chat gag lists
- interested text (intended for other users to read)
- show interests based on clicks to other users
- reminders
- User link clicks
Links to information modules and a redirector to links outside the system
- Frequency of visits to particular categories
- Posts to individual categories (lurker vs. active)
- Voting in polls
- Voting in elections
- Evaluations of information blocks
- User longevity (newbie vs. guru) (implied data)
- Login times
- Session length
- Time spent in each category
- Time spent chatting per category
- Referring site URLs
- Bookmarks
Data Repositories
These are general descriptions, not the detailed data definitions. It is
important to note that very few data repositories are actually defined by
the app engine; rather, most repositories are defined by the information
block modules.
- User information and preferences
- User activity defined by the engine
- Module entry points associated with form ids defined by the
engine
- Information blocks (including popularity and user cross-references for each block, enabled/disabled, approved, approved by)
- Poll and election results - including effects to implement if any
- Text/media elements
- Link lists
- Chat participants and text (may store only recent chat text)
- Category and member information blocks
- Category and subcategories (communities)
- Connection/login/logout logs
- Roles relating to rights/permissions
- Cross reference tables
- User activity on information blocks
- User activity on categories
- Similar users for each user
- Visited/aware categories (to generate lists of new categories of interest)
- Bookmarks for each user
- Roles to categories
Process Details
Rights
- Log in/out
- Read information block
- Suggest information block
- Edit information block
- Approve/post information block
- Evaluate information block (sucks/rocks on block)
- Disable information block
- Vote on election type information block
- Vote on poll type information block
- Observe chat information block (no text entry widget)
- Participate in chat information block
- Change preferences
- Set user id (hardcoded to apply only to administrators)
- Sign up
- System tuning (hardcoded to apply only to administrators)
Recommended roles
Each role in this list inherits all the characteristics of less powerful roles except for the "sign up" right. This does not imply a role hierarchy. Roles are to be set up individually and not arranged in a hierarchy.
- Guest
- Read information block
- Suggest information block
- Sign up
- Observe chat
- Login/out (so users can get in--logout only appears if logged in)
- Member
- Evaluate information block
- Participate in chat
- Vote on election
- Vote on poll
- Change preferences
- Editor
- Edit information block
- Approve/post information block
- Disable information block
- Admin/Janitor (role 0 is hardcoded as admin)
- Set user id
- System tuning
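The recommended roles above can be sketched as flat sets of rights. This is a minimal illustration, not the engine's design: the enum constant names, the `RecommendedRoles` and `CategoryRoles` classes, and the choice to treat unassigned users as guests are all assumptions. Each role copies the rights of the less powerful role (minus "sign up") into its own set, so no hierarchy exists at runtime, and roles are looked up per category so privileges in one category do not carry over to another.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Rights from the "Rights" list; the constant names are illustrative.
enum Right {
    LOG_IN_OUT, READ_BLOCK, SUGGEST_BLOCK, EDIT_BLOCK, APPROVE_BLOCK,
    EVALUATE_BLOCK, DISABLE_BLOCK, VOTE_ELECTION, VOTE_POLL,
    OBSERVE_CHAT, PARTICIPATE_CHAT, CHANGE_PREFERENCES,
    SET_USER_ID, SIGN_UP, SYSTEM_TUNING
}

// Hypothetical holder: every role is its own flat set of rights.
final class RecommendedRoles {
    static final Set<Right> GUEST = EnumSet.of(
        Right.READ_BLOCK, Right.SUGGEST_BLOCK, Right.SIGN_UP,
        Right.OBSERVE_CHAT, Right.LOG_IN_OUT);

    static final Set<Right> MEMBER = extend(GUEST,
        Right.EVALUATE_BLOCK, Right.PARTICIPATE_CHAT,
        Right.VOTE_ELECTION, Right.VOTE_POLL, Right.CHANGE_PREFERENCES);

    static final Set<Right> EDITOR = extend(MEMBER,
        Right.EDIT_BLOCK, Right.APPROVE_BLOCK, Right.DISABLE_BLOCK);

    static final Set<Right> ADMIN = extend(EDITOR,
        Right.SET_USER_ID, Right.SYSTEM_TUNING);

    // Copies the base rights, drops SIGN_UP, and adds the extras.
    private static Set<Right> extend(Set<Right> base, Right... extras) {
        EnumSet<Right> rights = EnumSet.copyOf(base);
        rights.remove(Right.SIGN_UP);
        Collections.addAll(rights, extras);
        return rights;
    }
}

// Hypothetical per-category lookup: a role granted in one category
// says nothing about any other category.
final class CategoryRoles {
    private final Map<String, Map<String, Set<Right>>> byCategory = new HashMap<>();

    void assign(String category, String user, Set<Right> roleRights) {
        byCategory.computeIfAbsent(category, c -> new HashMap<>()).put(user, roleRights);
    }

    // Assumption: users with no role in a category are treated as guests.
    boolean can(String category, String user, Right right) {
        Map<String, Set<Right>> users =
            byCategory.getOrDefault(category, Collections.emptyMap());
        return users.getOrDefault(user, RecommendedRoles.GUEST).contains(right);
    }
}
```

For example, a user assigned `RecommendedRoles.EDITOR` in one category can edit blocks there but only has guest rights in every other category.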
Category relationships
The top level category defines universals that are legal through all lower level categories. Below that level, however, category roles or definitions do not inherit or propagate to lower levels.
An information block may belong to only one category.
User tracking
- Cookies
- Web session ID posting
- Some combination
Information Blocks
Information blocks can be nested. If the author of an information block is online and in this category, the block will note that (provided the author has marked themselves as willing to chat).
Information blocks will not appear for a user if they do not have rights to
view the block.
The appearance of an information block is determined by XML stored within
the database. This contains layout hints as well as definitions of nested
information blocks and default values or form field data substitutions.
Each information block has a unique ID that allows it to be referenced from
other information blocks and from a URL. This allows it to be linked to from
outside sources, bookmarked in a traditional fashion, as well as being
linked into other blocks by editors.
Distinguishing types of information blocks
Information blocks may be of almost any type. However, some have repercussions (polls, elections, etc.) that act beyond simple data storage. In particular, voting and polling have special rights that are coded in. To allow special rights to appear, information blocks have an interface that lets the system recognize that special permissions are in place.
Installation information blocks
Information blocks may need new data columns or tables added to the database. They should only impact existing tables by adding rows or adding new tables. In no case, except major version changes, should information blocks delete columns or tables. Prior to adding the new information block to the server, the database should be modified.
Anticipated information blocks
- Polling
Stores poll data. Keeps track of who has voted for this issue. Can be evaluated.
- Voting (abstract)
Stores voting data. Keeps track of who has voted for this issue. Performs some action or actions depending on results. These actions are valid in the given category only.
- Change user to some role
- Change some role
- Add a role
- Remove a role
- Approve an information block
- Remove an information block
- Replace an information block
- Change an information block's rating
- Make an information block sticky (put at top or bottom of list, but always visible)
- Ban a user
- Unban a user
- Add a category
- Remove a category
When voting is set up, the editor has several options:
- Length of voting period
- Voting ratio (% yes / % no to succeed)
- Roles that may vote (may limit to editors, etc.)
- Actions to perform if the vote succeeds (presented as an "add action" button and a list of defined actions; this will require an additional "define action" screen)
- Actions to perform if the vote fails (presented the same way)
- Text/html block
Can be evaluated.
- Link block
Links are ordered based on user preferences as well as newness of the links.
- Preferences block
Cannot be evaluated.
- Comments holder
Holds comment information blocks. Cannot be evaluated.
- Comment
Threads via comments holder.
- Chat
Allows chatting in this category. May be placed in a separate window (via Java applet or equivalent) to allow the user to view other categories but keep chatting. Cannot be evaluated.
- Category
Holds other information blocks. Cannot be evaluated.
- Search
Search category titles, authors, or text for strings. Cannot be
evaluated.
- System tuning
Rebuild recommendation cache, change recommendation rebuild period,
write-all message, stop accepting new logins, start accepting new logins.
Cannot be evaluated.
- Roles and Rights
Add, edit, and delete roles. Assign rights when adding or editing a
role. Change the name of a role. (Issue: removing a role with current
members will cause the deletion to fail) Cannot be evaluated.
- Login Module
Log the user in and return the page the user was trying to access.
Cannot be evaluated.
- Logout Module
Log the user out and return the user to a login page. Cannot be
evaluated.
- Highlights
Ordered by user preferences, user profile (activity profile), similar
users, and what items have been seen in the past. This is the
"recommendation" or analysis module that is the really unique part of this
system (and requires all the data). Captures all user clicks and correlates
user classes and activities to find similar users. Cannot be evaluated.
- Bookmark
A special kind of Link block that gets added to by the user. Provides a
bookmark icon on evaluatable info blocks.
- User Viewer
Other users can get information about a user.
- Category Builder
Allow editors to order pages, move elements into categories, link
information blocks, etc.
- Object Cache
Cache most frequently used objects from database. Use this as primary
DBI. Reap after a timeout.
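The Object Cache entry above can be sketched as a small read-through cache with timeout-based reaping. This is only an illustration under assumptions: the class name `ObjectCache`, the `loader` function standing in for the real database read, and the explicit `nowMillis` arguments (used so the behavior is easy to test) are not part of the original outline.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache: callers ask the cache first and it
// falls back to the loader (the real database access) on a miss.
final class ObjectCache<K, V> {
    private static final class Entry<T> {
        final T value;
        volatile long lastUsedMillis;
        Entry(T value, long nowMillis) {
            this.value = value;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long timeoutMillis;

    ObjectCache(Function<K, V> loader, long timeoutMillis) {
        this.loader = loader;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns the cached object, loading it on first use, and records the access time.
    V get(K key, long nowMillis) {
        Entry<V> entry = entries.computeIfAbsent(
            key, k -> new Entry<>(loader.apply(k), nowMillis));
        entry.lastUsedMillis = nowMillis;
        return entry.value;
    }

    // Removes entries that have been idle longer than the timeout; returns how many were reaped.
    int reap(long nowMillis) {
        int removed = 0;
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().lastUsedMillis > timeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

In a running engine, `reap` would presumably be called periodically (for instance by the scheduler utility) with the current clock time.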
Service Level
Any individual page request should take no more than 1.75 seconds to complete. Each server should be able to generate up to 50 page requests per second.
This system is scalable by distributing the load across several tiers. Tiers identified at this point include:
- User browser/applet
- Web server
- Application server
- Object-Relational mapping
- Data multiplexer/manager
- Database
A strongly object-relational database may imply dropping, relocating, or modifying the object-relational mapping layer.
Achieving a highly responsive system can happen via precaching requests.
User recommendations across all categories are compiled when (a) a user logs
in and no current recommendations exist or (b) 20 minutes after the last
recommendations were compiled. This is achieved via a recommendation builder
queue. The timeout is tunable by administrators.
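A minimal sketch of the two rebuild conditions and the builder queue might look like the following. The class name `RecommendationQueue`, its method names, and the use of a `null` timestamp to mean "no current recommendations exist" are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical recommendation builder queue keyed by user id.
final class RecommendationQueue {
    private final Deque<String> pendingUserIds = new ArrayDeque<>();
    private long rebuildPeriodMillis = 20L * 60L * 1000L; // default: 20 minutes

    // Administrators can tune the rebuild period.
    void setRebuildPeriodMinutes(long minutes) {
        if (minutes <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        rebuildPeriodMillis = minutes * 60_000L;
    }

    // (a) no recommendations exist yet, or (b) the last set is older than the period.
    boolean isStale(Long lastCompiledMillis, long nowMillis) {
        return lastCompiledMillis == null
            || nowMillis - lastCompiledMillis >= rebuildPeriodMillis;
    }

    // Queues the user for a rebuild if needed; returns true if the user was added.
    boolean enqueueIfStale(String userId, Long lastCompiledMillis, long nowMillis) {
        if (!isStale(lastCompiledMillis, nowMillis) || pendingUserIds.contains(userId)) {
            return false;
        }
        pendingUserIds.addLast(userId);
        return true;
    }

    // Next user whose recommendations should be compiled, or null if the queue is empty.
    String nextUser() {
        return pendingUserIds.pollFirst();
    }
}
```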
Additionally, for security and convenience, the system will default to a 20
minute idle timeout, at the end of which the next page load will require
reauthentication. This time can be changed to any number of minutes, with 0
indicating no timeout.
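The idle timeout rule is small enough to show directly. In this sketch the class name `IdleTimeout` and the millisecond timestamps are assumptions; only the 20 minute default and the "0 means no timeout" convention come from the text.

```java
// Hypothetical check run on each page load.
final class IdleTimeout {
    private final long timeoutMinutes;

    IdleTimeout() {
        this(20); // default from the service level description
    }

    IdleTimeout(long timeoutMinutes) {
        if (timeoutMinutes < 0) {
            throw new IllegalArgumentException("timeout cannot be negative");
        }
        this.timeoutMinutes = timeoutMinutes;
    }

    // True when the session has been idle long enough that the next page load must reauthenticate.
    boolean requiresReauthentication(long lastActivityMillis, long nowMillis) {
        if (timeoutMinutes == 0) {
            return false; // 0 disables the timeout
        }
        return nowMillis - lastActivityMillis >= timeoutMinutes * 60_000L;
    }
}
```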
End-user interface
The end user will interact with this system via a combination of HTML, Java applets, and frames through a web browser.
The system must be capable of being almost entirely maintained via a web browser. The only activities not possible through a web browser are limited to:
- Adding or removing modules
- Adding or removing columns and tables from the database
- Starting and stopping the server
Recommended Platforms
Because this is to be an open source project, it has the potential of working on any system. To speed development, the following are recommended:
- Linux or FreeBSD operating system
- Apache webserver
- Java for prototyping, C or C++ for final
- MySQL or ODBC database
Areas of investigation
These need to be investigated before making any final decisions on platform and supporting systems:
- Will Apache be fast enough with a compiled-in module to generate the required number of pages per minute, or is a custom webserver required?
- Speed benchmarks for complex joins and stored procedures for MySQL and Postgres
- Raw socket with XML-like protocol for data transfer between certain tiers
- Define the XML language for creating pages.
- Define the XML language for database transfer (query/update dtd, and result/result-set dtd).
Engine Program Flow
Consider that the engine is broken into several discrete layers or
components:
- HTML output generator
Converts XML hints and module output into HTML pages. Usually, this will
include asking modules for HTML output given the XML field as input to the
modules.
- HTML input processor
Processes HTML input forms and looks for a MODULEID hidden input tag.
This will indicate to which module to pass the form input. If the form
contains no data, the first row in the engine's MODULE table will be loaded
which contains a pointer to a module that displays a startup screen.
Additionally, each module must have a first, default row to use to display
data if the form data are malformed or incomplete.
Additionally, process HTML clicks on links and stores the history in the
engine's history table.
- Modules
Each module defines a virtual function/method name that can be resolved
by the engine and stored in the database in the engine's MODULE table. These
entry points will accept an array of tuples containing the form data and
process the data appropriately. This processing will usually take the form
of storing the form results/destination (if appropriate) in the engine's
history table, as well as loading an appropriate response or XML definition
from the database and sending that to the HTML output generator. Recursively calls references for modules and processes XML forms.
- XML reader
Asks the database interface (DBI) for the XML entry from the given table
(usually called by a module). In turn, fills in data fields from the HTML
input processor and engine's storage of the user session. Passes the
completed XML definition to the HTML output module. This is a recursive process until all XML data fields are translated into XML HTML layout hints.
- XML writer
Assumes that the calling module has stored the data in data fields and
replaced the data with the appropriate table and column names for each
field. Stores the XML page into the specified table via the DBI.
- Database Interface (DBI)
Converts data requests to XML database interchange format. Sends that
XML request to a database manager (preferred) or database via ODBC. Converts
the returns back into the specified types and returns the values to the
calling function.
- User Sessions
Provides an abstraction to users. Uses DBI to retrieve user data on
request. Writes changes to user data via DBI.
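The MODULEID lookup performed by the HTML input processor could be sketched as below. The names are hypothetical (`ModuleDispatcher`, `register`, `dispatch`), a `LinkedHashMap` stands in for the engine's MODULE table, and modules are reduced to functions from form data to output text. Falling back to the startup module when the MODULEID is unknown is an assumption; the text only specifies the fallback for an empty form.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: routes a submitted form to a module entry point.
final class ModuleDispatcher {
    // Insertion order matters: the first entry plays the role of the first
    // row in the MODULE table, which points at the startup screen module.
    private final Map<String, Function<Map<String, String>, String>> modules =
        new LinkedHashMap<>();

    void register(String formId, Function<Map<String, String>, String> entryPoint) {
        modules.put(formId, entryPoint);
    }

    String dispatch(Map<String, String> form) {
        if (modules.isEmpty()) {
            throw new IllegalStateException("no modules registered");
        }
        Map<String, String> data = (form == null) ? Collections.emptyMap() : form;
        String formId = data.get("MODULEID");
        Function<Map<String, String>, String> entryPoint =
            (formId == null) ? null : modules.get(formId);
        if (entryPoint == null) {
            // Empty form or unknown id: use the first (startup) module.
            entryPoint = modules.values().iterator().next();
        }
        return entryPoint.apply(data);
    }
}
```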
Servlet specifics
Assuming that we extend javax.servlet.http.HttpServlet, we don't have to implement the service method. Rather, we implement doGet, doPost, etc. This is easier, but is it faster? Additionally, is it faster or better to run the single-threaded servlet model or the multi-threaded servlet model? I don't think the performance hit is so big, and I think this lets us handle more "simultaneous" connections than if we tried to handle all of these (by spawning threads) from the servlet service method.
Will servlets execute faster if we define a set of static class methods, or if the classes are normal and handled by the VM?
A set of servlet utilities will need to be defined that will give "helper" functionality:
- passing the request object
- processing/storing form elements
- processing pathinfo and querystring to help resolve database (static) references
- getting a persistent session and user object
- look up a module name/reference based on form elements and pathinfo
- pass record id, session object, form data to module
Module Theory
Objects can have fields and methods that can be modified or supplanted by module fields and methods.
To accomplish this, a solution needs to be found. I haven't figured it out yet, but some ideas include:
- Simple
Change the original class when a module is added to the source.
- Medium
Use a makefile or script to modify the source.
- Hard
Modules register their fields and methods on a static object that acts as a registry. All calls to fields and methods go through this object. These must be well defined in a hierarchy. This would happen in a static init method which is called when the engine starts. The constructor would create an instance of the module for the servlet thread to use (in the form of Module module = Module.createModule(id, session, dictionary)).
- Ideal? (Shawn and I thought this through on 4 Feb 99)
Modules register with an event registry. Each event called by this
registry starts at 0 and works to n in priority. Lower level event
handlers are called first. Only one event handler can be registered per
level (throw an exception). Events are generated by modules and include
everything from clicks, logins, and logouts to new postings. Modules can
listen to other modules events, and can also issue changes in priority (in a
positive direction) to skip the event (with a canceled call event--see next
sentence) and avoid the event being noticed by other modules.
Additionally, events can be cancelled by receiving modules, though this is
considered bad form and all listeners will get a canceled call event with
the original event embedded, thrown at the appropriate level.
Should events be registered that aren't added to the event registry yet
(ie, register for a poll event, even if the poll module hasn't been added)?
Each module needs to implement eventlistener interfaces, as well.
Design Patterns hasn't provided a lot of guidance to this point, beyond using Decorators to encapsulate the functionality. This isn't appropriate, as I would like to be able to modify behavior in a global namespace.
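One possible reading of the "Ideal?" event registry idea is sketched below. All names (`ModuleEvent`, `CanceledEvent`, `EventHandler`, `EventRegistry`) are hypothetical, and the sketch covers only part of the idea: one handler per priority level, handlers called from level 0 upward, and a cancel that makes every later level receive a canceled event wrapping the original. Priority changes and per-event-type registration are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event type; real events would carry clicks, logins, postings, etc.
class ModuleEvent {
    final String name;
    private boolean canceled;

    ModuleEvent(String name) {
        this.name = name;
    }

    void cancel() {
        canceled = true;
    }

    boolean isCanceled() {
        return canceled;
    }
}

// Delivered to later levels once an event has been canceled.
final class CanceledEvent extends ModuleEvent {
    final ModuleEvent original;

    CanceledEvent(ModuleEvent original) {
        super("canceled:" + original.name);
        this.original = original;
    }
}

interface EventHandler {
    void handle(ModuleEvent event);
}

final class EventRegistry {
    // TreeMap keeps handlers sorted by level, so iteration runs 0..n.
    private final Map<Integer, EventHandler> handlersByLevel = new TreeMap<>();

    void register(int level, EventHandler handler) {
        if (level < 0) {
            throw new IllegalArgumentException("level must be >= 0");
        }
        if (handlersByLevel.containsKey(level)) {
            throw new IllegalStateException("level " + level + " already has a handler");
        }
        handlersByLevel.put(level, handler);
    }

    void fire(ModuleEvent event) {
        ModuleEvent current = event;
        // Copy so handlers may register new handlers while the event is in flight.
        List<EventHandler> handlers = new ArrayList<>(handlersByLevel.values());
        for (EventHandler handler : handlers) {
            handler.handle(current);
            if (current == event && event.isCanceled()) {
                current = new CanceledEvent(event);
            }
        }
    }
}
```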
Module Signature
module(String id, Session session, Dictionary form)
id
Record id to load. -1 or null if not present.
session
Session object to use for getting/setting and grabbing the user object/info.
form
A hashtable, dictionary, or XML structure containing the form data. Module parses to determine actions. May be null.
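As a rough illustration of this signature, the sketch below shows a module entry point that tolerates a missing record id and a missing form. The interface name `BlockModule`, the `UserSession` stand-in, the `String` return value, the "action" form field, and the use of `Map` in place of `Dictionary` are all assumptions.

```java
import java.util.Map;

// Hypothetical stand-in for the engine's session object.
final class UserSession {
    final String userName;

    UserSession(String userName) {
        this.userName = userName;
    }
}

// Hypothetical module entry point following the (id, session, form) signature.
interface BlockModule {
    String handle(String id, UserSession session, Map<String, String> form);
}

final class TextBlockModule implements BlockModule {
    @Override
    public String handle(String id, UserSession session, Map<String, String> form) {
        // A null or "-1" id means no record was requested.
        boolean hasRecord = id != null && !id.equals("-1");
        // A null form means there is nothing to parse; fall back to viewing.
        String action = (form == null) ? "view" : form.getOrDefault("action", "view");
        String record = hasRecord ? "record " + id : "default record";
        return action + " " + record + " for " + session.userName;
    }
}
```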
Utility Classes
- Queue
Add elements into a self-managing, growing queue.
- add(Object)
Adds an object to the queue.
- Object next()
Gets the head element off the queue.
- Object peek()
Look at the head element, but don't remove it from the queue.
- MethodQueue
Add method calls to a queue. Extends Queue.
Methods
- add(Method, Object [] args)
Enter the method with the given arguments into the queue.
- next()
Calls the next method in the queue.
Inner Classes
- MethodCall
Represents a method call.
- Scheduler
Schedule calls to an object (ie, notify suspended threads, execute utility methods). Extends MethodQueue.
Methods
- add(Method, Object [] args, int priority, int sleep)
Call method after sleep seconds. Priority and sleep feed the scheduler thread.
Inner Classes
- SchedulerThread
Builds a list of methods to be called and the times to call them, ordered by priority. Calls each method sequentially (within the scheduler thread). This may need to change later if it becomes a performance drag. I can't use a thread pool, though, because a scheduler is a component of a thread pool. :-P
- Thread Pool
Creates two thread groups (available and in-use). The thread pool manages the allocation of these threads into these groups, keeps track of when each thread was last requested, and reaps threads that have not been released to the thread pool after a timeout.
Parameters
- Timeout in seconds (0 == never)
- Optimal percentage threads available
- Avail thread percentage low to create threads (e.g. .05)
- Avail thread percentage high to reap threads (e.g. .10)
- Threads to create at a time
- Threads to kill at a time
- Maximum threads
- Minimum threads
- Check frequency (use scheduler)
Methods
- Thread getThread()
Gets a thread from the available group, puts the thread in the in use group. Resets the last handled time/turns on reaping. Returns a reference to the thread. Creates a thread if none available.
- void putThread(Thread)
Returns the thread to the available group, removes it from the in use group. Resets the last handled time/turns off reaping. Called from inside the thread at the end of processing.
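The Queue and MethodQueue utilities outlined above could be sketched with Java reflection as follows. This is an illustration rather than the planned implementation: it folds the queue into one class instead of extending a separate Queue, adds a `target` argument (which `Method.invoke` needs for instance methods), and includes a hypothetical `lookup` helper that wraps the checked reflection exception.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.Deque;

final class MethodQueue {
    // Inner class from the outline: one pending method call.
    static final class MethodCall {
        final Object target;
        final Method method;
        final Object[] args;

        MethodCall(Object target, Method method, Object[] args) {
            this.target = target;
            this.method = method;
            this.args = args;
        }

        Object invoke() {
            try {
                return method.invoke(target, args);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("queued call failed: " + method.getName(), e);
            }
        }
    }

    private final Deque<MethodCall> calls = new ArrayDeque<>();

    // Enters the method with the given arguments into the queue; nothing runs yet.
    void add(Object target, Method method, Object... args) {
        calls.addLast(new MethodCall(target, method, args));
    }

    // Calls the method at the head of the queue and returns its result.
    // Throws NoSuchElementException if the queue is empty.
    Object next() {
        return calls.removeFirst().invoke();
    }

    // Looks at the head call without removing or running it (null if empty).
    MethodCall peek() {
        return calls.peekFirst();
    }

    boolean isEmpty() {
        return calls.isEmpty();
    }

    // Hypothetical convenience for finding a public method by name.
    static Method lookup(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            return type.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A Scheduler along the lines of the outline would add a priority and a delay to each `MethodCall` and drain the queue from its own thread.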
Engine Design
General process/design.
The static engine starts in the VM and initializes pools of servlet handlers
- Question: how to make a class start up with the VM started by JServ?
- Utilizes a thread pool to hold threads for connections from JServ.
- Creates a new engine for the thread with Engine.create(threadpool.getThread(), servlet environment) [probably from the servlet start method]
Engine Data Dictionary
Context: Engine
Table: Module
- Function VARCHAR(32) The virtual function name to call.
- FormID VARCHAR(32) The contents of the MODULEID hidden form field.
- Description VARCHAR(255) Describes what this module does.
Database Manager
The Database Manager (DBM) encapsulates the tier that separates the modular program from the database back end. It consists of a Database Interface and JDBC connection to a database and provides JDBC connection pooling and connection thread pooling to increase speed when sending queries.
Todo: How do I deal with multiple database ids and logins with the following model?
Design Patterns
The DBM uses two thread pools: a JDBC connection pool and a servlet connection pool. The DBM is a Facade for JDBC and the database. The XML query/update and result/result-set dtds provide a Facade for SQL and result/result-sets respectively.
Note: the caller doesn't need to pick up a result, though one will always be returned, even in an error case.
Suggested DBM Classes
- Listener
Listens for new connections from the servlet/client.
- ConvertXMLtoSQL
Converts an XML query to a SQL query.
- ClientConnection
Represents a thread or connection to the client. Managed by the client connection thread pool. Manages the end-to-end transaction (from client to db to client, including XML/SQL translations).
- SQLConnection
Represents a thread or connection to the database. Managed by the JDBC connection thread pool. Manages the entire JDBC query/update and reports results/errors.
Methods
- boolean reapable()
If ctime() - lastUsed() > reapTime && active()
- boolean active()
If connected to db
- boolean inUse()
if !in SQLConnectionPool && !reapable()
- ConvertResultToXML
Converts a result, result set, or error to XML.
- ClientPool
A thread pool that manages ClientConnections.
- SQLConnectionPool
A thread pool that manages SQLConnections.
- cleanUpZombies()
Scheduled method call. Check each connection to see if it is reapable() or !active() && ! in SQLConnectionPool.
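The reapable(), active(), and inUse() predicates and the cleanUpZombies() check can be written out directly. The sketch below tracks only the bookkeeping state, with no real JDBC connection; the class name `SqlConnectionState`, the `checkOut`/`checkIn`/`disconnect` methods, and the explicit `nowMillis` arguments are assumptions.

```java
// Hypothetical bookkeeping for one pooled database connection.
final class SqlConnectionState {
    private final long reapTimeMillis;
    private long lastUsedMillis;
    private boolean connected = true;
    private boolean inPool = true;

    SqlConnectionState(long reapTimeMillis, long createdMillis) {
        this.reapTimeMillis = reapTimeMillis;
        this.lastUsedMillis = createdMillis;
    }

    // Taken from the pool to run a query.
    void checkOut(long nowMillis) {
        inPool = false;
        lastUsedMillis = nowMillis;
    }

    // Returned to the pool after the query completes.
    void checkIn(long nowMillis) {
        inPool = true;
        lastUsedMillis = nowMillis;
    }

    void disconnect() {
        connected = false;
    }

    // If connected to db.
    boolean active() {
        return connected;
    }

    // If ctime() - lastUsed() > reapTime && active().
    boolean reapable(long nowMillis) {
        return nowMillis - lastUsedMillis > reapTimeMillis && active();
    }

    // If !in SQLConnectionPool && !reapable().
    boolean inUse(long nowMillis) {
        return !inPool && !reapable(nowMillis);
    }

    // The cleanUpZombies() test: reapable() or (!active() && !in SQLConnectionPool).
    boolean zombie(long nowMillis) {
        return reapable(nowMillis) || (!active() && !inPool);
    }
}
```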
Shutting down the Database Manager
Starting the DBM is covered in the DBM Thread Model. Stopping threads involves:
- Telling the listener to deny new connections (stop).
- Waiting for current transactions to finish/timeout (return to ClientConnection pool).
- Telling the pools to kill their managed threads.
Database Interface
The Database Interface (DBI) is the process that converts XML queries to SQL and back into XML results. This is embedded in the client communication thread and takes the form of:
- an XML to SQL conversion process
- a process that gets a JDBC connection for the JDBC connection pool and submits the query/update
- a process that gets a JDBC result/result set and converts it to XML
Please see the DBM thread model for details of execution and hints for implementation.
thread | init/constructor | wait | parse | execute | convert | return to client | wait
main | build client & connection pools (entry point) | wait for client | get client thread (connect to client) | |
check | check client pool/check conn pool (repeat) | |
client | create | | XML to SQL | get SQL connection | result to XML | return XML to client | close connection/return to pool
conn | create/connect | | execute/wait | get result | return to pool | |
client | | connect | call query | wait for result | got XML result | close DBM connection
XML dtd design
In general, XML variables can reference forms, session info, object references, database record ids, etc. Special XML types defined should include:
- Form elements
- Object processor names/module names
- Object/module field names and data
- Record ids, tables, and contents
- Literal (unparsed) values--to be stored or displayed without further translation.
Semiformal definition:
(fundamental classes--I don't believe we will actually implement this)
- object
Represents a serializable object. May contain field tags.
container
Parameters
- class =
The fully qualified class name (ie, "java.lang.String"). Used for reflection to recreate object.
- field
Represents a serializable field in an object. Values are represented as
contained type or object tags.
container
Parameters
- name =
The field name. Used to assign values.
- type
Represents a non-object value.
Parameters
- type =
The fundamental type (boolean, char, byte, short, int, long, float,
double).
- value =
The value.
- bits =
Number of significant bits to assign to the value (for cross language casting).
Optional.
Escaping Characters: Use the syntax '\char' to escape the given
character (ie, '\<' to escape a left bracket).
Special Types: Consider array, which may contain objects or types.
In this implementation, array is considered to be an object of
class name array which will then perform the necessary
initialization code.
Intent: This is not meant to be a human-oriented protocol. Humans can
read it, but it is intended to be read by a program that will use reflection
to recreate the object as specified by class name and initialize the field
values as given for the new object.
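The reflection step described in the intent can be sketched as follows, assuming the XML has already been parsed into a class name and a map of field names to values. `SamplePoint` and `ObjectRebuilder` are hypothetical names used only for the demonstration, and the sketch handles fields declared directly on the class, not inherited ones.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Sample class used only for the demonstration.
final class SamplePoint {
    int x;
    int y;
    String label;
}

final class ObjectRebuilder {
    // Recreates an object from its class name and assigns each named field.
    static Object rebuild(String className, Map<String, Object> fieldValues) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            Object instance = constructor.newInstance();
            for (Map.Entry<String, Object> entry : fieldValues.entrySet()) {
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                // Boxed values such as Integer are unwrapped for primitive fields.
                field.set(instance, entry.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot rebuild " + className, e);
        }
    }
}
```

The class must have a no-argument constructor for this approach to work, which is a constraint the object tag definition would need to state.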
DBM oriented:
- Connection
A JDBC connection.
Parameters
- url =
The JDBC URL, such as jdbc:oracle:thin:@dbhost:1520:mischief
- user =
The user id required to log into the database. Optional.
- pass =
The password required to log into the database. Optional.
- min =
The minimum number of connections to open. Optional. Defaults to 1.
- max =
The maximum number of connections to open. Optional. Defaults to 255.
- timeout =
Seconds to wait for a result to return. Can be set on an individual query, as well. Optional. Defaults to 300.
- cleanup =
Seconds to wait before closing idle connections. When the number of open connections is at or below the minimum, idle connections are still closed, but new ones are opened in an attempt to keep the minimum available. Optional. Defaults to 6000.
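The Connection parameters and their defaults can be captured in a small settings object. In this sketch the class name `ConnectionSettings` and the attribute map input are assumptions, and treating url as required is an inference from it being the only parameter not marked optional.

```java
import java.util.Map;

// Hypothetical holder for the attributes of a Connection tag.
final class ConnectionSettings {
    final String url;
    final String user;          // optional, may be null
    final String pass;          // optional, may be null
    final int min;              // defaults to 1
    final int max;              // defaults to 255
    final int timeoutSeconds;   // defaults to 300
    final int cleanupSeconds;   // defaults to 6000

    ConnectionSettings(Map<String, String> attributes) {
        url = attributes.get("url");
        if (url == null) {
            throw new IllegalArgumentException("url is required");
        }
        user = attributes.get("user");
        pass = attributes.get("pass");
        min = intOrDefault(attributes, "min", 1);
        max = intOrDefault(attributes, "max", 255);
        timeoutSeconds = intOrDefault(attributes, "timeout", 300);
        cleanupSeconds = intOrDefault(attributes, "cleanup", 6000);
        if (min > max) {
            throw new IllegalArgumentException("min cannot exceed max");
        }
    }

    private static int intOrDefault(Map<String, String> attributes, String name, int fallback) {
        String raw = attributes.get(name);
        return (raw == null) ? fallback : Integer.parseInt(raw.trim());
    }
}
```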