Modular Architectures with Ruby
Posted by Anilrgowda (Administrator), 04-Jan-2007, 04:02 AM

Summary
Any reasonably complex end-user application is going to require some sort of customization and enhancement for effective deployment. This article shows one way to create a modular architecture as a way of leaving the door open for advanced users or consultants who want to extend the functionality without modifying the source.
End-user applications often require some customization and enhancement for effective deployment. A modular architecture is one where the user can create modules that conform to well-described APIs and plug them into the application to extend the functionality. It’s a way of leaving the door open for advanced users or consultants who want to extend the functionality without modifying the source.
One example of a popular modular application is the Apache web server[0]. Apache defines a set of processing steps in building a web page and allows programmers to write modules that may hook into one or more of these steps. Another example is the JavaDoc[1] comment processing system for Java. JavaDoc has a flexible Doclet back end. The basic Doclet produces HTML help files, but the interface has also been used in a wide variety of applications, including the popular XDoclet code generator.
Perhaps the best example of a modular API is Eclipse[2]. Eclipse is really just a modular framework that handles sets of interlocking modules that build IDEs, thick client applications, even portable device applications. If you want a reference work for how modular APIs are done, check out Eclipse.
I find that there are design smells that suggest when a modular architecture would be a good solution. Some of these are:
<dl>
<dt>Builders</dt><dd>Any time the Builder design pattern is followed the builders could be implemented using modular architecture. The builder pattern has the code that builds some output using a builder object to do the actual construction work. It’s often used for building portable UIs where one builder can build text while another builds HTML, and so on.</dd>
<dt>Adapters</dt><dd>When systems interface with each other there is always a slot for a modular architecture. If modules are used then the adapter interface can be used to connect to one of many services as opposed to one particular service.</dd>
<dt>Math libraries and functions</dt><dd>Graphing and spreadsheet applications can extend their function libraries using a modular architecture.</dd>
<dt>Processing strategies</dt><dd>As with Apache, any time you have a complicated transaction it’s possible to use a modular architecture. This allows the user to define exactly how a transaction is processed. This can work for business logic at almost any level; for example, the validation of a customer record, or the sending out of notifications. </dd>
<dt>Graphic objects and filters</dt><dd>Applications like Photoshop and The GIMP extend their graphics system using modular APIs so people can create custom commands, graphic objects, and effects. </dd>
</dl>
For the article I’m going to create a simple modular system for reading subscription sources, such as RSS, RDF, and Atom. One can then extend the system to handle new subscription formats in the field without having to change the main code.
Specin’ Out the API

Instead of starting with a complete example I’ll work through building a modular interface just like I did in practice. That starts with some simple test code and a small set of parsers. In fact, I don’t even break out the modules to start with. I start with everything in just two files just to make sure the API is right, then move to a modular architecture so that I’m not trying to solve multiple problems simultaneously.
Here is the test code. It creates a new RSS parser and then gets the types of feeds that it will handle. It also iterates through all of the available parsers and prints them out.

<pre>
# The leading "./" lets newer Rubies find the file when run from this directory
require "./parse_mods.rb"

# Create a new RSS parser and ask it for its type
a = RSSParser.new
print "Building an RSS parser:\n"
p a.get_type()
print "\n"

# Iterate through all of the available types
print "Available parser types:\n"
Parser.parsers.each { |parser_class|
  p parser_class
}
</pre>
Here is what it looks like when I run it:

<pre>
% ruby test.rb
Building an RSS parser:
"RSS"

Available parser types:
RSSParser
RDFParser
%
</pre>
And here is the code for the parsers.

<pre>
class Parser
  # Class-level list of every registered parser class
  @@parsers = []

  def get_type()
    return ""
  end
  def parse( xml )
    return nil
  end

  def Parser.add_parser( p )
    @@parsers.push( p )
  end
  def Parser.parsers()
    return @@parsers
  end
end

class RSSParser < Parser
  def get_type()
    return "RSS"
  end
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

Parser.add_parser( RSSParser )

class RDFParser < Parser
  def get_type()
    return "RDF"
  end
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

Parser.add_parser( RDFParser )
</pre>
Factoring to Factories

The next step is to refactor the code to use the Factory pattern. In that pattern each parser will have two classes. The first is the parser itself, and the second is a factory that creates parsers of that type. Why factories? Because the code should be able to get the types of feeds the parser can handle without creating a parser.
The newly refactored code looks like this:
<pre>
class ParserFactory
  def get_type()
    return ""
  end
  def create()
    return nil
  end

  @@factories = []

  def ParserFactory.add_factory( p )
    @@factories.push( p )
  end

  def ParserFactory.factories()
    return @@factories
  end

  def ParserFactory.parser_for( type )
    @@factories.each { |pfc|
      pf = pfc.new()
      if pf.get_type() == type
        return pf.create()
      end
    }
    return nil
  end
end

class Parser
  def parse( xml )
    return nil
  end
end

class RSSParser < Parser
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

class RSSFactory < ParserFactory
  def get_type()
    return "RSS"
  end
  def create()
    return RSSParser.new()
  end
end

ParserFactory.add_factory( RSSFactory )

class RDFParser < Parser
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

class RDFFactory < ParserFactory
  def get_type()
    return "RDF"
  end
  def create()
    return RDFParser.new()
  end
end

ParserFactory.add_factory( RDFFactory )
</pre>
Now the factories register themselves with a factory base class. This class has the helpful <code>parser_for</code> method, which returns a parser for a given input type.
The nice thing about this refactoring is that the <code>Parser</code> classes do just what they should: they take XML and return a list of articles.
The test code needs to be changed around a little bit to handle this new factory system:
<pre>
# The leading "./" lets newer Rubies find the file when run from this directory
require "./parse_mods.rb"

# Create new factory and instantiate a new parser
af = RSSFactory.new
a = af.create()
print "Building an RSS parser:\n"
p a
print "\n"

# Iterate through all of the available types
print "Available parser types:\n"
ParserFactory.factories.each { |factory_class|
  a = factory_class.new()
  p a.get_type()
}
print "\n"

# Check the new parser_for method
print "Request a parser for RDF:\n"
pf = ParserFactory.parser_for( "RDF" )
p pf
</pre>
And I run it like this:
<pre>
% ruby test.rb
Building an RSS parser:
#<RSSParser:0x27b53d8>

Available parser types:
"RSS"
"RDF"

Request a parser for RDF:
#<RDFParser:0x27b4da8>
%
</pre>
The first part of the code creates the RSS parser directly. The second section walks through all of the available parsers. And the third section selects a parser by name.
The UML for the refactored code looks like Figure 2.

Figure 2. The factories and their related parsers
The list of available parsers now lives in <code>ParserFactory</code>, and each parser has its corresponding parser factory, which creates it.
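To make the lookup concrete, here is a minimal, self-contained sketch of the same idea written with <code>Enumerable#find</code>. The <code>Sketch</code>-prefixed class names are hypothetical and exist only so the example runs on its own; it is one possible restatement, not the article's exact code.

```ruby
# Sketch only: a compact restatement of the factory lookup.
# All Sketch* names are hypothetical stand-ins for the article's classes.
class SketchParserFactory
  @@factories = []

  def self.add_factory(factory_class)
    @@factories.push(factory_class)
  end

  # Returns a new parser for the given type, or nil when no factory matches.
  def self.parser_for(type)
    factory_class = @@factories.find { |fc| fc.new.get_type == type }
    factory_class ? factory_class.new.create : nil
  end

  def get_type
    ""
  end

  def create
    nil
  end
end

class SketchRSSParser; end

class SketchRSSFactory < SketchParserFactory
  def get_type
    "RSS"
  end

  def create
    SketchRSSParser.new
  end
end

SketchParserFactory.add_factory(SketchRSSFactory)

parser  = SketchParserFactory.parser_for("RSS")   # a SketchRSSParser instance
missing = SketchParserFactory.parser_for("Atom")  # nil, nothing registered for Atom
puts parser.class
puts missing.inspect
```

Because an unknown type yields <code>nil</code>, calling code probably wants to check the result before using it.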
Making It Modular

All right, enough playing around with what the API should look like. It’s time to make it modular by creating a <code>mods</code> directory and taking parts of the original large file and chopping it up into a module for each format type.
Shown below is the source for the RDF module. It contains both the parser and the parser factory.

<pre>
class RDFParser < Parser
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

class RDFFactory < ParserFactory
  def get_type()
    return "RDF"
  end
  def create()
    return RDFParser.new()
  end
end

ParserFactory.add_factory( RDFFactory )
</pre>
The second file is the RSS parser.

<pre>
class RSSParser < Parser
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

class RSSFactory < ParserFactory
  def get_type()
    return "RSS"
  end
  def create()
    return RSSParser.new()
  end
end

ParserFactory.add_factory( RSSFactory )
</pre>
Then comes the updated modules library.

<pre>
class ParserFactory
  def get_type()
    return ""
  end
  def create()
    return nil
  end

  @@factories = []

  def ParserFactory.add_factory( p )
    @@factories.push( p )
  end

  def ParserFactory.factories()
    return @@factories
  end

  def ParserFactory.parser_for( type )
    @@factories.each { |pfc|
      pf = pfc.new()
      if pf.get_type() == type
        return pf.create()
      end
    }
    return nil
  end

  # Require every .rb file found in the given directory
  def ParserFactory.load( dirname )
    Dir.open( dirname ).each { |fn|
      next unless ( fn =~ /[.]rb$/ )
      # Use an absolute path: newer Rubies no longer search the
      # current directory when resolving a require
      require File.expand_path( fn, dirname )
    }
  end
end

class Parser
  def parse( xml )
    return nil
  end
end
</pre>
The important part comes with the <code>load</code> class method, which loads the modules from a specified directory. The loading is done with the <code>require</code> function, which reads the code in from the module.
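As a rough illustration of directory loading, the sketch below writes two throwaway module files into a temporary directory and requires them in sorted order. <code>LoaderSketch</code>, <code>load_modules</code>, and the file names are all hypothetical; the sorted order and absolute paths are my assumptions about what makes loading predictable, not something the article specifies.

```ruby
require "tmpdir"

# Hypothetical host-side loader standing in for ParserFactory.load.
class LoaderSketch
  @loaded = []

  class << self
    attr_reader :loaded
  end

  # Require every .rb file in dirname, in sorted order, using absolute paths
  # so the result does not depend on the current directory or $LOAD_PATH.
  def self.load_modules(dirname)
    Dir.glob(File.join(File.expand_path(dirname), "*.rb")).sort.each do |path|
      require path
      @loaded << File.basename(path)
    end
  end
end

Dir.mktmpdir do |dir|
  # Two fake modules plus one file that should be ignored
  File.write(File.join(dir, "b_mod.rb"), "B_MOD_LOADED = true\n")
  File.write(File.join(dir, "a_mod.rb"), "A_MOD_LOADED = true\n")
  File.write(File.join(dir, "notes.txt"), "ignored\n")
  LoaderSketch.load_modules(dir)
end

p LoaderSketch.loaded
```

Sorting is optional, but it keeps the registration order stable across file systems, which helps when the order of factories matters.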
Figure 3 shows the relationship between the module files and the classes they contain and the classes in the host application.

Figure 3. The relationship between the files and the classes
One thing that does trouble me is this statement to register each factory:
<code>ParserFactory.add_factory( RDFFactory )</code>
In Ruby we can do better because classes actually get notified when they are subclassed. No kidding. The code that follows replaces the <code>add_factory</code> method with a method called <code>inherited</code>, which is a standard Ruby hook method.

<pre>
class ParserFactory
  ...
  def ParserFactory.inherited( pf )
    @@factories.push( pf )
  end
  ...
end
</pre>
Adding More Biographic Detail

I also have a problem with the get_type method on the factory. I think that in the long run I’m going to want more biographical information on each module. For example, the author, the module version, the description, inputs, outputs, etc.
Perhaps the easiest way to add biographical information to each module would be with a YAML encoded constant string attached to each factory class. This is shown on the RDF module below:
<pre>
class RDFParser < Parser
  def parse( xml )
    # Parse the XML up and return some known format
    return nil
  end
end

class RDFFactory < ParserFactory
  INFO=<<INFO
type: RDF
author: Jack
description: An RDF parser
INFO
  def create()
    return RDFParser.new()
  end
end
</pre>
I then add some code to the <code>ParserFactory</code> base class that reads the YAML and implements not only <code>get_type</code> but also <code>get_author</code>, <code>get_description</code> and anything else I want:
<pre>
require 'yaml'

class ParserFactory
  ...
  def get_info()
    return YAML.load( self.class::INFO )
  end
  def get_type()
    return get_info()['type']
  end
  def get_author()
    return get_info()['author']
  end
  def get_description()
    return get_info()['description']
  end
  ...
end
</pre>
The code to get the constant from the subclass is pretty simple. The <code>get_info</code> method gets the <code>class</code> of the current object and reads its <code>INFO</code> constant.
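One possible variation, sketched below, parses the YAML once per class and caches it instead of re-parsing on every call. The <code>CachedInfoFactory</code> and <code>AtomSketchFactory</code> names, the <code>version</code> field, and the caching itself are assumptions for illustration only.

```ruby
require "yaml"

# Hypothetical stand-in for the article's ParserFactory base class.
class CachedInfoFactory
  # Parse the subclass's INFO constant once and remember the result per class.
  def self.info
    @info ||= YAML.load(self::INFO)
  end

  def get_info
    self.class.info
  end

  def get_type
    get_info["type"]
  end

  # "version" is an assumed extra field, not one the article defines.
  def get_version
    get_info["version"]
  end
end

class AtomSketchFactory < CachedInfoFactory
  INFO = <<INFO
type: Atom
author: Example Author
description: A placeholder Atom module
version: "1.0"
INFO
end

factory = AtomSketchFactory.new
puts "#{factory.get_type} module, version #{factory.get_version}"
```

The cache lives in a class-level instance variable, so each factory subclass keeps its own parsed copy.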
Getting the Job Done

Having gone through all of the effort to build a modular architecture that reads various feed formats, it only seems fitting to actually implement one of them.
First the test code needs to actually get some RSS data:
<pre>
require "net/http"
# The leading "./" lets newer Rubies find the file when run from this directory
require "./parse_mods.rb"
require "rexml/document"

ParserFactory.load( "mods" )

rssp = ParserFactory.parser_for( "RSS" )

items = []
Net::HTTP.start( 'rss.cnn.com' ) { |http|
  rss = http.get( '/rss/cnn_topstories.rss' )
  doc = REXML::Document.new( rss.body )
  items = rssp.parse( doc )
}

items.each { |i|
  print "#{i.title}\n"
  print "#{i.link}\n\n"
}
</pre>
This code starts by loading the modules. It then gets a parser for RSS, loads the RSS from CNN, and creates an REXML DOM model from it. That DOM model goes to the parser, which creates an array of object structures that hold the title, link, and description.
The code for the real parser module is below:
<pre>
require 'ostruct'

class RSSParser < Parser
  def parse( xml )
    items = []
    # Walk every item element and collect its link, description, and title
    xml.each_element( '//item' ) { |item|
      link = ""
      description = ""
      title = ""
      item.each_element( 'link' ) { |l| link = l.text.to_s }
      item.each_element( 'description' ) { |l| description = l.text.to_s }
      item.each_element( 'title' ) { |l| title = l.text.to_s }
      items << OpenStruct.new( :link => link, :description => description, :title => title )
    }
    return items
  end
end

class RSSFactory < ParserFactory
  INFO=<<INFO
type: RSS
author: Jack
description: An RSS parser
INFO
  def create()
    return RSSParser.new()
  end
end
</pre>
It’s pretty simple. The code first iterates through all of the <code>item</code> tags, then within each <code>item</code> tag it finds the <code>link</code>, <code>title</code>, and <code>description</code> tags. For each item it creates an <code>OpenStruct</code> object (part of the standard Ruby installation) and adds it to an array of articles, which it returns.
The output on the day I wrote this article looks like this:
<pre>
% ruby test.rb
Pumps begin draining New Orleans
http://www.cnn.com/rssclick/2005/US/...cnn_topstories

Violence rages in Iraq hotspots
http://www.cnn.com/rssclick/2005/WOR...cnn_topstories

Rehnquist to lie in repose at Supreme Court
http://www.cnn.com/rssclick/2005/POL...cnn_topstories

Castro: U.S. hasn't answered aid offer
http://www.cnn.com/rssclick/2005/WOR...cnn_topstories

Indonesia jet crash kills 147
http://www.cnn.com/rssclick/2005/WOR...cnn_topstories

Copter drops concrete on cable car in Austria
http://www.cnn.com/rssclick/2005/WOR...cnn_topstories
</pre>
There are several ways you could extend this code. One option would be to have a two-phase pass with the modules. In the first pass you hand the REXML document to each parser to see whether it wants to handle it. Then in the second pass the document is handed to the one that thinks it can handle it properly. That way the application doesn’t actually have to know the format of any particular feed.
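The two-phase idea could look something like the sketch below. The <code>can_parse?</code> and <code>parser_for_document</code> methods and every <code>Sniff</code>-prefixed class are hypothetical additions, and checking the root element name is just one assumed way for a module to recognise its format.

```ruby
require "rexml/document"

# Hypothetical base class: can_parse? is an assumed addition,
# not part of the article's API.
class SniffingFactory
  @@factories = []

  def self.inherited(subclass)
    super
    @@factories.push(subclass)
  end

  # Phase one: ask each factory whether it recognises the document.
  # Phase two: build a parser from the first factory that says yes.
  def self.parser_for_document(doc)
    factory_class = @@factories.find { |fc| fc.new.can_parse?(doc) }
    factory_class ? factory_class.new.create : nil
  end

  def can_parse?(_doc)
    false
  end

  def create
    nil
  end
end

class SniffRSSParser; end
class SniffRDFParser; end

class SniffRSSFactory < SniffingFactory
  def can_parse?(doc)
    doc.root && doc.root.name == "rss"
  end

  def create
    SniffRSSParser.new
  end
end

class SniffRDFFactory < SniffingFactory
  def can_parse?(doc)
    # REXML reports the local name, without the "rdf:" prefix
    doc.root && doc.root.name == "RDF"
  end

  def create
    SniffRDFParser.new
  end
end

rss_doc  = REXML::Document.new("<rss version='2.0'><channel/></rss>")
rdf_doc  = REXML::Document.new("<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>")
feed_doc = REXML::Document.new("<feed/>")

puts SniffingFactory.parser_for_document(rss_doc).class
puts SniffingFactory.parser_for_document(rdf_doc).class
puts SniffingFactory.parser_for_document(feed_doc).inspect
```

A document that no module claims comes back as <code>nil</code>, so the host application still decides what to do with unknown feeds.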
Recommendations

Here are some tips for potential modular architecture builders:
<dl><dt>Use the architecture for the application itself</dt><dd>Don’t just reserve the modular architecture for user-contributed modules. If the modular architecture extends user notifications, write all of the notification code as modules. This ensures that the API is thorough and tested.</dd><dt>For complex modules, use directories</dt><dd>If you expect that modules are going to be complex or have lots of associated assets, put the modules into their own directories. This will make it easier to maintain and version them.</dd><dt>Handle pathing</dt><dd>With dynamic languages pathing can be a problem. I recommend altering the path to add the directory that contains the module code before loading the module. That will allow the module to require in its own code. The module writer should never be expected to write all of their code in one file, or to handle their own include path.</dd><dt>Provide a callback object</dt><dd>If the relationship between the module and the host application is bi-directional then the host application should pass in a proxy object that provides access to the functionality required by the module. This will allow the application code to change form as long as the proxy object API remains the same. It also provides a clear contract between the module and the application which will allow other applications to re-use the same modules.</dd><dt>Version</dt><dd>Version both the modules and the API. The code shown here doesn’t do that since I wanted to keep it simple. But for any production code you should support version numbers and only attempt to work with modules that support the current version number or earlier versions.</dd><dt>Host applications should handle portability and pathing</dt><dd>The host application should handle any path manipulation or portability work for the modules. 
This will ensure that modules can run on any operating system or environment without additional code.</dd><dt>Keep it simple</dt><dd>The role and life-cycle of a module should be very well defined within the system. And that role should be fairly well constrained. It’s far better to have several module standards that work with various portions of the system rather than one über module that has access to everything. Such modules are too easily broken when the host application changes its functionality during an upgrade.</dd><dt>Be aware of the complexity cost</dt><dd>Creating a modular application opens up a world of possibility for your application. But that flexibility always comes at a complexity cost. Creating a full-featured module development environment means building quality APIs that are easy to understand and are flexible enough to handle most potential use cases. It also means putting in enough debugging and error handling support to make it easy to develop modules. All of that is time and effort and it’s worth ensuring that the system will be used before going through what it takes to develop it completely.</dd></dl>
I could easily write several articles with just recommendations for modular architectures alone. I’ve written a few and they have been more or less successful. I have also written to various modular architectures and have seen what works and what doesn’t. The common element in all successful modular architectures is thoughtfulness: thoughtfulness in the design of the API, as well as in the care used in creating it and in mentoring those who use the API.
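Two of the tips above, handling pathing and versioning, can be sketched together. Everything here is hypothetical: the <code>HostRegistry</code> module, its <code>register</code> and <code>load_module</code> methods, the <code>main.rb</code> entry-file convention, and the integer API version are assumptions chosen to keep the example small.

```ruby
require "tmpdir"

# Hypothetical host-side registry; none of these names come from the article.
module HostRegistry
  API_VERSION = 2
  @modules = {}

  class << self
    attr_reader :modules
  end

  # Modules call this from their entry file. Anything built for a newer
  # API than the host provides is refused.
  def self.register(name, api_version)
    return false if api_version > API_VERSION
    @modules[name] = api_version
    true
  end

  # Put the module's own directory on the load path, then require its
  # entry file, so the module can require its helper files by name.
  def self.load_module(dir)
    dir = File.expand_path(dir)
    $LOAD_PATH.unshift(dir) unless $LOAD_PATH.include?(dir)
    require File.join(dir, "main.rb")
  end
end

Dir.mktmpdir do |root|
  demo = File.join(root, "demo")
  future = File.join(root, "future")
  Dir.mkdir(demo)
  Dir.mkdir(future)

  # A module split across two files, written for API version 1
  File.write(File.join(demo, "demo_helper.rb"), "DEMO_GREETING = 'hello from the helper'\n")
  File.write(File.join(demo, "main.rb"),
             "require 'demo_helper'\nHostRegistry.register('demo', 1)\n")
  # A module that claims an API version the host does not support yet
  File.write(File.join(future, "main.rb"),
             "HostRegistry.register('future', 3)\n")

  HostRegistry.load_module(demo)
  HostRegistry.load_module(future)
end

p HostRegistry.modules.keys
```

Only the <code>demo</code> module ends up registered; the <code>future</code> module loads but is turned away by the version check. A real host would probably also report the refusal to the user.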
Conclusion

Modular architectures provide an opportunity for your customers to extend your application for their environment. For complex or highly customizable applications this can be a primary requirement. Ruby's facilities for dynamic code loading make modular APIs convenient to write.
Resources

[0] The Apache Web server:
http://apache.org
[1] JavaDoc, Sun's comment processing system for Java:
http://javadoc.sun.com
[2] The Eclipse IDE:
http://eclipse.org


The <code>inherited</code> method mentioned earlier is called when one class inherits from another. The superclass’s <code>inherited</code> method is called with the class object for the subclass.
With that change the calls to <code>add_factory</code> can be removed.
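A minimal, self-contained sketch of the hook in action follows. The <code>Hooked</code>-prefixed class names are hypothetical; the behaviour of <code>inherited</code> itself is standard Ruby.

```ruby
# Hypothetical stand-in for ParserFactory, reduced to the registration logic.
class HookedFactory
  @@factories = []

  # Ruby calls this on the superclass whenever a subclass is defined.
  def self.inherited(subclass)
    super
    @@factories.push(subclass)
  end

  def self.factories
    @@factories
  end
end

# No explicit registration call is needed after either class.
class HookedRSSFactory < HookedFactory; end
class HookedRDFFactory < HookedFactory; end

p HookedFactory.factories
```

One thing to keep in mind: the hook fires as soon as the <code>class</code> line is evaluated, before the subclass body has run, so the registry should only store the class and wait until later to instantiate it.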



In the first pass at the parsers, there are two parsers that descend from the base <code>Parser</code> class. One parser handles RSS and the other handles RDF. Actually, they don’t handle anything at the moment, but I’ll fix that by the end of the article.
The base <code>Parser</code> class acts both as an interface for all of the descendant parsers and as a repository for the list of all parsers. In addition, each type of parser adds itself to the list of all parsers with a call such as <code>Parser.add_parser( RDFParser )</code>.
In UML the system looks like Figure 1 so far.

Figure 1. The first pass at the parsers
The test code is contained in <code>test.rb</code>, and the parsers in <code>parse_mods.rb</code>. The two parsers derive from the base class <code>Parser</code>.