moving this to the list. (I took the liberty of pasting this mail's
predecessor below as well)
This is an excellent discussion...
Here is a proposal of mine of how we can change the message system to
incorporate Tao's suggestions (more Pythonic creation of messages) as
well as some other features (having a connection that messages are
sent to)
Overall Design
The user should never have to see any serialized form of a message. He
or she should always use the higher level classes like Packet that are
sent and received. Serialization and deserialization are done not by
the user but by the UDPConnection.
MessageSystem -> UDPConnection
Represents a udp connection that can send and receive packets.
*Responsibilities:*
* Keeping track of all the circuits it is connected to, or has
sent/received messages on (it seems people want the connection to
represent only a single circuit)
* When sending packets, add flags and sequence number, have a
serializer serialize them and send them to the host
* When receiving packets, deserialize them, keeping track of the
flags that were deserialized
+ keeping track of the packets we received that need acks
+ keeping track of the packets we sent that weren't acked
+ sending acks to those we need to ack
+ resending packets that didn't get acked
*Changes:*
* No longer has the responsibility of creating messages
* No longer has the responsibility of getting data from messages
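The ack bookkeeping listed in the responsibilities above could be sketched roughly as follows. This is only an illustration: `AckTracker` and all of its method names are hypothetical and do not exist in the current code.

```python
class AckTracker:
    """Hypothetical helper: tracks acks owed and acks awaited on one circuit."""

    def __init__(self):
        self.next_sequence = 1
        self.unacked_sent = {}   # sequence number -> packet we sent reliably
        self.acks_to_send = []   # sequence numbers the peer expects us to ack

    def on_send(self, packet, reliable=False):
        """Assign a sequence number; remember the packet if it needs an ack."""
        sequence = self.next_sequence
        self.next_sequence += 1
        if reliable:
            self.unacked_sent[sequence] = packet
        return sequence

    def on_receive(self, sequence, reliable):
        """Record that a reliable incoming packet must be acked by us."""
        if reliable:
            self.acks_to_send.append(sequence)

    def on_ack(self, sequence):
        """The peer acked one of our packets, so stop tracking it."""
        self.unacked_sent.pop(sequence, None)

    def take_pending_acks(self):
        """Return the acks we owe and clear the list."""
        pending, self.acks_to_send = self.acks_to_send, []
        return pending

    def packets_to_resend(self):
        """Packets sent reliably that have not been acked yet."""
        return list(self.unacked_sent.items())
```

A UDPConnection could hold one such tracker per circuit and consult it on every send and receive.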
Packet
Represents everything about a Packet that is needed to be sent and
received. It also has methods to add data to the packet. The design
says that the user shouldn't set the flags or sequence number
themselves; the UDPConnection does that when sending.
*Responsibilities:*
* Knows the flags it is being sent with
* Knows its sequence number
* Knows its payload
* Methods to add blocks of data to the packet
* Passed to a UDPConnection to be sent, and read from the receive
*Changes:*
* Didn't exist before. Exists because this design is a
message-centric design that allows the user to have direct access
to messages and manipulation of them.
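A minimal sketch of what such a Packet might look like, assuming a hypothetical `add_block` method and a payload stored as a list of named blocks (neither is settled in this proposal):

```python
class Packet:
    """Hypothetical sketch: a named message plus the UDP-level fields."""

    def __init__(self, name):
        self.name = name
        self.flags = 0         # filled in by the connection, not the user
        self.sequence = None   # filled in by the connection, not the user
        self.blocks = []       # payload: list of (block_name, data dict)

    def add_block(self, block_name, **data):
        """Append one block of data; returns self so calls can be chained."""
        self.blocks.append((block_name, dict(data)))
        return self


packet = Packet('UseCircuitCode').add_block('CircuitCode', Code=1234)
```

The user only touches the name and the blocks; flags and sequence number stay at their defaults until the connection sends the packet.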
MessageTemplateReader -> UDPDeserializer
Deserializes a buffer into a UDPPacket... hmmm... how does this work?
Whoever is deserializing the buffer MUST know it's a udp packet, so
this must be done by the UDPConnection.
*Responsibilities:*
* Determines if the buffer is a template message, decodes the data
* Also deserializes the flags and sequence number as well
*Changes:*
* It doesn't work off of a single message anymore. The data isn't
gotten by doing a get_data on the reader. Instead, the
deserializer outputs a Packet object which the user has direct
access to.
* Decoding the flags and sequence number wasn't a responsibility
before
MessageTemplateBuilder -> UDPSerializer
Serializes a Packet into a buffer that can be sent.
*Responsibilities:*
* Serializes a Packet, including its flags and sequence number,
header, and payload.
* Outputs a buffer that is ready to be sent.
*Changes:*
* This was previously done by the template builder, which also built
the message (adding data and blocks to the message). These
functionalities are now separated and the Packet is directly
manipulated to add data, and the serializer is used to put it
into network format.
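To make the serializer/deserializer split concrete, here is a rough sketch of packing and unpacking the flags and sequence number around an opaque payload. The header layout used here (one flags byte followed by a 32-bit big-endian sequence number) is an assumption for illustration only; this mail does not specify the wire format, and the function names are hypothetical.

```python
import struct

# Assumed header layout: 1 flags byte + 32-bit big-endian sequence number.
HEADER = struct.Struct('>BI')


def serialize(flags, sequence, payload):
    """Prefix an already-encoded payload with the packet header."""
    return HEADER.pack(flags, sequence) + payload


def deserialize(buffer):
    """Split a received buffer into (flags, sequence, payload)."""
    flags, sequence = HEADER.unpack_from(buffer)
    return flags, sequence, buffer[HEADER.size:]


wire = serialize(0x01, 42, b'payload')
flags, sequence, payload = deserialize(wire)
```

In the proposed design only the UDPConnection would call these; the user never sees `wire`.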
-------------------------------------------------------------------------
(previous mail - discussion between Tao and Lock, with Tao in blue....)
In my approach all domain knowledge is encapsulated in an abstract
object (abstract not in an OO sense) and the same would be true for
the connection. Right now I am not 100% sure what needs to be done
when sending and receiving but I would assume that the Connection
class will handle this.
I was thinking of a Packet class which holds a message and has a
serializer which will take care of getting all information together.
What sort of flags would that be btw? Who defines which one gets set?
Right now I know about the reliable flag and I assume that this is
either defined by the template of message.xml. So both could be known
by the message.
The flags are added on by the client. Any packet can be reliable, any
packet can be resent, any packet can have acks added on. Reliable means
the server needs to ack it. Resent means this is a second (or more)
attempt at the packet, and so may be a duplicate if it is received. The
ack flag means that we have attached acks on the end of our message
(saves network traffic). The templates don't define any of this and so
the message itself can't know. These are added on at the time of sending.
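The three flags described here could be modelled as a small flag set that is decided at send time. This is a sketch only: `PacketFlags` and `flags_for_send` are hypothetical names, and the bit values come from `auto()` rather than from the real protocol.

```python
from enum import IntFlag, auto


class PacketFlags(IntFlag):
    """Illustrative flags; the numeric values are NOT the protocol's."""
    RELIABLE = auto()   # the receiver must ack this packet
    RESENT = auto()     # second (or later) attempt, may be a duplicate
    ACK = auto()        # acks are appended to the end of the message


def flags_for_send(want_ack, is_retry, has_appended_acks):
    """Compute flags when sending; the message itself never stores them."""
    flags = PacketFlags(0)
    if want_ack:
        flags |= PacketFlags.RELIABLE
    if is_retry:
        flags |= PacketFlags.RESENT
    if has_appended_acks:
        flags |= PacketFlags.ACK
    return flags
```

For example, `flags_for_send(True, False, True)` describes a first-attempt reliable packet that also carries acks.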
The Connection class would then have a list of to be acked packages
and would basically do the same as the message system does.
Similarly, the server can make any packet ackable (by setting the send
flag with the ack flag), and so we must ack it, otherwise the server
gets angry. We don't know which packet we will need to ack, we have to
determine that when we receive one.
It would follow the pattern seen in e.g. smtplib, urllib2 (where the
Request is the message). And most network modules actually have a
connection object, such as ftplib, nntplib, gopherlib etc. Not all
have message classes though because if it's just a file you send, then
there is no need to encapsulate this in a separate object.
http://docs.python.org/lib/message-objects.html
I'm starting to like the idea of a Message. Maybe this Message could be
only the payload of the message, with a Packet class (I think you have
suggested this) having the other necessary fields. Then, a serializer
can serialize the packet and a net framework can send it on a connection.
Also be aware that connections will change during the lifetime of the
client. You don't have a single udp connection. You communicate with
neighboring sims, you may switch regions, etc. This causes you to create
a new connection to send on.
But also remember that for udp messages you don't need a connection, you
can simply send the message to any given host. So, it may be extra to
have a connection class doing such a thing because you can use a single
connection to send and receive on. The target we are sending to changes,
but we don't need to change the sockets or anything.
*Current Design: *
This is taken from
http://wiki.secondlife.com/wiki/Pyogp/Documentation/Specification/pyogp.lib.base
messenger = MessageSystem()
host = Host('sim_ip', 'sim_port')  # note: these aren't true values, of course
messenger.new_message("UseCircuitCode")
messenger.next_block("CircuitCode")
messenger.add_data('Code', circuit_code,
                   MsgType.MVT_U32)
messenger.add_data('SessionID',
                   uuid.UUID(session_id),
                   MsgType.MVT_LLUUID)
messenger.add_data('ID',
                   uuid.UUID(agent_id),
                   MsgType.MVT_LLUUID)
messenger.send_message(host)
_Explanation:_
The thing to know about the current design is that it is encapsulated
into a MessageSystem. Everything from building, reading, sending, and
receiving messages all occurs in the Message System (though each of
the sets of functionality are performed by other objects that the
system HAS).
I think this is a quite good explanation where we differ. As said
before this feels very uncommon in the python world to me.
One concern I also have is all the sort of global state in these long
living objects. It doesn't need to but might lead to problems with
threading or coroutines. I would try to keep locking zones as small as
- You create and send a message in coroutine A
- Sending blocks for whatever reasons
- Coroutine B gets activated, creates a message and sends it. Maybe
with the same message system. It also blocks on sending.
- current_msg is now message of B and this is what A sends.
So this would mean that you need to separate message system per
thread. This also means though that it's only one host you connect to
per message system and thus the host could be in the constructor as
it's quite fixed then.
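The interleaving described in the steps above can be reproduced with plain generators standing in for coroutines. This is a toy model, not the real MessageSystem: `SharedMessageSystem` and `task` are hypothetical names, and the bare `yield` stands in for "sending blocks".

```python
class SharedMessageSystem:
    """Toy stand-in for a message system that remembers current_msg."""

    def __init__(self):
        self.current_msg = None
        self.sent = []

    def new_message(self, name):
        self.current_msg = name

    def send_message(self):
        yield                                # sending blocks here
        self.sent.append(self.current_msg)   # sends whatever is current NOW


def task(system, name):
    """Build a message and send it through the shared system."""
    system.new_message(name)
    yield from system.send_message()


system = SharedMessageSystem()
a = task(system, 'A')
b = task(system, 'B')
next(a)         # A creates its message, then blocks in send
next(b)         # B overwrites current_msg, then blocks as well
next(a, None)   # A resumes and sends B's message
next(b, None)   # B resumes and sends its own message
```

After this runs, `system.sent` holds B's message twice and A's message is lost, which is the failure described above.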
Yea, I do agree that it is confusing having the message remembered in
state by the system, builders, and readers. I'm starting to like the
idea of outputting a Message that the user adds data to and sends. Maybe
the Message System could remain as the connection you send through and
receive on, which automatically serializes sending packets and
deserializes receiving packets, keep track of all acks and such.
In my approach you of course also just would have one Connection per
thread/coroutine but additionally you could create messages e.g.
outside a thread and pass it into a thread. The send method would also
just have method-local variables it works with. Packet ID apparently is
something which needs thinking ;-)
For the current design, you don't ever have direct access (handle,
object, reference whatever it may be called) to a Message or a
Connection. Building is delegated to the Message System, which,
underneath the hood, is delegated to the appropriate builder. Sending
is delegated to the Message System, which again, is delegated to the
appropriate sender, this case being a udp_sender. Also note the user
doesn't need to serialize or otherwise perform any functions on the
built message.
One point I experienced in my programmer life was that delegation from
one object to another (and maybe yet to another) makes debugging hard
because you need to keep in mind which method is where (esp.
if they are called the same). As I had to debug such systems I feel
more comfortable with calls you perform directly on the object you
actually want to change.
You'll also notice that the type is given when adding data. This is
not absolutely necessary to have (and can be removed). It is used as
a user-check to make sure the user knows what type of data he or she
is sending. This makes it a bit easier for coders to think their
creation through, as well as other coders who look at it (it may be
confusing to see adding a simple 1 where that 1 can be stored as a
byte, an int, or a long).
In Python you don't care about this. If there is a 1 you mean 1 and
you don't care how it's sent over the wire on a lower level of the
system. Yes, you might run into a problem if you don't know the type
but in my experience this rarely actually leads into problems. Having
no type also makes coding faster as you have to type less and you
don't have to consult the documentation.
So let's get rid of the type-checking, I'm fine with that. It IS just
extra junk I don't feel like typing anyway :)
*Proposed Design 1:*
*A*
conn = UDPConnection(region)
msg = Message('UseCircuitCode',
    Block('CircuitCode',
        ('Code', circuit_code, MsgType.MVT_U32),
        ('SessionID', uuid.UUID(session_id), MsgType.MVT_LLUUID),
        ('ID', uuid.UUID(agent_id), MsgType.MVT_LLUUID)
    )
)
conn.send(msg)
*B*
conn = UDPConnection(region)
msg = Message('UseCircuitCode')
block = Block('CircuitCode',
    ('Code', circuit_code, MsgType.MVT_U32),
    ('SessionID', uuid.UUID(session_id), MsgType.MVT_LLUUID),
    ('ID', uuid.UUID(agent_id), MsgType.MVT_LLUUID)
)
msg.add(block)
conn.send(msg)
BTW, now that I look at it again I think a Message is just a list of
blocks so it could even derive from a list object and add() would be
append. Blocks seem like dicts to me with the exception that they have
a name. But they could be more easily instantiated as
blk = Block('CircuitCode',
    Code=circuit_code,
    SessionID=sessionid,
    ID=agent_id)
msg.append(blk)
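A minimal sketch of that idea, assuming nothing beyond what is said here (a Message is a named list of blocks, a Block is a named dict). The constructor signatures are a guess at what would be convenient, not an agreed API.

```python
class Block(dict):
    """A dict of field values that also carries a block name."""

    def __init__(self, name, **fields):
        super().__init__(**fields)
        self.name = name


class Message(list):
    """A list of blocks that also carries a message name."""

    def __init__(self, name, *blocks):
        super().__init__(blocks)
        self.name = name


# Placeholder values purely for illustration.
blk = Block('CircuitCode', Code=1234, SessionID='session', ID='agent')
msg = Message('UseCircuitCode')
msg.append(blk)
```

Because both classes derive from built-ins, iteration, indexing, and `len()` work without any extra methods, and `Message(name, blk1, blk2)` works as a one-step constructor.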
_Explanation:_
This takes the code for the current design and makes it more
Pythonic. It essentially makes a wrapper class called Message, which
can handle Pythonic structures, and can create a message like that of
the current design.
In the A version of this design, the constructor takes in all the
blocks and data and then would construct the message completely. The
B version allows users to create blocks separately and add them into
the message. These two methods could be combined, in fact.
You can of course also first create the blocks in separate vars and
then pass them into the Message constructor: Message(name, blk1, blk2,
blk3)
_Pros n Cons:_
This method allows us to keep most of the same design in place, with
an additional layer that wraps the message creation to make it less
sequential and more Pythonic. It cuts out all the calls to
new_message, next_block, and add_data, allowing users to pass in more
Python structured data (form of lists).
This means less typing which to me is always a pro :)
With the above change even less typing.
Messages can have multiple and variable number of blocks with the
same name, so this method would consist of the user passing in a list
of blocks rather than just a single block into the constructor. This
is not too difficult to handle.
Having the constructor take the entire message may be complicated and
visually difficult to parse for the user. It is also prone to syntax
errors.
I actually see this the completely other way, esp. with
msg = Message('UseCircuitCode',
    Block('CircuitCode',
        Code=circuit_code,
        SessionID=sessionid,
        ID=agent_id)
)
Of course if the message is more complex you would probably create
blocks separately and then pass them in. But both would be possible.
It would also remove one bit of delegation (add_data) and methods
would only be defined on those classes which they actually implement.
It also refactors the way the Message System, builders, and readers
work. Some messages are template messages, which means the messages
MUST be built according to the template. If they are not, then they
shouldn't be allowed to be built and sent. The builder makes sure
this doesn't happen. These designs get rid of a builder and put it
directly into the message, which means the message IS the builder.
When the message is being created, we somehow have to determine what
type of message it is (template or llsd) and use the correct builder
(or at least make sure messages are being built correctly).
I have one superclass Message from which I have derived LLSDXMLMessage
and UDPMessage. There is a MessageFactory utility which can be used to
factory = getUtility(IMessageFactory)
message = factory.new('UseCircuitCode')
You can then look into message.flavor to check the flavor.
To serialize either message you then do
serializer = ISerialization(message)
serializer.serialize()
This is the same pattern as in the rest of the library.
My first thought on this, though, was to create just an LLSDMessage class
which doesn't know about its final encoding. This is decided at
serialization time. I think this would follow the protocol
structure more closely as both types are actually equivalent.
The problem was that the template-based message was initialized from
the template on instantiation, which the XML version could not always
be because not every message is in the template.
I am not sure if the initialization is necessary or just made to have
default values here. I would think it's not necessary as the template
is known in both approaches and you can also check for invalid blocks
when you add new ones (might raise an InvalidBlock exception).
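Checking blocks against the template as they are added, without a separate builder, might look roughly like this. `TEMPLATE` is a hand-written stand-in for the real message template, and `TemplateMessage` and `add_block` are hypothetical names.

```python
class InvalidBlock(Exception):
    """Raised when a block or field is not allowed by the template."""


# Stand-in for the parsed template: message -> block -> allowed field names.
TEMPLATE = {
    'UseCircuitCode': {'CircuitCode': {'Code', 'SessionID', 'ID'}},
}


class TemplateMessage:
    def __init__(self, name, template=TEMPLATE):
        self.name = name
        self.allowed = template[name]   # KeyError if not a template message
        self.blocks = []

    def add_block(self, block_name, **fields):
        """Add a block only if the template permits it."""
        if block_name not in self.allowed:
            raise InvalidBlock('%s has no block %s' % (self.name, block_name))
        unknown = set(fields) - self.allowed[block_name]
        if unknown:
            raise InvalidBlock('unknown fields: %s' % sorted(unknown))
        self.blocks.append((block_name, fields))
```

No defaults are filled in on instantiation here; the template is only consulted when data is added.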
The serialization step in this case would look like above just that
the serializer would consult the MessageDict (which is a utility in my
case).
There might then also be a MessageDispatcher which does the same so it
knows over which channel to send this message (I guess for XML
messages it's simply the cap we have and we do cap.POST(data)).
Right, so I'm thinking the Message System could do all this. Maybe the
Message System could be the factory and dispatcher, with all messaging
being sent and received going through it (but BUILDING messages not
going through this).
msg = api.new_message('PacketAck')
msg.next_block('Packets')
msg.add_data('ID', 0x00000001, MsgType.MVT_U32)
msg.next_block('Packets')
msg.add_data('ID', 0x00000001, MsgType.MVT_U32)
data = api.serialize(msg)
connection = UDPConnection(host)
connection.send(data)
_Explanation:_
The new proposed design has a few differences. One is what the
responsibilities of each of the objects is. You'll notice in this
design you have direct access to the message. The message is also the
builder, so you perform building operations directly on the message
(whereas in the current design you use a builder to add data to the
message). You'll also notice that you have direct access to the
UDPConnection and therefore you direct the message to the connection
you wish to send it to.
Actually I would prefer the design above with Blocks and Messages.
message (up to 500 I believe) will have its own unique class that
will initialize the data attributes.
We would start with the ones we actually use in the library. If
somebody needs to use an additional one he can still use the more low
level version (Message('name', Block(...), Block(...)) ).
We also have to look at every message in the protocol spec anyway and
define it there in detail. When we do this we can go along and define
them in code as well. I am also willing to do that.
A pro here would be that you can put default values in the class so
that you don't have to specify all parameters.
When receiving a message you would have the possibility to attach an
event handler directly to that class using ZCA.
Another pro is that the user of this level doesn't have to know about
blocks and the sequence of these. She only needs to know about the
actual data to be passed in.
Well, we can do this with ZCA without deriving a class for each message.
We can have them all implement an interface and register them with a
certain name. This way we don't have to write each individual class, but
can have a generic Message which can handle them all, with handlers. The
default data can be added in by the Message Factory (which looks into
the template and fills in the message with default values).
I guess this is the problem then. If we have a single Message which
builds itself (add_block methods), we cannot write a Message class which
tests that the data being added is correct and expected. Unless the
message itself is a UDP message derivation and can look at the message
template itself and do the checking.
_Questions:_
The region domain stores the connections, both udp and http. So would
msg.send(region)
region.send(msg)
api.send(msg, region)
conn = UDPConnection(region)
conn.send(msg)
How are the UDP packet flags added onto the packets being sent? They
are apparently not built into the message itself (because they are
only UDP), so need to be added on when actually sending the message.
These depend on how you want to send the message (want an ack) and so
can vary per message, and they are not always the same even on a
single circuit.
Do I get this right that the message type defines the flags needed?
I shortly looked into your code and I think I would do it similarly.
You have all the data in MsgData without those flags and you add them
on sending. I would maybe move some logic from the msgsystem into the
send_flags = ...
packet = Packet(id, message, send_flags)
packet.addAcks(self.acklist) # might be in the constructor as well
serializer = ISerialization(packet)
packet_data = serializer.serialize()
# what defines if it's a reliable packet?
self.udp_client.send_packet(self.socket, packet_data, self.host)
This is just a quick shot without reading the code in detail so it
might be wrong ;-)
The reason there are builders is because the template messages must
have the correct data. The template builder makes sure that blocks
and data being added to messages follow the template's specification
(LLSD has no format because it is going to be formatted into XML, and
deserialized when being received, and so the arrays and dicts can be
directly accessed). How is this accomplished without going through
the builder? How do we distinguish between creating a template
message (making sure it has the correct data) and an llsd message?
As said above, by the message factory. It gives you one of two classes
of messages.
See
http://svn.secondlife.com/trac/linden/browser/projects/2008/pyogp/pyogp.lib.base/branches/mrtopf-message-refactoring/pyogp/lib/base/message/message.py
in line 46 for the message factory. Message types follow below. This
is not using blocks etc. as in the example above though.
Who does the message maintenance? Meaning, who keeps track of the
packets that need to be acked, the ones we want acked, and resending
messages that weren't acked? Do we leave this up to the user to
create such a system?
Some Connection class which seems to me similar to your message system
and circuits.
BTW, what actually is a circuit? Is it a connection to a region? Or
can you have many circuits to one region? This part of the protocol is
not that clear to me right now. We probably should write it down if it
isn't somewhere (but it should be part of the spec at some point anyway).
A circuit is a UDP connection. So it is a UNIQUE connection to an IP
address and port combination. You can only have one circuit for each
IP and port combination.
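In code, "one circuit per ip and port" amounts to a lookup keyed on that pair. A small sketch, with `CircuitManager` and the contents of a circuit record being hypothetical:

```python
class CircuitManager:
    """Hypothetical registry holding at most one circuit per (ip, port)."""

    def __init__(self):
        self._circuits = {}

    def get_circuit(self, ip, port):
        """Return the existing circuit for this host, creating it if needed."""
        key = (ip, port)
        if key not in self._circuits:
            # Per-circuit state; sequence numbers are tracked per circuit.
            self._circuits[key] = {'host': key, 'next_sequence': 1}
        return self._circuits[key]


circuits = CircuitManager()
first = circuits.get_circuit('192.0.2.1', 13000)
again = circuits.get_circuit('192.0.2.1', 13000)
other = circuits.get_circuit('192.0.2.1', 13001)
```

Asking twice for the same address and port returns the same circuit object, while a different port yields a separate one.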
Thanks for your work, I'm starting to see where we can improve things.
I'll start writing down my new proposal and see if we can get something
working.
PS: I don't like the idea of ZCAifying things like the dictionary just
so that we can register them with ZCA as a global utility. It is an
extra abstraction that is confusing and the reasoning not clearly seen.
Something else we can do?