Re: [jetty-dev] NoSQL Session manager


From: Greg Wilkins <gregw@xxxxxxxxxxx>
Date: Thu, 2 Jun 2011 10:58:56 +1000

On 1 June 2011 20:15, Simone Bordet <sbordet@xxxxxxxxxxx> wrote:
> On Wed, Jun 1, 2011 at 05:10, Greg Wilkins <gregw@xxxxxxxxxxx> wrote:
>> The servlet 3.1 EG will soon be considering how to better support
>> cloud/clusters in the servlet spec and one of the things that will
>> need to be considered is how to better support HttpSession that are
>> backed with NoSQL style scalable stores.
>>
>> We do have a prototype NoSQL session manager for jetty, but before
>> moving forward with that, I'd like to discuss here a number of the
>> issues/options for how it can proceed.   But first, let's review what
>> the jetty HashSession and JDBCSession managers do - as they are a lot
>> more than just maps of IDs to session instances.
>>
>> Sessions are created with an ID that is unique within all the contexts
>> that share a sessionIDmanager.  Multiple contexts can share session
>> IDs (as there is only 1 session cookie), but cannot share session
>> values.   If one context invalidates a session, then all contexts with
>> that session ID have their sessions invalidated.    So the data
>> structure is not:
>>
>>   ID --> Attribute --> Value
>>
>> but is instead
>>
>>   ID -> Context --> ID --> Attribute --> Value
>>
>> the double ID mapping is for efficiency within a context.
>>
>> Currently the session ID is split between a clusterID and a nodeID
>> part.  The cluster ID is unique within the cluster, while the nodeId
>> includes a suffix that can be used for load balancer stickyness.
>> Jetty supports session migration such that if a node receives a
>> session ID with a mismatched nodeID, then the session cookie is
>> rewritten and the session is considered migrated.
>>
>> For a nosql session manager, we need to ask if explicit stickyness in
>> the session ID is needed.  Connection stickyness may be sufficient for
>> efficient operation of nosql sessions.
>
> I think, as you also note below, that removing stickyness is asking
> for *big* troubles.

I think that generally HTTP/1.1 connections tend to be sticky because
they are persistent.
Trying to enforce 100% stickyness is an impossible task and works
against you when you try to handle migration and/or failure.

I think we need a solution that works without stickyness, but then you
make clients as sticky as possible as an optimisation.

>> Given this, I think we need to consider if the session manager should
>> be explicitly caching sessions in memory.  Rather it should delegate
>> caching to the nosql layer and rely on its mechanisms for maintaining
>> distributed cache consistency.
>
> Perhaps I am missing something, but I fail to see how this is possible
> without a fully coherent transactional cache replication with
> pessimistic lock handling.

Well it may be that we have to give up on "fully" coherent.

What I do know is that if we take a copy of a session and hold it in
memory without any consideration of the underlying DB, then we are
asking to be incoherent.

So what I'm really saying is that on every session access, the session
manager should ask the data access layer for the session object - if
that comes out of a local cache, then great!  If that local cache is
cluster coherent - even better!   But the caching/coherency semantics
will be driven by the data layer and not by us forcing the answer by
holding our own in-memory copy.
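
A minimal Java sketch of that idea, under stated assumptions: SessionStore,
InMemoryStore and DelegatingSessionManager are hypothetical names for
illustration and are not Jetty classes. The manager holds no session copy of
its own and asks the store on every access, so whatever caching or coherency
the store provides is what the webapp sees.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical data access layer; caching and coherency live behind it. */
interface SessionStore {
    Optional<Map<String, Object>> load(String clusterId);
    void save(String clusterId, Map<String, Object> attributes);
}

/** Toy in-memory stand-in for a NoSQL store, for illustration only. */
final class InMemoryStore implements SessionStore {
    private final Map<String, Map<String, Object>> docs = new ConcurrentHashMap<>();

    @Override
    public Optional<Map<String, Object>> load(String clusterId) {
        Map<String, Object> doc = docs.get(clusterId);
        if (doc == null) {
            return Optional.empty();
        }
        // Hand out a copy so callers never share the stored instance.
        Map<String, Object> copy = new HashMap<>(doc);
        return Optional.of(copy);
    }

    @Override
    public void save(String clusterId, Map<String, Object> attributes) {
        Map<String, Object> copy = new HashMap<>(attributes);
        docs.put(clusterId, copy);
    }
}

/** Session manager that keeps no private in-memory copy of any session. */
final class DelegatingSessionManager {
    private final SessionStore store;

    DelegatingSessionManager(SessionStore store) {
        this.store = store;
    }

    /** Every read goes to the store; a cache, if any, is the store's business. */
    Object getAttribute(String clusterId, String name) {
        return store.load(clusterId).map(doc -> doc.get(name)).orElse(null);
    }

    /** Read-modify-write of the whole document; not atomic across nodes. */
    void setAttribute(String clusterId, String name, Object value) {
        Map<String, Object> doc = store.load(clusterId).orElseGet(HashMap::new);
        doc.put(name, value);
        store.save(clusterId, doc);
    }
}
```

Two DelegatingSessionManager instances sharing one store stand in for two
nodes: a write through one is visible to the next read through the other,
with no migration step.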

>> If we don't want to have sessions concurrently in different nodes,
>> then we really need to think of a way that can be enforced
>> efficiently.... perhaps using the nodeID in the cookie? But I think
>> for scalability it is inevitable that concurrent instances will
>> eventually be allowed, which brings up the question of what is the
>> semantics/granularity of concurrent session updates?
>
> I am questioning "for scalability we need to allow concurrent
> instances in different nodes", which IMHO is not true (as scalability
> can be achieved with stickyness).
> What are the drivers in supporting this view ?

If you have a java.net login, you should be able to observe the expert
group mailing list archives.  Here is a thread about this in which the
google rep is calling for concurrent access:

   http://java.net/projects/servlet-spec/lists/jsr340-experts/archive/2011-05/message/18

He doesn't really support it there, but I've certainly had issues with
the requirement for non-concurrent access.

Specifically, trying to enforce non-concurrent access is really
impossible: if a request for a session arrives at a different node,
it could be due to some network partitioning failure, and the node
might not even be able to contact the original node that is holding
the session.     At the very least this requirement forces centralized or
global pessimistic locking of sessions so that a single instance can
be maintained.

I think a lot of web applications can operate in a more relaxed mode,
where they don't need the contents of their sessions to be unique or
even coherent - they just want them available.

I would argue that if your application wants atomic, coherent state,
then use a database and don't put that data in a session which is
neither and is barely functional on a single node, let alone a
cluster.

Also, if you are non-sticky or even just partially sticky, having a
session move back and forth between nodes can be very expensive.
Better to let it just exist on both nodes and deal with the issues
that result from that - which are pretty much the same issues as
multiple requests hitting the session on the same node anyway.

>> If session 12345 exists in both node A and node B, and both are
>> updated at about the same time, with attribute xyz updated in node A
>> and attribute pqy updated in node B,   then should the resulting
>> session state reflect the changes to both xyz and pqy, or should the
>> update of one be overwritten by the update of the other?   ie should
>> we persist at session or attribute granularity?
>
> You can't solve this problem if not via pessimistic locking and that
> would be a scalability killer.
> For example, what if nodeA removes attribute "foo" and nodeB changes
> its value at the same time ?
> You need pessimistic locking because with optimistic you can't decide
> (you can detect that a concurrent update happened, but you cannot
> decide the final status of the attribute - changed or removed).

There are solutions for this problem - although they are not going to
solve the full coherency or the atomic issues.

Specifically if we are using a document style nosql database and a
session is a document and attributes are properties of that document -
then we have two choices of how to persist a modified session:

  sessionDocument.save()

Or

  for (String name : sessionDocument.dirtyAttribute()) {
     collection.update(sessionDocument.id(),
                       sessionDocument.property(name),
                       sessionDocument.get(name));
     // TODO add code to test if a remove property is needed
  }

The former attempts to keep the session internally consistent, at
least from the view of a single node.  Even so, multiple requests to
the same node can still result in a session being saved while partially
updated and in a transient state.

The latter gives up on session consistency and just models session
attribute manipulation as a cluster-wide stream of updates.  The
actual order of those updates will never be able to be guaranteed
because of multiple requests within a node or to different nodes.
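
To make the difference between the two choices concrete, here is a small Java
sketch. It is an illustration under assumptions, not the prototype's code: a
plain Map stands in for the document collection, and GranularityDemo,
saveWhole and saveDirty are hypothetical names. saveWhole models
sessionDocument.save(); saveDirty models the per-attribute loop, including
the remove case left as a TODO above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class GranularityDemo {

    /** Session granularity: the last writer replaces the whole stored document. */
    static void saveWhole(Map<String, Map<String, Object>> collection,
                          String id,
                          Map<String, Object> local) {
        Map<String, Object> copy = new HashMap<>(local);
        collection.put(id, copy);
    }

    /** Attribute granularity: only dirty attributes are written or removed. */
    static void saveDirty(Map<String, Map<String, Object>> collection,
                          String id,
                          Map<String, Object> local,
                          Set<String> dirty) {
        Map<String, Object> stored = collection.get(id);
        if (stored == null) {
            stored = new HashMap<>();
            collection.put(id, stored);
        }
        for (String name : dirty) {
            if (local.containsKey(name)) {
                stored.put(name, local.get(name)); // attribute set or changed
            } else {
                stored.remove(name);               // attribute removed locally
            }
        }
    }
}
```

With session 12345 loaded on node A and node B, node A changing xyz and node
B changing pqy, saveWhole leaves only the second writer's view in the store,
while saveDirty keeps both changes. Neither variant decides the remove-versus-
change conflict on a single attribute; the later write simply wins.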

>> Also, the perennial bugbear of distributed sessions is how to handle
>> last access time.  While many sessions are read mostly, the last
>> access time is updated on every request.  We don't want this to
>> trigger full serialisation of the session on every request - nor do we
>> want multiple nodes to fight about who has the correct last update
>> time.  This is something best kept in memory and only occasionally
>> swept to the persistent store.
>
> It's worse than that. You need atomic updates of the lastAccessedTime,
> and you need to be able to run over them when you sweep without
> migrating the whole session (if you have a non sticky model).
> Again, that is why you do not want to give up on stickyness.

We have to give it up as a guarantee, because 100% stickyness is
impossible to implement.   Stickyness is still desirable, just not a given.

Also I don't know if we need to be absolutely accurate with
last access times and session expiry.  I think a
near-enough-is-good-enough approach will work for 99.9% of
web applications, with the main criterion being that idle sessions will
never live forever.
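
One way to picture "near enough" is the Java sketch below. It is an assumption
for illustration, not an agreed design: LastAccessTracker is a hypothetical
name, a plain Map stands in for the persistent store, and the rule that the
larger timestamp wins is my own choice. Each request only touches memory; an
occasional flush pushes times to the store, and nodes never move a stored
time backwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LastAccessTracker {
    // Per-node, in-memory view of last access times, updated on every request.
    private final Map<String, Long> inMemory = new ConcurrentHashMap<>();
    // Stand-in for the shared persistent store (hypothetical).
    private final Map<String, Long> persisted;

    LastAccessTracker(Map<String, Long> persisted) {
        this.persisted = persisted;
    }

    /** Called on every request; no store access and no serialisation. */
    void touch(String sessionId, long nowMillis) {
        inMemory.merge(sessionId, nowMillis, Long::max);
    }

    /** Called occasionally; the larger timestamp wins, so nodes do not fight. */
    void flush() {
        for (Map.Entry<String, Long> e : inMemory.entrySet()) {
            persisted.merge(e.getKey(), e.getValue(), Long::max);
        }
    }

    /** Approximate expiry check against the stored time. */
    static boolean isExpired(long lastAccessMillis, long nowMillis, long maxIdleMillis) {
        return nowMillis - lastAccessMillis > maxIdleMillis;
    }
}
```

The stored time can lag the true last access by up to one flush interval, so
a sweeper using isExpired would need to allow for that lag; idle sessions
still expire eventually, which is the criterion that matters here.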

>> your thoughts?
>
> I think a viable solution is to have stickyness, which means there is
> a master session on one node, and that session is the only one that
> can be updated.
> This solution is deployed around the world and proved to scale.

Well essentially what I'm advocating is that the one master copy of
the session lives on a node in the nosql database rather than on a
node in the webtier - this too is a solution deployed for non-session
data on many websites and I believe scales much better than data in
the webtier (see memcache for an example).

Migration, fault tolerance and dynamic clouds are much easier to
implement when you don't have to be 100% sticky.     Stickyness should
be viewed as an optimisation to make caches work better rather than as
an immutable assumption on which you build other assumptions.

> Another solution is to have a pessimistic pass-by-reference
> distributed state updateable from any node, but I personally have
> experience of very bad scalability of this solution because of the
> distributed locking involved.

Agreed, we need to avoid such locking.

Note also that I'm not advocating that we drop support for our
existing session managers that work well in memory, in JDBC and even
on shared file systems.

cheers
