Scala School

About

Scala School started as a series of lectures at Twitter to prepare experienced engineers to be productive Scala programmers. Scala is a relatively new language, but one that draws on many familiar concepts, so we found lectures an effective way of getting new engineers up to speed quickly. This is the written material that accompanied those lectures; we have found it useful in its own right.

Approach

We think it makes the most sense to approach teaching Scala not as if it's an improved Java but as a new language. Experience in Java is not expected. The focus will be on the interpreter and the object-functional style, as well as the style of programming we do here. We emphasize maintainability, clarity of expression, and leveraging the type system.

Most of the lessons require no software other than a Scala REPL. The reader is encouraged to follow along, and go further! Use these lessons as a starting point to explore the language.

Ship or Die! - SHIPPING IS EVERYTHING.

There is a tide in the affairs of men,
Which, taken at the flood, leads on to fortune;
Omitted, all the voyage of their life
Is bound in shallows and in miseries.
On such a full sea are we now afloat,
And we must take the current when it serves,
Or lose our ventures.

The quote is from Shakespeare’s Julius Caesar. It draws on the shipbuilding enterprise of days gone by, when builders couldn’t time the high tides accurately. The smarter builders would launch their vessels into the water when the tide came in, even unfinished, and hope to complete them at sea. Meanwhile, those who chose to ignore the tide in favor of perfectly built ships were often left with a perfectly built ship rotting away perfectly.

Aside from the navy, ships were built for a simple purpose: sail to faraway lands, acquire goods, raw materials, and spices on the cheap, and return home to sell them at a margin that would cover the cost of the trip and then some. That’s it.

The essence of the enterprise is to get to these other lands and NOT sit around in the harbor fawning over your creation. Unfortunately, that’s what happens a lot in software: gold-plated code with unit tests, agile up the wazoo, but too late to the market. Sitting around in the source repo, rotting away perfectly.

I have learnt these lessons over a decade plus of writing software, and only recently achieved success in shipping it such that the enterprise was successful. We returned home to wine and song and our own sanity. On the way I earned friends and fellow shippers, earned their respect, and learned the humility to ditch my ideas of what it will take to ship, park that ego or whatever else is in the way, and commit to the cause.

Last year (2010), we ended a tremendous journey with the exit of (Lil) Green Patch to Playdom/Disney. It was one hell of a software sprint, during which the team reacted not only to a changing cloud-scale landscape but to the very notions of business amidst social media’s nascent awareness of itself. We won. The fat lady sang.

I now spend my waking hours thinking about what I am shipping today, and my sleep dreaming about stuff I want to ship as soon as I wake up. If I don’t ship, I might as well be dead:

And we must take the current when it serves

Acknowledgements: Thanks a lot guys for helping out with the draft! Avichal Garg, Brian Lynch, David King

Don't let good ideas rot in the source repo.

MemoryImage - martinfowler.com

31 August 2011

tags: database · application architecture

When people start an enterprise application, one of the earliest questions is "how do we talk to the database". These days they may ask a slightly different question: "what kind of database should we use - relational or one of these NOSQL databases?". But there's another question to consider: "should we use a database at all?"

One of the defining characteristics of enterprise applications is the need to store long term data, which naturally leads people to reach for a database. After all, persisting data is one of the main things databases do. Using a memory image is a different route to persistence that doesn't involve a database.

The key element to a memory image is using event sourcing, which essentially means that every change to the application's state is captured in an event which is logged into a persistent store. Furthermore it means that you can rebuild the full application state by replaying these events. The events are then the primary persistence mechanism.
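
A minimal sketch of this idea, assuming a toy inventory application. The event types, field names, and the `MemoryImage` class are hypothetical, and a plain list stands in for the persistent store:

```python
# Hypothetical event types and fields, chosen only for illustration.
def _add(stock, event):
    stock[event["item"]] = stock.get(event["item"], 0) + event["qty"]


def _remove(stock, event):
    stock[event["item"]] = stock.get(event["item"], 0) - event["qty"]


# Maps each event type to the function that applies it to the state.
APPLIERS = {"item_added": _add, "item_removed": _remove}


class MemoryImage:
    """Keeps application state in memory; the event log is the primary record."""

    def __init__(self, log=None):
        # Assumption: in a real system the log would be durable storage.
        self.log = []
        self.stock = {}
        # Rebuild the full state by replaying any existing events.
        for event in (log or []):
            self.handle(event)

    def handle(self, event):
        applier = APPLIERS[event["type"]]  # KeyError on an unknown event type
        self.log.append(event)             # capture the change as an event
        applier(self.stock, event)         # then update the in-memory state


image = MemoryImage()
image.handle({"type": "item_added", "item": "widget", "qty": 5})
image.handle({"type": "item_removed", "item": "widget", "qty": 2})

# A fresh process can recover the same state from the log alone.
rebuilt = MemoryImage(image.log)
```

Here `rebuilt.stock` ends up equal to `image.stock` because both were produced by the same sequence of events.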

A familiar example of a system that uses event sourcing is a version control system. Every change is captured as a commit, and you can rebuild the current state of the code base by replaying the commits into an empty directory. In practice, of course, it's too slow to replay all the events, so the system persists periodic snapshots of the application state. Then rebuilding involves loading the latest snapshot and replaying any events since that snapshot.
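
The snapshot-plus-replay recovery described above can be sketched as follows. This is an illustration under assumptions: the account events, the snapshot layout, and the function names are all hypothetical, and the snapshot is kept in memory rather than written to disk:

```python
import copy


def apply_event(balances, event):
    """Apply a hypothetical 'deposited' or 'withdrawn' event to the balances."""
    sign = 1 if event["type"] == "deposited" else -1
    account = event["account"]
    balances[account] = balances.get(account, 0) + sign * event["amount"]


def replay(events, balances=None):
    """Apply events in order, starting from the given state (or an empty one)."""
    balances = {} if balances is None else balances
    for event in events:
        apply_event(balances, event)
    return balances


def take_snapshot(balances, events_applied):
    """Record the state together with how many log entries it already reflects."""
    return {"balances": copy.deepcopy(balances), "events_applied": events_applied}


def restore(snapshot, log):
    """Load the snapshot, then replay only the events recorded after it."""
    balances = copy.deepcopy(snapshot["balances"])
    return replay(log[snapshot["events_applied"]:], balances)


log = [
    {"type": "deposited", "account": "a", "amount": 100},
    {"type": "withdrawn", "account": "a", "amount": 30},
    {"type": "deposited", "account": "b", "amount": 50},
]
snapshot = take_snapshot(replay(log[:2]), events_applied=2)
restored = restore(snapshot, log)
```

The key detail is that the snapshot remembers its position in the log, so restoring never applies an event twice.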

Event sourcing has many consequences, including the ability to rebuild past states. But the important property for memory images is that it means that there is no longer any need to worry about keeping the application state in an up-to-date persistent store. Instead you can just keep the application state in main memory. Should the process crash, you can rebuild it from the events (and snapshots).

Using a memory image allows you to get high performance, since everything is being done in-memory with no IO or remote calls to database systems. Perhaps more importantly, it means you can get rid of database mapping code and stop worrying about synchronizing in-memory state with database state.

Against that, you do have to ensure you can reliably store the events and process them. You also need to write the code to save and load snapshots and figure out how to restore the system quickly enough to keep your quality of service up. Databases also provide transactional concurrency as well as persistence, so you have to figure out what you are going to do about concurrency.
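
One simple way to store events durably is an append-only file with one JSON document per line, flushed to disk before the event is treated as recorded. The sketch below assumes that layout; the `FileEventLog` class and the file format are hypothetical choices, not something the approach requires:

```python
import json
import os


class FileEventLog:
    """Append-only event log: one JSON object per line (an assumed format)."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        line = json.dumps(event, sort_keys=True)
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
            f.flush()
            # Ask the OS to push the write to the storage device
            # before we report the event as stored.
            os.fsync(f.fileno())

    def replay(self):
        """Yield every stored event in the order it was appended."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)
```

Calling `os.fsync` on every event trades throughput for safety; batching several events per sync is a common compromise, at the cost of a small window of possible loss.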

Another, rather obvious, limitation is that you have to have more memory than the data you need to keep in it. As memory sizes steadily increase, that's becoming much less of a limitation than it used to be.[1]

A number of different kinds of systems can make use of a memory image; I'll mention three examples I've come across.

The most recent is LMAX. LMAX is a high performance trading system, which processes 6 million TPS on a single JVM thread. Here the performance advantage of a memory image is obviously a big factor, but they found the simplification in programming model to be equally important. They don't have to worry about concurrency, since everything runs on a single thread. To keep availability high, they run multiple copies of the memory image so if one goes down they can switch over to another instance while keeping their very high transaction rate.

A few years ago I wrote about a couple of systems using an EventPoster architecture. This style provides read access to the in-memory model to lots of UIs for analytic purposes. Multiple UIs mean multiple threads, but there's only one writer (the event processor) which greatly simplifies concurrency issues.
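
The single-writer arrangement can be sketched with a queue feeding one event-processing thread. This is only an illustration of the idea, not the EventPoster design itself; the `EventProcessor` class, its methods, and the event fields are hypothetical:

```python
import queue
import threading


class EventProcessor:
    """Many threads may post events or read; only one thread ever writes."""

    def __init__(self):
        self._events = queue.Queue()
        self._totals = {}
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, event):
        """Safe to call from any thread: the queue serializes the events."""
        self._events.put(event)

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:  # sentinel posted by close()
                break
            # Build the new state on the side, then publish it in one
            # assignment, so readers never observe a half-applied event.
            updated = dict(self._totals)
            key = event["key"]
            updated[key] = updated.get(key, 0) + event["amount"]
            self._totals = updated

    def read(self, key):
        """Read-only access for UI threads."""
        return self._totals.get(key, 0)

    def close(self):
        """Process everything posted so far, then stop the writer thread."""
        self._events.put(None)
        self._thread.join()
```

Copying the whole dictionary per event keeps the sketch short but would be too slow for large states; a real system would use a cheaper way of publishing updates.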

The oldest example is also the inspiration for the name - the Smalltalk development environment. Most development tools rely on text files in a file system which are compiled or interpreted as needed. Smalltalk held all its source code and compiled methods inside the image [2]. Every command you executed got stored inside a change log. Most of the time you saved your image (snapshot) but if necessary you could replay the change log from a stable base if you did something foolish.

Like many of these kinds of ideas, it's an approach that's been used and reinvented many times [3], but never got mainstream traction. Having a database hold persistent data continues to be the more common approach.

One problem I've heard of with memory images is around migration. Whenever you are building a software system it's important to understand how it will handle changes. With a memory image, the essential task is to ensure you can continue to rebuild the memory image from the event log.

One trap here is to use a serialization structure for the event log that doesn't handle evolution gracefully should you want to change the structure of events. If you create specific event classes and serialize them, this may make it difficult to process old events should you change the structure of the event class later. Often it's best to serialize with generic data structures such as maps and lists.
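
As a sketch of why generic structures help, suppose events are stored as JSON maps and a `currency` field is added some time after the first events were logged. The field names and the default value here are assumptions made for the example:

```python
import json


def serialize(event):
    """Store the event as a plain map, not as an instance of an event class."""
    return json.dumps(event, sort_keys=True)


def deserialize(line):
    return json.loads(line)


def apply_deposit(balances, event):
    # Old events predate the 'currency' field, so fall back to a default
    # (assumed here to be USD) instead of failing on a missing attribute.
    currency = event.get("currency", "USD")
    key = (event["account"], currency)
    balances[key] = balances.get(key, 0) + event["amount"]
    return balances


old_line = serialize({"type": "deposited", "account": "a", "amount": 10})
new_line = serialize(
    {"type": "deposited", "account": "a", "amount": 5, "currency": "EUR"}
)

balances = {}
for line in (old_line, new_line):
    apply_deposit(balances, deserialize(line))
```

Both generations of event pass through the same code path, which is much harder to arrange when each event is a serialized instance of a class whose definition has since changed.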

Also it's important to keep a good decoupling between the events and the model structure itself. It may be tempting to come up with some automatic mapping system that introspects on the event data and the model, but this couples the events and model together which makes it difficult to migrate the model and still process old events.
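
One way to keep that decoupling is a small, explicit translation layer between event data and model calls. In the sketch below the `Ledger` model, the handler functions, and the event fields are hypothetical:

```python
class Ledger:
    """The model. Its methods and attributes can be renamed or restructured
    without touching the stored events."""

    def __init__(self):
        self.balances = {}

    def credit(self, account_id, cents):
        self.balances[account_id] = self.balances.get(account_id, 0) + cents

    def debit(self, account_id, cents):
        self.credit(account_id, -cents)


# Explicit handlers: the only place that knows both the event field names
# and the model's API. Nothing is mapped automatically by name.
def on_deposited(ledger, event):
    ledger.credit(event["account"], event["amount"])


def on_withdrawn(ledger, event):
    ledger.debit(event["account"], event["amount"])


HANDLERS = {"deposited": on_deposited, "withdrawn": on_withdrawn}


def process(ledger, event):
    HANDLERS[event["type"]](ledger, event)


ledger = Ledger()
process(ledger, {"type": "deposited", "account": "a", "amount": 100})
process(ledger, {"type": "withdrawn", "account": "a", "amount": 40})
```

If `Ledger` is later restructured, only the handlers need to change, and old events still replay.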

At some point, it may be worthwhile to migrate the event log itself from an old format to a new one. Migrating the event log is often more hassle, but may be an option if you've evolved a long way from the original event structures.
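
A log migration can be as simple as a one-off pass that rewrites each old event into the new format. The two formats below are invented for illustration: a version 1 event with a bare `cents` field, and a version 2 event with an explicit `version` and a structured `amount`:

```python
def upgrade_event(event):
    """Return the event in the (hypothetical) version 2 format."""
    if event.get("version", 1) >= 2:
        return event  # already in the new format
    # Copy everything except the field being replaced; the input is not mutated.
    upgraded = {k: v for k, v in event.items() if k != "cents"}
    upgraded["version"] = 2
    # Assumption for this example: all version 1 amounts were in USD.
    upgraded["amount"] = {"cents": event["cents"], "currency": "USD"}
    return upgraded


def migrate_log(old_events):
    """Produce a new log in which every event uses the version 2 format."""
    return [upgrade_event(event) for event in old_events]


old_log = [
    {"type": "deposited", "account": "a", "cents": 1250},
    {"version": 2, "type": "deposited", "account": "a",
     "amount": {"cents": 500, "currency": "EUR"}},
]
new_log = migrate_log(old_log)
```

Writing the result to a new log and keeping the old one until the rebuilt state has been checked gives you a way back if the migration turns out to be wrong.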

For a long time, a big argument against using a memory image was size, but now most commodity servers have more memory than we were used to having on disk. As a result most working sets can now be held safely in memory. We noticed this a few years ago, but memory images are still relatively rare. I think that now the NOSQL movement is causing people to re-think their options for persistence, we may see an upsurge in this pattern.

1: You may also be able to reduce memory usage if you only need a subset of the data that's in the events. You can send the same events to different memory image systems for different subsets of data if that makes sense for your needs.

2: I'm using the past tense here for Smalltalk because it's a long time since I used it and it may have changed its behavior over the years. It is still around, although sadly only a niche environment.

3: Some people also may remember the Prevayler project, which is an open-source implementation of this approach. It generated a lot of noise in the Java community a few years ago, but has been rather quiet since. That community uses system prevalence as a generic term for this approach.

Events as a persistence mechanism.