Lost in the flood #1: programming, Japanese gardens, music, machine learning

The problem: an ever expanding set of Chrome tabs of read/study/watch/respond-to later material. I’m trying a new solution: put the links in a blog post, write why I haven’t closed the tabs, close the tabs. (And come back to them later. Maybe.)

Let’s go.

What is embodied programming? blog post by Alex McLean (@yaxu). Alex is trying to give a name to something he’s calling embodied programming. From a quick skim I think it’s related to programming in a cyber-physical loop with the machine. But I need to read Alex’s post properly to be sure.

I first encountered the term cyber-physical in a paper by Andrew Sorensen (@digego). Programming With Time – Cyber-physical programming with Impromptu (Sorensen and Gardner 2010). Andrew Sorensen recently posted a video of his latest work: Physics Playroom at QUT’s new Science & Technology precinct, and gave a detailed technical description on the extempore mailing list. Physics Playroom is an impressive piece of interactive programming making use of a huge aggregated video wall, a cluster of computers, and bank of multi-touch screens capable of simultaneously interacting with a hoard of school kids. Playroom is developed in Andrew’s latest live coding system extempore. (If you’re reading this Andrew, thanks very much for answering my question in such detail!)

Emotional responses to music: The need to consider underlying mechanisms (Juslin and Västfjäll 2008). Related to my recent emotional response on the music-dsp mailing list.

One thing I did find time to watch was Dream Window: Reflections on the Japanese Garden (1992) a stunning documentary. IMDb describes it as “Exquisite exploration of landscape and Toru Takemitsu’s music for a Japanese moss garden.” Highly recommended.

My interest in Japanese gardens arose from an ongoing reflective process concerning software design. I was thinking about the concept of negative space — the space around things — the “anti-object”. This lead me to the related (but I don’t think synonymous) Japanese word “ma(here’s another definition of “ma”, and an interesting discussion about Ma at Metafilter). In the process I found MA: Space/Time in the Garden of Ryoan-Ji a video artwork by Takahiko Iimura. The artist’s notes about the work provide an interesting insight into the compositional process. I would like to know more about ma.

Generic programming with C++ concepts and Haskell type classes—a comparison (J.F. Bernardy et al 2010). Lately I’ve been writing C++ code that is better expressed using compile-time polymorphism (templates) than C++’s good old runtime polymorphism (virtual functions). I’ve been thinking about the not-yet-extant C++ concept specifications, which lead me to the linked paper. My knowledge of Haskell is limited. The relation between C++ templates and Haskell type classes has been on my radar since I encountered Bartosz Milewski’s Monads in C++ blog post a couple of years back.

The Fast Cauchy Transform and Faster Robust Linear Regression (K.L. Clarkson et al 2012) paper at arXiv.org and summary at Nuit Blanche (via @IgorCarron). Fast, Robust, Randomised, what more could you want? Fast L1 regression is a hobby of mine. If nothing else this paper will give me a chance to practice my Linear Algebra.

Machine learning – Random forests video of lecture by Nando de Freitas. Random forests, aka decision forests, and ensemble methods. Seems like something I should know more about.

NI have released Traktor DJ for iPad. I am intrigued by the UI design.

Striking it Richer: The Evolution of Top Incomes in the United States (Updated with 2011 estimates) by Emmanuel Saez (via @accessjames). My version of following the movie stars.

A brief meditation on two-party message exchange

When two parties communicate, what are the available patterns of message exchange?

Here’s what I’ve come up with so far:

Send


Send

Also known as: procedure call, one-way, unidirectional, post, fire and forget, In-Only, Out-Only, request (R), trigger, event, notification, command, producer-consumer.

At this level "sending a message" is the ultimate primitive (although it decomposes into: prepare, post, transport, deliver, receive).

Request-Reply


Request-Reply

Also known as: function call, round-trip, request/response (RR), In-Out, Out-In, call-response, request-result, remote procedure call, polling, solicit-response, query-response.

The reply may provide useful information, or simply acknowledge receipt of the request. The reply might be optional ("In Optional-Out"). Spector (1982) also considers the three message exchange: request/response/acknowledge-response (RRA) . This may not be necessary when message transport is reliable.

Request-Multi-reply


Request-Multi-reply

A single request results in a number of reply messages.

Also known as: progress callback.

Each reply reflects a phase of a progressive process.

Subscribe-Notify


Subscribe-Notify

Also known as: observer, periodic timer, streaming updates, event driven, Hollywood principle.
Note that this is not the same as the multi-point publish-subscribe distribution pattern.

The final acknowledgement is required for the subscriber to be sure that it won't receive any further notifications.

An implicit subscription variant exists where notifications are sent without the client subscribing (like spam). Maybe that's called Advertising.

One-shot notification

Normal case:
One-shot notification

Canceled case:

One-shot notification (canceled)

Also known as: one-shot timer, delay, asynchronous completion notification, future, cancelable-future, callback.

?

Do you know other patterns? Let us know in the comments.

What's this?

I'm musing about message exchange patterns. Specifically, those relevant to reliable asynchronous message exchange between two threads in a concurrent shared memory system. I wrote this in an attempt to abstract the message exchange aspect of this post. I've focused here on message exchange events, ignoring the details of the items being exchanged. When you also consider the items being exchanged, higher-level patterns emerge (consider juggling patterns for example).

There are many areas of computing that study or use message exchange. They include: protocol design, real-time systems, distributed computing, SOAP MEP, MPI, bus communication protocols, and process calculi. I am not an expert in these fields but I am interested in what they can teach me about message exchange. There are some links to related resources at the end.

Additional reflections

Acknowledged assumptions

I have assumed that the two parties outlive the message exchange.

I have assumed that the two parties know how to address messages to each other -- although often it is enough that the requestor know the address of the respondent, since the requestor can provide a return address to the respondent as part of the request.

Conversations

Request-Reply, Request-Multi-reply and/or Send sequences can be concatenated to form longer back-and-forth conversations. Certain transactions such as two and three-phase commit require multiple request-reply pairs.

Symmetry/asymmetry


Symmetry/asymmetry

The relationship between endpoints may be symmetric: either party may be the initiator in the above exchanges; or asymmetric (as in master-slave, client-server relationships etc.).

WCF has a related concept called Service Contracts. WCF defines three types: Request-Reply, One-Way and Duplex.

Party visibility

Does the receiver know the identity of the sender? Does the message provide this information?

A reply is only possible if the receiver knows the sender's identity. Sender could provide a return address, or receiver may already know the address. Are sender and receiver knowing participants in a fixed (or dynamic) topology? Does receiver retain or store sender's identity? When does receiver need to do this? (Subscriptions are one case.)

Message bus architectures distribute events to subscribers without the sender knowing the identity of the receiver. Anonymizing servers provide a route between requestor and respondent without revealing the identity of the requestor to the respondent.

Message semantics

Some writers (eg. Spector 1982) are concerned with efficient and minimal primitives for message transport. SOAP Message Exchange Patterns are similarly concerned with mechanism.

You can also consider the purpose or role of a message. For example, a unidirectional message may be a command to perform an action, or it may be a notification that something has happened. Notifications can be further subdivided into those provided for informational purposes ("last lap of the race") and those that require action ("evacuate now").

This post from i8c on Basic Message Exchange Patterns makes an argument for considering more expressive patterns that capture the abstract purpose of a message exchange. It lists 10 communication patterns from SAP, broken into two layers: service communication patterns (query/response, request/confirmation, information, notification) and transaction communication patterns (information, notification, query, response, request, confirmation).

This article on Messaging Patterns in Service-Oriented Architecture also presents "higher level" patterns. For example, when considering unidirectional messages it distinguishes between Command messages, Event messages (notifications) and Document messages (information transfer).

Consider the difference between a command (sent with intent) and an event/notification (interpreted by the receiver).

Also consider command-query separation (and here).

Reactive vs. dataflow

Some message passing systems are concerned with managing the flow of data between processors in a computation (eg. HPC systems based on MPI), whereas others are concerned with reacting to external events or processing transactions (eg. operating system kernels, interactive systems).

Stateful vs. stateless protocols

By focusing on message exchange I have sidestepped the perspective of stateful protocols. Stateful protocols are (usually?) modeled as communicating state machines. Messages drive state transitions and the current state determines which messages are legal. (Searching for "communicating state machines" is a good place to start reading about this.)

Non-software processes

I am interested in applying these patterns to messages passed between communicating computer programs, but the patterns appear in the real world too. Real-world examples include: sending a postcard, an exchange of letters in a legal or bureaucratic process, patterns of business communication, types of business transactions, subscribing to a periodical, marketing email workflows, customer service processes, and so on.

Am I missing something?

Please let me know in the comments if you can think of other patterns, book recommendations, good links or a more abstract (domain-neutral) treatment of this material.

I tried the following searches. Can you suggest other keywords or terms I have overlooked?

communication patterns | two party communication patterns | service communication patterns | transaction communication patterns | message passing theory | message exchange | message exchange theory | packet exchange theory | message exchange protocols | two party message exchange protocols | message communication theory | event driven patterns | event interaction ontology | event exchange patterns

Responses from the Twitterverse

Here are the suggestions for additional message exchange patterns that I've received so far (updated January 9, 2013):

Alex McLean @yaxu tweets:

how about scissors-paper-stone voting?

Me: good one. how to draw the picture? +not sure of utility since it is either biased (w/ defaults on ties) or non-wait-free (w/ retries)
on further thought it's just a request-reply unless you have shared memory, in which case it's not a message exchange.
reqest/reply: A proposes game along with "sealed" vote, B replies with it's vote, then opens the seal. B must be honest.

The replies should be simultaneous though / or be received/processed after they've all been sent.
There may be similar constraints around time sync protocols.. E.g. data not in the message but in time between send and reply

Me: Indeed time sync protocols are something to consider. As for rock-paper-scissors and timing: www.youtube.com/watch?v=3nxjjztQKtY
Yeah that's related to Thor's diplomatic reading between the lines too.

ok how about DNA message that creates a third party?

thor magnusson @thormagnusson tweets:

Also: communication based on not willing to communicate: ignoring messages & breaking protocols. (Eg. Middle east & N-irland)
Also, Austin's book "How to Do Things with Words" on speech acts, might be an interesting read. (BTW good #openresearch!)

Damian Stewart @damian0815 tweets:

Telephone pictionary. It's just unreliable Send, but after several transmissions the unreliability becomes the message.

I'll have more to say about this shortly.

Background reading (for you and me)

Transport level

Programming models

Operating systems

Enterprise, SOA, SOAP, EIP

MPI

Interaction ontology

  • Gerrit Niezen, "Ontologies for interaction: enabling serendipitous interoperability in smart environments" PhD TU/e repository.tue.nl/735539 [pdf]

Communication theory

This is getting a bit lateral, but it might be useful...

Thanks for reading!

Programming with lightweight asynchronous messages: some basic patterns

This post introduces some basic asynchronous message passing patterns that I’ve found useful while implementing real-time audio software. The patterns apply to in-process shared memory message passing. For the most part this post is intentionally abstract. I won’t talk too much about concrete implementations. I do assume that the implementation language is something like C/C++ and that the message queues are implemented using lock-free ring buffers or similar.  For a motivation for this style of programming see my earlier post.

This post isn’t the whole story. Please keep in mind that there are various ways to combine the patterns, and you can layer additional schemes on top of them.

Meta

I wanted to enumerate these basic patterns to give me some clarity about this approach to writing concurrent software. While writing this post I started to look for a mathematical formalism that could capture the semantics of this kind of asynchronous message exchange. I looked at process algebras such as the actor model and pi-calculus but they didn’t seem to model asynchronous message exchange, distributed state and atomic state transitions the way I have done it here. To me the approach presented here seems similar to what you might use to model bureaucratic processes involving asynchronous exchange of paper forms and letters.  If you know of an existing formalism that might be used to model these patterns please let me know in the comments, I’d love to hear about it. Thanks!

Introduction

Communication between execution contexts

This post focuses on patterns of asynchronous message exchange between a pair of execution contexts.

I use the term “execution context” instead of “thread” because I’m thinking not only of threads but also of interrupt handlers, signal handlers, timer callbacks, and audio callbacks. An execution context might also be the methods of an object that only operates in one thread at a time, but may migrate between threads during the course of program execution.

Some execution contexts impose certain requirements, such as not blocking (see my earlier post for a discussion of what not to do in audio callbacks).

I assume that the contexts share the same address space, so it is possible to pass pointers to objects between contexts.

I assume that processing a message is atomic and non-re-entrant with respect to the receiving execution context.

Passing messages via queues

When I talk about asynchronous message passing I have in mind an implementation where messages are communicated between threads using queues. The sender and receiver operate asynchronously. The sender doesn’t usually block or wait when sending a message. The receiver chooses when to collect and interpret messages. The queues are light-weight data structures often implemented using lock-free ring buffers. Writing to and reading from a queue are non-blocking operations. The diagram below illustrates the scheme:

In all of the diagrams in this post the left hand side is one context, the right hand side is another context (sometimes called the sender and receiver, or source and target). The half-headed arrow indicates that a message is passed asynchronously in the direction of the arrow by writing a message into the queue from one context and reading the message from the queue in the other context at some later time.

From the perspective of each of the two contexts, code for the above diagram might look something like this:

// execution context A:
{
    // post a message:
    q.write( m );
}

// ... some time later
// execution context B:
{
    // receive and interpret pending messages
    while( !q.empty() ){ // in this case we don’t block if the queue is empty
        Message m = q.read();
        interpret_message( m );
    }
    ... continue other processing
}

I’ve shown messages being passed by value (i.e. copied into a ring buffer). For small messages and single-reader single-writer queues this usually works out well. I mention some other variations later.

To send a message in the opposite direction (from context B to context A) a second queue is needed:

For the rest of this post I’m not going to explicitly notate the contexts and queues: when you see a half-headed arrow it means that the message is being passed asynchronously from one context to another via a non-blocking queue.

Program organisation: a network of message queues and execution contexts

There are different ways to connect between the various execution contexts and objects of a program with message queues. Some options that can be used in combination are:

  • Fixed topology of contexts and queues. A static network of queues are used to pass messages between contexts. In the extreme, contexts may “own” a queue for receiving messages and run an event loop — like the post message/event mechanisms of Win32 or Qt.
  • Objects maintain their own receive queues. Objects can migrate between contexts and a message can be sent to the object irrespective of what context it is executing in. In this situation queues are a bit like ports in the ROOM formalism.
  • Message queues or “channels” as first-class objects. E.g. a queue for returning results might be passed via a message to a worker thread — when the worker thread completes it writes the result to the queue.

A few concrete use cases are:

  • A pair of queues are used to communicate messages and results between a main (GUI) controller thread and an audio callback context. This is the primary communication architecture used in SuperCollider. It’s described in my book chapter here.
  • Asynchronous disk i/o commands are sent to an i/o scheduler thread via a queue, each command specifies where the result of the i/o operation should be enqueued. This is somewhat similar to I/O Completion Ports.
  • A thread receives audio data from a network socket and enqueues a packet record on a queue, the audio thread dequeues packet records and decodes them.

Implementation mechanisms (in brief)

There are a number implementation details that I won’t go in to detail about here, including: message implementation strategies, queue implementation, and delivery mechanisms. In the future maybe I’ll write about these in more detail. For now I’ll give a brief sketch:

Message implementations

A message queue transports messages from one execution context to another. A message is a bundle of parameters along with some kind of selector that specifies how the message should be interpreted or executed. In principle there might only be one type of message, in which case only the parameters need to be enqueued.  However, a message queue often transports a heterogeneous set of messages, thus usually some kind of selector is needed. For example,  something like the following struct, with an enumeration used to indicate the message type (this is similar to a Win32 MSG structure):


enum MessageCommandCode { DO_ACTION_1, DO_ACTION_2, DO_ACTION_3};
struct Message{
    enum MessageCommandCode commandCode; // receiver interprets the command code
    int messageData1;
    bool messageData2;
}

Or a function pointer that is executed when the message is interpreted (this is the way SuperCollider does it for example, see my article about SuperCollider internals for more info):

struct Message{
    CommandFunctionPtr commandFunc; // receiver calls the commandFunction passing the message and context-specific state
    int messageData1;
    bool messageData2;
}

You could do something more elaborate with C++11 lambdas, functors etc. But keep in mind that in a lock-free scenario you want to keep things simple and avoid overheads or allocations when sending messages.

An important thing to notice is that an enumerated command code can be interpreted in multiple ways (for example, by different states in a state machine) where as a function pointer can only be executed in one way (the latter case is more like code blocks in Apple’s Grand Central Dispatch). There are pros and cons of each approach.

For efficiency and to avoid blocking I recommend that the message is moved in to the queue when sending, and moved out of the queue when receiving – data is transferred by value. An optimization is to initialize/construct the message in-place in the queue buffer, rather than copying it there. Similarly,  messages can be interpreted/executed directly from queue storage.  The queue may support variable-length messages.

If you want your message data to be more expressive you could use a union or object with different sub-types for each message type.

Message could be implemented as polymorphic Command objects (cf. Command Design Pattern) although allocating and passing polymorphic objects in queues might get complicated. Depending on the number of commands involved, introducing polymorphic messages using virtual functions might just be overkill.

Messages might be timestamped and stored in a priority queue in the target context for future execution. Or messages might represent prioritized work (such as disk i/o) to be performed by the target context in priority order.

Ultimately there are many choices here. There are trade-offs between simplicity of implementation, ease of extension and complexity and readability of both the client code and the message passing infrastructure code.

See also: Command Pattern, Command queue, Message queue, message/command implementation in SuperCollider.

Queue implementations

This post focuses on situations where there is only ever a single-reader and single-writer to any queue. Partly this is because lock-free single-reader single-writer queues are relatively simple and efficient, and partly because they’re often all you need. That said, it is sometimes desirable or necessary to introduce multiple reader and/or multiple writer mechanisms: for example when a server thread has multiple clients sending it requests, or a single queue is used to distribute work to multiple worker threads. These cases can be implemented by using a lock at one end of a single-reader single-writer queue, or by using node-based lock-free queues such as the one by Michael and Scott.

Delivery mechanisms

The patterns in this post do not require or imply synchronous rendezvous or blocking/waiting when sending or receiving messages. A context might choose to block waiting for messages to arrive, or just as likely, it could poll the queue  for new messages (without blocking) at certain synchronization points. Alternatively, some other mechanism might be used to ensure that the messages get processed. Some possible mechanisms are:

  • An audio callback polls for messages at the start of each callback.
  • A worker thread blocks waiting for an event/semaphore that is signaled when new messages are available.
  • A GUI thread processes messages when it is notified that new messages are available (via a window system event)
  • A GUI thread runs a periodic timer callback to update the GUI, the callback polls for new messages from another thread (without blocking)
  • A consumer issues a get message whenever it finishes using a datum and processes received response messages whenever its local datum pool is exhausted.

Periodic polling is often criticized, and for good reason. Please keep in mind that in the cases above that employ polling, there is already a periodic task that is running for other reasons. For example to generate audio or update the GUI.

Now, on to the patterns…

The Patterns

1. Simple state management patterns

1.1 Mutate target state

One simple use for a command queue is to send messages that changes (mutates) some context-local state in the receiver. In the example below, a message is sent to change the colour of node x to red:

Note that the message is asynchronous. There is no locking involved because the data structure x is local to the right hand context. The x.setColor(red) code is executed in the right hand context. For comparison, the equivalent code in a lock based system might look like this:

// alternative implementation using locks
Node& x;
Mutex m; // protects x
// ... in context A: 
    m.lock();
    x.setColor(red);
    m.unlock();

// ... in context B:
    m.lock();
    ... all operations on the structure and x must be protected by the mutex...
    m.unlock();

One way to think about it is that in a lock-based system, the data comes to the code (context A gets access to object x), whereas in the message passing case the function call and parameters are sent to the data structure x in the receiving context (the function executes on context B).  If context A and B are running on separate CPUs you can imagine different cache transfer patterns if x is accessed from context A, compared to passing a message (containing a selector, a reference to x, and the colour red) to context B where x is already in the CPU’s cache.

1.2 Get state vs Observer / state change notifications

In principle you could implement an asynchronous state request (”get”) message to asynchronously retrieve a state. This is a “pull” model:

More likely you’d use some form of asynchronous Observer pattern where changes to x in the right hand context trigger change notification messages to be sent to the left hand context. This is a “push” model:

In the classical Observer pattern the receiver explicitly attaches/subscribes and detaches/unsubscribes from receiving notifications. For state that is constantly in flux (such as audio levels for VU meters) you might always post periodic notifications rather than requiring explicit subscription.

1.3 Synchronised state / cached state

Often, contexts operate in lock step, in the sense that a client knows (or categorically assumes) the state of the remote context. In this situation there is no need to explicitly request server state.

If there is a master/slave relationship between contexts/states the master can keep a local copy of the slave state — when the master needs to know the state of a slave it can synchronously query the local copy rather than sending an asynchronous message to the slave. Such mechanisms are akin to using a thread-local proxy cache with write-through semantics.

2. Generalized object transfer patterns

We assume shared memory, so transferring objects from one context to another requires only that an object pointer be passed in the message.

The generalized object transfer patterns below (Link, Unlink and Swap) deal only with transferring the visibility/availability of objects into or out of a context. They don’t say anything about the semantics of the transfer. For example they say nothing about whether the source context still references, manages, or owns the object after it has been transferred. More on that in a minute. First the patterns:

2.1 Link

Move an object from source to target. Install in target data structure.

 

2.2 Unlink

Remove an object from a target data structure and return to source.

2.3 Swap/exchange

Replace an object in the target context with a new object. Return the old one.

A new object is sent and installed in the receiver and the old object is returned.  This mechanism can be used to implement atomic state update in O(1) time, even when the object is large or complex (such as a whole signal flow graph). It is related to using a double buffer or pointer-flip to quickly switch states.

Additional semantics for object transfer

It is often useful to overlay additional semantics on these object transfer operations. Potential semantics for object transfer include:

  • Transferring between contexts transfers ownership. Objects are only ever accessed from one context at a time. The context that sees the object owns it and has exclusive access to it. Such objects need not be thread-safe.
  • Transfer implies transfer of write access. Objects are only ever mutable in one context. Objects support single-writer multiple-reader semantics.
  • Transfer implies “publication” of an immutable object. Objects are only mutable prior to publication to other contexts. Const methods need to be thread-safe.
  • Transfer implies “publication” of a single-writer (possibly multiple-reader) object. Objects are only ever mutable in the source context. Objects support single-writer multiple-reader semantics.
  • Transfer makes a thread-safe object visible to another context. Thread-safe objects are “Monitors” or multiple-writer safe. Their state and operations are protected by a mutex or other mechanism. They can be used concurrently from multiple contexts.

Semantics that enforce immutability, single-context access or a single-writer semantics can be useful because they can be used to avoid locks and blocking.

Note that things can get more interesting if you want to support these semantics and also allow the receiver to be destroyed prior to destroying the queue.

3. Object transfer patterns: allocation and deallocation in source context

In non-garbage-collected languages such as C and C++ you need to manage object lifetimes by explicitly allocating and deleting objects. Often this is handled by having strong ownership semantics: a specific object or process is always responsible for allocating and/or destroying particular objects.

In some contexts (such as a real-time audio callback) it is not appropriate to invoke the memory allocator at all, therefore allocation must be performed in another context.

The following allocation/deallocation pattern deals with these scenarios by allocating the new object in the source context when performing a link operation, and deallocating the result in the source context after performing an unlink operation.

3.1 Allocate in source context, link in target

3.2 Unlink and return to in source context for destruction

This pattern is useful when:

  • Only the left hand side context can allocate memory (due to non-blocking requirements in the right hand context), AND/OR
  • Allocating the object is time consuming or unbounded in time and the right hand context has strict time constraints.

The above example shows literal allocation and destruction of the object, but other variations are possible including:

  • The object is allocated from and released to a pool that is local to the source context.
  • Instead of allocating/destroying, a reference count is incremented/decremented in the source context on behalf of the destination. This means that the destination context always owns a reference count to the object, but it ensures that the final destruction of the object is always performed in the source context where it is safe to call the memory allocator or access other destruction-time resources.

4. Alternative allocation patterns

4.1 Context local storage/allocator/pool

Create new object and install in target context using destination-context-local storage:

This pattern is useful when objects are always (or mostly) used locally in the receiving context, and allocation and deallocation can be implemented efficiently (e.g. using a simple freelist or O(1) allocator). There may be cache advantages to keeping objects local to a context. This pattern may also result in less complex memory management code in the receiving context if the objects are allocated and/or deallocated as the result of a number of different events, messages or state transitions.

4.2 Closed-loop object reuse

Pre-allocating all needed resources is a common strategy for real-time memory management.  Streaming systems usually have a known maximum number of buffers in-flight for a single stream. In these situations closed-loop circular buffer exchange can be used. A fixed number of objects is allocated and then cycled between contexts:

Conclusion

That’s it for the basic patterns.

As I said at the outset, this isn’t the whole picture. There are more involved patterns such as prioritized work lists, asynchronous state mirroring,  and sub-object reference counting. You can implement transactions by combining messages so that a set of state transitions is always executed atomically with respect to the receiver. You might want to limit the amount of  allocated memory by only having a certain number of  messages/objects in-flight at any time. Hopefully I’ll get around to writing about these mechanisms in future posts.

I hope these basic patterns and my explanations give you a new way to think about implementing asynchronous message passing communication in concurrent applications. If you know of any other patterns, references or links on this topic please share them in the comments.

Sparrow and their users: breaking up is hard to do

“only $10 so I call 1st world problem.”
@ignatiusmusic (giving some needed perspective)

There’s been a lot of traffic on Twitter about Sparrow over the last 24 hours. Sparrow is a $10 email client App for Mac OS X and iOS. The Sparrow company has been acquired by Google. As far as I can tell there will be few or no further Sparrow updates. Some Sparrow users are not happy.

I never used the App. I have no relationship with Sparrow. I’ve been watching the mobile App market with interest for a while. I look at the case of Sparrow and see some interesting things. I see muddled user expectations and confusion about user-developer relationships. Confusion that perhaps arises out of the nature of the “App” product model.

Matt Gemmell has posted a provocative rebutting of unhappy user’s tweets about the Sparrow acquisition, you should check it out. Matt’s post and the conversation that followed inspired me to write this. Much of this is a commentary on Matt’s post. Hopefully I’m adding something to the conversation.

I’ll get back to Matt’s post in a moment. But first let’s look at the announcement from sparrowmailapp.com:

We’re excited to announce that Sparrow has been acquired by Google!

We care a lot about how people communicate, and we did our best to provide you with the most intuitive and pleasurable mailing experience.

Now we’re joining the Gmail team to accomplish a bigger vision — one that we think we can better achieve with Google.

We’d like to extend a special thanks to all of our users who have supported us, advised us, given us priceless feedback and allowed us to build a better mail application. While we’ll be working on new things at Google, we will continue to make Sparrow available and provide support for our users.

We had an amazing ride and can’t thank you enough.

Full speed ahead!

Dom Leca
CEO
Sparrow

For me, the announcement reads as focused on the Sparrow team and their achievements. They’re passionate about creating an “intuitive and pleasurable mailing experience.” They have a bigger vision and they’re excited to have the opportunity to reach for it with the help of Google. As some have suggested, we should be happy for the Sparrow founders at this exciting point in their careers. Indeed from a business standpoint it is a great outcome. The flip-side is that there are now a bunch of Sparrow users with an App at the end of it’s life.

The announcement appears on the front page of the Sparrow website. I’m really not sure who the announcement is addressed to. A natural assumption is that it is addressed to Sparrow users. However, it is not until the fourth paragraph that users are mentioned. Users are thanked for their part in the process, and in the fifth paragraph, thanked for giving the founders “an amazing ride.”

I can understand the founder’s excitement. But is this really what the users need to hear? For users, product EOL is an unhappy day. Developers should be mindful of that.

The response has been predictable. Sparrow users are not happy. As I mentioned above, Matt Gemmell has rounded up a bunch of user responses.

Matt describes a mismatch of expectations between users and developers. He uses the word “entitlement” as if user’s responses are out of line. Perhaps they are. Clearly there is a disconnect between Sparrow and user’s expectations. Are the users at fault for having these expectations? I’m not sure. Where did these expectations come from? Are they well founded? I don’t know for sure. What is clear is that users are not getting what they expected. I don’t think this can be explained by writing them all off as a bunch of self-entitled prats.

Matt appears to take the view that the unhappy users were ill-informed and that they need to wake up, adjust their expectations, and accept the reality of the situation. Well, maybe. We all have expectations about relationships, business or otherwise. Sparrow is appliance-like technology used day-in-day-out. Perhaps the expectation of product continuity is reasonable.

Matt writes:

If you’re a developer, then you really need to get your head straight. You’re probably being a hypocrite. This is you:

1. YESTERDAY! “I hate how people feel entitled to future versions of my app after paying $10.”
2. TODAY! “I feel betrayed that Sparrow won’t be updated anymore after I paid $10.”

Pick one.

True. If a developer is thinking like this they are a hypocrite. Maybe some developers do have the attitude (1). I haven’t met anyone with this attitude. It if it’s true, that’s unfortunate, because guess what? delivering software is about relationships and service. Whether or not your business has “SaaS” as a business model, software is a service. People reasonably expect support, bug fixes, updates. Maybe it doesn’t come for free, perhaps users are not prepared to pay what it costs, none the less these things are all a part of how software works today. In this context, are expectations really misplaced entitlement?

Maybe I’m missing the point. Perhaps the mobile App world is different. Software-as-a-disposable-consumable, with only the big companies willing and able to deliver service, support, updates and product continuity. What do you think?

On to some of Matt’s other points:

You don’t get much for $10

For $10 you might get a great App, but you’re not buying service, or support, a relationship, or product continuity. You’re buying a one-time hit. The App-store model is based on selling a steady stream of low-priced licences. It’s a high-volume low-margin business.

If you start from this point, all of Matt’s rebuttals are hard to argue with. It’s true. You don’t get much for $10. Is this really the culture of commercial software supply and development that users want?

Wait, there is ambiguity here. Apps auto-update. The App store can withdraw the App. There is an ongoing relationship. It’s kind of like a disposable-consumable with a built-in auto-update service that can be obliterated at any time.

Don’t buy promises about the future

The Sparrow team promised certain features. Users claimed to buy the product on the expectation that they would be delivered. Matt criticises Sparrow for making promises and the users for making purchasing decisions based on them. Here’s what my AudioMulch What’s Coming page says:

About the Future
This page used to be called “roadmap”. Roadmaps can have many purposes: to create a guiding vision for future development, to give a vague impression of what we think might happen, to try to get you to buy something that doesn’t exist yet. We don’t want to publish those kind of roadmaps. We have tried not to include things here that might happen, even if we think they’re important or could be useful.

In disclosing future plans there is a line to be navigated between being open with users and over promising. Users who have invested their time and attention in a product (and it is this investment, more than the money that is important) deserve openness. On they other hand, they need to understand that conditions change. “All specifications are subject to change without notice.”

Honour and betrayal

Cole Peters suggested that it’s a matter of honour to ensure product continuity: developers should decline an acquisition if product continuity can’t be ensured. Other users feel betrayed. Matt is pretty uncompassionate saying, “You weren’t betrayed, though, so why should anyone care?” Maybe there is no contractual condition, but clearly there was an expectation of relationship and honour here, at least from one side. How did that happen? Is it unreasonable? Why? Do developers know they are “leading users on”?

Modern app development is often a collaboration between developer and user. Users put a lot of effort into the collaboration. Value does flow from user to product. Is thanks for “an amazing ride” really the way to end the relationship? If I were a customer I’d be confused.

There is honour in every customer relationship, and there is honour in serving customers and delivering product continuity. How far the honour stretches depends on many things. It’s not black and white. There are reasonable and unreasonable expectations to be sure. What are reasonable product continuity expectations for Sparrow? for any $10 App? for any email client App? What if it were a GUI framework, or a whole operating system? Does anyone really know? Seems like there are a lot of differing opinions out there.

Reading Sparrow’s announcement it’s pretty clear that the Sparrow founders are “Full speed ahead!” with the Google acquisition and are acting in their own best interests. The user’s best interests don’t seem to come in to it. In the end maybe their user’s best interests will be served. Maybe in a year’s time every Google user will have something better than Sparrow available to everyone for free. There’s always a bigger picture.

Open Source

“When software you use and rely on suddenly becomes discontinued,
you begin to understand what all those GPL guys are really talking about” — @tolmasky

It’s hard to know exactly what “GPL guys are really talking about” but perhaps @tolmasky means product availability, maintainability, and continuity. Releasing the source is one way to deliver some form of continuity after acquisition. However it takes more than source code to deliver an App. Would an indie FOSS project give Sparrow users what they expect? Matt doesn’t think so. I’m not sure either.

The kind of app that’s developed under a commercial start-up model is often quite different from what is produced in the Open Source world. So are the users. The key ingredients to many commercial Apps is not source code, it’s everything else: user experience design, graphic design, support, quality assurance, project management. There are surely counter examples, but I don’t see many indie open source outfits delivering the same kind of polish as commercial Apps. Guess what? Open Source users don’t care. The Open Source stuff still works. But that doesn’t make it a solution here.

Perhaps people want the combined benefits of commercial Apps and Open Source Apps. If you know how to develop innovative software that delivers all the benefits of both models, in an indie environment that is economically sustainable, I’d love to hear from you. Seriously.

Anger and grief

Matt quotes a bunch of angry comments. Anger is maybe not the best way to deal with losing an App, but it’s an understandable human response. I’m not going to defend the haters, but clearly they expected a different outcome.

Matt makes more good points about indie development and the success of this acquisition for Sparrow’s founders. I’ve said enough. Go check out Matt’s post.

Closing thoughts

For me, the Sparrow acquisition highlights issues that arise in a climate where for many, the holy grail of developing end-user software is not to serve the user, but to sell the company.

Lean development practices encourage delivering a minimum viable product and iterating development in collaboration with users. What is the relationship of user to product in this collaboration? Can users reasonably expect to be considered investors? Maybe not for their $10 App purchase, but perhaps for their participation in the development process. Typically the pay-off is seeing the features they want implemented. What happens when the App is EOLed and they can’t access those features?

Today the relationship between user and developer takes many forms. From free web Apps and services, commercial SaaS, through to shrink wrap desktop applications and bespoke development services, the terms of each relationship is different. The differing terms are not always front and foremost in the user’s mind. Divergent expectations arise, especially when barriers to entering the relationship are low.

Apps are low-cost consumable products to be sold, not ongoing customer relationships to be honoured. App stores mediate customer relations. Customer relationships belong first and foremost to the App store vendor, not the developer. Maybe this explains (1) above.

To me at least, software tools are not throw-away items, they are things you live and work with every day. They become part of your routine. Something like an email client is an intimate part of your world. It’s something you form a relationship with. You invest time becoming accustomed to it. Shouldn’t the custodians of the App take this into account? If not, why not?

How much transparency within the developer-user relationship is possible? How much should be expected? Personally I would prefer to buy software from a developer who is in business to serve the customer, not just to create intuitive and pleasurable experiences for the highest bidder. I don’t think I’m alone.

[Edited July 22: minor edits and cleanups]

Dave Sparks on Android audio latency at Google I/O 2011

This is old news but I just heard about it and thought it was worth posting for future reference.

Over on the Andraudio list Robert Munro kindly pointed out that there was a question about the sad state of audio latency on Android at the Google I/O 2011 “Fireside Chat with the Android Team.” The question is at 40:24 in the video:

I’ve transcribed Dave’s answer below:

In response to “There are questions about audio latency being a problem for Android, are you seeing this Dave?” Dave Sparks said:

“Latency is a big problem. We’re working at, hopefully we hope to be able to do something about it with ICS. As we investigated it it’s actually a pretty complex problem. There are a number of different places where latency gets introduced. Most of the latency is introduced below Android. Basically it’s happening in the drivers or in the chipsets or somewhere in there, and some of these are really obscene amounts like hundreds of milliseconds of latency in the audio path. So, that’s something we’re going to push on. We started/ I think we introduced something in CDD Gingerbread which was a “should” hit certain latencies.

There are some interesting problems we need to solve in the scheduler, so I’ll be talking to Rebecca shortly about this. Because the fair scheduler makes it really difficult to make sure that these low latency audio threads get scheduled when we need them to be scheduled every single time. That’s probably the biggest issue we’re running into right now. But it’s a problem we want to deal with and hopefully the next release will get it. Obviously it’s not going to solve the problems for legacy devices but it’s going to get better.”

Source:
Dave Sparks — Technical lead for the Android media framework.
Google I/O 2011, “Fireside Chat with the Android Team” May 10, 02:30PM – 03:30PM
On YouTube as “Google I/O 2011: Fireside Chat with the Android Team” at 40:24
www.youtube.com/watch?v=gfiYUL2exT8#t=2424s

Real-time audio programming 101: time waits for nothing

“The audio processing thread is stalling because the client’s implementation of some XAudio2 callback is doing things that can block the thread, such as accessing the disk, synchronizing with other threads, or calling other functions that may block. Such tasks should be performed by a lower-priority background thread that the callback can signal.”

– Microsoft XAudio2 documentation

“Your IOProc is making blocking calls to the HAL and it is calling NSLog which allocates memory and blocks in fun and unexpected ways. You absolutely cannot be making these calls from inside your IOProc. You also cannot be making calls to any ObjC or CF objects from inside your IOProc. Doing any of these will eventually cause glitching.”

Jeff Moore, Apple Computer, on the CoreAudio mailing list

“The code in the supplied function must be suitable for real-time execution. That means that it cannot call functions that might block for a long time. This includes malloc, free, printf, pthread_mutex_lock, sleep, wait, poll, select, pthread_join, pthread_cond_wait, etc, etc.”

JACK audio API documentation for jack_set_process_callback()

As you may have gathered from the quotes above, writing real-time audio software for general purpose operating systems requires adherence to principles that may not be obvious if you’re used to writing “normal” non real-time code. Some of these principles apply to all real-time programming, while others are specific to getting stable real-time audio behavior on systems that are not specifically designed or configured for real-time operation (i.e. most general purpose operating systems and kernels).

These principles are not platform-specific. The ideas in this post apply equally to real-time audio programming on Windows, Mac OS X, iOS, and Linux using any number of APIs including JACK, ASIO, ALSA, CoreAudio AUHAL, RemoteIO, WASAPI, or portable APIs such as SDL, PortAudio and RTAudio.

I’ll get on to the programming specifics in a moment, but first let’s review a couple of basic facts: (1) You do not want your software’s audio to glitch, and (2) real-time waits for nothing.

You do not want your software’s audio to glitch

You don’t want your software’s audio to glitch. Especially if your software is going to be used to perform to a stadium full of fans, or to watch a movie in a home theater, or to listen to a symphony in your car while you’re driving down the freeway, or anywhere really. It’s so fundamental I’ll say it again: you do not want your audio to glitch. Period.

Girl Talk

That’s Girl Talk performing using a laptop running my AudioMulch software up there (Photo by fromthephotopit on Flickr). I don’t want AudioMulch’s audio to glitch. Girl Talk doesn’t want his audio to glitch either.

Real-time waits for nothing

Digital audio works by playing a constant stream of audio samples (numbers) to the digital to analog converter (DAC) of your sound card or audio interface. The samples are played out at a constant rate known as the sampling rate. For a CD player the sampling rate is 44100Hz, that’s 44100 stereo sample frames every second. Every second at the same rate. Not faster, not slower. 44100 sample frames every second. If your sound card doesn’t have the next sample when the DAC needs it, your audio will glitch.

On general purpose operating systems such as iOS, Android, Windows, Mac OS X or Linux your software usually won’t be delivering individual samples to the DAC, it will be providing buffers of samples to the driver or an intermediate OS layer. For example it might process buffers of 256 samples at a rate of 179.26Hz (that’s 44100 / 256). The lower levels of the system then feed the individual samples from each buffer to the DAC at 44100Hz.

It isn’t always the case, but for the purpose of this post I’ll assume that the audio API periodically requests a buffer of samples by calling a callback function in your code. In the above example your callback would have to compute each and every buffer in less than 5.8 milliseconds. No matter how your code is invoked your software has to provide those samples within 5.8ms. Each and every buffer. No exceptions. Real-time does not wait for latecomers.

Sidebar: real-world buffer sizes and latency values

To put things into context, just to make sure we’re all on the same page here: To many users today 5ms is considered a large buffer size. ~1ms buffers (say 64 samples at 44100Hz) would be considered pretty good but not necessarily ideal. Applications where low latency is especially important are (1) interactive audio systems (such as musical instruments or DJ tools) where the UI needs to be responsive to the performer, and (2) real-time audio effects, where the system needs to process analog input (say from a guitar) and output the processed signal without noticeable delay. What is considered “noticeable delay” depends on a number of factors, but an end-to-end system response of less than 8ms including all system latencies (not just audio latency) is not an unreasonable ballpark number for interactive systems. For live audio effects processing many users would prefer latency to be much lower than this.

For the purposes of this post I consider buffer sizes somewhere in the 1-5ms range to be normal and achievable today on most computers running Windows with WDM, WASAPI or ASIO drivers, with Mac OS X’s native CoreAudio system and with ALSA or JACK on Linux. Some audio hardware now supports buffer sizes down to 16 samples (about 1/3 of a millisecond at 44.1kHz sampling rate) or even lower. I assume that you want to write low-latency audio software for one or more of these platforms. Even if you’re targeting a platform that only supports high-latency, like Android’s ~50ms or Windows’ legacy WMME API’s ~30ms the same principles apply. You don’t want to glitch.

Sources of glitches

We’ve established that you don’t want to glitch, and your buffer period is around 5ms or less. Your code has to deliver each and every buffer of audio in a time shorter than one buffer period.

All sources of audio glitches within your code boil down to doing something that takes longer than the buffer period. This could be due to “inefficient code” that’s too slow to run in real-time. That’s not the main thing we’re interested in here though. I assume that your code is efficient enough to run in real-time, or that you can profile and optimise it so it’s fast enough. If not, the internet is full of resources to help you write faster code.

The main problems I’m concerned with here are with code that runs with unpredictable or un-bounded execution time. That is, you’re unable to predict in advance how long a function or algorithm will take to complete. Perhaps this is because the algorithm you chose isn’t appropriate, or perhaps it’s because you don’t understand the temporal behavior of the code you’re calling. Whatever the cause, the result is the same: sooner or later your code will take longer than the buffer period and your audio will glitch.

Therefore, we can state the cardinal rule of real-time audio programming simply as follows:

If you don’t know how long it will take, don’t do it.

There are a number of things your program could do that fall into the general category of “unbounded time operations”. Many are mentioned in the quotes at the start of this post. I explore them in more detail below. As you might guess, some are more obvious than others.

“Blocking”

You may hear someone say “don’t do anything that blocks the audio callback thread.” Used like this it’s a general term. Doing anything that makes your audio code wait for something else in the system would be blocking. This could be acquiring a mutex, waiting for a resource such as a semaphore, waiting for some other thread or process to do something, waiting for data to be read from disk, waiting for a network socket. It’s pretty clear that some of this waiting could take an indeterminate, or even indefinite amount of time, and certainly longer than a few milliseconds. I discuss some of these specific types of blocking in more detail below.

Keep in mind that not only do you want to avoid directly writing code that blocks, it is critical that you avoid calling 3rd-party or operating system code that could block internally.

Poor worst-case complexity algorithms

You’ve written every line of code in your audio callback yourself to avoid blocking. No calls to any system or 3rd-party code that could block. Even so, you might still have a problem: software efficiency is often analysed in terms of average-case (amortized) complexity. For example, in many applications, an algorithm that runs super-fast 99.9% of the time but every now and then takes 1000 times as long might still be considered “the fastest algorithm available.” This is often referred to as “optimising for throughput instead of latency.” But remember: real-time waits for nothing, certainly not for that 1000-times-normal processing spike that happens every now and then. If you do something like that in your audio callback, you may get a glitch. For this reason, you should always consider the worst-case execution time of your code.

A simple example would be zeroing a delay line to reset it (say using a for-loop to zero every element, or perhaps by calling memset). You might think that using memset to clear a buffer of memory is fast, but if the delay line is large the memset call is going to take quite a long time — and if the time taken is longer than you have available you’ll glitch. It’s usually better to use an algorithm that spreads the load across many samples/callbacks, even if it ends up burning a few more cycles overall. Of course if these spikes in processor usage are small, and you know that they’ll be bounded (e.g. you know the maximum size of the delay line you’ll need to zero), you might be OK. But if your code is full of these little occasional spikes, you’d better hope you can predict how they all interact and sum together. If they all occur at the same time you could still get a glitch, and you don’t want that.

Another thing to keep in mind here is that many operating system and library functions are implemented using average-case optimised algorithms. In C++ many STL container methods fall in to this category (more on that below). General purpose memory allocation algorithms and garbage collectors also have unpredictable time behaviour, even if they don’t use locks.

Locking

It’s difficult avoid concurrency when you have a GUI or text interface, network or disk i/o and a real-time hardware-driven audio callback. Assuming that your GUI is somehow controlling the audio process you’ll need to communicate between the GUI thread and the audio callback. Perhaps you’ll want to display graphics that reflect the state of audio processing (level meters for example). Similar requirements may arise if you need to control audio via a network socket or MIDI.

The first thing you might think of is “I need a lock or mutex” to protect concurrent access to data shared between the GUI and the audio thread. This is a common response. I remember having it too. Actually when I first went looking for a lock, there wasn’t one available. On Mac OS 7, with the SndPlayDoubleBuffer API there was no OS mechanism to implement a lock since the callback could occur at interrupt time. Of course modern operating systems do provide locks (mutexes) to protect against concurrent access to shared state. You should not use them within an audio callback though. Here are three reasons why:

#1 reason to not use locks: priority inversion

Let’s say your GUI thread is holding a shared lock when the audio callback runs. In order for your audio callback to return the buffer on time it first needs to wait for your GUI thread to release the lock. Your GUI thread will be running with a much lower priority than the audio thread, so it could be interrupted by pretty much any other process on the system, and the callback will have to first wait for this other process, and then the GUI thread to finish and release the lock before the audio callback can finish computing the buffer — even though the audio thread may have the highest priority on the system. This is called priority inversion.

Real-time operating systems implement special mechanisms to avoid priority inversion. For example by temporarily elevating the priority of the lock holder to the priority of the highest thread waiting for the lock. On Linux this is available by using a patched kernel with the the RT preempt patch. But if you want your code to be portable to all general purpose operating systems, then you can’t rely on real-time OS features like priority inheritance protocols. (Update: On Linux, user-space priority inheritance mutexes (PTHREAD_PRIO_INHERIT) have been available since kernel version 2.6.18 together with Glibc 2.5. Released September 19, 2006. Used in Debian 4.0 etch, 8 April 2007. Thanks to Helge for pointing this out in the comments.)

Keep in mind that any code in your audio callback that waits for something to happen in another thread could be subject to priority inversion. Rolling your own “spin lock” that busy-waits polling for something to complete in another thread, in addition to being inefficient, will have the same priority inversion problem if the thread you’re waiting for has lower priority and gets preempted by another thread in the system.

#2 reason to not use locks: risk of accidentally calling code with unbounded execution time

I know you’re not going to use a lock, because I just explained that priority inversion will mess with you even if your code simply locks a mutex to set one flag and then releases the lock:

mutex.lock();
flag = true; // subject to priority inversion
mutex.unlock();

But if you’re still contemplating calling code that acquires a lock, consider this: the audio callback will have to wait for all of the code that is protected by the lock (the “critical section”) to complete before the audio callback can proceed. Effectively, in addition to the thread context switching costs, you’re executing all that code sequentially with your audio callback. Do you know how long that will take? in all cases? Remember we’re talking about worst-case time here, not average-case. Any code path inside a critical section that’s shared with the real-time audio thread would have to follow all of the rules we’re outlining here. That’s asking for a lot of discipline from you and your fellow developers. It would be easy for bugs to creep in. In C++ you wouldn’t want to do this for example:

mutex.lock();
my_data_vector.push_back( 10 ); // could allocate memory and copy mucho data
mutex.unlock();

If my_data_vector is a std::vector, calling push_back() when the vector’s internal storage is full will cause memory to be allocated and all existing elements to be copied into the new memory. That’s obviously going to cause a processing time spike. Most non-real-time code behaves like this some of the time. It’s easy for simple-looking code to have non-deterministic time behaviour. You don’t want it creeping into your critical sections.

#3 reason to not use locks: justified scheduler paranoia

“you’re definitely opening things up to a world of hurt when the system gets stressed”

Jeff Moore to James McCartney on the CoreAudio list in 2001

Priority inversion and unbounded execution time inside critical sections aren’t the only reasons to avoid locks and other concurrency primitives. In the case of proprietary operating systems, few people know exactly how the schedulers are implemented. No matter what the OS, scheduler implementations may change with each OS release. These general purpose operating system schedulers are not required or guaranteed to exhibit real-time behavior. They’re not certified for use in avionics systems or medical equipment. There are no governments or judiciaries to hold their real-timeness to account.

My general position on this is that you should avoid any kind of interaction with the OS thread scheduler. Avoid calling any synchronisation functions in your audio callback. Schedulers employ complex and diverse algorithms and you don’t want to give them extra reasons to de-schedule your real-time audio thread.

Of course you can’t escape depending on the scheduler to invoke your audio callback on time, and with high priority. You might also consider a few other hard-core scheduler interactions, such as signalling condition variables or semaphores to wake other threads. Some low-level audio libraries such as JACK or CoreAudio use these techniques internally, but you need to be sure you know what you’re doing, that you understand your thread priorities and the exact scheduler behavior on each target operating system (and OS kernel version). Don’t extrapolate or make assumptions. For example, last I checked the pthreads condition variable implementation on Windows employed a mutex internally to give correct POSIX semantics — that’s definitely not something you want to get involved with (you could perhaps use the native Windows SetEvent API though).

Side note: trylocks

One related option that may be open to you is to use a non-blocking “try-lock” function that simply returns a failure code instead of blocking if the lock can’t be acquired (pthread_mutex_trylock() on POSIX, TryEnterCriticalSection() on Windows). The downside is that since acquisition of the lock is not guaranteed, you can’t use it to protect anything that you must access at every callback. You’re gambling that you’ll be able to acquire the lock at least some of the time — although depending on the behavior of the other lockers, in the worst case, your code may never manage to acquire the lock.

Memory allocation

I’ve touched on this point above, but it’s worth re-iterating: you should not allocate memory in your audio callback. For example you shouldn’t call malloc() or free() or C++’s new or delete, or any operating-system specific memory allocator functions, or any routine that might call these functions. The three obvious reasons are:

  • The memory allocator may use a lock to protect some data it shares between threads. Aside from priority inversion, trying to lock a mutex that’s potentially contended by every other thread that allocates memory is clearly not a good idea.
  • The memory allocator may have to ask the OS for more memory. The OS may also have it’s own locks, or worse, it may decide to page some memory to/from disk and make you wait while it happens.
  • The memory allocator may use algorithms that take unpredictable amounts of time to decide how to allocate a block — and you know you don’t want that.

Some obvious solutions are: 

  • Pre-allocate all your data.
  • Only perform dynamic allocation in a non-real-time thread where it isn’t time-critical.
  • Pre-allocate a big chunk of memory and implement your own deterministic dynamic allocator that’s only invoked from the audio callback (and hence doesn’t need locks).

For example, SuperCollider has an implementation of Doug Lea’s memory allocator algorithm that is only used in the audio callback. The current version of AudioMulch uses per-thread memory pools for dynamic allocation. For tasks with unbounded execution time such as plugin loading, AudioMulch performs them in the UI thread and then sends the results to the audio callback when they’re ready — sometimes it’s OK to make the user wait, so long as the audio callback doesn’t have to.

Invisible things: garbage collection, page faults

Even if you avoid all of the above in your own code, there are still a few environmental issues. If you’re using a garbage collected language such as Java or C# and the GC kicks in while your audio thread is running you will probably be in trouble unless you can disable GC or use an environment with deterministic real-time GC. This isn’t my area of expertise so I won’t say more about this other than that there are Java implementations with real-time GC. Otherwise you may have to settle for higher latency and additional intermediate audio buffering.

Even in a non-garbage-collected language such as C, the OS virtual memory system could page out memory that you reference in your audio callback. This would cause the thread to stall waiting for memory to be paged in from disk. I’ve never had problems with this myself — usually if the memory pages are kept hot by accessing them regularly the OS will keep them in physical RAM. But if your program is pushing the limits of available RAM, uses data that’s only referenced infrequently, or expect other memory-intensive tasks to be going on in parallel to your audio processing, you may want to investigate locking your real-time data into RAM (using operating system specific mechanisms such as mlock()/munlock() on OS X and Linux, VirtualLock()/VirtualUnlock() on Windows).

Waiting for hardware or other “external” events

You’re probably not writing code that directly waits for hardware. But disk i/o often has to wait for the hard disk head to seek to the right position, and that can take a while (averages ~8ms for consumer disks). This means performing file i/o from an audio callback is a no-go, even ignoring the fact that file i/o may lock a mutex while accessing file system data structures, allocate memory, or otherwise use poor worst-case performing algorithms. Similar rules apply to other tasks like syncing with the vertical interrupt on the graphics card or performing network i/o. As I said earlier: If you don’t know how long it will take, don’t do it.

Combinations of all of the above: code that doesn’t make real-time guarantees

If you use a closed-source operating system you probably don’t know what every API call does under the hood, but it’s a safe bet that most general purpose computing code is not written with all of the above real-time considerations in mind. Even if you do have access to your operating system source code, unless the system guarantees real-time behavior, it’s hard to say for certain that code that doesn’t use a lock today won’t use one in the future, or that an implementation won’t change to one with better average-case performance but worse worst-case performance. Memory paging issues are almost unavoidable since you can’t lock every OS-managed page in RAM without disabling virtual memory entirely.

For these reasons I suggest that you assume that all system API functions could allocate memory, use locks or employ algorithms with poor worst-case time behavior. Some may perform disk i/o either directly, or indirectly by triggering page faults when there is a pressure on the memory subsystem. Many functions will allocate memory — even just temporarily as scratch-space.

Paranoia is justifiable, since you really don’t want to glitch. I generally avoid OS API functions with very few exceptions and only when absolutely necessary. The only exceptions I can think of off the top of my head are QueryPerformanceCounter() on Windows, and mach_absolute_time() on OS X. Unfortunately I’m not 100% confident of the real-time behaviour of either.

Similar paranoia should be applied to 3rd-party libraries such as those for soundfile i/o. Unless code claims to have non-blocking real-time behavior, don’t assume that it does.

On OS X, Apple advises: don’t call the BSD layer from your IOProc. Objective C code (since the objective-C dispatcher may use locks) and CoreFrameworks are also out. Microsoft don’t typically document which, if any, of their APIs have real-time semantics.

Lands of confusion

The ideas in this post are not secret knowledge although they are in-part folklore. They come up periodically on all the audio development mailing lists I’ve ever subscribed to and I know they come up elsewhere. The thing is, avoiding locks goes against best-practice advice in “normal” concurrent programming. As a result, sometimes things get confused. Here are two examples of bad advice I’ve found online that illustrate the confusion:

*** DO NOT TAKE THE FOLLOWING QUOTED ADVICE ***

“you should use a mutex or other synchronization mechanism to control access to any variables shared between the application and the callback handler”

Open SL ES audio API documentation for Android

Well, it is true that you should use a synchronisation mechanism, just not a mutex. My thoughts on the above quote are on the public record here.

“Both read and write locks the buffer so it the pointers of the buffer will be maintained consistent”

JACK Wiki description of using ringbuffers

(The JACK guys do know better than this, I’ve reported the error, it’s probably fixed by now).

If these snippets were referring to normal multi-threaded code then they would be absolutely correct. Except in extremely rare circumstances, you should not access shared data without protecting it with a mutex lock. However, for the reasons I’ve explained above, in real-time audio code, on general-purpose operating systems, using a mutex is not advisable. It is poor practice and widely frowned upon. The solutions that I advocate involve certain lock-free and atomic techniques that I mention below and that I hope to describe in more detail in future posts. [Update: but see the comments for other views.]

In summary

Boiling down the above discussion into a few rules of thumb for code that executes in a real-time audio callback:

  • Don’t allocate or deallocate memory
  • Don’t lock a mutex
  • Don’t read or write to the filesystem or otherwise perform i/o. (In case there’s any doubt, this includes things like calling printf or NSLog, or GUI APIs.)
  • Don’t call OS functions that may block waiting for something
  • Don’t execute any code that has unpredictable or poor worst-case timing behavior
  • Don’t call any code that does or may do any of the above
  • Don’t call any code that you don’t trust to follow these rules
  • On Apple operating systems follow Apple’s guidelines

There are a few things you should do where possible:

  • Do use algorithms with good worst-case time complexity (ideally O(1) wost-case)
  • Do amortize computation across many audio samples to smooth out CPU usage rather than using “bursty” algorithms that occasionally have long processing times
  • Do pre-allocate or pre-compute data in a non-time-critical thread
  • Do employ non-shared, audio-callback-only data structures so you don’t need to think about sharing, concurrency and locks

Just remember: time waits for nothing and you don’t want to glitch.

Coda

But wait you say, how am I supposed to communicate between a GUI and the audio callback if I can’t do these things? How can I do file or network i/o? There are a few options, which I will explain in more detail in future posts, but for now I’ll just say that the best practice is to use lock-free FIFO queues to communicate commands and data between real-time and non-real-time contexts (see my article about SuperCollider server internals for some ideas, PortAudio has an implementation of a lock-free queue you could use as a basis for your own queuing infrastructure). Other options include other types of non-blocking lock-free data structures, atomic data access (requires great care), or trylocks as I mentioned above. For another resource, these techniques are touched upon in the notes for a workshop that Roger Dannenberg and I ran in 2005 on real-time design patterns for computer music.

I’m not exactly sure where I picked up all these ideas. At a minimum I need to thank Phil Burk, Roger Dannenberg, Paul Davis and James McCartney for sharing their insights on various mailing lists over the years. The quotes above reveal that Jeff Moore has also been banging on about these issues on the CoreAudio mailing list for at least a decade.

I started writing this post a few weeks ago, but just yesterday I read Tim Blechmann’s Masters thesis about his Supernova multi-core audio engine. It rehearses many of the ideas discussed here, and from there I learnt that the Linux RT Preempt patch implements priority inheritance as I mentioned above. Tim’s thesis is definitely worth a read for another angle on this material, and a lot of ideas on multi-core audio too.

Finally, if you made it to here, please, if I got something wrong, you think I’ve missed something, or you know of other activities to avoid in an audio callback please post in the comments. Thanks for reading.

Lightly updated for clarity 13 August, 2011.

Reflections on Bret Victor’s “Explorable Explanations”

I’ve been reading and thinking about maths quite a bit lately. In the process I happened upon Bret Victor’s killmath posts on interactive dynamic systems, explorable explanations and usable math. Some of the killmath examples really got me thinking. I’ve actually started coding a few experiments around this, but for now I just want to summarise some reflections and comments I have about Bret’s research. There is a lot more to Bret’s site than I touch on here, so I really recommend you go check it out if you havn’t already.

Explorable (Visual) Explanations

To give a feel for the material: one of Bret’s posts in Explorable Explanations gives an interactive example of a digital state variable filter (it’s not actually Chamberlin’s topology as I had previously tweeted). The example presents different graphical and symbolic representations of the filter and allows you to manipulate the filter’s parameters and hear the result. One of Bret’s videos demonstrates an interactive interface for exploring the Predator/Prey dynamical system with multiple cross-linked interactive visualisations. I discuss both of these examples below.

If you’ve read Edward Tufte’s book Visual Explanations — images and quantities, evidence and narrative you’ll have an idea of some of the concepts and considerations involved. Pictures, diagrams, sounds, and dynamic time-based demonstrations can all tell stories in ways that numbers and formulae can’t — potentially allowing us to gain new insights, solve problems and draw conclusions that are not accessible (to the uninitiated, if at all) by simply looking at mathematical formulae or unstructured data. Explorable Explanations extend visual explanation with interactivity allowing the reader/user to engage, explore, play, and listen, in order to gain a deeper understanding of a system’s dynamics and behaviour under varying conditions.

Of course, animations, simulations and interactive examples in the form of Java applets and Flash interactives have been circulating for years. To give a random example: sorting algorithm visualisation applets from the field of algorithm visualisation. Often though these applets are demonstrative rather than truly interactive and explorable. Closer to the mark is a recent site from the U.K. Department of Energy and Climate Change: My2050, a Flash interactive that allows users to experiment with a parametric model of energy generation and consumption, lifestyle choices, and modes of transport to design their future world. By moving sliders on the screen, users explore the environmental effects of their choices while being challenged to achieve a workable set of parameters. It’s a pretty neat web site that illuminates some of the environmental trade-offs we face.Click here to try it

To me there is something implied by Bret Victor’s work that isn’t present in DECC’s My2050 or in simple illustrative Java applets. DECC’s My2050 site supports developing intuition about the relationships between energy consumption, production and carbon footprint, however the underlying mathematical model and data set are not visible. In contrast, it seems to me that Bret’s examples are not just about increasing user intuition about system outcomes. They also support increased user awareness and expression of explicit models of system structure and dynamics. For example, by simultaneously showing the various representations of the state variable filter, or by showing multiple overlaid trajectories in the Predator/Prey model. I think this is a really interesting direction, but as I discuss later, the mechanisms for illustrating structure and inter-relationship deserve further development.

Scrubbable numbers

This brings me to Bret’s recent post on the Scrubbing Calculator. One facet of the Scrubbing Calculator that caught my attention was scrubbable numbers — numbers that you can click and drag on to change their values (not unlike a spin box) to support interaction with typed mathematical expressions. Changing one number changes the others according to their context in the surrounding text. Try clicking and dragging up and down on the numbers in the sentence below and you’ll see what I mean:

1 plus 2 equals 3

The point here is that this functionality would be available as you type, not just in a pre-written document like this one. The system would also deal with solving multiple simultaneous equations. Text is no longer just text, it’s “active” text. Looking at it another way, “widgets” are integrated with the text, making creation of an interactive user interface as easy as typing a sentence. Since behaviour is implied by context, the Scrubbable Calculator seamlessly provides a layer of interactive mathematical functionality without the user needing to explicitly invoke a separate “math tool” (calculator).

This reminded me of an idea I had a few years ago for a live coding user interface. (Live coding is a performance practice where audio-visual art is coded on the fly in performance.) The as-yet unimplemented idea was for a live coding text editor where typing something like:

oscillator.setFrequency( Slider() );

Would be immediately expanded in-line, as soon as you type it, to:

oscillator.setFrequency( Slider() );

When the code is executed, the slider would be automatically linked to the live audio algorithm thus allowing interactive manipulation without needing to explicitly construct or code a GUI — while at the same time retaining the direct visual linkage between the slider-manipulated parameter and the code it’s related to (an important factor for audience comprehension in live coding). Personally I think this would be an improvement over the current situation in many live coding environments where you need to re-execute lines of text to change variable values — something I sometimes see when watching Andrew Sorensen’s performances with impromptu for example. Of course, similar dynamic linkage in a live coding editor could be done with scrubbable numbers or other types of widgets instead of sliders.

Bret’s scrubbable example doesn’t end there though. He also illustrates dynamic creation of constraints (relations) between values by dragging linking lines, a bit like visually authoring spreadsheet equations or routing patch-cords embedded in free-form text. And this, I think, points to the overall direction that Bret is working on: tools for making mathematical systems less abstract, more dynamic, visual, situated, understandable and usable.

More expression of relationships between representations please

One thing that struck me about Bret’s state variable filter explorable example is that it presents a disjoint set of visual representations and formulae. The user is left to deduce (or perhaps assumed to already understand?) the relationships between the various representations. A formula for the transfer function is given, along with a pole-zero plot and the frequency response, but there is no built-in way to explore the mathematical relationships between these pictures. The relationship between the frequency response and the pole-zero plot is not illustrated. It isn’t at all clear that the pole locations on the z-plane plot are are the roots of the transfer function polynomial. Not to mention that casual readers may not even realise that the variable z is a complex number. To me this seems like an “interactive representation jumble” rather than a coherent, explorable presentation of what is in-fact a crystaline structure of interrelated mathematical meanings.

To be fair, Bret calls the SVF diagrams an “Explorable Example” — it serves to provide an interactive illustration within a larger textual narrative rather than as a self contained “Explorable Explanation.” That said, completely self-explanatory explorabile illustration is desirable. The interested reader should be able to examine everything in-situ, potentially down to the workings of the number system. If the interface employs multiple visual representations their relationships should be clear. Visual linkage could be provided with animation or additional elements showing the cross-relationships between points on the various graphs (as in the Predator/Prey example). Mathematical relationships could be expanded (click to zoom) or contracted to show the big picture. Clicking on a complex polynomial could visualise each component and expand to provide further detail. Clicking on any node in the data flow graph could show the transfer function for that node. Et cetera.

My impression is that Bret is working on these ideas. The link-creation system in the Scrubbable Calculator explores explicit creation of relationships between values, and the Predator/Prey video includes clear indicators on each graph to show how data points relate to those on each of the other graphs. There are other clues in the texts.

I’ve found Bret’s Explorable Explanations fascinating and inspiring. Do go check them out.

SuperCollider internals book chapter

In case you haven’t heard, the SuperCollider Book is now available. Congratulations to everyone involved in bringing this to press.

The publisher has been kind enough to allow my chapter on SuperCollider’s server internals to be posted online as a preview chapter here. I think this is great, because understanding SuperCollider’s server internals will be of interest not just to SuperCollider developers and users, but to anyone involved in creating dynamic real-time audio software.

I first became interested in the internal workings of SuperCollider during the early days of AudioMulch development. At that time, SuperCollider was a closed-source project and information about how it was implemented was fairly scarce. Luckily, SuperCollider’s creator James McCartney was active on the music-dsp mailing list and was open to answering questions and explaining how things worked under the hood. Later, after SC became an open source project I performed initial work on porting scsynth to Windows, and in the process had a good look at what was going on in there. One thing that really interested me was how SuperCollider supported re-patching the audio signal processing graph in real-time without glitches or interruptions to the audio stream. The answer turned out to be simple and elegant.

Class diagram of significant scsynth domain entities -- one of the class diagrams from the book chapter.

Real-time dynamic audio graph manipulation

In SuperCollider, the dynamic audio processing graph (the graph of Nodes) is a linked data structure that is traversed at each audio buffer period to compute audio data. Modifications to the dynamic Node graph are achieved by posting asynchronous commands to the real-time audio processing thread via a lock-free queue. These commands are interpreted at the start of each buffer period. They cause the Node graph to be modified, changing the signal flow and/or adding and deleting processing Nodes as necessary. When required, dynamic memory allocation is performed in the audio thread using a special-purpose thread-specific real-time memory allocator. A general mechanism, also based on lock-free command queues, is provided to execute non-real-time operations (such as file i/o) asynchronously in a separate thread.

Precompiled Graphs and Unit calculation functions

In a slightly confusing case of overloaded terminology, scsynth defines a Node subclass called Graph, which evaluates a precompiled schedule of Units (unit generators). That is: scsynth has a dynamic graph of Nodes, some of which are Graphs whose purpose is to evaluate precompiled graphs of Units. Got it? The diagram above from the book chapter should help.

Allocating and evaluating Graph objects is relatively efficient because Unit instances are stored sequentially in a single memory block, and a Graph’s evaluation loop involves calling each Unit’s mCalc calculation function pointer in sequence. One interesting by-product of this design is that a Unit can change its calculation function pointer at runtime — this can be used to implement a Unit with different states. For example, an envelope unit generator whose final state returns a constant could set its mCalc member to a function that simply returns the constant. Thus less conditional logic need be executed for each Unit at each time step.

These mechanisms are two of the many aspects of the scsynth implementation described in the full book chapter available here. I cover the implementation and usage of these mechanisms in detail, along with the concurrency structure of scsynth including threading and inter-thread communication. The chapter is illustrated with a number of class and sequence diagrams. I encourage you to check it out, and of course, to get the book, which is full of insights and examples of how people have built whole artistic worlds using SuperCollider.

Building Necessitas Qt framework using NDK-r5b and Cygwin — not fun, but mostly possible

I’ve been following the Necessitas project (formerly “Android Lighthouse”) for a while now. It’s a port of the Qt GUI framework to Android. Last year I managed to get it to build on Windows by following the instructions at android-lighthouse google code ticket 11 and this very helpful post by Damien Buhl (in French, Chrome will translate it for you). Those links are still super helpful, but some things have changed since then — Necessitas has undergone significant structural change, and Google have released a new Android NDK r5b.

This post documents the steps I took to get the Necessitas Qt framework to compile with Cygwin and the official Android NDK r5b (the latest version as of this writing). So far I’ve only got the libraries to compile. I havn’t built Qt Creator (the GUI IDE that Necessitas now offers) and nothing is running on my device yet — that’s a matter for more research. I thought I’d post the info now because I seem to have got it to build, and well, there’s more than enough info here for a blog post.

*Disclaimer:* If you follow these instructions you will have built Qt libraries for Android with Cygwin, nothing more. I don’t tell you how to use them or how to load them onto the device, nor do I tell you how to get Necessitas Qt Creator running on Windows.

Cygwin vs. mingw

Windows gcc development can be done with the Cygwin POSIX emulation layer, or with mingw — a light weight minimalist gcc distribution. So far Android NDK development on Windows requires the Cygwin environment. However Google is in the process of dropping support/dependency on Cygwin from the Windows Android NDK. This is good and bad news. Good because NDK builds with Cygwin are slow, and installing Cygwin just to build NDK code is a lot of overhead. Bad news is that the native Qt build and configuration system doesn’t support cross-compiling Qt on Windows without Cygwin, and using the new NDK in combination with Cygwin-based Qt builds is non-trivial.

I tried to build Qt Necessitas with both mingw and Cygwin. I made progress with both but in the end I succeeded with Cygwin first, so that’s what this post is about. In the long term I think cross-compiling Qt using mingw on Windows is most likely a better solution but it will require substantial work on Qt’s configuration system before it is possible. I just learnt that mingw.android is working on it.

Preliminary steps

The official Necessitas instructions for compiling the Qt framework describe how to build Qt frameworks for Android on Linux. There are a few extra steps to make this process work on Windows under Cygwin.

You need to have Cygwin installed, with the development tools  (binutils, gcc-core, gcc-g++, make, patchutils) and Cygwin git. It’s all easy to install using cygwin’s package installer. You may need a few other Cygwin packages. You’ll soon know about it if you try to run a command that isn’t installed.

Also have the Windows Android NDK r5b downloaded and unzipped.

Clone the Qt source code from the android-lighthouse repository using git (it’s about 400MB and might take a while):

git clone gitorious.org/~taipan/qt/android-lighthouse.git

A few points here:

  • Make sure you’re typing everything here in a Cygwin shell and using Cygwin git. You want the symlinks in the repo to come out correctly as Cygwin symlinks.
  • Early on I had a problem using msys git where things wouldn’t build because git had converted everything to Windows-style CRLF line endings. So you might want to make sure your git config says core.autocrlf=false
  • Once git is finished I recommend making a local copy of the android-lighthouse directory so you have a clean version you can duplicate if you want to start again. I found this easier than cleaning the tree if the configuration breaks badly.

Another thing to note: I have the NDK and android-lighthouse directories on the same drive. I’m not sure, but this might help in cases where commands ignore drive letters. Best to keep them on the same drive to be safe.

NDK compiler include paths

Since the NDK r5b gcc compiler no longer depends on Cygwin, it doesn’t understand Cygwin’s POSIX style paths that map Windows paths like C: /blah to /cygdrive/c/blah. When you configure and build Qt using Cygwin, qmake and all the Qt Makefiles will use Cygwin POSIX style paths. This works out OK for relative paths: the NDK gcc can deal with forward-slash path separators, but it can’t follow the Cygwin style /cygdrive/blah absolute paths used for some include and library directory arguments. The net result is that gcc will report it can’t find standard header files like <string.h> or <new>. If you invoke gcc with the -v option you get the real story: it can’t interpret the Cygwin paths and is ignoring them.

I tried a few of different fixes for this issue:

Attempt #1: Pass include paths to qmake as Windows paths, executable tool paths as Cygwin paths. This doesn’t work. Not only does qmake pass include paths to gcc, it also uses them internally to compute dependency information that gets embedded in the generated Makefiles. Qmake and make need to be able to understand all paths you use. Therefore you need to use Cygwin paths at this level.

Attempt #2: Defining QMAKE_RUN_CC and friends in mkspecs/android-g++/qmake.conf allows you to override how qmake invokes gcc (grep other qmake.conf files for examples). I was able to wrap $INCPATH with some sed foo that converts Cygwin paths to Windows paths which gcc would understand. This almost worked. Unfortunately qmake hardcodes (in qmake/generators/unixmake2.cpp) the $INCPATH text for generating the Makefile rule for precompiled headers, so it’s not possible to re-write the paths in all invocations of gcc using qmake.conf. I could have just disabled precompiled headers (-no-pch flag to ./configure) but that’s not a real solution.

Solution: having given up on a purely qmake.conf approach I decided to just wrap gcc and g++ with shell scripts that re-write include path (-I) and library path (-L) flags that use Cygwin paths. These scripts go in android-ndk-r5b\toolchains\arm-linux-androideabi-4.4.3\prebuilt\wi ndows\bin along side the NDK gcc and g++ binaries:

arm-linux-androideabi-cygwin-gcc.sh:

#!/bin/sh
b=`basename $0`
d=`dirname $0`
p=`echo $@ | sed -e "s,-I/cygdrive/\(.\)/,-I\1:/,g" | sed -e "s,-L/cygdrive/\(.\)/,-L\1:/,g"`
$d/arm-linux-androideabi-gcc.exe $p

arm-linux-androideabi-cygwin-g++.sh:

#!/bin/sh
b=`basename $0`
d=`dirname $0`
p=`echo $@ | sed -e "s,-I/cygdrive/\(.\)/,-I\1:/,g" | sed -e "s,-L/cygdrive/\(.\)/,-L\1:/,g"`
$d/arm-linux-androideabi-g++.exe $p

Then the definitions of QMAKE_CC and QMAKE_CXX in android-lighthouse/mkspecs/android-g++/qmake.conf need to be modified to invoke the wrapper scripts instead of the NDK gcc and g++ executables:

QMAKE_CC  = $$NDK_TOOLCHAIN_PATH/bin/$$NDK_TOOLCHAIN_PREFIX-cygwin-gcc.sh
...
QMAKE_CXX = $$NDK_TOOLCHAIN_PATH/bin/$$NDK_TOOLCHAIN_PREFIX-cygwin-g++.sh

Symlinks

Symbolic links are not well supported on Windows (although Vista and Windows 7 do add some support). Cygwin provides emulated symbolic links, but of course the new r5 NDK gcc can’t understand them. This causes two separate problems, solved thusly:

1. The current android-lighthouse tree uses symlinks in a number of places to alias .c and .h files to different locations in the tree. I’ve filed a ticket about this not working on Windows, but until it’s fixed you can run the following command (from a Cygwin prompt) to replace the symlinks with normal C files that #include the linked file:

find -type l -regex '.*/.*\.\(c\|cpp\|h\)$'
    -printf "rm %p; echo '#include \"%l\"' > %p\n" | sh

(all of the above on one line)

2. The Qt build process creates symlinks to redirect general shared library .so files to the version numbered ones (eg. libQtCore.so is a symlink to libQtCore.so.4.8.0). Since the Qt build process crosslinks Qt libraries with the general (usually symlink) versions this breaks if gcc can’t understand the symlinks. Qmake would usually use the ln -s command to create a symbolic link. The easy (but space hungry) fix is to tell Qt to just copy the libraries instead of linking them by adding the following to android-lighthouse/mkspecs/android-g++/qmake.conf:

QMAKE_LN_SHLIB = cp

Incidentally, that’s the behaviour of ln -s when you invoke it in msys/mingw.

Miscellaneous patches

There are a couple of patches that have been floating around for a while that are required to make the Necessitas framework build with Cygwin. I think I found them on the google code ticket 11 page.  Both are needed so that the Qt bootstrapping process can compile tools such as qmake, moc and rcc using the Cygwin toolchain. The first patch adds library prefixes and extensions to the Cygwin g++ qmake.conf, the second resolves a Unicode issue in src/corelib/global/qglobal.cpp by replacing OSVERSIONINFOW with _OSVERSIONINFOA.

diff --git a/mkspecs/cygwin-g++/qmake.conf b/mkspecs/cygwin-g++/qmake.conf
index ddfceb0..7b54298 100644
--- a/mkspecs/cygwin-g++/qmake.conf
+++ b/mkspecs/cygwin-g++/qmake.conf
@@ -25,6 +25,9 @@ QMAKE_CFLAGS_DEBUG    = -g
 QMAKE_CFLAGS_SHLIB     =
 QMAKE_CFLAGS_YACC      = -Wno-unused -Wno-parentheses
 QMAKE_CFLAGS_THREAD    = -D_REENTRANT
+QMAKE_PREFIX_SHLIB      = lib
+QMAKE_PREFIX_STATICLIB  = lib
+QMAKE_EXTENSION_STATICLIB = a

 QMAKE_CXX              = g++
 QMAKE_CXXFLAGS         = $$QMAKE_CFLAGS

diff --git a/src/corelib/global/qglobal.cpp b/src/corelib/global/qglobal.cpp
index 6105682..fad4c40 100644
--- a/src/corelib/global/qglobal.cpp
+++ b/src/corelib/global/qglobal.cpp
@@ -1742,7 +1742,8 @@ QSysInfo::WinVersion QSysInfo::windowsVersion()
     if (winver)
         return winver;
     winver = QSysInfo::WV_NT;
-    OSVERSIONINFOW osver;
+    _OSVERSIONINFOA osver;
     osver.dwOSVersionInfoSize = sizeof(osver);
     GetVersionEx(&osver);
 #ifdef Q_OS_WINCE

androidconfigbuild.sh bug fix

As of this writing there’s a bug in androidconfigbuild.sh’s use of the getopt built-in that means that the script ignores host and platform parameters passed on the command line (we need to be able to specify the host as windows). It should be fixed by the time you read this, but just to be sure, make sure the getopt line looks like this:

while getopts "Hr:h:p:v:a:q:c:i:d:" arg; do

The version of androidconfigbuild.sh I checked out of git passed a getopts parameter like this: “help:r:h:p:v:a:q:c:i:d:” That’s a problem because getopts doesn’t support multi-character flags which “help” is, and in any case the -help flag doesn’t accept a parameter so it shouldn’t have a trailing “:”.

Disable webkit build

During the build I encountered a problem with Windows’ 32k limit on command-line length. This meant that when the Qt build got around to linking the webkit library it failed with an “argument list too long” error. The command line arguments were about 60k. I don’t have a fix for this other than to disable building webkit. Do this by editing androidconfigbuild.sh, change the -webkit parameter to configure to -no-webkit.

Now we’re ready

If you’ve followed along you’re ready to actually launch the Qt build. From a Cygwin prompt type something like the following. You’ll have to substitute the Cygwin style path of your Android NDK. Mine’s in the root of G: drive:

./androidconfigbuild.sh -r /cygdrive/g/android-ndk-r5b -h windows
    -p arm-linux-androideabi -v 4.4.3 -q 1

(all of the above on one line)

This should proceed through the entire build process. It will print error messages at times, usually these are just configuration tests for whether things are present and functioning. Unless the process completely halts and returns to the command prompt you can trust that it’s doing its thing.

The configure and make process will complete but the final two steps in androidconfigbuild.sh will fail (qpatch and make install). I’m not sure whether these steps are important on Windows unless you want to relocate the libraries once you’ve built them. I’ll update the post if I find an answer, for now I think it should work without them.

Cygwin Win32 error 487 contingency

This may not happen to you, but after my build was about half way through it failed with an error like “could not load C:\WINDOWS\system32\winmm.dll, Win32 error 487.” A bit of digging suggests that this is a known issue with the current Cygwin build. I fixed it by copying the latest cygwin1.dll snapshot over the version in my Cygwin install path at C:\cygwin\bin\cygwin1.dll. After that I restarted the build and the error went away (you can pass -q 0 instead of -q 1 to prevent androidconfigbuild.sh from running configure from the start). If you’re brave you may want to update your cygwin1.dll before you start but I can’t guarantee that won’t introduce other problems.

Wrap up

Right now that’s what you need to do to build Qt/Necessitas with Cygwin. You won’t have Qt Creator and at this stage I can’t tell you how to install Qt projects on an Android device without it. You used to be able to just push everything with adb but things seem to have changed a bit since I last followed Damien Buhl’s instructions for manually pushing libs to your device using adb. Once I work that out I’ll post an update here. Please subscribe to the blog or follow my tweets if you want to hear about the next installment.

Thanks to BogDan for his great work on Necessitas, and to all the helpful developers on android-qt and android-ndk lists. Thanks also to Damien Buhl and the posters on googlecode. If I hadn’t succeeded in getting a previous version of android-lighthouse building with their instructions I probably wouldn’t have known where to start this time around. Thanks to David Turner on android-ndk for answering my questions about current Windows NDK status and to mingw.anrdroid who should be bringing us some Cygwin-free Necessitas goodness soon.

Internet audio streaming apps for music performance — some options

A couple of weeks ago bP/555 asked me to recommend a solution for streaming high quality audio from a music performance in Barcelona to the  DIALECTIC night at Horse Bazaar here in Melbourne. The venues at either end were connected to the Internet using domestic-grade ADSL2 modems. This certainly wasn’t a super-high-bandwidth Internet 2 academic network scenario. bP wanted high quality audio (“320 kbps mp3″ was the way he put it), video was optional. It wasn’t a collaborative performance, the audio only had to travel in one direction.

I’ve been working with streaming real-time audio over LANs a bit lately, but streaming from Barcelona to Melbourne is quite a different proposition — not something I have any personal experience with. I do have an interest in network music peformance and I’ve been following development of the CELT low latency codec for a while, which is designed for low-latency distributed music performance. So I was interested to find some answers. I figured the best thing to do was to ask around and see what solutions people are using. I asked on ACMA-L and the celt-dev list and pinged a few friends privately too. This post shares the responses I received and a few of my thoughts on different approaches. If you know of something else I havn’t mentioned here please post it in the comments.

A few preliminaries

Before I get on to the alternatives people suggested I want to note a couple of points I made to bP:

  • The computers at either end need to be connected to the network using cables — you don’t want WiFi packet loss layered on top of the already high demands of real-time streaming media.
  • Since the computers will likely be behind NAT routers at either end, you either need a plug-and-play solution that can deal with NAT traversal  (possibly using intermediate servers) or have access to configure the routers to forward traffic on to your computers.

Another thing that became apparent, and that was reinforced to me by a number of people, was that you need to be sure your internet link can sustain the bandwidth needed for the performance. It’s all good and well to ask for 320kbps audio streaming, but if your Internet link can’t reliably sustain the bandwidth then it won’t happen. In general no public link between Barcelona and Melbourne with ADSL endpoints guarantees bandwidth but you can at least test things a work out what the upper practical limit is — keep in mind that it will depend on network routes and network load (i.e. day of week and time of day).

On to the options that were suggested…

Hi-fi chat apps

Early on I’d suggested Skype as a simple and easy plug-and-play option. I like the idea of using a voice chat-app because it’s super-easy to set up and the software should be able to navigate the hazards of domestic situations like NAT firewall traversal without fuss. However one of the key things bP wanted was high-fidelity music grade audio. This put a lot of voice-quality internet-chat applications out of the picture. I learnt there are solutions that use the music-quality CELT codec I mentioned above (which incidentally promises better quality than mp3 at the same bit rate). To this end, Dennis Heerema on the celt-dev mailing list suggested the following two free solutions:

  • Fideliphone - Simple to configure, point to point, almost but not-quite out of beta.
  • TeamTalk - Client-server config, uses speex codec or CELT (mono and stereo).

Shoutcast, Icecast and friends

A number of people suggested shoutcast or icecast streaming. These are the usual go-to choices for internet radio broadcasting. You can buy streams from various commercial suppliers around the web.

People had good things to say about Nicecast ($40, Mac OSX only) as a simple solution for non-technical people. Matt Hitchcock said: “It is a GUI wrapper with an icecast underbelly and runs intuitively and without any fuss.” There’s a list of other icecast clients at the icecast site.

Jordan Reyne uses shoutcast for her Second Life performances “their sound quality is excellent”. She uses a free broadcast client called butt (“ broadcast using this tool”), which is available for Windows, Mac and Linux. Jordan has posted a 6 part blog on using shoutcast streams inside VRs such as Second Life or Heritge Trust etc.

Distributed network music performance

Distributed network music performance is the idea of playing music live with other people over the internet. Audio quality is important here, but so is low-delay. After all, playing in time is difficult with a 50ms delay, let alone a 10 second delay! One important thing to note here is that audio codecs such as mp3 add a lot of delay just encoding and decoding the audio. So it becomes important to use uncompressed audio or low latency codecs such as CELT or its successor Opus (both with <5ms codec delay).

For low-delay distributed network music performance Alex Carôt’s Soundjack seems to be the go-to solution. As well as distributed musical performances Alex says that it’s “already used by numerous radio stations and for other high-quality audio/video streaming purposes.” The big advantage of systems like this is that they are designed to provide low-latency two-way communication for collaborative performances. Delays are often orders-of-magnitude lower than systems designed for broadcast streaming. Unfortunately you still have the speed of light to contend with (~86ms from Melbourne to Barcelona in a fibre optic cable, not to mention routing delays).

For high-bandwidth situations a few people suggested Juan-Pablo Caceres’ JackTrip. JackTrip can handle uncompressed and multichannel audio and is commonly used on local area networks. For streaming over the Internet from Barcelona to Melbourne using ADSL modems this option just wasn’t going to cut it.

Mixlr

Mixlr’s mission “is to make audio broadcasting as easy as possible, both for casual broadcasters, musicians and DJs.” They provide client and server applications for both Mac and PC. It’s not a free service but you can sign up for a 15 day free trial. This solution really appealed to me for bP’s situation — it’s focused on high quality audio for musical performance and it’s packaged in an easy to use, off-the-shelf app.

VLC (Video Lan Client)

A couple of people suggested VLC. Scott Gresham-Lancaster wrote:

Another option that I have used in the past particularly if you are just going one way with most of the content is to use Video Lan Client. If you have enough bandwidth you can do point to point streaming of HD with that, especially if you have hardware compression. We did a direct link between the stage at Stanford and University in Bejing this way with great success. Engadget have a nice intro here:

www.engadget.com/2005/11/29/how-to-stream-almost-anything-using-vlc

Other Contenders

I’m not really familiar with live video broadcasting services such as Livestream and USTREAM but the venue in Barcelona had used Livestream before and it appears to offer good quality audio, although perhaps not the “320kbps mp3″ quality we were looking for. These options support video and broadcast, which although not critical in this case could be useful for many applications.

Quicktime broadcaster was also suggested, this is “Apple’s standards-based live encoding software.”

Wrap up

Which solution did bP end up using? I’m not telling :-) As often happens there was little time to try out all of the options and a choice was made based on expediency. My feeling is that any of the solutions listed above could have worked well and are certainly worth exploring for future events.

In the future I hope setting up a hi-fi intercontinental audio link is as easy as placing a phone call, and with many of the solutions above we’re almost there. The hardest part for the uninitiated seems to be in working out which solution to choose. Do you have experience performing music over the internet? If so, please share your experience in the comments.

Many thanks to Alex Carôt, Nick Collins, Ken Fields, Scott Gresham-Lancaster, Dennis Heerema, Matt Hitchcock, Julian Knowles, Jordan Reyne, Rob Watson and the members of ACMA-L and celt-dev for helping bP and I out and answering our questions. Thanks to bP for asking the question, I’ve learnt a lot.