Wednesday, December 29, 2010

Zipping with applicative functors in F#

In my last post I briefly described functors from a F# perspective. Now it's the turn of applicative functors. Since there were few to none concrete examples in my previous post, this time I'll start with an example. Hopefully everyone remembers Seq.zip. Remember also Seq.zip3 ? Just to recap, they combine sequences into sequences of pairs and triples respectively. The signatures are:

Seq.zip : seq<'a> -> seq<'b> -> seq<'a * 'b>
Seq.zip3 : seq<'a> -> seq<'b> -> seq<'c> -> seq<'a * 'b * 'c>

Now let's say we want zip5:

Seq.zip5 : seq<'a> -> seq<'b> -> seq<'c> -> seq<'d> -> seq<'e> -> seq<'a * 'b * 'c * 'd * 'e>

We could implement it with a nested zip3 and a flattening map:

let zip5 a b c d e = 
    Seq.zip3 a b (Seq.zip3 c d e) |> Seq.map (fun (n,m,(o,p,q)) -> n,m,o,p,q)

Or we could write it using an applicative functor, like this:

let zip5 a b c d e =
    puree (fun n m o p q -> n,m,o,p,q) <*> a <*> b <*> c <*> d <*> e

I think you'll agree that the latter looks much cleaner, so let's see how we get there.

Applicative functors are defined by two operations: pure and <*> (actually but we can't write that in ASCII!). Again we can't express these generically in .NET's type system due to the lack of type constructor abstraction, but this is how their signatures would look like:

pure : 'a -> 't<'a> 
(<*>) : 't<'a -> 'b> -> 't<'a> -> 't<'b>

I prefer the bracket-less ML way of expressing types, whenever possible:

pure : 'a -> 'a 't 
(<*>) : ('a -> 'b) 't -> 'a 't -> 'b 't

Intuitively, pure lifts a value (usually a function) inside the domain of the applicative functor, and <*> applies the function inside the applicative to an applicative value, yielding another applicative value.

At this point, it might not be immediately clear that <*> is an application, but compare its signature with the regular function application operator (<|) and you'll see the similarity:

(<|) : ('a -> 'b) -> 'a -> 'b

Since pure is a reserved keyword in F# (it doesn't do anything yet, though) we'll use puree in our code instead. MashedPotatoes

The applicative functor that enables the zip5 implementation above is known as ZipList:

module ZipList = 
    let puree v = Seq.initInfinite (fun _ -> v)
    let (<*>) f a = Seq.zip f a |> Seq.map (fun (k,v) -> k v)

The signatures for ZipList:

puree : 'a -> 'a seq 
(<*>) : ('a -> 'b) seq -> 'a seq -> 'b seq

In plain English, <*> in a ZipList applies a sequence of functions to a sequence of values, pairwise; while puree creates an infinite sequence of values (usually functions) ready to be applied to another sequence of values using <*>.

Functors and applicatives

Since we're always using this ZipList in the form "puree function <*> value" we might as well wrap that in an operator:

let (<!>) f a = puree f <*> a

Sow now we can write zip5 a bit shorter:

let zip5 a b c d e =  
    (fun n m o p q -> n,m,o,p,q) <!> a <*> b <*> c <*> d <*> e

Let's take a moment to see the signature of <!>

(<!>) : ('a -> 'b) -> 'a seq -> 'b seq

This signature should look familiar to you. Yes, it's Seq.map. Applicative functors are also functors, which is quite obvious from its name but we hadn't seen it until now. So we can write a map (a.k.a. lift) function for every applicative functor exactly as the <!> operator above.

Observing the law

Applicative functors, just as regular functors, need to satisfy some laws. Just as the last time, we'll use FsCheck, and again we'll have to cheat a little to get comparable types (we can't compare infinite sequences).

open System.Linq
// limited comparison for infinite sequences
let toListFrom r a = Enumerable.Take(a, List.length r) |> Seq.toList

type ApplicativeLaws<'a,'b,'c when 'a:equality and 'b:equality>() =
    static member identity (v: 'a list) = 
        let x = puree id <*> v
        toListFrom v x = v
    static member composition (u: ('a -> 'b) list) (v: ('c -> 'a) list) (w: 'c list) = 
        let toList = toListFrom u
        let a = puree (<<) <*> u <*> v <*> w
        let b = u <*> (v <*> w)
        toList a = toList b
    static member homomorphism (f: 'a -> 'b) x = 
        let toList a = Enumerable.Take(a, 10000) |> Seq.toList
        let a = puree f <*> puree x
        let b = puree (f x)
        toList a = toList b
    static member interchange (u: ('a -> 'b) list) (y: 'a) = 
        let toList = toListFrom u
        let a = u <*> puree y
        let b = puree ((|>) y) <*> u
        toList a = toList b

FsCheck.Check.QuickAll<ApplicativeLaws<_,_,_>>()

In the next post, we'll see how applicative functors relate to monads, with a parsing example.

Friday, December 24, 2010

Notes on Haskell functors and F#

I've been learning a bit of Haskell lately, and I wanted to share some of what I have learned so far, from a F# perspective. I hope you find these notes useful, and if you find a mistake please let me know. I'll start with a brief explanation of functors and how they translate (or not) to F#.

Functors are basically things that can be mapped over. In F# we have lots of them:

Set.map : ('a -> 'b) -> 'a Set -> 'b Set 
Seq.map : ('a -> 'b) -> 'a seq -> 'b seq 
List.map : ('a -> 'b) -> 'a list -> 'b list 
Array.map : ('a -> 'b) -> 'a [] -> 'b []

See the pattern? If we could generalize this, the signature would look something like this:

map : ('a -> 'b) -> 'a 'T -> 'b 'T

or:

map : ('a -> 'b) -> 'T<'a> -> 'T<'b>

This function is called fmap in Haskell, and it defines the Functor typeclass, one of the most basic typeclasses in Haskell. (I won't talk about typeclasses here, the post would go way out of scope). Unfortunately, .NET's type system isn't flexible enough to express this. Joe Duffy has a great article explaining it for us .NET developers. Actually it seems that OCaml's functors can be encoded in .NET so it would be possible in principle, but quite awkwardly so.

At this point you might be thinking that the concept of functors only applies to collections, but it's actually more general than that. For example, Options are also functors:

Option.map : ('a -> 'b) -> 'a option -> 'b option

Even functions are functors. The map function in this case is the composition operator (<<)

(<<) : ('a -> 'b) -> ('c -> 'a) -> ('c -> 'b)

To make it easier to recognize this as a functor, you might interpret it as:

(<<) : ('a -> 'b) -> "function that takes 'c and returns"<'a> -> "function that takes 'c and returns"<'b>

Haskell's type system is powerful enough to be able to abstract that.

Also, all monads are functors. You can define map in terms of Bind and Return. For example, for async:

let asyncMap f m = 
    async { 
        let! x = m 
        return f x 
    }

asyncMap : ('a -> 'b) -> Async<'a> -> Async<'b>

In the context of monads, map is usually called lift. In fact, in Haskell they're pretty much equivalent, save for class constraints. The name "lift" comes from the notion of lifting a function to operate in the domain of the monad.

The same code as above, desugared:

let asyncLift f m = async.Bind(m, fun x -> async.Return(f x))

For a FParsec monad:

let parserLift f m = parser.Bind(m, fun x -> parser.Return(f x))

Now, F# can't generalize over monads (for the same reasons I mentioned above), but can we still write a generic lift for all monads? Sort of, thanks to inlining and member constraint invocation expressions:

let inline lift builder f m = 
    let inline ret x = (^a: (member Return: 'b -> 'c) (builder,f x)) 
    (^a: (member Bind: 'd * ('e -> 'c) -> 'c) (builder, m, ret))

You can find more generic monad stuff like this in the M<'a> Lib project. Using this generic lift function we can create monad-specific lifts like this:

let asyncLift x = lift async x
let parserLift x = lift parser x

Actually, FParsec already includes lift, although slightly disguised:

(|>>) : Parser<'a,'s> -> ('a -> 'b) -> Parser<'b,'s>

This is lift, only with flipped parameters.

One thing to note about functors is that not everything that satisfies the signature of fmap automatically becomes a functor. There are some laws that functors must obey to have the behavior one would expect. These laws actually come from the definition of a functor in category theory. I'm not going to explain these laws in detail here, you can find good explanations in this article on basic category theory from a Haskell perspective or this one on functors from a Scala perspective.

Haskell developers can test these laws using QuickCheck, so in F# we could try to test them with FsCheck:

open FsCheck
open FsCheck.Prop

let runAsyncRet f a = Async.RunSynchronously (f (async.Return a)) 
let liftId x = runAsyncRet id x = id x 
let liftDist x (f,g) = runAsyncRet (asyncLift (f << g)) x = runAsyncRet (asyncLift f << asyncLift g) x 
Check.Quick liftId 
Check.Quick liftDist

I haven't showed any real-world examples of functors in F#. The lack of ad-hoc polymorphism takes away much of the power of the abstraction, but still many times it's simpler to use lift instead of computation expression sugar for trivial expressions. Moreover it encourages writing simpler, shorter, composable functions, which fits the functional style better. There's a good article on the Haskell wiki against "do notation" (i.e. syntactic sugar for monadic code).

In the next article I'll talk about applicative functors, in my opinion a more interesting abstraction, and I'll show some more concrete real-world examples of how they can be used in F#.

Monday, December 13, 2010

Customizing SolrNet

One of the most voted enhancement requests for SolrNet (an Apache Solr client for .NET) right now is to add support for POSTing when querying.

Let me explain: queries are serialized by SolrNet and sent to Solr via HTTP. Normally, queries are issued with a GET request and the query itself goes in the query string part of the URL. A simple query URL might look like this: http://localhost:8983/solr/select?q=id:123 .

The problem arises when the query is too long to fit in the query string. Even though the HTTP protocol does not place any a priori limit on the length of a URI, most (all?) servers do, for performance and security reasons.

Here's a little program that reproduces this issue:

internal class Program {
    private const string serverURL = "http://localhost:8983/solr";

    private static void Main(string[] args) {
        Startup.Init<Dictionary<string, object>>(serverURL);
        var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
}

This creates the query "id:0 OR id:1 OR ... OR id:999", it's about 10KB after encoding, more than enough for our tests. Running this against Solr on Jetty 6 makes Jetty throw:

2010-12-13 17:52:33.362::WARN:  handle failed 
java.io.IOException: FULL 
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274) 
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) 
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) 
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) 
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Not very graceful... it should probably respond with 414 Request-URI Too Long instead of throwing like this, but clients shouldn't send such long URIs anyway.

Steven Livingston has a good blog post describing a patch modifying some classes in SolrNet to deal with this issue. However, even though I never foresaw this problem when writing SolrNet, solving it does not really require any changes to the existing codebase.

In this particular case, what we need to do concretely is override the Get() method of the ISolrConnection service and make it issue POST requests instead of GET. We can write a decorator to achieve this:

public class PostSolrConnection : ISolrConnection {
    private readonly ISolrConnection conn;
    private readonly string serverUrl;

    public PostSolrConnection(ISolrConnection conn, string serverUrl) {
        this.conn = conn;
        this.serverUrl = serverUrl;
    }

    public string Post(string relativeUrl, string s) {
        return conn.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
        var u = new UriBuilder(serverUrl);
        u.Path += relativeUrl;
        var request = (HttpWebRequest) WebRequest.Create(u.Uri);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        var qs = string.Join("&", parameters
            .Select(kv => string.Format("{0}={1}", HttpUtility.UrlEncode(kv.Key), HttpUtility.UrlEncode(kv.Value)))
            .ToArray());
        request.ContentLength = Encoding.UTF8.GetByteCount(qs);
        request.ProtocolVersion = HttpVersion.Version11;
        request.KeepAlive = true;
        try {
            using (var postParams = request.GetRequestStream())
            using (var sw = new StreamWriter(postParams))
                sw.Write(qs);
            using (var response = request.GetResponse())
            using (var responseStream = response.GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8, true))
                return sr.ReadToEnd();
        } catch (WebException e) {
            throw new SolrConnectionException(e);
        }
    }
}

Now we have to apply this decorator:

private static void Main(string[] args) {
    Startup.Init<Dictionary<string, object>>(new PostSolrConnection(new SolrConnection(serverURL), serverURL));
    var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

That's it! If you're using Windsor, applying the decorator looks like this:

private static void Main(string[] args) {
    var container = new WindsorContainer();
    container.Register(Component.For<ISolrConnection>()
        .ImplementedBy<PostSolrConnection>()
        .Parameters(Parameter.ForKey("serverUrl").Eq(serverURL)));
    container.AddFacility("solr", new SolrNetFacility(serverURL));
    var solr = container.Resolve<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

This is the real benefit of writing decoupled code. Not testability, but flexibility. Testability is nice of course, but not the primary purpose.
When your code is decoupled, you can even implement entire features mostly by rearranging the object graph. This is pretty much how I implemented multicore support in SolrNet.

The PostSolrConnection implementation above works with SolrNet 0.3.0 and probably also 0.2.3. PostSolrConnection is not the default because: a) it needs to be tested thoroughly, and b) Solr doesn't emit cache headers when POSTing so it precludes caching.

Monday, December 6, 2010

SolrNet 0.3.0 released

SolrNet is a Solr client for .NET. I just released SolrNet 0.3.0.

Longest. Beta. Ever. I know. But I really wanted to document everything and change many things in the release package, and in order to do that I had to get rid of MSBuild first.

There aren't many changes in the library itself:

  • Upgraded Ninject module to Ninject 2.0 RTM
  • Upgraded StructureMap registry to StructureMap 2.6.1
  • Upgraded Windsor facility to Windsor 2.5.2
  • Added support for multi-core for StructureMap
  • Improved response parsing performance
  • Fixed a couple of minor bugs

If you're upgrading from 0.2.3 please read about the breaking changes.

Also, this will be the last release to support .NET 2.0.

As for the package, it now contains:

  • Merged and unmerged assemblies (only the merged assembly with all IoC integrations was included before).
  • PDBs for all assemblies.
  • All assemblies are now signed with a strong name.
  • Doxygen-compiled documentation (replacing Sandcastle which was too bloated).
  • Note explaining exactly what assemblies you need depending on how you integrate the library. Hopefully this will help clear up the confusion.
I'd like to thank the following people who contributed to this release:

Binaries, docs and sample app downloads here. You can also get it via NuGet packages.