Wednesday, April 29, 2009

SolrNet under continuous integration

SolrNet just got hosted at the CodeBetter TeamCity servers for open source projects! 326 tests passed, 20 ignored (most of the latter are integration tests that need a running Solr instance). Now I just need to organize the targets and make the successful builds downloadable.

A big thank you to the people at CodeBetter, JetBrains, IdeaVine and Devlicio.us for this wonderful initiative.

Sunday, April 12, 2009

Email validation with FParsec

Email address validation is one of those topics that keep coming up again and again. It seems that we developers never get it quite right, but with good reason: the spec is downright insane. There are six RFCs involved, which obsolete some other RFCs and in turn have errata of their own. I hope the guys at the IETF had some very good reasons to make this so damn complex!

Anyway, you can validate against all the RFCs you want and still end up with an invalid address. joe@example.com is syntactically correct, yet there isn't any Joe who works at example.com :-)

What's interesting here is the different approaches taken to cope with such a monster:

The last one really caught my interest so I ported it (mostly as an exercise) to F# + FParsec. Then I grabbed Dominic's test cases and ran them with FsUnit. The result? It passes 84% of Dominic's tests (192 out of 228), with no false negatives: every failure is an address that the test suite considers invalid but the parser accepts. Here's the code (updated to F# 2.0 / FParsec trunk 5/17/2010):

module EmailValidation.EmailValidator

open System
open FParsec
open FParsec.Primitives
open FParsec.CharParsers

let isValidEmail email =
    let wsp = anyOf " \t" >>% ()
    let crlf = pchar '\n' >>% ()
    let nullChar = pchar (char 0) >>% ()
    let ranges = Seq.map (Seq.map char) >> Seq.concat >> Seq.toArray >> (fun x -> String x) >> (fun x -> anyOf x >>% ())
    let vchar = ranges [{0x21..0x7e}]
    let obsNoWsCtl = ranges [{1..8};{11..12};{14..31};{127..127}]
    let atomText = digit <|> letter <|> anyOf "!#$%&'*+-/=?^_`{|}~"
    let atom = many1 atomText >>% ()
    let fws = (many1 wsp >>. optional (crlf >>. many1 wsp)) <|> (many1 (crlf >>. many1 wsp) >>% ())
    let commentText = ranges [{33..39};{42..91};{93..126}] <|> obsNoWsCtl
    let quotedPair = pchar '\\' >>. (vchar <|> wsp <|> crlf <|> obsNoWsCtl <|> nullChar)
    let rec commentContent x = (commentText <|> quotedPair <|> comment) x
    and comment = between (pchar '(') (pchar ')') (many (commentContent <|> fws)) >>% ()
    let cfws = many (comment <|> fws)
    let quotedText = ranges [{33..33};{35..91};{93..126}] <|> obsNoWsCtl
    let quotedContent = quotedText <|> quotedPair
    let quotedString = between (pchar '"') (pchar '"') (many (optional fws >>. quotedContent) >>. optional fws)
    let dottedAtoms = sepBy1 (optional cfws >>. (atom <|> quotedString) >>. optional cfws) (pchar '.') >>% ()
    let localPart = dottedAtoms
    let domainText = ranges [{33..90};{94..126}] <|> obsNoWsCtl
    let domainLiteral =  between (optional cfws >>. pchar '[') (pchar ']' >>. optional cfws) (many (optional fws >>. domainText) >>. optional fws)
    let domain = dottedAtoms <|> domainLiteral 
    let addrSpec = localPart >>. pchar '@' >>. domain >>. eof
    match run addrSpec email with
    | Failure (msg, _, _) -> false
    | Success _ -> true

Just for reference, here's the actual test output:

192 passed.
36 failed.
0 erred.
----
Failed: ID "21": "123456789012345678901234567890123456789012345678901234567890@1
2345678901234567890123456789012345678901234567890123456789.123456789012345678901
23456789012345678901234567890123456789.12345678901234567890123456789012345678901
234567890123456789.1234.example.com"
Expected: false
Actual: true
----
Failed: ID "23": "12345678901234567890123456789012345678901234567890123456789012
345@example.com"
Expected: false
Actual: true
----
Failed: ID "31": """@example.com"
Expected: false
Actual: true
----
Failed: ID "34": "x@x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.
x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.
x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.
x23456789.x23456789.x23456789.x23456"
Expected: false
Actual: true
----
Failed: ID "35": "first.last@[.12.34.56.78]"
Expected: false
Actual: true
----
Failed: ID "36": "first.last@[12.34.56.789]"
Expected: false
Actual: true
----
Failed: ID "37": "first.last@[::12.34.56.78]"
Expected: false
Actual: true
----
Failed: ID "38": "first.last@[IPv5:::12.34.56.78]"
Expected: false
Actual: true
----
Failed: ID "39": "first.last@[IPv6:1111:2222:3333::4444:5555:12.34.56.78]"
Expected: false
Actual: true
----
Failed: ID "40": "first.last@[IPv6:1111:2222:3333:4444:5555:12.34.56.78]"
Expected: false
Actual: true
----
Failed: ID "41": "first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777:12.34.56.7
8]"
Expected: false
Actual: true
----
Failed: ID "42": "first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777]"
Expected: false
Actual: true
----
Failed: ID "43": "first.last@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]
"
Expected: false
Actual: true
----
Failed: ID "44": "first.last@[IPv6:1111:2222::3333::4444:5555:6666]"
Expected: false
Actual: true
----
Failed: ID "45": "first.last@[IPv6:1111:2222:3333::4444:5555:6666:7777]"
Expected: false
Actual: true
----
Failed: ID "46": "first.last@[IPv6:1111:2222:333x::4444:5555]"
Expected: false
Actual: true
----
Failed: ID "47": "first.last@[IPv6:1111:2222:33333::4444:5555]"
Expected: false
Actual: true
----
Failed: ID "48": "first.last@example.123"
Expected: false
Actual: true
----
Failed: ID "49": "first.last@com"
Expected: false
Actual: true
----
Failed: ID "50": "first.last@-xample.com"
Expected: false
Actual: true
----
Failed: ID "51": "first.last@exampl-.com"
Expected: false
Actual: true
----
Failed: ID "52": "first.last@x23456789012345678901234567890123456789012345678901
2345678901234.example.com"
Expected: false
Actual: true
----
Failed: ID "97": "test@123.123.123.123"
Expected: false
Actual: true
----
Failed: ID "115": "test@12345678901234567890123456789012345678901234567890123456
78901234567890123456789012345678901234567890123456789012345678901234567890123456
78901234567890123456789012345678901234567890123456789012345678901234567890123456
789012345678901234567890123456789012.com"
Expected: false
Actual: true
----
Failed: ID "116": "test@example"
Expected: false
Actual: true
----
Failed: ID "153": "first."".last@example.com"
Expected: false
Actual: true
----
Failed: ID "158": "first.last@[IPv6:1111:2222:3333:4444:5555:6666:12.34.567.89]"

Expected: false
Actual: true
----
Failed: ID "159": ""test\
 blah"@example.com"
Expected: false
Actual: true
----
Failed: ID "190": "a@b"
Expected: false
Actual: true
----
Failed: ID "199": "aaa@[123.123.123.333]"
Expected: false
Actual: true
----
Failed: ID "201": "a@bar"
Expected: false
Actual: true
----
Failed: ID "205": "a@-b.com"
Expected: false
Actual: true
----
Failed: ID "206": "a@b-.com"
Expected: false
Actual: true
----
Failed: ID "213": "invalid@special.museum-"
Expected: false
Actual: true
----
Failed: ID "216": "foobar@192.168.0.1"
Expected: false
Actual: true
----
Failed: ID "227": ""null \0"@char.com"
Expected: false
Actual: true
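
Some of these failures are not grammar problems at all. ID 23, for example, has a 65-character local part, while SMTP (RFC 5321) caps the local part at 64 octets and the domain at 255, and DNS caps each label at 63. Here is a minimal sketch of a check that could run alongside the parser. It assumes plain ASCII input and naively splits at the last '@'; the name withinLengthLimits is hypothetical and not part of the validator above.

```fsharp
open System

/// Hypothetical helper: checks size limits that a pure grammar check ignores.
let withinLengthLimits (email: string) =
    // Naive split at the last '@'; good enough for a sketch.
    match email.LastIndexOf('@') with
    | -1 -> false
    | i ->
        let localPart = email.Substring(0, i)
        let domain = email.Substring(i + 1)
        localPart.Length <= 64
        && domain.Length <= 255
        && (domain.Split('.') |> Array.forall (fun label -> label.Length <= 63))

// A 65-character local part, similar to test ID 23
let tooLong = String('a', 65) + "@example.com"
printfn "%b" (withinLengthLimits tooLong)
printfn "%b" (withinLengthLimits "first.last@example.com")
```

This only covers the length-related cases; the IP literal and hostname failures would need grammar changes or further checks.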

Thursday, April 2, 2009

Using Windsor in F#

I've been doing some experiments with F# that involve Windsor (my default choice for an IoC container) and found that Windsor's fluent interface is... not so fluent when used in F#.

UPDATE 10/21/2010: The F# team has loosened the syntax a lot; F# 2.0 can now consume the Windsor fluent API without any changes.

Fluent interface as-is in F#

Example: given this code in C#

container.Register(Component.For<IMyServiceContract>().ImplementedBy<MyServiceImpl>());

Let's try to translate this to F#. (I'll do it step by step so it's more didactic.)

Code: container.Register(Component.For<IMyServiceContract>().ImplementedBy<MyServiceImpl>())
Compiler says: Error: Successive arguments should be separated by spaces or tupled, and arguments involving function or method applications should be parenthesized.
Explanation: Component.For<IMyServiceContract>() is a method application, so it has to be parenthesized:

Code: container.Register((Component.For<IMyServiceContract>()).ImplementedBy<MyServiceImpl>())
Compiler says: Error: Type constraint mismatch. The type ComponentRegistration<IMyServiceContract> is not compatible with type IRegistration array.
The type 'ComponentRegistration<IMyServiceContract>' is not compatible with the type 'IRegistration array'.
Explanation: The reason for these errors is that F# expects an array as the parameter for Register, since its signature is

IWindsorContainer Register(params IRegistration[] registrations)

and F# doesn't support params arrays, so we have to explicitly construct an array:

Code: container.Register([|(Component.For<IMyServiceContract>()).ImplementedBy<MyServiceImpl>()|])
Compiler says: Error: This expression has type ComponentRegistration<IMyServiceContract> but is here used with type IRegistration.
Explanation: What? But ComponentRegistration<IMyServiceContract> implements the IRegistration interface! Yeah, but Register() takes an array of IRegistration, not an array of ComponentRegistration<IMyServiceContract>, and F# doesn't implement array covariance. You can write this in C#:

IMyServiceContract[] arr = new MyServiceImpl[] { new MyServiceImpl() };

but you can't write this in F#:

let arr: IMyServiceContract[] = [| MyServiceImpl() |]

This is actually a Good Thing since this kind of covariance has some nasty consequences. So we have no option but to cast:

Code: container.Register [| (((Component.For<IMyServiceContract>()).ImplementedBy<MyServiceImpl>()) :> IRegistration) |]
Compiler says: Warning: This expression should have type 'unit', but has type 'IWindsorContainer'.
Explanation: We have to do something about the return value. We're not using it in this example, so we'll just discard it:

Code: let _ = container.Register [| (((Component.For<IMyServiceContract>()).ImplementedBy<MyServiceImpl>()) :> IRegistration) |]

This finally compiles, but it's way too noisy. Not fluent at all!
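
As an aside, the 'unit' warning appears with any fluent .NET API whose result is left unused, and a common F# idiom for discarding such a result is to pipe it into ignore instead of binding it to _. Here is a self-contained sketch, assuming F# 2.0 or later and using StringBuilder from the base class library as a stand-in for the container (this is not Windsor code):

```fsharp
open System.Text

let sb = StringBuilder()

// Append returns the StringBuilder itself so that calls can be chained,
// much like Register returns the container.
// Piping the result into ignore discards it explicitly and avoids the warning.
sb.Append("Hello, ").Append("world") |> ignore

let text = sb.ToString()
printfn "%s" text
```

Applied to the registration above, the line would end with |> ignore instead of starting with let _ =.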

We could have also written:

let _ = container.Register(Seq.toArray (Seq.cast [(Component.For<IMyServiceContract>()).ImplementedBy<MyServiceImpl>()]))

but it's just as ugly.
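
To make the "nasty consequences" of array covariance concrete: the CLR still honours it at run time, so the problem can be reproduced from F# with an explicit downcast through obj. A small sketch (all names are illustrative only):

```fsharp
open System

let strings : string[] = [| "a"; "b" |]

// The CLR permits viewing a string[] as an obj[] (array covariance);
// F# only allows it through an explicit, runtime-checked cast.
let objects = box strings :?> obj[]

// The static type says any obj may be stored, but the underlying array
// can only hold strings, so the runtime checks the write and rejects it.
let writeFailed =
    try
        objects.[0] <- box 42
        false
    with :? ArrayTypeMismatchException -> true

printfn "write rejected at run time: %b" writeFailed
```

The type error that F# reports at compile time is what prevents this class of failure from showing up only at run time.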

Extension method solution

We could hide the casting and array stuff in an extension method:

module WindsorContainerExtensions = 
    type IWindsorContainer with
        member x.RegisterComponents (r: seq<#IRegistration>) = 
            let _ = x.Register (Seq.toArray (Seq.cast r))
            () 

which allows us to write:

container.RegisterComponents [(Component.For<IMyServiceContract>()).ImplementedBy<MyServiceImpl>()]

Much better, but it doesn't quite fit the F# spirit.

Function pipelines solution

An even better solution is to use function pipelining to build a DSL, like FsUnit and FsTest do, so we can write:

typeof<IMyServiceContract>
    |> implementedBy (typeof<MyServiceImpl>)
    |> registerIn container

Here are the functions that support this:

let implementedBy impl (service: Type) =
    (Component.For service).ImplementedBy impl       
    
let registerIn (container: IWindsorContainer) registration = 
    container.RegisterComponents [ registration ]

It's easy to follow this pattern and implement similar functions to cover more functionality. For example, this sets the lifestyle for a registration:

let withLifestyle lifestyle (registration: ComponentRegistration<_>) = 
    (registration.LifeStyle).Is lifestyle

Usage:

typeof<IMyServiceContract>
    |> implementedBy (typeof<MyServiceImpl>)
    |> withLifestyle LifestyleType.Transient
    |> registerIn container
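
The detail that makes these functions pipeline-friendly is the parameter order: the configuration value comes first and the thing being built comes last, so |> can supply it. Here is a self-contained sketch of that shape, using a hypothetical Registration record and a Dictionary as stand-ins for Windsor's types (none of these names come from Windsor, apart from the function names borrowed from above):

```fsharp
open System
open System.Collections.Generic

// Hypothetical stand-in for a component registration; not a Windsor type.
type Registration =
    { Service: Type
      Implementation: Type
      Name: string option }

// Configuration argument first, subject last, so the pipeline supplies the subject.
let implementedBy (impl: Type) (service: Type) =
    { Service = service; Implementation = impl; Name = None }

let named (name: string) (registration: Registration) =
    { registration with Name = Some name }

// The Dictionary plays the role of the container in this sketch.
let registerIn (registry: Dictionary<Type, Registration>) (registration: Registration) =
    registry.[registration.Service] <- registration

let registry = Dictionary<Type, Registration>()

typeof<IComparable>
    |> implementedBy (typeof<string>)
    |> named "default"
    |> registerIn registry
```

Each step is an ordinary curried function, so adding a new option to the DSL only means writing one more function with the same "subject last" signature.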


Conclusion

Fluent interfaces are cool but quite language-dependent, so if you're designing an API targeting the CLR and thinking of building a fluent interface, make sure you also provide an alternative, simpler, non-fluent API so other languages can build their own flavor of fluent interface. (Windsor does provide this, of course.)