Thursday, April 7, 2011

Object Relational Impedance Mismatch Thought

Stop trying to make a serialized form of data the same thing as the "live" data. Serialized data is a recipe for the live data, not the live data itself.

Solving a problem can be easier when you realize the question was incorrect to begin with.

I don't think of a program running in memory as the same thing as the source code I wrote, either. There are too many steps between my fingers, the disk, the compiler, and the OS that can make this assumption horribly wrong.

I realize that getting the computer to eventually do what you mean is the point of a language, but we can't be naive about how computers actually work if we want to understand why this assumption is dangerous and where it breaks down.

I feel ORMs are in a similar situation. Perhaps the best way to code is to stop pretending an ORM can be perfect.

I believe this is one of the things NoSQL will have taught us by the time we're all ready to jump on the next "new hawtness".

Tuesday, December 14, 2010

The D2 Programming Language

Just began looking at this one last night. It's available, with source, from Digital Mars (http://www.digitalmars.com/d/2.0/), and it's quite interesting.

It is definitely a general-purpose programming language, and it addresses many of the areas of C++ that make people want to pull their hair out. It seems able to compete directly with Go (Google's programming language) on concurrency, it has a system of generics, and it is what I would call a "more than complete" language :-).

It's actually pretty big, and many people might claim that you don't have to understand what you don't use. That's only true if you're the only one writing code in the language, or if you stick to an agreed-upon subset of it.

Go, on the other hand, is pretty small, and fairly simple. One can understand the entire language fairly quickly by just reading the specification. It's quick to learn, the tools are quick, and the code runs reasonably fast.

D2, at least with the Digital Mars compiler, is pretty fast too. It can be executed as a "script" of sorts, making it nice for system administration tasks where you often want the source right there.

Both languages seem to do quite well at achieving their stated goals and philosophies, and I expect them both to become more important to know as time goes on.

I hope to see both grow in popularity in the not-too-distant future.

Dave

Sunday, November 28, 2010

Goroutines vs function literals (closures)

Goroutines are a somewhat heavy way to handle a situation where you just want some kind of lazy evaluation. Say I'd like to process a file line by line; with a goroutine, the basic guts look like this:


func lineStreamer(out chan<- string) {
	file, err := os.Open("/usr/share/dict/words") // opens read-only
	if err != nil {
		panic("Failed to open file for reading")
	}
	defer file.Close()

	reader := bufio.NewReader(file)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			break // io.EOF (or a read error) ends the stream
		}
		// Do something interesting here, perhaps other than sending the line
		out <- line
	}
	close(out) // lets a range loop on the receiving side terminate
}


This neatly wraps up opening the file and dealing with bufio, and gives me an interface from which I can just read lines (or processed lines) on a channel. But it seems kind of slow: about 2.04 to 2.07 seconds on my MacBook Pro with no runtime tuning. Raising GOMAXPROCS to 2 gets me between 1.836 and 1.929 seconds, and GOMAXPROCS at 3 gives a fairly steady 1.83 seconds.

This got me thinking about how I'd do something like this in other languages. In Scheme, for example, I don't think I'd need coroutines, since I could use delay/force to get things evaluated in chunks.
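For reference, Scheme's delay/force amounts to a memoized thunk: delay wraps a computation without running it, and force runs it once and caches the result. A rough Go analogue, assuming Go 1.18+ for generics; Promise, Delay, and Force are hypothetical names, not a standard library API.

```go
package main

import (
	"fmt"
	"sync"
)

// Promise is a Go stand-in for a Scheme promise: the thunk runs at most
// once, on the first Force, and its result is cached.
type Promise[T any] struct {
	once  sync.Once
	thunk func() T
	value T
}

// Delay wraps a computation without running it, like (delay expr).
func Delay[T any](thunk func() T) *Promise[T] {
	return &Promise[T]{thunk: thunk}
}

// Force runs the computation the first time and returns the cached value after that.
func (p *Promise[T]) Force() T {
	p.once.Do(func() {
		p.value = p.thunk()
		p.thunk = nil // drop the closure once it has been evaluated
	})
	return p.value
}

func main() {
	calls := 0
	p := Delay(func() int {
		calls++
		return 6 * 7
	})
	fmt.Println(calls)     // 0: nothing evaluated yet
	fmt.Println(p.Force()) // 42
	fmt.Println(p.Force()) // 42, from the cache
	fmt.Println(calls)     // 1
}
```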

This led me to the following, possibly non-idiomatic version of a Go program using function literals.


// Cont returns the next line, any error, and a function that produces the line after that.
type Cont func() (string, error, Cont)

func lineStreamer(file *os.File, reader *bufio.Reader) (string, error, Cont) {
	line, err := reader.ReadString('\n')
	return line, err, func() (string, error, Cont) {
		return lineStreamer(file, reader)
	}
}


To evaluate all the lines I can do something like the following:


// file and reader are opened and wrapped as in the goroutine version
s, err, next := lineStreamer(file, reader)

for err == nil {
	fmt.Print(s)
	s, err, next = next()
}


And my run times are down to about 1.2 seconds.

I guess my question is: is this idiomatic or not?

Tuesday, June 1, 2010

OMG C++?

I had an interesting problem to solve involving some code that was essentially driven this way:

while (getline(cin, line)) {
    // process line
}


A coworker of mine pointed out that this would only process one line at a time, which it will, but I wondered whether it was also reading one line at a time, rather than just serializing work on the lines it had read. In essence, I wondered whether cin's input was buffered.

On the platform I'm testing with, Mac OS X Snow Leopard, it appears that no real buffering is going on.

Here's some code to show what I mean:


void show_stats () {
    if (!std::cin) {
        std::cout << "Stream is broken or closed" << std::endl;
    }
    else {
        std::cout << "Availble bytes buffered: " << std::cin.rdbuf()->in_avail() << std::endl;
    }
}



This asks cin's underlying streambuf whether any bytes are available in its buffer. When the buffer is empty, the istream calls the streambuf's "underflow" function to fetch more data and reset the buffer, leaving room for some number of "putback" bytes.

What I found was that at no point was I seeing any buffered input for cin, so I decided to write my own streambuf and a matching istream class that buffer input from any file descriptor (Unix pipe, socket, file, etc.).


#include <cerrno>
#include <cstring>
#include <iostream>
#include <streambuf>
#include <unistd.h>

class fd_inbuf_buffered : public std::streambuf
{
protected:
    int fd;
    const int bSize;  // total buffer size, including the 4-byte putback area
    char * buffer;

public:
    fd_inbuf_buffered (int _fd, int _bSize=10) : fd(_fd), bSize(_bSize)
    {
        buffer = new char [bSize];
        // The get pointer should not be at the beginning of the buffer, because
        // it limits the ability to do put back into the input stream should
        // there be a need to. Ideally that situation does not come up, but we
        // leave room for 4 bytes, by pointing all 3 locations to 4 beyond the
        // beginning of the buffer.
        // 4 was the size used in an implementation in Josuttis' "The C++ Standard Library"
        setg( buffer + 4,  // beginning of putback area
              buffer + 4,  // read position
              buffer + 4); // end position
    }

    ~fd_inbuf_buffered ()
    {
        delete [] buffer;
    }

    // The buffer is a raw owned array, so copying would cause a double delete.
    fd_inbuf_buffered (const fd_inbuf_buffered &) = delete;
    fd_inbuf_buffered & operator= (const fd_inbuf_buffered &) = delete;

protected:
    // Underflow is what fills our buffer from the fd.
    // If we don't override this, we get the parent's, which just returns EOF.
    int_type underflow () override
    {
        // read position before end of buffer
        if (gptr() < egptr())
        {
            return traits_type::to_int_type(*gptr());
        }

        // Must limit the number of characters previously read into the
        // putback buffer... 4 maximum
        int numPutback = gptr() - eback();
        if (numPutback > 4)
        {
            numPutback = 4;
        }
        // Copy up to the putback buffer size characters back into the putback
        // area of our buffer. memmove rather than memcpy, because with a small
        // buffer the two ranges can overlap.
        std::memmove (buffer + (4 - numPutback), gptr() - numPutback, numPutback);

        // Read new characters, retrying on interrupted or would-block reads.
        ssize_t num;
        do {
            num = read(fd, buffer + 4, bSize - 4);
        } while (num == -1 && (errno == EINTR || errno == EAGAIN));

        // 0 means end of file; -1 here is an unrecoverable read error.
        if (num <= 0)
        {
            return traits_type::eof();
        }

        // reset buffer pointers
        setg(buffer + (4 - numPutback), buffer + 4, buffer + 4 + num);

        // to_int_type keeps bytes such as 0xFF from being mistaken for EOF
        return traits_type::to_int_type(*gptr());
    }
};


struct fd_istream : public std::istream
{
protected:
    fd_inbuf_buffered buf;
public:
    // Base classes are constructed before members, so start std::istream
    // without a buffer and attach buf once it has been constructed.
    explicit fd_istream (int fd, int bufsz) : std::istream(nullptr), buf(fd, bufsz)
    {
        rdbuf(&buf);
    }
};



Now I can declare an istream like so:


fd_istream my_cin(0, 1000);



Here 0 is the numeric file descriptor for stdin, and 1000 is the buffer size in bytes (the first 4 of which are reserved for putback).

Because I built on the standard IOStreams library rather than writing C-style I/O directly, I can use this stream the same way I'd use any istream: with stream iterators, with algorithms from the standard library, and even with getline, as you can see below.


void show_stats () {
    if (!my_cin) {
        std::cout << "Stream is broken or closed" << std::endl;
    }
    else {
        std::cout << "Availble bytes buffered: " << my_cin.rdbuf()->in_avail() << std::endl;
    }
}

int main () {
    std::string line;
    while (std::getline(my_cin, line)) {
        std::cout << line << std::endl;
        show_stats();
    }
    show_stats();
}



In an example run, such as "cat /usr/share/dict/words | ./a.out", I see something like the following:


Availble bytes buffered: 12
Pinacoceras
Availble bytes buffered: 0
Pinacoceratidae
Availble bytes buffered: 980
pinacocytal
Availble bytes buffered: 968
pinacocyte
Availble bytes buffered: 957
pinacoid
Availble bytes buffered: 948
pinacoidal
Availble bytes buffered: 937
pinacol
Availble bytes buffered: 929
pinacolate
Availble bytes buffered: 918



This shows the buffer filling each time underflow reads a chunk of bytes, then draining back down toward 0 as lines are consumed. At 0, the stream calls underflow again, which either reads more data if any is available or, at end of file, returns EOF, which terminates the stream.

This stream will work for pipes, sockets, and files, as long as the file descriptor is passed to the constructor. Because the first 4 bytes of the buffer are reserved for putback, the buffer must be larger than 4 bytes; otherwise there is no room left to read data into. There are probably better ways to deal with this, but for demonstration purposes, it works nicely.

C++ isn't always so bad after all. It just depends on how it's written.

Monday, May 31, 2010

Not usually a fan of IDEs... but

I'm thinking of trying to use Leksah as my primary Haskell development environment on the Mac. I like that they seem to be willing to incorporate Yi as their editing environment to some extent, and I'd like to see where that goes.

Wednesday, May 26, 2010

PLT Scheme is easy

Lots of nice frameworks, too. A friend showed me some code he was working on that used the Twitter API over HTTP to look at people's tweets (if they're not protected).

I thought this was cool: it was 7 lines of code. So I thought I'd wrap it up in a GUI.

Keep in mind I'm NOT a GUI programmer by trade, and that this was my very first venture into PLT GUI programming. It's easy to pick up, and now I've got something hideous that works.




#lang scheme/gui
(require net/url xml)

;; Build the Twitter API URL for a user's timeline.
(define (u screenname)
  (string->url (string-append
                "http://api.twitter.com/1/statuses/user_timeline.xml?screen_name="
                screenname)))

;; Walk the parsed XML, collecting the contents of every <text> element.
(define (f v)
  (match v
    (`(text ,_ . ,v) `(,(string-append* v)))
    (`(,_ ,_ . ,v) (append-map f v))
    (else '())))

;; Read an XML document from a port and return the list of tweet texts.
(define g (compose f xml->xexpr document-element read-xml))
;(call/input-url (u "omgjkh") get-pure-port g)

(define dialog (new dialog%
                    [label "Twitter Screen Name Activity Grabulatrixatronulator"]
                    [width 600]
                    [height 100]))

(define textfield (new text-field% [parent dialog] [label "Enter a Screen Name"]))
(send textfield set-value "omgjkh")
(display (send textfield get-value))
(newline)

(define newframe (new frame%
                      [label "Results"]
                      [width 1000]
                      [height 600]))
;; A multi-line field, so each tweet can go on its own line.
(define tf (new text-field% [parent newframe] [label ""] [style '(multiple)] [min-height 500]))

;; Join a list of strings, putting a newline after each one.
(define (appender los)
  (cond ((null? los) "")
        (else (string-append (car los) "\n" (appender (cdr los))))))

(new button% [parent dialog]
     [label "GITERDUN"]
     [callback (lambda (button event)
                 (let ((text (appender (call/input-url (u (send textfield get-value))
                                                       get-pure-port g))))
                   (display text)
                   (newline)
                   (send tf set-value text)
                   (send dialog show #f)
                   (display "here") (newline)
                   (send newframe show #t)
                   (display "here2") (newline)))])

(send dialog show #t)

Current Plan 9 Environment... and loving it!

I've got a Plan 9 CPU server running in VMware Fusion on Mac OS X Snow Leopard. I ran into a few problems setting it up, because VMware Fusion's IDE disk emulation didn't agree with Plan 9. Switching to SCSI disks made all the difference in the world.

I followed (and updated a little) the Plan 9 wiki's instructions for setting up a CPU/auth server, then connected to it with drawterm, a Unix program that acts as a small Plan 9 terminal for connecting to CPU servers, and all was good.

There's a project called vx32, a user-space sandboxing/virtualization library, that has been used to port the Plan 9 kernel. I grabbed the latest Mercurial snapshot of its code base and compiled it (after patching it so Snow Leopard didn't complain about the deprecated ucontext.h stuff), and now I have a Plan 9 kernel (almost; it's not 100% the same) running as a terminal that connects to my Fusion CPU server.

So, now what? Well, I may take a crack at the port of Go to Plan 9... when I get time to do this again.