Showing posts with label article. Show all posts
Showing posts with label article. Show all posts

Friday, January 16, 2015

Sayonara 2014

Being a Ph.D student I've spent a lot of time optimizing Cohen et al. algorithm for the 2-hop cover (Hub Labels). Finally we published our results at Syphosium of Experimental Algorithms:

Daniel Delling, Andrew V. Goldberg, Ruslan Savchenko, and Renato F. Werneck. Hub Labels: Theory and Practice

After the talk I reimplemented the Hub Labeling approximation algorithm from scratch and published it: https://github.com/savrus/hl

Also, in 2014 we solved some theoretical problems related to Hub Labels including NP-hardness of the minimum Hub Labeling problem and approximation guarantee for Hierarchical Hub Labeling greedy algorithms. The preprint is available from arXiv: http://arxiv.org/abs/1501.02492

That's it for 2014, that's it.

Monday, January 6, 2014

Sayonara 2013

Just a few updates.

First, I finally have a scientific publication. Together with Andrew Goldberg and Ilya Razenshteyn we worked on hub labels and created a paper Separating Hierarchical and General Hub Labelings. It was accepted to the MFCS2013 conference and published in the proceedings. You can have a preprint from arxiv or a final paper from the publisher.

Second, I realized there are many textbook algorithms almost everybody implemented and I didn't. I started algory project on GitHub to fill this gap.
I don't promise a cool graph library in the near future, but I hope to add new algorithms there sometimes.


Sunday, February 13, 2011

Homemade regular expression matcher

Here we try to build a program (all the code is provided "as is" without any warranty including but not limited to mental, psychic and wrong teaching... in short, usual open source license) that matches a string against a regular expression over simplified alphabet: base characters can be English letters only and special characters allowed are '|' (or), '*' (closure) and concatenation requires no character. All ideas given below can be found in Compilers: Principles Techniques and Tools, 2nd edition by A. Aho, M.S. Lam, R. Sethi, J.D. Ullman.


1. Basic Definitions and Notations
We give this for consistency. Reader may just skip this section

Token is a basic block of input. In our case tokens will be small letters of english alphabet and four special tokens: |, *, ( and ).
Alphabet is some set of tokens. Unless stated otherwise we assume all tokens are non-special.
String is a sequence of tokens.
Language over an alphabet is some set of strings, each sting consists of tokens from the alphabet.
Regular expressions language is a language RE over non-special and special tokens with the following properties:
1) empty string is in RE,
2) any string formed by a single non-special token is in RE.
3) if A and B are in RE, then AB, A|B, A* and (A) are in RE too.
Regular expression is an element of language RE.
Regular language L(regexp) determined by a regular expression regexp is defined as follows:
1) if regexp is an empty string, then L(regexp) contains only one string which is empty;
2) if regexp is a single token T, then L(regexp) has only one string T;
3) if regexp is of form AB where A and B are regular expressions, then L(regexp) consists of such and only such strings which can be represented as a concatenation XY of a string X from L(A) and a string Y from L(B);
4) if regexp is of form A|B where A and B are regular expressions, then L(regexp) consists of all strings from L(A) and L(B);
5) if regexp is of form (A) where A is a regular expressions, then L(regexp) is exactly L(A);
6) if regexp is of form A* where A is a regular expressions, then L(regexp) is a union of an empty string language, L(A), L(AA), L(AAA), and so on infinitely. This operation in language algebra is called Kleene closure.
Nondeterministic Finite Automata (NFA) is a triple (S,A,T) where S is a finite set of states, A is an alphabet which can contain a special empty token and T is a transitions function, which given a state and a token returns a new state. There should be chosen one starting state and some positive number of accepting states. NFA is usually depicted as a directed graph where nodes are states, and edges are labeled by tokens and edge labeled by some token X goes from a state Y to state T(Y,X). NFA is said to accept a string R if there is a directed path in the graph representing this NFA from the starting state to one of accepting states such that a string formed by edge labels with empty labels removed is exactly the R. Note, we can omit the requirement of T to be a function by saying there is some special non-accepting state Q such that all undefined transactions should go to Q.

Theorem 1 For each regular expression regexp there is an NFA that accepts strings from and only from L(regexp). Proof is done by induction on construction of regexp: NFA accepting empty string and tokens are obvious, NFA for other cases can be made using NFAs from subexpression. We will give actual constructions in the real code below.
The reverse (for every NFA there is a regular expression defining the same language) is also true. We will not need this fact later so we leave the proof for the reader.

Context-Free Grammar (CFG) is triple (T,N,P) where T is a set of terminals, N is a set of nonterminals, and P is a set of productions. Each production has a head which is always a nonterminal and a body which is a sequence of terminals and nonterminals. One of the nonterminals is designated as a start symbol.
CFG defines a language in the following way. At first we have only start symbol. Then if there are some nonterminals one is picked and substituted by a body of some production having this nonterminal as a head. We stop when there are no nonterminals left. All strings which can be derived in such a way form a language defined by this CFG.
To write down a CFG we will use a notation similar to yacc(1)'s: on each line we write a nonterminal, then ":" and then all productions having this nonterminal as a head separated by "," (we choose comma instead of yacc(1)'s pipe because it is used as RE language token and comma is absolutely vacant), the special word 'epsilon' denotes an empty production body.

Parse Tree for a string s derived using CFG G's productions is a rooted ordered tree with labels at its nodes: root is labeled by start symbol of G; each inner node corresponds to a nonterminal substitution and is labeled by that nonterminal, it's childs are labeled by terminals and nonterminals from corresponding production body in the same order. All leafs are labeled by terminals and being traversed from left to right form the string s.
Parsing is a process of constructing a parse tree from a sting.
Parser is a program capable of performing parsing.

LL(1) parsing algorithm is defined as follows. We read an input string from left to right. To represent a parse state we remember current position in the string and we have a parse stack, initially holding only a start symbol of a CFG. At each step we pop a token from the top of the stack and if it is a terminal we check if it is the same as the current token in the input. If it's not then parse error occurs, otherwise we advance the input. If the token form the stack is nonterminal we choose a production which has this token as it's head and can derive a string starting with the current input terminal. By the definition of LL(1) parser there is exactly one such production, otherwise parse error occurs.
LL(1) context-free grammar is a CFG for which LL(1) parser exists.


2. LL(1) Grammar for Regular Expressions

There is a natural CFG for the language RE obtained from the definition of RE:
S : E
E : E|T , T
T : TF , F
F : F* , R
R : (E), epsilon, L
L : q,w,e,r,t,y,u,i,o,p,a,s,d,f,g,h,j,k,l,z,x,c,v,b,n,m

Unfortunately this grammar is not LL(1) since choice between productions can't be done by looking at the current symbol only. We have to alter this grammar to make another CFG which would be LL(1) one. There are some techniques for this, namely left recursion elimination, left factoring, but we will not give them here. An LL(1) grammar for RE language got by the author is just presented without explanation.

S  :  S'
S' :  TT'
T' :  |S' , epsilon
T  :  PP'
P' :  PP' , epsilon
P  :  RR'
R' :  *R' , epsilon
R  :  (S') , epsilon , L
L  :  q,w,e,r,t,y,u,i,o,p,a,s,d,f,g,h,j,k,l,z,x,c,v,b,n,m


3. Token Class

Now that we have a grammar we can present the Token class. We have tokens for epsilon, grammar symbols and input characters.
typedef enum {
    TOKEN_EPSILON, /* empty token */
    /* nonterminals of regexp LL(1) grammar */
    TOKEN_START,   /* S  -> S' */
    TOKEN_SP,      /* S' -> TT' */
    TOKEN_TP,      /* T' -> |S'  or  epsilon */
    TOKEN_T,       /* T  -> PP' */
    TOKEN_PP,      /* P' -> PP'  or  epsilon */
    TOKEN_P,       /* P  -> RR' */
    TOKEN_RP,      /* R' -> *R'  or  epsilon */
    TOKEN_R,       /* R  -> (S')  or  a,b,c,...,z  or  epsilon */
    /*end nonterminals */
    TOKEN_NORMAL   /* token for the input character */
} TokenType;

class Token {
    TokenType type;  /* type of the token */
    char normal;     /* character for TOKEN_NORMAL token */
public:
    /* create special token of specified type */
    Token(TokenType t) : type(t) {}
    /* create  token for the specified character */
    Token(char n) : type(TOKEN_NORMAL), normal(n) {}

    bool is_epsilon() const { return type == TOKEN_EPSILON; }
    TokenType get_type() const { return type; }
    char get_char() const { return normal; }
    
    /* this is used by std::map data structure. only tokens with
     * TOKEN_NORMAL and TOKEN_EPSILON types held there */
    bool operator <(const Token& t) const { 
        if ((type == t.type) && (type != TOKEN_EPSILON))
            return normal < t.normal;
        return type < t.type;
    }
    bool operator ==(const Token &t) const {
        if ((type == t.type) && (type != TOKEN_EPSILON))
            return normal == t.normal;
        return type == t.type;
    }
};


4. NFA Class

We start with a single NFA state. The class represents the state itself and contains all transitions from this state: endpoints for the same token are grouped in vectors and tokens are used as keys in a std::map data structure. State also has a temporary bit is_on used later by algorithms to mark visited states.

class NFAState;

typedef std::vector<NFAState*> Svector;
typedef std::map<Token, Svector> ASmap;

/* Single NFA state */
class NFAState {
    private:
    ASmap moves;  /* outgoing edges of this state. */

    public:
    bool is_on; /* temporary bit to use by others */
    NFAState() : is_on(false) {}
    /* get vector of transitions for a token */
    const Svector & get_move(Token a) { return moves[a]; }
    /* get all transitions for this state */
    const ASmap & get_all_moves() { return moves; }
    /* add transition */
    void add_move(Token a, NFAState *target) {
        moves[a].push_back(target);
    }
    
};

Next is the whole NFA class. Since we only require one finish state in this program we do not support a general case by this class. However general case can be reduced to ours by simply adding a new finish state and epsilon transitions to it from old finish states. NFA class contains routines to construct a new NFA using existing NFAs. We said earlier that for every regular expression we can build a NFA accepting the same language by recursive on regular expression but we omitted actual constructs, now we give them in the code.

/* Nondeterministic Finite Automata */
class NFA {
    public:
    NFAState *start;   /* start state */
    NFAState *finish;  /* finish state */

    NFA(NFA& nfa) : start(nfa.start), finish(nfa.finish) {}
    NFA() {}

    NFA& operator=(const NFA &nfa) {
        this->start = nfa.start;
        this->finish = nfa.finish;
        return *this;
    }

    /* NFA which accepts a specified token */
    void init_token(Token t) {
        start = new NFAState;
        finish = new NFAState;
        start->add_move(t, finish);
    }

    /* NFA which accepts a string either _s_ or _t_ accepts */
    void init_or(NFA &s, NFA &t) {
        start = new NFAState;
        finish = new NFAState;
        Token epsilon = Token(TOKEN_EPSILON);

        start->add_move(epsilon, s.start);
        start->add_move(epsilon, t.start);
        s.finish->add_move(epsilon, finish);
        t.finish->add_move(epsilon, finish);
    }

    /* NFA which accepts a concatenation of _s_ and _t_ accepted strings */
    void init_concat(NFA &s, NFA &t) {
        start = s.start;
        finish = t.finish;
        Token epsilon = Token(TOKEN_EPSILON);

        s.finish->add_move(epsilon, t.start);
    }

    /* NFA which accepts a closure of _s_ */
    void init_asterisk(NFA &s) {
        start = new NFAState;
        finish = new NFAState;
        Token epsilon = Token(TOKEN_EPSILON);

        start->add_move(epsilon, finish);
        start->add_move(epsilon, s.start);
        s.finish->add_move(epsilon, finish);
        s.finish->add_move(epsilon, s.start);
    }

    /* states are allocated in heap, we need to free them explicitly */
    void delete_states() {
        Svector states, processing;
        processing.push_back(start);

        /* collect all states in the _states_ vector */
        while (!processing.empty()) {
            NFAState *p = processing.back();
            processing.pop_back();
            states.push_back(p);
            ASmap all_moves = p->get_all_moves();
            for (ASmap::iterator it = all_moves.begin(), end = all_moves.end();
                    it != end; ++it) {
                Svector &moves = (*it).second;
                for (Svector::iterator iit = moves.begin(), eend = moves.end();
                        iit < eend; ++iit) {
                    if ((*iit)->is_on) continue;
                    (*iit)->is_on = 1;
                    processing.push_back(*iit);
                }
            }
        }

        for (Svector::iterator it = states.begin(), end = states.end();
                it < end; ++it)
            delete *it;
    }
};

5. NFA Emulation

To run an NFA we first need to introduce a notion of a hyperstate. Since in NFA there can be multiple ways for a transition we need to simulate them all at once. Hyperstate is a collection of NFA states the automata can be in for the current input. Hyperstate can be constructed with the help of epsilon-closure: given a set of states, epsilon-closure is this set plus states were we can get by epsilon transitions from this set. So emulation is done as follows: initial hyperstate is the epsilon-closure of start state, each step the next hyperstate is the epsilon-closure of the set got by current token transactions from the current hyperstate. At the end of the input we check if the finish state is in current hyperstate. If it is, then NFA accepts the input, otherwise, rejects.

class NFAHyperState {
    Svector states;
    private:
    /* add all states from an NFAState with theirs's epsilon-clousure
     * which are not already in hyperstate */
    void add_closure(NFAState &s) {
        if (s.is_on) return;
        std::queue<NFAState*> Q;
        Q.push(&s);
        while (!Q.empty()) {
            NFAState *p = Q.front();
            Q.pop();
            states.push_back(p);
            p->is_on = true;
            Svector moves = p->get_move(Token(TOKEN_EPSILON));
            for (Svector::iterator it = moves.begin(), end = moves.end();
                    it < end; ++it) {
                if ((*it)->is_on) continue;
                Q.push(*it);
            }
        }
    }

    /* clear temporary markings in states. Execute after step is done */
    void clear_marks() {
        for (Svector::iterator it = states.begin(), end = states.end();
                it < end; ++it) {
            (*it)->is_on = false;
        }
    }

    public:
    NFAHyperState(NFAState *s) {
        /* add epsilon-closure of start state */
        add_closure(*s);
        clear_marks();
    }

    /* perform a single transition */
    void move(Token a) {
        Svector old; /* current hyperspace in this routine */
        old.swap(states);
        states.clear();
        /* now current hyperstate is clear, fill it using moves from
         * states in old hyperstate */
        for (Svector::iterator it = old.begin(), end = old.end();
                it < end; ++it) {
            Svector moves = (*it)->get_move(a);
            for (Svector::iterator jt = moves.begin(), jend = moves.end();
                    jt < jend; ++jt) {
                add_closure(**jt);
            }
        }
        clear_marks();
    }

    /* check if a specified NFA state is in this hyperstate */
    bool is_in(NFAState *f) {
        for (Svector::iterator it = states.begin(), end = states.end();
                it < end; ++it) {
            if ((*it) == f) return true;
        }
        return false;
    }
};

/* NFA Emulator. */
class NFAEmulator {
    private:
    NFAHyperState hyperstate;  /* all work is done here */
    NFAState *finish_state;    /* NFA finish state */

    public:
    NFAEmulator(NFA &nfa) : hyperstate(nfa.start), finish_state(nfa.finish) {}
    /* hyperstate transition */
    void move(Token a) { hyperstate.move(a); }
    /* check if NFA accepts the input */
    bool is_accepted() { return hyperstate.is_in(finish_state); }
};

6. Regular Expression Parse Tree

Now we return to our LL(1) grammar and present the parser. Since the grammar appears to be not very friendly to NFA construction operations, we have to build a parse tree first. After parse tree is constructed, build_nfa from its root can be called to construct the NFA for parsed regular expression. Routine build_nfa works by recursively generating NFA for the childs and then for the parent. Sometimes it pass an NFA generated by the one child to it's sibling. The reader is advised to draw a parse tree for some simple expression to see why this happens.

/* grammar productions */
typedef enum {
    RULE_S_SP,     /* S  -> S' */
    RULE_SP_TTP,   /* S' -> TT' */
    RULE_TP_SP,    /* T' -> S' */
    RULE_TP_E,     /* T' -> epsilon */
    RULE_T_PPP,    /* T  -> PP' */
    RULE_PP_PPP,   /* P' -> PP' */
    RULE_PP_E,     /* P' -> epsilon */
    RULE_P_RRP,    /* P  -> RR' */
    RULE_RP_ARP,   /* R' -> *R' */
    RULE_RP_E,     /* R' -> epsilon */
    RULE_R_E,      /* R  -> epsilon */
    RULE_R_LETTER, /* R  -> L */
    RULE_R_BSPB,   /* R  -> (S') */
    RULE_LETTER,   /* L  -> letter */
} GrammarRule;

class ParseTreeNode;

typedef std::vector<ParseTreeNode*> Nvector;

/* Node of a Parse tree */
class ParseTreeNode {
    GrammarRule type;       /* grammar production corresponding to this node */
    Nvector childs;         /* child nodes */
    ParseTreeNode *parent;  /* parend node */
    char letter;            /* character for RULE_LETTER type */
    
    /* nfa corresponding to this node. When constructing a prse tree this is
     * used as a parameter to _inherit_ nfa form parent or siblings */
    NFA nfa;

    public:
    /* node for a new parse tree with the starting production */
    ParseTreeNode() : type(RULE_S_SP) {}
    /* new node for existing parse tree */
    ParseTreeNode(ParseTreeNode &p) : parent(&p) {
        p.childs.push_back(this);
    }
    void set_type(GrammarRule t) { type = t; }
    void set_letter(char c) { letter = c; }
    ~ParseTreeNode() {
        for (Nvector::iterator it=childs.begin(),end=childs.end();
                it<end; ++it)
            delete *it;
        childs.clear();
    }

    /* build NFA by traversing a parse tree
     * should be called from the root of the parse tree */
    NFA & build_nfa() {
        NFA new_nfa; /* temporary NFA */
        switch(type) {
        case RULE_S_SP: /* S -> S' */
            /* NFA for this node is just the NFA for it's child */
            nfa = childs[0]->build_nfa();
            break;
        case RULE_SP_TTP: /* S' -> TT' */
            /* construct NFA for T-child and pass it to the T'-child */
            childs[1]->nfa = childs[0]->build_nfa();
            /* NFA for this node is the same as it's T-child's */
            nfa = childs[1]->build_nfa();
            break;
        case RULE_TP_SP: /* T' -> |S' */
            /* NFA is OR of inherited nfa and S'-child's NFA */
            new_nfa.init_or(nfa, childs[1]->build_nfa());
            nfa = new_nfa;
            break;
        case RULE_TP_E:
            /* NFA is right the inherited nfa, do nothing */
            break;
        case RULE_T_PPP: /* T -> PP' */
            /* construct NFA for P-child, pass it to P'-child */
            childs[1]->nfa = childs[0]->build_nfa();
            /* NFA is P'-child's nfa */
            nfa = childs[1]->build_nfa();
            break;
        case RULE_PP_PPP: /* P' -> PP' */
            /* construct concatenation NFA of inherited nfa and P-child's */
            new_nfa.init_concat(nfa, childs[0]->build_nfa());
            /* pass new_nfa to P'-childs */
            childs[1]->nfa = new_nfa;
            /* NFA is P'-child's NFA */
            nfa = childs[1]->build_nfa();
            break;
        case RULE_PP_E:
            /* NFA is inherited nfa, do nothing */
            break;
        case RULE_P_RRP:  /* P -> RR' */
            /* construct NFA for R-child and pass it to R'-child */
            childs[1]->nfa = childs[0]->build_nfa();
            /* NFA is R'-child's NFA */
            nfa = childs[1]->build_nfa();
            break;
        case RULE_RP_ARP: /* R' -> *R' */
            /* construct asterisk NFA from inherited nfa, pass it to child */
            childs[1]->nfa.init_asterisk(nfa);
            /* NFA is R'-child's nfa */
            nfa = childs[1]->build_nfa();
            break;
        case RULE_RP_E:
            /* NFA is inherited nfa, do nothing */
            break;
        case RULE_R_E:
            /* NFA is accepting only epsilon token*/
            nfa.init_token(Token(TOKEN_EPSILON));
            break;
        case RULE_R_LETTER: /* R -> L */
            /* NFA is child's NFA */
            nfa = childs[0]->build_nfa();
            break;
        case RULE_R_BSPB:
            /* NFA is S'-child's NFA */
            nfa = childs[1]->build_nfa();
            break;
        case RULE_LETTER: /* letter -> a|...|z */
            /* NFA is accepting only letter token */
            nfa.init_token(Token(letter));
            break;
        }
        return nfa;
    }
};


7. Regular Expression Parser

Now we can finally present the parser itself. LL(1) parsing algorithm has been discussed above, the only exception is that parse stack now holds not only tokens but parse tree nodes too.

typedef std::pair<Token, ParseTreeNode*> TNpair;
typedef std::vector<TNpair> TNvector;

/* Regular Expression Parser */
class REParse {
    std::string regexp;         /* regular expression to parse */
    unsigned int position;      /* current position in regexp */
    TNvector parse_stack;       /* LL(1) parse stack */
    ParseTreeNode *parse_tree;  /* parse tree */

    /* push token-node pair on a parse stack */
    void push_parse_state(Token &t, ParseTreeNode *node) {
        parse_stack.push_back(TNpair(t, node));
    }

    /* grammar production for a letter terminal in a body */
    void production(ParseTreeNode &node, GrammarRule rule, char letter) {
        node.set_type(rule);
        node.set_letter(letter);
    }
    /* grammar production with empty body */
    void production(ParseTreeNode &node, GrammarRule rule) {
        node.set_type(rule);
    }
    /* grammar production with one nonterminal in a body */
    void production(ParseTreeNode &node, GrammarRule rule, Token t1) {
        ParseTreeNode *c1 = new ParseTreeNode(node);
        push_parse_state(t1, c1);
        node.set_type(rule);
    }
    /* grammar production with two nonterminals in a body */
    void production(ParseTreeNode &node, GrammarRule rule, Token t1, Token t2) {
        ParseTreeNode *c1 = new ParseTreeNode(node);
        ParseTreeNode *c2 = new ParseTreeNode(node);
        push_parse_state(t2, c2);
        push_parse_state(t1, c1);
        node.set_type(rule);
    }
    /* grammar production with three nonterminals in a body */
    void production(ParseTreeNode &node, GrammarRule rule, Token t1, Token t2, Token t3) {
        ParseTreeNode *c1 = new ParseTreeNode(node);
        ParseTreeNode *c2 = new ParseTreeNode(node);
        ParseTreeNode *c3 = new ParseTreeNode(node);
        push_parse_state(t3, c3);
        push_parse_state(t2, c2);
        push_parse_state(t1, c1);
        node.set_type(rule);
    }

    /* can current character be the beginning of a string derieved from P' */
    bool is_first_PP(char c) const { return (isalpha(c) || (c == '(')); }
    /* can current character be the beginning of a string derieved from T' */
    bool is_first_TP(char c) const { return (is_first_PP(c) || (c == '|')); }

    /* A step of LL(1) parsing: process a token from the top of the stack */
    void process_token(Token &t, ParseTreeNode &node) {
#define T(x) Token(x)
        char lookup = regexp[position];
        switch (t.get_type()) {
        case TOKEN_START:
            production(node, RULE_S_SP, T(TOKEN_SP)); break;
        case TOKEN_SP:
            production(node, RULE_SP_TTP, T(TOKEN_T), T(TOKEN_TP)); break;
        case TOKEN_T:
            production(node, RULE_T_PPP, T(TOKEN_P), T(TOKEN_PP)); break;
        case TOKEN_P:
            production(node, RULE_P_RRP, T(TOKEN_R), T(TOKEN_RP)); break;
        case TOKEN_R:
            if (lookup == '(')
                production(node, RULE_R_BSPB, T('('), T(TOKEN_SP), T(')'));
            else if (isalpha(lookup))
                production(node, RULE_R_LETTER, T(lookup));
            else
                production(node, RULE_R_E);
            break;
        case TOKEN_TP:
            if (lookup == '|')
                production(node, RULE_TP_SP, T('|'), T(TOKEN_SP));
            else
                production(node, RULE_TP_E);
            break;
        case TOKEN_PP:
            if (is_first_PP(lookup))
                production(node, RULE_PP_PPP, T(TOKEN_P), T(TOKEN_PP));
            else
                production(node, RULE_PP_E);
            break;
        case TOKEN_RP:
            if (lookup == '*')
                production(node, RULE_RP_ARP, T('*'), T(TOKEN_RP));
            else
                production(node, RULE_RP_E);
            break;
        case TOKEN_NORMAL:
            /* check if input doesn't finish unexpectedly */
            if (position >= regexp.length()) {
                std::cerr << "Parse error: went past input" << std::endl;
                throw 10; /* ABORT */
            }
            /* this production can be used for passing throwgh
             * special tokens, in this case we have to check */
            if (lookup != t.get_char()) {
                std::cerr << "Parse error: expected '" << t.get_char()
                    << "' got '" << lookup << "'" << std::endl;
                throw 20; /* ABORT */
            }
            production(node, RULE_LETTER, lookup);
            ++position; /* all right, advance input position */
            break;
        default:
            std::cerr << "Parse error" << std::endl;
            throw 30; /* ABORT */
        }
    }

    public:
    REParse(std::string re) : regexp(re), position(0) {
        /* initialize parse tree and stack */
        parse_tree = new ParseTreeNode();
        parse_stack.push_back(TNpair(Token(TOKEN_START), parse_tree));
        /* LL(1) parsing cycle */
        while (!parse_stack.empty()) {
            TNpair& tnpair = parse_stack.back();
            parse_stack.pop_back();
            process_token(tnpair.first, *tnpair.second);
        }
        /* check if we weren't able to process all the input regexp */
        if (position != regexp.length()) {
            std::cerr << "Parse error: finished early" << std::endl;
            throw 40; /* ABORT */
        }

    }

    /* get parse tree */
    ParseTreeNode * get_parse_tree() const { return parse_tree; }

    ~REParse() {
        delete parse_tree;
    }

};


8. Epilogue

We presented all the parts to build a regular expression matcher. First we need to parse a regular expression alongside with building a parse tree, next we build an NFA from a parse tree, then we emulate NFA and finally check if NFA accepts the input string. As a final example here is the main() function of our program. It reads two string: first one is a regular expression, second is a matched string. It outputs "matched" if the input string appears to be in the language defined by the regular expression, otherwise the program outputs "not matched"

int main() {
    std::string re;
    std::cin >> re;  /* read regular expression */
    REParse parser(re);  /* parse regular expression */
    NFA nfa = parser.get_parse_tree()->build_nfa(); /* build NFA */
    NFAEmulator nfaemu(nfa); /* Initialize NFA emulator */

    std::string match;
    std::cin >> match; /* read input string */

    /* process string */
    for (unsigned int i=0; i < match.length(); ++i)
        nfaemu.move(match[i]);
    if (nfaemu.is_accepted()) std::cout << "matched";
    else std::cout << "not matched";
    std::cout << std::endl;

    /* free states allocated in heap */
    nfa.delete_states();
}

9. Further Reading

Aforementioned Compilers: Principles, Techniques and Tools contains every idea presented here. Introduction to Automata Theory, Languages and Computation by Hopcroft, Motwani and Ullman explains the mathematics behind automatas and conext-free grammars. Finally, this is a very good online resource about regular expressions in practice with parsing done by Dijkstra's shunting yard algorithm and reverse polish notation processing which is much more simpler than LL(1) parser presented here.

Thursday, November 11, 2010

using template translator in binary translation system

Our bosses at work asked us to give a talk about template translator (rather simple binary translation layer) at the conference. Text and slides are now available (in Russian).

Tuesday, October 6, 2009

Writing a DMA Bootloader

Preamble

At work I was asked to write a program which performs a dma transfer. After reading some documentation, articles and posts on the internet i've done it finally.Since I haven't found a tutorial for this I'll try to write one here.


Abstract

Bootloader sets up the environment for the OS kernel, loads the kernel and transfers execution to it. By using DMA mode the job of loading the kernel and other jobs can be done in parallel thus reducing boot time. Here I'll try to present a DMA mode transfer as early as possible.


Master Boot Record

First of all what is a bootloader? On startup x86 processor executes instruction at address 0xFFFFFFF0 ([3], 9.1.4 "First Instruction Executed") which is located in BIOS area ([2], 3.1.3 "BIOS Memory"). After BIOS finishes hardware checking and other jobs it loads first sector (512 bytes), called Master Boot Record, of a boot device into 0x0000:0x7c00 memory location, saves drive number into DL register and transfers execution control to 0x7c00 ([1], 6.5.1 "Booting from BAIDs"). Although it is very well known that MBR must be ended with 55 AA signature I can't find exact document proclaiming that. So bootloader is just a 510-byte length code with additional 0xaa55 (little-endian) after the end to be executed in real mode at addres 0x7c00.

It is possible to use BIOS fuctions in bootloader. They are invoked as interrupts using 'int' instruction ([3], 6 "Interrupt and Exception Handling"; [3], 16.1.4 "Interrupt and Exception Handling" in real mode; see [5] for detailed description of 'int' instruction), parameters are passed through registers (AH is typically used for selecting a function while interrupt vector selects a function class). The large reference known as Ralf Brown's Interrupt List is available on the internet (see Links). For example, to print a message one can use the following function (in NASM syntax; see Links for NASM manual):

; prints a null-terminated string pointed by DS:SI
print: lodsb
test al, al
je .done
mov ah, 0x0e ; BIOS int 0x10/AH=0x0e function. AL = character to write
mov bx, 0x0007 ; BH = page number, BL = foreground color
int 0x10 ; display a character on the screen, advancing the cursor
; and scrolling the screen as necessary
jmp print
.done: ret

BIOS can even be used for loading a sector from hdd to memory. For more details refer to Rallf Brown's Interrupt List.

; read from disk drive in LBA mode
; disk drive specified by DL is be used
mov si, addrpacket
mov ah, 0x42 ; BIOS int 0x13/AH=0x42 function. DL = drive number
; DS:SI = disk address packet address
int 0x13 ; performs READ from disk drive in LBA mode

Here addrpacket is a data structure stored in the main memory. In the following example addrpacket tells BIOS to transfer the second sector of the disk drive to address 0x7e00 (or DS:0x7e00, RBIL doesn't say it explicitly)

addrpacket db 0x10 ; size of this disk address packet structure
db 0 ; reserved
dw 1 ; number of blocks to transfer
dd 0x7e00 ; transfer buffer in main memory
dd 1, 0 ; starting absolute 64-bit block number

It is up to BIOS to choose which mode the latter transfer is performed in. It could be DMA as well as PIO so we can't rely on BIOS here.


ISA DMA vs PCI IDE DMA

The idea of DMA is older than PCI. On the internet there are plenty of tutorials and examples describing ISA DMA (see Links). If you read something about two DMA controllers (8237A compatible, in particular) or I/O ports 0x00 - 0x0f, then the talk is about ISA DMA. What we need here is IDE DMA which is quite more complicated.


PCI and PCI Configuration Space

In x86 architecture there are two address spaces supported by the processor: main memory ([4], 3.3 "Memory Organization") and I/O address space which elements are called ports ([4], 13 "Input/Output"). While accessing main memory is done by 'mov' and several other instructions, I/O ports are accessed by 'in' and 'out' instructions (see [5], [6] for detailed description of these instructions)

PCI presented another address space called "PCI Configurations Space" in which each PCI device obtains a 256-byte area. PCI Configuration Space is accessed using two ports: 0x0cf8 and 0x0cfc ([8], 3.2.2.3.2 "Software Generation of Configuration Transactions"). The first one is called CONFIG_ADDRESS and the second one is called CONFIG_DATA. Configuration transaction works as follows: software writes to CONFIG_ADDRESS to specify which device's configuration register should be accessed via CONFIG_DATA. Then it reads or writes CONFIG_DATA port thus performing load to or store from device's configuration space. CONFIG_DATA is 32-bit port and 4 bytes aligned at 4-byte boundary are accessed at once.

CONFIG_ADDRESS has the following format ([8], 3.2.2.3.2 "Software Generation of Configuration Transactions")

31 30 24 23 16 15 11 10 8 7 2 1 0
+-+----------+------------+---------------+-----------------+-----------------+-+-+
| | Reserved | Bus Number | Device Number | Function Number | Register Number |0|0|
+-+----------+------------+---------------+-----------------+-----------------+-+-+
31 - Enable bit

Enable bit specifies if CONFIG_DATA will be accessed. Function Number selects a function in multi-functional device - PCI device which has separate PCI configuration space for each supported function number. For example, PIIX controller we consider later here has PCI-ISA bridge as function 0 and PCI IDE controller as function 1.

PCI Configuration space for each PCI device consists of header and device dependent region ([8], 6.1 "Configuration Space Organization"). Header is specified as follows

+----------+-------------+-------------+---------------+----------------------+
| register | bits 31-24 | bits 23-16 | bits 15-8 | bits 7-0 |
+----------+-------------+-------------+---------------+----------------------+
| 00 | Device ID | Vendor ID |
| 04 | Status | Command |
| 08 | Class code | Revision ID |
| 0C | BIST | Header type | Latency Timer | Cache Line Size |
| 10 | Base address #0 (BAR0) |
| 14 | Base address #1 (BAR1) |
| 18 | Base address #2 (BAR2) |
| 1C | Base address #3 (BAR3) |
| 20 | Base address #4 (BAR4) |
| 24 | Base address #5 (BAR5) |
| 28 | Cardbus CIS Pointer |
| 2C | Subsystem ID | Subsystem Vendor ID |
| 30 | Expansion ROM base address |
| 34 | Reserved | Capabilities Pointer |
| 38 | Reserved |
| 3C | Max latency | Min Grant | Interrupt PIN | Interrupt Line |
+----------+-------------+-------------+---------------+----------------------+

Here we only use Vendor ID (0x00), Status (0x06), Class Code (0x09) and BAR4 (0x20). When talking about Class Code it is often split into three bytes: base class code (0x0b), subclass code (0x0a) and programming interface (0x09) ([8], 6.2.1 "Device Identification"). Base Address Registers are used to specify mapping between either main memory or i/o space and device memory ([8], 6.2.5 "Base Addresses"). Bit 0 of BAR determines the type of memory: 0 for main memory, 1 for i/o space.

For more information see [10], 24 "The PCI Local Bus" and [8].


PCI IDE contorller

To peform DMA IDE transfer IDE controller must be capable of Bus Master function. Such device is determined as follows: base class code and subclass code must both be 0x01 to identify IDE device, and bit 7 of programming interface must be 1 to identify that the device is capable of Bus Master operation ([7], 5.0. "PCI Specifics"; [8], Appendix D "Class Codes"). PIIX/PIIX3 controller is a multi-functional PCI device and it's function 1 is exactly an IDE device with class code 0x010180 ([2], 2.3. "PCI Configuration Registers—IDE Interface (Function 1)", 2.3.6. "CLASSC - CLASS CODE REGISTER (Function 1)").

We need to use 2 bits of PCI Command register ([2], 2.3.3. "PCICMD - COMMAND REGISTER (Function 1)"):

PCI Command Register (address 0x04-0x05)
+-----+------------------------------------------------------------------------+
| bit | Description |
+-----+------------------------------------------------------------------------+
| 2 | Bus Master Enable (BME). 1 - enable, 0 - disable |
| 0 | I/O Space Enable (IOSE). 1 - enable access to ATA and Bus Master ports |
+-----+------------------------------------------------------------------------+

ATA ports are called Legacy IDE ports and are used for sending ATA commands. They have fixed addresses, we'll talk about them when discussing ATA part of DMA protocol. Bus Master ports have changeable base address which is configurable via BAR4.

BAR4 is called Bus Master Interface Base Address Register (BMIBA) in PIIX/PIIX3 documentation ([2], 2.3.9. "BMIBA—BUS MASTER INTERFACE BASE ADDRESS REGISTER (Function 1)"). Bits 2-15 specify 4-bytes aligned base address, bit 0 is hardwired to 1 to indicate this BAR is for I/O address space, other bits are reserved and set to 0.

The latter I/O address range is called PCI Bus Master IDE Registers ([2], 2.7. "PCI BUS Master IDE Registers") and includes the following ports:

+--------+--------------------------------------------------------------------+
| Offset | Desscripton |
+--------+--------------------------------------------------------------------+
| 0x00 | Primary Channel Bus Master IDE Command Register |
| 0x02 | Primary Channel Bus Master IDE Status Register |
| 0x04 | Primary Channel Bus Master IDE Descriptor Table Pointer Register |
| 0x08 | Secondary Channel Bus Master IDE Command Register |
| 0x0a | Secondary Channel Bus Master IDE Status Register |
| 0x0c | Secondary Channel Bus Master IDE Descriptor Table Pointer Register |
+--------+--------------------------------------------------------------------+

Command and Status ports are 8-bit registers while Descriptor Table Pointer is 32-bit one. Bit 3 of Command Register is used for specifying Read/Write operation (Bus Master Read operation reads from memory, Bus Master Write operation writes to memory), bit 0 is used for Start/Stop Bus Master operation. Bit 0 of Status Register indicates if Bus Master operation is in progress, bit 1 indicates errors. For more details refer to [2], 2.7. "PCI BUS Master IDE Registers".

Bits 2-31 of Descriptor Table Pointer Register specify a 4-bytes aligned base address for Physical Region Descriptor Table. PRD Table entry has the following format ([7] 1.2. "Physical Region Descriptor"; [2], 3.5.3. "BUS MASTER FUNCTION"):

+---------+----------+----------+----------+----------+
| | byte 3 | byte 2 | byte 1 | byte 0 |
+---------+----------+----------+----------+----------+
| dword 0 | Memory Region Physical Base Address |0|
| dword 1 |EOT| Reserved | Byte Count |0|
+---------+---+------+----------+----------+--------+-+

Region Address is an address in main memory associated with DMA transfer. Byte Count specify the latter region size in bytes. EOT bit indicates if this PRD is the last one in the table. Note that Physical Region must not cross 64K boundary.


ATA

Legacy IDE I/O ports have constant addresses ([2], 3.5.1. "ATA REGISTER BLOCK DECODE"). They are divided into four blocks: Primary Command Block (base 0x01f0), Primary Control Block (base 03F4h), Secondary Command Block (base 0x0170), Secondary Control Block (base 0x0374). Here registers of command block are presented ([9], 6.2 "I/O register descriptions")

+--------+--------------------------------+
| offset | Register Function |
+--------+--------------------------------+
| 0x00 | Data |
| 0x01 | Error / Features |
| 0x02 | Sector Count |
| 0x03 | LBA bits 0-7 |
| 0x04 | LBA bits 8-15 |
| 0x05 | LBA bits 16-23 |
| 0x06 | Device / Head |
| 0x07 | Status / Command |
+--------+--------------------------------+

Each register is one byte length. Bit 7 of Status register is called BSY and indicates that device holds control block registers. If it is set all writes to control block registers are ignored. Bit 6 of Status register is called DRDY and indicates that device is capable of accepting a command.

To perform a DMA transfer software must do the following steps ([9], 8.6 "DMA data transfer commands")
a) The host reads the Status or Alternate Status register until the BSY bit is equal to 0;
b) The host writes the Device/Head register with the appropriate DEV bit value;
c) The host reads the Status or Alternate Status register until the BSY bit is equal to 0 and the DRDY bit is equal to 1;
d) The host writes any required command parameters to the Features, Sector Count, Sector Number,
Cylinder High, Cylinder Low and Device/Head registers;
e) The host initializes the DMA channel;
f) The host writes the command code (0xc8 for DMA Read with retry) to the Command register;


Conclusion

Given specifications are enough to perform a DMA transaction. Code with large number of comments is given below.


References

[1] BIOS Boot Specification, Version 1.01, January 1996, Compaq, Phoenix, Intel
[2] 82371FB (PIIX) AND 82371SB (PIIX3) PCI ISA IDE XCELERATOR, April 1997, Intel
[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, June 2009, Intel Corporation
[4] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture, June 2009, Intel Corporation
[5] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, A-M, June 2009, Intel Corporation
[6] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, N-Z, June 2009, Intel Corporation
[7] Programming Interface for Bus Master IDE Controller, Revision 1.0, 5/16/94, Brad Hosler, Intel Corporation
[8] PCI Local Bus Specification, Revision 2.3, March 29, 2002, PCI Special Interest Group
[9] Information Technology - AT Attachment Interface with Extensions - 2 (ATA-2), Working Draft, Revision 4c, March 18, 1996, X3T10
[10] The Indispensable PC Hardware Book, Third Edition, Hans-Peter Messmer, ADDISON-WESLEY


Links

http://wiki.osdev.org/MBR_(x86)
http://wiki.osdev.org/Memory_Map_(x86)
http://wiki.osdev.org/PCI
http://wiki.osdev.org/BIOS
http://wiki.osdev.org/RBIL
http://wiki.osdev.org/ISA_DMA
http://www.nasm.us/doc - NASM manual
http://www.ata-atapi.com/hiwchs.html - CHS to LBA translation
http://www.bswd.com/cornucop.htm - links to ata/atapi specfications drafts
http://www.cs.cmu.edu/~ralf/files.html - BIOS interrupts
http://www.osdever.net/tutorials/lba.php - an example of data transfer in PIO mode
http://www.nondot.org/sabre/os/articles/TheBootProcess - articles about booting and bootloader examples
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/dma.html - ISA DMA explanation


The Code

This code works under qemu. Just save all files, run make and execute 'qemu -hda hdd' command. You should see a message "Hello world from DMA Bootloader" indicating everithing goes well.


loader.asm

;
; DMA Bootloader
;
; This program is provided as an example for educational
; purpose ONLY and WITHOUT ANY WARRANTY
;
; Copyright 2009, Ruslan Savchenko

;------------------------------------------------
; macroses
;------------------------------------------------

%macro PCI_CONF_READ 1-2
; %1 - pci address register base
; %2 - pci conf register num
mov eax, %1
%ifnum %2
or ax, %2
%endif
call pci_conf_read
%endm

%macro PCI_CONF_WRITE 1-2
; WARNING! mangles ebx!
; %1 - pci address register base
; %2 - pci conf register num
mov ebx, %1
%ifnum %2
or bx, %2
%endif
call pci_conf_write
%endm

;------------------------------------------------
; consts
;------------------------------------------------

%define iobase 0xff0 ; Bus Master I/O port base
%define sector 57 ; Number of sector to load. Must be less than 256
%define codestart 0x500 ; address where load sector to

;------------------------------------------------
; code
;------------------------------------------------

org 0x7c00 ; tell NASM this code is to be executed at address 0x7c00

start
; configure stack at first
cli ; disable interrupts
mov ax, 0x9000 ; stack segment base
mov ss, ax ; write segment base to SS register
mov sp, 0xffff ; stack pointer
sti ; enable interrupts


; Loop through all PCI devices and their functions
; to find IDE Bus Master

mov cx, 0xffff ; Bus Number, Devce Number and
; Register Number together occupy
; 16 bits of PCI CONFIG_ADDRESS
mov ebx, 0x80000000 ; set Enable bit
.m1
PCI_CONF_READ ebx ; load Device ID and Vendor ID to eax
cmp ax, 0xffff ; check if device is not present
je .m2 ; device not present
PCI_CONF_READ ebx, 0x08 ; load class code, subclass code and
; prog interface to eax
shr eax, 15
cmp eax, 0x010180 >> 7 ; check if class = 0x01,
; subclass = 0x01 and
; bit 7 of prog_if = 1
jne .m2 ; not an IDE Bus Master device
; found Bus Master capable IDE controller
mov dword [pciide], ebx ; save CONFIG_ADDRESS to access
; IDE Bus Master in future
.m2
add ebx, 0x100 ; try next function/device/bus
loop .m1

mov eax, iobase | 1 ; Bus Master Interface Base Address
PCI_CONF_WRITE [pciide], 0x20 ; save address to BMIBA Register

PCI_CONF_READ [pciide], 0x04 ; PCI Command register
or ax, 1 ; set I/O Space Enable Bit
PCI_CONF_WRITE [pciide], 0x04 ; save PCICMD


; wait while ATA device is bisy
mov dx, 0x01f7 ; ATA Primary Status/Command
.m3
in al, dx
cmp al, 0x80 ; check if BSY bit is set
jae .m3

; for Primary device set use of Drive 0 (Master) in LBA mode
mov dx, 0x01f6 ; ATA Primary Device/Head
mov al, 0x40 ; LBA mode, Drive = 0, LBA 27-24 = 0
out dx, al

; wait for ATA defice to accept command
mov dx, 0x01f7 ; ATA Primary Status/Command
.m4
in al, dx
and al, 0xc0 ; BSY and DRDY bits
cmp al, 0x40 ; check if BSY=0, DRDY=1
jne .m4

; Set parameters of transaction:
; nubmer of sectors, first sector LBA address and mode
;
;mov dx, 0x01f2 ; ATA Primary Sector Count
;mov al, 1 ; count = 1
;out dx, al
;
;mov dx, 0x01f3 ; ATA Primary LBA 0-7
;mov al, sector ; LBA 0-7 = sector
;out dx, al
;
;mov dx, 0x01f4 ; ATA Primary LBA 8-15
;mov al, 0 ; LBA 8-15 = 0
;out dx, al
;
;mov dx, 0x01f5 ; ATA Primary LBA 16-23
;mov al, 0 ; LBA 16-23 = 0
;out dx, al
;
;mov dx, 0x01f6 ; ATA Primary Device/Head
;mov al, 0x40 ; LBA mode, Drive= 0
;out dx, al
;
; Using loop here reduce code size by 12 bytes

mov cx, 5 ; 5 iterations
mov dx, 0x01f2 ; ATA Primary Sector Count
mov si, atasector ; transaction parameters array
cld ; string operations will increase SI
.m5
lodsb ; load next byte to al and increase SI
out dx, al ; write byte to corresponding ATA Primary
; Command Block port
inc dx ; increase DX
loop .m5

; initialize host dma transfer
PCI_CONF_READ [pciide], 0x04 ; PCI Command Register
or ax, 4 ; enable Bus Master Function
PCI_CONF_WRITE [pciide], 0x04 ; save PCICMD

; set Drive 0 DMA Capable in Bus Master IDE Status Register
; This bit does not affect hardware operation
;mov word dx, iobase | 0x2 ; Bus Master IDE Status Register
;in al, dx
;or al, 0x20 ; set Drive 0 DMA Capable
;out dx, al ; save Bus Master IDE Status

; write PRD Table pointer
xor eax, eax
mov ax, prd
mov dx, iobase | 0x4 ; Bus Master IDE Descriptor
; Table Pointer Register
out dx, eax

; command ATA to perform DMA transfer
mov dx, 0x01f7 ; ATA Primary Status/Command
mov al, 0xc8 ; DMA Read With Retry Command
out dx, al

; tell Bus Master to start DMA transfer
mov dx, iobase ; Bus Master IDE Command Register
mov al, 9 ; Start/Stop Bit set
; Read/Write Bit set
out dx, al

;
; Place to do some other job while sector is transfered from
; drive to memory without holding CPU
;

; wait until transfer is finished
mov dx, iobase | 0x2 ; Bus Master IDE Status Register
.m6
in al, dx
bt ax, 0 ; Bus Master IDE Active Bit
jc .m6

; transfer execution to recently loaded code
jmp (codestart >> 4):0

;------------------------------------------------
; functions
;------------------------------------------------

pci_conf_read
; Input:
; EAX = PCI CONFIG_ADDRESS
; Output:
; EAX = PCI CONFIG_DATA
mov dx, 0xcf8 ; CONFIG_ADDRESS
out dx, eax
mov dx, 0xcfc ; CONFIG_DATA
in eax, dx
ret

pci_conf_write
; Input:
; EAX = DATA
; EBX = PCI CONFIG_ADDRESS
; Output:
; EBX = DATA
xchg eax, ebx
mov dx, 0xcf8 ; CONFIG_ADDRESS
out dx, eax
mov dx, 0xcfc ; CONFIG_DATA
mov eax, ebx
out dx, eax
ret

;------------------------------------------------
; data
;------------------------------------------------

; array of bytes to be passed through ATA I/O ports
atasector db 1, sector, 0, 0, 0x40
; sector count = 1
; LBA 0-7 = sector
; LBA 8-15 = 0
; LBA 16-24 = 0
; device/head = 0x40 (LBA mode, drive 0)

align 4
; Physical Region Descrptor
prd dd codestart ; Memory Region
dd 0x80000200 ; EOT, 512 bytes

; buffer to save CONFIG_ADDRESS of PCI IDE device
pciide dd 0

;------------------------------------------------
; Space for Partition Table
;------------------------------------------------

times 436-($-$$) db 0xff

;------------------------------------------------
; boot sector magic end
;------------------------------------------------

times 510-($-$$) db 0
dw 0xAA55 ; MBR Signature


sector.asm

;------------------------------------------------
; code
;------------------------------------------------

org 0x500 ; tell NASM to assume this program
; begins at 0x500 when loaded into memory

start:
mov si, hellomsg
call print_si

cli ; disable interrupts
hlt ; infinite loop which can be broken only by interrupt

;------------------------------------------------
; functions
;------------------------------------------------

print_si
push ax
push bx
.next
lodsb
test al, al
je .done
mov ah, 0x0e ; BIOS int 0x10/AH=0x0e function. AL = character to write
mov bx, 0x0007 ; BH = page number, BL = foreground color
int 0x10 ; display a character on the screen, advancing the cursor
; and scrolling the screen as necessary
jmp .next

.done:
pop bx
pop ax
ret

;------------------------------------------------
; data
;------------------------------------------------

hellomsg db 'Hello World from DMA Bootloader', 13, 10, 0

;------------------------------------------------
; sector end
;------------------------------------------------

times 512 -($-$$) db 0


Makefile

hdd: loader sector
rm -f hdd
dd if=loader of=hdd bs=512 count=1
dd if=sector of=hdd bs=512 count=1 seek=57

loader: loader.asm
nasm loader.asm
sector: sector.asm
nasm sector.asm

clean:
rm -f hdd loader sector



Known Issues

If BIOS sets DS to nonzero value the code fails. Add initializing DS to zero at the very start to ensure the code is correct.
This code doesn't check for _any_ errors. If you want to add error checks refer to corresponding documentation.
No DMA timing configuration is done. Doing this can improve transaction speed significantly. UDMA Mode 5 is 6 times faster than UDMA Mode 0.
On DMA transfer completion controller sends an interrupt. This also should be taken care of.
After issuing a DMA transfer command to device one would probably want to wait until device is ready. In this case add the following code before setting Start/Stop bit of Bus Master Command Register. But be careful and read [9] before for this standard doesn't enfocrce setting DRQ here

; wait until ATA device is ready to transfer data
mov dx, 0x01f7 ; ATA Primary Status/Command
.m7
in al, dx
test al, 0x08 ; check if DRQ bit is set
jz .m7