<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8099934077984653989</id><updated>2012-02-02T23:04:35.553+03:00</updated><category term='C++'/><category term='linux'/><category term='sql'/><category term='python'/><category term='bug'/><category term='perl'/><category term='binary translation'/><category term='pam'/><category term='gcc'/><category term='program'/><category term='uguu'/><category term='article'/><category term='mplayer'/><category term='crc32'/><category term='c'/><category term='patch'/><category term='asm'/><title type='text'>サヴルスのブログ</title><subtitle type='html'>Original content only</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>17</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-9125589135842387038</id><published>2011-02-13T19:32:00.005+02:00</published><updated>2011-02-14T06:38:53.589+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='article'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>Homemade regular expression matcher</title><content type='html'>Here we try to build a program (all the code is provided "as is" without any warranty including but not limited to mental, psychic and wrong teaching... in short, usual open source license) that matches a string against a regular expression over simplified alphabet: base characters can be English letters only and special characters allowed are '|' (&lt;i&gt;or&lt;/i&gt;), '*' (&lt;i&gt;closure&lt;/i&gt;) and &lt;i&gt;concatenation&lt;/i&gt; requires no character. All ideas given below can be found in &lt;i&gt;Compilers: Principles Techniques and Tools, 2nd edition&lt;/i&gt; by A. Aho, M.S. Lam, R. Sethi, J.D. Ullman.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;1. Basic Definitions and Notations&lt;br /&gt;We give this for consistency. Reader may just skip this section&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Token&lt;/i&gt; is a basic block of input. In our case tokens will be small letters of english alphabet and four &lt;i&gt;special tokens&lt;/i&gt;: &lt;b&gt;|&lt;/b&gt;, &lt;b&gt;*&lt;/b&gt;, &lt;b&gt;(&lt;/b&gt; and &lt;b&gt;)&lt;/b&gt;.&lt;br /&gt;&lt;i&gt;Alphabet&lt;/i&gt; is some set of tokens. Unless stated otherwise we assume all tokens are non-special.&lt;br /&gt;&lt;i&gt;String&lt;/i&gt; is a sequence of tokens.&lt;br /&gt;&lt;i&gt;Language over an alphabet&lt;/i&gt; is some set of strings, each sting consists of tokens from the alphabet.&lt;br /&gt;&lt;i&gt;Regular expressions language&lt;/i&gt; is a language RE over non-special and special tokens with the following properties:&lt;br /&gt;1) empty string is in RE,&lt;br /&gt;2) any string formed by a single non-special token is in RE.&lt;br /&gt;3) if &lt;b&gt;A&lt;/b&gt; and &lt;b&gt;B&lt;/b&gt; are in RE, then &lt;b&gt;AB&lt;/b&gt;, &lt;b&gt;A|B&lt;/b&gt;, &lt;b&gt;A*&lt;/b&gt; and &lt;b&gt;(A)&lt;/b&gt; are in RE too.&lt;br /&gt;&lt;i&gt;Regular expression&lt;/i&gt; is an element of language RE.&lt;br /&gt;&lt;i&gt;Regular language&lt;/i&gt; &lt;b&gt;L(regexp)&lt;/b&gt; determined by a regular expression &lt;b&gt;regexp&lt;/b&gt; is defined as follows:&lt;br /&gt;1) if &lt;b&gt;regexp&lt;/b&gt; is an empty string, then &lt;b&gt;L(regexp)&lt;/b&gt; contains only one string which is empty;&lt;br /&gt;2) if &lt;b&gt;regexp&lt;/b&gt; is a single token &lt;b&gt;T&lt;/b&gt;, then &lt;b&gt;L(regexp)&lt;/b&gt; has only one string &lt;b&gt;T&lt;/b&gt;;&lt;br /&gt;3) if &lt;b&gt;regexp&lt;/b&gt; is of form &lt;b&gt;AB&lt;/b&gt; where &lt;b&gt;A&lt;/b&gt; and &lt;b&gt;B&lt;/b&gt; are regular expressions, then &lt;b&gt;L(regexp)&lt;/b&gt; consists of such and only such strings which can be represented as a concatenation &lt;b&gt;XY&lt;/b&gt; of a string &lt;b&gt;X&lt;/b&gt; from &lt;b&gt;L(A)&lt;/b&gt; and a string &lt;b&gt;Y&lt;/b&gt; from &lt;b&gt;L(B)&lt;/b&gt;;&lt;br /&gt;4) if &lt;b&gt;regexp&lt;/b&gt; is of form &lt;b&gt;A|B&lt;/b&gt; where &lt;b&gt;A&lt;/b&gt; and &lt;b&gt;B&lt;/b&gt; are regular expressions, then &lt;b&gt;L(regexp)&lt;/b&gt; consists of all strings from &lt;b&gt;L(A)&lt;/b&gt; and &lt;b&gt;L(B)&lt;/b&gt;;&lt;br /&gt;5) if &lt;b&gt;regexp&lt;/b&gt; is of form &lt;b&gt;(A)&lt;/b&gt; where &lt;b&gt;A&lt;/b&gt; is a regular expressions, then &lt;b&gt;L(regexp)&lt;/b&gt; is exactly &lt;b&gt;L(A)&lt;/b&gt;;&lt;br /&gt;6) if &lt;b&gt;regexp&lt;/b&gt; is of form &lt;b&gt;A*&lt;/b&gt; where &lt;b&gt;A&lt;/b&gt; is a regular expressions, then &lt;b&gt;L(regexp)&lt;/b&gt; is a union of an empty string language, &lt;b&gt;L(A)&lt;/b&gt;, &lt;b&gt;L(AA)&lt;/b&gt;, &lt;b&gt;L(AAA)&lt;/b&gt;, and so on infinitely. This operation in language algebra is called Kleene closure.&lt;br /&gt;&lt;i&gt;Nondeterministic Finite Automata&lt;/i&gt; (NFA) is a triple &lt;b&gt;(S,A,T)&lt;/b&gt; where &lt;b&gt;S&lt;/b&gt; is a finite set of &lt;i&gt;states&lt;/i&gt;, &lt;b&gt;A&lt;/b&gt; is an alphabet which can contain a special &lt;b&gt;empty&lt;/b&gt; token and &lt;b&gt;T&lt;/b&gt; is a transitions function, which given a state and a token returns a new state. There should be chosen one &lt;i&gt;starting&lt;/i&gt; state and some positive number of &lt;i&gt;accepting&lt;/i&gt; states. NFA is usually depicted as a directed graph where nodes are states, and edges are labeled by tokens and edge labeled by some token &lt;b&gt;X&lt;/b&gt; goes from a state &lt;b&gt;Y&lt;/b&gt; to state &lt;b&gt;T(Y,X)&lt;/b&gt;. NFA is said to &lt;i&gt;accept&lt;/i&gt; a string &lt;b&gt;R&lt;/b&gt; if there is a directed path in the graph representing this NFA from the starting state to one of accepting states such that a string formed by edge labels with &lt;b&gt;empty&lt;/b&gt; labels removed is exactly the &lt;b&gt;R&lt;/b&gt;. Note, we can omit the requirement of &lt;b&gt;T&lt;/b&gt; to be a function by saying there is some special non-accepting state &lt;b&gt;Q&lt;/b&gt; such that all undefined transactions should go to &lt;b&gt;Q&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Theorem 1&lt;/b&gt; &lt;i&gt;For each regular expression &lt;b&gt;regexp&lt;/b&gt; there is an NFA that accepts strings from and only from &lt;b&gt;L(regexp)&lt;/b&gt;&lt;/i&gt;. Proof is done by induction on construction of &lt;b&gt;regexp&lt;/b&gt;: NFA accepting empty string and tokens are obvious, NFA for other cases can be made using NFAs from subexpression. We will give actual constructions in the real code below.&lt;br /&gt;The reverse (for every NFA there is a regular expression defining the same language) is also true. We will not need this fact later so we leave the proof for the reader.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Context-Free Grammar&lt;/i&gt; (CFG) is triple &lt;b&gt;(T,N,P)&lt;/b&gt; where &lt;b&gt;T&lt;/b&gt; is a set of &lt;i&gt;terminals&lt;/i&gt;, &lt;b&gt;N&lt;/b&gt; is a set of &lt;i&gt;nonterminals&lt;/i&gt;, and &lt;b&gt;P&lt;/b&gt; is a set of &lt;i&gt;productions&lt;/i&gt;. Each production has a &lt;i&gt;head&lt;/i&gt; which is always a nonterminal and a &lt;i&gt;body&lt;/i&gt; which is a sequence of terminals and nonterminals. One of the nonterminals is designated as a &lt;i&gt;start symbol&lt;/i&gt;.&lt;br /&gt;CFG defines a language in the following way. At first we have only start symbol. Then if there are some nonterminals one is picked and substituted by a body of some production having this nonterminal as a head. We stop when there are no nonterminals left. All strings which can be &lt;i&gt;derived&lt;/i&gt; in such a way form a language defined by this CFG.&lt;br /&gt;To write down a CFG we will use a notation similar to yacc(1)'s: on each line we write a nonterminal, then ":" and then all productions having this nonterminal as a head separated by "," (we choose comma instead of yacc(1)'s pipe because it is used as RE language token and comma is absolutely vacant), the special word 'epsilon' denotes an empty production body.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Parse Tree&lt;/i&gt; for a string &lt;b&gt;s&lt;/b&gt; derived using CFG &lt;b&gt;G&lt;/b&gt;'s productions is a rooted ordered tree with labels at its nodes: root is labeled by start symbol of &lt;b&gt;G&lt;/b&gt;; each inner node corresponds to a nonterminal substitution and is labeled by that nonterminal, it's childs are labeled by terminals and nonterminals from corresponding production body in the same order. All leafs are labeled by terminals and being traversed from left to right form the string &lt;b&gt;s&lt;/b&gt;.&lt;br /&gt;&lt;i&gt;Parsing&lt;/i&gt; is a process of constructing a parse tree from a sting.&lt;br /&gt;&lt;i&gt;Parser&lt;/i&gt; is a program capable of performing parsing.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;LL(1) parsing algorithm&lt;/i&gt; is defined as follows. We read an input string from left to right. To represent a parse state we remember current position in the string and we have a parse stack, initially holding only a start symbol of a CFG. At each step we pop a token from the top of the stack and if it is a terminal we check if it is the same as the current token in the input. If it's not then parse error occurs, otherwise we advance the input. If the token form the stack is nonterminal we choose a production which has this token as it's head and can derive a string starting with the current input terminal. By the definition of LL(1) parser there is exactly one such production, otherwise parse error occurs.&lt;br /&gt;&lt;i&gt;LL(1) context-free grammar&lt;/i&gt; is a CFG for which LL(1) parser exists.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;2. LL(1) Grammar for Regular Expressions&lt;br /&gt;&lt;br /&gt;There is a natural CFG for the language RE obtained from the definition of RE:&lt;br /&gt;&lt;pre&gt;S : E&lt;br /&gt;E : E|T , T&lt;br /&gt;T : TF , F&lt;br /&gt;F : F* , R&lt;br /&gt;R : (E), epsilon, L&lt;br /&gt;L : q,w,e,r,t,y,u,i,o,p,a,s,d,f,g,h,j,k,l,z,x,c,v,b,n,m&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Unfortunately this grammar is not LL(1) since choice between productions can't be done by looking at the current symbol only. We have to alter this grammar to make another CFG which would be LL(1) one. There are some techniques for this, namely &lt;i&gt;left recursion elimination&lt;/i&gt;, &lt;i&gt;left factoring&lt;/i&gt;, but we will not give them here. An LL(1) grammar for RE language got by the author is just presented without explanation.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;S  :  S'&lt;br /&gt;S' :  TT'&lt;br /&gt;T' :  |S' , epsilon&lt;br /&gt;T  :  PP'&lt;br /&gt;P' :  PP' , epsilon&lt;br /&gt;P  :  RR'&lt;br /&gt;R' :  *R' , epsilon&lt;br /&gt;R  :  (S') , epsilon , L&lt;br /&gt;L  :  q,w,e,r,t,y,u,i,o,p,a,s,d,f,g,h,j,k,l,z,x,c,v,b,n,m&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;3. Token Class&lt;br /&gt;&lt;br /&gt;Now that we have a grammar we can present the Token class. We have tokens for epsilon, grammar symbols and input characters.&lt;br /&gt;&lt;pre&gt;typedef enum {&lt;br /&gt;    TOKEN_EPSILON, /* empty token */&lt;br /&gt;    /* nonterminals of regexp LL(1) grammar */&lt;br /&gt;    TOKEN_START,   /* S  -&amp;gt; S' */&lt;br /&gt;    TOKEN_SP,      /* S' -&amp;gt; TT' */&lt;br /&gt;    TOKEN_TP,      /* T' -&amp;gt; |S'  or  epsilon */&lt;br /&gt;    TOKEN_T,       /* T  -&amp;gt; PP' */&lt;br /&gt;    TOKEN_PP,      /* P' -&amp;gt; PP'  or  epsilon */&lt;br /&gt;    TOKEN_P,       /* P  -&amp;gt; RR' */&lt;br /&gt;    TOKEN_RP,      /* R' -&amp;gt; *R'  or  epsilon */&lt;br /&gt;    TOKEN_R,       /* R  -&amp;gt; (S')  or  a,b,c,...,z  or  epsilon */&lt;br /&gt;    /*end nonterminals */&lt;br /&gt;    TOKEN_NORMAL   /* token for the input character */&lt;br /&gt;} TokenType;&lt;br /&gt;&lt;br /&gt;class Token {&lt;br /&gt;    TokenType type;  /* type of the token */&lt;br /&gt;    char normal;     /* character for TOKEN_NORMAL token */&lt;br /&gt;public:&lt;br /&gt;    /* create special token of specified type */&lt;br /&gt;    Token(TokenType t) : type(t) {}&lt;br /&gt;    /* create  token for the specified character */&lt;br /&gt;    Token(char n) : type(TOKEN_NORMAL), normal(n) {}&lt;br /&gt;&lt;br /&gt;    bool is_epsilon() const { return type == TOKEN_EPSILON; }&lt;br /&gt;    TokenType get_type() const { return type; }&lt;br /&gt;    char get_char() const { return normal; }&lt;br /&gt;    &lt;br /&gt;    /* this is used by std::map data structure. only tokens with&lt;br /&gt;     * TOKEN_NORMAL and TOKEN_EPSILON types held there */&lt;br /&gt;    bool operator &amp;lt;(const Token&amp; t) const { &lt;br /&gt;        if ((type == t.type) &amp;&amp; (type != TOKEN_EPSILON))&lt;br /&gt;            return normal &amp;lt; t.normal;&lt;br /&gt;        return type &amp;lt; t.type;&lt;br /&gt;    }&lt;br /&gt;    bool operator ==(const Token &amp;t) const {&lt;br /&gt;        if ((type == t.type) &amp;&amp; (type != TOKEN_EPSILON))&lt;br /&gt;            return normal == t.normal;&lt;br /&gt;        return type == t.type;&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;4. NFA Class&lt;br /&gt;&lt;br /&gt;We start with a single NFA state. The class represents the state itself and contains all transitions from this state: endpoints for the same token are grouped in vectors and tokens are used as keys in a std::map data structure. State also has a temporary bit is_on used later by algorithms to mark visited states.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;class NFAState;&lt;br /&gt;&lt;br /&gt;typedef std::vector&amp;lt;NFAState*&amp;gt; Svector;&lt;br /&gt;typedef std::map&amp;lt;Token, Svector&amp;gt; ASmap;&lt;br /&gt;&lt;br /&gt;/* Single NFA state */&lt;br /&gt;class NFAState {&lt;br /&gt;    private:&lt;br /&gt;    ASmap moves;  /* outgoing edges of this state. */&lt;br /&gt;&lt;br /&gt;    public:&lt;br /&gt;    bool is_on; /* temporary bit to use by others */&lt;br /&gt;    NFAState() : is_on(false) {}&lt;br /&gt;    /* get vector of transitions for a token */&lt;br /&gt;    const Svector &amp; get_move(Token a) { return moves[a]; }&lt;br /&gt;    /* get all transitions for this state */&lt;br /&gt;    const ASmap &amp; get_all_moves() { return moves; }&lt;br /&gt;    /* add transition */&lt;br /&gt;    void add_move(Token a, NFAState *target) {&lt;br /&gt;        moves[a].push_back(target);&lt;br /&gt;    }&lt;br /&gt;    &lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Next is the whole NFA class. Since we only require one finish state in this program we do not support a general case by this class. However general case can be reduced to ours by simply adding a new finish state and epsilon transitions to it from old finish states. NFA class contains routines to construct a new NFA using existing NFAs. We said earlier that for every regular expression we can build a NFA accepting the same language by recursive on regular expression but we omitted actual constructs, now we give them in the code.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;/* Nondeterministic Finite Automata */&lt;br /&gt;class NFA {&lt;br /&gt;    public:&lt;br /&gt;    NFAState *start;   /* start state */&lt;br /&gt;    NFAState *finish;  /* finish state */&lt;br /&gt;&lt;br /&gt;    NFA(NFA&amp; nfa) : start(nfa.start), finish(nfa.finish) {}&lt;br /&gt;    NFA() {}&lt;br /&gt;&lt;br /&gt;    NFA&amp; operator=(const NFA &amp;nfa) {&lt;br /&gt;        this-&amp;gt;start = nfa.start;&lt;br /&gt;        this-&amp;gt;finish = nfa.finish;&lt;br /&gt;        return *this;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* NFA which accepts a specified token */&lt;br /&gt;    void init_token(Token t) {&lt;br /&gt;        start = new NFAState;&lt;br /&gt;        finish = new NFAState;&lt;br /&gt;        start-&amp;gt;add_move(t, finish);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* NFA which accepts a string either _s_ or _t_ accepts */&lt;br /&gt;    void init_or(NFA &amp;s, NFA &amp;t) {&lt;br /&gt;        start = new NFAState;&lt;br /&gt;        finish = new NFAState;&lt;br /&gt;        Token epsilon = Token(TOKEN_EPSILON);&lt;br /&gt;&lt;br /&gt;        start-&amp;gt;add_move(epsilon, s.start);&lt;br /&gt;        start-&amp;gt;add_move(epsilon, t.start);&lt;br /&gt;        s.finish-&amp;gt;add_move(epsilon, finish);&lt;br /&gt;        t.finish-&amp;gt;add_move(epsilon, finish);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* NFA which accepts a concatenation of _s_ and _t_ accepted strings */&lt;br /&gt;    void init_concat(NFA &amp;s, NFA &amp;t) {&lt;br /&gt;        start = s.start;&lt;br /&gt;        finish = t.finish;&lt;br /&gt;        Token epsilon = Token(TOKEN_EPSILON);&lt;br /&gt;&lt;br /&gt;        s.finish-&amp;gt;add_move(epsilon, t.start);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* NFA which accepts a closure of _s_ */&lt;br /&gt;    void init_asterisk(NFA &amp;s) {&lt;br /&gt;        start = new NFAState;&lt;br /&gt;        finish = new NFAState;&lt;br /&gt;        Token epsilon = Token(TOKEN_EPSILON);&lt;br /&gt;&lt;br /&gt;        start-&amp;gt;add_move(epsilon, finish);&lt;br /&gt;        start-&amp;gt;add_move(epsilon, s.start);&lt;br /&gt;        s.finish-&amp;gt;add_move(epsilon, finish);&lt;br /&gt;        s.finish-&amp;gt;add_move(epsilon, s.start);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* states are allocated in heap, we need to free them explicitly */&lt;br /&gt;    void delete_states() {&lt;br /&gt;        Svector states, processing;&lt;br /&gt;        processing.push_back(start);&lt;br /&gt;&lt;br /&gt;        /* collect all states in the _states_ vector */&lt;br /&gt;        while (!processing.empty()) {&lt;br /&gt;            NFAState *p = processing.back();&lt;br /&gt;            processing.pop_back();&lt;br /&gt;            states.push_back(p);&lt;br /&gt;            ASmap all_moves = p-&amp;gt;get_all_moves();&lt;br /&gt;            for (ASmap::iterator it = all_moves.begin(), end = all_moves.end();&lt;br /&gt;                    it != end; ++it) {&lt;br /&gt;                Svector &amp;moves = (*it).second;&lt;br /&gt;                for (Svector::iterator iit = moves.begin(), eend = moves.end();&lt;br /&gt;                        iit &amp;lt; eend; ++iit) {&lt;br /&gt;                    if ((*iit)-&amp;gt;is_on) continue;&lt;br /&gt;                    (*iit)-&amp;gt;is_on = 1;&lt;br /&gt;                    processing.push_back(*iit);&lt;br /&gt;                }&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        for (Svector::iterator it = states.begin(), end = states.end();&lt;br /&gt;                it &amp;lt; end; ++it)&lt;br /&gt;            delete *it;&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;5. NFA Emulation&lt;br /&gt;&lt;br /&gt;To run an NFA we first need to introduce a notion of a hyperstate. Since in NFA there can be multiple ways for a transition we need to simulate them all at once. &lt;i&gt;Hyperstate&lt;/i&gt; is a collection of NFA states the automata can be in for the current input. Hyperstate can be constructed with the help of &lt;i&gt;epsilon-closure&lt;/i&gt;: given a set of states, epsilon-closure is this set plus states were we can get by epsilon transitions from this set. So emulation is done as follows: initial hyperstate is the epsilon-closure of start state, each step the next hyperstate is the epsilon-closure of the set got by current token transactions from the current hyperstate. At the end of the input we check if the finish state is in current hyperstate. If it is, then NFA accepts the input, otherwise, rejects.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;class NFAHyperState {&lt;br /&gt;    Svector states;&lt;br /&gt;    private:&lt;br /&gt;    /* add all states from an NFAState with theirs's epsilon-clousure&lt;br /&gt;     * which are not already in hyperstate */&lt;br /&gt;    void add_closure(NFAState &amp;s) {&lt;br /&gt;        if (s.is_on) return;&lt;br /&gt;        std::queue&amp;lt;NFAState*&amp;gt; Q;&lt;br /&gt;        Q.push(&amp;s);&lt;br /&gt;        while (!Q.empty()) {&lt;br /&gt;            NFAState *p = Q.front();&lt;br /&gt;            Q.pop();&lt;br /&gt;            states.push_back(p);&lt;br /&gt;            p-&amp;gt;is_on = true;&lt;br /&gt;            Svector moves = p-&amp;gt;get_move(Token(TOKEN_EPSILON));&lt;br /&gt;            for (Svector::iterator it = moves.begin(), end = moves.end();&lt;br /&gt;                    it &amp;lt; end; ++it) {&lt;br /&gt;                if ((*it)-&amp;gt;is_on) continue;&lt;br /&gt;                Q.push(*it);&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* clear temporary markings in states. Execute after step is done */&lt;br /&gt;    void clear_marks() {&lt;br /&gt;        for (Svector::iterator it = states.begin(), end = states.end();&lt;br /&gt;                it &amp;lt; end; ++it) {&lt;br /&gt;            (*it)-&amp;gt;is_on = false;&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    public:&lt;br /&gt;    NFAHyperState(NFAState *s) {&lt;br /&gt;        /* add epsilon-closure of start state */&lt;br /&gt;        add_closure(*s);&lt;br /&gt;        clear_marks();&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* perform a single transition */&lt;br /&gt;    void move(Token a) {&lt;br /&gt;        Svector old; /* current hyperspace in this routine */&lt;br /&gt;        old.swap(states);&lt;br /&gt;        states.clear();&lt;br /&gt;        /* now current hyperstate is clear, fill it using moves from&lt;br /&gt;         * states in old hyperstate */&lt;br /&gt;        for (Svector::iterator it = old.begin(), end = old.end();&lt;br /&gt;                it &amp;lt; end; ++it) {&lt;br /&gt;            Svector moves = (*it)-&amp;gt;get_move(a);&lt;br /&gt;            for (Svector::iterator jt = moves.begin(), jend = moves.end();&lt;br /&gt;                    jt &amp;lt; jend; ++jt) {&lt;br /&gt;                add_closure(**jt);&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;        clear_marks();&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* check if a specified NFA state is in this hyperstate */&lt;br /&gt;    bool is_in(NFAState *f) {&lt;br /&gt;        for (Svector::iterator it = states.begin(), end = states.end();&lt;br /&gt;                it &amp;lt; end; ++it) {&lt;br /&gt;            if ((*it) == f) return true;&lt;br /&gt;        }&lt;br /&gt;        return false;&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;/* NFA Emulator. */&lt;br /&gt;class NFAEmulator {&lt;br /&gt;    private:&lt;br /&gt;    NFAHyperState hyperstate;  /* all work is done here */&lt;br /&gt;    NFAState *finish_state;    /* NFA finish state */&lt;br /&gt;&lt;br /&gt;    public:&lt;br /&gt;    NFAEmulator(NFA &amp;nfa) : hyperstate(nfa.start), finish_state(nfa.finish) {}&lt;br /&gt;    /* hyperstate transition */&lt;br /&gt;    void move(Token a) { hyperstate.move(a); }&lt;br /&gt;    /* check if NFA accepts the input */&lt;br /&gt;    bool is_accepted() { return hyperstate.is_in(finish_state); }&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;6. Regular Expression Parse Tree&lt;br /&gt;&lt;br /&gt;Now we return to our LL(1) grammar and present the parser. Since the grammar appears to be not very friendly to NFA construction operations, we have to build a parse tree first. After parse tree is constructed, build_nfa from its root can be called to construct the NFA for parsed regular expression. Routine build_nfa works by recursively generating NFA for the childs and then for the parent. Sometimes it pass an NFA generated by the one child to it's sibling. The reader is advised to draw a parse tree for some simple expression to see why this happens.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;/* grammar productions */&lt;br /&gt;typedef enum {&lt;br /&gt;    RULE_S_SP,     /* S  -&amp;gt; S' */&lt;br /&gt;    RULE_SP_TTP,   /* S' -&amp;gt; TT' */&lt;br /&gt;    RULE_TP_SP,    /* T' -&amp;gt; S' */&lt;br /&gt;    RULE_TP_E,     /* T' -&amp;gt; epsilon */&lt;br /&gt;    RULE_T_PPP,    /* T  -&amp;gt; PP' */&lt;br /&gt;    RULE_PP_PPP,   /* P' -&amp;gt; PP' */&lt;br /&gt;    RULE_PP_E,     /* P' -&amp;gt; epsilon */&lt;br /&gt;    RULE_P_RRP,    /* P  -&amp;gt; RR' */&lt;br /&gt;    RULE_RP_ARP,   /* R' -&amp;gt; *R' */&lt;br /&gt;    RULE_RP_E,     /* R' -&amp;gt; epsilon */&lt;br /&gt;    RULE_R_E,      /* R  -&amp;gt; epsilon */&lt;br /&gt;    RULE_R_LETTER, /* R  -&amp;gt; L */&lt;br /&gt;    RULE_R_BSPB,   /* R  -&amp;gt; (S') */&lt;br /&gt;    RULE_LETTER,   /* L  -&amp;gt; letter */&lt;br /&gt;} GrammarRule;&lt;br /&gt;&lt;br /&gt;class ParseTreeNode;&lt;br /&gt;&lt;br /&gt;typedef std::vector&amp;lt;ParseTreeNode*&amp;gt; Nvector;&lt;br /&gt;&lt;br /&gt;/* Node of a Parse tree */&lt;br /&gt;class ParseTreeNode {&lt;br /&gt;    GrammarRule type;       /* grammar production corresponding to this node */&lt;br /&gt;    Nvector childs;         /* child nodes */&lt;br /&gt;    ParseTreeNode *parent;  /* parend node */&lt;br /&gt;    char letter;            /* character for RULE_LETTER type */&lt;br /&gt;    &lt;br /&gt;    /* nfa corresponding to this node. When constructing a prse tree this is&lt;br /&gt;     * used as a parameter to _inherit_ nfa form parent or siblings */&lt;br /&gt;    NFA nfa;&lt;br /&gt;&lt;br /&gt;    public:&lt;br /&gt;    /* node for a new parse tree with the starting production */&lt;br /&gt;    ParseTreeNode() : type(RULE_S_SP) {}&lt;br /&gt;    /* new node for existing parse tree */&lt;br /&gt;    ParseTreeNode(ParseTreeNode &amp;p) : parent(&amp;p) {&lt;br /&gt;        p.childs.push_back(this);&lt;br /&gt;    }&lt;br /&gt;    void set_type(GrammarRule t) { type = t; }&lt;br /&gt;    void set_letter(char c) { letter = c; }&lt;br /&gt;    ~ParseTreeNode() {&lt;br /&gt;        for (Nvector::iterator it=childs.begin(),end=childs.end();&lt;br /&gt;                it&amp;lt;end; ++it)&lt;br /&gt;            delete *it;&lt;br /&gt;        childs.clear();&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* build NFA by traversing a parse tree&lt;br /&gt;     * should be called from the root of the parse tree */&lt;br /&gt;    NFA &amp; build_nfa() {&lt;br /&gt;        NFA new_nfa; /* temporary NFA */&lt;br /&gt;        switch(type) {&lt;br /&gt;        case RULE_S_SP: /* S -&amp;gt; S' */&lt;br /&gt;            /* NFA for this node is just the NFA for it's child */&lt;br /&gt;            nfa = childs[0]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_SP_TTP: /* S' -&amp;gt; TT' */&lt;br /&gt;            /* construct NFA for T-child and pass it to the T'-child */&lt;br /&gt;            childs[1]-&amp;gt;nfa = childs[0]-&amp;gt;build_nfa();&lt;br /&gt;            /* NFA for this node is the same as it's T-child's */&lt;br /&gt;            nfa = childs[1]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_TP_SP: /* T' -&amp;gt; |S' */&lt;br /&gt;            /* NFA is OR of inherited nfa and S'-child's NFA */&lt;br /&gt;            new_nfa.init_or(nfa, childs[1]-&amp;gt;build_nfa());&lt;br /&gt;            nfa = new_nfa;&lt;br /&gt;            break;&lt;br /&gt;        case RULE_TP_E:&lt;br /&gt;            /* NFA is right the inherited nfa, do nothing */&lt;br /&gt;            break;&lt;br /&gt;        case RULE_T_PPP: /* T -&amp;gt; PP' */&lt;br /&gt;            /* construct NFA for P-child, pass it to P'-child */&lt;br /&gt;            childs[1]-&amp;gt;nfa = childs[0]-&amp;gt;build_nfa();&lt;br /&gt;            /* NFA is P'-child's nfa */&lt;br /&gt;            nfa = childs[1]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_PP_PPP: /* P' -&amp;gt; PP' */&lt;br /&gt;            /* construct concatenation NFA of inherited nfa and P-child's */&lt;br /&gt;            new_nfa.init_concat(nfa, childs[0]-&amp;gt;build_nfa());&lt;br /&gt;            /* pass new_nfa to P'-childs */&lt;br /&gt;            childs[1]-&amp;gt;nfa = new_nfa;&lt;br /&gt;            /* NFA is P'-child's NFA */&lt;br /&gt;            nfa = childs[1]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_PP_E:&lt;br /&gt;            /* NFA is inherited nfa, do nothing */&lt;br /&gt;            break;&lt;br /&gt;        case RULE_P_RRP:  /* P -&amp;gt; RR' */&lt;br /&gt;            /* construct NFA for R-child and pass it to R'-child */&lt;br /&gt;            childs[1]-&amp;gt;nfa = childs[0]-&amp;gt;build_nfa();&lt;br /&gt;            /* NFA is R'-child's NFA */&lt;br /&gt;            nfa = childs[1]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_RP_ARP: /* R' -&amp;gt; *R' */&lt;br /&gt;            /* construct asterisk NFA from inherited nfa, pass it to child */&lt;br /&gt;            childs[1]-&amp;gt;nfa.init_asterisk(nfa);&lt;br /&gt;            /* NFA is R'-child's nfa */&lt;br /&gt;            nfa = childs[1]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_RP_E:&lt;br /&gt;            /* NFA is inherited nfa, do nothing */&lt;br /&gt;            break;&lt;br /&gt;        case RULE_R_E:&lt;br /&gt;            /* NFA is accepting only epsilon token*/&lt;br /&gt;            nfa.init_token(Token(TOKEN_EPSILON));&lt;br /&gt;            break;&lt;br /&gt;        case RULE_R_LETTER: /* R -&amp;gt; L */&lt;br /&gt;            /* NFA is child's NFA */&lt;br /&gt;            nfa = childs[0]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_R_BSPB:&lt;br /&gt;            /* NFA is S'-child's NFA */&lt;br /&gt;            nfa = childs[1]-&amp;gt;build_nfa();&lt;br /&gt;            break;&lt;br /&gt;        case RULE_LETTER: /* letter -&amp;gt; a|...|z */&lt;br /&gt;            /* NFA is accepting only letter token */&lt;br /&gt;            nfa.init_token(Token(letter));&lt;br /&gt;            break;&lt;br /&gt;        }&lt;br /&gt;        return nfa;&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;7. Regular Expression Parser&lt;br /&gt;&lt;br /&gt;Now we can finally present the parser itself. LL(1) parsing algorithm has been discussed above, the only exception is that parse stack now holds not only tokens but parse tree nodes too.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;typedef std::pair&amp;lt;Token, ParseTreeNode*&amp;gt; TNpair;&lt;br /&gt;typedef std::vector&amp;lt;TNpair&amp;gt; TNvector;&lt;br /&gt;&lt;br /&gt;/* Regular Expression Parser */&lt;br /&gt;class REParse {&lt;br /&gt;    std::string regexp;         /* regular expression to parse */&lt;br /&gt;    unsigned int position;      /* current position in regexp */&lt;br /&gt;    TNvector parse_stack;       /* LL(1) parse stack */&lt;br /&gt;    ParseTreeNode *parse_tree;  /* parse tree */&lt;br /&gt;&lt;br /&gt;    /* push token-node pair on a parse stack */&lt;br /&gt;    void push_parse_state(Token &amp;t, ParseTreeNode *node) {&lt;br /&gt;        parse_stack.push_back(TNpair(t, node));&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* grammar production for a letter terminal in a body */&lt;br /&gt;    void production(ParseTreeNode &amp;node, GrammarRule rule, char letter) {&lt;br /&gt;        node.set_type(rule);&lt;br /&gt;        node.set_letter(letter);&lt;br /&gt;    }&lt;br /&gt;    /* grammar production with empty body */&lt;br /&gt;    void production(ParseTreeNode &amp;node, GrammarRule rule) {&lt;br /&gt;        node.set_type(rule);&lt;br /&gt;    }&lt;br /&gt;    /* grammar production with one nonterminal in a body */&lt;br /&gt;    void production(ParseTreeNode &amp;node, GrammarRule rule, Token t1) {&lt;br /&gt;        ParseTreeNode *c1 = new ParseTreeNode(node);&lt;br /&gt;        push_parse_state(t1, c1);&lt;br /&gt;        node.set_type(rule);&lt;br /&gt;    }&lt;br /&gt;    /* grammar production with two nonterminals in a body */&lt;br /&gt;    void production(ParseTreeNode &amp;node, GrammarRule rule, Token t1, Token t2) {&lt;br /&gt;        ParseTreeNode *c1 = new ParseTreeNode(node);&lt;br /&gt;        ParseTreeNode *c2 = new ParseTreeNode(node);&lt;br /&gt;        push_parse_state(t2, c2);&lt;br /&gt;        push_parse_state(t1, c1);&lt;br /&gt;        node.set_type(rule);&lt;br /&gt;    }&lt;br /&gt;    /* grammar production with three nonterminals in a body */&lt;br /&gt;    void production(ParseTreeNode &amp;node, GrammarRule rule, Token t1, Token t2, Token t3) {&lt;br /&gt;        ParseTreeNode *c1 = new ParseTreeNode(node);&lt;br /&gt;        ParseTreeNode *c2 = new ParseTreeNode(node);&lt;br /&gt;        ParseTreeNode *c3 = new ParseTreeNode(node);&lt;br /&gt;        push_parse_state(t3, c3);&lt;br /&gt;        push_parse_state(t2, c2);&lt;br /&gt;        push_parse_state(t1, c1);&lt;br /&gt;        node.set_type(rule);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* can current character be the beginning of a string derieved from P' */&lt;br /&gt;    bool is_first_PP(char c) const { return (isalpha(c) || (c == '(')); }&lt;br /&gt;    /* can current character be the beginning of a string derieved from T' */&lt;br /&gt;    bool is_first_TP(char c) const { return (is_first_PP(c) || (c == '|')); }&lt;br /&gt;&lt;br /&gt;    /* A step of LL(1) parsing: process a token from the top of the stack */&lt;br /&gt;    void process_token(Token &amp;t, ParseTreeNode &amp;node) {&lt;br /&gt;#define T(x) Token(x)&lt;br /&gt;        char lookup = regexp[position];&lt;br /&gt;        switch (t.get_type()) {&lt;br /&gt;        case TOKEN_START:&lt;br /&gt;            production(node, RULE_S_SP, T(TOKEN_SP)); break;&lt;br /&gt;        case TOKEN_SP:&lt;br /&gt;            production(node, RULE_SP_TTP, T(TOKEN_T), T(TOKEN_TP)); break;&lt;br /&gt;        case TOKEN_T:&lt;br /&gt;            production(node, RULE_T_PPP, T(TOKEN_P), T(TOKEN_PP)); break;&lt;br /&gt;        case TOKEN_P:&lt;br /&gt;            production(node, RULE_P_RRP, T(TOKEN_R), T(TOKEN_RP)); break;&lt;br /&gt;        case TOKEN_R:&lt;br /&gt;            if (lookup == '(')&lt;br /&gt;                production(node, RULE_R_BSPB, T('('), T(TOKEN_SP), T(')'));&lt;br /&gt;            else if (isalpha(lookup))&lt;br /&gt;                production(node, RULE_R_LETTER, T(lookup));&lt;br /&gt;            else&lt;br /&gt;                production(node, RULE_R_E);&lt;br /&gt;            break;&lt;br /&gt;        case TOKEN_TP:&lt;br /&gt;            if (lookup == '|')&lt;br /&gt;                production(node, RULE_TP_SP, T('|'), T(TOKEN_SP));&lt;br /&gt;            else&lt;br /&gt;                production(node, RULE_TP_E);&lt;br /&gt;            break;&lt;br /&gt;        case TOKEN_PP:&lt;br /&gt;            if (is_first_PP(lookup))&lt;br /&gt;                production(node, RULE_PP_PPP, T(TOKEN_P), T(TOKEN_PP));&lt;br /&gt;            else&lt;br /&gt;                production(node, RULE_PP_E);&lt;br /&gt;            break;&lt;br /&gt;        case TOKEN_RP:&lt;br /&gt;            if (lookup == '*')&lt;br /&gt;                production(node, RULE_RP_ARP, T('*'), T(TOKEN_RP));&lt;br /&gt;            else&lt;br /&gt;                production(node, RULE_RP_E);&lt;br /&gt;            break;&lt;br /&gt;        case TOKEN_NORMAL:&lt;br /&gt;            /* check if input doesn't finish unexpectedly */&lt;br /&gt;            if (position &amp;gt;= regexp.length()) {&lt;br /&gt;                std::cerr &amp;lt;&amp;lt; "Parse error: went past input" &amp;lt;&amp;lt; std::endl;&lt;br /&gt;                throw 10; /* ABORT */&lt;br /&gt;            }&lt;br /&gt;            /* this production can be used for passing throwgh&lt;br /&gt;             * special tokens, in this case we have to check */&lt;br /&gt;            if (lookup != t.get_char()) {&lt;br /&gt;                std::cerr &amp;lt;&amp;lt; "Parse error: expected '" &amp;lt;&amp;lt; t.get_char()&lt;br /&gt;                    &amp;lt;&amp;lt; "' got '" &amp;lt;&amp;lt; lookup &amp;lt;&amp;lt; "'" &amp;lt;&amp;lt; std::endl;&lt;br /&gt;                throw 20; /* ABORT */&lt;br /&gt;            }&lt;br /&gt;            production(node, RULE_LETTER, lookup);&lt;br /&gt;            ++position; /* all right, advance input position */&lt;br /&gt;            break;&lt;br /&gt;        default:&lt;br /&gt;            std::cerr &amp;lt;&amp;lt; "Parse error" &amp;lt;&amp;lt; std::endl;&lt;br /&gt;            throw 30; /* ABORT */&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    public:&lt;br /&gt;    REParse(std::string re) : regexp(re), position(0) {&lt;br /&gt;        /* initialize parse tree and stack */&lt;br /&gt;        parse_tree = new ParseTreeNode();&lt;br /&gt;        parse_stack.push_back(TNpair(Token(TOKEN_START), parse_tree));&lt;br /&gt;        /* LL(1) parsing cycle */&lt;br /&gt;        while (!parse_stack.empty()) {&lt;br /&gt;            TNpair&amp; tnpair = parse_stack.back();&lt;br /&gt;            parse_stack.pop_back();&lt;br /&gt;            process_token(tnpair.first, *tnpair.second);&lt;br /&gt;        }&lt;br /&gt;        /* check if we weren't able to process all the input regexp */&lt;br /&gt;        if (position != regexp.length()) {&lt;br /&gt;            std::cerr &amp;lt;&amp;lt; "Parse error: finished early" &amp;lt;&amp;lt; std::endl;&lt;br /&gt;            throw 40; /* ABORT */&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /* get parse tree */&lt;br /&gt;    ParseTreeNode * get_parse_tree() const { return parse_tree; }&lt;br /&gt;&lt;br /&gt;    ~REParse() {&lt;br /&gt;        delete parse_tree;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;8. Epilogue&lt;br /&gt;&lt;br /&gt;We presented all the parts to build a regular expression matcher. First we need to parse a regular expression alongside with building a parse tree, next we build an NFA from a parse tree, then we emulate NFA and finally check if NFA accepts the input string. As a final example here is the main() function of our program. It reads two string: first one is a regular expression, second is a matched string. It outputs "matched" if the input string appears to be in the language defined by the regular expression, otherwise the program outputs "not matched"&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;int main() {&lt;br /&gt;    std::string re;&lt;br /&gt;    std::cin &amp;gt;&amp;gt; re;  /* read regular expression */&lt;br /&gt;    REParse parser(re);  /* parse regular expression */&lt;br /&gt;    NFA nfa = parser.get_parse_tree()-&amp;gt;build_nfa(); /* build NFA */&lt;br /&gt;    NFAEmulator nfaemu(nfa); /* Initialize NFA emulator */&lt;br /&gt;&lt;br /&gt;    std::string match;&lt;br /&gt;    std::cin &amp;gt;&amp;gt; match; /* read input string */&lt;br /&gt;&lt;br /&gt;    /* process string */&lt;br /&gt;    for (unsigned int i=0; i &amp;lt; match.length(); ++i)&lt;br /&gt;        nfaemu.move(match[i]);&lt;br /&gt;    if (nfaemu.is_accepted()) std::cout &amp;lt;&amp;lt; "matched";&lt;br /&gt;    else std::cout &amp;lt;&amp;lt; "not matched";&lt;br /&gt;    std::cout &amp;lt;&amp;lt; std::endl;&lt;br /&gt;&lt;br /&gt;    /* free states allocated in heap */&lt;br /&gt;    nfa.delete_states();&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;9. Further Reading&lt;br /&gt;&lt;br /&gt;Aforementioned &lt;i&gt;Compilers: Principles, Techniques and Tools&lt;/i&gt; contains every idea presented here. &lt;i&gt;Introduction to Automata Theory, Languages and Computation&lt;/i&gt; by Hopcroft, Motwani and Ullman explains the mathematics behind automatas and conext-free grammars. Finally, &lt;a href="http://swtch.com/~rsc/regexp"&gt;this&lt;/a&gt; is a very good online resource about regular expressions in practice with parsing done by Dijkstra's shunting yard algorithm and reverse polish notation processing which is much more simpler than LL(1) parser presented here.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-9125589135842387038?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/9125589135842387038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=9125589135842387038' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/9125589135842387038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/9125589135842387038'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2011/02/homemade-regular-expression-matcher.html' title='Homemade regular expression matcher'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-6454598098951162655</id><published>2010-12-25T09:46:00.002+02:00</published><updated>2010-12-25T21:55:34.464+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sql'/><category scheme='http://www.blogger.com/atom/ns#' term='uguu'/><title type='text'>updatable SQL table with external ordering</title><content type='html'>Nearly half a year a crucial bug was found in uguu. It was fixed that time and it haven't bothered us since then. I've always wanted to write about it here.&lt;br /&gt;&lt;br /&gt;We store all files in a single SQL table. Each row has a path_id, and a file_id. The first is same for all files from the same directory. The second represents a (lexicographical) order on files. We assign file ids sequentially without a gap. While updating contents of a share we need to insert/delete some rows in the files table preserving the sequential order.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;First solution.&lt;br /&gt;&lt;br /&gt;To insert a row with file id N, increment ids for rows with ids &gt;= N&lt;br /&gt;&lt;pre&gt;UPDATE files SET file_id = file_id + 1 WHERE file_id &gt;= N AND path_id = some_path&lt;/pre&gt;then insert the new row with file_id N.&lt;br /&gt;&lt;br /&gt;Deletion is performed analogously:&lt;br /&gt;&lt;pre&gt;DELETE FROM files WHERE file_id = N AND path_id = some_path&lt;br /&gt;UPDATE files SET file_id = file_id - 1 WHERE file_id &gt; N AND path_id = some_path&lt;/pre&gt;&lt;br /&gt;Clearly, for a directory with M files addition of K new files with ids 1,2....,K can result in O(M*K^2) time. And which is more frustrating if all changes are done in a single transaction, the amount of additional space can rise dramatically for Postgresql 8.4. The same is true for deletion.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Linear solution.&lt;br /&gt;&lt;br /&gt;Work is done in two passes. In the first pass removed files are deleted from the table and new files are inserted into a new temporary table. In the second pass file ids are being corrected in a single loop with the help of postgresql cursors. We make loop over rows from old table and use cursor to point to current entry in the new table. After the second pass gaps in file ids in the old table are exactly ids of new files (in a temporary table). Finally, we insert everything from temporary table into old one. For an implementation of the second pass see function push_path_files &lt;a href="http://code.google.com/p/uguu/source/browse/bin/dbinit.py?r=80b508b617dec8d982d6996e82606266295539e5#210"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-6454598098951162655?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/6454598098951162655/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=6454598098951162655' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/6454598098951162655'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/6454598098951162655'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2010/12/updatable-sql-table-with-external.html' title='updatable SQL table with external ordering'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-521544492464972258</id><published>2010-11-11T01:19:00.003+02:00</published><updated>2010-11-12T17:40:07.275+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='binary translation'/><category scheme='http://www.blogger.com/atom/ns#' term='article'/><title type='text'>using template translator in binary translation system</title><content type='html'>Our bosses at work asked us to give a talk about template translator (rather simple binary translation layer) at the &lt;a href="http://2010.it-edu.ru"&gt;conference&lt;/a&gt;. &lt;a href="http://savrus-home.appspot.com/data/itedu10.pdf"&gt;Text&lt;/a&gt; and &lt;a href="http://savrus-home.appspot.com/data/itedu10-slides.pdf"&gt;slides&lt;/a&gt; are now available (in Russian).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-521544492464972258?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/521544492464972258/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=521544492464972258' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/521544492464972258'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/521544492464972258'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2010/11/using-template-translator-in-binary.html' title='using template translator in binary translation system'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-7479889120533892134</id><published>2010-08-23T12:19:00.009+03:00</published><updated>2010-08-26T09:03:49.412+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c'/><category scheme='http://www.blogger.com/atom/ns#' term='patch'/><title type='text'>a contribution</title><content type='html'>It seems a little patch I sent to the btpd project more than a year ago is finally accepted: &lt;a href="http://github.com/btpd/btpd/commit/79641da4f277dced59904f76f2dec4c5b4fdcb3c"&gt;commit&lt;/a&gt;, &lt;a href="http://lists.stargirl.org/pipermail/btpd-users/2009-February/000449.html"&gt;original message&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-7479889120533892134?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/7479889120533892134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=7479889120533892134' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/7479889120533892134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/7479889120533892134'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2010/08/it-seems-little-patch-ive-sent-to-btpd.html' title='a contribution'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-3400800851884716301</id><published>2010-04-22T18:30:00.049+03:00</published><updated>2010-05-11T15:49:27.247+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='uguu'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><title type='text'>Evolution of uguu scanning process</title><content type='html'>Scanning is very essential part of a search engine. It is about how database is filled and updated. You can't search if nothing was scanned before. In this post I'd like to cover how scanning in &lt;a href="http://code.google.com/p/uguu"&gt;uguu&lt;/a&gt; is implemented and how it has evolved over time.&lt;br /&gt;&lt;br /&gt;From the very start it was decided that scanning should be split between two programs. The first one connects to a share, somehow constructs a list of contents of the share and prints it to the standard output. The second one is protocol-independent script which reads contents from the standard input and executes a sequence of SQL commands to insert contents into the database. This sounded very reasonable that time since I had already written &lt;span style="font-style:italic;"&gt;smbscan&lt;/span&gt; in C and I didn't want to deal with SQL in C. Python appeared to be easier for this task. Also, simple text output helps a lot with testing and debugging.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;smbscan&lt;/span&gt; itself also consisted of protocol dependent part (connecting to a share, go up, go to subdir, readdir) and independent one (constructing file tree). When Radist (the other one &lt;span style="font-style:italic;"&gt;uguu&lt;/span&gt; developer) wanted to port his ftp scanner to &lt;i&gt;uguu&lt;/i&gt; I asked him to use tree constructor from &lt;span style="font-style:italic;"&gt;smbscan&lt;/span&gt; so we could change scanner output format more easily if necessary. Later this common part evolved into &lt;span style="font-style:italic;"&gt;libuguu&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Now it's time to turn back to the scanning process. At first we thought that when time to scan a share comes all contents of the share should be dropped from the database and the share itself should be rescanned by protocol-dependent scanner having it's output read by the database updating script. If some error occurs SQL transaction rollback is executed and scanning process is marked as failed. If the scanner report success then changes are committed to the database.  All described process is done inside a single SQL transaction so there is no need to care about what our users see in the middle of scanning process - they see old version until new one is committed.&lt;br /&gt;&lt;br /&gt;When we first launched &lt;span style="font-style:italic;"&gt;uguu&lt;/span&gt; it appeared scanning was very slow. Some optimizations were required. First idea came to my mind was not to perform database updating if contents of a share were unchanged. I thought to execute diff on scanner outputs saved to files, but Radist suggested comparing only security hash digests. We started to keep SHA1 digest of the scanner output in the database and checked against it before each update. Since many shares don't change their contents often &lt;span style="font-style:italic;"&gt;uguu&lt;/span&gt; gained a big speedup here.&lt;br /&gt;&lt;br /&gt;By that time I thought "If it were possible to diff old and new scanner outputs and commit only actual changes...". But scanning speed wasn't the biggest problem anymore - we needed to improve search time next. At the beginning of April database schema change was committed and scanning became major issue again.&lt;br /&gt;&lt;br /&gt;Next long-awaited feature was automatic recursion detection in the tree constructor. The source of the problem is that filesystem links are not always detectable by the client. Suppose you have a directory A and you create a symbolic link B in it pointed to A. Then SMB protocol client would see that B is just a plain directory and it has a subdirectory B and so on infinitely. If A has a huge subtree then it is copied each time SMB client enters B resulting very huge scanner output and of course big database workload. Solution was to calculate MD5 digest for each directory and compare against it's ancestors. If a match occurs then directory parent's MD5 is compared to that ancestor's MD5. Recursion is said to be detected if along the path to the root we find a chain of directories which MD5s are the same as of such chain moved to the end of the path (current directory) and overall number of items in that chain's directories is bigger than a certain threshold. The latter condition provides a scale for a chain size: from 1 if linked directory is large to the threshold if there is a directory with only one child - the link itself.&lt;br /&gt;&lt;br /&gt;Recursion detection is very useful feature if there are shares with recursive links. But there are very few such shares in our environment.&lt;br /&gt;&lt;br /&gt;The next idea was quite simple to implement: if two shares have the same SHA1 digest then keep only one copy of contents in the database. This was supposed to reduce amount of files in the database for hosts with same contents via both SMB and FTP. But in reality this wasn't very helpful since some changes can occur between SMB and FTP scanning and which is more frustrating when a filename is not good enough it is shown differently via SMB and FTP. The latter argument kicks the whole idea to the ground.&lt;br /&gt;&lt;br /&gt;So the only last hope remained - to construct a diff against old contents. Compared to previous work this was a big task with a huge amount of new code in &lt;span style="font-style:italic;"&gt;libuguu&lt;/span&gt; but it is finally done and currently is being tested on our installation.&lt;br /&gt;&lt;br /&gt;Tree diffing is very straightforward: first the whole old tree is reconstructed from a file, then while constructing a new tree using protocol-dependent walker, tree constructor compares new contents against old ones. If a new directory is added (respectively an old one is deleted) tree constructor prints the whole subtree with prefix '+' (respectively '-'). Same goes for files. If only size or number of items in a directory (or just size in case of a file) has been changed then only the directory (file) is printed with '*' modifier. Strictly speaking, output is a bit more complex to help further processing by the database updating script.&lt;br /&gt;&lt;br /&gt;Dullness of the algorithm can be viewed in a situation when just some directory's name has been changed and that directory has a huge subtree. Then two trees will be presented in diff (one with '-' prefix and the other with '+'). However in &lt;span style="font-style:italic;"&gt;uguu&lt;/span&gt; we keep tsvector of entire path for each file to allow searching in full paths. So we would have to update the entire subtree anyway.&lt;br /&gt;&lt;br /&gt;Now &lt;span style="font-style:italic;"&gt;uguu&lt;/span&gt; has scanners for FTP, SMB and WebDAV (the latter was written last weekend and is still quite unreliable). All of them use the tree constructor from &lt;span style="font-style:italic;"&gt;libuguu&lt;/span&gt; and thus have all the features described above.&lt;br /&gt;&lt;br /&gt;PS. Thanks Radist for pointing out some errors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-3400800851884716301?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/3400800851884716301/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=3400800851884716301' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/3400800851884716301'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/3400800851884716301'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2010/04/evolution-of-uguu-scanning-process.html' title='Evolution of uguu scanning process'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-796751619689356808</id><published>2010-02-08T00:00:00.011+02:00</published><updated>2010-02-08T11:20:36.675+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='uguu'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><title type='text'>simple port scanner</title><content type='html'>Strange, but well-known scanning tool nmap has a quite aggressive license. It is in fact GNU GPL Version 2 but with some exceptions. Among others there is one which may force you not to use nmap in your scripts:&lt;br /&gt;&lt;pre&gt; * Note that the GPL places important restrictions on "derived works", yet *&lt;br /&gt; * it does not provide a detailed definition of that term.  To avoid       *&lt;br /&gt; * misunderstandings, we consider an application to constitute a           *&lt;br /&gt; * "derivative work" for the purpose of this license if it does any of the *&lt;br /&gt; * following:                                                              *&lt;br /&gt;...&lt;br /&gt; * o Executes Nmap and parses the results (as opposed to typical shell or  *&lt;br /&gt; *   execution-menu apps, which simply display raw Nmap output and so are  *&lt;br /&gt; *   not derivative works.)                                                * &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is very confusing and I've decided to write my own port scanner for &lt;a href="http://code.google.com/p/uguu/"&gt;uguu&lt;/a&gt; which is BSD-licensed. Needless to say, we planned to use nmap originally. With having our own tool we strike several goals at once: no nmap license terms violation, no external program execution in our python scripts, lessen the number external parts uguu is depended on, scan a list of host-port pairs at once without grouping by ports (you can give nmap list of hosts through stdin but you can't specify ports there, afaik). And with all these advantages it was fine by me to have a scanner with less performance. But as for my small /24 network with 4 computers and no crappy firewalls it is even faster than nmap (nmap -PS -p): 3 seconds against 8 seconds.&lt;br /&gt;&lt;br /&gt;The scanner is written in python and is available as a part of &lt;a href="http://code.google.com/p/uguu/"&gt;uguu&lt;/a&gt; codebase (bin/network.py). It can be executed as a standalone application with ip range (three formats are available: single host, ip1-ip2, ip/netmask) and port specified in command line. Ip range can't be large for now due to number of open files limitations, but it worked for me with /24 netmask.&lt;br /&gt;&lt;br /&gt;PS. Should I pass some other options to nmap? But no cheating here, it has to be a user-invoked port scanner.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-796751619689356808?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/796751619689356808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=796751619689356808' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/796751619689356808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/796751619689356808'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2010/02/simple-port-scanner.html' title='simple port scanner'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-6850572750699312758</id><published>2010-01-11T10:55:00.008+02:00</published><updated>2010-02-08T01:02:57.912+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='uguu'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><title type='text'>uguu - search engine for local networks</title><content type='html'>I've spent new year holidays on designing and writing (alongside with another one developer) a local searcher which would allow quick search and browsing through SMB and FTP shares. For years we were using &lt;a href="http://sourceforge.net/projects/fsmbsearch"&gt;http://sourceforge.net/projects/fsmbsearch&lt;/a&gt; but I don't like neither it's design which almost doesn't rely on relational database (I may be wrong here and it would be interesting to find out the answer) nor it's implementation when installation requires dealing with autohell, patching some outdated version of samba and getting loads of perl modules.&lt;br /&gt;&lt;br /&gt;The new search engine called uguu is entirely open source and is available at &lt;a href="http://code.google.com/p/uguu"&gt;http://code.google.com/p/uguu&lt;/a&gt;. It uses libsmbclient and small amount of C code for scanning SMB share, postgres for database, python for scripts and django as web framework.&lt;br /&gt;It is still very far way to go (for example, no search page yet) and there are some things I plan to do myself, but if you want to contribute in any way feel free to join google group devoted to this project: &lt;a href="http://groups.google.com/group/uguu"&gt;http://groups.google.com/group/uguu&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-6850572750699312758?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/6850572750699312758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=6850572750699312758' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/6850572750699312758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/6850572750699312758'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2010/01/uguu-search-engine-for-local-networks.html' title='uguu - search engine for local networks'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-1011247496867230441</id><published>2009-10-31T17:40:00.011+02:00</published><updated>2011-02-08T10:33:26.729+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linux'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><title type='text'>Spy mode inotify</title><content type='html'>Although linux inotify(7) user-space interface allows to catch events on file operations it doesn't provide a way to answer a simple question: who the hell is accessing my files. By 'who' I mean which process, of course. So if we want to get an answer for that question we have to go into kernel space. (Constantly doing 'lsof|grep' is out of question here.)&lt;br /&gt;&lt;br /&gt;Little thinking give us a hope this can be done with quite simple kernel module. Indeed, inotify handler must be called just in the moment a file is accessed, meaning inotify handler is running in context of the process which is accessing a file. So our inotify handler can just output to dmesg desired fields from 'current' task. This should work until some sophisticated buffering is introduced between file operations and inotify handlers.&lt;br /&gt;&lt;br /&gt;Kernel provides an easy way to use self-written inotify handler in a kernel module. See Documentation/filesystems/inotify.txt for details.&lt;br /&gt;&lt;br /&gt;The code below just exhibits the described approach works. For trying it do&lt;br /&gt;&lt;pre&gt;insmod spy_inotify.ko path=file_to_spy_on&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then access that file and watch dmesg for messages like the following&lt;br /&gt;&lt;pre&gt;[spy_inotify] Catched inotify event for file ../../cpufreq.c. mask=32, pid=8183 executable="less"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;spy_inotify.c&lt;br /&gt;&lt;pre&gt;/*  spy_inotify - linux kernel `module for spying on inotify events&lt;br /&gt;*  prints to dmesg _who_ exactly is doing something with a file&lt;br /&gt;*&lt;br /&gt;*  Copyright (C) 2009 Ruslan Savchenko&lt;br /&gt;*&lt;br /&gt;*  This program is free software; you can redistribute it and/or modify&lt;br /&gt;*  it under the terms of the GNU General Public License as published by&lt;br /&gt;*  the Free Software Foundation; either version 2 of the License, or&lt;br /&gt;*  (at your option) any later version.&lt;br /&gt;*&lt;br /&gt;*  This program is distributed in the hope that it will be useful,&lt;br /&gt;*  but WITHOUT ANY WARRANTY; without even the implied warranty of&lt;br /&gt;*  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the&lt;br /&gt;*  GNU General Public License for more details.&lt;br /&gt;*&lt;br /&gt;*  You should have received a copy of the GNU General Public License&lt;br /&gt;*  along with this program; if not, write to the Free Software&lt;br /&gt;*  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;#include &amp;lt;linux/module.h&amp;gt;&lt;br /&gt;#include &amp;lt;linux/kernel.h&amp;gt;&lt;br /&gt;#include &amp;lt;linux/fs.h&amp;gt;&lt;br /&gt;#include &amp;lt;linux/namei.h&amp;gt;&lt;br /&gt;#include &amp;lt;linux/inotify.h&amp;gt;&lt;br /&gt;#include &amp;lt;linux/sched.h&amp;gt;&lt;br /&gt;#include &amp;lt;linux/string.h&amp;gt;&lt;br /&gt;&lt;br /&gt;#define SPY_INOTIFY "[spy_inotify] "&lt;br /&gt;#define MASK (IN_ACCESS | IN_ATTRIB | IN_CLOSE_WRITE | IN_CLOSE_NOWRITE \&lt;br /&gt;| IN_CREATE | IN_DELETE | IN_DELETE_SELF | IN_MODIFY \&lt;br /&gt;| IN_MOVE_SELF | IN_MOVED_FROM | IN_MOVED_TO | IN_OPEN )&lt;br /&gt;&lt;br /&gt;static char *path = "/";&lt;br /&gt;module_param(path, charp, 0);&lt;br /&gt;MODULE_PARM_DESC(path, "Path to spy at");&lt;br /&gt;static char comm[TASK_COMM_LEN];&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;void sinotify_event(struct inotify_watch *watch, u32 wd, u32 mask,&lt;br /&gt;u32 cookie, const char *name, struct inode *inode);&lt;br /&gt;void sinotify_destroy(struct inotify_watch *watch);&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;static struct inotify_operations si_op = {&lt;br /&gt;sinotify_event, sinotify_destroy };&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;void sinotify_event(struct inotify_watch *watch, u32 wd, u32 mask,&lt;br /&gt;u32 cookie, const char *name, struct inode *inode) {&lt;br /&gt;task_lock(current);&lt;br /&gt;strncpy(comm, current-&gt;comm, sizeof(current-&gt;comm));&lt;br /&gt;task_unlock(current);&lt;br /&gt;printk(KERN_INFO SPY_INOTIFY&lt;br /&gt;"Catched inotify event for file %s. mask=%d, pid=%d executable=\"%s\"\n",&lt;br /&gt;path, mask, current-&gt;tgid, comm);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;static struct inotify_handle *ih = 0;&lt;br /&gt;static struct inotify_watch watch;&lt;br /&gt;&lt;br /&gt;void sinotify_destroy(struct inotify_watch *watch) {&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int init_sinotify(void)&lt;br /&gt;{&lt;br /&gt;struct path s_path;&lt;br /&gt;int err;&lt;br /&gt;&lt;br /&gt;printk(KERN_INFO SPY_INOTIFY "Spying at %s\n", path);&lt;br /&gt;&lt;br /&gt;if ((err = kern_path(path, LOOKUP_FOLLOW, &amp;s_path)) != 0) {&lt;br /&gt;printk(KERN_ERR SPY_INOTIFY "kern_path() returned %d\n", err);&lt;br /&gt;goto out;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;/* FIXME: crappy nullcheck and no actions on error. Odd */&lt;br /&gt;if ((s_path.dentry == NULL) || (s_path.dentry-&gt;d_inode == NULL)) {&lt;br /&gt;printk(KERN_ERR SPY_INOTIFY "NULL dentry or inode\n");&lt;br /&gt;err = -ENOENT;&lt;br /&gt;goto out;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;ih = inotify_init(&amp;si_op);&lt;br /&gt;if (IS_ERR(ih)) {&lt;br /&gt;printk(KERN_ERR SPY_INOTIFY "inotify_init() returned bad ptr\n");&lt;br /&gt;err = PTR_ERR(ih);&lt;br /&gt;goto out;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;inotify_init_watch(&amp;watch);&lt;br /&gt;&lt;br /&gt;err = inotify_add_watch(ih, &amp;watch, s_path.dentry-&gt;d_inode, MASK);&lt;br /&gt;if (err &lt; 0) {&lt;br /&gt;printk(KERN_ERR SPY_INOTIFY&lt;br /&gt;"inotify_add_watch() returned bad descriptor: %d\n", err);&lt;br /&gt;goto clean_ih;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;return 0;&lt;br /&gt;&lt;br /&gt;clean_ih:&lt;br /&gt;inotify_destroy(ih);&lt;br /&gt;&lt;br /&gt;out:&lt;br /&gt;return err;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void exit_sinotify(void)&lt;br /&gt;{&lt;br /&gt;inotify_rm_watch(ih, &amp;watch);&lt;br /&gt;inotify_destroy(ih);&lt;br /&gt;printk(KERN_INFO SPY_INOTIFY "End spying\n");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;module_init(init_sinotify);&lt;br /&gt;module_exit(exit_sinotify);&lt;br /&gt;&lt;br /&gt;MODULE_LICENSE("GPL");&lt;br /&gt;MODULE_AUTHOR("Ruslan Savchenko");&lt;br /&gt;MODULE_DESCRIPTION("Spy-mode inotify");&lt;br /&gt;&lt;/pre&gt;Makefile&lt;pre&gt;obj-m += spy_inotify.o&lt;br /&gt;&lt;br /&gt;all:&lt;br /&gt;make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules&lt;br /&gt;&lt;br /&gt;clean:&lt;br /&gt;make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-1011247496867230441?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/1011247496867230441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=1011247496867230441' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/1011247496867230441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/1011247496867230441'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2009/10/spy-mode-inotify.html' title='Spy mode inotify'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-2569244185920642038</id><published>2009-10-09T11:30:00.003+03:00</published><updated>2010-02-21T20:55:11.628+02:00</updated><title type='text'>.pathrc</title><content type='html'>For a long time I've been using a 'usr' directory in my home to install additional/patched software built from scratch. It is very convenient since the main distribution keeps clean and using of environment variables makes it look like everything is normal. For some unknown reason long in the past i've moved variable initializations out of my .bashrc file to .pathrc which is presented below. &lt;br /&gt;&lt;br /&gt;Hehe, .pathrc is not something I'm very proud about but several days ago I suddenly urged for it without having any access to my home computer. So let it be posted here.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# set PATH so it includes user's private bin if it exists&lt;br /&gt;USER_PREFIX=~/usr&lt;br /&gt;if [ -d ${USER_PREFIX}/bin ] ; then&lt;br /&gt;    PATH=${USER_PREFIX}/bin${PATH:+:$PATH}&lt;br /&gt;    export PATH&lt;br /&gt;fi&lt;br /&gt;if [ -d ${USER_PREFIX}/sbin ] ; then&lt;br /&gt;    PATH=${USER_PREFIX}/sbin${PATH:+:$PATH}&lt;br /&gt;    export PATH&lt;br /&gt;fi&lt;br /&gt;if [ -d ${USER_PREFIX}/include ] ; then&lt;br /&gt;    C_INCLUDE_PATH=${USER_PREFIX}/include${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}&lt;br /&gt;    export C_INCLUDE_PATH&lt;br /&gt;fi&lt;br /&gt;if [ -d ${USER_PREFIX}/include ] ; then&lt;br /&gt;    CPATH=${USER_PREFIX}/include${CPATH:+:$CPATH}&lt;br /&gt;    export CPATH&lt;br /&gt;fi&lt;br /&gt;for l in lib lib64; do&lt;br /&gt;    if [ -d ${USER_PREFIX}/${l} ] ; then&lt;br /&gt;        LIBRARY_PATH=${USER_PREFIX}/${l}${LIBRARY_PATH:+:$LIBRARY_PATH}&lt;br /&gt;        export LIBRARY_PATH&lt;br /&gt;    fi  &lt;br /&gt;    if [ -d ${USER_PREFIX}/${l} ] ; then&lt;br /&gt;        LD_LIBRARY_PATH=${USER_PREFIX}/${l}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}&lt;br /&gt;        export LD_LIBRARY_PATH&lt;br /&gt;    fi&lt;br /&gt;done&lt;br /&gt;if [ -d ${USER_PREFIX}/share/man ] ; then&lt;br /&gt;    # MANPATH should be ended with ":", otherwise man(1) doesn't&lt;br /&gt;    # search in system paths. See manpath(1) for details.&lt;br /&gt;    MANPATH=${USER_PREFIX}/share/man:${MANPATH}&lt;br /&gt;    export MANPATH&lt;br /&gt;fi&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-2569244185920642038?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/2569244185920642038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=2569244185920642038' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/2569244185920642038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/2569244185920642038'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2009/10/pathrc.html' title='.pathrc'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-2358653235488583239</id><published>2009-10-06T20:09:00.070+03:00</published><updated>2010-05-03T10:08:46.050+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='asm'/><category scheme='http://www.blogger.com/atom/ns#' term='article'/><title type='text'>Writing a DMA Bootloader</title><content type='html'>Preamble&lt;br /&gt;&lt;br /&gt;At work I was asked to write a program which performs a dma transfer. After reading some documentation, articles and posts on the internet i've done it finally.Since I haven't found a tutorial for this I'll try to write one here. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Abstract&lt;br /&gt;&lt;br /&gt;Bootloader sets up the environment for the OS kernel, loads the kernel and transfers execution to it. By using DMA mode the job of loading the kernel and other jobs can be done in parallel thus reducing boot time. Here I'll try to present a DMA mode transfer as early as possible.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Master Boot Record&lt;br /&gt;&lt;br /&gt;First of all what is a &lt;span style="font-style:italic;"&gt;bootloader&lt;/span&gt;? On startup x86 processor executes instruction at address 0xFFFFFFF0 ([3], 9.1.4 "First Instruction Executed") which is located in BIOS area ([2], 3.1.3 "BIOS Memory"). After BIOS finishes hardware checking and other jobs it loads first sector (512 bytes), called Master Boot Record, of a boot device into 0x0000:0x7c00 memory location, saves drive number into DL register and transfers execution control to 0x7c00 ([1], 6.5.1 "Booting from BAIDs"). Although it is very well known that MBR must be ended with 55 AA signature I can't find exact document proclaiming that. So bootloader is just a 510-byte length code with additional 0xaa55 (little-endian) after the end to be executed in real mode at addres 0x7c00.&lt;br /&gt;&lt;br /&gt;It is possible to use BIOS fuctions in bootloader. They are invoked as interrupts using 'int' instruction ([3], 6 "Interrupt and Exception Handling"; [3], 16.1.4 "Interrupt and Exception Handling" in real mode; see [5] for detailed description of 'int' instruction), parameters are passed through registers (AH is typically used for selecting a function while interrupt vector selects a function class). The large reference known as Ralf Brown's Interrupt List is available on the internet (see Links). For example, to print a message one can use the following function (in NASM syntax; see Links for NASM manual):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;; prints a null-terminated string pointed by DS:SI&lt;br /&gt;print:  lodsb&lt;br /&gt;        test al, al&lt;br /&gt;        je .done&lt;br /&gt;        mov ah, 0x0e   ; BIOS int 0x10/AH=0x0e function. AL = character to write&lt;br /&gt;        mov bx, 0x0007 ; BH = page number, BL = foreground color&lt;br /&gt;        int 0x10       ; display a character on the screen, advancing the cursor&lt;br /&gt;                       ; and scrolling the screen as necessary&lt;br /&gt;        jmp print&lt;br /&gt;.done:  ret&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;BIOS can even be used for loading a sector from hdd to memory. For more details refer to Rallf Brown's Interrupt List.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;        ; read from disk drive in LBA mode&lt;br /&gt;        ; disk drive specified by DL is be used&lt;br /&gt;        mov si, addrpacket&lt;br /&gt;        mov ah, 0x42 ; BIOS int 0x13/AH=0x42 function. DL = drive number&lt;br /&gt;                     ; DS:SI = disk address packet address&lt;br /&gt;        int 0x13     ; performs READ from disk drive in LBA mode&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here addrpacket is a data structure stored in the main memory. In the following example addrpacket tells BIOS to transfer the second sector of the disk drive to address 0x7e00 (or DS:0x7e00, RBIL doesn't say it explicitly)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;        addrpacket db 0x10   ; size of this disk address packet structure&lt;br /&gt;                   db 0      ; reserved&lt;br /&gt;                   dw 1      ; number of blocks to transfer&lt;br /&gt;                   dd 0x7e00 ; transfer buffer in main memory&lt;br /&gt;                   dd 1, 0   ; starting absolute 64-bit block number&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It is up to BIOS to choose which mode the latter transfer is performed in. It could be DMA as well as PIO so we can't rely on BIOS here.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;ISA DMA vs PCI IDE DMA&lt;br /&gt;&lt;br /&gt;The idea of DMA is older than PCI. On the internet there are plenty of tutorials and examples describing ISA DMA (see Links). If you read something about two DMA controllers (8237A compatible, in particular) or I/O ports 0x00 - 0x0f, then the talk is about ISA DMA. What we need here is IDE DMA which is quite more complicated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PCI and PCI Configuration Space&lt;br /&gt;&lt;br /&gt;In x86 architecture there are two address spaces supported by the processor: main memory ([4], 3.3 "Memory Organization") and I/O address space which elements are called ports ([4], 13 "Input/Output"). While accessing main memory is done by 'mov' and several other instructions, I/O ports are accessed by 'in' and 'out' instructions (see [5], [6] for detailed description of these instructions)&lt;br /&gt;&lt;br /&gt;PCI presented another address space called "PCI Configurations Space" in which each PCI device obtains a 256-byte area. PCI Configuration Space is accessed using two ports: 0x0cf8 and 0x0cfc ([8], 3.2.2.3.2 "Software Generation of Configuration Transactions"). The first one is called CONFIG_ADDRESS and the second one is called CONFIG_DATA. Configuration transaction works as follows: software writes to CONFIG_ADDRESS to specify which device's configuration register should be accessed via CONFIG_DATA. Then it reads or writes CONFIG_DATA port thus performing load to or store from device's configuration space. CONFIG_DATA is 32-bit port and 4 bytes aligned at 4-byte boundary are accessed at once.&lt;br /&gt;&lt;br /&gt;CONFIG_ADDRESS has the following format ([8], 3.2.2.3.2 "Software Generation of Configuration Transactions")&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;31 30      24 23        16 15           11 10              8 7               2 1 0&lt;br /&gt;+-+----------+------------+---------------+-----------------+-----------------+-+-+&lt;br /&gt;| | Reserved | Bus Number | Device Number | Function Number | Register Number |0|0|&lt;br /&gt;+-+----------+------------+---------------+-----------------+-----------------+-+-+&lt;br /&gt;31 - Enable bit&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Enable bit specifies if CONFIG_DATA will be accessed. Function Number selects a function in multi-functional device - PCI device which has separate PCI configuration space for each supported function number. For example, PIIX controller we consider later here has PCI-ISA bridge as function 0 and PCI IDE controller as function 1.&lt;br /&gt;&lt;br /&gt;PCI Configuration space for each PCI device consists of header and device dependent region ([8], 6.1 "Configuration Space Organization"). Header is specified as follows&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;+----------+-------------+-------------+---------------+----------------------+&lt;br /&gt;| register | bits 31-24  | bits 23-16  | bits 15-8     | bits 7-0             |&lt;br /&gt;+----------+-------------+-------------+---------------+----------------------+&lt;br /&gt;| 00       | Device ID                 | Vendor ID                            |&lt;br /&gt;| 04       | Status                    | Command                              |&lt;br /&gt;| 08       | Class code                                | Revision ID          |&lt;br /&gt;| 0C       | BIST        | Header type | Latency Timer | Cache Line Size      |&lt;br /&gt;| 10       | Base address #0 (BAR0)                                           |&lt;br /&gt;| 14       | Base address #1 (BAR1)                                           |&lt;br /&gt;| 18       | Base address #2 (BAR2)                                           |&lt;br /&gt;| 1C       | Base address #3 (BAR3)                                           |&lt;br /&gt;| 20       | Base address #4 (BAR4)                                           |&lt;br /&gt;| 24       | Base address #5 (BAR5)                                           |&lt;br /&gt;| 28       | Cardbus CIS Pointer                                              |&lt;br /&gt;| 2C       | Subsystem ID              | Subsystem Vendor ID                  | &lt;br /&gt;| 30       | Expansion ROM base address                                       |&lt;br /&gt;| 34       | Reserved                                  | Capabilities Pointer |&lt;br /&gt;| 38       | Reserved                                                         |&lt;br /&gt;| 3C       | Max latency | Min Grant   | Interrupt PIN | Interrupt Line       |&lt;br /&gt;+----------+-------------+-------------+---------------+----------------------+&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here we only use Vendor ID (0x00), Status (0x06), Class Code (0x09) and BAR4 (0x20). When talking about Class Code it is often split into three bytes: base class code (0x0b), subclass code (0x0a) and programming interface (0x09) ([8], 6.2.1 "Device Identification"). Base Address Registers are used to specify mapping between either main memory or i/o space and device memory ([8], 6.2.5 "Base Addresses"). Bit 0 of BAR determines the type of memory: 0 for main memory, 1 for i/o space.&lt;br /&gt;&lt;br /&gt;For more information see [10], 24 "The PCI Local Bus" and [8].&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;PCI IDE contorller&lt;br /&gt;&lt;br /&gt;To peform DMA IDE transfer IDE controller must be capable of Bus Master function. Such device is determined as follows: base class code and subclass code must both be 0x01 to identify IDE device, and bit 7 of programming interface must be 1 to identify that the device is capable of Bus Master operation ([7], 5.0. "PCI Specifics"; [8], Appendix D "Class Codes"). PIIX/PIIX3 controller is a multi-functional PCI device and it's function 1 is exactly an IDE device with class code 0x010180 ([2], 2.3. "PCI Configuration Registers—IDE Interface (Function 1)", 2.3.6. "CLASSC - CLASS CODE REGISTER (Function 1)"). &lt;br /&gt;&lt;br /&gt;We need to use 2 bits of PCI Command register ([2], 2.3.3. "PCICMD - COMMAND REGISTER (Function 1)"):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;PCI Command Register (address 0x04-0x05)&lt;br /&gt;+-----+------------------------------------------------------------------------+&lt;br /&gt;| bit | Description                                                            |&lt;br /&gt;+-----+------------------------------------------------------------------------+&lt;br /&gt;|  2  | Bus Master Enable (BME). 1 - enable, 0 - disable                       |&lt;br /&gt;|  0  | I/O Space Enable (IOSE). 1 - enable access to ATA and Bus Master ports |&lt;br /&gt;+-----+------------------------------------------------------------------------+&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;ATA ports are called Legacy IDE ports and are used for sending ATA commands. They have fixed addresses, we'll talk about them when discussing ATA part of DMA protocol. Bus Master ports have changeable base address which is configurable via BAR4.&lt;br /&gt;&lt;br /&gt;BAR4 is called Bus Master Interface Base Address Register (BMIBA) in PIIX/PIIX3 documentation ([2], 2.3.9. "BMIBA—BUS MASTER INTERFACE BASE ADDRESS REGISTER (Function 1)"). Bits 2-15 specify 4-bytes aligned base address, bit 0 is hardwired to 1 to indicate this BAR is for I/O address space, other bits are reserved and set to 0.&lt;br /&gt;&lt;br /&gt;The latter I/O address range is called PCI Bus Master IDE Registers ([2], 2.7. "PCI BUS Master IDE Registers") and includes the following ports:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;+--------+--------------------------------------------------------------------+&lt;br /&gt;| Offset | Desscripton                                                        |&lt;br /&gt;+--------+--------------------------------------------------------------------+&lt;br /&gt;|  0x00  | Primary Channel Bus Master IDE Command Register                    |&lt;br /&gt;|  0x02  | Primary Channel Bus Master IDE Status Register                     |&lt;br /&gt;|  0x04  | Primary Channel Bus Master IDE Descriptor Table Pointer Register   |&lt;br /&gt;|  0x08  | Secondary Channel Bus Master IDE Command Register                  |&lt;br /&gt;|  0x0a  | Secondary Channel Bus Master IDE Status Register                   |&lt;br /&gt;|  0x0c  | Secondary Channel Bus Master IDE Descriptor Table Pointer Register |&lt;br /&gt;+--------+--------------------------------------------------------------------+&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Command and Status ports are 8-bit registers while Descriptor Table Pointer is 32-bit one. Bit 3 of Command Register is used for specifying Read/Write operation (Bus Master Read operation reads from memory, Bus Master Write operation writes to memory), bit 0 is used for Start/Stop Bus Master operation. Bit 0 of Status Register indicates if Bus Master operation is in progress, bit 1 indicates errors. For more details refer to [2], 2.7. "PCI BUS Master IDE Registers".&lt;br /&gt;&lt;br /&gt;Bits 2-31 of Descriptor Table Pointer Register specify a 4-bytes aligned base address for Physical Region Descriptor Table. PRD Table entry has the following format ([7] 1.2. "Physical Region Descriptor"; [2], 3.5.3. "BUS MASTER FUNCTION"):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;+---------+----------+----------+----------+----------+&lt;br /&gt;|         |  byte 3  |  byte 2  |  byte 1  |  byte 0  |&lt;br /&gt;+---------+----------+----------+----------+----------+&lt;br /&gt;| dword 0 | Memory Region Physical Base Address     |0|&lt;br /&gt;| dword 1 |EOT|    Reserved     |  Byte Count       |0|&lt;br /&gt;+---------+---+------+----------+----------+--------+-+&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Region Address is an address in main memory associated with DMA transfer. Byte Count specify the latter region size in bytes. EOT bit indicates if this PRD is the last one in the table. Note that Physical Region must not cross 64K boundary.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;ATA&lt;br /&gt;&lt;br /&gt;Legacy IDE I/O ports have constant addresses ([2], 3.5.1. "ATA REGISTER BLOCK DECODE"). They are divided into four blocks: Primary Command Block (base 0x01f0), Primary Control Block (base 03F4h), Secondary Command Block (base 0x0170), Secondary Control Block (base 0x0374). Here registers of command block are presented ([9], 6.2 "I/O register descriptions")&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;+--------+--------------------------------+&lt;br /&gt;| offset | Register Function              |&lt;br /&gt;+--------+--------------------------------+&lt;br /&gt;|  0x00  | Data                           |&lt;br /&gt;|  0x01  | Error / Features               |&lt;br /&gt;|  0x02  | Sector Count                   |&lt;br /&gt;|  0x03  | LBA bits 0-7                   |&lt;br /&gt;|  0x04  | LBA bits 8-15                  |&lt;br /&gt;|  0x05  | LBA bits 16-23                 |&lt;br /&gt;|  0x06  | Device / Head                  |&lt;br /&gt;|  0x07  | Status / Command               |&lt;br /&gt;+--------+--------------------------------+&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Each register is one byte length. Bit 7 of Status register is called BSY and indicates that device holds control block registers. If it is set all writes to control block registers are ignored. Bit 6 of Status register is called DRDY and indicates that device is capable of accepting a command.&lt;br /&gt;&lt;br /&gt;To perform a DMA transfer software must do the following steps ([9], 8.6 "DMA data transfer commands")&lt;br /&gt;a) The host reads the Status or Alternate Status register until the BSY bit is equal to 0;&lt;br /&gt;b) The host writes the Device/Head register with the appropriate DEV bit value;&lt;br /&gt;c) The host reads the Status or Alternate Status register until the BSY bit is equal to 0 and the DRDY bit is equal to 1;&lt;br /&gt;d) The host writes any required command parameters to the Features, Sector Count, Sector Number,&lt;br /&gt;Cylinder High, Cylinder Low and Device/Head registers;&lt;br /&gt;e) The host initializes the DMA channel;&lt;br /&gt;f) The host writes the command code (0xc8 for DMA Read with retry) to the Command register;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Conclusion&lt;br /&gt;&lt;br /&gt;Given specifications are enough to perform a DMA transaction. Code with large number of comments is given below.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;References&lt;br /&gt;&lt;br /&gt;[1] BIOS Boot Specification, Version 1.01, January 1996, Compaq, Phoenix, Intel&lt;br /&gt;[2] 82371FB (PIIX) AND 82371SB (PIIX3) PCI ISA IDE XCELERATOR, April 1997, Intel&lt;br /&gt;[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, June 2009, Intel Corporation&lt;br /&gt;[4] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture, June 2009, Intel Corporation&lt;br /&gt;[5] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, A-M, June 2009, Intel Corporation&lt;br /&gt;[6] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, N-Z, June 2009, Intel Corporation&lt;br /&gt;[7] Programming Interface for Bus Master IDE Controller, Revision 1.0, 5/16/94, Brad Hosler, Intel Corporation&lt;br /&gt;[8] PCI Local Bus Specification, Revision 2.3, March 29, 2002, PCI Special Interest Group&lt;br /&gt;[9] Information Technology - AT Attachment Interface with Extensions - 2 (ATA-2), Working Draft, Revision 4c, March 18, 1996, X3T10&lt;br /&gt;[10] The Indispensable PC Hardware Book, Third Edition, Hans-Peter Messmer, ADDISON-WESLEY&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Links&lt;br /&gt;&lt;br /&gt;http://wiki.osdev.org/MBR_(x86)&lt;br /&gt;http://wiki.osdev.org/Memory_Map_(x86)&lt;br /&gt;http://wiki.osdev.org/PCI&lt;br /&gt;http://wiki.osdev.org/BIOS&lt;br /&gt;http://wiki.osdev.org/RBIL&lt;br /&gt;http://wiki.osdev.org/ISA_DMA&lt;br /&gt;http://www.nasm.us/doc  - NASM manual&lt;br /&gt;http://www.ata-atapi.com/hiwchs.html  - CHS to LBA translation&lt;br /&gt;http://www.bswd.com/cornucop.htm  - links to ata/atapi specfications drafts&lt;br /&gt;http://www.cs.cmu.edu/~ralf/files.html  - BIOS interrupts&lt;br /&gt;http://www.osdever.net/tutorials/lba.php  - an example of data transfer in PIO mode&lt;br /&gt;http://www.nondot.org/sabre/os/articles/TheBootProcess  - articles about booting and bootloader examples&lt;br /&gt;http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/dma.html  - ISA DMA explanation&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The Code&lt;br /&gt;&lt;br /&gt;This code works under qemu. Just save all files, run make and execute 'qemu -hda hdd' command. You should see a message "Hello world from DMA Bootloader" indicating everithing goes well.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;loader.asm&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;;&lt;br /&gt;; DMA Bootloader&lt;br /&gt;;&lt;br /&gt;; This program is provided as an example for educational&lt;br /&gt;; purpose ONLY and WITHOUT ANY WARRANTY&lt;br /&gt;; &lt;br /&gt;; Copyright 2009, Ruslan Savchenko&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; macroses&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;%macro PCI_CONF_READ 1-2&lt;br /&gt;; %1 - pci address register base&lt;br /&gt;; %2 - pci conf register num&lt;br /&gt;        mov eax, %1&lt;br /&gt;        %ifnum %2&lt;br /&gt;        or ax, %2&lt;br /&gt;        %endif&lt;br /&gt;        call pci_conf_read&lt;br /&gt;%endm&lt;br /&gt;&lt;br /&gt;%macro PCI_CONF_WRITE 1-2&lt;br /&gt;; WARNING! mangles ebx!&lt;br /&gt;; %1 - pci address register base&lt;br /&gt;; %2 - pci conf register num&lt;br /&gt;        mov ebx, %1&lt;br /&gt;        %ifnum %2&lt;br /&gt;        or bx, %2&lt;br /&gt;        %endif&lt;br /&gt;        call pci_conf_write&lt;br /&gt;%endm&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; consts &lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;%define iobase 0xff0     ; Bus Master I/O port base&lt;br /&gt;%define sector 57        ; Number of sector to load. Must be less than 256&lt;br /&gt;%define codestart 0x500  ; address where load sector to&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; code&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        org 0x7c00 ; tell NASM this code is to be executed at address 0x7c00&lt;br /&gt;&lt;br /&gt;start&lt;br /&gt;        ; configure stack at first&lt;br /&gt;        cli                     ; disable interrupts &lt;br /&gt;        mov ax, 0x9000          ; stack segment base&lt;br /&gt;        mov ss, ax              ; write segment base to SS register&lt;br /&gt;        mov sp, 0xffff          ; stack pointer&lt;br /&gt;        sti                     ; enable interrupts&lt;br /&gt;&lt;br /&gt;        &lt;br /&gt;        ; Loop through all PCI devices and their functions&lt;br /&gt;        ; to find IDE Bus Master&lt;br /&gt;&lt;br /&gt;        mov cx, 0xffff               ; Bus Number, Devce Number and&lt;br /&gt;                                     ; Register Number together occupy&lt;br /&gt;                                     ; 16 bits of PCI CONFIG_ADDRESS&lt;br /&gt;        mov ebx, 0x80000000          ; set Enable bit&lt;br /&gt;.m1   &lt;br /&gt;        PCI_CONF_READ ebx            ; load Device ID and Vendor ID to eax&lt;br /&gt;        cmp ax, 0xffff               ; check if device is not present&lt;br /&gt;        je .m2                       ; device not present&lt;br /&gt;        PCI_CONF_READ ebx, 0x08      ; load class code, subclass code and&lt;br /&gt;                                     ; prog interface to eax&lt;br /&gt;        shr eax, 15&lt;br /&gt;        cmp eax, 0x010180 &gt;&gt; 7       ; check if class = 0x01,&lt;br /&gt;                                     ; subclass = 0x01 and &lt;br /&gt;                                     ; bit 7 of prog_if = 1&lt;br /&gt;        jne .m2                      ; not an IDE Bus Master device&lt;br /&gt;        ; found Bus Master capable IDE controller&lt;br /&gt;        mov dword [pciide], ebx      ; save CONFIG_ADDRESS to access&lt;br /&gt;                                     ; IDE Bus Master in future&lt;br /&gt;.m2&lt;br /&gt;        add ebx, 0x100               ; try next function/device/bus &lt;br /&gt;        loop .m1&lt;br /&gt;        &lt;br /&gt;        mov eax, iobase | 1               ; Bus Master Interface Base Address&lt;br /&gt;        PCI_CONF_WRITE [pciide], 0x20     ; save address to BMIBA Register&lt;br /&gt;&lt;br /&gt;        PCI_CONF_READ [pciide], 0x04      ; PCI Command register&lt;br /&gt;        or ax, 1                          ; set I/O Space Enable Bit&lt;br /&gt;        PCI_CONF_WRITE [pciide], 0x04     ; save PCICMD&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;        ; wait while ATA device is bisy&lt;br /&gt;        mov dx, 0x01f7          ; ATA Primary Status/Command&lt;br /&gt;.m3&lt;br /&gt;        in al, dx&lt;br /&gt;        cmp al, 0x80            ; check if BSY bit is set&lt;br /&gt;        jae .m3&lt;br /&gt;&lt;br /&gt;        ; for Primary device set use of Drive 0 (Master) in LBA mode&lt;br /&gt;        mov dx, 0x01f6          ; ATA Primary Device/Head&lt;br /&gt;        mov al, 0x40            ; LBA mode, Drive = 0, LBA 27-24 = 0&lt;br /&gt;        out dx, al&lt;br /&gt;&lt;br /&gt;        ; wait for ATA defice to accept command&lt;br /&gt;        mov dx, 0x01f7          ; ATA Primary Status/Command&lt;br /&gt;.m4&lt;br /&gt;        in al, dx&lt;br /&gt;        and al, 0xc0            ; BSY and DRDY bits&lt;br /&gt;        cmp al, 0x40            ; check if BSY=0, DRDY=1&lt;br /&gt;        jne .m4&lt;br /&gt;&lt;br /&gt;        ; Set parameters of transaction:&lt;br /&gt;        ; nubmer of sectors, first sector LBA address and mode&lt;br /&gt;        ;&lt;br /&gt;        ;mov dx, 0x01f2         ; ATA Primary Sector Count&lt;br /&gt;        ;mov al, 1              ; count = 1&lt;br /&gt;        ;out dx, al&lt;br /&gt;        ;&lt;br /&gt;        ;mov dx, 0x01f3         ; ATA Primary LBA 0-7&lt;br /&gt;        ;mov al, sector         ; LBA 0-7 = sector &lt;br /&gt;        ;out dx, al&lt;br /&gt;        ;&lt;br /&gt;        ;mov dx, 0x01f4         ; ATA Primary LBA 8-15&lt;br /&gt;        ;mov al, 0              ; LBA 8-15 = 0 &lt;br /&gt;        ;out dx, al             &lt;br /&gt;        ;&lt;br /&gt;        ;mov dx, 0x01f5         ; ATA Primary LBA 16-23&lt;br /&gt;        ;mov al, 0              ; LBA 16-23 = 0&lt;br /&gt;        ;out dx, al             &lt;br /&gt;        ;&lt;br /&gt;        ;mov dx, 0x01f6         ; ATA Primary Device/Head&lt;br /&gt;        ;mov al, 0x40           ; LBA mode, Drive= 0&lt;br /&gt;        ;out dx, al&lt;br /&gt;        ;&lt;br /&gt;        ; Using loop here reduce code size by 12 bytes&lt;br /&gt;        &lt;br /&gt;        mov cx, 5               ; 5 iterations&lt;br /&gt;        mov dx, 0x01f2          ; ATA Primary Sector Count&lt;br /&gt;        mov si, atasector       ; transaction parameters array &lt;br /&gt;        cld                     ; string operations will increase SI&lt;br /&gt;.m5&lt;br /&gt;        lodsb                   ; load next byte to al and increase SI&lt;br /&gt;        out dx, al              ; write byte to corresponding ATA Primary&lt;br /&gt;                                ; Command Block port&lt;br /&gt;        inc dx                  ; increase DX&lt;br /&gt;        loop .m5&lt;br /&gt;&lt;br /&gt;        ; initialize host dma transfer&lt;br /&gt;        PCI_CONF_READ [pciide], 0x04      ; PCI Command Register&lt;br /&gt;        or ax, 4                          ; enable Bus Master Function&lt;br /&gt;        PCI_CONF_WRITE [pciide], 0x04     ; save PCICMD&lt;br /&gt;&lt;br /&gt;        ; set Drive 0 DMA Capable in Bus Master IDE Status Register&lt;br /&gt;        ; This bit does not affect hardware operation&lt;br /&gt;        ;mov word dx, iobase | 0x2         ; Bus Master IDE Status Register&lt;br /&gt;        ;in al, dx&lt;br /&gt;        ;or al, 0x20                       ; set Drive 0 DMA Capable&lt;br /&gt;        ;out dx, al                        ; save Bus Master IDE Status&lt;br /&gt;&lt;br /&gt;        ; write PRD Table pointer&lt;br /&gt;        xor eax, eax&lt;br /&gt;        mov ax, prd&lt;br /&gt;        mov dx, iobase | 0x4            ; Bus Master IDE Descriptor&lt;br /&gt;                                        ; Table Pointer Register&lt;br /&gt;        out dx, eax&lt;br /&gt;&lt;br /&gt;        ; command ATA to perform DMA transfer&lt;br /&gt;        mov dx, 0x01f7                  ; ATA Primary Status/Command&lt;br /&gt;        mov al, 0xc8                    ; DMA Read With Retry Command&lt;br /&gt;        out dx, al&lt;br /&gt;&lt;br /&gt;        ; tell Bus Master to start DMA transfer&lt;br /&gt;        mov dx, iobase                  ; Bus Master IDE Command Register&lt;br /&gt;        mov al, 9                       ; Start/Stop Bit set&lt;br /&gt;                                        ; Read/Write Bit set&lt;br /&gt;        out dx, al&lt;br /&gt;&lt;br /&gt;        ;&lt;br /&gt;        ; Place to do some other job while sector is transfered from&lt;br /&gt;        ; drive to memory without holding CPU&lt;br /&gt;        ;&lt;br /&gt;&lt;br /&gt;        ; wait until transfer is finished&lt;br /&gt;        mov dx, iobase | 0x2            ; Bus Master IDE Status Register&lt;br /&gt;.m6&lt;br /&gt;        in al, dx&lt;br /&gt;        bt ax, 0                        ; Bus Master IDE Active Bit&lt;br /&gt;        jc .m6&lt;br /&gt;&lt;br /&gt;        ; transfer execution to recently loaded code&lt;br /&gt;        jmp (codestart &gt;&gt; 4):0&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; functions &lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;pci_conf_read&lt;br /&gt;; Input:&lt;br /&gt;;        EAX = PCI CONFIG_ADDRESS&lt;br /&gt;; Output:&lt;br /&gt;;        EAX = PCI CONFIG_DATA&lt;br /&gt;        mov dx, 0xcf8           ; CONFIG_ADDRESS&lt;br /&gt;        out dx, eax&lt;br /&gt;        mov dx, 0xcfc           ; CONFIG_DATA&lt;br /&gt;        in eax, dx&lt;br /&gt;        ret&lt;br /&gt;&lt;br /&gt;pci_conf_write&lt;br /&gt;; Input:&lt;br /&gt;;        EAX = DATA&lt;br /&gt;;        EBX = PCI CONFIG_ADDRESS&lt;br /&gt;; Output:&lt;br /&gt;;       EBX = DATA&lt;br /&gt;        xchg eax, ebx&lt;br /&gt;        mov dx, 0xcf8           ; CONFIG_ADDRESS&lt;br /&gt;        out dx, eax&lt;br /&gt;        mov dx, 0xcfc           ; CONFIG_DATA&lt;br /&gt;        mov eax, ebx&lt;br /&gt;        out dx, eax&lt;br /&gt;        ret&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; data&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        ; array of bytes to be passed through ATA I/O ports&lt;br /&gt;        atasector       db 1, sector, 0, 0, 0x40&lt;br /&gt;                        ; sector count = 1&lt;br /&gt;                        ; LBA 0-7 = sector&lt;br /&gt;                        ; LBA 8-15 = 0&lt;br /&gt;                        ; LBA 16-24 = 0&lt;br /&gt;                        ; device/head = 0x40 (LBA mode, drive 0)&lt;br /&gt;&lt;br /&gt;align 4&lt;br /&gt;        ; Physical Region Descrptor&lt;br /&gt;        prd             dd codestart    ; Memory Region&lt;br /&gt;                        dd 0x80000200   ; EOT, 512 bytes&lt;br /&gt;&lt;br /&gt;        ; buffer to save CONFIG_ADDRESS of PCI IDE device&lt;br /&gt;        pciide          dd 0&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; Space for Partition Table&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        times 436-($-$$) db 0xff&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; boot sector magic end&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        times 510-($-$$) db 0&lt;br /&gt;        dw 0xAA55               ; MBR Signature&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;sector.asm&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; code&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        org 0x500 ; tell NASM to assume this program&lt;br /&gt;                  ; begins at 0x500 when loaded into memory&lt;br /&gt;&lt;br /&gt;start:&lt;br /&gt;        mov si, hellomsg&lt;br /&gt;        call print_si&lt;br /&gt; &lt;br /&gt;        cli  ; disable interrupts&lt;br /&gt;        hlt  ; infinite loop which can be broken only by interrupt&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; functions &lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;print_si&lt;br /&gt;        push ax&lt;br /&gt;        push bx&lt;br /&gt;.next&lt;br /&gt;        lodsb&lt;br /&gt;        test al, al&lt;br /&gt;        je .done&lt;br /&gt;        mov ah, 0x0e   ; BIOS int 0x10/AH=0x0e function. AL = character to write&lt;br /&gt;        mov bx, 0x0007 ; BH = page number, BL = foreground color&lt;br /&gt;        int 0x10       ; display a character on the screen, advancing the cursor&lt;br /&gt;                       ; and scrolling the screen as necessary&lt;br /&gt;        jmp .next&lt;br /&gt;&lt;br /&gt;.done:&lt;br /&gt;        pop bx&lt;br /&gt;        pop ax&lt;br /&gt;        ret&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; data&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        hellomsg db 'Hello World from DMA Bootloader', 13, 10, 0&lt;br /&gt;&lt;br /&gt;;------------------------------------------------&lt;br /&gt;; sector end&lt;br /&gt;;------------------------------------------------&lt;br /&gt;&lt;br /&gt;        times 512 -($-$$) db 0&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Makefile&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;hdd: loader sector&lt;br /&gt;        rm -f hdd&lt;br /&gt;        dd if=loader of=hdd bs=512 count=1&lt;br /&gt;        dd if=sector of=hdd bs=512 count=1 seek=57&lt;br /&gt;&lt;br /&gt;loader: loader.asm&lt;br /&gt;        nasm loader.asm&lt;br /&gt;sector: sector.asm&lt;br /&gt;        nasm sector.asm&lt;br /&gt;&lt;br /&gt;clean:&lt;br /&gt;        rm -f hdd loader sector&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Known Issues&lt;br /&gt;&lt;br /&gt;If BIOS sets DS to nonzero value the code fails. Add initializing DS to zero at the very start to ensure the code is correct.&lt;br /&gt;This code doesn't check for _any_ errors. If you want to add error checks refer to corresponding documentation.&lt;br /&gt;No DMA timing configuration is done. Doing this can improve transaction speed significantly. UDMA Mode 5 is 6 times faster than UDMA Mode 0.&lt;br /&gt;On DMA transfer completion controller sends an interrupt. This also should be taken care of.&lt;br /&gt;After issuing a DMA transfer command to device one would probably want to wait until device is ready. In this case add the following code before setting Start/Stop bit of Bus Master Command Register. But be careful and read [9] before for this standard doesn't enfocrce setting DRQ here&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;        ; wait until ATA device is ready to transfer data&lt;br /&gt;        mov dx, 0x01f7                  ; ATA Primary Status/Command&lt;br /&gt;.m7&lt;br /&gt;        in al, dx&lt;br /&gt;        test al, 0x08                   ; check if DRQ bit is set&lt;br /&gt;        jz .m7&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-2358653235488583239?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/2358653235488583239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=2358653235488583239' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/2358653235488583239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/2358653235488583239'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2009/10/writing-dma-bootloader.html' title='Writing a DMA Bootloader'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-4697143993404497693</id><published>2009-08-08T14:12:00.009+03:00</published><updated>2009-10-08T14:18:27.817+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c'/><category scheme='http://www.blogger.com/atom/ns#' term='patch'/><title type='text'>LANG vs LC_MESSAGES in man(1) error messages</title><content type='html'>At first I'd like to say that I'm pretty happy with english being my system's language. But since I have to use russian sometimes I've set LANG=ru_RU.UTF-8. Unfortunately man(1) error messages are encoded in KOI8-R so they look like garbage on my locale. I don't particularly want them to be in russian so I've set LC_MESSAGE=C. The man page locale(1P) says:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;  LC_MESSAGES&lt;br /&gt;              Determine the locale that should be used to  affect  the  format&lt;br /&gt;              and contents of diagnostic messages written to standard error.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But it doesn't work for man(1). After little investigation I've learned that man(1) uses catgets(3) to get an error string. The catalog descriptor for catgets(3) is received from catopen(3) which (in case $LC_MESSAGES=C) carefully tries to open /usr/share/locale/C and some other dirs with C locale and finally fallbacks to /usr/share/locale/ru. If man(1) obtains an empty string from catgets(3) it looks for it in builtin table which is what I need.&lt;br /&gt;As for now I can't say which part of this complicated situation is buggy, so I've inserted a hack into man(1) to get an error string right from builtin table if the locale is C or POSIX.&lt;br /&gt;Here is my patch.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;diff -rNu a/man-1.6f/src/gripes.c b/man-1.6f/src/gripes.c&lt;br /&gt;--- a/man-1.6f/src/gripes.c     2006-11-21 22:53:44.000000000 +0300&lt;br /&gt;+++ b/man-1.6f/src/gripes.c     2009-08-11 10:36:08.000000000 +0400&lt;br /&gt;@@ -99,15 +99,22 @@&lt;br /&gt; static char *&lt;br /&gt; getmsg (int n) {&lt;br /&gt;        char *s = "";&lt;br /&gt;-&lt;br /&gt;-       catinit ();&lt;br /&gt;-       if (catfd != (nl_catd) -1) {&lt;br /&gt;-               s = catgets(catfd, 1, n, "");&lt;br /&gt;-               if (*s &amp;&amp; is_suspect(s))&lt;br /&gt;-                       s = "";&lt;br /&gt;-       }&lt;br /&gt;-       if (*s == 0 &amp;&amp; 0 &lt; n &amp;&amp; n &lt;= MAXMSG)&lt;br /&gt;-               s = msg[n];&lt;br /&gt;+    char *lm;&lt;br /&gt;+    &lt;br /&gt;+    lm = getenv("LC_MESSAGES");&lt;br /&gt;+    if (lm &amp;&amp; (!strcmp(lm, "C") || !strcmp(lm, "POSIX"))) {&lt;br /&gt;+           if (0 &lt; n &amp;&amp; n &lt;= MAXMSG)&lt;br /&gt;+                   s = msg[n];&lt;br /&gt;+    } else {&lt;br /&gt;+        catinit ();&lt;br /&gt;+        if (catfd != (nl_catd) -1) {&lt;br /&gt;+            s = catgets(catfd, 1, n, "");&lt;br /&gt;+            if (*s &amp;&amp; is_suspect(s))&lt;br /&gt;+                s = "";&lt;br /&gt;+        }&lt;br /&gt;+        if (*s == 0 &amp;&amp; 0 &lt; n &amp;&amp; n &lt;= MAXMSG)&lt;br /&gt;+            s = msg[n];&lt;br /&gt;+    }&lt;br /&gt;        if (*s == 0) {&lt;br /&gt;                fprintf(stderr,&lt;br /&gt;                        "man: internal error - cannot find message %d\n", n);&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-4697143993404497693?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/4697143993404497693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=4697143993404497693' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/4697143993404497693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/4697143993404497693'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2009/08/lang-vs-lcmessages-in-man1-error.html' title='LANG vs LC_MESSAGES in man(1) error messages'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-8390351676616596658</id><published>2009-02-24T11:59:00.007+02:00</published><updated>2009-10-08T14:18:13.825+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><title type='text'>IRC DCC auto leacher</title><content type='html'>Long ago in 2007 I wrote an irssi script to leach many packs from xdcc bots automaticaly. It's quite unfinished (ugly ui, etc) but I dind't touch it for ages although i'm still using it as it is. It supports downloading from only one bot at a time. If downloading breaks it will retry after timeout. Note that some bots hate if you retry right after a failure and they will ban you (sometimes waiting for more than 5 minutes is required).&lt;br /&gt;&lt;br /&gt;Interfase is like this:&lt;br /&gt;&lt;br /&gt;/xdccget queue [botname] [packs...] - you should start with this. After a minute xdccget will start leaching.&lt;br /&gt;/xdccget add [packs...] - add more packs&lt;br /&gt;/xdccget del - delete last pack&lt;br /&gt;/xdccget pause - stop downloading, remove timers&lt;br /&gt;/xdccget resume - resume downloading&lt;br /&gt;&lt;br /&gt;here is the script:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#  2007, savrus&lt;br /&gt;#&lt;br /&gt;# Advanced downloader from xdcc bot. Independent on bot's language&lt;br /&gt;#&lt;br /&gt;# Based on XDCCget by Stefan "tommie" Tomanek&lt;br /&gt;#&lt;br /&gt;# Dustributed under GNU GPL v2 or higher&lt;br /&gt;&lt;br /&gt;use vars qw($VERSION %IRSSI);&lt;br /&gt;$VERSION = "20070321";&lt;br /&gt;%IRSSI = (&lt;br /&gt;    authors     =&gt; "savrus",&lt;br /&gt;    contact     =&gt; "go to hell",&lt;br /&gt;    name        =&gt; "xdccget",&lt;br /&gt;    description =&gt; "auto download from xdcc bot",&lt;br /&gt;    license     =&gt; "GPLv2",&lt;br /&gt;    changed     =&gt; "$VERSION",&lt;br /&gt;    commands    =&gt; "xdccget"&lt;br /&gt;);&lt;br /&gt;&lt;br /&gt;use Irssi;&lt;br /&gt;use vars qw(@g_queue $g_nick $g_server $g_witem $g_timer);&lt;br /&gt;&lt;br /&gt;sub show_help() {&lt;br /&gt;    my $help="xdccget $VERSION&lt;br /&gt;/xdccget queue Nickname &lt;number&gt; &lt;number&gt;...&lt;br /&gt;    Queue the specified packs of the server 'Nickname'&lt;br /&gt;/xdccget help&lt;br /&gt;    Display this help&lt;br /&gt;";&lt;br /&gt;    print CLIENTCRAP $help;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub sig_dcc_closed {&lt;br /&gt;    my ($dcc) = @_;&lt;br /&gt;    my ($dir,$file) = $dcc-&gt;{file} =~ m,(.*)/(.*),;&lt;br /&gt;&lt;br /&gt;    return unless $dcc-&gt;{type} eq 'GET';&lt;br /&gt;    return unless $dcc-&gt;{nick} eq $g_nick;&lt;br /&gt;&lt;br /&gt;    if ($dcc-&gt;{transfd} &lt; $dcc-&gt;{size}) {&lt;br /&gt;        process_queue($dcc-&gt;{nick});&lt;br /&gt;    }&lt;br /&gt;    else {&lt;br /&gt;#        rename $dcc-&gt;{file}, "/ircdone/$file";&lt;br /&gt;        shift_queue($dcc-&gt;{nick});&lt;br /&gt;        process_queue($dcc-&gt;{nick});&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub cmd_xdccget {&lt;br /&gt;    my ($args, $server, $witem) = @_;&lt;br /&gt;    my @arg = split(/ /, $args);&lt;br /&gt;&lt;br /&gt;    if ($arg[0] eq 'queue'){&lt;br /&gt;        shift @arg;&lt;br /&gt;        initialize_queue("@arg", $server, $witem);&lt;br /&gt;    }&lt;br /&gt;    elsif ($arg[0] eq 'add'){&lt;br /&gt;        shift @arg;&lt;br /&gt;        add_to_queue(@arg);&lt;br /&gt;    }&lt;br /&gt;    elsif ($arg[0] eq 'del'){&lt;br /&gt;        dell_from_queue();&lt;br /&gt;    }&lt;br /&gt;    elsif ($arg[0] eq 'pause'){&lt;br /&gt;        pause();&lt;br /&gt;    }&lt;br /&gt;    elsif ($arg[0] eq 'resume'){&lt;br /&gt;        resume();&lt;br /&gt;    }&lt;br /&gt;    elsif ($arg[0] eq 'help') {&lt;br /&gt;        show_help();&lt;br /&gt;    }&lt;br /&gt;    else {&lt;br /&gt;        print CLIENTCRAP "xdccget: $g_nick @g_queue";&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub pause {&lt;br /&gt;    $nick = $g_nick;&lt;br /&gt;    clean();&lt;br /&gt;    $g_nick = $nick;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub resume {&lt;br /&gt;    process_queue();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub shift_queue {&lt;br /&gt;    shift @g_queue;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub clean {&lt;br /&gt;# Clean previous job&lt;br /&gt;    if ($g_nick) {&lt;br /&gt;        my $nick = $g_nick;&lt;br /&gt;        $g_server-&gt;command("MSG $g_nick xdcc remove");&lt;br /&gt;        $g_nick = "";&lt;br /&gt;        $g_server-&gt;command("DCC close get $nick");&lt;br /&gt;    }&lt;br /&gt;    if ($g_timer) {&lt;br /&gt;        Irssi::timeout_remove($g_timer);&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub initialize_queue {&lt;br /&gt;    my ($args, $server, $witem) = @_;&lt;br /&gt;    my @args = split(/ /, $args);&lt;br /&gt;&lt;br /&gt;    clean();&lt;br /&gt;&lt;br /&gt;    $g_nick = $args[0];&lt;br /&gt;    shift @args;&lt;br /&gt;    @g_queue = @args;&lt;br /&gt;    $g_server = $server;&lt;br /&gt;    $g_witem = $witem;&lt;br /&gt;    &lt;br /&gt;    process_queue();&lt;br /&gt;}&lt;br /&gt; &lt;br /&gt;sub process_queue {&lt;br /&gt;    while ((@g_queue[0] &lt;= 0) &amp;&amp; (scalar @g_queue &gt; 0)) {&lt;br /&gt;        shift @g_queue;&lt;br /&gt;    }&lt;br /&gt;    if (scalar @g_queue &gt; 0) {&lt;br /&gt;        $g_timer=Irssi::timeout_add(60 * 1000, 'transfer', undef);&lt;br /&gt;    }&lt;br /&gt;}    &lt;br /&gt;&lt;br /&gt;sub transfer {&lt;br /&gt;    Irssi::timeout_remove($g_timer);&lt;br /&gt;    $g_server-&gt;command("MSG $g_nick xdcc send @g_queue[0]");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub add_to_queue {&lt;br /&gt;    my (@args) = @_;&lt;br /&gt;    push @g_queue, @args;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sub dell_from_queue {&lt;br /&gt;    pop @g_queue;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Irssi::signal_add('dcc closed', 'sig_dcc_closed');&lt;br /&gt;#&lt;br /&gt;# TODO: handle rejoining the channel (self disconnection + auto reconnection)&lt;br /&gt;# Be careful: content may be changed in such situation.&lt;br /&gt;#&lt;br /&gt;#Irssi::signal_add('channel joined', 'sig_channel_joined');&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;## Used in a modified script for a bot that was always disconnecting from the channel.&lt;br /&gt;## But be careful - usually this happens when bot owner shuffles the packlist&lt;br /&gt;#&lt;br /&gt;#sub sig_message_join {&lt;br /&gt;#    my ($server, $channel, $nick, $addr) = @_;&lt;br /&gt;#    if ($nick eq $g_nick){&lt;br /&gt;#        print CLIENTCRAP "XDCCGET DEBUG: $nick joined channel (wait for $g_nick)";&lt;br /&gt;#        $g_server-&gt;command("MSG $g_nick xdcc remove");&lt;br /&gt;#        $g_server-&gt;command("DCC close get $g_nick");&lt;br /&gt;#        if ($g_timer) {&lt;br /&gt;#            Irssi::timeout_remove($g_timer);&lt;br /&gt;#        }&lt;br /&gt;#        resume();&lt;br /&gt;#    }&lt;br /&gt;#}&lt;br /&gt;#Irssi::signal_add_last('message join', 'sig_message_join');&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Irssi::command_bind('xdccget', \&amp;cmd_xdccget);&lt;br /&gt;&lt;br /&gt;print CLIENTCRAP '%B&gt;&gt;%n '.$IRSSI{name}.' '.$VERSION.' loaded';&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-8390351676616596658?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/8390351676616596658/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=8390351676616596658' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/8390351676616596658'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/8390351676616596658'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2009/02/irc-dcc-auto-leacher.html' title='IRC DCC auto leacher'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-4226529734768935398</id><published>2009-02-23T15:53:00.008+02:00</published><updated>2009-10-08T14:17:58.636+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c'/><category scheme='http://www.blogger.com/atom/ns#' term='patch'/><category scheme='http://www.blogger.com/atom/ns#' term='mplayer'/><title type='text'>mplayer screenshot name</title><content type='html'>About a year ago a friend of mine complained of the filename style for screenshots by mplayer. He said it should contain the original file name and a timestamp (so one could easily watch screenshoted scene once again). I hacked mplayer but unfortunately the patch didn't get accepted to the mainline.&lt;br /&gt;&lt;br /&gt;Here is the patch anyway:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Index: libmpcodecs/vf_screenshot.c&lt;br /&gt;===================================================================&lt;br /&gt;--- libmpcodecs/vf_screenshot.c (revision 28346)&lt;br /&gt;+++ libmpcodecs/vf_screenshot.c (working copy)&lt;br /&gt;@@ -22,9 +22,19 @@&lt;br /&gt; #include "libswscale/swscale.h"&lt;br /&gt; #include "libavcodec/avcodec.h"&lt;br /&gt; &lt;br /&gt;+#include "m_option.h"&lt;br /&gt;+#include "m_struct.h"&lt;br /&gt;+&lt;br /&gt;+#include "mplayer.h"&lt;br /&gt;+&lt;br /&gt;+#define SHOT_FNAME_LENGTH 102&lt;br /&gt;+#define TIMESTAMP_LENGTH 20&lt;br /&gt;+&lt;br /&gt; struct vf_priv_s {&lt;br /&gt;     int frameno;&lt;br /&gt;-    char fname[102];&lt;br /&gt;+    char *fname;&lt;br /&gt;+    char *basename;&lt;br /&gt;+    int style;&lt;br /&gt;     /// shot stores current screenshot mode:&lt;br /&gt;     /// 0: don't take screenshots&lt;br /&gt;     /// 1: take single screenshot, reset to 0 afterwards&lt;br /&gt;@@ -36,7 +46,7 @@&lt;br /&gt;     AVCodecContext *avctx;&lt;br /&gt;     uint8_t *outbuffer;&lt;br /&gt;     int outbuffer_size;&lt;br /&gt;-};&lt;br /&gt;+} vf_priv_dflt;&lt;br /&gt; &lt;br /&gt; //===========================================================================//&lt;br /&gt; &lt;br /&gt;@@ -92,14 +102,65 @@&lt;br /&gt;     else return 0;&lt;br /&gt; }&lt;br /&gt; &lt;br /&gt;-static void gen_fname(struct vf_priv_s* priv)&lt;br /&gt;+static void gen_fname(struct vf_priv_s* priv, double pts)&lt;br /&gt; {&lt;br /&gt;-    do {&lt;br /&gt;-        snprintf (priv-&gt;fname, 100, "shot%04d.png", ++priv-&gt;frameno);&lt;br /&gt;-    } while (fexists(priv-&gt;fname) &amp;&amp; priv-&gt;frameno &lt; 100000);&lt;br /&gt;-    if (fexists(priv-&gt;fname)) {&lt;br /&gt;-        priv-&gt;fname[0] = '\0';&lt;br /&gt;-        return;&lt;br /&gt;+    char *base = "shot";&lt;br /&gt;+    char tstamp[TIMESTAMP_LENGTH];&lt;br /&gt;+    char ts_sep;&lt;br /&gt;+    uint32_t hour = (uint32_t) (pts/3600);&lt;br /&gt;+    uint32_t min  = (uint32_t) (pts/60) % 60;&lt;br /&gt;+    uint32_t sec  = (uint32_t) pts % 60;&lt;br /&gt;+    uint32_t msec = (uint32_t) (pts*100) % 100;&lt;br /&gt;+    size_t n;&lt;br /&gt;+&lt;br /&gt;+    switch (priv-&gt;style) {&lt;br /&gt;+    case 1:&lt;br /&gt;+    case 2:&lt;br /&gt;+        ts_sep = (priv-&gt;style == 1) ? ':' : '-';&lt;br /&gt;+        /* TIMESTAMP_LENGTH is enough for 2^32 seconds */&lt;br /&gt;+        snprintf(tstamp, TIMESTAMP_LENGTH,&lt;br /&gt;+                  "%02" PRIu32 "%c%02" PRIu32 "%c%02" PRIu32 ".%02" PRIu32,&lt;br /&gt;+                   hour, ts_sep, min, ts_sep, sec, msec);&lt;br /&gt;+&lt;br /&gt;+        if (priv-&gt;basename)&lt;br /&gt;+           base = priv-&gt;basename;&lt;br /&gt;+        else {&lt;br /&gt;+            if (strstr(filename, "://"))&lt;br /&gt;+                base = "shot";&lt;br /&gt;+            else {&lt;br /&gt;+                base = strrchr(filename, '/');&lt;br /&gt;+                if (base == NULL) &lt;br /&gt;+                    base = filename;&lt;br /&gt;+                else&lt;br /&gt;+                    base++;&lt;br /&gt;+            }&lt;br /&gt;+        }&lt;br /&gt;+&lt;br /&gt;+        /* 8 is length of ".[].png" plus '\0' */&lt;br /&gt;+        n = strlen(base) + strlen(tstamp) + 8;&lt;br /&gt;+        priv-&gt;fname = malloc(n);&lt;br /&gt;+        if (!priv-&gt;fname) {&lt;br /&gt;+            mp_msg(MSGT_VFILTER,MSGL_ERR,"Unable to allocate memory in vf_screenshot.c\n");&lt;br /&gt;+            return;&lt;br /&gt;+        }&lt;br /&gt;+        snprintf(priv-&gt;fname, n, "%s.[%s].png", base, tstamp);&lt;br /&gt;+        break;&lt;br /&gt;+    default:&lt;br /&gt;+    case 0:&lt;br /&gt;+        priv-&gt;fname = malloc(SHOT_FNAME_LENGTH);&lt;br /&gt;+        if (!priv-&gt;fname) {&lt;br /&gt;+            mp_msg(MSGT_VFILTER,MSGL_ERR,"Unable to allocate memory in vf_screenshot.c\n");&lt;br /&gt;+            return;&lt;br /&gt;+        }&lt;br /&gt;+        do {&lt;br /&gt;+            snprintf (priv-&gt;fname, SHOT_FNAME_LENGTH, "shot%04d.png", ++priv-&gt;frameno);&lt;br /&gt;+        } while (fexists(priv-&gt;fname) &amp;&amp; priv-&gt;frameno &lt; 100000);&lt;br /&gt;+        if (fexists(priv-&gt;fname)) {&lt;br /&gt;+            free(priv-&gt;fname);&lt;br /&gt;+            priv-&gt;fname = NULL;&lt;br /&gt;+            return;&lt;br /&gt;+        }&lt;br /&gt;+        break;&lt;br /&gt;     }&lt;br /&gt; &lt;br /&gt;     mp_msg(MSGT_VFILTER,MSGL_INFO,"*** screenshot '%s' ***\n",priv-&gt;fname);&lt;br /&gt;@@ -196,11 +257,13 @@&lt;br /&gt;     if(vf-&gt;priv-&gt;shot) {&lt;br /&gt;         if (vf-&gt;priv-&gt;shot==1)&lt;br /&gt;             vf-&gt;priv-&gt;shot=0;&lt;br /&gt;-        gen_fname(vf-&gt;priv);&lt;br /&gt;-        if (vf-&gt;priv-&gt;fname[0]) {&lt;br /&gt;+        gen_fname(vf-&gt;priv, pts);&lt;br /&gt;+        if (vf-&gt;priv-&gt;fname) {&lt;br /&gt;             if (!vf-&gt;priv-&gt;store_slices)&lt;br /&gt;               scale_image(vf-&gt;priv, dmpi);&lt;br /&gt;             write_png(vf-&gt;priv);&lt;br /&gt;+            free(vf-&gt;priv-&gt;fname);&lt;br /&gt;+            vf-&gt;priv-&gt;fname = NULL;&lt;br /&gt;         }&lt;br /&gt;         vf-&gt;priv-&gt;store_slices = 0;&lt;br /&gt;     }&lt;br /&gt;@@ -263,6 +326,8 @@&lt;br /&gt;     av_freep(&amp;vf-&gt;priv-&gt;avctx);&lt;br /&gt;     if(vf-&gt;priv-&gt;ctx) sws_freeContext(vf-&gt;priv-&gt;ctx);&lt;br /&gt;     if (vf-&gt;priv-&gt;buffer) free(vf-&gt;priv-&gt;buffer);&lt;br /&gt;+    if (vf-&gt;priv-&gt;fname) free(vf-&gt;priv-&gt;fname);&lt;br /&gt;+    if (vf-&gt;priv-&gt;basename) free(vf-&gt;priv-&gt;basename);&lt;br /&gt;     free(vf-&gt;priv-&gt;outbuffer);&lt;br /&gt;     free(vf-&gt;priv);&lt;br /&gt; }&lt;br /&gt;@@ -278,7 +343,7 @@&lt;br /&gt;     vf-&gt;draw_slice=draw_slice;&lt;br /&gt;     vf-&gt;get_image=get_image;&lt;br /&gt;     vf-&gt;uninit=uninit;&lt;br /&gt;-    vf-&gt;priv=malloc(sizeof(struct vf_priv_s));&lt;br /&gt;+    if (!vf-&gt;priv) vf-&gt;priv = calloc(1, sizeof(struct vf_priv_s));&lt;br /&gt;     vf-&gt;priv-&gt;frameno=0;&lt;br /&gt;     vf-&gt;priv-&gt;shot=0;&lt;br /&gt;     vf-&gt;priv-&gt;store_slices=0;&lt;br /&gt;@@ -294,14 +359,27 @@&lt;br /&gt;     return 1;&lt;br /&gt; }&lt;br /&gt; &lt;br /&gt;+#define ST_OFF(f) M_ST_OFF(struct vf_priv_s,f)&lt;br /&gt;+static const m_option_t vf_opts_fields[] = {&lt;br /&gt;+    {"style", ST_OFF(style), CONF_TYPE_INT, 0, 0, 2, NULL},&lt;br /&gt;+    {"basename", ST_OFF(basename), CONF_TYPE_STRING, 0, M_OPT_MIN, M_OPT_MAX, NULL},&lt;br /&gt;+    {NULL, NULL, 0, 0, 0, 0, NULL}&lt;br /&gt;+};&lt;br /&gt; &lt;br /&gt;+static const m_struct_t vf_opts = {&lt;br /&gt;+    "screenshot",&lt;br /&gt;+    sizeof(struct vf_priv_s),&lt;br /&gt;+    &amp;vf_priv_dflt,&lt;br /&gt;+    vf_opts_fields&lt;br /&gt;+};&lt;br /&gt;+&lt;br /&gt; const vf_info_t vf_info_screenshot = {&lt;br /&gt;     "screenshot to file",&lt;br /&gt;     "screenshot",&lt;br /&gt;     "A'rpi, Jindrich Makovicka",&lt;br /&gt;     "",&lt;br /&gt;     screenshot_open,&lt;br /&gt;-    NULL&lt;br /&gt;+    &amp;vf_opts&lt;br /&gt; };&lt;br /&gt; &lt;br /&gt; //===========================================================================//&lt;br /&gt;Index: DOCS/man/en/mplayer.1&lt;br /&gt;===================================================================&lt;br /&gt;--- DOCS/man/en/mplayer.1       (revision 28346)&lt;br /&gt;+++ DOCS/man/en/mplayer.1       (working copy)&lt;br /&gt;@@ -7180,15 +7180,28 @@&lt;br /&gt; .RE&lt;br /&gt; .&lt;br /&gt; .TP&lt;br /&gt;-.B screenshot&lt;br /&gt;+.B screenshot[=style:basename]&lt;br /&gt; Allows acquiring screenshots of the movie using slave mode&lt;br /&gt; commands that can be bound to keypresses.&lt;br /&gt; See the slave mode documentation and the INTERACTIVE CONTROL&lt;br /&gt; section for details.&lt;br /&gt;-Files named 'shotNNNN.png' will be saved in the working directory,&lt;br /&gt;-using the first available number \- no files will be overwritten.&lt;br /&gt; The filter has no overhead when not used and accepts an arbitrary&lt;br /&gt; colorspace, so it is safe to add it to the configuration file.&lt;br /&gt;+.RSs&lt;br /&gt;+.IPs &amp;lt;style&amp;gt;&lt;br /&gt;+0: Files named 'shotNNNN.png' will be saved in the working directory,&lt;br /&gt;+using the first available number \- no files will be overwritten.&lt;br /&gt;+.br&lt;br /&gt;+1: Files named 'basename.[hh:mm:ss.ms].png' will be saved in the&lt;br /&gt;+working directory.&lt;br /&gt;+.br&lt;br /&gt;+2: Files named 'basename.[hh-mm-ss.ms].png' will be saved in the&lt;br /&gt;+working directory. Use this if your environment doesn't allow colon&lt;br /&gt;+in filenames.&lt;br /&gt;+.IPs &amp;lt;basename&amp;gt;&lt;br /&gt;+Basename for a screenshot file. If not set and mplayer is playing a file,&lt;br /&gt;+the file name will be used. If not set and mplayer is playing a stream,&lt;br /&gt;+"shot" will be used.&lt;br /&gt; .RE&lt;br /&gt; .&lt;br /&gt; .TP&lt;br /&gt;Index: mencoder.c&lt;br /&gt;===================================================================&lt;br /&gt;--- mencoder.c  (revision 28346)&lt;br /&gt;+++ mencoder.c  (working copy)&lt;br /&gt;@@ -115,6 +115,7 @@&lt;br /&gt; char* audio_lang=NULL;&lt;br /&gt; char* dvdsub_lang=NULL;&lt;br /&gt; static char* spudec_ifo=NULL;&lt;br /&gt;+char* filename=NULL;&lt;br /&gt; &lt;br /&gt; static char** audio_codec_list=NULL;  // override audio codec&lt;br /&gt; static char** video_codec_list=NULL;  // override video codec&lt;br /&gt;@@ -397,7 +398,6 @@&lt;br /&gt; double v_timer_corr=0;&lt;br /&gt; &lt;br /&gt; m_entry_t* filelist = NULL;&lt;br /&gt;-char* filename=NULL;&lt;br /&gt; &lt;br /&gt; int decoded_frameno=0;&lt;br /&gt; int next_frameno=-1;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-4226529734768935398?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/4226529734768935398/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=4226529734768935398' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/4226529734768935398'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/4226529734768935398'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2009/02/mplayer-screenshot-name.html' title='mplayer screenshot name'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-1091103571095662150</id><published>2008-11-18T12:55:00.011+02:00</published><updated>2009-10-08T14:17:42.027+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><category scheme='http://www.blogger.com/atom/ns#' term='pam'/><title type='text'>pam_setquota, pam_kill</title><content type='html'>Being network administrator at MSU dorm two years ago, I made a public ssh server. Users were presented in mysql database on another server, so I used pam_mysql and libnss_mysql, which were already existed, for my public ssh server. I also wanted to set disk quota automatically for each user, but linux setquota(8) doesn't allow you to edit quota for non-existing user. Nor did work any pam_setquotas I found. So I wrote a one myself.&lt;br /&gt;&lt;br /&gt;Of course, I edited limits.conf. But it didn't save from stupid cpu-intensive "while(1);" programs some nasty users had left running. I decided to kill every user's process, if he/she is no longer logged on the system, and wrote pam_kill for that.&lt;br /&gt;&lt;br /&gt;Today, to share my old code, I created two Google Code projects: pam-setquota and pam-kill. You can access them via my Google Code &lt;a href="http://code.google.com/u/ruslan.savchenko"&gt;profile&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-1091103571095662150?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/1091103571095662150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=1091103571095662150' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/1091103571095662150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/1091103571095662150'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2008/11/pamsetquota-pamkill.html' title='pam_setquota, pam_kill'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-1400866335258341813</id><published>2008-11-15T17:06:00.017+02:00</published><updated>2009-10-08T14:17:26.397+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gcc'/><category scheme='http://www.blogger.com/atom/ns#' term='asm'/><category scheme='http://www.blogger.com/atom/ns#' term='crc32'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><title type='text'>crc32</title><content type='html'>Long time passed since my last play with crc32. I wanted to learn and benchmark zlib's implementation as well (it's quite complex in comparison to basic ones), but it seems it will stuck in my todo list for ages. So, I decided to write what I know right now.&lt;br /&gt;&lt;br /&gt;This is a very simple implementation taken from rhash (&lt;a href="http://rhash.sourceforge.net"&gt;rhash.sourceforge.net&lt;/a&gt;). Every article about crc32 describes the code below, it is not something outstanding from rhash.&lt;br /&gt;&lt;pre&gt;unsigned get_crc32(unsigned crcinit, const char *p, int len) {&lt;br /&gt;  register unsigned crc;&lt;br /&gt;  const char *e = p + len;&lt;br /&gt;    &lt;br /&gt;  for(crc=crcinit^0xFFFFFFFF; p&amp;lt;e; p++) &lt;br /&gt;    crc = crcTable[(crc^ *p) &amp; 0xFF] ^ (crc &amp;gt;&amp;gt; 8);&lt;br /&gt;  return( crc^0xFFFFFFFF );&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;And this is an x86 asm code, produced by gcc 4.1.2 from the 'for' loop:&lt;br /&gt;&lt;pre&gt;.L11:&lt;br /&gt;        movsbl  (%ecx,%esi),%eax&lt;br /&gt;        incl    %ecx&lt;br /&gt;        xorl    %edx, %eax&lt;br /&gt;        andl    $255, %eax&lt;br /&gt;        shrl    $8, %edx&lt;br /&gt;        xorl    crcTable(,%eax,4), %edx&lt;br /&gt;        cmpl    %ebx, %ecx&lt;br /&gt;        jne     .L11&lt;/pre&gt;&lt;br /&gt;One of my friends found a bit faster implementation written in inline asm and used it for his hash checker ArXSum. I will not give here his code, because my optimization of rhash code is even faster. ArXSum just gave me an idea to use 8-bit registers to get rid of the andl instruction.&lt;br /&gt;First, I enforced gcc to use 8-bit register. I hoped it would be enough, but it won't.&lt;br /&gt;&lt;pre&gt;unsigned get_crc32(unsigned crcinit, const char *p, int len) {&lt;br /&gt;  register unsigned crc;&lt;br /&gt;  unsigned char m;&lt;br /&gt;  const char *e = p + len;&lt;br /&gt;  m = 0;&lt;br /&gt;    &lt;br /&gt;  for(crc=crcinit^0xFFFFFFFF; p&amp;lt;e; p++) { &lt;br /&gt;    m = (crc^ *p);&lt;br /&gt;    crc = crcTable[m] ^ (crc &amp;gt;&amp;gt; 8);&lt;br /&gt;  }&lt;br /&gt;  return( crc^0xFFFFFFFF );&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;The code produced is&lt;br /&gt;&lt;pre&gt;.L11:&lt;br /&gt;        movzbl  (%ecx,%esi), %eax&lt;br /&gt;        incl    %ecx&lt;br /&gt;        xorb    %dl, %al&lt;br /&gt;        movzbl  %al, %eax&lt;br /&gt;        shrl    $8, %edx&lt;br /&gt;        xorl    crcTable(,%eax,4), %edx&lt;br /&gt;        cmpl    %ebx, %ecx&lt;br /&gt;        jne     .L11&lt;/pre&gt;&lt;br /&gt;Do you see that utterly useless second movzbl? My final optimization was just to remove it and to add "xorl %eax,%eax" before the loop (that would be "m = 0" which had been lost by gcc). Newest version of gcc also produces the same code.&lt;br /&gt;&lt;br /&gt;I still want to carefully look into zlib one day and to compare their high-level optimization with mine. I will eventually post about it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-1400866335258341813?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/1400866335258341813/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=1400866335258341813' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/1400866335258341813'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/1400866335258341813'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2008/11/crc32.html' title='crc32'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-2209265520590444613</id><published>2007-10-29T22:43:00.005+02:00</published><updated>2009-10-08T14:17:09.882+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gcc'/><category scheme='http://www.blogger.com/atom/ns#' term='bug'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><title type='text'>gcc optimizer</title><content type='html'>I found that the following code causes segfault when compiled with -O2. Compiler:&lt;br /&gt;gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;&lt;br /&gt;struct prime {&lt;br /&gt;        struct prime *next;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;int main(){&lt;br /&gt;        struct prime primes = {.next=NULL};&lt;br /&gt;        struct prime *p = &amp;primes;&lt;br /&gt;&lt;br /&gt;        while (p-&gt;next != NULL){&lt;br /&gt;                p = p-&gt;next;&lt;br /&gt;        }&lt;br /&gt;        p-&gt;next = (struct prime*) malloc (sizeof(struct prime));&lt;br /&gt;        return 0;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-2209265520590444613?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://savrus.blogspot.com/feeds/2209265520590444613/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8099934077984653989&amp;postID=2209265520590444613' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/2209265520590444613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/2209265520590444613'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2007/10/gcc-optimizer.html' title='gcc optimizer'/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8099934077984653989.post-230180678306792863</id><published>2007-10-14T21:51:00.000+03:00</published><updated>2007-10-14T23:46:00.521+03:00</updated><title type='text'></title><content type='html'>I've just started a new blog. I'll try to keep it clean and for my own accomplishments only.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8099934077984653989-230180678306792863?l=savrus.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/230180678306792863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8099934077984653989/posts/default/230180678306792863'/><link rel='alternate' type='text/html' href='http://savrus.blogspot.com/2007/10/test.html' title=''/><author><name>savrus</name><uri>http://www.blogger.com/profile/15866109971001922680</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://3.bp.blogspot.com/_a0-3CVBSqgA/Sfd_lpyrebI/AAAAAAAAAG4/1B-3_qAgrdM/S220/IMG_4347.JPG'/></author></entry></feed>
