sgrep 'start or "\n" .. (end or "\n") containing "Hello World"'
Same query using sample macros:
sgrep 'LINE containing "Hello World"'
sgrep '"\nFrom: " .. "\n" extracting ("\n" in "\nFrom: ")'
Same query using sample macros:
sgrep 'MAIL_FROM'
NEWS_FROM in (NEWS_HEADER containing (NEWS_SUBJ containing ("sgrep" or "linux")))
Same query with macros expanded:
(("\nFrom: " in ( ( start or (("\n\nFrom ") extracting ("\n\n" in "\n\nFrom "))) .. ("\n" in "\n\n")) extracting ("\n" in "\nFrom: ") .. ( "\n" or end ))) in (( ( start or (("\n\nFrom ") extracting ("\n\n" in "\n\nFrom "))) .. ("\n" in "\n\n")) containing ((("\nSubject: " in ( ( start or (("\n\nFrom ") extracting ("\n\n" in "\n\nFrom "))) .. ("\n" in "\n\n")) extracting ("\n" in "\nSubject: ") .. ( "\n" or end ))) containing ("sgrep" or "linux")))
Now you see that macros are very useful :)
sgrep -o"%f:%r\n" '(HTML_TITLE in (start .. end containing (HTML_HREF containing "www.cs.helsinki.fi")))' *.html
Same query with macros expanded:
((( ( "<TITLE>" or ( ("<TITLE " or "<TITLE\t" or "<TITLE\n") .. ">")) .. ( "</TITLE>" ) )) in (start .. end containing ((( (( " " or "\t" or "\n" or "\r") __ ">" in (inner(("<" not in ("</" or "<!" or "<?" )) .. ">" ) extracting (("<" not in ("</" or "<!" or "<?" )) __ (( " " or "\t" or "\n" or "\r") or ">" ) in inner(("<" not in ("</" or "<!" or "<?" )) .. ">" ) ))) containing "HREF=" ._ (( " " or "\t" or "\n" or "\r") or ">"))) containing "www.cs.helsinki.fi")))
sgrep '("<TITLE>" .. "</TITLE>") or ("<H1>" .. "</H1>") or \
 ("<H2>" .. "</H2>") or ("<H3>" .. "</H3>") or \
 ("<H4>" .. "</H4>") or ("<H5>" .. "</H5>") or \
 ("<H6>" .. "</H6>") or ("<H7>" .. "</H7>") or \
 ("<H8>" .. "</H8>") or ("<H9>" .. "</H9>")'
Same query using example macros. This query is more exact, since it uses the SGML macros, which can handle tags that contain attributes.
sgrep 'HTML_TITLE or HTML_H1 or HTML_H3 or HTML_H4 or HTML_H5 \
 or HTML_H6 or NAMED_ELEMS(H7) or NAMED_ELEMS(H8) \
 or NAMED_ELEMS(H9)'
Previous query with macros expanded:
(( ( "<TITLE>" or ( ("<TITLE " or "<TITLE\t" or "<TITLE\n") .. ">")) .. ( "</TITLE>" ) )) or (( ( "<H1>" or ( ("<H1 " or "<H1\t" or "<H1\n") .. ">")) .. ( "</H1>" ) )) or (( ( "<H3>" or ( ("<H3 " or "<H3\t" or "<H3\n") .. ">")) .. ( "</H3>" ) )) or (( ( "<H4>" or ( ("<H4 " or "<H4\t" or "<H4\n") .. ">")) .. ( "</H4>" ) )) or (( ( "<H5>" or ( ("<H5 " or "<H5\t" or "<H5\n") .. ">")) .. ( "</H5>" ) )) or (( ( "<H6>" or ( ("<H6 " or "<H6\t" or "<H6\n") .. ">")) .. ( "</H6>" ) )) or ( ( "<H7>" or ( ("<H7 " or "<H7\t" or "<H7\n") .. ">")) .. ( "</H7>" ) ) or ( ( "<H8>" or ( ("<H8 " or "<H8\t" or "<H8\n") .. ">")) .. ( "</H8>" ) ) or ( ( "<H9>" or ( ("<H9 " or "<H9\t" or "<H9\n") .. ">")) .. ( "</H9>" ) )
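The difference between the plain phrases and the attribute-aware SGML macros can be illustrated outside sgrep. The following is only a rough sketch in Python, using regular expressions as a stand-in for sgrep regions; the helper names plain_tag and attr_tag are made up for this illustration and are not part of sgrep, and the regular expressions only approximate how the ".." operator pairs start and end regions.

```python
import re

def plain_tag(name):
    # Mirrors the literal query ("<H1>" .. "</H1>"): the start tag
    # must be exactly "<H1>", so attributes are not allowed.
    return re.compile(rf"<{name}>.*?</{name}>", re.DOTALL)

def attr_tag(name):
    # Mirrors the expanded macro: the start tag is either "<H1>" or
    # "<H1" followed by space, tab or newline and anything up to ">".
    return re.compile(rf"<{name}(?:>|[ \t\n].*?>).*?</{name}>", re.DOTALL)

doc = '<H1>One</H1><H1 ALIGN="center">Two</H1>'

# The plain pattern misses the heading that carries an attribute.
print(plain_tag("H1").findall(doc))
# The attribute-aware pattern finds both headings.
print(attr_tag("H1").findall(doc))
```

With the sample text above, the plain pattern reports only the first heading, while the attribute-aware pattern reports both.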
sgrep -a -o" " 'NAMED_STAG(FONT) or "</FONT>"'
Same example with macros expanded:
sgrep -a -o" " '( "<FONT>" or ( ("<FONT " or "<FONT\t" or \
 "<FONT\n") .. ">")) or "</FONT>"'
A different solution to the same problem:
sgrep 'start .. end extracting (NAMED_STAG(FONT) or "</FONT>")'
sgrep -c '"<FIG>" .. "</FIG>" in ("<SUBPARA>".."</SUBPARA>")'
Same example using sample macros:
sgrep -c 'NAMED_ELEMS(FIG) in NAMED_ELEMS(SUBPARA) not in NAMED_ELEMS(PARA)'
sgrep 'HTML_TITLE in (start .. end containing (\
 join(12,"SGML") or (HTML_H1 or HTML_H2 containing "SGML") ) )' *.html
sgrep 'MAIL_FROM in (MAIL_MESS containing \
 (MAIL_SUBJ containing "SGML") \
 not containing (MAIL_BODY containing "HTML") \
 containing (MAIL_DATE containing "1996") \
 not containing (MAIL_FROM containing "flame@hot.com") )'
sgrep '"&amp;"'
the script does not convert the query to proper HTML, because the "&amp;" phrase looks like a correct entity and is bypassed. Instead you get HTML which, when rendered by a browser, looks like this:
sgrep '"&"'
Yes, this did bite me. Thanks to Axel Boldt for pointing this out.
Here is the manually edited script anyway:
#!/bin/tcsh
sgrep -a -o"&lt;" '"<" in ("<PRE>"__"</PRE>")' | \
sgrep -a -o"&gt;" '">" in ("<PRE>"__"</PRE>")' | \
sgrep -a -o"&amp;" '"&" in ("<PRE>"__"</PRE>") \
 not in ("&gt;" or "&lt;" or "&amp;")'
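The idea behind the script, escaping "<", ">" and "&" inside PRE sections while leaving text that already looks like an entity alone, can be sketched in Python. This is a minimal illustration under stated assumptions, not a replacement for the script: the names PRE_BODY, TOKEN, ENTITIES and escape_pre are hypothetical, only the upper-case tags <PRE> and </PRE> are recognized, and the whole job is done in one pass instead of three piped sgrep calls.

```python
import re

# Text strictly between <PRE> and </PRE>, roughly what the region
# expression "<PRE>"__"</PRE>" selects in the script.
PRE_BODY = re.compile(r"(?<=<PRE>).*?(?=</PRE>)", re.DOTALL)

# Either something that already looks like an entity, or a bare
# special character. Entities are listed first so they win.
TOKEN = re.compile(r"&lt;|&gt;|&amp;|[<>&]")
ENTITIES = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}

def escape_pre(html):
    """Escape <, > and & inside PRE sections; keep existing entities."""
    def fix_body(match):
        # Bare characters are looked up in ENTITIES; anything that is
        # already an entity falls through unchanged.
        return TOKEN.sub(
            lambda t: ENTITIES.get(t.group(0), t.group(0)),
            match.group(0),
        )
    return PRE_BODY.sub(fix_body, html)

source = '<PRE>sgrep \'"<A>" & "&amp;"\'</PRE>'
print(escape_pre(source))
```

Note that the sketch shares the weakness described above: a query that literally contains "&amp;" is taken for an existing entity and passed through unchanged, so a browser renders it as "&".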
This document is maintained by Jani Jaakkola