<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>chys&#039;s random notes &#187; assembly</title>
	<atom:link href="http://en.chys.info/tag/assembly/feed/" rel="self" type="application/rss+xml" />
	<link>http://en.chys.info</link>
	<description>Study more problems; Talk less of isms.</description>
	<lastBuildDate>Tue, 27 Dec 2011 11:56:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Intel announces AVX2</title>
		<link>http://en.chys.info/2011/06/intel-announces-avx2/</link>
		<comments>http://en.chys.info/2011/06/intel-announces-avx2/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 06:38:01 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[hardware]]></category>

		<guid isPermaLink="false">http://en.chys.info/?p=898</guid>
		<description><![CDATA[The documentation is available for download. The instruction set war is still there &#8211; Intel still doesn&#8217;t plan to support many XOP features of AMD; also Intel still plans to use FMA3 while AMD uses FMA4. Nevertheless, this time Intel is at least not making the war even worse. In addition to extending most SSE2/SSE3/SSE4 [...]<hr/>
No related posts.]]></description>
			<content:encoded><![CDATA[<p>The documentation is available for <a href="http://software.intel.com/file/36945">download</a>.</p>
<p>The <a href="http://www.agner.org/optimize/blog/read.php?i=25">instruction set war</a> is still there &#8211; Intel still has no plans to support many of AMD&#8217;s <a href="http://en.wikipedia.org/wiki/XOP_instruction_set">XOP</a> features, and Intel still plans to use <a href="http://en.wikipedia.org/wiki/FMA_instruction_set">FMA3</a> while AMD uses FMA4. Nevertheless, this time Intel is at least not making the war worse. In addition to extending most SSE2/SSE3/SSE4 integer instructions to 256 bits (no surprise there), they copied BMI (with an extension called BMI2) and <a href="http://en.wikipedia.org/wiki/CVT16_instruction_set">CVT16</a> from AMD. If I recall correctly, Intel had never copied so many instructions from AMD at once, with the notable exception of x86-64.</p>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2011/06/intel-announces-avx2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>.note.GNU-stack</title>
		<link>http://en.chys.info/2010/12/note-gnu-stack/</link>
		<comments>http://en.chys.info/2010/12/note-gnu-stack/#comments</comments>
		<pubDate>Sat, 25 Dec 2010 07:16:17 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://en.chys.info/?p=870</guid>
		<description><![CDATA[GCC always appends one line to any assembler file (.s) file it generates: .section .note.GNU-stack,"",@progbits Literally, it adds an empty section named .note.GNU-stack to the object file, but it actually serves a hint to the linker* that code in this object file does not require an executable stack. GNU assembler also accepts command-line option “--noexecstack”, [...]<hr/>
Related posts:<ol>
<li><a href='http://en.chys.info/2009/04/dynamic-library-symlinks/' rel='bookmark' title='Dynamic library symlinks'>Dynamic library symlinks</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>GCC always appends one line to every assembly (.s) file it generates:</p>
<blockquote><pre>	.section	.note.GNU-stack,"",@progbits</pre>
</blockquote>
<p>Literally, it adds an empty section named <code>.note.GNU-stack</code> to the object file, but it actually serves as a hint to the linker* that the code in <em>this</em> object file does <em>not</em> require an executable stack. (Writing <code>"x"</code> instead of the empty flags string would request an executable stack instead.) The GNU assembler also accepts the command-line option “<code>--noexecstack</code>”, which has the same effect.</p>
<p>If <em>every</em> object file contains a section of this name, the linker knows the whole program does not need an executable stack, and the resulting executable will run with a non-executable stack if the OS and underlying hardware support it (see also <a href="http://en.wikipedia.org/wiki/NX_bit">NX bit</a>).</p>
<p>Why is this important? In practice, virtually no program needs an executable stack (though attackers may make use of one), but <a href="http://en.wikipedia.org/wiki/Buffer_overflow">buffer overflow attacks</a> frequently inject code into the stack and run it there. A non-executable stack improves security without any runtime overhead.</p>
<p>* GNU linker only.</p>
<hr/><p>Related posts:<ol>
<li><a href='http://en.chys.info/2009/04/dynamic-library-symlinks/' rel='bookmark' title='Dynamic library symlinks'>Dynamic library symlinks</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2010/12/note-gnu-stack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Unaligned access</title>
		<link>http://en.chys.info/2009/12/unaligned-access/</link>
		<comments>http://en.chys.info/2009/12/unaligned-access/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 07:10:24 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[segfault]]></category>
		<category><![CDATA[x86-64]]></category>

		<guid isPermaLink="false">http://en.chys.info/?p=717</guid>
		<description><![CDATA[Misalignment is not an error (only incurs a performance penalty) on x86 processors except for a few new instructions added in recent years. MOVDQA, for example, is an SSE2 instruction requiring alignment on 16-byte boundaries. Textbooks have normally taught us we get a bus error if a CPU which disallows unaligned access actually encounters one. [...]<hr/>
No related posts.]]></description>
			<content:encoded><![CDATA[<p>On x86 processors, <a href="http://en.wikipedia.org/wiki/Data_structure_alignment">misalignment</a> is not an error (it only incurs a performance penalty), except for a few instructions added in recent years. <a href="http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc183.htm">MOVDQA</a>, for example, is an SSE2 instruction that requires its memory operand to be aligned on a 16-byte boundary.</p>
<p>Textbooks usually teach that a CPU which disallows unaligned access raises a <a href="http://en.wikipedia.org/wiki/Bus_error">bus error</a> when it encounters one.</p>
<p>But on Linux, a process that passes a misaligned address to MOVDQA receives <code>SIGSEGV</code> (segmentation fault) instead of <code>SIGBUS</code> (bus error), on both ia32 and x86-64:</p>
<blockquote><pre>
<font color="blue">laptop /tmp $</font> cat a.c
int main ()
{
    char X[32];
    asm ("pxor %%xmm0,%%xmm0; movdqa %%xmm0,%0" : "=m"(X[1]) :: "xmm0");
    return 0;
}
<font color="blue">laptop /tmp $</font> gcc -msse2 a.c
<font color="blue">laptop /tmp $</font> ./a.out
Segmentation fault
<font color="blue">laptop /tmp $</font> kill -l $?
SEGV
</pre>
</blockquote>
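<p>x86-64 (and ia32 since the 80486) can also be configured to disallow any misaligned access*. In that mode, an ordinary instruction touching a misaligned address raises <code>SIGBUS</code> (the CPU signals an alignment-check exception, <code>#AC</code>), but instructions that inherently require alignment (e.g. <code>MOVDQA</code>) still raise <code>SIGSEGV</code> (they signal a general-protection fault, <code>#GP</code>). It&#8217;s not so consistent.</p>
<p>* It is normally disabled. To enable it for user-mode code, set the AC flag (bit 18) in <a href="http://en.wikipedia.org/wiki/FLAGS_register_%28computing%29">EFLAGS</a>; this only takes effect if the OS has set the AM bit in CR0, which Linux does:</p>
<blockquote><p><code>pushf</code><br />
<code>orl $0x40000,(%esp)</code> (or <code>(%rsp)</code> on x86-64)<br />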
<code>popf</code></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2009/12/unaligned-access/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Floating point exception</title>
		<link>http://en.chys.info/2009/12/floating-point-exception/</link>
		<comments>http://en.chys.info/2009/12/floating-point-exception/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 07:28:15 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://en.chys.info/?p=710</guid>
		<description><![CDATA[It is already confusing enough that &#8220;floating point exception&#8221; may mean &#8220;division by zero&#8221; in integral arithmetic. It turns out it can also mean &#8220;overflow&#8221; in some cases, as in the following program (it&#8217;s difficult in C, so I had to use assembly): #include &#60;asm/unistd.h&#62; .code: .globl _start _start: mov $1, %eax mov $1, %edx [...]<hr/>
No related posts.]]></description>
			<content:encoded><![CDATA[<p>It is already confusing enough that &#8220;floating point exception&#8221; may <a href="http://www.digitalmars.com/d/archives/digitalmars/D/learn/Integer_division_by_zero_results_in_floating-point_exception_7160.html">mean &#8220;division by zero&#8221;</a> in integer arithmetic. It turns out it can also mean &#8220;overflow&#8221; in some cases, as in the following program (it&#8217;s difficult to do in C, so I had to use assembly):</p>
<blockquote><pre>#include &lt;asm/unistd.h&gt;
.text
.globl _start
_start:
    mov $1, %eax
    mov $1, %edx
    div %eax
    mov $__NR_exit_group, %eax
    int $0x80
</pre>
</blockquote>
<p>(Type “<code>gcc -m32 -nostdlib a.S</code>” to compile and link.)</p>
<p>In this program, the quotient of EDX:EAX (<code>0x100000001</code>) divided by EAX (<code>0x1</code>) cannot be represented in a 32-bit integer, so the division overflows. x86 CPUs raise the same divide-error exception (<code>#DE</code>, interrupt vector 0) for such an overflow as for division by zero, and Linux delivers it as <code>SIGFPE</code>, which the shell reports as &#8220;Floating point exception&#8221;&#8230;</p>
<hr/>
<p>PS. The same assembly program in Intel style:</p>
<blockquote><pre>.code
.startup
    MOV EAX, 1
    MOV EDX, 1
    DIV EAX

    MOV EAX, __NR_exit_group
    INT 80H</pre>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2009/12/floating-point-exception/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Every C programmer should learn some assembly</title>
		<link>http://en.chys.info/2009/03/every-c-programmer-should-learn-some-assembly/</link>
		<comments>http://en.chys.info/2009/03/every-c-programmer-should-learn-some-assembly/#comments</comments>
		<pubDate>Fri, 13 Mar 2009 07:08:17 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[C/C++]]></category>

		<guid isPermaLink="false">http://blog.chys.info/?p=354</guid>
		<description><![CDATA[I am more convinced of this now. One of the most frequently asked questions in C is the difference between a pointer and an array. A newbie in C often finds it &#8220;mission impossible&#8221; to differentiate between the following four variable types: char p1[][8] = { "Hello", "world" }; char *p2[8] = { "Hello", "world" [...]<hr/>
Related posts:<ol>
<li><a href='http://en.chys.info/2008/10/mov-%edi-%edi/' rel='bookmark' title='mov %edi, %edi'>mov %edi, %edi</a></li>
<li><a href='http://en.chys.info/2009/07/string-literals/' rel='bookmark' title='String literals'>String literals</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I am more convinced of this now.</p>
<p>One of the most frequently asked questions in C is the difference between a pointer and an array. A newbie in C often finds it &#8220;mission impossible&#8221; to tell apart the following four variables:<br />
<code>char p1[][8] = { "Hello", "world" };</code><br />
<code>char *p2[8] = { "Hello", "world" };</code><br />
<code>char (*p3)[8] = p1;</code><br />
<code>char **p4 = p2;</code></p>
<p>And it really is difficult to explain clearly in a few words. However, if one knows some assembly, one can check the assembly listing generated by a compiler, and at least the difference between <code>p1</code> and <code>p2</code> becomes straightforward:</p>
<blockquote><p><code>p1:</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.string "Hello"</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.zero 2</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.string "world"</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.zero 2</code><br />
<code>.LC0:</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.string "Hello"</code><br />
<code>.LC1:</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.string "world"</code><br />
<code>p2:</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.long .LC0</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.long .LC1</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.zero 24</code><br />
<code>p3:</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.long p1</code><br />
<code>p4:</code><br />
<code>&nbsp;&nbsp;&nbsp;&nbsp;.long p2</code></p></blockquote>
<p>(I prefer the <a href="http://sig9.com/articles/att-syntax">AT&#038;T-style</a> assembly)</p>
<p>I feel so lucky that I had learned some of the assembly used on the <a href="http://en.wikipedia.org/wiki/Nintendo_Entertainment_System"><abbr title="Nintendo Entertainment System">NES</abbr></a> before starting C. So for me, &#8220;pointer&#8221; has always been a very natural concept, and clearly different from an array. Many poor freshman undergrads had to begin with C++ without any knowledge of assembly or C, or even of any other language &#8211; I would have gone crazy had I been in such a situation.</p>
<hr/><p>Related posts:<ol>
<li><a href='http://en.chys.info/2008/10/mov-%edi-%edi/' rel='bookmark' title='mov %edi, %edi'>mov %edi, %edi</a></li>
<li><a href='http://en.chys.info/2009/07/string-literals/' rel='bookmark' title='String literals'>String literals</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2009/03/every-c-programmer-should-learn-some-assembly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>mov %edi, %edi</title>
		<link>http://en.chys.info/2008/10/mov-%edi-%edi/</link>
		<comments>http://en.chys.info/2008/10/mov-%edi-%edi/#comments</comments>
		<pubDate>Sat, 11 Oct 2008 08:22:00 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[x86-64]]></category>

		<guid isPermaLink="false">http://blog.chys.info/2008/10/mov-%edi-%edi/</guid>
		<description><![CDATA[Here is a simple C function: long foo (unsigned a, unsigned b){&#160;&#160;&#160; return ((long)b&#60;&#60;32)&#124;a;} Compile it with an x86-64-targeted GCC with proper optimizations enabled (-O2 for example), you get the following instructions (in AT&#38;T-style assembly): foo:&#160;&#160;&#160;&#160;&#160;&#160;&#160; movq&#160;&#160;&#160; %rsi, %rax&#160;&#160;&#160;&#160;&#160;&#160;&#160; mov&#160;&#160;&#160;&#160; %edi, %edi&#160;&#160;&#160;&#160;&#160;&#160;&#160; salq&#160;&#160;&#160; $32, %rax&#160;&#160;&#160;&#160;&#160;&#160;&#160; orq&#160;&#160;&#160;&#160; %rdi, %rax&#160;&#160;&#160;&#160;&#160;&#160;&#160; ret Pay attention to the red line. [...]<hr/>
Related posts:<ol>
<li><a href='http://en.chys.info/2011/06/intel-announces-avx2/' rel='bookmark' title='Intel announces AVX2'>Intel announces AVX2</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Here is a simple C function:</p>
<blockquote><pre>long foo (unsigned a, unsigned b)
{
    return ((long)b&lt;&lt;32)|a;
}</pre></blockquote>
<p>Compile it with an x86-64-targeted GCC with optimizations enabled (<code>-O2</code>, for example), and you get the following instructions (in <a href="http://sig9.com/articles/att-syntax">AT&amp;T-style assembly</a>):</p>
<blockquote><pre>foo:
        movq    %rsi, %rax
<b><span style="color: red;">        mov     %edi, %edi</span></b>
        salq    $32, %rax
        orq     %rdi, %rax
        ret</pre></blockquote>
<p>Pay attention to the red line. Literally, it assigns the value of register <code>edi</code> to register <code>edi</code>. Five years ago, anybody would have agreed that this instruction does nothing, just like a <code>nop</code>. But on an x86-64 system, this is not the case.</p>
<p>In x86-64 assembly, any instruction with a 32-bit general-purpose register as its destination also zeroes the upper 32 bits of the corresponding 64-bit register. Consequently, ‘<code>mov %edi, %edi</code>’ zeroes bits 32 to 63 of register <code>rdi</code> while leaving the lower 32 bits (i.e., register <code>edi</code>) unchanged. This is exactly the zero-extension of <code>a</code> from <code>unsigned</code> to <code>long</code> that the C code asks for: the x86-64 calling convention does not guarantee that the upper half of <code>rdi</code> is zero when a 32-bit argument is passed in it.</p>
<p>One may want to rewrite it with a more intuitive <code>and</code> instruction:</p>
<blockquote><p><code>andq $0xffffffff, %rdi</code></p></blockquote>
<p>But this does NOT assemble! The immediate operand of <code>and</code> is a signed 32-bit value that gets sign-extended to 64 bits, and <code>$0x00000000ffffffff</code> cannot be represented that way; 64-bit immediates are currently allowed only in <code>mov</code> instructions whose destination is a general-purpose register (such a <code>mov</code> is usually written explicitly as <code>movabsq</code>). So if one must use <code>and</code>, one needs a scratch register, something like this:</p>
<blockquote><pre>movl $0xffffffff, %eax
andq %rax, %rdi</pre></blockquote>
<p>Remember the zeroing rule for operations on 32-bit registers: ‘<code>movl $0xffffffff, %eax</code>’ is equivalent to ‘<code>movabsq $0xffffffff, %rax</code>’, only with a shorter encoding (5 bytes instead of 10)&#8230;</p>
<p>X86-64 assembly really is too ugly, at least in this sense&#8230;</p>
<p>Reference<br />
[1] <a href="http://www.x86-64.org/documentation/assembly.html">Gentle Introduction to x86-64 Assembly</a>
<div class="blogger-post-footer">
<hr />
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><img alt="Creative Commons License" style="border-width:0" src="http://creativecommons.org/images/public/somerights20.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>.</div>
<hr/><p>Related posts:<ol>
<li><a href='http://en.chys.info/2011/06/intel-announces-avx2/' rel='bookmark' title='Intel announces AVX2'>Intel announces AVX2</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2008/10/mov-%edi-%edi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

