<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>chys&#039;s random notes &#187; x86-64</title>
	<atom:link href="http://en.chys.info/tag/x86-64/feed/" rel="self" type="application/rss+xml" />
	<link>http://en.chys.info</link>
	<description>Study more problems; Talk less of isms.</description>
	<lastBuildDate>Tue, 27 Dec 2011 11:56:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Unaligned access</title>
		<link>http://en.chys.info/2009/12/unaligned-access/</link>
		<comments>http://en.chys.info/2009/12/unaligned-access/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 07:10:24 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[segfault]]></category>
		<category><![CDATA[x86-64]]></category>

		<guid isPermaLink="false">http://en.chys.info/?p=717</guid>
		<description><![CDATA[On x86 processors, a misaligned memory access is normally not an error (it only incurs a performance penalty), except for a few instructions added in recent years. MOVDQA, for example, is an SSE2 instruction that requires alignment on 16-byte boundaries. Textbooks usually teach that we get a bus error when a CPU that disallows unaligned access encounters one. [...]
]]></description>
			<content:encoded><![CDATA[<p>On x86 processors, a <a href="http://en.wikipedia.org/wiki/Data_structure_alignment">misaligned</a> memory access is normally not an error (it only incurs a performance penalty), except for a few instructions added in recent years. <a href="http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc183.htm">MOVDQA</a>, for example, is an SSE2 instruction that requires alignment on 16-byte boundaries.</p>
<p>Textbooks usually teach that we get a <a href="http://en.wikipedia.org/wiki/Bus_error">bus error</a> when a CPU that disallows unaligned access encounters one.</p>
<p>However, a Linux process that passes a misaligned address to MOVDQA receives <code>SIGSEGV</code> (segmentation fault) rather than <code>SIGBUS</code> (bus error), on both ia32 and x86-64.</p>
<blockquote><pre>
<font color="blue">laptop /tmp $</font> cat a.c
int main ()
{
    char X[32];
    asm ("pxor %%xmm0,%%xmm0; movdqa %%xmm0,%0" : "=m"(X[1]) :: "xmm0");
    return 0;
}
<font color="blue">laptop /tmp $</font> gcc -msse2 a.c
<font color="blue">laptop /tmp $</font> ./a.out
Segmentation fault
<font color="blue">laptop /tmp $</font> kill -l $?
SEGV
</pre>
</blockquote>
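<p>To make the alignment requirement concrete: an address is 16-byte aligned when it is a multiple of 16. A minimal sketch of that check follows. The helper name <code>is_misaligned</code> is made up for illustration, and the cast assumes that a pointer fits in an <code>unsigned long</code>, as it does on Linux ia32 and x86-64.</p>
<blockquote><pre>/* Hypothetical helper: nonzero if p is not a multiple of `boundary` bytes.
   Assumes a pointer fits in unsigned long and boundary is nonzero. */
int is_misaligned(const void *p, unsigned long boundary)
{
    return (unsigned long)p % boundary != 0;
}</pre>
</blockquote>
<p>If the array <code>X</code> in the test program starts on a 16-byte boundary (GCC typically places such an array that way on x86-64, although the program does not request it), the address of <code>X[1]</code> is one byte past the boundary, so <code>is_misaligned</code> would report it as misaligned for a boundary of 16.</p>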
<p>x86-64 (and ia32 beginning with the 80486) supports disallowing any misaligned access*. In that case, a misaligned access by a normal instruction raises <code>SIGBUS</code>, but instructions that inherently require alignment (e.g. <code>MOVDQA</code>) still raise <code>SIGSEGV</code>: the former trigger an alignment-check exception (#AC), while the latter trigger a general-protection exception (#GP). It&#8217;s not so consistent.</p>
<p>* It is normally disabled. To enable it, set the AC bit (bit 18) in <a href="http://en.wikipedia.org/wiki/FLAGS_register_%28computing%29">FLAGS</a>:</p>
<blockquote><p><code>pushf</code><br />
<code>orl $0x40000,(%esp)</code> (or <code>(%rsp)</code> on x86-64)<br />
<code>popf</code></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2009/12/unaligned-access/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux’s vsyscall</title>
		<link>http://en.chys.info/2009/01/linux%e2%80%99s-vsyscall/</link>
		<comments>http://en.chys.info/2009/01/linux%e2%80%99s-vsyscall/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 07:55:00 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[x86-64]]></category>

		<guid isPermaLink="false">http://blog.chys.info/2009/01/linux%e2%80%99s-vsyscall/</guid>
		<description><![CDATA[It is obvious that querying the current time can in no way be done completely in userspace. However, strace does not record any system call used by the time function in Linux x86_64. Let’s disassemble glibc: $ objdump -d /lib64/libc-2.9.so &#124; fgrep -A5 '&#60;time&#62;:' 000000000008a510 &#60;time&#62;: 8a510: 48 83 ec 08 sub $0x8,%rsp 8a514: 48 [...]
]]></description>
			<content:encoded><![CDATA[<p>It seems obvious that querying the current time cannot be done entirely in userspace. However, <code><a href="http://strace.sourceforge.net">strace</a></code> does not record any system call made by the <code><a href="http://linux.die.net/man/2/time">time</a></code> function on Linux x86_64.</p>
<p>Let’s disassemble glibc:<br />
<blockquote>
<pre>$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '&lt;time&gt;:'
000000000008a510 &lt;time&gt;:
   8a510:       48 83 ec 08             sub    $0x8,%rsp
   8a514:       48 c7 c0 00 04 60 ff    mov    $0xffffffffff600400,%rax
   8a51b:       ff d0                   callq  *%rax
   8a51d:       48 83 c4 08             add    $0x8,%rsp
   8a521:       c3                      retq</pre>
</blockquote>
<p>
It seems glibc is redirecting the function call to something fixed at virtual address <code>0xffffffffff600400</code>. But what is there?</p>
<p>It turned out to be the so-called <a href="http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-vsyscalls.html">vsyscall</a> (virtual system call) mechanism, which Linux uses to make certain system calls as fast as possible. A vsyscall is an ordinary function call that never executes the <code>syscall</code> instruction, so <code>strace</code> does not see it.</p>
<p>The vsyscalls are part of the kernel, but the kernel pages containing them are executable with userspace privileges. And they’re mapped to fixed addresses in the <a href="http://en.wikipedia.org/wiki/Virtual_memory">virtual memory</a><sup>[1]</sup>.</p>
<p>There are currently three vsyscalls on Linux x86_64: <code><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/gettimeofday.2.html">gettimeofday</a></code>, <code><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/time.2.html">time</a></code> and <code><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/getcpu.2.html">getcpu</a></code>. Their addresses in virtual memory can be computed with the <code>VSYSCALL_ADDR</code> macro defined in <code>/usr/include/asm/vsyscall.h</code>:<br />
<blockquote>
<pre>#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H

enum vsyscall_num {
    __NR_vgettimeofday,
    __NR_vtime,
    __NR_vgetcpu,
};

#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))

#endif /* _ASM_X86_VSYSCALL_H */</pre>
</blockquote>
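<p>As a quick check of the arithmetic, the sketch below repeats the two relevant macros and wraps them in a small helper. The name <code>vsyscall_addr</code> is hypothetical (it is not part of the header), and the code assumes a 64-bit <code>unsigned long</code>, as on Linux x86_64.</p>
<blockquote><pre>/* Same values as in the header quoted above.
   Assumes unsigned long is 64 bits wide. */
#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024

/* Hypothetical helper: address of the slot for vsyscall number nr. */
unsigned long vsyscall_addr(unsigned long nr)
{
    return VSYSCALL_START + VSYSCALL_SIZE * nr;
}</pre>
</blockquote>
<p>With these definitions, <code>vsyscall_addr(1)</code>, the slot for <code>time</code> (<code>__NR_vtime</code>), evaluates to <code>0xffffffffff600400</code>, which is exactly the address glibc calls in the disassembly above.</p>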
<p>NOTE: We do not need to use vsyscalls explicitly. The corresponding glibc wrappers (for getcpu, it’s <code><a href="http://www.kernel.org/doc/man-pages/online/pages/man3/sched_getcpu.3.html">sched_getcpu</a></code>) already take advantage of them.</p>
<p>[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swap files on disk! It once confused me so much.</p>
<div class="blogger-post-footer">
<hr />
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><img alt="Creative Commons License" style="border-width:0" src="http://creativecommons.org/images/public/somerights20.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2009/01/linux%e2%80%99s-vsyscall/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scilab</title>
		<link>http://en.chys.info/2008/10/scilab/</link>
		<comments>http://en.chys.info/2008/10/scilab/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 04:59:00 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Scilab]]></category>
		<category><![CDATA[x86-64]]></category>

		<guid isPermaLink="false">http://blog.chys.info/2008/10/scilab/</guid>
		<description><![CDATA[On my Core 2 machine running 64-bit Linux, the Scilab build insists on compiling one of the Fortran sources with “gfortran -march=athlon64 -mfpmath=sse -msse2 -m3dnow &#8230;”, regardless of the FFLAGS environment variable. (The other Fortran files do honor FFLAGS, though.) Insisting on “-march=athlon64” does not matter much, but “-m3dnow” is a real problem, since Intel never [...]
]]></description>
			<content:encoded><![CDATA[<p>On my Core 2 machine running 64-bit Linux, the Scilab build insists on compiling one of the Fortran sources with “gfortran -march=athlon64 -mfpmath=sse -msse2 -m3dnow &#8230;”, regardless of the FFLAGS environment variable. (The other Fortran files do honor FFLAGS, though.)</p>
<p>Insisting on “-march=athlon64” does not matter much, but “-m3dnow” is a real problem, since Intel processors have never supported 3DNow!</p>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2008/10/scilab/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>mov %edi, %edi</title>
		<link>http://en.chys.info/2008/10/mov-%edi-%edi/</link>
		<comments>http://en.chys.info/2008/10/mov-%edi-%edi/#comments</comments>
		<pubDate>Sat, 11 Oct 2008 08:22:00 +0000</pubDate>
		<dc:creator>chys</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[x86-64]]></category>

		<guid isPermaLink="false">http://blog.chys.info/2008/10/mov-%edi-%edi/</guid>
		<description><![CDATA[Here is a simple C function: long foo (unsigned a, unsigned b) { return ((long)b&#60;&#60;32)&#124;a; } Compile it with an x86-64-targeted GCC with optimization enabled (-O2, for example), and you get the following instructions (in AT&#38;T-style assembly): foo: movq %rsi, %rax; mov %edi, %edi; salq $32, %rax; orq %rdi, %rax; ret. Pay attention to the mov %edi, %edi line. [...]
]]></description>
			<content:encoded><![CDATA[<p>Here is a simple C function:</p>
<blockquote><p> <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">long foo (unsigned a, unsigned b)</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">{</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">&nbsp;&nbsp;&nbsp; return ((long)b&lt;&lt;32)|a;</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">}</span></p></blockquote>
<p>Compile it with an x86-64-targeted GCC with optimization enabled (<span style="font-family: &quot;Courier New&quot;,Courier,monospace;">-O2</span>, for example), and you get the following instructions (in <a href="http://sig9.com/articles/att-syntax">AT&amp;T-style assembly</a>):</p>
<blockquote><p><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">foo:</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; movq&nbsp;&nbsp;&nbsp; %rsi, %rax</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><b><span style="color: red; font-family: &quot;Courier New&quot;,Courier,monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mov&nbsp;&nbsp;&nbsp;&nbsp; %edi, %edi</span></b><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; salq&nbsp;&nbsp;&nbsp; $32, %rax</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; orq&nbsp;&nbsp;&nbsp;&nbsp; %rdi, %rax</span><br style="font-family: &quot;Courier New&quot;,Courier,monospace;" /><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret</span></p></blockquote>
<p>Pay attention to the red line. Literally, it assigns the value of register <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">edi</span> to register <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">edi</span>. Five years ago, anybody would have agreed that this instruction does nothing, just like a <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">nop</span>. But on an x86-64 system, this is not the case.</p>
<p>In x86-64 assembly, any instruction with a 32-bit register as its destination zeroes the higher 32 bits of the corresponding 64-bit register at the same time. Consequently, the function of ‘<span style="font-family: &quot;Courier New&quot;,Courier,monospace;">mov %edi, %edi</span>’ is zeroing bits 32 to 63 of register <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">rdi</span> while leaving the lower 32 bits (i.e., register <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">edi</span>) unchanged.</p>
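<p>The same effect can be expressed in C. The following is a minimal sketch, assuming an LP64 target such as Linux x86-64 (64-bit <code>unsigned long</code>, 32-bit <code>unsigned int</code>); the function name <code>zero_extend32</code> is made up for illustration.</p>
<blockquote><pre>/* Hypothetical helper: keep bits 0..31 of x and clear bits 32..63.
   Assumes unsigned long is 64 bits and unsigned int is 32 bits. */
unsigned long zero_extend32(unsigned long x)
{
    /* Truncate to 32 bits, then widen again; the upper half becomes zero. */
    return (unsigned int)x;
}</pre>
</blockquote>
<p>An optimizing compiler will typically implement this with a single 32-bit <code>mov</code>, although the exact output depends on the compiler and its flags.</p>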
<p>One may want to rewrite it with a more intuitive <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">and</span> instruction:</p>
<blockquote><p><span style="font-family: &quot;Courier New&quot;,Courier,monospace;">andq $0xffffffff, %rdi</span></p></blockquote>
<p>But this does NOT assemble! The immediate operand of <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">and</span> is limited to 32 bits and is sign-extended to 64 bits, so <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">$0x00000000ffffffff</span> is not representable. 64-bit immediates are allowed only in <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">mov</span> instructions whose destination is a general-purpose register (such a <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">mov</span> is usually written explicitly as <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">movabsq</span>). So if one must use <span style="font-family: &quot;Courier New&quot;,Courier,monospace;">and</span>, one needs something like this:</p>
<blockquote><div style="font-family: &quot;Courier New&quot;,Courier,monospace;">movl $0xffffffff, %eax</div>
<div style="font-family: &quot;Courier New&quot;,Courier,monospace;">andq %rax, %rdi</div>
</blockquote>
<p>Remember the zeroing rule for operations on 32-bit registers: ‘<span style="font-family: &quot;Courier New&quot;,Courier,monospace;">movl $0xffffffff, %eax</span>’ is equivalent to ‘<span style="font-family: &quot;Courier New&quot;,Courier,monospace;">movabsq $0xffffffff, %rax</span>’.</p>
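<p>To see why the immediate form cannot work, the sketch below models how a 32-bit immediate is widened before a 64-bit operation: its sign bit is copied into bits 32 to 63. This is only an illustration written in C (the name <code>sign_extend_imm32</code> is made up), and it assumes a 64-bit <code>unsigned long</code>.</p>
<blockquote><pre>/* Hypothetical model of how a 32-bit immediate is widened to 64 bits.
   Assumes unsigned long is 64 bits and unsigned int is 32 bits. */
unsigned long sign_extend_imm32(unsigned int imm)
{
    unsigned long value = imm;          /* start from a zero-extended copy */
    if (imm >> 31)                      /* is the sign bit (bit 31) set? */
        value |= 0xffffffff00000000UL;  /* replicate it into bits 32..63 */
    return value;
}</pre>
</blockquote>
<p>Here <code>sign_extend_imm32(0xffffffff)</code> yields <code>0xffffffffffffffff</code>, a mask that clears nothing, so no 32-bit immediate can stand for the intended value <code>0x00000000ffffffff</code>.</p>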
<p>X86-64 assembly really is too ugly, at least in this sense&#8230;</p>
<p>Reference<br />
[1] <a href="http://www.x86-64.org/documentation/assembly.html">Gentle Introduction to x86-64 Assembly</a>
]]></content:encoded>
			<wfw:commentRss>http://en.chys.info/2008/10/mov-%edi-%edi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

