Wednesday 25 September 2013

Linux: Making a system call from kernel space

System call is one of the few ways by which a user can request services from the kernel. You might be using functions like read(), write(), etc. These functions are implemented by the C library as wrappers around the system call, trying to pass minimum amount of work to the kernel. When the library functions makes the system call, the processor switches to a high privilege mode and the kernel does work on behalf of the user in a 'user context'.

Making the actual system call is different from calling a regular function. This is because the processor needs to enter a privileged mode and this cannot be achieved by a mere function call into the kernel space as this would allow the user into the kernel address space, a security bug. Instead, linux makes use of a software interrupt (interrupt number 0x80 on x86). In Linux, a system call is identified by an index called the system call number, which is available at /usr/include/asm/unistd_32.h in the 32 bit linux-3.11.1.

To make a system call, we place the system call number in the %eax register along with the operands in registers %ebx, %ecx and so on. On issuing the software interrupt 0x80, the processor switches to kernel mode and starts executing the appropriate interrupt handler. The interrupt handler, which is called 'syscall_handler' in this case saves the process state of the user program and calls the required system call with the appropriate arguments. After checking the user permission, the system call does its job and returns, saving the return value in the %eax register. It is the responsibility of the system call to check if the addresses provided by the user are valid and the user can access them.

Doing a system call from user space

Here is a piece of code which makes the getpid() system call, making use of inline assembly code.

#include <stdio.h>  // for printf()
#include <sys/syscall.h>  // for SYS_getpid

int _ret, _sys_call_no;

void test_syscall() {
  _sys_call_no = SYS_getpid;
  asm("pushal;"  // save all registers
      "mov _sys_call_no, %eax;"  // put system call no. in %eax
      "int $0x80;"  // sw interrupt
      "mov %eax, _ret;" // get result
      "popal;");  // load saved registers
  printf("%d\n", _ret);
}

int main() {
  test_syscall();
  return 0;
}

The glibc library provides 'syscall', a macro to do the above job for us. Here is our code making use of it.

#include <stdio.h>  // for printf()
#include <sys/syscall.h>  // for SYS_getpid
#include   // for syscall()

void test_syscall() {
  printf("%ld\n", syscall(SYS_getpid));
}

As an interesting example, let's call chdir() system call. I want to point out here that we are passing as argument an address into the user space.

#include <stdio.h>  // for printf()
#include <sys/syscall.h>  // for SYS_getpid

int _ret, _sys_call_no;
char _path[] = "..";  // path for chdir()

void test_syscall() {
  _sys_call_no = SYS_getpid;
  asm("pushal;"
      "mov $12, %eax;"
      "lea _path, %ebx;"
      "int $0x80;"
      "mov %eax, _ret;"
      "popal;");
  printf("%d\n", _ret);
}


Doing a system call from kernel space

Let's call the chdir() system call again, in a similar manner, but this time from the kernel space using a module. (You can read about writing a module in [1])

#include <linux/module.h>  // for moudule_init(), printk()

int _ret = 0;
char _path[] = "..";

static int __init kernel_syscall_init(void) {
  asm("pushal;"
      "mov $12, %eax;"
      "lea _path, %ebx;"
      "int $0x80;"
      "mov %eax, _ret;"
      "popal;");
  printk("%d\n", _ret);
  return 0;
}

static void __exit kernel_syscall_exit(void) {
}

MODULE_LICENSE("GPL");
module_init(kernel_syscall_init);
module_exit(kernel_syscall_exit);
You can automate the build process using this script and read about it in [2].

What happened on running the module? It returned a negative number, an error! But why, it ran correctly earlier?
The system call checks if the provided buffer is a legal address. When called from the kernel space, an address that lies in the user address range (0-3 GB for standard kernel configuration) is considered valid, and an address that lies in kernel address space (3 GB-4 GB) is not. When the system call is invoked from kernel space, we must prevent the usual check to fail, because the virtual address of our destination buffer will be in kernel space, above the 3 GB mark. Here is how it is done.

#include <asm/uaccess.h>  // for get_fs(), set_fs()
#include <linux/module.h>  // for moudule_init(), printk()

int _ret = 0;
char _path[] = "..";

static int __init kernel_syscall_init(void) {
  mm_segment_t saved_fs = get_fs();
  set_fs(get_ds());
  asm("pushal;"
      "mov $12, %eax;"
      "lea _path, %ebx;"
      "int $0x80;"
      "mov %eax, _ret;"
      "popal;");
  printk("%d\n", _ret);
  set_fs(saved_fs);
  return 0;
}

static void __exit kernel_syscall_exit(void) {
}

MODULE_LICENSE("GPL");
module_init(kernel_syscall_init);
module_exit(kernel_syscall_exit);
Working now. :)

What is this get_fs(), get_ds(), set_fs() stuff?

The field addr_limit is used to describe the highest legal virtual address and get_fs() and set_fs() are macros which act as modifiers. The limit that must be used is returned by get_ds(). It is important to restore addr_limit back to its original value, or else the user space program calling this code might retain it.

I know that it is hard to believe that modifiers for addr_limit would be called get_fs() and set_fs(). Here is the code to prove that I am right!

#include <asm/uaccess.h>  // for get_fs(), set_fs()
#include <linux/module.h>  // for moudule_init(), printk()

int _fs_register, _ds_register;

static int __init kernel_syscall_init(void) {
  // Initial values.
  mm_segment_t addr_limit = current_thread_info()->addr_limit;
  mm_segment_t fs_macro_value = get_fs();
  asm("mov %fs, _fs_register;"  // FS register to _fs_register
      "mov %ds, _ds_register;");  // DS register to _ds_register
  printk("addr_limit = %lu\n", addr_limit.seg);
  printk("fs_macro_value = %lu\n", fs_macro_value.seg);
  printk("_fs_register = %d\n", _fs_register);
  printk("_ds_register = %d\n", _ds_register);

  // Values after set_fs(get_ds()).
  mm_segment_t saved_fs = get_fs();
  set_fs(get_ds());
  addr_limit = current_thread_info()->addr_limit;
  fs_macro_value = get_fs();
  asm("mov %fs, _fs_register;"  // FS register to _fs_register
      "mov %ds, _ds_register;");  // DS register to _ds_register
  printk("addr_limit = %lu\n", addr_limit.seg);
  printk("fs_macro_value = %lu\n", fs_macro_value.seg);
  printk("_fs_register = %d\n", _fs_register);
  printk("_ds_register = %d\n", _ds_register);

  // Restored values.
  set_fs(saved_fs);
  addr_limit = current_thread_info()->addr_limit;
  fs_macro_value = get_fs();
  asm("mov %fs, _fs_register;"  // FS register to _fs_register
      "mov %ds, _ds_register;");  // DS register to _ds_register
  printk("addr_limit = %lu\n", addr_limit.seg);
  printk("fs_macro_value = %lu\n", fs_macro_value.seg);
  printk("_fs_register = %d\n", _fs_register);
  printk("_ds_register = %d\n", _ds_register);

  return 0;
}

static void __exit kernel_syscall_exit(void) {
}

MODULE_LICENSE("GPL");
module_init(kernel_syscall_init);
module_exit(kernel_syscall_exit);


[623233.082407] addr_limit = 3221225472
[623233.082432] fs_macro_value = 3221225472
[623233.082458] _fs_register = 216
[623233.082480] _ds_register = 123
[623233.082507] addr_limit = 4294967295
[623233.082526] fs_macro_value = 4294967295
[623233.082546] _fs_register = 216
[623233.082565] _ds_register = 123
[623233.082589] addr_limit = 3221225472
[623233.082607] fs_macro_value = 3221225472
[623233.082626] _fs_register = 216
[623233.082645] _ds_register = 123
It turns out that FS which is an extra segment register in x86 was used by linux earlier to keep the address of the user space data segment when running in kernel mode. This has since been done away with and the name only appears as legacy code in a few macros.

This is not a good thing to do

The Linux developers disapprove this, so they have not made it convenient for us to create modules that would bypass the expected privilege-restrictions. This is to keep the kernel cleaner and secure.

[1] http://pointer-overloading.blogspot.com/2013/09/a-hello-world-linux-kernel-module.html
[2] http://pointer-overloading.blogspot.com/2013/09/linux-python-script-to-build-linux.html

Tuesday 24 September 2013

Algorithms: Rotating a one-dimensional array of n elements by i positions

This post is now being maintained at my new blog.

I encountered this problem while reading [1] and I have to admit, it takes some effort to get to the bottom of it. Here is the precise problem specification.

"Rotate a one-dimensional vector of n elements left by i positions. For instance, with n = 8 and i = 3, the vector abcdefgh is rotated to defghabc."
Think about it. Take your time...

Solution 1: Using an additional buffer

The first solution which I could think, which was one of the two embarrassingly naive solutions I could think over the top of my head, is making use of an additional buffer. We copy the first i bits to the additional temporary buffer, move the next n - i bits to the beginning of our array and then copy back the i bits from the temporary buffer to our original array.

template<class T>
void rotate_array(std::vector<T> *array, int i) {
  int n = array->size();
  i = i % n;
  std::vector<T> *temp_array = new std::vector<T>(array->begin(), array->begin() + i);
  std::copy(array->begin() + i, array->end(), array->begin());
  std::copy(temp_array->begin(), temp_array->end(), array->begin() + n - i);
  delete temp_array;
}
You can see that solution one is fast, it takes O(n) time which is optimal. On the downside, it requires an additional buffer of size min(i, n - i) which is nothing but O(n). Clearly not a good solution.

Solution 2: Rotating the array by 1, i times

The second solution which I could think is by repeatedly rotating the array left by 1, i times. Rotating left by 1 is quite straight forward. Save the first element, swap each pair of consecutive elements and put the saved element at the end. Here is the code.

template<class T>
void rotate_array_by_one(std::vector<T> *array) {
  for (int i = 0; i < array->size() - 1; i++) {
    std::swap(array->at(i), array->at(i + 1));
  }
}

template<class T>
void rotate_array(std::vector<T> *array, int i) {
  int n = array->size();
  i = i % n;
  for (int j = 0; j < i; j++) {
    rotate_array_by_one<T>(array);
  }
}
This solution trades time for space and ends up having a space complexity of O(n) but a time complexity of O(n2). What next?

Solution 3: Rotating the array by i, using a special trick

Let's think why is it not possible to directly swap across i elements, similar to solution two, just across i elements instead of neighbors. We save the first elements, swap the first element with ith element, the ith element with the 2ith element and so on modulo n. We cycle through the array until we reach where we started. Did we cycle through all the elements? If yes, well and good. If not we have to repeat the process with 'untouched' elements. How do we keep a track?
The greatest common divisor of i and n is the number of cycles to be permuted. In terms of modern algebra, it is the number of cosets in the permutation group induced by the rotation. [1]

int gcd(int a, int b) {
  while (a != 0) {
    int c = a;
    a = b % a;
    b = c;
  }
  return b;
}

template<class T>
void rotate_array(std::vector<T> *array, int i) {
  int n = array->size();
  i = i % n;
  int gcd_n_i = gcd(i, n);
  for (int j = 0; j < gcd_n_i; j++) {
    for (int k = j; (k + i) % n != j; k = (k + i) % n) {
      std::swap(array->at(k), array->at((k + i) % n));
    }
  }
}
We have done it! We have a solution which is fast (O(n)) and takes up no extra space. But when I think about it this solution is by no means straight forward. Apart from the fact that a knowledge of number theory is required for this solution, it requires some subtle programming difficult to get right on the first attempt.

Solution 4: The winner, array reversal!

Have a look at the solution, it took less than a minute to type out and was correct in the first attempt, it speaks for itself.

template<class T>
void rotate_array(std::vector<T> *array, int i) {
  int n = array->size();
  i = i % n;
  std::reverse(array->begin(), array->begin() + i);
  std::reverse(array->begin() + i, array->end());
  std::reverse(array->begin(), array->end());
}
I just loved this solution.
Does it really need an explanation?

{1] Programming Pearls, Second Edition, by Jon Bentley.

Monday 23 September 2013

Linux, Python: A script to build linux kernel modules

In my previous posts ([1], [2]), I have been vocal about how irritating the use of the linux kernel build system(kbuild [3]) is. Don't get me wrong, it's a great and necessary tool. It scales up pretty well, as the whole linux kernel can be compiled with it.

The problem is that it doesn't contract too well, i.e. I consider it an overkill for writing and testing kernel modules. For this very reason, I decided to write a small script that takes as input linux kernel module (.c) code file(s) and gives you the linux kernel module object (.ko) code. Magic!



[1]: http://pointer-overloading.blogspot.in/2013/09/a-hello-world-linux-kernel-module.html [2]: http://pointer-overloading.blogspot.in/2013/09/linux-creating-entry-in-proc-file.html [3]: https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt

Friday 20 September 2013

Linux: Creating an entry in /proc file system (Part 1: The hello_proc pseudo file)

I am using this column to explain the process of adding an entry to the /proc file system, by writing a linux module. After beginning with some theory, I'll describe a hello_proc kernel module for adding an entry to the /proc file system and end the column with a little introduction of 'sequence' files. I plan to go into sequence files in a later column.
Oh yeah, a warning, some of this column has been deduced by going through the kernel code and may be incorrect. Please feel free to comment and point at any inaccuracies. I make use of the linux kernel 3.11.1, please note that the old way of creating a /proc entry by making use of create_proc_entry() has been done away with in favor of proc_create(). I will be using the new approach.

The /proc file system

Look at the man page, by doing man proc. The man page will tell you that it is pseudo-file system which is used as an interface to kernel data structures. It is commonly mounted at /proc.
We say that the /proc file system contains pseudo-files, because these files, as opposed to normal files, take up no space on the hard drive. The kernel creates an illusion of a file by implementing all (well, almost all) file operations, whose content is generated dynamically by a kernel module. Since the kernel module runs in a privileged mode and in the kernel address space, a module has access to all kernel data structures and effectively the whole system. The /proc file system thus provides an interface between the user space and a kernel module! Isn't it cool!
Let's have a look at a /proc file existing in a linux kernel, by default. Look at /proc/version.

$ cat /proc/version
Linux version 3.11.1 (root@ubuntu) 
(gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) )
#1 SMP Fri Jan 27 23:40:56 PST 2012
To see that this content is not stored in a normal file, have a look at the file size.

$ ls -l /proc/version
-r--r--r-- 1 root root 0 2013-09-18 21:27 /proc/version
The file size is zero, because the contents of the file is being generated by an underlying kernel code. The contents of the file are generated dynamically and afresh and this is the reason that the contents of the file might be different, every time it is read. A few other proc files that you can explore are:

/proc/cpuinfo
/proc/meminfo
/proc/interrupts
....actually you can look at any file inside /proc


A hello_proc module to add a dummy entry into the proc file system

Enough talk, let's get down to business and get our hands dirty. Here is code for a kernel module which creates a file in the /proc file system. This pseudo-file does, well, nothing except output 'Hello proc!\n' when read.

#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static int hello_proc_show(struct seq_file *m, void *v) {
  seq_printf(m, "Hello proc!\n");
  return 0;
}

static int hello_proc_open(struct inode *inode, struct  file *file) {
  return single_open(file, hello_proc_show, NULL);
}

static const struct file_operations hello_proc_fops = {
  .owner = THIS_MODULE,
  .open = hello_proc_open,
  .read = seq_read,
  .llseek = seq_lseek,
  .release = single_release,
};

static int __init hello_proc_init(void) {
  proc_create("hello_proc", 0, NULL, &hello_proc_fops);
  return 0;
}

static void __exit hello_proc_exit(void) {
  remove_proc_entry("hello_proc", NULL);
}

MODULE_LICENSE("GPL");
module_init(hello_proc_init);
module_exit(hello_proc_exit);
Most of the above code is self explanatory. For a more general introduction about modules, have a look at my previous post. Here are a few points: (NOTE: For now, please accept sequence files as data buffers, at face value.)
  • Nothing special about the header files. linux/module.h is required for necessary module macros and functions. linux/proc_fs.h is required for the functions proc_create() and remove_proc_entry() for creating a pseudo-file in the proc file system. linux/seq_file.h is explained later.
  • The hello_proc_show() is what decides the output.
  • The hello_proc_open() is the open callback, called when the proc file is opened. Here, single_open() means that all the data is output at once. More on this when I discuss sequence files.
  • hello_proc_fops is the file operations structure used to define file manipulation callbacks for our pseudo-file
  • The file is actually created using proc_create(file_name, permission_bits, parent_dir, file_operations) where "hello_proc" is name, 0 in the permission_bits parameter means default 0444 permission for file, NULL in parent_dir means that our file is located at /proc.
  • The function remove_proc_entry(name, parent_dir) is used to remove our pseudo-file

Create linux kernel object code as described in my previous post. (I know, even I hate the way modules are compiled!)
Now, install the module, have a look at /proc/hello_proc, remove the module.

$ sudo insmod hello_proc.ko
$ cat /proc/hello_proc
Hello proc!
$ ls -l /proc/hello_proc
-r--r--r-- 1 root root 0 2013-09-18 21:32 /proc/hello_proc
$ sudo rmmod hello_proc
And, we are done!

Wait, what the hell were those sequence files?

Umm, the linux implementation of the proc file system has a limitation. Our module cannot output more than one page of data to the pseudo-file at once. A page is a pre-defined amount of memory, typically 4096 bytes, and is available in the PAGE_SIZE macro. This limitation is bypassed by using sequence files, which provide an interface to print multi-paged outputs easily, as a (you guessed it right!) sequence of pages. The interface is so convenient, that it is also used for single paged output like ours.
I plan to cover multi-paged outputs using sequence files in part 2 of this column, so stay tuned!

Thursday 19 September 2013

LInux: A 'Hello World' linux kernel module

Well, in linux, a kernel module is a piece of kernel object code which can extend the functionality of a running kernel, on demand! Isn't it awesome, being able to get inside a running kernel! Here is a module which is over-simplistic and does nothing, well what else can you expect from 'hello world'.

#include <linux/module.h>

int init(void) {
  printk("Hello world!\n");
  return 0;
}

void exit(void) {
  printk("Goodbye world!\n");
}

MODULE_LICENSE("GPL");
module_init(init);
module_exit(exit);



Compiling a kernel module can be cumbersome. Here is a Makefile to compile the module.

ifneq ($(KERNELRELEASE),)
obj-m := hello_world.o 

else
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default: 
 $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules 
 rm -r -f .tmp_versions *.mod.c .*.cmd *.o *.symvers 

endif


This process can create a spectrum of files (I try to remove most of the useless ones'). The linux kernel module object code is in hello_world.ko.
Install the module with the command...

$ sudo insmod hello_world.ko

Look at the list of all modules currently in the kernel with the command...

$ lsmod

Remove the module with the command...

$ sudo rmmod hello_world

You can have a look at the kernel output, which also contains the output of printk()

$ dmesg | tail -2
[435766.953016] Hello world!
[435766.953017] Goodbye world!

Wednesday 18 September 2013

C++: Template argument deduction to deduce array dimensions

If you call a template function, the template arguments can be omitted if the compiler can determine or deduce the template arguments by the context and usage of the function parameters. Here is an example...

#include <iostream>

template <class T>
void f(T arg1) {
  std::cout << arg1 << std::endl;
}

int main() {
  int a = 1;

  // The template argument is explicitly defined.
  f<int>(a);

  // The template argument is NOT explicitly defined. This usage is also
  // correct!!
  f(a);
  return 0;
}
The compiler is able to deduce the template argument T, when f() is called from the parameter a.
The compiler tries to deduce a template argument by comparing the type of the corresponding template parameter with the type of the argument used in the function call. The two types that the compiler compares (the template parameter and the argument used in the function call) must be of a certain structure in order for template argument deduction to work. A list of these type structures can be found at [1].

Non-type template parameters


The template facility in C++ doesn’t only allow you to parameterize with types (such as the int), but also with values. Non-type template parameters can be of the following types:
  • Integral (or enum) value
  • Pointer to object/function
  • Reference to object/function
  • Pointer to member

Here is example usage of non-type template parameters, obtained from [2].

template<int a[4]> struct A { };
template<int f(int)> struct B { };

int i;
int g(int) { return 0;}

A<&i> x;
B<&g> y;


Array dimension deduction


In C++, an array is passed to a function either as a pointer (int *array), an array of a defined size (int array[M][N]) or an array of an undefined size (int array [][]). In all these cases, the callee gets only the start address of the array and has no information about its size, i.e. its dimension(s). Just as in the case of type deduction, I'll try to make the compiler deduce array dimension using templates, parameterized using int. In the following examples, I'll pass a 2-D array to a function. This method can be extended to an array of any number of dimensions.

#include <iostream>

template<int M, int N>
void func(int arr[M][N]) {
  std::cout << sizeof(arr) << " " << M << " " << N << std::endl;
}

int main() {
  int arr[5][5];
  func<5, 5>(arr);
}
The above method works fine and there is nothing special as the array dimensions are passed as template arguments. The output of the above code is:
4 5 5
This is because sizeof(arr) is the size of the array pointer which is 4 bytes and not the size of the array.

Now, let's try to make the compiler deduce the array dimensions by changing func<5, 5>(arr) to func(arr). The result here is quite unexpected. I get the following compilation error:
error: no matching function for call to ‘func(int [5][5])’


What is happening is that, when the compiler instantiates arr, a conversion to pointer type takes place and the dimensionality information is lost. This is called 'decay'. 'Decay'ing makes it impossible to deduce the array dimensions using template argument deduction.

There is a simple work around. 'Decay' can be prevented by making the function argument a reference to an array, as follows..

#include <iostream>

template<int M, int N>
void func(int (&arr)[M][N]) {
  std::cout << sizeof(arr) << " " << M << " " << N << std::endl;
}

int main() {
  int arr[5][5];
  func(arr);
}
Voila! This works! We have deduced the dimensions of the array passed to the function at compile time.



[1]: http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Ftemplate_argument_deduction.htm [1]: http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fnon-type_template_parameters.htm

Monday 16 September 2013

C++: Why is scanf faster than cin?

Regular competitive programmers and users of problem providing sites are familiar with this issue. Often, the input is very large and the most trivial (ah, not really!) task of reading input from stdin might prove to be a bottleneck. Such problem might be accompanied with 'Warning: large Input/Output data, be careful with certain languages'.

Is it really true? Is there such a big difference in the performance of scanf() versus the performance of cin in C++? Lets put this observation to test...

Let me create a dummy input file, similar to a large input a programming problem may have. A file containing a line with 16 bytes followed by a newline and having 1000000 such lines, making a file of 17 MB should be good enough.

$ yes 1111111111111111 | head -1000000 > /tmp/dummy

Now that I have a large enough input file, let's quickly compare the time taken to read the file from stdin (get the file from disk to stdin using redirection) by using scanf() versus cin.

cin_test.cc to test the performance of cin.

#include <iostream>

int main() {
  char buffer[256];
  while (std::cin >> buffer) {}
  return 0;
}
scanf_test.cc to test the performance of scanf().

#include <cstdlib>
#include <cstdio>

int main() {
  char buffer[256];
  while (scanf("%s", buffer) != EOF) {}
  return 0;
}

Here is what I get on running the two:

$ g++ cin_test.cc -o cin_test
$ time ./cin_test < /tmp/dummy

real    0m2.162s
user    0m1.696s
sys     0m0.332s 

$ g++ scanf_test.cc -o scanf_test
$ time ./scanf_test < /tmp/dummy

real    0m0.426s
user    0m0.248s
sys     0m0.084s

Well, the above results are consistent with our observations. But, when I think about it, I find it a little weird. Think about it! Our comparison shows that scanf() is about 5 times faster than cin. Five times!!
What went wrong with cin? On a high level both of them are wrappers over the read() system call, just syntactic sugar. The only visible difference is that scanf() has to explicitly declare the input type, whereas cin has the redirection operation overloaded using templates. This does not seem like a good enough reason for a performance hit of 5x.

Why is scanf faster than cin?
It turns out that iostream makes use of stdio's buffering system  So, cin wastes time synchronizing itself with the underlying C-library's stdio buffer, so that calls to both scanf()and cin can be interleaved.[1]

The good thing is that libstdc++ provides an option to turn off synchronization of all the iostream standard streams with their corresponding standard C streams using[2]

std::ios::sync_with_stdio(false);

and cin becomes faster than scanf() as it should have been.

Here is cin_test_2.c to show the same:

#include <iostream>

int main() {
  char buffer[256];
  std::ios_base::sync_with_stdio(false);
  while (std::cin >> buffer) {}
  return 0;
}

$ g++ cin_test_2.cc -o cin_test_2
$ time ./cin_test_2 < /tmp/dummy

real    0m0.380s
user    0m0.240s
sys     0m0.028s

And the world is as it should have been!!
Like with all things, there is a caveat here. With synchronization turned off, using cin and scanf() together will result in an undefined mess. 

With synchronization turned off, my benchmark indicates that cin is 8-10% faster than scanf(). This is probably because scanf() interprets the format arguments at runtime and makes use of variable number of arguments, whereas cin does this at compile time.

Now I am wondering, how fast can it be done?

$ time cat /tmp/dummy > /dev/null

real    0m0.185s
user    0m0.000s
sys     0m0.092s