I've been looking at some of the ARM discontigmem implementations, and
have come across a nasty bug. To illustrate this, I'm going to take
part of the generic kernel, and use the Alpha implementation to
illustrate the problem we're facing on ARM.
I'm going to argue here that virt_to_page() can, in the discontigmem
case, produce rather a nasty bug when used with non-direct mapped
kernel memory arguments.
In mm/memory.c:remap_pte_range() we have the following code:
page = virt_to_page(__va(phys_addr));
if ((!VALID_PAGE(page)) || PageReserved(page))
set_pte(pte, mk_pte_phys(phys_addr, prot));
Let's look closely at the first line:
page = virt_to_page(__va(phys_addr));
Essentially, what we're trying to do here is convert a physical address
to a struct page pointer.
__va() is defined, on Alpha, to be:
#define __va(x) ((void *)((unsigned long) (x) + PAGE_OFFSET))
so we produce a unique "va" for any physical address that is passed. No
problem so far. Now, lets look at virt_to_page() for the Alpha:
#define virt_to_page(kaddr) \
(ADDR_TO_MAPBASE(kaddr) + LOCAL_MAP_NR(kaddr))
Looks inoccuous enough. However, look closer at ADDR_TO_MAPBASE:
#define ADDR_TO_MAPBASE(kaddr) \
NODE_MEM_MAP(KVADDR_TO_NID((unsigned long)(kaddr)))
#define NODE_MEM_MAP(nid) (NODE_DATA(nid)->node_mem_map)
#define NODE_DATA(n) (&((PLAT_NODE_DATA(n))->gendata))
#define PLAT_NODE_DATA(n) (plat_node_data[(n)])
Ok, so here we get the map base via:
plat_node_data[KVADDR_TO_NID((unsigned long)(kaddr))]->
gendata.node_mem_map
plat_node_data is declared as:
plat_pg_data_t *plat_node_data[MAX_NUMNODES];
Lets look closer at KVADDR_TO_NID() and MAX_NUMNODES:
#define KVADDR_TO_NID(kaddr) PHYSADDR_TO_NID(__pa(kaddr))
#define __pa(x) ((unsigned long) (x) - PAGE_OFFSET)
#define PHYSADDR_TO_NID(pa) ALPHA_PA_TO_NID(pa)
#define ALPHA_PA_TO_NID(pa) ((pa) >> 36) /* 16 nodes max due 43bit kseg */
#define MAX_NUMNODES WILDFIRE_MAX_QBB
#define WILDFIRE_MAX_QBB 8 /* more than 8 requires other mods */
So, we have a maximum of 8 nodes total, and therefore the plat_node_data
array is 8 entries large.
Now, what happens if 'kaddr' is below PAGE_OFFSET (because the user has
opened /dev/mem and mapped some random bit of physical memory space)?
__pa returns a large positive number. We shift this large positive
number left by 36 bits, leaving 28 bits of large positive number, which
is larger than our total 8 nodes.
We use this 28-bit number to index plat_node_data. Whoops.
And now, for the icing on the cake, take a look at Alpha's pte_page()
implementation:
unsigned long kvirt; \
struct page * __xx; \
\
kvirt = (unsigned long)__va(pte_val(x) >> (32-PAGE_SHIFT)); \
__xx = virt_to_page(kvirt); \
\
__xx; \
Someone *please* tell me where I'm wrong. I really want to be wrong,
because I can see the same thing happening (in theory, and one report
in practice from a developer) on a certain ARM platform.
On ARM, however, we have cherry to add here. __va() may alias certain
physical memory addresses to the same virtual memory address, which
makes:
VALID_PAGE(virt_to_page(__va(phys)))
completely nonsensical.
I'll try kicking myself 3 times to see if I wake up from this horrible
dream now. 8)
-- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/