My guess is that NLS_SORT is best set at the session level rather than instance-wide.
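
For what it's worth, setting it per session would look roughly like the sketch below. The value BINARY is only an assumed example, since I don't know which sort you actually need; substitute your own.

```sql
-- Assumed example value; replace BINARY with the sort you actually want.
ALTER SESSION SET NLS_SORT = BINARY;

-- Confirm what the current session is using.
SELECT parameter, value
  FROM nls_session_parameters
 WHERE parameter = 'NLS_SORT';
```

The change only lasts for that session, so other users and jobs are not affected while you test.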

A few questions:

How many rows are in the table?

How many rows are returned by the query?

What is your sort_area_size set to?

What performance difference do you see between running the query with ORDER BY and without it, measured over the complete return of the result set (not just the first row)?
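
If it helps, the first and third numbers could be gathered with something like the following. This is just a sketch: my_table is a placeholder for your table, and reading v$parameter assumes your account has access to that view.

```sql
-- my_table is a placeholder name; use the table from your query.
SELECT COUNT(*) FROM my_table;

-- Current sort area setting (requires SELECT access on v$parameter).
SELECT name, value
  FROM v$parameter
 WHERE name = 'sort_area_size';
```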