hduser@nn1:/usr/local/hdpmetaxxcache$
________________________________________________________________
2. Copy the data file from the local directory to HDFS using the following command (see the sketch after this list):
3. Verify the data file on HDFS with the following command:
hadoop fs -ls /project/data
4. Move the data file from the local directory to HDFS with the following command:
hadoop fs -mv file:///data/datafile /user/hduser/data
5. Use the distributed copy (distcp) to copy the large data file to HDFS (see the sketch after this list):
This command will initiate a MapReduce job with a number of mappers to run the
copy task in parallel.
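As rough sketches of the commands for steps 2 and 5 (the file paths and the NameNode address nn1:8020 are assumptions here, not values given above): for step 2, the plain copy could look like
hadoop fs -cp file:///data/datafile /project/data
and for step 5, the distributed copy could look like
hadoop distcp file:///data/datafile hdfs://nn1:8020/user/hduser/data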
________________________________________________________________
To copy multiple files from the local directory to HDFS, we can use the following
command:
hadoop fs -copyFromLocal src1 src2 data
This command will copy two files src1 and src2 from the local directory to the data
directory on HDFS.
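As a quick check (assuming the HDFS home directory of hduser is /user/hduser, so the relative path data resolves to /user/hduser/data), the copied files can be listed with:
hadoop fs -ls data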
Similarly, we can move files from the local directory to HDFS with the following command:
hadoop fs -moveFromLocal src1 src2 data
This command will move the two files, src1 and src2, from the local directory to the data directory on HDFS; its only difference from the previous command is that the local copies are deleted after the move.
Although the distributed copy can be faster than the simple data importing commands, it can put a heavy load on the node where the data resides because of the potentially large number of concurrent data transfer requests. distcp is more useful when copying data from one HDFS location to another. For example:
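Assuming, for illustration, two clusters whose NameNodes are nn1 and nn2, both listening on port 8020, a copy between them could look like:
hadoop distcp hdfs://nn1:8020/user/hduser/data hdfs://nn2:8020/user/hduser/data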