
Setting Up a Single-Node Hadoop 2.6.0 + Spark 1.4.0 Environment in a Virtual Machine on Windows 7

[Date: 2015-08-02] Source: Linux社区  Author: simplestupid

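For installing and configuring Hadoop, see my earlier article on setting up a pseudo-distributed Hadoop 2.6.0 environment in a virtual machine on Windows 7: http://www.linuxidc.com/Linux/2015-08/120942.htm

This article describes how to set up a single-node Spark 1.4.0 environment on top of Hadoop 2.6.0.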

1. Software preparation

scala-2.11.7.tgz

spark-1.4.0-bin-hadoop2.6.tgz

Both can be downloaded from their official websites.

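2. Installing and configuring Scala

Scala only needs to be extracted from scala-2.11.7.tgz. I extracted it to /home/vm/tools/scala and then added the following environment variables to ~/.bash_profile.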

#scala

export SCALA_HOME=/home/vm/tools/scala

export PATH=$SCALA_HOME/bin:$PATH

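Run source ~/.bash_profile to apply the changes.

To verify that Scala was installed successfully, check that the scala command now resolves from the shell. Running scala with no arguments starts the interactive interpreter.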
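The verification step can also be scripted. The following is a minimal sketch rather than the author's own procedure: check_on_path is a hypothetical helper name introduced here, and the script assumes only that the scala launcher was added to PATH as configured above.

```shell
#!/bin/sh
# Report whether a command is reachable through PATH.
# check_on_path is a hypothetical helper, not part of Scala or Spark.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
        return 0
    fi
    echo "$1: not found on PATH"
    return 1
}

# Print the Scala version banner only when the launcher is present.
# stderr is merged into stdout so the banner shows up either way.
if check_on_path scala; then
    scala -version 2>&1 || true
fi
```

If the script reports that scala is not found, the usual cause is that ~/.bash_profile has not been sourced in the current shell.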

3. Installing and configuring Spark

Extract spark-1.4.0-bin-hadoop2.6.tgz to /home/vm/tools/spark, then add the following environment variables to ~/.bash_profile.

#spark

export SPARK_HOME=/home/vm/tools/spark

export PATH=$SPARK_HOME/bin:$PATH

Edit $SPARK_HOME/conf/spark-env.sh:

export SPARK_HOME=/home/vm/tools/spark

export SCALA_HOME=/home/vm/tools/scala

export JAVA_HOME=/home/vm/tools/jdk

export SPARK_MASTER_IP=192.168.62.129

export SPARK_WORKER_MEMORY=512m
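A freshly extracted Spark distribution ships only template versions of these files (conf/spark-env.sh.template and conf/spark-defaults.conf.template), so the files to edit may need to be created first. The sketch below shows one way to do that. init_conf is a hypothetical helper name, and the script does nothing unless SPARK_HOME points at a directory that contains conf.

```shell
#!/bin/sh
# Create <conf_dir>/<name> from <conf_dir>/<name>.template unless the
# target already exists. init_conf is a hypothetical helper.
init_conf() {
    conf_dir=$1
    name=$2
    if [ -f "$conf_dir/$name" ]; then
        echo "$name already exists, leaving it untouched"
    elif [ -f "$conf_dir/$name.template" ]; then
        cp "$conf_dir/$name.template" "$conf_dir/$name"
        echo "created $name from template"
    else
        echo "no template for $name in $conf_dir" >&2
        return 1
    fi
}

# Only act when SPARK_HOME points at a real installation.
if [ -d "${SPARK_HOME:-}/conf" ]; then
    init_conf "$SPARK_HOME/conf" spark-env.sh
    init_conf "$SPARK_HOME/conf" spark-defaults.conf
fi
```

Existing files are never overwritten, so the script is safe to rerun after the settings have been edited.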

Edit $SPARK_HOME/conf/spark-defaults.conf:

spark.master spark://192.168.62.129:7077

spark.serializer org.apache.spark.serializer.KryoSerializer
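To confirm what a spark-defaults.conf file actually contains, a property can be read back from the shell. This is a minimal sketch that assumes the simple whitespace-separated "key value" layout shown above. get_conf is a hypothetical helper, and it returns only the first token of the value.

```shell
#!/bin/sh
# Print the value of one key from a spark-defaults.conf style file.
# Lines are "key value" separated by whitespace; comment lines start
# with '#' and therefore never match a key. get_conf is hypothetical.
get_conf() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Demonstration on a scratch file holding the two settings from the article.
demo=$(mktemp)
printf '%s\n' \
    '# example settings' \
    'spark.master spark://192.168.62.129:7077' \
    'spark.serializer org.apache.spark.serializer.KryoSerializer' > "$demo"
get_conf "$demo" spark.master
rm -f "$demo"
```

Against a real installation the call would be get_conf "$SPARK_HOME/conf/spark-defaults.conf" spark.master.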

192.168.62.129 is the IP address of my test machine; substitute the address of your own machine.
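Since the example address appears in both configuration files, a small substitution script can help when adapting them to another machine. The sketch below is illustrative only: replace_ip is a hypothetical helper, and 10.0.0.5 is a made-up address standing in for your own.

```shell
#!/bin/sh
# Rewrite every occurrence of the article's example address in one file.
# The dots are escaped so they match literally. replace_ip is hypothetical.
OLD_IP_PATTERN='192\.168\.62\.129'

replace_ip() {
    file=$1
    new_ip=$2
    tmp="$file.tmp.$$"
    # Write to a temporary file first, then move it into place.
    sed "s/$OLD_IP_PATTERN/$new_ip/g" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demonstration on a scratch copy; 10.0.0.5 is a placeholder address.
demo=$(mktemp)
printf '%s\n' \
    'export SPARK_MASTER_IP=192.168.62.129' \
    'spark.master spark://192.168.62.129:7077' > "$demo"
replace_ip "$demo" 10.0.0.5
cat "$demo"
rm -f "$demo"
```

On a real installation the helper would be applied to $SPARK_HOME/conf/spark-env.sh and $SPARK_HOME/conf/spark-defaults.conf in turn.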

Start Spark:

cd /home/vm/tools/spark/sbin

sh start-all.sh
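After start-all.sh returns, it is worth confirming that both standalone daemons came up. One way is to look for the Master and Worker processes in the output of jps, the process-listing tool that ships with the JDK. The sketch below assumes jps is on PATH. daemons_running is a hypothetical helper that reads jps-style output on standard input.

```shell
#!/bin/sh
# Read jps-style output ("<pid> <ClassName>" per line) on stdin and
# succeed only if both a Master and a Worker line are present.
# daemons_running is a hypothetical helper.
daemons_running() {
    awk '$2 == "Master" { m = 1 }
         $2 == "Worker" { w = 1 }
         END { exit !(m && w) }'
}

# Only query the JVM process list when jps is available.
if command -v jps >/dev/null 2>&1; then
    if jps | daemons_running; then
        echo "Master and Worker are running"
    else
        echo "Master and/or Worker are not running"
    fi
fi
```

If either process is missing, the daemon logs under $SPARK_HOME/logs are the first place to look.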

Test whether Spark was installed successfully:

cd $SPARK_HOME/bin/

./run-example SparkPi

Execution log of SparkPi:

 
vm@Ubuntu:~/tools/spark/bin$ ./run-example SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/29 00:02:32 INFO SparkContext: Running Spark version 1.4.0
15/07/29 00:02:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/29 00:02:34 INFO SecurityManager: Changing view acls to: vm
15/07/29 00:02:34 INFO SecurityManager: Changing modify acls to: vm
15/07/29 00:02:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vm); users with modify permissions: Set(vm)
15/07/29 00:02:35 INFO Slf4jLogger: Slf4jLogger started
15/07/29 00:02:35 INFO Remoting: Starting remoting
15/07/29 00:02:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.62.129:34337]
15/07/29 00:02:36 INFO Utils: Successfully started service 'sparkDriver' on port 34337.
15/07/29 00:02:36 INFO SparkEnv: Registering MapOutputTracker
15/07/29 00:02:36 INFO SparkEnv: Registering BlockManagerMaster
15/07/29 00:02:36 INFO DiskBlockManager: Created local directory at /tmp/spark-78277899-e4c4-4dcc-8c16-f46fce5e657d/blockmgr-be03da6d-31fe-43dd-959c-6cfa4307b269
15/07/29 00:02:36 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/29 00:02:36 INFO HttpFileServer: HTTP File server directory is /tmp/spark-78277899-e4c4-4dcc-8c16-f46fce5e657d/httpd-fdc26a4d-c0b6-4fc9-9dee-fb085191ee5a
15/07/29 00:02:36 INFO HttpServer: Starting HTTP Server
15/07/29 00:02:37 INFO Utils: Successfully started service 'HTTP file server' on port 56880.
15/07/29 00:02:37 INFO SparkEnv: Registering OutputCommitCoordinator
15/07/29 00:02:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/07/29 00:02:37 INFO SparkUI: Started SparkUI at http://192.168.62.129:4040
15/07/29 00:02:40 INFO SparkContext: Added JAR file:/home/vm/tools/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar at http://192.168.62.129:56880/jars/spark-examples-1.4.0-hadoop2.6.0.jar with timestamp 1438099360726
15/07/29 00:02:41 INFO Executor: Starting executor ID driver on host localhost
15/07/29 00:02:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44722.
15/07/29 00:02:41 INFO NettyBlockTransferService: Server created on 44722
15/07/29 00:02:41 INFO BlockManagerMaster: Trying to register BlockManager
15/07/29 00:02:41 INFO BlockManagerMasterEndpoint: Registering block manager localhost:44722 with 267.3 MB RAM, BlockManagerId(driver, localhost, 44722)
15/07/29 00:02:41 INFO BlockManagerMaster: Registered BlockManager
15/07/29 00:02:43 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
15/07/29 00:02:43 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
15/07/29 00:02:43 INFO DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:35)
15/07/29 00:02:43 INFO DAGScheduler: Parents of final stage: List()
15/07/29 00:02:43 INFO DAGScheduler: Missing parents: List()
15/07/29 00:02:43 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/07/29 00:02:44 INFO MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=280248975
15/07/29 00:02:44 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 267.3 MB)
15/07/29 00:02:44 INFO MemoryStore: ensureFreeSpace(1186) called with curMem=1888, maxMem=280248975
15/07/29 00:02:44 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1186.0 B, free 267.3 MB)
15/07/29 00:02:44 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44722 (size: 1186.0 B, free: 267.3 MB)
15/07/29 00:02:44 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/07/29 00:02:44 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/07/29 00:02:44 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/07/29 00:02:45 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1344 bytes)
15/07/29 00:02:45 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1346 bytes)
15/07/29 00:02:45 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/07/29 00:02:45 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/29 00:02:45 INFO Executor: Fetching http://192.168.62.129:56880/jars/spark-examples-1.4.0-hadoop2.6.0.jar with timestamp 1438099360726
15/07/29 00:02:45 INFO Utils: Fetching http://192.168.62.129:56880/jars/spark-examples-1.4.0-hadoop2.6.0.jar to /tmp/spark-78277899-e4c4-4dcc-8c16-f46fce5e657d/userFiles-27c8dd76-e417-4d13-9bfd-a978cbbaacd1/fetchFileTemp5302506499464337647.tmp
15/07/29 00:02:47 INFO Executor: Adding file:/tmp/spark-78277899-e4c4-4dcc-8c16-f46fce5e657d/userFiles-27c8dd76-e417-4d13-9bfd-a978cbbaacd1/spark-examples-1.4.0-hadoop2.6.0.jar to class loader
15/07/29 00:02:47 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 582 bytes result sent to driver
15/07/29 00:02:47 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 582 bytes result sent to driver
15/07/29 00:02:47 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2641 ms on localhost (1/2)
15/07/29 00:02:47 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2718 ms on localhost (2/2)
15/07/29 00:02:47 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:35) finished in 2.817 s
15/07/29 00:02:47 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/29 00:02:47 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 4.244145 s
Pi is roughly 3.14622
15/07/29 00:02:47 INFO SparkUI: Stopped Spark web UI at http://192.168.62.129:4040
15/07/29 00:02:47 INFO DAGScheduler: Stopping DAGScheduler
15/07/29 00:02:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/07/29 00:02:48 INFO Utils: path = /tmp/spark-78277899-e4c4-4dcc-8c16-f46fce5e657d/blockmgr-be03da6d-31fe-43dd-959c-6cfa4307b269, already present as root for deletion.
15/07/29 00:02:48 INFO MemoryStore: MemoryStore cleared
15/07/29 00:02:48 INFO BlockManager: BlockManager stopped
15/07/29 00:02:48 INFO BlockManagerMaster: BlockManagerMaster stopped
15/07/29 00:02:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/07/29 00:02:48 INFO SparkContext: Successfully stopped SparkContext
15/07/29 00:02:48 INFO Utils: Shutdown hook called
15/07/29 00:02:48 INFO Utils: Deleting directory /tmp/spark-78277899-e4c4-4dcc-8c16-f46fce5e657d
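The only line of the log that matters for the test is "Pi is roughly ...", and it is easy to miss among the INFO messages. The sketch below filters it out. It assumes Spark's default log4j configuration, which sends log lines to stderr while the example prints its result on stdout. extract_pi is a hypothetical helper.

```shell
#!/bin/sh
# Pull the numeric estimate out of SparkPi output read from stdin.
# extract_pi is a hypothetical helper, not part of Spark.
extract_pi() {
    sed -n 's/^Pi is roughly \(.*\)$/\1/p'
}

# Run the example only when SPARK_HOME points at a real installation.
if [ -x "${SPARK_HOME:-}/bin/run-example" ]; then
    # Discard stderr (the log lines) and keep stdout (the result).
    "$SPARK_HOME/bin/run-example" SparkPi 2>/dev/null | extract_pi
fi
```

An empty result means the job did not finish. In that case rerun without 2>/dev/null to see the log.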

Open http://192.168.62.129:8080 in a browser to view basic information about the Spark cluster and its jobs.

4. The spark-shell tool

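Run ./spark-shell in /home/vm/tools/spark/bin to enter the interactive spark-shell, which is useful for debugging. Note that the startup banner in the log reports Scala 2.10.4: the prebuilt Spark package bundles its own Scala runtime and does not use the Scala 2.11.7 installed in step 2.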

 
vm@ubuntu:~/tools/spark/bin$ ./spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/29 00:08:02 INFO SecurityManager: Changing view acls to: vm
15/07/29 00:08:02 INFO SecurityManager: Changing modify acls to: vm
15/07/29 00:08:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vm); users with modify permissions: Set(vm)
15/07/29 00:08:03 INFO HttpServer: Starting HTTP Server
15/07/29 00:08:04 INFO Utils: Successfully started service 'HTTP class server' on port 56464.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/29 00:08:37 INFO SparkContext: Running Spark version 1.4.0
15/07/29 00:08:38 INFO SecurityManager: Changing view acls to: vm
15/07/29 00:08:38 INFO SecurityManager: Changing modify acls to: vm
15/07/29 00:08:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vm); users with modify permissions: Set(vm)
15/07/29 00:08:40 INFO Slf4jLogger: Slf4jLogger started
15/07/29 00:08:41 INFO Remoting: Starting remoting
15/07/29 00:08:42 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.62.129:59312]
15/07/29 00:08:42 INFO Utils: Successfully started service 'sparkDriver' on port 59312.
15/07/29 00:08:42 INFO SparkEnv: Registering MapOutputTracker
15/07/29 00:08:42 INFO SparkEnv: Registering BlockManagerMaster
15/07/29 00:08:43 INFO DiskBlockManager: Created local directory at /tmp/spark-621ebed4-8bd8-4e87-9ea5-08b5c7f05e98/blockmgr-a12211dd-e0ba-4556-999c-6249b9c44d8a
15/07/29 00:08:43 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/29 00:08:43 INFO HttpFileServer: HTTP File server directory is /tmp/spark-621ebed4-8bd8-4e87-9ea5-08b5c7f05e98/httpd-8512d909-5a81-4935-8fbd-2b2ed741ae26
15/07/29 00:08:43 INFO HttpServer: Starting HTTP Server
15/07/29 00:08:57 INFO Utils: Successfully started service 'HTTP file server' on port 43678.
15/07/29 00:09:00 INFO SparkEnv: Registering OutputCommitCoordinator
15/07/29 00:09:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/07/29 00:09:02 INFO SparkUI: Started SparkUI at http://192.168.62.129:4040
15/07/29 00:09:03 INFO Executor: Starting executor ID driver on host localhost
15/07/29 00:09:03 INFO Executor: Using REPL class URI: http://192.168.62.129:56464
15/07/29 00:09:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50636.
15/07/29 00:09:04 INFO NettyBlockTransferService: Server created on 50636
15/07/29 00:09:04 INFO BlockManagerMaster: Trying to register BlockManager
15/07/29 00:09:04 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50636 with 267.3 MB RAM, BlockManagerId(driver, localhost, 50636)
15/07/29 00:09:04 INFO BlockManagerMaster: Registered BlockManager
15/07/29 00:09:05 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/07/29 00:09:07 INFO HiveContext: Initializing execution hive, version 0.13.1
15/07/29 00:09:09 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/07/29 00:09:09 INFO ObjectStore: ObjectStore, initialize called
15/07/29 00:09:10 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/07/29 00:09:10 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/07/29 00:09:11 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/29 00:09:13 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/29 00:09:19 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/07/29 00:09:20 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
15/07/29 00:09:23 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/29 00:09:23 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/29 00:09:28 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/29 00:09:28 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/29 00:09:29 INFO ObjectStore: Initialized ObjectStore
15/07/29 00:09:29 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/07/29 00:09:31 INFO HiveMetaStore: Added admin role in metastore
15/07/29 00:09:31 INFO HiveMetaStore: Added public role in metastore
15/07/29 00:09:31 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/07/29 00:09:32 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/07/29 00:09:32 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala>
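spark-shell also reads from standard input, so a quick smoke test can be driven from a script instead of being typed at the scala> prompt. The sketch below is an illustration, not part of the original walkthrough. write_job is a hypothetical helper, and the one-line job sums the integers 1 through 100 (5050) using the sc context that spark-shell creates at startup.

```shell
#!/bin/sh
# Write a one-line Scala job to the given file. `sc` is the SparkContext
# that spark-shell makes available. write_job is a hypothetical helper.
write_job() {
    cat > "$1" <<'EOF'
println("sum = " + sc.parallelize(1 to 100).reduce(_ + _))
EOF
}

job=$(mktemp)
write_job "$job"

# Feed the job to spark-shell on stdin; the shell exits at end of input.
if [ -x "${SPARK_HOME:-}/bin/spark-shell" ]; then
    "$SPARK_HOME/bin/spark-shell" < "$job" 2>/dev/null
else
    echo "spark-shell not found; set SPARK_HOME first"
fi
rm -f "$job"
```

The REPL banner and prompts are printed along with the result, so look for the line beginning with "sum =" in the output.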

The next article will cover setting up a Spark development environment with Eclipse and with IDEA.


Permanent link to this article: http://www.linuxidc.com/Linux/2015-08/120945.htm
