How do I run a Python script as a Spark job?

I have set up Spark on 3 machines using the tar method. I haven't done any advanced configuration; I only edited the slaves file and started the master and the workers. I can see the Spark UI on port 8080. Now I want to run a simple Python script on the Spark cluster.

import sys 
from random import random 
from operator import add 

from pyspark import SparkContext 


if __name__ == "__main__": 
    """ 
     Usage: pi [partitions] 
    """ 
    sc = SparkContext(appName="PythonPi") 
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2 
    n = 100000 * partitions 

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n + 1), partitions).map(f).reduce(add) 
    print "Pi is roughly %f" % (4.0 * count/n) 

    sc.stop()

I am running this command:

spark-submit --master spark://IP:7077 pi.py 1

But I get the following error:

14/12/22 18:31:23 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 
14/12/22 18:31:38 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 
14/12/22 18:31:43 INFO client.AppClient$ClientActor: Connecting to master spark://10.77.36.243:7077... 
14/12/22 18:31:53 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 
14/12/22 18:32:03 INFO client.AppClient$ClientActor: Connecting to master spark://10.77.36.243:7077... 
14/12/22 18:32:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 
14/12/22 18:32:23 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up. 
14/12/22 18:32:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/12/22 18:32:23 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0 
14/12/22 18:32:23 INFO scheduler.DAGScheduler: Failed to run reduce at /opt/pi.py:21 
Traceback (most recent call last): 
    File "/opt/pi.py", line 21, in <module> 
    count = sc.parallelize(xrange(1, n + 1), partitions).map(f).reduce(add) 
    File "/usr/local/spark/python/pyspark/rdd.py", line 759, in reduce 
    vals = self.mapPartitions(func).collect() 
    File "/usr/local/spark/python/pyspark/rdd.py", line 723, in collect 
    bytesInJava = self._jrdd.collect().iterator() 
    File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__ 
    File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling o26.collect. 
: org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up. 
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) 
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) 
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) 
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

Has anyone else run into the same problem? Please help me with this.

source

2014-12-22 sandip divekar

This:

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

suggests that the cluster does not have the resources available that the job is asking for.

Check the status of your cluster and inspect the cores and RAM available on the workers (http://www.datastax.com/dev/blog/common-spark-troubleshooting).
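
As a rough sketch of that inspection: the standalone master's web UI usually also serves its status as JSON (typically at http://IP:8080/json). The field names used below (workers, state, cores, coresused, memory, memoryused) are what I would expect from that endpoint, but treat them as assumptions and verify them against your Spark version. The helper `summarize_cluster` and the sample data are hypothetical, made up only for illustration.

```python
import json


def summarize_cluster(status):
    """Return (alive_workers, free_cores, free_memory_mb) from a master status dict.

    Assumes each worker entry carries 'state', 'cores', 'coresused',
    'memory' and 'memoryused' keys (an assumption, check your Spark version).
    """
    alive = [w for w in status.get("workers", []) if w.get("state") == "ALIVE"]
    free_cores = sum(w.get("cores", 0) - w.get("coresused", 0) for w in alive)
    free_mem = sum(w.get("memory", 0) - w.get("memoryused", 0) for w in alive)
    return len(alive), free_cores, free_mem


# Hypothetical sample shaped like the master's JSON status, for illustration only.
SAMPLE = json.loads("""
{
  "workers": [
    {"state": "ALIVE", "cores": 2, "coresused": 0, "memory": 1024, "memoryused": 0},
    {"state": "ALIVE", "cores": 2, "coresused": 1, "memory": 1024, "memoryused": 512},
    {"state": "DEAD",  "cores": 2, "coresused": 0, "memory": 1024, "memoryused": 0}
  ]
}
""")

alive, cores, mem = summarize_cluster(SAMPLE)
print("alive workers: %d, free cores: %d, free memory: %d MB" % (alive, cores, mem))
```

To use it against a live cluster you would load the JSON from the master instead of `SAMPLE`. If no workers are alive, or the free memory per worker is smaller than what your executors request, the "has not accepted any resources" warning is the expected symptom.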

Also, double-check your IP addresses.
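
One quick way to do that, as a minimal sketch: try a plain TCP connection from the machine running spark-submit to the master's host and port. The helper name `can_connect` is hypothetical. A successful connection only shows that something is listening there; it does not prove that the master URL you pass matches the one the master advertises in its UI.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    sock.close()
    return True


# Example call from the driver machine (address taken from the log in the question):
#   can_connect("10.77.36.243", 7077)
```

If this returns False, look at firewalls and at which interface the master is bound to before tuning memory or cores.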

For more ideas, see: Running a Job on Spark 0.9.0 throws error

source

2015-09-06 09:31:03
