Spark toDebugString output is not nicely formatted in Python

This is what I get when I use toDebugString in Scala:

scala> val a = sc.parallelize(Array(1,2,3)).distinct 
a: org.apache.spark.rdd.RDD[Int] = MappedRDD[3] at distinct at <console>:12 

scala> a.toDebugString 
res0: String = 
(4) MappedRDD[3] at distinct at <console>:12 
| ShuffledRDD[2] at distinct at <console>:12 
+-(4) MappedRDD[1] at distinct at <console>:12 
    | ParallelCollectionRDD[0] at parallelize at <console>:12

This is the equivalent in Python:

>>> a = sc.parallelize([1,2,3]).distinct() 
>>> a.toDebugString() 
'(4) PythonRDD[6] at RDD at PythonRDD.scala:43\n | MappedRDD[5] at values at NativeMethodAccessorImpl.java:-2\n | ShuffledRDD[4] at partitionBy at NativeMethodAccessorImpl.java:-2\n +-(4) PairwiseRDD[3] at RDD at PythonRDD.scala:261\n | PythonRDD[2] at RDD at PythonRDD.scala:43\n | ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:315'

As you can see, the output is not as readable in Python as it is in Scala. Is there a trick to get nicer output from this function?

I am using Spark 1.1.0.


2014-10-13 poiuytrez

Try adding a print statement so that the debug string is actually printed, instead of the shell displaying its __repr__:

>>> a = sc.parallelize([1,2,3]).distinct() 
>>> print a.toDebugString() 
(8) PythonRDD[27] at RDD at PythonRDD.scala:44 [Serialized 1x Replicated] 
| MappedRDD[26] at values at NativeMethodAccessorImpl.java:-2 [Serialized 1x Replicated] 
| ShuffledRDD[25] at partitionBy at NativeMethodAccessorImpl.java:-2 [Serialized 1x Replicated] 
+-(8) PairwiseRDD[24] at distinct at <stdin>:1 [Serialized 1x Replicated] 
    | PythonRDD[23] at distinct at <stdin>:1 [Serialized 1x Replicated] 
    | ParallelCollectionRDD[21] at parallelize at PythonRDD.scala:358 [Serialized 1x Replicated]
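
The difference between the two outputs comes from how the interactive prompt echoes a value (via repr, which keeps line breaks as a literal \n) versus how print writes it. The sketch below illustrates this with a plain string standing in for the result of a.toDebugString(), so it runs without Spark. The helper name debug_text is hypothetical, and the bytes branch rests on an assumption: that on some newer PySpark releases under Python 3 the method may return bytes rather than str, which is worth checking against the version in use.

```python
def debug_text(debug_string):
    """Return the lineage as text, decoding it first if it arrives as bytes.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    if isinstance(debug_string, bytes):
        return debug_string.decode("utf-8")
    return debug_string


# Stand-in for the value returned by a.toDebugString(), shortened from the question.
sample = (
    "(4) PythonRDD[6] at RDD at PythonRDD.scala:43\n"
    " |  MappedRDD[5] at values at NativeMethodAccessorImpl.java:-2\n"
    " |  ShuffledRDD[4] at partitionBy at NativeMethodAccessorImpl.java:-2"
)

# What the interactive prompt echoes: a single line with literal \n sequences.
shown_by_repl = repr(sample)

# What print() writes: the same text with real line breaks.
shown_by_print = debug_text(sample)

print(shown_by_repl)
print(shown_by_print)
```

With a real RDD on Python 3, the equivalent call would be print(debug_text(a.toDebugString())); the print statement without parentheses shown above only works on Python 2.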


2014-10-13 14:55:36

The RDD has not been executed yet, it has only been defined. To actually run it you should use:

>>> a = sc.parallelize([1,2,3]).distinct()
>>> a.collect()
[1, 2, 3]


2015-12-07 14:08:04 user3409371
