Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

0 votes
1 answer
55 views

PySpark error: Invalid argument, not a string or column

I have a DataFrame in PySpark, df_all. It has some data, and I need to do the following: count = ceil(df_all.count()/1000000). It gives the following error: TypeError: Invalid argument, not a string or ...
user2280352's user avatar
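
A note on the excerpt above: the error text is what `pyspark.sql.functions.ceil` raises when it is handed a plain Python number instead of a Column or column name. The excerpt is truncated, so it is only an assumption that `ceil` was imported from `pyspark.sql.functions` (for example through a wildcard import). Under that assumption, a minimal sketch of the usual workaround is to call Python's own `math.ceil` on the integer returned by `count()`. The helper name `batches_needed` is hypothetical.

```python
import math


def batches_needed(row_count: int, batch_size: int = 1_000_000) -> int:
    """Return how many batches of `batch_size` rows are needed to cover `row_count` rows."""
    # math.ceil operates on plain Python numbers, which is what
    # DataFrame.count() returns; pyspark.sql.functions.ceil expects a Column.
    return math.ceil(row_count / batch_size)


# In a Spark session this would be called as batches_needed(df_all.count());
# a literal row count is used here so the sketch runs without Spark.
print(batches_needed(2_500_000))
```

Importing the Spark functions under a namespace (commonly `from pyspark.sql import functions as F`) avoids shadowing built-ins such as `ceil`, `sum`, or `max` in the first place.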
0 votes
0 answers
8 views

How to view a log file in Qubole

I would like to retrieve the Qubole usage report, but I don't know where the data is stored. I don't want to download the log file every time; my aim is to build a table out of it. table of log ...
Subhi's user avatar
  • 1
2 votes
1 answer
668 views

How do you write a Presto query to split a string into its own column

Trying to split a string into multiple columns in Qubole using a Presto query. {"field0":[{"startdate":"2022-07-13","lastnightdate":"2022-07-16","...
Abe's user avatar
  • 23
0 votes
1 answer
689 views

Presto Pivoting Data

I am really new to Presto and having trouble pivoting data in it. The method I am using is the following: select distinct location_id, case when role_group = 'IT' then employee_number end as ...
llorcs's user avatar
  • 79
2 votes
1 answer
442 views

need regexp_extract help, beginner

I have a string column "49b8b35e-b62c-4a42-9d73-192d131d127a,03c8a7e0-5153-11ec-873a-0242ac11000a,eec8aee4-0500-4940-b319-15924cc2d248"; this string column has 3 values separated by ","...
ajk's user avatar
  • 21
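
As an illustration of the extraction described in the excerpt above, here is a small Python sketch (not the SQL the asker needs, and not taken from the answer) showing the regular expression idea: the pattern `[^,]+` matches each run of characters that contains no comma, so applying it repeatedly yields the three values in order. The function name `split_csv_values` is hypothetical; the sample string is the one quoted in the question.

```python
import re
from typing import List


def split_csv_values(raw: str) -> List[str]:
    """Return the comma-separated values found in `raw`, in order."""
    # [^,]+ matches each maximal run of characters that are not commas.
    return re.findall(r"[^,]+", raw)


raw = (
    "49b8b35e-b62c-4a42-9d73-192d131d127a,"
    "03c8a7e0-5153-11ec-873a-0242ac11000a,"
    "eec8aee4-0500-4940-b319-15924cc2d248"
)
first, second, third = split_csv_values(raw)
print(first, second, third, sep="\n")
```

The same pattern can be used with an SQL engine's regular expression extraction functions, although the exact function names and group-indexing rules depend on the engine, which the excerpt does not state.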
2 votes
1 answer
48 views

Data comparisons in Qubole

I am very new to Qubole. We recently migrated Oracle Ebiz data to Salesforce. We have both Ebiz and Salesforce data in the Qubole Data Lake. There are some discrepancies between Ebiz and Salesforce. What ...
user2280352's user avatar
1 vote
0 answers
429 views

Insert overwrite doesn't delete all the old data files

We are trying to insert overwrite a Hive table. Most of the time it overwrites as expected, i.e., deleting any old files and replacing them with new files. We are seeing some inconsistencies with this behavior, ...
Jas's user avatar
  • 11
1 vote
1 answer
826 views

Retrieve value in an array of an array with struct

I have a column in Hive table with type: array<array<struct<type:string,value:string,currency:string>>> Here is the sample of data in the column: [ [ { "type":...
user1761325's user avatar
0 votes
0 answers
355 views

Query Qubole data in Python

I'm trying to query Qubole data in Python, but running into some issues. Below is my code: from qds_sdk.qubole import Qubole Qubole.configure(api_token="api_token", api_url="https://us....
BirdPlay6's user avatar
0 votes
1 answer
719 views

How to safely insert parameters into a SQL query and get the resulting query?

I have to use a non-DBAPI-compliant library to interact with a database (qds_sdk for Qubole). This library only allows sending raw SQL queries without parameters. Thus I would like a SQL injection-...
Roméo Després's user avatar
1 vote
1 answer
83 views

Exclude records with certain values in Qubole

Using Qubole I have Table A (columns in json parsed...) ID Recommendation Decision 1 GOOD GOOD 2 BAD BAD 2 GOOD BAD 3 GOOD BAD 4 ...
Kurlito's user avatar
  • 13
1 vote
2 answers
114 views

How to connect UiPath to Qubole Hive cluster and run a query

One of the teams using RPA in my company wants to automate reporting that is run in Qubole - Hive environment. The initial approach is to unleash the robot to log in to Okta, then Workbench in Qubole, ...
Krystian Duda's user avatar
0 votes
2 answers
306 views

How to get Python in Qubole to save CSV and TXT files to Azure data lake?

I have Qubole connected to Azure data lake, and I can start a spark cluster, and run PySpark on it. However, I can't save any native Python output, like text files or CSVs. I can't save anything other ...
HT.'s user avatar
  • 161
1 vote
2 answers
294 views

Result-set inconsistency between hive and hive-llap

We are using Hive 3.1.x clusters on HDI 4.0, one being LLAP and the other just Hive. We've created managed tables on both clusters, with the row count being 272409. Before merge on both ...
Vinay K L's user avatar
0 votes
1 answer
462 views

How to change the timeout value when running commands on QDS

I have a spark-submit command that calls my Python script. The code runs for more than 36 hours; however, because of the QDS timeout limit of 36 hours, my command gets killed after 36 hours. Can someone help ...
Trupti's user avatar
  • 1
0 votes
1 answer
309 views

Logging and Debugging on Qubole

How does one log on Qubole/access logs from Spark on Qubole? The setup I have: Java library (JAR); Zeppelin Notebook (Scala), simply calling a method from the library; Spark, Yarn cluster; Log4j2 used ...
bde.dev's user avatar
  • 739
0 votes
1 answer
286 views

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark structured streaming. It uses Kafka as the input stream and HiveAcid as the writeStream to a Hive table. For HiveAcid, it is an open source library called spark acid from Qubole: ...
Shuwn Yuan Tee's user avatar
2 votes
1 answer
9k views

Pyspark Logging: Printing information at the wrong log level

Thanks for your time! I'd like to create and print legible summaries of my (hefty) data to my output when debugging my code, but stop creating and printing those summaries once finished to speed ...
Amit's user avatar
  • 41
0 votes
1 answer
887 views

Avoid pre-signed URL expiry when IAM role key rotates

In Airflow I have 2 tasks defined that run every day: the first one creates a zip file and saves it in AWS under s3://{bucket-name}/foo/bar/{date}/archive.zip; the second one pre-signs that URL (...
Maria Livia's user avatar
0 votes
3 answers
118 views

How to query table partitions list using

I need to programmatically query Qubole for the list of partitions for a Hive table. I can do this by calling the correct API endpoint as described here, but I would like to use the qds-sdk-java ...
GreenGiant's user avatar
  • 5,038
-1 votes
1 answer
206 views

Trying to execute s3-sqs Qubole connector for Spark structured streaming

I am trying to follow https://github.com/qubole/s3-sqs-connector and load the connector, but it seems the connector is not available on Maven, and while generating the build manually the ...
Dipesh's user avatar
  • 1
0 votes
1 answer
390 views

Qubole Presto datatype "Map" using the Like Operator

So I am trying to apply a simple like function for a Qubole query on Presto. For a string datatype I can simply do like '%United States of America%'. However for the column I am trying to apply ...
pp2000's user avatar
  • 35
1 vote
1 answer
302 views

Spark Submit Default Command line options

How can we change the parameters in Spark Submit Default Command line options in Qubole? There is an option to override the values if needed under "Spark Submit Command Line Options", but this ...
Throw's user avatar
  • 11
-1 votes
1 answer
88 views

Can I write an HTML script and pass information from the script to a cell on Qubole?

Is it possible to write an HTML script and have the user interact on the HTML script and pass the data back to the zeppelin cell and have it rerun the data passed back? Thank you! Update: Have some ...
Dillon's user avatar
  • 11
0 votes
1 answer
125 views

How to upgrade Python version on Qubole?

The current version on Qubole is 3.5.3, and some packages, like PyMC3 and future XGBoost, need higher versions. How do I upgrade? And would that affect other clusters' settings?
HT.'s user avatar
  • 161
0 votes
1 answer
339 views

Unable to write or read from S3 bucket with Default AWS KMS encryption enabled

I am unable to read or write into a Default AWS KMS encrypted bucket without using the following configuration on my Qubole cluster fs.s3a.server-side-encryption-algorithm=SSE-KMS fs.s3a.server-side-...
Nunna Krishna Teja's user avatar
0 votes
1 answer
214 views

Qubole Kinesis Connector for Spark structured streaming throws an error

We are using Qubole Kinesis Connector (jar) for Spark structured streaming. This used to work fine but suddenly, it is throwing an error "S3 filesystem not found". We could use the KCL but we need ...
Lightning-Analytics's user avatar
0 votes
2 answers
64 views

REST API in test drive account?

Hi, I am using the Qubole trial version, and it is a test drive account, so I am not getting an API token from the Control Panel's My Accounts tab in Qubole. Is there a way to access the REST APIs now? Thanks in advance.
sai Kumar's user avatar
0 votes
2 answers
370 views

Running Scala jobs in Scheduler

My job runs fine in my notebook, but when I copy and paste the script into the Spark Scala scheduled job, I run into errors like "script.scala:15: error: not found: value sqlContext". What do I need ...
Paul Mineau's user avatar
0 votes
1 answer
82 views

PySpark Machine Learning on Wide Data in Qubole

I have a large dataset, with roughly 250 features, that I would like to use in a gradient-boosted trees classifier. I have millions of observations, but I'm having trouble getting the model to work ...
ErrorJordan's user avatar
0 votes
1 answer
92 views

Setting up AWS Glue to crawl Qubole

Currently I work with Qubole to access Hive data. I've added metadata from several databases, and want to add all the Hive metadata to AWS Glue. Is this possible? Any help is appreciated.
Ash_s94's user avatar
  • 797
0 votes
1 answer
109 views

Scale plot size of matplotlib plots in Qubole Notebook

Is there a possibility of increasing the size of the plot plotted using z.showplot() in Qubole notebooks? import matplotlib as plt plt.figure() plt.bar(pandas_df_hr_sg[:]['hour'],pandas_df_hr_sg[:]['...
Mustajib Mohammed Khan's user avatar
0 votes
2 answers
254 views

How do I upgrade a library in Qubole's Jupyter Notebook, using PySpark?

Is there a way to do it right from a cell in the notebook? similar to pip install ... --upgrade I didn't know how to do what's instructed on https://docs.qubole.com/en/latest/faqs/general-questions/...
HT.'s user avatar
  • 161
0 votes
1 answer
169 views

How to pass --properties-file to spark-submit in Qubole?

I am using Spark in Qubole by having the clusters created in AWS. In Qubole Workbench, when I execute the below Command Line, it works fine and the command is successful /usr/lib/spark/bin/spark-...
Saravanan's user avatar
0 votes
2 answers
161 views

How to import a .py file to Qubole?

I'm connecting to Azure data lake, and I have the file there, but it's in a different path, and I don't know how to import it. Thank you in advance for your help!
HT.'s user avatar
  • 161
0 votes
1 answer
49 views

In the new Analyze UI, how do I edit the title of my query?

In the new Qubole Analyze UI that came out recently, I cannot seem to find a way to change the title of a command. In the old interface, I could click on the command title and it would become an ...
GreenGiant's user avatar
  • 5,038
1 vote
1 answer
664 views

How to create hive external table with avro file on qubole?

Can someone point me to the docs for creating an external table on Qubole based on Avro files? CREATE TABLE my_table_name ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS ...
user10714010's user avatar
0 votes
1 answer
124 views

Performance analysis using Sparklens of Spark Streaming Application

I am trying to get performance analysis of a spark streaming application using sparklens. It is giving results like this Executor count 1 ( 80%) estimated time 01m 29s and estimated cluster ...
Abhay's user avatar
  • 697
0 votes
0 answers
898 views

How to fix 'Malformed class name' error in Spark Scala?

In a Qubole notebook I am trying to get a certain string from an API response. It seems to be working just fine for sample data but fails when I use the full set. Spark version: 2.3.1; Scala version: 2.11; ...
Piotr's user avatar
  • 1
1 vote
2 answers
196 views

Implement case class inside a class

I am using the below code to run in Qubole Notebook and the code is running successfully. case class cls_Sch(Id:String, Name:String) class myClass { implicit val sparkSession = org.apache.spark....
Sarath Subramanian's user avatar
1 vote
1 answer
702 views

Extracting json field from string in Hive using dataset

I am trying a very basic Hive query. I am trying to extract a JSON field from a dataset, but I always get \N for the JSON field; however, some_string comes through okay. Here is my query: WITH dataset AS ...
Bhavya Arora's user avatar
0 votes
1 answer
66 views

retrieve size of data copied with hadoop distcp

I am running a hadoop distcp command as below: hadoop distcp src-loc target-loc I want to know the size of the data copied by running this command. I am planning to run the command on Qubole. Any ...
sneha salvi's user avatar
2 votes
1 answer
6k views

How to create external tables from parquet files in s3 using hive 1.2?

I have created an external table in Qubole(Hive) which reads parquet(compressed: snappy) files from s3, but on performing a SELECT * table_name I am getting null values for all columns except the ...
S.Mehra's user avatar
  • 56
1 vote
1 answer
72 views

Get Qubole data row wise using java

I am trying to run a Hive query using the Qubole SDK. Though I am able to get the desired result as a string, in order to better process it, I am looking to access it row-wise. Something like a list of java ...
roger_that's user avatar
  • 9,623
1 vote
1 answer
75 views

Recommendation on Performance optimization for SQL code

I have a code in Qubole that's taking almost 3 hours to execute. I am looking for some recommendations to decrease the code execution time. WITH -- Get latest date - 10 days before as day d AS ( ...
Flash's user avatar
  • 11
1 vote
1 answer
386 views

Syncing Qubole Hive table to Snowflake with Struct field

I have a table like the following in Qubole: use dm; CREATE EXTERNAL TABLE IF NOT EXISTS fact ( id string, fact_attr struct< attr1 : String, attr2 : String > ) STORED AS ...
Ambrish's user avatar
  • 3,647
1 vote
2 answers
190 views

Different results when distinct count by different time periods

I am trying to get a count of unique visitors. I first checked it by total without separating it by any time frame. Main table (big data table sample): +-----------+----+-------+ |theDateTime|vD | ...
noobeerp's user avatar
  • 417
1 vote
1 answer
1k views

Big files causing shuffle error in hadoop map reduce

I am seeing the following error when I try to process big files (size > 35GB), but it doesn't happen when I try smaller files (size < 10GB). App > Error: org.apache.hadoop.mapreduce....
Jal's user avatar
  • 2,244
0 votes
1 answer
192 views

Get correct value from array in Hive QL

I have a Wrapped Array and want to only get the corresponding value struct when I query with LATERAL VIEW EXPLODE. SAMPLE STRUCTURE: COLUMNNAME: theARRAY WrappedArray([null,theVal,valTags,[123,...
noobeerp's user avatar
  • 417
2 votes
1 answer
132 views

Debug failed shuffles in hadoop map reduces

I am seeing that as the size of the input file increases, failed shuffles increase and job completion time increases nonlinearly, e.g. 75GB took 1h, 86GB took 5h. I also see average shuffle time increase 10 ...
Jal's user avatar
  • 2,244