Airflow


How to Enable Python Snowflake Connector in Airflow DAGs?

  • 1.  How to Enable Python Snowflake Connector in Airflow DAGs?

    Posted 05-02-2019 02:11
    Hello,

    I am trying to add the Python Snowflake connector to my Python cluster using the Node Bootstrap script. However, even though the script seems to run successfully on cluster startup, I am unable to use the connector in my Python DAGs.

    The code I added to the Node Bootstrap script is:

    pip install snowflake-connector-python

    In my Airflow DAGs, I should be able to reference this by running:

    import snowflake.connector

    However, I am getting the following error: Broken DAG: [/usr/lib/airflow/dags/unit_tests_recent_records.py] No module named snowflake.connector

    The cluster is ID 21268 and here are the settings:

    Airflow Version: 1.8.2
    Python Version: 2.7

    Is there an alternative way to enable Snowflake for my Airflow DAGs?

    ------------------------------
    Robin Tanner
    ------------------------------


  • 2.  RE: How to Enable Python Snowflake Connector in Airflow DAGs?

    Quboler
    Posted 05-02-2019 02:34
    Hi Robin,

    All Airflow processes run in a Python virtual environment, so you need to activate it before installing any packages. The command to activate the virtualenv is given below; try adding it to the top of your node bootstrap script.

    # for python 2.7 cluster
    source ${AIRFLOW_VIRTUALENV_LOC}/bin/activate
    
    # for python 3.5 cluster
    source ${AIRFLOW_VIRTUALENV_LOC}/bin/activate ${AIRFLOW_VIRTUALENV_LOC}
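
    Once the bootstrap script has run, a small check like the one below can help confirm that the package landed in the environment Airflow actually uses. This is only a sketch: the helper names check_module and describe_environment are made up for illustration, and it assumes you run it with the same Python interpreter that the Airflow processes use (for example, after activating the virtualenv as shown above).

```python
import importlib
import sys


def check_module(name):
    """Try to import `name`; return (True, file path) or (False, error text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, str(getattr(module, "__file__", None))


def describe_environment(module_name):
    """Collect the interpreter, its prefix, and the import result for one module."""
    importable, detail = check_module(module_name)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "module": module_name,
        "importable": importable,
        "detail": detail,
    }


if __name__ == "__main__":
    report = describe_environment("snowflake.connector")
    for key in ("interpreter", "prefix", "module", "importable", "detail"):
        print("{}: {}".format(key, report[key]))
```

    If the printed prefix is not the directory that ${AIRFLOW_VIRTUALENV_LOC} points to, pip most likely installed the package into a different Python installation than the one Airflow imports from.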


    ------------------------------
    Joy Chattaraj
    Qubole
    ------------------------------



  • 3.  RE: How to Enable Python Snowflake Connector in Airflow DAGs?

    Posted 05-02-2019 10:44
    Hi Joy,

    Thank you for responding. I received an error that my bootstrap script failed for my Python 3.5 environment. This is my Node Bootstrap script:

    # for python 3.5 cluster
    source ${AIRFLOW_VIRTUALENV_LOC}/bin/activate ${AIRFLOW_VIRTUALENV_LOC}

    pip3 install snowflake-connector-python==1.7.11


    My DAG still shows the error about the Snowflake connector missing.

    I cannot find any logs that show what the error is. Is there a location where they are written?

    The only error I see is in the Activity screen:

    Bootstrapping And Finalizing: Bootstrapping failed with status 127. (8s)
    Today 07:26 AM

    ------------------------------
    Robin Tanner
    ------------------------------



  • 4.  RE: How to Enable Python Snowflake Connector in Airflow DAGs?

    Quboler
    Posted 05-02-2019 15:51
    Edited by Joy Chattaraj 05-02-2019 16:11
    Hi Robin,

    For Airflow clusters, the node bootstrap logs are not yet available in the UI. This has been fixed in our upcoming release, which will be available soon. For now, you can check the bootstrap logs at the location below (you can SSH into the cluster and view the file, or print it with a shell command from the Analyze page):

    /media/ephemeral0/logs/others/node_bootstrap.log
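
    As a side note, exit status 127 is the code a shell conventionally returns when a command cannot be found, so the log will probably name the command that was missing. The sketch below prints the end of the log and picks out lines that look like "not found" errors. The helper names last_lines and find_missing_commands are hypothetical, and the exact wording of the log messages is an assumption, so treat the filter as a starting point only.

```python
import io
import os

# Log location quoted in this thread.
BOOTSTRAP_LOG = "/media/ephemeral0/logs/others/node_bootstrap.log"


def last_lines(text, count=50):
    """Return the last `count` lines of `text` as a list of strings."""
    lines = text.splitlines()
    return lines[-count:] if count > 0 else []


def find_missing_commands(lines):
    """Return lines that look like shell 'command not found' errors."""
    # Assumption: the shell writes messages containing "not found",
    # e.g. "<cmd>: command not found" (bash) or "<cmd>: not found" (dash).
    return [line for line in lines if "not found" in line.lower()]


if __name__ == "__main__":
    if os.path.exists(BOOTSTRAP_LOG):
        with io.open(BOOTSTRAP_LOG, "r", encoding="utf-8", errors="replace") as handle:
            tail = last_lines(handle.read())
        for line in tail:
            print(line)
        print("--- possible missing commands ---")
        for line in find_missing_commands(tail):
            print(line)
    else:
        print("log not found at {}".format(BOOTSTRAP_LOG))
```

    If a line such as "<command>: command not found" shows up, that command is probably not on the PATH when the bootstrap script runs.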


    Also, FYI: if it's a Python 3.5 cluster, there's a better way to install and maintain packages, which is the Environments page. There you will find an environment named "default_env_<cluster_id>" that is attached to your cluster. More details can be found here.

    The Python 3 cluster uses an Anaconda virtual environment instead of the regular Python one. With the package management described above, you can easily install or remove packages from the UI, with the exception of noarch Python packages. Support for noarch packages is also coming soon.

    ------------------------------
    Joy Chattaraj
    Qubole
    ------------------------------