NetAffx Data Analysis Support—Getting Started | Thermo Fisher Scientific

Find valuable information.

Optimize your experiments to get the best results. We’ve compiled a detailed knowledge base of the top tips and tricks to meet your research needs.

View the relevant questions below:


NetAffx™ Analysis Center

What is the NetAffx™ Analysis Center?

The NetAffx™ Analysis Center is the most comprehensive resource of integrated array contents and functional annotations available. Its flexible query capabilities help you retrieve biological information for specific probe sets. Please read our Analysis Center tutorial for more information.

Who can use the NetAffx™ Analysis Center?

The NetAffx™ Analysis Center is available to both customers and non-customers at no cost. The only requirement for use is the completion of a short registration form.

How much does it cost to use the NetAffx™ Analysis Center?

The NetAffx™ Analysis Center is available to both customers and non-customers at no cost. The only requirement for use is the completion of a short registration form.

What are the benefits of registering to use the NetAffx™ Analysis Center?

By registering, you receive unprecedented access to array content information, including probe sequences and gene annotations. This information will enable you to derive the maximum value from your GeneChip™ array data.

How do you ensure that my data is secure?

The security of your valuable scientific data is very important to us. We neither store nor track any of the data you enter into our web site, including sequence information, probe set IDs, gene names, or identifiers. We also do not individually profile the external links you use. All scientific data transferred to and from the NetAffx™ Analysis Center is secured by Secure Sockets Layer (SSL) encryption. Information collected on forms in the Analysis Center, including name, address, and billing and shipping information, is also fully secured by SSL. This industry-standard Internet security protocol protects your data while in transit, in the same way it protects Internet banking and standard credit card transactions.

How often is data updated on NetAffx™ Analysis Center?

NetAffx™ Analysis Center contains both public and in-house-generated data. Unless otherwise noted, public data representations are updated once every quarter. Because we have considerable control over our proprietary internal data, we will keep you informed as new data and databanks become available.

How do I learn more about the databases supported by the NetAffx™ Analysis Center?

Our Database Information Resource describes the functions and contents of the various databases used in the Analysis Center. Additionally, you can use the database hyperlinks listed on the "top page" of the Analysis Center to obtain summaries of the databases.

Can I enter multiple query values at once?

Yes, you can. When using the quick search and standard search methods within the Analysis Center, separate the values with the "|" symbol. To query a larger number of values, use the Batch Query capability, which allows you to enter multiple accession numbers, gene names, or probe set IDs. For more information, please download the NetAffx™ Analysis Center User's Guide.

What is Batch Query?

Batch Query is the query process that allows you to enter up to 500 query values at once. Batch Query helps you retrieve multiple annotations and query array contents using a large number of probe set IDs, gene names, or accession numbers.
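As a rough illustration of the two query styles, the Python sketch below joins a few values with the "|" separator for a quick or standard search, and splits a longer list into chunks of at most 500 values for Batch Query. The helper names are hypothetical, and the exact way the Batch Query form expects values to be entered (for example, one per line) is an assumption to check in the User's Guide.

```python
MAX_BATCH_SIZE = 500  # Batch Query limit stated in this FAQ


def clean(values):
    # Drop surrounding whitespace and empty entries.
    return [v.strip() for v in values if v.strip()]


def pipe_query(values):
    # Quick or standard search: several values in one field, separated by "|".
    return "|".join(clean(values))


def batches(values, size=MAX_BATCH_SIZE):
    # Split a long ID list into chunks no larger than the Batch Query limit.
    items = clean(values)
    return [items[i:i + size] for i in range(0, len(items), size)]


print(pipe_query(["1000_at", " 1002_f_at ", "38996_at"]))  # 1000_at|1002_f_at|38996_at
ids = [f"probe_{i}" for i in range(1200)]  # hypothetical IDs
print([len(chunk) for chunk in batches(ids)])  # [500, 500, 200]
```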

What arrays does the NetAffx™ Analysis Center support?

The NetAffx™ Analysis Center supports most of our current catalog GeneChip™ arrays and several discontinued designs.

Does the NetAffx™ Analysis Center support spotted array users?

The NetAffx™ Analysis Center does not support spotted array users; it is intended for users of GeneChip™ arrays. Because our spotted array users are very important to us, we have generated many ideas for ways in which we could expand the NetAffx Analysis Center to include spotted array support. If you have any suggestions for ways we could do this, or if you would like to request a feature that is not currently on our site, please contact us.

Can I obtain probe, target, and consensus/exemplar sequences for GeneChip™ probe arrays?

Yes, you can. The Target sequence (the sequence from which probes are selected) information is integrated with the probe set records in the Affymetrix Array Target Sequences databanks. The Consensus/Exemplar sequences can be obtained from the Affymetrix Array Consensus and Exemplar Sequences databanks. (Consensus sequences are derived from sub-clustering UniGene clusters, and Exemplar sequences refer to the longest member of an Affymetrix sub-cluster.) The probe sequences for the catalog GeneChip™ probe arrays are now available.

What are the Gene Ontology™ (GO) annotations? Are GO annotations provided for all GeneChip™ expression arrays?

The GO Consortium provides controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products. Currently, GO classifications are available for the Human (U133, U95, HuGeneFL, HC-G110), Murine (U74, Mu11k, Mu19k), Drosophila, Arabidopsis, and Yeast GeneChip™ arrays. The Analysis Center also provides hyperlinks to the AmiGO and QuickGO browsers, enabling you to view the complete hierarchy of all the GO terms associated with the probe sets.

What parameters can I specify for BLAST searching? Can I BLAST my sequence against all arrays?

Several BLAST search options are supported in the NetAffx™ Analysis Center. These options include the ability to turn on/off the Filter Sequence and Perform Gapped Alignment options, and the ability to select E-values from a drop-down menu. You can choose to BLAST your sequence against a particular target sequence database, or against all of the databases at once.

Where can I find the Affymetrix sub-cluster information?

The Affymetrix sub-cluster information is available as an independent databank in the NetAffx™ Analysis Center under the "Affymetrix sub-cluster libraries group".

What is the relevance of Affymetrix sub-cluster databanks?

The Affymetrix sub-cluster sequences are the building blocks of probe selection. In other words, an Affymetrix sub-cluster is a group of sequences all representing the same transcript, denoted by the transcript ID field in a Target Databank record. The sub-cluster databanks, in the NetAffx™ Analysis Center, provide a list of all the accession numbers of the public sequences used for probe selection. Sub-cluster databanks may also be queried with accession numbers to identify your favorite genes on GeneChip™ arrays. Use the sub-cluster information to precisely identify the correlation of probes with your favorite EST or mRNA sequences in public databases. This analysis is also relevant in the interpretation of GeneChip™ expression results.

Using the sub-cluster databanks, how can I determine if my favorite sequence was used for probe selection?

Obtain the accession number for your gene of interest from a public database, such as UniGene, GenBank, or dbEST, and query the corresponding sub-cluster databanks in the NetAffx™ Analysis Center. You may also align your favorite gene sequence with Affymetrix probe sequences using the Probe Match tool.

How do I link directly (deep link) to Probe Set Annotation summaries in the Analysis Center?

You can now link directly to NetAffx™ probe set summaries from within your own applications or websites.

To link to information for individual probe sets, use the following URL format:

https://www.affymetrix.com/LinkServlet?array=<ARRAYNAME>&probeset=<PROBESET>

To link to information for a list of probe sets, use the following URL format:

https://www.affymetrix.com/LinkServlet?array=<ARRAYNAME>&probeset=<PROBESET1,PROBESET2, etc…>

Detailed Information:

For details on the valid values of the ARRAYNAME and PROBESET parameters, please refer to the Direct Access To Probe Set Information Manual.

Examples of Deep Links:

https://www.affymetrix.com/LinkServlet?probeset=10156_at

https://www.affymetrix.com/LinkServlet?array=U74&probeset=129277_at

https://www.affymetrix.com/LinkServlet?array=U95&probeset=1000_at

https://www.affymetrix.com/LinkServlet?probeset=1000_at,1002_f_at,38996_at
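For applications that generate these links programmatically, a minimal Python sketch of the URL format might look like this. It only builds the string and does not contact the server; `deep_link` is a hypothetical helper name, and valid ARRAYNAME and PROBESET values still come from the manual.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.affymetrix.com/LinkServlet"


def deep_link(probesets, array=None):
    # array is optional, as in the first and last example links.
    params = {}
    if array is not None:
        params["array"] = array
    # Several probe sets are passed as one comma-separated value.
    params["probeset"] = ",".join(probesets)
    # safe="," leaves the commas unescaped, matching the example links.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(deep_link(["1000_at"], array="U95"))
print(deep_link(["1000_at", "1002_f_at", "38996_at"]))
```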

How do I cite the NetAffx™ Analysis Center in a publication?

When citing the NetAffx Analysis Center, please refer to:

Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA. NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res. 2003;31(1):82-6.

Where can I download the annotation and sequence files?

The annotation and sequence files contain the complete entries for all probe sets on the array, taken from the NetAffx Analysis Center. They are intended to be used primarily in spreadsheet applications and database programs (such as SQL databases). Interactive and Batch queries can be performed in the NetAffx Analysis Center to find information for individual probe sets of interest.

Annotation files are available for most Affymetrix GeneChip™ arrays.

Alignments
Manual, Alignments to Genome in PSL Format

What procedure is used to generate the genome alignment data?

Genome alignments are currently provided for Human/Mouse/Rat/Drosophila/C. elegans arrays. We align the target sequences against the genome sequence downloaded from the UCSC website using BLAT. Some target sequences do not align, perhaps because several genomes are still in draft form, while others align at multiple locations on the genome. We therefore apply a filter to select the best hit for each target sequence, using the following procedure:

1. Calculate a score for each alignment:
   score = matches - (mismatches + 5 * qbaseinsert)
   where:
   matches = number of bases that match (including both repeat and non-repeat regions)
   mismatches = number of bases that do not match in the alignment
   qbaseinsert = number of bases inserted in the query
   It is therefore possible for some scores to be negative.
2. Calculate pcgood = score * 100 / target size. The pcgood metric is provided on the web site and in the download files.
3. Select the alignment with the best score.
4. Derive genomic coordinates for the probes (25-mers) from the "best" target sequence alignment.
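The scoring and best-hit selection steps can be sketched in Python as follows. The `Alignment` record, its field names, and the two sample hits are hypothetical; how these fields map onto the columns of a real BLAT PSL file is an assumption to check against the PSL format documentation.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Hypothetical record for one BLAT hit of a target sequence on the genome.
    chrom: str
    start: int
    matches: int      # matching bases, repeat and non-repeat regions combined
    mismatches: int   # mismatching bases in the alignment
    qbaseinsert: int  # bases inserted in the query
    target_size: int  # length of the target sequence


def score(a):
    # score = matches - (mismatches + 5 * qbaseinsert); may be negative
    return a.matches - (a.mismatches + 5 * a.qbaseinsert)


def pcgood(a):
    # pcgood = score * 100 / target size
    return score(a) * 100 / a.target_size


def best_alignment(hits):
    # Keep the highest-scoring hit (in this sketch, the first one wins a tie).
    return max(hits, key=score)


hits = [
    Alignment("chr1", 1000, matches=480, mismatches=10, qbaseinsert=4, target_size=500),
    Alignment("chr7", 52000, matches=300, mismatches=40, qbaseinsert=0, target_size=500),
]
best = best_alignment(hits)
print(best.chrom, score(best), pcgood(best))  # chr1 450 90.0
```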

We use the genomic coordinates for each probe (25-mer) from above and search for transcripts (RefSeq and GenBank mRNA alignments to the genome from the UCSC genome database) that overlap with the alignment of the probes. In the NetAffx summary report, we provide the transcript whose genomic alignment overlaps with the maximum number of probes from that particular probe set. We also provide the total number of probes from the probe set that overlap with the transcript as a measure of the ability of the probe set to detect the corresponding transcript.
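The transcript-selection step can be pictured with the following Python sketch, which counts, for each candidate transcript, how many of a probe set's probes overlap its exon blocks and reports the transcript with the highest count. It assumes half-open genomic intervals, treats an overlap of at least one base as a hit, and uses made-up coordinates and transcript names; the production pipeline may define overlap differently.

```python
def overlaps(a, b):
    # a and b are (chrom, start, end) half-open intervals.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def probes_overlapping(probes, exons):
    # Number of probes that touch at least one exon block of the transcript.
    return sum(1 for p in probes if any(overlaps(p, e) for e in exons))


def best_transcript(probes, transcripts):
    # transcripts: {name: [exon intervals]}; returns (name, overlapping probe count).
    counts = {name: probes_overlapping(probes, exons) for name, exons in transcripts.items()}
    name = max(counts, key=counts.get)
    return name, counts[name]


probes = [("chr1", 100, 125), ("chr1", 200, 225), ("chr1", 300, 325)]
transcripts = {
    "transcript_A": [("chr1", 90, 130)],
    "transcript_B": [("chr1", 150, 400)],
}
print(best_transcript(probes, transcripts))  # ('transcript_B', 2)
```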

What is the difference between scaling and normalization when I scale or normalize my data to all genes on the array?

When scaling the data, you designate an arbitrary target signal, and the Microarray Suite software scales the average intensity of all genes on each array within a data set to the target signal you specified. This process enables you to compare multiple arrays within a data set. We advise that you use the same target signal across all arrays being compared. Scaling can be performed independently of the comparison analysis. Normalization, on the other hand, can only be done when performing the comparison analysis in Microarray Suite. In this case, the software compares an experimental array with a baseline array and normalizes the average intensity of the experimental array to the average intensity of the baseline array. The normalization factor for a particular array changes when you change the comparison baseline array.
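To make the distinction concrete, here is a simplified Python sketch. It uses a plain arithmetic mean of all signals, as this answer describes; Microarray Suite's actual calculation may differ in detail, and the target value of 500 is only a placeholder for the arbitrary target signal you choose.

```python
from statistics import mean


def scale_to_target(signals, target=500.0):
    # Scaling Factor: multiplier that brings this array's average signal to the target.
    sf = target / mean(signals)
    return [s * sf for s in signals], sf


def normalize_to_baseline(experimental, baseline):
    # Normalization factor: multiplier that brings the experimental array's
    # average to the baseline array's average. It depends on the baseline chosen.
    nf = mean(baseline) / mean(experimental)
    return [s * nf for s in experimental], nf


array_a = [100, 200, 300]
scaled, sf = scale_to_target(array_a)
print(sf, scaled)  # 2.5 [250.0, 500.0, 750.0]
_, nf = normalize_to_baseline(array_a, baseline=[300, 500])
print(nf)  # 2.0
```

Passing a different `baseline` changes `nf` while leaving `sf` untouched, which mirrors the point that only the normalization factor depends on the comparison baseline.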

How important is it to evaluate the value of the Scaling Factor between different arrays?

Scaling Factor is the multiplication factor applied to each Signal value on an array. A Scaling Factor of 1.0 indicates that the average array intensity is equal to the Target Intensity. Scaling Factors will vary across different samples, and there are no set guidelines for any particular sample type. However, if the Scaling Factors within a set of experiments differ by approximately 3-fold or more, this indicates wide variation in the .dat files, and the analyzed data (in the .chp files) should be treated with caution.
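A quick way to apply the approximately 3-fold rule of thumb to a set of arrays is to compare the largest and smallest Scaling Factors, as in this Python sketch. The array names and factor values are made up, and the 3.0 cutoff simply encodes the approximate guideline rather than a hard limit.

```python
def scaling_factor_spread(factors):
    # Ratio of the largest to the smallest Scaling Factor in the experiment.
    values = list(factors.values())
    return max(values) / min(values)


def needs_caution(factors, fold=3.0):
    # True when Scaling Factors differ by roughly 3-fold or more.
    return scaling_factor_spread(factors) >= fold


factors = {"array1": 1.2, "array2": 0.9, "array3": 3.1}  # hypothetical values
print(round(scaling_factor_spread(factors), 2), needs_caution(factors))  # 3.44 True
```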

Which is greater, sample or assay variability?

Sample variability, which arises mainly from biological heterogeneity, is certainly higher than assay variability, and has been estimated to be at least 10-fold greater. We recommend that researchers run multiple samples per data point to account for sample-to-sample variability. In addition, carefully design the experiment in order to minimize potential variation associated with the samples.

Probe Analysis

Are probe sequences available through the NetAffx™ Analysis Center?

Yes, probe sequences for all GeneChip™ probe arrays are available as an independent databank in the NetAffx™ Analysis Center. Query these databanks by probe set ID to search for probe sequences of interest. You may also "link" to this information from the Target or Sub-cluster databanks in the Analysis Center. The Download Center has compressed (*.zip) files of probe sequences to make it convenient for you to perform bulk downloads, such as all the probe sequences for a probe array of interest (for example, the HG-U133A array).

What relevant information is provided along with the Probe Sequences?

Serial Order: The relative order of probe sequences as they align with the consensus/exemplar sequence.
Probe Interrogation Position: The position of the 13th ("middle") nucleotide of the probe sequence as it aligns on the consensus/exemplar sequence.
Probe X/Probe Y coordinates: The X and Y coordinates of the probe sequence on the GeneChip™ array.
Target Strandedness: The sense/antisense orientation of the target sequence that can hybridize with the probe sequence.

How can I quickly determine if my favorite sequence aligns with probes on GeneChip™ arrays?

To quickly determine matching probes on GeneChip™ arrays, paste or upload a text file with the DNA sequence corresponding to your gene of interest in the Probe Match tool. To learn more about the Probe Match Tool, review the Probe Match Tool User's Guide.

What is the recommended sequence format for uploading into the Probe Match tool?

You may paste or upload sequences in FASTA format for analysis with the Probe Match tool. Please note: You may also paste a sequence without any header information.

Can I upload or paste multiple DNA sequences into the Probe Match tool?

Yes, you can upload or paste a text file into the Probe Match tool with multiple sequences in FASTA format. Please note: The size of the uploaded file must not exceed 20 KB. (To determine the file size on a Windows-based system, right-click the text file icon, and select the "Properties" option).
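If you assemble a multi-sequence file programmatically, a Python sketch like the one below writes the sequences in FASTA format and checks the 20 KB limit before upload. We assume "20 KB" means 20,000 bytes, the more conservative reading; the sequence names are hypothetical.

```python
MAX_UPLOAD_BYTES = 20_000  # conservative reading of the 20 KB limit


def to_fasta(records, width=60):
    # records: list of (name, sequence) pairs; sequences are cleaned and wrapped.
    lines = []
    for name, seq in records:
        seq = "".join(seq.split()).upper()  # drop whitespace, normalize case
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def fits_upload_limit(text, limit=MAX_UPLOAD_BYTES):
    # Size in bytes of the text as plain ASCII with "\n" line endings.
    return len(text.encode("ascii")) <= limit


fasta = to_fasta([("variant_1", "acgt acgt"), ("variant_2", "GGCCTTAA")])
print(fasta, end="")
print(fits_upload_limit(fasta))  # True
```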

Can I input a protein sequence into the Probe Match tool?

The Probe Match tool does not accept protein sequences as input. However, you may reverse-translate the protein sequence into DNA, and then use it for analysis with the Probe Match tool.

What is the difference between the BLAST algorithm and the alignment algorithm used by the Probe Match tool?

Alignments by the Probe Match tool produce positive results only if every base in a probe sequence matches perfectly (that is, the aligned bases are identical) with those in the query (input) sequence without any gaps in the query sequence. This algorithm is different from the BLAST algorithm, because the BLAST algorithm allows mismatches and gaps within the query sequence to produce a positive alignment.
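The perfect-match, gap-free rule can be illustrated with a simple substring search. This Python sketch is not the Probe Match tool's implementation, only an illustration of the matching criterion; the probe IDs and sequences are made up, and positions are reported as 1-based starts on the query.

```python
def probe_match(probes, query):
    # probes: {probe_id: sequence}. A probe counts as a hit only if its whole
    # sequence occurs in the query with every base identical and no gaps.
    query = query.upper()
    hits = {}
    for probe_id, probe in probes.items():
        probe = probe.upper()
        positions = []
        start = query.find(probe)
        while start != -1:
            positions.append(start + 1)  # report 1-based start positions
            start = query.find(probe, start + 1)
        if positions:
            hits[probe_id] = positions
    return hits


probes = {"probe_1": "ATGCGTACCTGAAGTCCATGGCAAT", "probe_2": "C" * 25}
query = "GGG" + "ATGCGTACCTGAAGTCCATGGCAAT" + "TTT"
print(probe_match(probes, query))  # {'probe_1': [4]}
mutated = query[:10] + "A" + query[11:]  # one mismatch inside the probe region
print(probe_match(probes, mutated))  # {}
```

A BLAST search would still report the mutated query as a strong alignment, which is exactly the difference described above.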

Why does the alignment position detected by Probe Match tool differ from the probe interrogation position information associated with the probe sequence record?

The probe interrogation position, provided with the probe sequence information, indicates the base position on the consensus/exemplar sequence where the central base of the probe aligns (the 13th base of a 25mer probe). The position information generated by the Probe Match tool is strictly based on the alignment against the query (input) sequence. Because the consensus/exemplar may not be co-linear with your input sequence, the probe interrogation position in a probe sequence record may not match the output from the Probe Match tool.

Can I use the Probe Match tool to determine if a probe set detects splice variants?

Yes, you can. To do this, use all the sequences corresponding to the splice variants of a gene as input for the Probe Match tool. Please note: Use caution when interpreting positive results, and when considering the total number of probes in a probe set that actually match the various isoforms. The total number of probes matching the query sequence is provided in the "# Probes Matching Query" column of the Probe Match output. Discard the results if the number of matching probes constitutes less than 70% of the total number of probes in a probe set.
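The 70% rule can be expressed as a small check on the "# Probes Matching Query" counts. In this Python sketch the isoform names and counts are hypothetical, and integer arithmetic is used so that a count of exactly 70% is kept rather than lost to floating-point rounding.

```python
def isoforms_detected(matching_counts, probes_in_set, min_percent=70):
    # matching_counts: {isoform: number of probes in the set that matched it}.
    # Keep a result only if matching probes make up at least min_percent of the set.
    return {
        isoform: n * 100 >= min_percent * probes_in_set
        for isoform, n in matching_counts.items()
    }


counts = {"isoform_1": 11, "isoform_2": 7, "isoform_3": 8}
print(isoforms_detected(counts, probes_in_set=11))
# {'isoform_1': True, 'isoform_2': False, 'isoform_3': True}
```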

When should I select the "Search With Reverse Complement of Query" option in the Probe Match tool?

The probe sequences on the anti-sense arrays are designed to hybridize with the anti-sense strand of the corresponding gene sequence. This means that all the probe sequences are in the sense orientation. Select the "Search With Reverse Complement of Query" option if you do not obtain any matches (or "hits") with the query sequence, or if you are not sure about the orientation of your query (input) sequence. Please note: Use caution when interpreting results from such a query to avoid selecting false positive "hits" to your gene of interest.
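Searching the other strand amounts to matching against the reverse complement of the query. A minimal Python version is shown below; it handles only A, C, G, T, and N and keeps the input's case, which is an assumption about the kind of input you have.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    # Complement each base, then reverse the order.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCA"))  # TGCAT
```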

What is contained in the tab-delimited format of the probe sequence download file?

The probe sequence download file in the Download Center is a tab-delimited file containing the following columns:

Probe Set Name: For example, 1007_s_at

Probe X: The X coordinate of the probe sequence on the GeneChip® probe array.

Probe Y: The Y coordinate of the probe sequence on the GeneChip probe array.

Probe Interrogation Position: The position of the 13th ("middle") nucleotide of the probe sequence as it aligns on the consensus/exemplar sequence.

Probe Sequence: The 25-base perfect match sequence.

Target Strandedness: The sense/antisense orientation of the target sequence that can hybridize with the probe sequence.

Note: Probe Sequence databanks in the NetAffx™ Analysis Center have an additional column called, "Serial Order". This column provides the relative order of probe sequences as they align with the consensus/exemplar sequence.
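For bulk work, the download file can be read with a standard tab-separated parser. This Python sketch assumes the file's header row uses exactly the column names listed above, which you should check against the file you actually download; the sample row is made up.

```python
import csv
import io
from collections import defaultdict

# Assumed header names, copied from the column list in this FAQ.
COLUMNS = ["Probe Set Name", "Probe X", "Probe Y",
           "Probe Interrogation Position", "Probe Sequence", "Target Strandedness"]


def load_probes(handle):
    # Group probe rows by probe set name.
    by_probe_set = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_probe_set[row["Probe Set Name"]].append({
            "x": int(row["Probe X"]),
            "y": int(row["Probe Y"]),
            "position": int(row["Probe Interrogation Position"]),
            "sequence": row["Probe Sequence"],
            "strand": row["Target Strandedness"],
        })
    return by_probe_set


# Made-up sample row; with a real file, pass open(path, newline="") instead.
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join(
    ["1007_s_at", "467", "181", "3330", "ACGT" * 6 + "A", "Antisense"]) + "\n"
probes = load_probes(io.StringIO(SAMPLE))
print(probes["1007_s_at"][0]["x"], probes["1007_s_at"][0]["position"])  # 467 3330
```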

What is the probe orientation on the IVT Expression chips?

The probes on the 3' IVT platform are in the sense orientation.

What is the probe interrogation position?

The probe interrogation position indicates the base position on the consensus/exemplar sequence where the central base of the probe aligns, which is the 13th base of a 25mer probe.
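The arithmetic is simple: if the first base of a 25-mer aligns at 1-based position s on the consensus/exemplar sequence, the central 13th base aligns at s + 12. This tiny Python helper (a hypothetical name, assuming an ungapped alignment) makes the calculation explicit.

```python
PROBE_LENGTH = 25


def interrogation_position(alignment_start):
    # alignment_start: 1-based position of the probe's first base on the
    # consensus/exemplar sequence. The central (13th) base is 12 bases further on.
    return alignment_start + PROBE_LENGTH // 2


print(interrogation_position(100))  # 112
```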

What do the suffixes at the end of each probe set name mean?

The probe set names never change, but they can give you an idea of what was known about the sequence at the time of design:
_at = all the probes hit one known transcript.
_a_at = all probes in the set hit alternate transcripts from the same gene.
_s_at = all probes in the set hit transcripts from different genes.
_x_at = some probes hit transcripts from different genes.
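To tag many probe set names at once, the suffix can be parsed with a regular expression, as in this Python sketch. It covers only the four suffixes described in this answer and returns None for anything else, such as the "_f_at" suffix that appears in one of the deep-link examples, since this FAQ does not define it.

```python
import re

MEANINGS = {
    "": "all the probes hit one known transcript",
    "a": "all probes in the set hit alternate transcripts from the same gene",
    "s": "all probes in the set hit transcripts from different genes",
    "x": "some probes hit transcripts from different genes",
}
# Matches "_at" at the end of the name, optionally preceded by "_<letter>".
SUFFIX = re.compile(r"_(?:([a-z])_)?at$")


def describe_probe_set(name):
    match = SUFFIX.search(name)
    if match is None:
        return None
    letter = match.group(1) or ""
    return MEANINGS.get(letter)  # None for suffix letters not covered here


for name in ["10156_at", "1007_s_at", "1002_f_at"]:
    print(name, "->", describe_probe_set(name))
```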

Exon Array Analysis

How big are the .dat and .cel files?

The .dat files are approximately 750 MB in size, and the binary .cel files are approximately 60 MB in size.

What are the basic quality assessment metrics for the Exon Array?

The standard Affymetrix expression control sets (i.e., bacterial spikes), including both the hybridization control spikes and the poly-A RNA control spikes, are present on the Human Exon 1.0 ST Array and can be used to assess quality qualitatively, as on earlier arrays. As a new feature unique to the Exon Array, the Expression Console software supports the extraction of residuals from the PLIER model when running a multiple-array analysis. Such residuals can be used to identify outlier arrays and poorly performing probes. In addition, for most of the 100 normalization control genes used in the design of the GeneChip™ Human Genome U133 Array, probe sets representing both their intron and exon regions have been tiled on the Exon Array. The exon and intron probe set metrics (i.e., % detection p-value <= 0.05, mean/median signal) can be used as an additional method to identify problematic outlier arrays or analysis problems. For these new experimental analysis metrics, there are no simple standard numeric threshold values that we can recommend at this time as cutoffs for identifying good-performing or poor-performing arrays. However, they have proven valuable for identifying outlier arrays when comparing relative values across a set of experiments. See the Quality Assessment of Exon Arrays white paper for more information.
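Because this answer recommends relative comparisons rather than fixed cutoffs, one simple approach is to compute the "% detection p-value <= 0.05" metric per array and look at how far each array sits from the median of the set. The Python sketch below does that with made-up p-values and deliberately applies no threshold.

```python
from statistics import median


def percent_detected(pvalues, alpha=0.05):
    # Share of probe sets with detection p-value <= alpha, as a percentage.
    return 100.0 * sum(p <= alpha for p in pvalues) / len(pvalues)


def deviation_from_median(metric_by_array):
    # How far each array's metric sits from the median across the set.
    center = median(metric_by_array.values())
    return {name: value - center for name, value in metric_by_array.items()}


pvalues_by_array = {  # made-up detection p-values for a handful of probe sets
    "array1": [0.01, 0.20, 0.03, 0.04],
    "array2": [0.01, 0.02, 0.03, 0.50],
    "array3": [0.30, 0.60, 0.04, 0.90],
}
metric = {name: percent_detected(p) for name, p in pvalues_by_array.items()}
print(metric)  # {'array1': 75.0, 'array2': 75.0, 'array3': 25.0}
print(deviation_from_median(metric))  # {'array1': 0.0, 'array2': 0.0, 'array3': -50.0}
```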

Can I compare the Signal value between two exons directly to obtain a ratio of two different transcripts?

No. Each probe set has its own hybridization characteristics, so Signal values from different probe sets are not directly comparable, and the ratio of two exon-level Signal values does not give the ratio of two transcripts.

What analysis software packages are available for Exon Array data analysis?

GeneChip™ Command Console (AGCC) is provided to control the fluidics station and the 7G Scanner and to generate .dat and .cel files. Probe-level analysis of the .cel files is carried out by Expression Console. This software generates signal estimates and detection p-values at the probe set level for either exon-level or gene-level analysis. Expression Console Software is freely downloadable from our web site and supported by the Technical Support team. For higher-level analysis, such as alternative splicing detection, a few experimental algorithms are published as methods in the respective white papers. Users will need to implement these methods in advanced statistical analysis software packages. These methods have been developed based on experience with limited sample data sets, and further fine-tuning may be required depending on the user's unique biological systems. We anticipate that new methods will continue to emerge to better support Exon Array analysis in the near future.

How long will it take for me to analyze the data?

Normalization of .cel files takes 1-3 minutes and generation of DABG and PLIER probe set level summaries takes about 40 minutes per .cel file. Further downstream analysis will depend on which data analysis techniques are being applied.

What can I do to reduce the analysis time required to generate probe set level signal?

Two primary factors can affect the length of analysis time: 1) the number of arrays analyzed at one time, and 2) the number of probe sets to be included in the analysis. Reducing the size of either one of these two factors will increase the analysis speed.

What is DABG, PLIER?

DABG stands for "detection above background" and is a detection metric generated by comparing Perfect Match probes to a distribution of background probes. This comparison yields a p-value for each probe, and these p-values are then combined into a probe set level p-value using Fisher's method. PLIER stands for "Probe Logarithmic Intensity Error" and is a model-based signal estimator that benefits from multi-array analysis. For more information on DABG, see the "Exon Array Background Correction" white paper; for more information on PLIER, refer to the PLIER Technical Note.
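The combination step can be illustrated with Fisher's method: for k probe-level p-values, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below uses the closed-form chi-square tail for even degrees of freedom, so it needs only the standard library. It covers only this combination step, not how DABG derives each probe's p-value from the background distribution, and it assumes every p-value is strictly greater than 0.

```python
import math


def fisher_combined_pvalue(pvalues):
    # Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k degrees of freedom.
    # Every p must be in (0, 1]; math.log(0) would raise an error.
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square tail for even df 2k: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


print(round(fisher_combined_pvalue([0.5, 0.5]), 4))  # 0.5966
print(fisher_combined_pvalue([0.01, 0.02, 0.04]))
```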

Can I get gene-level expression results?

The current exon array analysis software allows for the aggregation of multiple exon level probe sets into a larger "meta probe set". PLIER signal estimates and DABG detection values are then computed for these meta probe sets. The definition and grouping of the exons into a gene can have a significant impact on the final signal value of a particular gene. Affymetrix recommends using the "core gene" grouping or the "full gene" grouping files to derive the gene-level signal that should most resemble the expression of the constitutive exons. See the "Gene Signal Estimates from Exon Arrays" white paper for more information.

How do I correlate the SNPs on the mapping array with this design?

Associations between SNPs and exon probe sets can be obtained by using genome assembly position information which is provided for both the mapping array and the exon array. One useful tool for doing this is the UCSC Table Browser.
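Once both position lists are on the same genome assembly, the association is an interval lookup. This Python sketch finds the SNPs that fall inside each exon probe set's genomic span; the IDs and coordinates are hypothetical, and it assumes 1-based inclusive coordinates for both inputs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def snps_in_probe_sets(snps, probe_sets):
    # snps: (snp_id, chrom, pos); probe_sets: (probe_set_id, chrom, start, end).
    # Coordinates must come from the same genome assembly.
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([pos for pos, _ in entries], [sid for _, sid in entries])

    result = {}
    for ps_id, chrom, start, end in probe_sets:
        if chrom not in index:
            continue
        positions, ids = index[chrom]
        # All SNPs with start <= pos <= end (inclusive coordinates).
        lo, hi = bisect_left(positions, start), bisect_right(positions, end)
        if lo < hi:
            result[ps_id] = ids[lo:hi]
    return result


snps = [("rs_a", "chr1", 150), ("rs_b", "chr1", 400), ("rs_c", "chr2", 150)]
probe_sets = [("ps1", "chr1", 100, 200), ("ps2", "chr1", 300, 350), ("ps3", "chr2", 140, 160)]
print(snps_in_probe_sets(snps, probe_sets))  # {'ps1': ['rs_a'], 'ps3': ['rs_c']}
```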

When I loaded the exon array design information and array data into IGB, it seems that the probes were selected from outside of the RefSeq exons. Why?

The most likely explanation is that different versions of the genome assembly have been used to display the array design information and the array results. At launch, two versions of the library files are provided for array analysis, corresponding to Human Genome Builds 34 and 35. Take care to use the same build for the array design and the actual array data when visualizing them in IGB.

How do you move the .cel files out of GCOS to be imported into Expression Console?

GCOS users must use DTT v1.1, using the Flat File option, to transfer files to be analyzed by the Expression Console software from the GCOS database to an independent folder.

Two alternative splicing detection methods were described in the white papers on your web site. Are they related to the ANOSVA SVD algorithm that was published in Bioinformatics in 2005 by Cline, et al?

No, the methods described in the white papers are not ANOSVA, the method published in that paper. Note also that ANOSVA was developed on a combination exon/junction array. Preliminary work comparing ANOSVA and PAC (one of the two methods described in the white papers) on the exon array tissue panel suggests that the MiDAS and robust PAC methods presented in the white papers perform better.

Is there an option to view the various types of annotation information from NetAffx™ Analysis Center in IGB so you can combine the textual annotation with visualization?

One way to do this is to open an IGB window displaying the probe sets of interest, and then right-click on the probe set and select the Get Info menu item which will open up the corresponding NetAffx™ Analysis Center page, revealing the annotations associated with that probe set.

Is it possible to launch IGB as a stand-alone program without having the machine networked? I know that it could be done once the data are acquired but could I do it right from the beginning, just opening IGB and loading a file locally?

Yes, but not easily. You need to get the IGB jar file, which can be downloaded from the IGB page. You will probably also want to download all the data in the quickload folder at http://netaffxdas.affymetrix.com/quickload_data/ and put it in a folder on the local computer. Start IGB using Java and the IGB jar file. Then change the URL for the quickload folder (on the QuickLoad tab, select Quickload Options) so that it points to the local folder containing the quickload data. Please note that we will provide limited technical support for this local IGB workflow. Additional features in IGB, such as the "Get Info" annotation retrieval function, require an active network connection to our web site. With a local installation, users may not be aware of significant feature improvements and enhancements to IGB, or of updated sequence information and annotations. Local deployment should be used with discretion.

Can I compare Exon Array results with those obtained on the GeneChip™ Human Genome U133 (HG-U133) Array?

Signal and detection values from the exon arrays cannot be directly compared to those of the HG-U133 arrays. Major differences in array design and assay prevent meaningful comparisons at the signal level. Splice variation and polyadenylation variation can confound comparisons at the biological level (i.e., direction of change) because of differences in probe placement and bias in the target preparation assays.

Expression Array Comparison Tool

What browsers are supported when using the expression array comparison tool?

Chrome, Firefox, Safari, and Internet Explorer v9 and above are supported.

For Research Use Only. Not for use in diagnostic procedures.